Electronics, Embedded Systems, and Software are my breakfast, lunch, and dinner.
There is an odd famine of information about this particular subject available in text form. If you google "usbasp tpi" and things like it, you'll find posts on the avr forums filled with incorrect information and schematics. This post is an attempt to remedy this. The information in this post should enable to you program cheap AVR microcontrollers such as the ATTiny4, ATTiny5, ATTiny9, ATTiny10, or ATTiny20 using a widely available usbasp AVR programmer.
I intend to show exactly the following, no more and no less:
Wow it has been a while. Between school, work, and another project that I've been working on since last October (which, if ultimately successful, I will post here) I haven't had a lot of time to write about anything cool.
I wanted to share today something cool I wrote for my AVRs. Many of my recent AVR projects have become rather complex in that they usually are split into multiple parts in the software which interact with each other. One project in particular had the following components:
So, why did I use state machines rather than just implementing this whole thing in a giant loop with some interrupts mixed in? The answer is twofold:
None of this, however, is particularly new. Everyone does state machines and they are comparatively easy to implement. In almost every project I have done there has been some form of state machine, whether it was just busy waiting for some flag somewhere and doing something after that flag was set, or doing something much more complex. What I wanted to show today was a different way of dealing with the issues of spaghetti and speed: Building a preempting task scheduler. These are considered a key and central component of Real-Time Operating Systems , so what you see in this article is the beginnings of a real-time kernel for an AVR microcontroller.
I should mention here that there is a certain level of ability assumed with AVRs in this post. I assume that the reader has a good knowledge of how programs work with the stack, the purpose and functioning of the general purpose registers, how function calls actually happen on the microcontroller, how interrupts work, the ability to read AVR assembly (don't worry most of this is in C...but there is some critical code written in assembly), and a general knowledge of the avr-gcc toolchain.
The definition of preemption in computing is being able to interrupt a task, without its cooperation, with the intention of executing it later, in order to run some other task.
Firstly we need to define a task: A task is a unit of execution; something that the program structure defines as a self-contained part of the program which accomplishes some purpose . In our case, a task is basically going to just be a method which never returns. The next thing to define is a scheduler . A scheduler is a software component which can decide from a list of tasks which task needs to run on the processor. The scheduler uses a dispatcher to actually change the code that is executing from the current task to another one. This is called saving and restoring the context of the task.
The cool thing about preemptive scheduling is that any task can be interrupted at any time and another task can start executing. Now, it could be asked, "Well, isn't that what interrupts do? What's the difference?". The difference is that an interrupt in a system using a preemptive scheduler can actually resume in a different place from when it started. Without a scheduler, when an interrupt ends the processor will send the code right back to where it was executing when the interrupt occurred. In contrast, the scheduler actually allows the program to have a little more control and move from one task to another in response to an interrupt or some other stimuli (yes, it doesn't even need to be an interrupt...it could be another task!).
In the state machine example above, I used state machines as a way to break up computations so that the other machines could run at predetermined points. I noticed that in most of these cases, this point was when waiting for some user input or an interrupt. Although I used interrupts extensively in the application, there were a lot of flags to be polled and this happened inside state machine tick functions. When an interrupt occurred and set a flag, it would need to wait for the "main" code to get around to executing the tick function for the particular state machine that listens to the flag before anything could happen. This introduces a lot of latency and jitter (differences in the amount of time it takes the system to respond to an interrupt from one moment to the next). Not good.
Using tasks removes a lot of these latency problems since the interrupt can halt the current task (block ) and begin executing another which was waiting on the interrupt to occur (unblock or resume ). Once the higher priority task is blocked again (through a call to some function asking it to wait for some event), the scheduler will change back to the original task and things go on as usual. Using tasks also has the effect of making the code easier to read. While state machines are easy to write, they are not always easy to follow. A function is invoked over and over again and that requires more thought than simply reading a linear function. Tasks can be very linear since the state machine is embodied in calls which could possibly block the task.
A positive side effect of doing things this way (with a scheduler) is that we can now implement the familiar things such as semaphores and queues to communicate between our tasks in a fine grained manner. At their core these are simply methods that can manipulate the list of tasks and call the scheduler to decide which task to execute next.
Summary: Using a preemptive scheduler can allow for lower latency and jitter between an interrupt occurring and some non-ISR code responding to it when compared to using several state machines. This means it will respond more consistently to interrupts occurring, though not necessarily in a more timely fashion. It also allows for fine-grained priority control with these responses.
Before continuing, I would like to point out some pros and cons that I see of writing a task scheduler lest we fall into the "golden hammer" antipattern. There are certainly more, but here is my list (feel free to comment with comments on this).
So...lots of those are contradictory. What is a pro can also be a con. Anyway, I'm just presenting this as something cool to do, not as the end all be all of ways to structure an embedded program. If you have a microcontroller that is performing a lot of tasks that need to be able to react reliably to an interrupt, this might be the way to go for you. However, if your microcontroller is just toggling some gpios and reacting to some timers, this might be overkill. It all depends on the application.
Mmmkay here's the fun part. At this point you may be asking, "How in the world can we make something that can interrupt during one function and resume into another?" I recently completed a course on Real-Time Operating Systems (RTOS) at my university which opened my eyes into how this can be done (we wrote one for the 8086...so awesome!), so I promptly wrote one for the AVR. For those who come by here who have taken the same course at BYU, they will notice some distinct similarities since I went with what I knew. I've named it KOS, for "Kevin's Operating System", but this was just so I had an easy prefix for my types and function names. If you're going to implement your own based on this article, don't worry about naming it like mine (though a mention of this article somewhere would be cool).
Disclaimer: I have only started to scratch the surface of this stuff myself and I may have made some errors. I appreciate any insight anyone can give me into either suggestions for this or problems with my implementation. Just leave it in the comments :)
All of the code can be found here: `https://github.com/kcuzner/kos-avr <https://github.com/kcuzner/kos-avr>`__
The focus of a scheduler/dispatcher system for tasks is manipulating the stack pointer and the stack itself. "Traditionally," programs written for microcontrollers have a single stack which grows from the bottom of memory up and all code is executed on that stack. The concept here is that we still start out with that stack, but we actually execute the tasks on their own separate stacks. When we want to switch to a task, we point the AVR's stack pointer to the desired task's stack and start executing (its the "start executing" part where things get fun).
First, let's take a look at the structure which represents a task:
1typedef enum { TASK_READY, TASK_SEMAPHORE, TASK_QUEUE } KOS_TaskStatus;
2
3typedef struct KOS_Task {
4 void *sp;
5 KOS_TaskStatus status;
6 struct KOS_Task *next;
7 void *status_pointer;
8} KOS_Task;
The very first item in this struct is the pointer to the stack pointer (*sp). It is a void* because we don't normally access anything on it...we just make the SP register point to it when we want to execute the task.
The next item in the struct is a status enum. This is used by my primitive scheduler to determine if a task is "READY" to execute. If a task is ready to execute, then it is not waiting on anything (i.e. blocked) and it can be resumed at any time. In the case where the task is waiting on something like a semaphore, this status would be changed to SEMAPHORE. The semaphore posting code would then change the status back to READY once somebody posted to the semaphore. This is called "unblocking".
After the status comes the *next pointer. The tasks are arranged in a linked list because they have a priority attached to them. This priority determines which tasks get executed first. At the top of the linked list is the highest priority task and at the end of the list is the lowest priority task.
Finally, we have the *status_pointer. This is used by our functions which can unblock tasks to determine why tasks are blocked in the first place. We will see more about this when we make a primitive semaphore.
Ok, so for the basic task scheduling and dispatching functionality we are going to implement some functions (these are declared in a header):
1typedef void (*KOS_TaskFn)(void);
2
3extern KOS_Task *kos_current_task;
4
5/**
6 * Initializes the KOS kernel
7 */
8void kos_init(void);
9
10/**
11 * Creates a new task
12 * Note: Not safe
13 */
14void kos_new_task(KOS_TaskFn task, void *sp);
15
16/**
17 * Puts KOS in ISR mode
18 * Note: Not safe, assumes non-nested isrs
19 */
20void kos_isr_enter(void);
21
22/**
23 * Leaves ISR mode, possibly executing the dispatcher
24 * Note: Not safe, assumes non-nested isrs
25 */
26void kos_isr_exit(void);
27
28/**
29 * Runs the kernel
30 */
31void kos_run(void);
32
33/**
34 * Runs the scheduler
35 */
36void kos_schedule(void);
37
38/**
39 * Dispatches the passed task, saving the context of the current task
40 */
41void kos_dispatch(KOS_Task *next);
As for source files, we will only have a single C file for the implementation, but there will be some inline assembly because we are going to have to fiddle with registers. Yay! I'll just go through the functions one by one and afterwards I'll go through my design decisions and how they affect things. This is not the only, nor the best, way to do this.
Firstly, we have the kos_init and kos_new_task functions, which come with some baggage:
1static KOS_Task tasks[KOS_MAX_TASKS + 1];
2static uint8_t next_task = 0;
3static KOS_Task *task_head;
4KOS_Task *kos_current_task;
5
6static uint8_t kos_idle_task_stack[KOS_IDLE_TASK_STACK];
7static void kos_idle_task(void)
8{
9 while (1) { }
10}
11
12void kos_init(void)
13{
14 kos_new_task(&kos_idle_task, &kos_idle_task_stack[KOS_IDLE_TASK_STACK - 1]);
15}
16
17void kos_new_task(KOS_TaskFn task, void *sp)
18{
19 int8_t i;
20 uint8_t *stack = sp;
21 KOS_Task *tcb;
22
23 //make space for pc, sreg, and 32 registers
24 stack[0] = (uint16_t)task & 0xFF;
25 stack[-1] = (uint16_t)task >> 8;
26 for (i = -2; i > -34; i--)
27 {
28 stack[i] = 0;
29 }
30 stack[-34] = 0x80; //sreg, interrupts enabled
31
32 //create the task structure
33 tcb = &tasks[next_task++];
34 tcb->sp = stack - 35;
35 tcb->status = TASK_READY;
36
37 //insert into the task list as the new highest priority task
38 if (task_head)
39 {
40 tcb->next = task_head;
41 task_head = tcb;
42 }
43 else
44 {
45 task_head = tcb;
46 }
47}
Here we have two concepts that are embodied. The first is the context . The context the data pushed onto the stack that the dispatcher is going to use in order to restore the task before executing it. This is similar (identical even) to the procedure used with interrupt service routines, except that we store every single one of the 32 registers instead of just the ones that we use. The next concept is that of the idle task . As an optimization, there is a task which has the lowest priority and is never blocked. It is always ready to execute, so when all other tasks are blocked, it will run. This means that we don't have to deal with the case in the scheduler when there is no tasks to execute since there will always be a task.
The kos_init function performs only one operation: Add the idle task to the list of tasks to execute. Notice that there was some space allocated for the stack of the idle task. This stack must be at least as large as the entire context (35 bytes here) plus enough for any interrupts which may occur during program execute. I chose 48 bytes, but it could be as large as you want. Also take note of the pointer that we pass for the stack into kos_new_task: It is a pointer to the end of our array. This is because stacks grow "up" in memory, meaning a push decrements the address and a pop increments it. If we passed the beginning of the array, the first push would make us point before the memory allocated to the stack since arrays are allocated "downwards" in memory.
The kos_new_task function is a little more complex. It performs two operations: setting up the initial context for the function and adding the Task structure to the linked list of tasks. The context needs to be set up initially because from the scheduler's perspective, the new task is simply an unblocked task that was blocked before. Therefore, it expects that some context is stored on that task's stack. Our context is ordered such that the PC (program counter) is first, the 32 registers are next, and the status register is last. Since the stack is last-in first-out, the SREG is popped first, then the 32 registers, and then the PC. We can see at the beginning of the function that we take the function pointer (they are usually 16 bits on most AVRs...the ones with lots of flash do it differently, so consult your datasheets) and set it up to be the program counter. It is arranged LSB-first, so the LSByte is "pushed" before the MSByte. The order here is very important and the reason why will become very apparent when we see the code for the dispatcher. After that, we put 32 0's onto the stack. These are the initial values for the registers and 0 seemed like a sensible value. The very last byte "pushed" is the status register. We set it to 0x80 so that the interrupt flag is set. This is a design decision to prevent problems with forgetting to enable interrupts for every task and having one task where we forgot to enable it prevent all interrupts from executing. Finally, the top of the stack (note the subtraction of 35 bytes from the stack pointer) is stored on the Task struct along with the initial task state. We add it to the task list as the head of the list, so the last task added is the task with the highest priority.
Next we have the kos_run function:
1void kos_run(void)
2{
3 kos_schedule();
4}
Well that's simple: it just calls the scheduler. So, let's look at kos_schedule:
1void kos_schedule(void)
2{
3 if (kos_isr_level)
4 return;
5
6 KOS_Task *task = task_head;
7 while (task->status != TASK_READY)
8 task = task->next;
9
10 if (task != kos_current_task)
11 {
12 ATOMIC_BLOCK(ATOMIC_RESTORESTATE)
13 {
14 kos_dispatch(task);
15 }
16 }
17}
The very first thing to notice is the kos_isr_level reference. This solves a very specific problem that occurs with ISRs which I talk about in the next section. Other than that bit, however, this is also simple. Because our tasks in the linked list are ordered by priority, we can simply start at the top and move along the linked list until we locate the first task that is ready (unblocked). Once that task is found, we will call the dispatcher if the task we found is not the currently executing task.
The purpose of the ATOMIC_BLOCK is to ensure that interrupts are disabled when the dispatcher runs. Since the stack is going to be manipulated, the entire dispatcher is considered to be a critical section of code and must be run atomically. The ATOMIC_BLOCK will restore the interrupt status after kos_dispatch returns (which is after the task has been resumed).
We are faced with a very particular problem when we want to call our scheduler inside of an interrupt. Let's imagine a scenario where we have two tasks, Task A and Task B (Task A has higher priority than Task B), in addition to the idle task. Task A uses waits on two semaphores (semaphores 1 and 2) that is signaled by an ISR. When task A is running, it signals another semaphore that Task B waits on (semaphore 3). Here is what happens:
As straightforward as that may seem, that isn't the intended behavior. Imagine if a task with an even higher priority than A had the ISR occur while it was executing. The sequence above would be totally different because Task A wouldn't be dispatched after the 1st semaphore being posted (item #4). Let's see what happens:
Because of the inconsistency and the fact that the ISR "priority" when viewed by the scheduler is determined by possibly random ISRs (making it non-deterministic), we need fix this. The solution I went with was to make two methods: kos_enter_isr and kos_exit_isr. These should be called when an ISR begins and when an ISR ends to temporarily hold off calling the scheduler until the very end of the ISR. This has the effect of giving an ISR an apparently high priority since it will not switch to another task until it has completely finished. So, although the idle task may be running when the ISR occurs, while the ISR is running no context switches will occur until the very end. Here is some code:
1static uint8_t kos_isr_level = 0;
2void kos_isr_enter(void)
3{
4 kos_isr_level++;
5}
6
7void kos_isr_exit(void)
8{
9 kos_isr_level--;
10 kos_schedule();
11}
As seen in kos_schedule, we use the kos_isr_level variable to indicate to the scheduler whether we are in an ISR or not. When kos_isr_level finally returns to 0, the scheduler will actually perform scheduling when it is called at the end of kos_isr_exit. The second set of events described earlier will now happen every time, even if the idle task is interrupted.
These functions must be run with interrupts disabled since they don't use any sort of locking, but they should support nested interrupts so long as they are called at the point in the interrupt when interrupts have been disabled.
The dispatcher is written basically entirely in inline assembly because it does the actual stack manipulation:
1void kos_dispatch(KOS_Task *task)
2{
3 // the call to this function should push the return address into the stack.
4 // we will now construct saving context. The entire context needs to be
5 // saved because it is very possible that this could be called from within
6 // an isr that doesn't use the call-used registers and therefore doesn't
7 // save them.
8 asm volatile (
9 "push r31 \n\t"
10 "push r30 \n\t"
11 "push r29 \n\t"
12 "push r28 \n\t"
13 "push r27 \n\t"
14 "push r26 \n\t"
15 "push r25 \n\t"
16 "push r24 \n\t"
17 "push r23 \n\t"
18 "push r22 \n\t"
19 "push r21 \n\t"
20 "push r20 \n\t"
21 "push r19 \n\t"
22 "push r18 \n\t"
23 "push r17 \n\t"
24 "push r16 \n\t"
25 "push r15 \n\t"
26 "push r14 \n\t"
27 "push r13 \n\t"
28 "push r12 \n\t"
29 "push r11 \n\t"
30 "push r10 \n\t"
31 "push r9 \n\t"
32 "push r8 \n\t"
33 "push r7 \n\t"
34 "push r6 \n\t"
35 "push r5 \n\t"
36 "push r4 \n\t"
37 "push r3 \n\t"
38 "push r2 \n\t"
39 "push r1 \n\t"
40 "push r0 \n\t"
41 "in r0, %[_SREG_] \n\t" //push sreg
42 "push r0 \n\t"
43 "lds r26, kos_current_task \n\t"
44 "lds r27, kos_current_task+1 \n\t"
45 "sbiw r26, 0 \n\t"
46 "breq 1f \n\t" //null check, skip next section
47 "in r0, %[_SPL_] \n\t"
48 "st X+, r0 \n\t"
49 "in r0, %[_SPH_] \n\t"
50 "st X+, r0 \n\t"
51 "1:" //begin dispatching
52 "mov r26, %A[_next_task_] \n\t"
53 "mov r27, %B[_next_task_] \n\t"
54 "sts kos_current_task, r26 \n\t" //set current task
55 "sts kos_current_task+1, r27 \n\t"
56 "ld r0, X+ \n\t" //load stack pointer
57 "out %[_SPL_], r0 \n\t"
58 "ld r0, X+ \n\t"
59 "out %[_SPH_], r0 \n\t"
60 "pop r31 \n\t" //status into r31: andi requires register above 15
61 "bst r31, %[_I_] \n\t" //we don't want to enable interrupts just yet, so store the interrupt status in T
62 "bld r31, %[_T_] \n\t" //T flag is on the call clobber list and tasks are only blocked as a result of a function call
63 "andi r31, %[_nI_MASK_] \n\t" //I is now stored in T, so clear I
64 "out %[_SREG_], r31 \n\t"
65 "pop r0 \n\t"
66 "pop r1 \n\t"
67 "pop r2 \n\t"
68 "pop r3 \n\t"
69 "pop r4 \n\t"
70 "pop r5 \n\t"
71 "pop r6 \n\t"
72 "pop r7 \n\t"
73 "pop r8 \n\t"
74 "pop r9 \n\t"
75 "pop r10 \n\t"
76 "pop r11 \n\t"
77 "pop r12 \n\t"
78 "pop r13 \n\t"
79 "pop r14 \n\t"
80 "pop r15 \n\t"
81 "pop r16 \n\t"
82 "pop r17 \n\t"
83 "pop r18 \n\t"
84 "pop r19 \n\t"
85 "pop r20 \n\t"
86 "pop r21 \n\t"
87 "pop r22 \n\t"
88 "pop r23 \n\t"
89 "pop r24 \n\t"
90 "pop r25 \n\t"
91 "pop r26 \n\t"
92 "pop r27 \n\t"
93 "pop r28 \n\t"
94 "pop r29 \n\t"
95 "pop r30 \n\t"
96 "pop r31 \n\t"
97 "brtc 2f \n\t" //if the T flag is clear, do the non-interrupt enable return
98 "reti \n\t"
99 "2: \n\t"
100 "ret \n\t"
101 "" ::
102 [_SREG_] "i" _SFR_IO_ADDR(SREG),
103 [_I_] "i" SREG_I,
104 [_T_] "i" SREG_T,
105 [_nI_MASK_] "i" (~(1 << SREG_I)),
106 [_SPL_] "i" _SFR_IO_ADDR(SPL),
107 [_SPH_] "i" _SFR_IO_ADDR(SPH),
108 [_next_task_] "r" (task));
109}
So, a lot is happening here. There are 4 basic steps: Save the current context, update the current task's stack pointer, change the stack pointer to the next task, and restore the next task's context.
Inline assembly has an interesting syntax in GCC. I don't believe it is fully portable into non-GCC compilers, so this makes the code depend more or less on GCC. Inline assembly works by way of placeholders (called Operands in the manual). At the very end of the assembly statement, we see a series of comma-separated statements which define these placeholders/operands and how the assembly is going to use registers and such. First off, we pass in the SREG, SPL, and SPH registers as type "i", which is a constant number known at compile-time. These are simply the IO addresses for these registers (found in avr/io.h if you follow the #include chain deep enough). The next couple parameters are also "i" and are simply bit numbers and masks. The last parameter is the next task pointer passed in as an argument. This is the part where we see the reason why it is more convenient to do this in inline assembly rather than writing it up in an assembly file. While it is possible to look up how avr-gcc passes arguments to functions and discover that the arguments are stored in a certain order in certain registers, it is far simpler and less breakable to allow gcc to fill in the blanks for us. By stating that the _next_task_ placeholder is of type "r" (register), we force GCC to place that variable into some registers of its choosing. Now, if we were using some global variable or a static local, gcc would generate some code before our asm block placing those values into some registers. For this application, that could be quite bad since we depend on no (possibly stack-manipulating) code appearing between the function label and our asm block (more on this in the next paragraph). However, since arguments are passed by way of register, gcc will simply give us the registers by which they are passed in to the function. Since pointers are usually 16 bits on an 8-bit AVR (larger ones will have 3 bytes maybe...but I'm really not sure about this), it fits into two registers. We reference these in the inline assembly by way of "%A[_next_task_]" and "%B[_next_task_]" (note the A and B...these denote the LSB and MSB registers).
Storing the context is pretty straightforward: push all of the registers and push the status register. At this point you may ask, "What about the program counter? Didn't we have to push that earlier during kos_new_task?" When the function was called (using the CALL instruction), the return address was pushed onto the stack as a side-effect of that instruction. So, we don't need to push the program counter because it is already on there. This is also why it would be very bad if some code appeared before our asm block. It is likely that gcc will clear out some space on the stack and so we would end up with some junk between the return address on the stack and our first "push" instruction. This would mess up the task context frame and we will see later in the code that this will prevent this function from dispatching the task correctly when it became time for the task to be resumed.
Updating the stack pointer is slightly more tricky. Interrupts are disabled first because it would really suck if we got interrupt during this part (anytime the stack pointer is manipulated is a critical section). We then get to dereference the kos_current_task variable which contains our current task. If we remember from above, the very first thing in the KOS_Task structure is the stack pointer, so if we dereference kos_current_task, we are left with the address at which to store the stack pointer. From there, its as simple as loading the stack pointer into some registers and saving it into Indirect Register X (set by registers 26 and 27).
I should note here something about clearing the interrupt flag. Normally, we would want to check to see if interrupts were enabled beforehand so that we can know if we need to restore them. This code lacks an explicit check because of the fact that the status register (with interrupts possibly enabled) has already been stored. Later, when the current task is restored, the SREG will be restored and thus interrupts will be turned back on if they need to be. Similarly, if the next task has interrupts enabled, they will turned on in the same fashion.
After updating kos_current_task's stack pointer, we get to move the stack to the next task and set kos_current_task to point to the next task. This is essentially the reverse of the previous operation. Instead of writing to Indirect Register X (which points to the stack pointer of the task), we get to read from it. We also slip in a couple instructions to update the kos_current_task pointer so that it points to the next task. After we have changed the SPL and SPH registers to point to our new stack, the task passed into kos_dispatch is ready to be resumed.
Resuming the next task's context is a little less straightforward than saving it. We need to prevent interrupts from occurring while we restore the context. The reason for this is to ensure that we don't end up storing more than one context on that task's stack (and thereby increase the minimum required stack size to prevent a stack overflow). The problem here is that when we restore the status register, interrupts could be enabled at that point, rather that at the end when the context is done being restored. So, we need to restore in three steps: Restore the status register without the interrupt flag, restore all other registers, and then restore the interrupt flag. This is done by transferring the interrupt flag in the status register into the T (transfer) bit in the status register (that's the "bst" and "bld" instructions), clearing the interrupt flag, and then later executing either the ret or reti instruction based on this flag. The side effect is that we trash the T bit. I am not sure I can actually do this. This is one part that is tricky: The avr-gcc manual states that the T flag is a scratchpad, just like r0, and doesn't need to be restored by called functions. My logic here is that since the only way for a task to become blocked is either it being executed initially or from a call to kos_dispatch, gcc sees the dispatch call as a normal function call and will not assume that the T flag will remain unchanged.
After dancing around with bits and restoring the modified SREG, we proceed to pop off the rest of the registers in the reverse order that they were stored at the beginning of the function. At the very end, we use a T flag branch instruction to determine which return instruction to use. "ret" will return normally without setting the interrupt flag and "reti" will set the interrupt flag.
So, at this point we have implemented a task scheduler and dispatcher. Here is how it weighs in with avr-size when compiled for an ATMega48A running just the idle task:
1avr-size -C --mcu=atmega48a bin/kos.elf
2AVR Memory Usage
3----------------
4Device: atmega48a
5
6Program: 474 bytes (11.6% Full)
7(.text + .data + .bootloader)
8
9Data: 105 bytes (20.5% Full)
10(.data + .bss + .noinit)
Not the best, but its reasonable. The data usage could be taken down by reducing the number of maximum tasks. There are other RTOS available for AVR which can compile smaller. We could do several optimizations which I will discuss in the conclusion
So, we now have a task scheduler. The thing is, although capable of running multiple tasks, it is not possible for multiple tasks to actually run. Why? Because kos_dispatch is never called! We need something that causes the task to become blocked.
As a demonstration, I'm going to implement a simple semaphore. I won't go into huge detail since that isn't the point of this article (and it has been long enough), but here is the code:
Header contents:
1typedef struct {
2 int8_t value;
3} KOS_Semaphore;
4
5/**
6 * Initializes a new semaphore
7 */
8KOS_Semaphore *kos_semaphore_init(int8_t value);
9
10/**
11 * Posts to a semaphore
12 */
13void kos_semaphore_post(KOS_Semaphore *sem);
14
15/**
16 * Pends from a semaphore
17 */
18void kos_semaphore_pend(KOS_Semaphore *sem);
Source contents:
1static KOS_Semaphore semaphores[KOS_MAX_SEMAPHORES + 1];
2static uint8_t next_semaphore = 0;
3
4KOS_Semaphore *kos_semaphore_init(int8_t value)
5{
6 KOS_Semaphore *s = &semaphores[next_semaphore++];
7 s->value = value;
8 return s;
9}
10
11void kos_semaphore_post(KOS_Semaphore *semaphore)
12{
13 ATOMIC_BLOCK(ATOMIC_RESTORESTATE)
14 {
15 KOS_Task *task;
16 semaphore->value++;
17
18 //allow one task to be resumed which is waiting on this semaphore
19 task = task_head;
20 while (task)
21 {
22 if (task->status == TASK_SEMAPHORE && task->status_pointer == semaphore)
23 break; //this is the task to be restored
24 task = task->next;
25 }
26
27 task->status = TASK_READY;
28 kos_schedule();
29 }
30}
31
32void kos_semaphore_pend(KOS_Semaphore *semaphore)
33{
34 ATOMIC_BLOCK(ATOMIC_RESTORESTATE)
35 {
36 int8_t val = semaphore->value--; //val is value before decrement
37
38 if (val <= 0)
39 {
40 //we need to wait on the semaphore
41 kos_current_task->status_pointer = semaphore;
42 kos_current_task->status = TASK_SEMAPHORE;
43
44 kos_schedule();
45 }
46 }
47}
So, our semaphore will cause a task to become blocked when kos_semaphore_pend is called (and the semaphore value was <= 0) and when kos_semaphore_post is called, the highest priority task that is blocked on the particular semaphore will be made ready.
Just so this makes sense, let's go through an example sequence of events:
Here's a program that does just this:
1/**
2 * Main file for OS demo
3 */
4
5#include "kos.h"
6
7#include <avr/io.h>
8#include <avr/interrupt.h>
9
10#include "avr_mcu_section.h" //these two lines are for simavr
11AVR_MCU(F_CPU, "atmega48");
12
13static KOS_Semaphore *sem;
14
15static uint8_t val;
16
17static uint8_t st[128];
18void the_task(void)
19{
20 TCCR0B |= (1 << CS00);
21 TIMSK0 |= (1 << TOIE0);
22 while (1)
23 {
24 kos_semaphore_pend(sem);
25 TCCR0B = 0;
26
27 val++;
28 }
29}
30
31int main(void)
32{
33 kos_init();
34
35 sem = kos_semaphore_init(0);
36
37 kos_new_task(&the_task, &st[127]);
38
39 kos_run();
40
41 return 0;
42}
43
44ISR(TIMER0_OVF_vect)
45{
46 kos_isr_enter();
47 kos_semaphore_post(sem);
48 kos_isr_exit();
49}
Running this with avr-gdb and simavr we can see this in action. I placed breakpoints at the val++ line and the kos_semaphore_post line. Here's the output with me pressing Ctrl-C at the end once it got into and stayed in the infinite loop in the idle task:
1(gdb) break main.c:27
2Breakpoint 1 at 0x35a: file src/main.c, line 27.
3(gdb) break main.c:47
4Breakpoint 2 at 0x38a: file src/main.c, line 47.
5(gdb) continue
6Continuing.
7Note: automatically using hardware breakpoints for read-only addresses.
8
9Breakpoint 2, __vector_16 () at src/main.c:47
1047 kos_semaphore_post(sem);
11(gdb) continue
12Continuing.
13
14Breakpoint 2, __vector_16 () at src/main.c:47
1547 kos_semaphore_post(sem);
16(gdb) continue
17Continuing.
18
19Breakpoint 2, __vector_16 () at src/main.c:47
2047 kos_semaphore_post(sem);
21(gdb) continue
22Continuing.
23
24Breakpoint 1, the_task () at src/main.c:27
2527 val++;
26(gdb) continue
27Continuing.
28
29Breakpoint 1, the_task () at src/main.c:27
3027 val++;
31(gdb) continue
32Continuing.
33
34Breakpoint 1, the_task () at src/main.c:27
3527 val++;
36(gdb) continue
37Continuing.
38^C
39Program received signal SIGTRAP, Trace/breakpoint trap.
40kos_idle_task () at src/kos.c:27
4127 {
You may have noticed that the interrupt was called three times before we even got to val++. The reason for this is that timer0 is an 8-bit timer and I used no prescaler for its clock, so the interrupt will happen every 255 cycles. Given that the dispatcher is nearly 100 instructions and the scheduler isn't exactly short either, the interrupt could easily be called three times before it manages to resume the task after it blocks (including the time it takes to block it).
Before I finish up I want to mention a few things about debugging with avr-gdb. This project was the first time I had ever needed to use an simulator and debugger to even get the program to run. It would have been impossible to write this using an actual device since very little is revealed when operating the device. Here are a few things I learned:
This has been a long post, but it is a complicated topic. Writing something like this is actually considered writing an operating system (albeit just the kernel portion and a small one at that) and the debug along for just this post took me a while. One must have a good knowledge of how exactly the processor works. I found my knowledge lacking, actually, and I learned a lot about how the AVR works. The other thing is that things like concurrency and interrupts must be considered from the very beginning. They can't be an afterthought.
The scheduler and dispatcher I have described here are not perfect nor are they the most optimal efficient design. For one thing, my design uses a huge amount of RAM compared to other RTOS options. My scheduler and dispatcher are also inefficient, with the scheduler having an O(N) complexity depending on the number of tasks. My structure does, however, allow for O(1) time when suspending a task (although I question the utility of this...it worked better with the 8086 scheduler I made for class than with the AVR). Another problem is that kos_dispatch will not work with avr-gdb if the program is stopped during this function (it has a hard time decoding the function prologue because of the large number of push instructions). I haven't found a solution to this problem and it certainly made debugging a little more difficult.
So, now that I've told you some of what's wrong with the above, here are two RTOS which can be used with the AVR and are well tested:
Anyway, I hope that this article is useful and as usual, any suggestions and such can be left in the comments. As mentioned before, the code for this article can be found on github here: https://github.com/kcuzner/kos-avr
Every time I do one of these bus emulation projects, I tell myself that the next time I do it I will use an oscilloscope or DLA. However, I never actually break down and just buy one. Once more, I have done a bus emulation project flying blind. This is the harrowing tale:
Code & Schematics (kicad): https://github.com/kcuzner/pop-n-music-controller
A couple of days ago, I was asked to help do some soldering for a modification someone was trying to do to a PS1 controller. He informed me that it was for the game Pop 'n Music and that it required a special controller to be played properly. Apparently, official controllers can sell for $100 or more, so modifying an existing controller was the logical thing to do. After much work and pain, it was found that while modifying an existing controller was easy, it wasn't very robust and could easily fall apart and so I built one using an ATMega48 and some extra components I had lying around. The microcontroller emulates the PSX bus which is used to communicate between the controller and the playstation/computer. As my reference for the bus, I used the following two web pages:
The complete schematics and software can be found on my github.
The concept behind the controller mod was simple: Run wires from the existing button pads to some arcade-style buttons arranged in the pattern needed for the controller. It worked well at first, but after a little while we began to have problems:
So, we began exploring other options. I found this site detailing emulation of these controllers using either 74xx logic or a microcontroller. It is a very good resource, and is mostly correct about the protocol. After looking at the 74xx logic solution and totaling up the cost, I noticed that my $1.75 microcontroller along with the required external components would actually come out to be cheaper than buying 4 chips and sockets for them. Even better, I already had a microcontroller and all the parts on hand, so there was no need for shipping. So, I began building and programming.
PSX controllers communicate using a bus that has a clock, acknowledge, slave select, psx->controller (command) line, and controller->psx (data) line. Yes, this looks a lot like an SPI bus. In fact, it is more or less identical to a SPI Mode 3 bus with the master-in slave-out line driven open collector. I failed to notice this fact until later, much to my chagrin. Communication is accomplished using packets that have a start signal followed by a command and waiting for a response from the controller. During the transaction, the controller declares its type, the number of words that it is going to send, and the actual controller state. I was emulating a standard digital controller, so I had to tell it that my controller type was 0x41, which is digital with 1 word data. Then, I had to send a 0x5A (start data response byte) and two bytes of button data. My initial approach involved writing a routine in C that would handle pin changes on INT0 and INT1 which would be connected to the command and clock lines. However, I failed to anticipate that the bus would be somewhere in the neighborhood of 250Khz-500Khz and this caused some serious performance problems and I was unable to complete a transaction with the controller host. So, I decided to try writing the same routine in assembly to see if I could squeeze every drop of performance out of it possible. I managed to actually get it to complete a transaction this way, but without sending button data. To make matters worse, every once in a while it would miss a transaction and this was quite noticeable when I made an LED change state with every packet received. It was very inconsistent and that was without even sending button data. I eventually realized the problem was with the fact that making the controller do so much between cycles of the clock line actually caused it to miss bits. So, I looked at the problem again. I noticed that the ATMega48A had an SPI module and that the PSX bus looked similar, but not exactly like, an SPI bus. However, running the bus in mode 3 with the data order reversed and the MISO driving the base of a transistor operating in an open-collector fashion actually got me to be able to communicate to the PSX bus on almost the first try. Even better, the only software change that had to be made was inverting the data byte so that the signal hitting the base of the transistor would cause the correct changes on the MISO line. So, I hooked up everything as follows:
After doing that, suddenly I got everything to work. It responded correctly to the computer when asked about its inputs and after some optimization, stopped skipping packets due to taking too much time processing button inputs. It worked! Soon after getting the controller to talk to the computer, I discovered an error in the website I mentioned earlier that detailed the protocol. It mentioned that during transmission of the data about the buttons that the control line was going to be left high. While its a minor difference, I thought I might as well mention this site, which lists the commands correctly and was very helpful. As I mentioned before, one problem that was encoutered was that in order for the controller to be recognized as a pop-n-music controller by an actual playstation, the left, right, and down buttons must be pressed. However, it seems that the PSX->USB converter that we were using was unable to handle having those 3 pressed down at once. So, there needed to be a mode switch. The way for switching modes I came up with was to hold down both start and select at the same time for 3 seconds. After the delay, the modes would switch. The UI interaction for this is embodied in two LEDs. One LED is lit for when it is in PSX mode and the other when it is in emulator mode. When both buttons are pressed, both LEDs light up until the one for the previous mode shuts off. At first, I had the mode start out every time the controller was started in the same mode, no matter what the previous mode was before it was shut off. It soon became apparent that this wouldn't do, and so I looked in to using the EEPROM to store the flag value I was using to keep the state of the controller. Strangely, it worked on the first try, so the controller will stay in the same mode from the last time it was shut off. My only fear is that switching the mode too much could degrade the EEPROM. However, the datasheet says that it is good for 100,000 erase/write cycles, so I imagine it would be quite a while before this happens and other parts of the controller will probably fail first (like the switches).
I next began assembly. I went the route of perfboard with individual copper pads around each hole because that's what I have. Here are photos of the assembly, sadly taken on my cell phone because my camera is broken. Sorry for the bad quality...
So, with the controller in the box and everything assembled, it seems that all will be well with the controller. It doesn't seem to miss keypresses or freeze and is able to play the game without too many hiccups (the audio makes it difficult, but that's just a emulator tweaking issue). The best part about this project is that in terms of total work time, it probably took only about 16 hours. Considering that most of my projects take months to finish, this easily takes the cake as one of my quickest projects start to finish.
Recently, I got my hands on a Raspberry Pi and one of the first things I wanted to do with it was to turn it into my complete AVR development environment. As part of that I wanted to make avrdude be able to program an AVR directly from the Raspberry Pi with no programmer. I know there is this linuxgpio programmer type that was recently added, but it is so recent that it isn't yet included in the repos and it also requires a compile-time option to enable it. I noticed that the Raspberry Pi happens to expose its SPI interface on its expansion header and so I thought to myself, "Why not use this thing instead of bitbanging GPIOs? Wouldn't that be more efficient?" Thus, I began to decipher the avrdude code and write my addition. My hope is that things like this will allow the Raspberry Pi to be used to explore further embedded development for those who want to get into microcontrollers, but blew all their money on the Raspberry Pi. Also, in keeping with the purpose that the Raspberry Pi was originally designed for, using it like this makes it fairly simple for people in educational surroundings to expand into different aspects of small computer and embedded device programming.
As my addition to avrdude, I created a new programmer type called "linuxspi" which uses the userspace SPI drivers available since around Linux ~2.6 or so to talk to a programmer. It also requires an additional GPIO to operate as the reset. My initial thought was to use the chip select as the reset output, but sadly, the documentation for the SPI functions mentioned that the chip enable line is only held low so long as the transaction is going. While I guess I could compress all the transactions avrdude makes into one giant burst of data, this would be very error prone and isn't compatible with avrdude's program structure. So, the GPIO route was chosen. It just uses the sysfs endpoints found in /sys/class/gpio to manipulate a GPIO chosen in avrdude.conf into either being in a hi-z input state or an output low state. This way, the reset can be connected via a resistor to Vcc and then the Raspberry Pi just holds reset down when it needs to program the device. Another consequence which I will mention here of choosing to use the Linux SPI drivers is that this should actually be compatible with any Linux-based device that exposes its SPI or has an AVR connected to the SPI; not just the Raspberry Pi.
So, down to the nitty gritty: How can I use it? Well, at the moment it is in a github repository at https://github.com/kcuzner/avrdude. As with any project that uses the expansion header on the Raspberry Pi, there is a risk that a mistake could cause your Raspberry Pi to die (or let out the magic smoke, so to speak). I assume no responsibility for any damage that may occur as a result of following these directions or using my addition to avrdude. Just be careful when doing anything involving hooking stuff up to the expansion port and use common sense. Remember to measure twice and cut once. So, with that out of the way, I will proceed to outline here the basic steps for installation and usage.
The best option here until I bother creating packages for it is to do a git clone directly into a directory on the Raspberry Pi and build it from there on the Raspberry Pi itself. I remember having to install the following packages to get it to compile (If I missed any, let me know):
Also, if your system doesn't have a header at "linux/spi/spidev.h" in your path, you probably need to install that driver. I was using Arch Linux and it already had the driver there, so for all I know its always installed. You also should take a look to make sure that "/dev/spidev0.0" and "/dev/spidev0.1" or something like that exist. Those are the sort of endpoints that are to be used with this. If they do not exist, try executing a "sudo modprobe spi_bcm2708". If the endpoints still aren't there after that, then SPI support probably isn't installed or enabled for your kernel.
After cloning the repo and installing those packages, run the "./boostrap" script which is found in the avrdude directory. This will run all the autoconf things and create the build scripts. The next step is to run "./configure" and wait for it to complete. After the configure script, it should say whether or not "linuxspi" is enabled or disabled. If it is disabled, it was not able to find the header I mentioned before. Then run "make" and wait for it to complete. Remember that the Raspberry Pi is a single core ARM processor and so building may take a while. Afterwards, simply do "sudo make install" and you will magically have avrdude installed on your computer in /usr/local. It would probably be worthwhile to note here that you probably want to uninstall any avrdude you may have had installed previously either manually or through a package manager. The one here is built on top of the latest version (as of May 26th, 2013), so it should work quite well and be all up to date and stuff for just using it like a normal avrdude. I made no changes to any of the programmer types other than the one I added.
To check to see if the avrdude you have is the right one, you should see an output similar to the following if you run this command (tiny-tim is the name of my Raspberry Pi until I think of something better):
1kcuzner@tiny-tim:~/avrdude/avrdude$ avrdude -c ?type
2
3Valid programmer types are:
4 arduino = Arduino programmer
5 avr910 = Serial programmers using protocol described in application note AVR910
6 avrftdi = Interface to the MPSSE Engine of FTDI Chips using libftdi.
7 buspirate = Using the Bus Pirate's SPI interface for programming
8 buspirate_bb = Using the Bus Pirate's bitbang interface for programming
9 butterfly = Atmel Butterfly evaluation board; Atmel AppNotes AVR109, AVR911
10 butterfly_mk = Mikrokopter.de Butterfly
11 dragon_dw = Atmel AVR Dragon in debugWire mode
12 dragon_hvsp = Atmel AVR Dragon in HVSP mode
13 dragon_isp = Atmel AVR Dragon in ISP mode
14 dragon_jtag = Atmel AVR Dragon in JTAG mode
15 dragon_pdi = Atmel AVR Dragon in PDI mode
16 dragon_pp = Atmel AVR Dragon in PP mode
17 ftdi_syncbb = FT245R/FT232R Synchronous BitBangMode Programmer
18 jtagmki = Atmel JTAG ICE mkI
19 jtagmkii = Atmel JTAG ICE mkII
20 jtagmkii_avr32 = Atmel JTAG ICE mkII in AVR32 mode
21 jtagmkii_dw = Atmel JTAG ICE mkII in debugWire mode
22 jtagmkii_isp = Atmel JTAG ICE mkII in ISP mode
23 jtagmkii_pdi = Atmel JTAG ICE mkII in PDI mode
24 jtagice3 = Atmel JTAGICE3
25 jtagice3_pdi = Atmel JTAGICE3 in PDI mode
26 jtagice3_dw = Atmel JTAGICE3 in debugWire mode
27 jtagice3_isp = Atmel JTAGICE3 in ISP mode
28 linuxgpio = GPIO bitbanging using the Linux sysfs interface (not available)
29 linuxspi = SPI using Linux spidev driver
30 par = Parallel port bitbanging
31 pickit2 = Microchip's PICkit2 Programmer
32 serbb = Serial port bitbanging
33 stk500 = Atmel STK500 Version 1.x firmware
34 stk500generic = Atmel STK500, autodetect firmware version
35 stk500v2 = Atmel STK500 Version 2.x firmware
36 stk500hvsp = Atmel STK500 V2 in high-voltage serial programming mode
37 stk500pp = Atmel STK500 V2 in parallel programming mode
38 stk600 = Atmel STK600
39 stk600hvsp = Atmel STK600 in high-voltage serial programming mode
40 stk600pp = Atmel STK600 in parallel programming mode
41 usbasp = USBasp programmer, see http://www.fischl.de/usbasp/
42 usbtiny = Driver for "usbtiny"-type programmers
43 wiring = http://wiring.org.co/, Basically STK500v2 protocol, with some glue to trigger the bootloader.
Note that right under "linuxgpio" there is now a "linuxspi" driver. If it says "(not available)" after the "linuxspi" description, "./configure" was not able to find the "linux/spi/spidev.h" file and did not compile the linuxspi programmer into avrdude.
There is a little bit of configuration that happens here on the Raspberry Pi side before proceeding to wiring it up. You must now decide which GPIO to sacrifice to be the reset pin. I chose 25 because it is next to the normal chip enable pins, but it doesn't matter which you choose. To change which pin is to be used, you need to edit "/usr/local/etc/avrdude.conf" (it will be just "/etc/avrdude.conf" if it wasn't built and installed manually like above). Find the section of the file that looks like so:
1programmer
2 id = "linuxspi";
3 desc = "Use Linux SPI device in /dev/spidev*";
4 type = "linuxspi";
5 reset = 25;
6;
The "reset = " line needs to be changed to have the number of the GPIO that you have decided to turn into the reset pin for the programmer. The default is 25, but that's just because of my selfishness in not wanting to set it to something more generic and having to then edit the file every time I re-installed avrdude. Perhaps a better default would be "0" since that will cause the programmer to say that it hasn't been set up yet.
After setting up avrdude.conf to your desired configuration, you can now connect the appropriate wires from your Raspberry Pi's header to your microchip. A word of extreme caution: The Raspberry Pi's GPIOs are NOT 5V tolerant, and that includes the SPI pins . You must do either one of two things: a) Run the AVR and everything around it at 3.3V so that you never see 5V on ANY of the Raspberry Pi pins at any time (including after programming is completed and the device is running) or b) Use a level translator between the AVR and the SPI. I happen to have a level translator lying around (its a fun little TSSOP I soldered to a breakout board a few years back), but I decided to go the 3.3V route since I was trying to get this thing to work. If you have not ever had to hook up in-circuit serial programming to your AVR before, perhaps this would be a great time to learn. You need to consult the datasheet for your AVR and find the pins named RESET (bar above it), MOSI, MISO, and SCK. These 4 pins are connected so that RESET goes to your GPIO with a pullup resistor to the Vcc on your AVR, MOSI goes to the similarly named MOSI on the Raspberry Pi header, MISO goes to the like-named pin on the header, and SCK goes to the SPI clock pin (named SCLK on the diagram on elinux.org). After doing this and double checking to make sure 5V will never be present to the Raspberry Pi , you can power on your AVR and it should be able to be programmed through avrdude. Here is a demonstration of me loading a simple test program I made that flashes the PORTD LEDs:
1kcuzner@tiny-tim:~/avrdude/avrdude$ sudo avrdude -c linuxspi -p m48 -P /dev/spidev0.0 -U flash:w:../blink.hex
2[sudo] password for kcuzner:
3
4avrdude: AVR device initialized and ready to accept instructions
5
6Reading | ################################################## | 100% 0.00s
7
8avrdude: Device signature = 0x1e9205
9avrdude: NOTE: "flash" memory has been specified, an erase cycle will be performed
10 To disable this feature, specify the -D option.
11avrdude: erasing chip
12avrdude: reading input file "../blink.hex"
13avrdude: input file ../blink.hex auto detected as Intel Hex
14avrdude: writing flash (2282 bytes):
15
16Writing | ################################################## | 100% 0.75s
17
18avrdude: 2282 bytes of flash written
19avrdude: verifying flash memory against ../blink.hex:
20avrdude: load data flash data from input file ../blink.hex:
21avrdude: input file ../blink.hex auto detected as Intel Hex
22avrdude: input file ../blink.hex contains 2282 bytes
23avrdude: reading on-chip flash data:
24
25Reading | ################################################## | 100% 0.56s
26
27avrdude: verifying ...
28avrdude: 2282 bytes of flash verified
29
30avrdude: safemode: Fuses OK
31
32avrdude done. Thank you.
There are two major things to note here:
Other than that, usage is pretty straightforward and should be the same as if you were using any other programmer type.
As issues crop up, I hope to add improvements like changing the clock frequency and maybe someday adding TPI support (not sure if necessary since this is using the dedicated SPI and as far as I know, TPI doesn't use SPI).
I hope that those using this can find it helpful in their fun and games with the Raspberry Pi. If there are any issues compiling and stuff, either open an issue on github or mention it in the comments here.