Bootloader for ARM Cortex-M0: No VTOR

In my most recent project I selected an ARM Cortex-M0 microcontroller (the STM32F042). I soon realized that the Cortex-M0 is missing a key architectural piece which the M0+ does have: the vector table offset register (VTOR).

I want to talk about how I overcame the lack of a VTOR to write a USB bootloader which supports a semi-safe fallback mode.

The source for this post can be found here (look in the “bootloader” folder):


What is the VTOR?

Near the heart of the ARM Cortex is the NVIC, or Nested Vectored Interrupt Controller. This is used for prioritizing peripheral interrupts (I2C byte received, USB transaction complete, etc.) and core signals (hard fault, system timer tick, etc.) while managing the code which is executed in response. The NVIC works by using a lookup table at a specific location to determine what code to execute. As an example, the interrupt table for the STM32F042 looks something like this:

Address Description
0x00000000 Initial stack pointer value (an address in RAM)
0x00000004 Reset handler address
0x00000008 NMI handler address
0x0000000C HardFault handler address
0x00000010-0x00000028 Reserved (other Cortex-M processors have more items here)
0x0000002C SVCall handler address
0x00000030-0x00000034 Reserved (same as other reserved fields)
0x00000038 PendSV handler address
0x0000003C System tick handler address
0x00000040 STM32 WWDG handler address
0x00000044 STM32 PVD_VDDIO2 handler address
0x00000048 STM32 RTC handler address
0x0000004C STM32 FLASH handler address

When an interrupt occurs, the NVIC will examine this table, read the handler address from it, push some special information onto the stack (the exception frame), and then execute the handler. This exact sequence is fairly complex, but here are some resources if you’re interested in learning more:

For any program meant to run on an ARM Cortex processor there’ll be some assembly (or maybe C) that looks like this (this one was provided by ST’s CMSIS implementation for the STM32F042):

Then in my linker script I have the “SECTIONS” portion start out like this:

The assembly snippet creates the table for the NVIC (g_pfnVectors in this example) and assigns it to the “.isr_vector” section. The linker script then locates this section right at the beginning of the flash (the “KEEP(*(.isr_vector))” right at the beginning after some variable declarations). When the program is compiled, what I end up with is something that looks like this (this is an assembly dump of the beginning of one of my binaries):

For the first several 32-bit words I have created a bunch of function pointers which make up the table that the NVIC will read. After that table, the actual code starts.

So, what is the VTOR? In some ARM Cortex architectures (I know at least the ARM Cortex-M0+, ARM Cortex-M3, and ARM Cortex-M4 support this) there is a register located at address 0xE000ED08 called the “Vector Table Offset Register”. It holds the address of the interrupt vector table, and that address must be aligned so that its 7 LSBs are zero. On boot this register contains 0x00000000, so when power comes up, the handler whose address lives at 0x00000004 is executed to handle the reset. Later on, the program might modify the VTOR so that it points at some other location in memory, say 0x08008000. After that point, the NVIC will look up the address of each handler relative to that address. So if an SVCall exception occurred, the NVIC would read 0x0800802C to determine the address of the handler to call.
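The lookup described above is plain address arithmetic, so it can be sketched in a few lines of C. This is only an illustrative model meant to run on a host machine, not on the target; the helper name `vector_entry_address` and the constants are made up for this example and are not part of CMSIS or ST's headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration only (not from CMSIS). */
#define VTOR_ALIGN_MASK 0x7Fu  /* the 7 LSBs of the table base must be zero */
#define EXC_RESET       1u     /* Reset is exception number 1 */
#define EXC_SVCALL      11u    /* SVCall is exception number 11 */

/* Returns the address the core reads to find the handler for the given
 * exception number when the vector table starts at `vtor`. */
uint32_t vector_entry_address(uint32_t vtor, uint32_t exception_number)
{
    assert((vtor & VTOR_ALIGN_MASK) == 0);  /* reject a misaligned table base */
    return vtor + 4u * exception_number;     /* one 32-bit word per table entry */
}
```

With the reset value of 0x00000000, the Reset entry (exception number 1) is read from 0x00000004. With the table moved to 0x08008000, the SVCall entry (exception number 11) is read from 0x0800802C, matching the example above.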

One thing you may have noticed at this point is that my assembly dump earlier had everything living relative to address 0x08000000. However, I said that the VTOR’s reset value was 0x00000000. So, how does the STM32’s ARM core know where to find the table? All STM32s I’ve seen so far implement a “boot remapping” feature which uses the physical “BOOT0” pin to map the flash (which starts at 0x08000000) onto the memory space starting at 0x00000000 like so (may vary slightly by STM32):

BOOT0 pin Result
0 0x08000000 (Main Flash Memory) mapped onto 0x00000000
1 System Memory (which is a ROM usually containing some bootloader supplied by ST) is mapped onto 0x00000000

Some STM32s have support for extra modes like mapping the SRAM (address 0x20000000) onto 0x00000000. So although the VTOR’s default value is 0x00000000, since the STM32 is remapping 0x08000000 into that space the ARM Cortex core sees the contents of the flash when it loads information from locations relative to 0x00000000 if the BOOT0 pin is tied low.

Bootloaders and the VTOR

At this point we can talk about how bootloaders would use the VTOR. In my last post on the subject, I didn’t really talk extensively about interrupts beyond mentioning that the VTOR is overwritten as part of the process of jumping to the user program. This is done so that once the bootloader has transferred execution to the user program, any interrupts that occur are directed to the handlers dictated by the user program. Ideally, the user program doesn’t even need to worry about the fact that it’s running in a boot-loaded manner.

On a microcontroller with a separate bootloader and user program, the flash is partitioned into two segments: the bootloader, which always lives right at the beginning of flash so that the STM32 boots into it, and the user program, which lives much further down in the flash. I usually put my user programs at around the 8KB mark since the (inefficient and clumsy) hobbyist bootloaders I write tend to use just a little over 4K of the flash. When the bootloader runs it performs the following sequence:

  1. Determine if a user program exists. If the user program does not exist, start running the main bootloader program and abort this sequence.
  2. Disable interrupts (important!)
  3. Set the VTOR to the start of the user program (which just so happens to be the location of the user program’s vector table, since the table lives right at the beginning of the flash image of the program).
  4. Read the initial stack pointer value from the first word of the user program.
  5. Read the reset handler address from the second word of the user program.
  6. Set the stack pointer and jump to the reset handler.
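The data-handling parts of this sequence (steps 1, 4 and 5) can be sketched in host-runnable C. Everything here is an assumption made for illustration: the memory map (32K of flash, user program at the 8K mark, 6K of SRAM), the validity heuristic, and the names `user_program_exists`, `read_boot_entry` and `boot_entry_t`. It is not the code from the actual bootloader.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed memory map, for illustration only. Adjust for the actual part. */
#define USER_BASE  0x08002000u  /* start of the user partition (8K mark) */
#define FLASH_END  0x08008000u  /* one past the end of 32K of flash */
#define SRAM_BASE  0x20000000u
#define SRAM_END   0x20001800u  /* one past the end of 6K of SRAM */

typedef struct {
    uint32_t initial_sp;     /* word 0 of the user program's vector table */
    uint32_t reset_handler;  /* word 1 of the user program's vector table */
} boot_entry_t;

/* Step 1 (heuristic): erased flash reads as 0xFFFFFFFF, so a plausible stack
 * pointer and a plausible Thumb reset address suggest a program is present. */
bool user_program_exists(const uint32_t *image)
{
    assert(image != NULL);
    uint32_t sp = image[0];
    uint32_t reset = image[1];
    bool sp_ok = sp > SRAM_BASE && sp <= SRAM_END && (sp & 0x3u) == 0;
    bool reset_ok = (reset & 1u) != 0            /* Thumb bit must be set */
                 && reset >= USER_BASE && reset < FLASH_END;
    return sp_ok && reset_ok;
}

/* Steps 4 and 5: read the stack pointer and reset handler address. */
boot_entry_t read_boot_entry(const uint32_t *image)
{
    assert(image != NULL);
    boot_entry_t entry = { image[0], image[1] };
    return entry;
}

/* Steps 2, 3 and 6 only make sense on the target. With CMSIS they would
 * roughly be: __disable_irq(); SCB->VTOR = USER_BASE;
 * __set_MSP(entry.initial_sp); then a call through entry.reset_handler. */
```

On the target, `image` would simply be a pointer to the start of the user partition. A stricter bootloader might store a checksum alongside the image instead of relying on this kind of plausibility check.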

So long as the user program doesn’t go and mess with the VTOR, any interrupts that occur after the user program re-enables interrupts will cause the NVIC to use the user program’s table to determine where the handlers are. Isn’t that awesome?

There is one step that the user program has to do, however. It needs to properly offset all of its addresses in the flash. As I mentioned in my previous post about bootloaders this is pretty easy to do in the linker script by just tricking it into thinking that the flash starts at the beginning of the user program partition (example on a 32K microcontroller):

The user program is now tricked into thinking that flash starts at 0x08002000 and is only 24K. We can see that this was successful if we take a look at the beginning of the disassembly of a compiled program:

All the addresses are offset by 0x08002000. Now all the bootloader has to do is set the VTOR to 0x08002000 and this user program will execute normally, interrupts and all.

Dealing with an absent VTOR

After I purchased the microcontroller for my project (an STM32F042) I discovered that it was a Cortex-M0 and did not have a VTOR. This was a rather unpleasant surprise and now I know that the M0 sucks compared to the M0+. Nonetheless, I was able to overcome this with a fairly simple software shim and that’s what I want to share.

There are two main issues that the VTOR addresses:

  • Determining the address of an interrupt when it isn’t relative to 0x00000000.
  • Forwarding execution of the interrupt routine to that custom address.

Since I don’t have a VTOR all of my interrupts will be executed from the bootloader by default. This is obviously unacceptable since things like a USB interrupt occurring would cause the user program to suddenly revert back to being the bootloader program (and probably into some undefined state since the SRAM would be all different).

To address the first problem, I had to make some changes to my bootloader and to the user program:

  1. I designated a certain area of SRAM in the bootloader program as holding data that will be valid while the processor is running.
  2. The user program’s linker script had its SRAM startpoint moved beyond this reserved section.

I implemented this with these linker script memory modifications:

Bootloader linker script:

Device linker script:

And this section addition in the bootloader linker script:

Now I have some reserved memory that the user program won’t touch. I use this area to store a pseudo-VTOR:

When the bootloader starts it will set this “bootloader_vtor” variable to the location of the bootloader’s vector table (the “extern uint32_t *g_pfnVectors” is linked to that table defined in assembly earlier).

Then, if the bootloader determines that the user program exists it overwrites bootloader_vtor with the following:
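As a rough sketch of the two assignments just described, here is a host-runnable C model. On the target, `bootloader_vtor` would be placed in the reserved SRAM region by the bootloader's linker script; here it is an ordinary global so the example can run anywhere. `USER_PROGRAM_BASE` and `bootloader_use_user_table` are hypothetical names, and 0x08002000 is assumed from the partition layout used earlier.

```c
#include <assert.h>
#include <stdint.h>

#define USER_PROGRAM_BASE 0x08002000u  /* assumed start of the user partition */

/* Pseudo-VTOR: the address of whichever vector table is currently active.
 * Volatile because the interrupt shim reads it asynchronously. */
volatile uint32_t bootloader_vtor;

/* At startup, interrupts are routed to the bootloader's own table. */
void bootloader_init(uint32_t bootloader_table_address)
{
    assert((bootloader_table_address & 0x3u) == 0);  /* table of 32-bit words */
    bootloader_vtor = bootloader_table_address;
}

/* Once a user program has been found, route interrupts to its table,
 * which sits at the very start of the user partition. */
void bootloader_use_user_table(void)
{
    bootloader_vtor = USER_PROGRAM_BASE;
}
```

Because the variable lives in SRAM that neither program's startup code clears, the value written by the bootloader is still there when the user program's interrupts start firing.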

Ok, so that solves the issue of “where do the user’s interrupts live”. The next issue is actually jumping to those. Turns out, that’s not a hard problem to solve now. A quick change to the interrupt handlers makes short work of that:

What this does is determine which interrupt number is executing, multiply that number by 4, add it to bootloader_vtor, and jump to the address stored at that location. This does naively what the VTOR does from the perspective of a program. This routine does stomp all over r0, r1, and r2, but since those registers are part of the ARM Exception Context, the original values have already been pushed onto the stack. Since we haven’t modified the stack at all (no pushes or pops here), the actual interrupt handler should be none the wiser that something happened before it (and it shouldn’t care what’s in r0, r1, and r2 either).
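The dispatch logic can be modeled on a host in plain C to make the idea concrete. This is a simulation under stated assumptions, not the real handler: the real shim is a handful of Thumb instructions that read the ICSR at 0xE000ED04 and branch without touching the stack, whereas this model takes the ICSR value as a parameter and uses an ordinary function call. All names (`shim_dispatch`, `shim_slot_address`, `pseudo_vtor`, the example handlers) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VECTACTIVE_MASK 0x3Fu  /* active exception number: low 6 bits of ICSR on the M0 */
#define TABLE_ENTRIES   48u    /* 16 core exceptions + 32 peripheral interrupts */
#define EXC_SYSTICK     15u    /* SysTick entry lives at offset 0x3C = 15 * 4 */

typedef void (*handler_t)(void);

/* Host stand-in for bootloader_vtor: a pointer to the active table
 * rather than a raw flash address. */
const handler_t *pseudo_vtor;

/* Address arithmetic performed by the shim: table base + 4 * vector number. */
uint32_t shim_slot_address(uint32_t vtor, uint32_t icsr)
{
    return vtor + ((icsr & VECTACTIVE_MASK) << 2);
}

/* Model of the shim: find the active vector and forward to its handler. */
void shim_dispatch(uint32_t icsr)
{
    assert(pseudo_vtor != NULL);
    uint32_t vector = icsr & VECTACTIVE_MASK;  /* which exception is running */
    handler_t handler = pseudo_vtor[vector];   /* array indexing does the "* 4" */
    assert(handler != NULL);
    handler();
}

/* Example tables and handlers so the forwarding can be observed. */
int bootloader_hits;
int user_hits;
void bootloader_systick(void) { bootloader_hits++; }
void user_systick(void) { user_hits++; }
handler_t bootloader_table[TABLE_ENTRIES];
handler_t user_table[TABLE_ENTRIES];
```

Pointing `pseudo_vtor` at `bootloader_table` makes a SysTick dispatch land in `bootloader_systick`; switching it to `user_table` sends the same exception to `user_systick`, which is the behavior the hardware VTOR would otherwise provide.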

The bootloader also gets a rather non-trivial change to its interrupt vector table:

All the interrupts point to this new Bootloader_IRQHandler except Reset. We now have another problem: What about the interrupts for when we actually need to execute the bootloader program instead of the user program. Well, that’s fairly simple now. We just move the g_pfnVectors table so that it is just like any other table:

I placed it in its own section for fun, but you’ll see that it now lives in “.text”. This means that it ends up in flash just like any other read only variable would and I don’t really care where it ends up. I suppose I could also have put it into the “rodata” section and that would probably be more correct, but it hasn’t caused a problem yet. Anyway, as we saw during bootloader_init the address of the bootloader’s g_pfnVectors is loaded into bootloader_vtor and if there’s no user program it will remain there.

With those two pieces together, we have effectively emulated the VTOR functionality. There are a few corner cases that this doesn’t handle very well (such as exceptions before the bootloader_vtor value is initialized) which likely result in Hard Faults, but I haven’t encountered an issue there yet.

Debugging the user program

With my other bootloader which relied on the VTOR, the presence of the bootloader was not only transparent to the user program, it was also transparent to the debugger. If I needed to run a stack trace during an interrupt or exception, it knew the names of all the symbols it would find in the trace. But now that we’ve mixed together the bootloader and user program, things are less straightforward since the elf file from the user program won’t have any knowledge of the code executed by the bootloader.

While I didn’t overcome this issue completely and stack traces can be a little awkward if they are interrupted at just the right time, I did manage to massage gdb enough to make it somewhat usable:

The “add-symbol-file” directive points gdb towards my bootloader’s elf file and informs it about any symbols it might find if we just so happen to break while inside the bootloader’s program space. It also knows about the names of symbols inside the bootloader’s reserved SRAM space.


Here we’ve seen how the VTOR works, why it’s useful to bootloaders, and one way to overcome the issue of not having a VTOR in certain architectures like the Cortex-M0. If you have any questions or comments, feel free to leave a comment on this post. This isn’t the most robust way of fixing the problem, but for my hacking around it works just fine. I only hope that this post is useful and maybe sparks some idea with someone who is trying to overcome a similar problem.

8 thoughts on “Bootloader for ARM Cortex-M0: No VTOR”

  1. Tim Bates

    You can reduce your Bootloader_IRQHandler by 2 instructions, from 10 to 8:

    " mov r2,#63\n" // Prepare to mask SCB_ICSR_VECTACTIVE (6 bits, Cortex-M0)
    " and r1, r2\n" // Mask the ICSR, r1 now contains the vector number
    " lsl r1, #2\n" // Multiply vector number by sizeof(function pointer)

    can be replaced with

    " lsl r1, #26\n" // Mask SCB_ICSR_VECTACTIVE (6 bits, Cortex-M0)
    " lsl r1, #24\n" // Multiply vector number by sizeof(function pointer)


    " add r0, r1\n" // Apply the offset to the table base
    " ldr r0,[r0]\n" // Read the function pointer value

    can be replaced with

    " ldr r0,[r0, r1]\n" // Read the function pointer value from the table

    1. Tim Bates


      " lsl r1, #24\n" // Multiply vector number by sizeof(function pointer)

      should be

      " lsr r1, #24\n" // Multiply vector number by sizeof(function pointer)

  2. Raz

    A big thank you for this article!

    For those using Keil:
    – Add in the Project options / Target, a small IRAM section to keep the pseudo VTOR; and the second one for the application
    – Then you define the VTOR variable:
    volatile uint32_t VTOR __attribute__((at(0x20000000)));
    or whatever start of the RAM address you have

    In the startup.s file you adapt the vector table and define your Bootloader_IRQHandler
    Bootloader_IRQHandler PROC
    EXPORT Bootloader_IRQHandler [WEAK]
    LDR R0, =VTOR ; Read the fake VTOR into R0
    LDR R0, [R0]
    LDR R1, =0xE000ED04 ; Prepare to read the ICSR
    LDR R1, [R1] ; Load the ICSR
    LSLS R1, #26 ; Mask SCB_ICSR_VECTACTIVE (6 bits, Cortex-M0)
    LSRS R1, #24 ; Multiply vector number by sizeof(function pointer)
    LDR R0, [R0, R1] ; Read the function pointer value from the table
    BX R0 ; Aaaannd branch!

    Note that if you use in the bootloader the function __disable_irq() you need to use in your application __enable_irq() else the interrupts will stay disabled even if you set correctly your NVIC->ISER

    All credits to Kevin Cuzner. Thank you Kevin.

  3. Maxim

    Kevin, thanks for the post.
    I’m confused why we need different vector table for bootloader and main app.
    you mention we need to disable interrupts for bootloader, so why don’t use initial (without remapping) vector table for main app only?
    Can you give a simple example to understand?

    1. admin Post author

      The issue is that the bootloader needs to use some of the interrupts, such as the USB interrupt. The two separate vector tables allow both the bootloader and app to use the interrupts without restriction. The application also doesn’t need to actually be “aware” that it is being executed by a bootloader other than the slight shifting done for addresses at link-time by the linker script.

      If there was one vector table, it would be difficult to have both the app and bootloader use the same interrupt. It could be done, but would require some crazy linker gymnastics since the app would need to be compiled against the bootloader. Supporting upgrades in those kinds of situations is difficult since the bootloader may vary if I ever update it, and I’d have to compile the apps against all versions of the bootloader I had ever used.

  4. Matthias U

    Can’t you tell the STM to remap the address at 0x000 to RAM? Then all you’d need to do is copy the vector table from the beginning of flash to it.

    1. admin Post author

      Yes, that is possible, though it is somewhat less flexible than this solution in my opinion. However, it probably would perform slightly better since there are no delays due to the interrupt shim re-dispatching the interrupt. At the time I wrote this I was porting over a bootloader that depended on the VTOR and I hadn’t explored the SYSCFG_CFGR1 register thoroughly enough. I might have chosen that method if I had noticed that at the time.
