Building a USB bootloader for an STM32

As my final installment for the posts about my LED Wristwatch project I wanted to write about the self-programming bootloader I made for an STM32L052 and describe how it works. So far it has shown itself to be fairly robust and I haven’t had to get out my STLink to reprogram the watch for quite some time.

The main goal of this bootloader is to allow the device to be reprogrammed without requiring an external programmer. There are generally two ways that a microcontroller can accomplish this:

  1. Include a binary image in every compiled program that is copied into RAM and runs a bootloader program that allows for self-reprogramming.
  2. Reserve a section of flash for a bootloader that can reprogram the rest of flash.

Each of these approaches has its pros and cons. Option 1 allows for the user program to use all available flash (aside from the blob size and bootstrapping code). It also might not require a relocatable interrupt vector table (something that some ARM Cortex microcontrollers lack). However, it also means that there is no recovery without using JTAG or SWD to reflash the microcontroller if you somehow mess up the switchover into the bootloader. Option 2 allows for a fairly fail-safe bootloader. The bootloader is always there, even if the user program is not working right. So long as the device provides a hardware method for entering bootloader mode, the device can always be recovered. However, Option 2 is difficult to update (you have to flash it with a special program that overwrites the bootloader), wastes unused space in the bootloader-reserved section, and also requires some features that not all microcontrollers have.

Because the STM32L052 has a large amount of flash (64K) and implements the vector-table-offset register (allowing the interrupt vector table to be relocated), I decided to go with Option 2.

Example code for this post can be found here:


Parts of a bootloader

There are a few pieces to the bootloader that I’m going to describe here which are necessary for its function.

  • Since the bootloader runs first: the ability to detect whether or not the bootloader should run, plus a way for the application to enter bootloader mode.
  • The ability to write to flash.
  • Since this bootloader allows any program to be written: some way to transfer the program into the bootloader.

Bootloader Entry and Exit

When the watch first boots, the bootloader is going to be the first thing that runs. Not all bootloaders work like this, but this is one of the simplest ways to get things rolling.

First, there are a few #defines and global variables that it would be good to know about for some context:

There are a few things that can be gathered from this:

  • We are going to be using the EEPROM. I made a convenient _EEPROM macro that makes a variable be placed into the EEPROM portion of memory.
  • There are some reset conditions which will cause the bootloader to enter bootloader mode no matter what. These reset conditions are checked by masking the CSR register with this mask.
  • We have some persistent state that consists of a “magic code” and the user program’s VTOR register value. This is all stored to EEPROM.
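The listing those points refer to is not reproduced above. As a rough sketch only, declarations along these lines would match the description. The `.eeprom` section name, the choice of reset flags in the mask, their bit positions, the magic value, and every identifier except `_EEPROM`, `magic_code`, and `user_vtor` are assumptions rather than the project's actual definitions.

```c
#include <assert.h>
#include <stdint.h>

/* On the target, place a variable in an ".eeprom" output section (assumed
 * name; it must match the linker script). On a host build the macro expands
 * to nothing so the sketch still compiles. */
#if defined(__GNUC__) && defined(__arm__)
#define _EEPROM __attribute__((section(".eeprom")))
#else
#define _EEPROM
#endif

/* RCC_CSR reset flags. Bit positions are assumed; check the reference manual. */
#define CSR_SFTRSTF   (1UL << 28)  /* software reset */
#define CSR_IWDGRSTF  (1UL << 29)  /* independent watchdog reset */
#define CSR_WWDGRSTF  (1UL << 30)  /* window watchdog reset */

/* Hypothetical mask: any of these flags forces bootloader mode. */
#define BOOTLOADER_RESET_MASK (CSR_SFTRSTF | CSR_IWDGRSTF | CSR_WWDGRSTF)

/* Hypothetical magic value; any number unlikely to appear by chance works. */
#define BOOTLOADER_MAGIC 0x3A5C9E17UL

/* Persistent state kept in EEPROM across resets. */
typedef struct {
    uint32_t magic_code;  /* one-time override of the CSR check */
    uint32_t user_vtor;   /* address of the user program's vector table */
} bootloader_state_t;

static_assert(sizeof(bootloader_state_t) == 8, "two 32-bit words expected");

_EEPROM bootloader_state_t bootloader_persistent_state;
```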

The first thing that the bootloader does is ask the following question to determine if it should run the user application:

Reading here, we can see that if there is a user_vtor value and there was either no reset condition forcing an entry into bootloader mode or the magic number was programmed to our state, we’re going to continue and load the user program rather than staying in bootloader mode.

The most important part here is the CSR check. This is what gives this bootloader some “recoverability” facilities. Basically if there’s any reset except a power-on reset, it will assume that there’s a problem with the application program and that it shouldn’t execute it. It will stay in bootloader mode. This aids in writing application firmware since a hard fault followed by a WDT reset will result in the microcontroller safely entering bootloader mode. The downside to this is that it could make debugging difficult if you are trying to figure out why something like a hard fault occurred in the first place (though I could argue that you should be using the SWD dongle anyway to debug your program).

The next thing to explain here is the purpose of the magic_code value. The idea is to have some number that is highly unlikely to appear randomly in the EEPROM, which we use to “override” the CSR check. This happens when a program has just finished being flashed: the bootloader itself executes a soft reset to start the newly flashed user program, and without the override the CSR check would see that soft reset and refuse to run the user program.
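Put together, the decision described above boils down to one boolean expression. The following is a minimal sketch of it, assuming that an erased `user_vtor` reads as zero and using a hypothetical mask and magic value; the function name is invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, for illustration only. */
#define BOOTLOADER_RESET_MASK 0x70000000UL  /* software + watchdog reset flags */
#define BOOTLOADER_MAGIC      0x3A5C9E17UL  /* arbitrary unlikely number */

static_assert(BOOTLOADER_MAGIC != 0, "magic must differ from erased EEPROM");

/* Returns true when the bootloader should hand control to the user program:
 * a program must be present, and either no reset flag forces bootloader
 * mode or the one-time magic override has been programmed. */
static bool should_run_user_program(uint32_t csr, uint32_t magic_code,
                                    uint32_t user_vtor)
{
    bool has_program  = (user_vtor != 0);  /* assumes erased EEPROM reads 0 */
    bool forced_entry = (csr & BOOTLOADER_RESET_MASK) != 0;
    bool override     = (magic_code == BOOTLOADER_MAGIC);

    return has_program && (!forced_entry || override);
}
```

On the target, `csr` would come from the RCC CSR register and the other two arguments from the persistent EEPROM state.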

After the bootloader determines that it needs to run the user’s program, it will execute the following:

The first step here is to reset the magic_code value, since this is a one-time CSR-check override. Next, interrupts are disabled and some steps are taken to start executing the user program:

  1. The user_vtor value is dereferenced and we read values directly from the previously programmed user application. For Cortex-M binaries, the interrupt table’s first two words are the initial stack pointer and the location of the reset interrupt. By dereferencing the VTOR value we read the user program like an array, extracting the first and second words to store as the future stack pointer and future program counter (since we want to start at the user program’s reset entry point).
  2. The actual VTOR register is written.
  3. Some inline assembly sets the stack pointer and then branches to the user program’s reset vector.
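The three steps above can be sketched roughly as follows. This is an illustration rather than the project's code: the function and type names are made up, and the target-only part is guarded so the table-reading helper can also be compiled on a host. The VTOR address (0xE000ED08) is the standard Cortex-M System Control Block location.

```c
#include <assert.h>
#include <stdint.h>

/* First two words of a Cortex-M vector table. */
typedef struct {
    uint32_t initial_sp;     /* word 0: initial main stack pointer */
    uint32_t reset_handler;  /* word 1: address of the reset handler */
} vector_head_t;

/* Step 1: read the user program like an array of words. */
static vector_head_t read_vector_head(const uint32_t *vector_table)
{
    assert(vector_table != 0);
    vector_head_t head = { vector_table[0], vector_table[1] };
    return head;
}

#if defined(__GNUC__) && defined(__arm__)
#define SCB_VTOR (*(volatile uint32_t *)0xE000ED08UL)

/* Steps 2 and 3: relocate the vector table, set the stack pointer, branch. */
static void jump_to_user_program(uint32_t user_vtor)
{
    vector_head_t head = read_vector_head((const uint32_t *)user_vtor);

    __asm__ volatile ("cpsid i" ::: "memory");  /* disable interrupts */
    SCB_VTOR = user_vtor;
    __asm__ volatile (
        "msr msp, %0\n\t"   /* load the user program's initial stack pointer */
        "bx  %1\n\t"        /* branch to its reset handler */
        :
        : "r" (head.initial_sp), "r" (head.reset_handler)
        : "memory");
    __builtin_unreachable();
}
#endif
```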

After these steps are performed, the user program will begin to run. Since this whole process occurs from the initial reset state of the processor and doesn’t modify any clock enable values, the user program runs in the same environment that it would if it were the program being executed at reset.

In summary, the bootloader is entered immediately upon device reset. It then decides to either run the user program (exiting the bootloader) or continue on in bootloader mode based on the value of the CSR register.

Self-programming via USB

One main goal I had with this bootloader is that it should be driverless and cross-platform. To facilitate this, the bootloader enumerates as a USB Human Interface Device. Here is my report descriptor for the bootloader:

Our reports are very simple: We have a 64-byte IN report and a 64-byte OUT report. Although the report descriptor only describes these as simple arrays, the bootloader will actually type-pun them into something a little more structured as follows:
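The structure itself is not shown here. As a sketch of the idea, a 64-byte OUT report carrying a command could be overlaid like this. The command values are the ones listed in the protocol description; the field order after the command word and all type names are assumptions. A byte-wise decoder is included as a portable alternative to punning through the union.

```c
#include <assert.h>
#include <stdint.h>

#define REPORT_SIZE 64U

/* Command words, taken from the protocol description. */
typedef enum {
    CMD_RESET = 0x00000000,
    CMD_ABORT = 0x0000003E,
    CMD_READ  = 0x00000040,
    CMD_WRITE = 0x00000080,
    CMD_EXIT  = 0x000000C3
} bootloader_cmd_t;

/* Assumed layout of the leading fields of a command report. */
typedef struct {
    uint32_t command;
    uint32_t address;
    uint32_t crc32_lower;  /* CRC32 of the first 64 data bytes */
    uint32_t crc32_upper;  /* CRC32 of the second 64 data bytes */
} command_fields_t;

/* The same 64 bytes viewed either as a raw array or as structured fields. */
typedef union {
    uint8_t          raw[REPORT_SIZE];
    command_fields_t fields;
} out_report_t;

static_assert(sizeof(out_report_t) == REPORT_SIZE, "report must be 64 bytes");

/* Reads a little-endian 32-bit word regardless of host byte order. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decodes the command fields from a raw report without type punning. */
static command_fields_t decode_command(const uint8_t raw[REPORT_SIZE])
{
    command_fields_t f;
    f.command     = load_le32(raw + 0);
    f.address     = load_le32(raw + 4);
    f.crc32_lower = load_le32(raw + 8);
    f.crc32_upper = load_le32(raw + 12);
    return f;
}
```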

To program the device, this bootloader implements a state machine that interprets sequences of OUT reports and issues IN reports as follows:

  • The status report: At certain points, the bootloader will issue IN reports back to the host which contain the last command received, any error flags, and some CRC32 values which are used to ensure we don’t swap upper and lower pages when transferring flash pages back to the host.
  • The reset command: The host issues an OUT report just containing 0x00000000 as its first four bytes. This resets the bootloader state machine and the bootloader will issue a single status report. In general, this command is to be executed three times in a row, since that will reset the bootloader state machine, even if it is in the middle of a programming cycle.
  • The write command: The host issues an OUT report with the command word set to 0x00000080. It also contains an address (the 6 lowest bits are ignored since flash writes always occur in groups (“pages”) of 128 bytes) and two CRC32s. The host will then issue two OUT reports, each containing 64 bytes of data to be written to the flash. The CRC32s previously sent are then used to verify that the two OUT reports were received in the correct order. The reason for this stems from how most OSes implement USB HID devices: There is no concept of exclusive access. Two separate host programs could be issuing reports (or reading reports) to the device. If this somehow occurs, the bootloader state machine could see interleaved OUT reports for unrelated commands. The CRC32 check aims to prevent this by asserting that the two reports following the initial OUT report are the ones intended to be interpreted as pages to be written to the flash. Once two valid OUT reports are received, the bootloader will erase the user_vtor value (basically invalidating the previously programmed user application) and begin the writing process. Once the flash write process is complete, the bootloader will issue a status IN report.
  • The read command: The host issues an OUT report with the command word set to 0x00000040. It also contains the address to read (again, the lowest 6 bits are ignored). The bootloader will then issue two IN reports containing the contents of the page. A status IN report will immediately follow.
  • The exit command: The host issues an OUT report with the command word set to 0x000000C3. The address field is set to the location of the interrupt table at the start of the program. This is programmed to the persistent structure in the EEPROM so that the bootloader knows where the user program starts. If everything is successful, the magic word is programmed and the bootloader resets into the user program.
  • The abort command: The host issues an OUT report with the command word set to 0x0000003E. If the user_vtor value hasn’t been erased (i.e. a write command hasn’t been issued yet), this programs the magic word and resets into the user program.
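To make the ordering check in the write command concrete, here is a small sketch of how two data reports could be validated against the CRC32s sent with the command. It assumes the common reflected CRC-32 (polynomial 0xEDB88320, as used by zlib), which may differ from what the bootloader actually computes, and it assumes a mismatch is treated as an error; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REPORT_SIZE 64U

/* Bitwise reflected CRC-32 (zlib-compatible); the variant is an assumption. */
static uint32_t crc32_compute(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFUL;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            uint32_t mask = (uint32_t)0 - (crc & 1U);  /* all ones if LSB set */
            crc = (crc >> 1) ^ (0xEDB88320UL & mask);
        }
    }
    return ~crc;
}

typedef enum { WRITE_WAIT_LOWER, WRITE_WAIT_UPPER, WRITE_READY, WRITE_ERROR } write_phase_t;

typedef struct {
    uint32_t      crc32_lower;
    uint32_t      crc32_upper;
    write_phase_t phase;
    uint8_t       page[2 * REPORT_SIZE];  /* assembled 128-byte page */
} write_ctx_t;

/* Called when the write command arrives with its two expected CRC32s. */
static void write_begin(write_ctx_t *ctx, uint32_t crc_lower, uint32_t crc_upper)
{
    ctx->crc32_lower = crc_lower;
    ctx->crc32_upper = crc_upper;
    ctx->phase = WRITE_WAIT_LOWER;
}

/* Called for each following data report; accepts it only if its CRC32
 * matches the half that is expected next. */
static write_phase_t write_feed(write_ctx_t *ctx, const uint8_t report[REPORT_SIZE])
{
    assert(ctx != NULL && report != NULL);
    uint32_t crc = crc32_compute(report, REPORT_SIZE);

    if (ctx->phase == WRITE_WAIT_LOWER && crc == ctx->crc32_lower) {
        memcpy(ctx->page, report, REPORT_SIZE);
        ctx->phase = WRITE_WAIT_UPPER;
    } else if (ctx->phase == WRITE_WAIT_UPPER && crc == ctx->crc32_upper) {
        memcpy(ctx->page + REPORT_SIZE, report, REPORT_SIZE);
        ctx->phase = WRITE_READY;
    } else if (ctx->phase != WRITE_READY) {
        ctx->phase = WRITE_ERROR;  /* swapped, interleaved, or corrupted report */
    }
    return ctx->phase;
}
```

Feeding the two halves in the right order ends in `WRITE_READY`; feeding the upper half first ends in `WRITE_ERROR`.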

A more detailed description of this protocol can be found at

I’ll cover briefly the process for writing the flash on the STM32. On my particular model, flash pages are 128 bytes and writes are always done in 64-byte groups. This is fairly standard for NOR flash that is seen in microcontrollers. When self-programming, one of the main issues I ran into was that the processor is not allowed to access the flash memory while a flash write is occurring. This is a problem since the flash write process requires the program to poll registers and wait for events to finish. Since this code by default resides in the flash memory, that will cause the write to fail. The solution to this is fairly straightforward: We have to ensure that the code that actually performs flash writes lives in RAM. Since RAM is executable on the STM32, this is just as simple as requesting the linker to locate the functions in RAM. Here’s my code that does flash erases and writes:
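The original listing is not included here, so the following is only a sketch of what a RAM-located half-page write might look like. The `.ramfunc` section name, the PECR and SR bit positions, and the function signature are assumptions to be checked against the linker script and the reference manual. Registers are passed in as pointers so the same code can be exercised on a host against plain variables.

```c
#include <assert.h>
#include <stdint.h>

/* On the target, place the function in a section that the linker script
 * locates in RAM (section name assumed). On a host it is a normal function. */
#if defined(__GNUC__) && defined(__arm__)
#define _RAM __attribute__((section(".ramfunc"), noinline))
#else
#define _RAM
#endif

/* Assumed FLASH register bits; verify against the reference manual. */
#define PECR_PROG  (1UL << 3)   /* program memory selection */
#define PECR_FPRG  (1UL << 10)  /* half-page programming mode */
#define SR_BSY     (1UL << 0)   /* write or erase in progress */

#define HALF_PAGE_WORDS 16U
static_assert(HALF_PAGE_WORDS * sizeof(uint32_t) == 64, "half-page is 64 bytes");

/* Writes one 64-byte half-page. Runs from RAM on the target so that the
 * busy-wait loop never fetches instructions from flash during the write.
 * The source buffer must also live in RAM. */
_RAM void flash_write_half_page(volatile uint32_t *pecr, volatile uint32_t *sr,
                                volatile uint32_t *dest, const uint32_t *src)
{
    *pecr |= PECR_PROG | PECR_FPRG;

    for (uint32_t i = 0; i < HALF_PAGE_WORDS; i++) {
        dest[i] = src[i];
    }
    while (*sr & SR_BSY) {
        /* wait for the flash controller to finish */
    }

    *pecr &= ~(PECR_PROG | PECR_FPRG);
}
```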

The other thing to discuss about self-programming is the way the STM32 protects itself against erroneous writes. It does this by “locking” and “unlocking” using writes of magic values to certain registers in the FLASH module. The idea is that the flash should only be unlocked for just the amount of time needed to actually program the flash and then locked again. This prevents program corruption due to factors like incorrect code, ESD causing the microcontroller to wig out, power loss, and other things that really can’t be predicted. I do the following to actually execute writes to the flash (note how the following code uses the _RAM-located functions I noted earlier):

More information about these magic numbers and the unlock-lock sequencing can be found in the documentation for the PRGKEYR register in the FLASH module on the STM32L052.
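For reference, the unlock-lock sequence can be sketched as below. The key values are the ones documented for the PEKEYR and PRGKEYR registers on this family; the function names and the pointer-based signatures (chosen so the sketch can run against plain variables on a host) are assumptions, not the project's code.

```c
#include <assert.h>
#include <stdint.h>

/* Keys that unlock the PECR register, written to FLASH_PEKEYR in order. */
#define FLASH_PEKEY1   0x89ABCDEFUL
#define FLASH_PEKEY2   0x02030405UL
/* Keys that unlock program memory, written to FLASH_PRGKEYR in order. */
#define FLASH_PRGKEY1  0x8C9DAEBFUL
#define FLASH_PRGKEY2  0x13141516UL

#define PECR_PELOCK    (1UL << 0)  /* assumed bit position of PELOCK */

/* Unlocks PECR first, then program memory. */
static void flash_unlock(volatile uint32_t *pekeyr, volatile uint32_t *prgkeyr)
{
    assert(pekeyr != 0 && prgkeyr != 0);
    *pekeyr  = FLASH_PEKEY1;
    *pekeyr  = FLASH_PEKEY2;
    *prgkeyr = FLASH_PRGKEY1;
    *prgkeyr = FLASH_PRGKEY2;
}

/* Re-locks the flash by setting PELOCK as soon as the write is done. */
static void flash_lock(volatile uint32_t *pecr)
{
    assert(pecr != 0);
    *pecr |= PECR_PELOCK;
}
```

A write would then be bracketed as unlock, program (from RAM), lock, keeping the unlocked window as short as possible.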

By combining the bootloader state machine with these methods for writing the flash, we can build a self-programming bootloader. Internally, it also checks to make sure we aren’t trying to overwrite anything we shouldn’t by ensuring that the write only applies to areas of user flash, not to the bootloader’s reserved segment. In addition, it also verifies every page written against the original data to be programmed.
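A minimal sketch of those two checks, assuming the 64K flash and 8K bootloader reservation used in this post and 128-byte pages (the names are illustrative):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FLASH_BASE_ADDR   0x08000000UL
#define FLASH_TOTAL_SIZE  (64UL * 1024UL)
#define BOOTLOADER_SIZE   (8UL * 1024UL)
#define FLASH_PAGE_SIZE   128UL

#define USER_FLASH_START  (FLASH_BASE_ADDR + BOOTLOADER_SIZE)
#define USER_FLASH_END    (FLASH_BASE_ADDR + FLASH_TOTAL_SIZE)  /* exclusive */

/* True if a whole page starting at `address` lies inside user flash, so a
 * write can never touch the bootloader's reserved segment. */
static bool is_user_flash_page(uint32_t address)
{
    return address >= USER_FLASH_START &&
           address <= USER_FLASH_END - FLASH_PAGE_SIZE;
}

/* True if the page just written reads back identical to the source data. */
static bool page_matches(const uint8_t *written, const uint8_t *expected)
{
    assert(written != NULL && expected != NULL);
    return memcmp(written, expected, FLASH_PAGE_SIZE) == 0;
}
```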

I do recommend reading through the code for the bootloader state machine (just bootloader.c in the bootloader directory). The state machine is table-based (see the “fsm” constant table variable and the “bootloader_tick” function) and I find that to be a very maintainable model for writing state machines in C.
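To give a flavor of the table-based style without reproducing the real thing, here is a deliberately tiny sketch. It borrows the names `fsm` and `bootloader_tick` from the source, but the states, events, table contents, and signature are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ST_IDLE, ST_WRITE_LOWER, ST_WRITE_UPPER, ST_COUNT } state_t;
typedef enum { EV_RESET, EV_WRITE_CMD, EV_DATA, EV_COUNT } event_t;

/* One row per allowed transition: in `state`, on `event`, go to `next`. */
typedef struct {
    state_t state;
    event_t event;
    state_t next;
} transition_t;

static const transition_t fsm[] = {
    { ST_IDLE,        EV_WRITE_CMD, ST_WRITE_LOWER },
    { ST_WRITE_LOWER, EV_DATA,      ST_WRITE_UPPER },
    { ST_WRITE_UPPER, EV_DATA,      ST_IDLE        },
};

/* Looks up the transition for the current state and event. A reset event
 * always returns to idle; events with no matching row are ignored. */
static state_t bootloader_tick(state_t current, event_t event)
{
    assert(current < ST_COUNT && event < EV_COUNT);
    if (event == EV_RESET) {
        return ST_IDLE;
    }
    for (size_t i = 0; i < sizeof fsm / sizeof fsm[0]; i++) {
        if (fsm[i].state == current && fsm[i].event == event) {
            return fsm[i].next;
        }
    }
    return current;
}
```

The appeal of this layout is that adding a command means adding rows to the table rather than threading new branches through nested switch statements.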

Considerations for linking the application

One big thing we haven’t yet covered is how exactly the user application needs to be changed in order to be compatible with the bootloader. Due to how the bootloader is structured (it just lives in the first bit of flash) and how it is entered (any reset other than power-on will enter bootloader mode), the only real change needed to make a user program compatible is to relocate where the linker script places the user program in flash (leaving the first section of it blank). In my linker script for the LED watch, I changed the MEMORY directive to read as follows:

The flash segment has been shortened from 64K to 56K and the ORIGIN has been moved up to 0x08002000. The first 8KB of flash are now reserved for the bootloader. The bootloader is linked just like any other program, with the ORIGIN at 0x08000000, but its LENGTH is set to 8K instead.
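A MEMORY directive matching that description would look roughly like the fragment below. The FLASH values come from the text above; the region names, attribute flags, and the RAM line (8K at 0x20000000 on this part) are assumptions about how the script is laid out.

```
MEMORY
{
    /* User program: starts after the 8K bootloader, 56K remaining */
    FLASH (rx)  : ORIGIN = 0x08002000, LENGTH = 56K
    RAM   (rwx) : ORIGIN = 0x20000000, LENGTH = 8K
}
```

For the bootloader's own script, the FLASH line would instead read `ORIGIN = 0x08000000, LENGTH = 8K`.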

When the user program wishes to enter bootloader mode, it just needs to issue a soft reset. The LED watch has a command for this that is issued over USB and just executes the following when it receives that command:

Very simple, very easy.
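For readers without the source at hand: on a Cortex-M, a soft reset amounts to writing the key 0x05FA together with the SYSRESETREQ bit to the AIRCR register at 0xE000ED0C, which is what the CMSIS `NVIC_SystemReset()` helper does. A bare-register sketch, with hypothetical function names:

```c
#include <assert.h>
#include <stdint.h>

#define SCB_AIRCR_ADDR     0xE000ED0CUL
#define AIRCR_VECTKEY      (0x05FAUL << 16)  /* required write key */
#define AIRCR_SYSRESETREQ  (1UL << 2)        /* request a system reset */

static_assert((AIRCR_VECTKEY | AIRCR_SYSRESETREQ) == 0x05FA0004UL,
              "unexpected AIRCR reset value");

/* Value that must be written to AIRCR to request a reset. */
static uint32_t aircr_reset_value(void)
{
    return (uint32_t)(AIRCR_VECTKEY | AIRCR_SYSRESETREQ);
}

#if defined(__GNUC__) && defined(__arm__)
/* Issues the soft reset; the bootloader then sees the software-reset flag
 * in the CSR register and stays in bootloader mode. */
static void enter_bootloader(void)
{
    __asm__ volatile ("dsb" ::: "memory");
    *(volatile uint32_t *)SCB_AIRCR_ADDR = aircr_reset_value();
    __asm__ volatile ("dsb" ::: "memory");
    for (;;) {
        /* wait for the reset to take effect */
    }
}
#endif
```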

Host software

The host software is written in Python and uses pyhidapi to talk to the bootloader. It really is nothing complicated, since it just reads Intel HEX files and dumps them into the watch by operating the state machine. When it is finished, it issues the “exit” command, which tells the bootloader the location of the start of the program so that it can read the initial stack pointer and the address of the reset function. This also boots into the user program. Pretty much all the heavy lifting and “interesting” stuff for a bootloader happens in the bootloader itself, rather than in host software.

One small hack is that the host software hardcodes where it believes the program should start (address 0x08002000). One possible resolution for this hack is to take ELF files instead of Intel HEX files, or just assume the lowest address in the hex file is the starting point.
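The "lowest address" idea is easy to sketch. The host tool itself is Python; the version below is written in C only to stay consistent with the other sketches in this post, and all names in it are hypothetical. It handles data records (type 00) and extended linear address records (type 04), verifies each record's checksum, and tracks the lowest data address seen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t upper;   /* upper 16 address bits from the last type-04 record */
    uint32_t lowest;  /* lowest data address seen so far */
} hex_scan_t;

static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Feeds one record line to the scanner. Returns 0 on success, -1 if the
 * line is malformed or its checksum does not add up. */
static int hex_scan_line(hex_scan_t *scan, const char *line)
{
    uint8_t raw[4 + 255 + 1];  /* length, offset (2), type, data, checksum */
    size_t n = 0;
    unsigned sum = 0;

    assert(scan != NULL && line != NULL);
    if (line[0] != ':') return -1;

    for (const char *p = line + 1; n < sizeof raw; p += 2) {
        int hi = hex_nibble(p[0]);
        if (hi < 0) break;            /* end of hex digits (newline or NUL) */
        int lo = hex_nibble(p[1]);
        if (lo < 0) return -1;        /* odd number of hex digits */
        raw[n] = (uint8_t)((hi << 4) | lo);
        sum += raw[n++];
    }
    /* All bytes of a valid record, checksum included, sum to 0 modulo 256. */
    if (n < 5 || raw[0] != n - 5 || (sum & 0xFFU) != 0) return -1;

    uint32_t offset = ((uint32_t)raw[1] << 8) | raw[2];
    if (raw[3] == 0x04 && raw[0] == 2) {
        scan->upper = (((uint32_t)raw[4] << 8) | raw[5]) << 16;
    } else if (raw[3] == 0x00 && raw[0] > 0) {
        uint32_t address = scan->upper + offset;
        if (address < scan->lowest) scan->lowest = address;
    }
    return 0;
}
```

Starting from `{ 0, UINT32_MAX }` and feeding every line of the file, `lowest` ends up holding the candidate start address, which could then replace the hardcoded value.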


This is the first bootloader that I’ve written for one of my projects. There were challenges getting it to work at first, but I hope that I’ve shown it isn’t an incredibly complex thing to write. I actually got better performance flashing over USB than over SWD, which is an additional win. If I didn’t use SWD for debugging so much, I would probably always use a bootloader like this on my projects.

I hope this has been a useful read and I do encourage actually checking out the source code, since I’ve been pretty brief about some parts of the bootloader.

2 thoughts on “Building a USB bootloader for an STM32”

  1. Anonymous

    Great article, but you need to clarify some things.

    1A) You didn’t say what you used to compile the ARM code, or version number of the tools either.

    2A) You need to clarify that the STM32L052 already has a factory ROM’ed bootloader that supports UART & SPI bus, and bootloaders in other STM32 parts may support USB and I2C. ARM microcontrollers from other ARM chip makers may not have any ROM’ed bootloader, nor do the 8-bit AVR chips used in the Arduino. The STM32 factory ROM’ed bootloader is an important detail, because lots of people aren’t aware it exists. See STM32 App Note AN2606 for more details.

    3A) Since you said you got better performance from USB than SWD, then it’s important to clarify some details that can affect the SWD speed.

    3B) You didn’t say how fast the ARM core is running. Though the STM32L052 can run up to 32 MHz, it doesn’t mean that you configured it to run at 32 MHz, because you didn’t state it. Depending on an ARM core, typically the maximum SWD clock speed is derived from the speed of the core, for example it might be 1/4 of the core clock, but I’m not sure how the limit is derived on the STM32L052.

    3C) Though you are using a STLink for SWD speed comparison, it doesn’t mean another SWD adapter wouldn’t be faster. I’m not sure what is the maximum SWD clock speed of the STLink, but the maximum SWD clock speed can vary from model to model from other vendors. For example, the Segger JLink family has different clock rate limitations across the JLink family.

    1. admin Post author

      Well, I suppose if I “need” to:

      1) Please see the readme here, it contains a list of software dependencies and build steps: I use the system packages in Arch, so the version is the latest version at the time the commits were made. Not the most reproducible, but I haven’t invested a lot of time in my home build system.

      2) I know about the built in bootloader from reading the documentation on the BOOT0 pin and app note it directs to (AN2606) while designing the LED watch PCB, but I chose to make my own because on this blog I take the path of most fun, rather than greatest efficiency. I reserve efficient behavior for work :). It is also a good exercise since having at least built a working bootloader once might allow me to build a better bootloader in the future if called upon to do so. If people out there usually don’t know about the bootloader, I think that’s a sign that more people need to read the documentation. Aside from the occasional spelling/grammar error and small omissions I think ST’s documentation is pretty good.

      3) I used a cheap eBay clone STLink v2. It’s definitely not comparable to a production-ready Segger JLink and likely has far lower performance, but the cost is substantially less. At work I would spring for the JLink. At home on my own dime I live with low performance. Clock speed details can be found in the source for the bootloader, but I believe in the entire LED watch project I keep things to 16MHz or below. The speed increase was simply my experience here and a happy surprise. YMMV.

