Rust on ARM Cortex-M microcontrollers

Published on: 2017-12-04


Jorge Aparicio is doing some amazing work with Rust and microcontrollers. In one of his articles, he describes a procedure for programming microcontrollers in Rust that can be adapted to a wide variety of Cortex-M processors. This post describes how I used Jorge's instructions to program a Texas Instruments Stellaris/Tiva LaunchPad.

Installing the required tools on a GNU/Linux system

First, install the GNU ARM toolchain. The installed tools are compiled for 32-bit systems and you will most probably be running a 64-bit system, so you will need to make your system capable of running 32-bit binaries.

Or, to make things easier, just get the 64-bit Linux binaries instead!

You will now need to install the latest stable and nightly versions of Rust:

$ curl -sSf | sh
$ rustup update nightly
$ rustup default nightly

The xargo tool makes it easy to cross-compile Rust code for targets like ARM microcontrollers; it can be installed using cargo:

$ cargo install xargo

Next, install the rust-src component:

$ rustup component add rust-src

Finally, install lm4flash, a small utility which is used to write to the flash memory of the microcontroller on the Stellaris/Tiva launchpad boards:

$ sudo apt-get install lm4flash

You are now ready to program your launchpad development board!

Blinking LEDs

$ git clone
$ cd launchpad-quickstart
$ make debug
$ make flash

[Video: the output, from Pramode C E on Vimeo]

The code


The linker memory-layout file is critical. It must contain the correct start addresses and sizes for the RAM and flash of your device.

  MEMORY
  {
    /* NOTE K = KiB = 1024 bytes */
    /* TODO Adjust these memory regions to match your device memory layout */
    FLASH : ORIGIN = 0x00000000, LENGTH = 256K
    RAM : ORIGIN = 0x20000000, LENGTH = 32K
  }


The volatile_register crate provides abstractions for reading and writing to I/O ports. For example:

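use volatile_register::RW;

// PORT_F (the bit position of port F) is defined elsewhere in the source.
const RCGCGPIO: *const RW<u32> = (0x400FE000 + 0x608) as *const RW<u32>;

pub fn portf_init() {
    unsafe {
        (*RCGCGPIO).modify(|val| val | (1 << PORT_F));
        (*RCGCGPIO).read(); // wait for clk to start
        // ... (the rest of the initialization is not shown here)
    }
}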

RCGCGPIO is a raw pointer to an RW object. This object has a modify method which takes a closure as its argument. When modify is invoked, a 32-bit value is read from the associated I/O port, the closure is executed with this value as its parameter, and whatever the closure returns is written back to the I/O port (a read/modify/write operation).
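The read/modify/write behaviour can be tried out on a desktop machine. What follows is only a minimal sketch, not the volatile_register implementation: MockRw is a hypothetical stand-in that keeps the register value in a Cell instead of a memory-mapped address, and the value 5 for PORT_F is an assumption about the bit position of port F.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a read/write register; the value lives in a
/// `Cell` rather than at a memory-mapped address.
struct MockRw {
    value: Cell<u32>,
}

impl MockRw {
    fn new(initial: u32) -> Self {
        MockRw { value: Cell::new(initial) }
    }

    /// Read the current register value.
    fn read(&self) -> u32 {
        self.value.get()
    }

    /// Write a new register value.
    fn write(&self, new_value: u32) {
        self.value.set(new_value);
    }

    /// Read the value, pass it to the closure, write the result back.
    fn modify<F>(&self, f: F)
    where
        F: FnOnce(u32) -> u32,
    {
        self.write(f(self.read()));
    }
}

/// Assumed bit position of port F in the clock gating register.
const PORT_F: u32 = 5;

/// Sets the port F bit while leaving every other bit untouched.
fn enable_port_f(reg: &MockRw) {
    reg.modify(|val| val | (1 << PORT_F));
}

fn main() {
    // Pretend that bit 0 is already set before port F is enabled.
    let rcgcgpio = MockRw::new(0b0000_0001);
    enable_port_f(&rcgcgpio);
    println!("{:#010b}", rcgcgpio.read());
}
```

Because the closure only ORs in one bit, bits that were already set in the register survive the operation, which is the point of doing a read/modify/write instead of a plain write.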

Each LED is modelled as a simple struct which holds the associated enable, direction and data registers together with the pin number. This structure has a constructor which performs the required initialization, and on and off methods which turn the LED on and off.

pub struct Led {
    pin: u32,
    enable_reg: *const RW<u32>,
    dir_reg: *const RW<u32>,
    data_reg: *const RW<u32>,
}


The main function uses a cute trick to create a running-LEDs effect (it is a slightly modified version of code given elsewhere). Suppose you have 5 LEDs numbered 1 to 5.

Let’s create a list like this: [(1,2), (2,3), (3,4), (4,5), (5,1)]

Now, here is what you need to do: (OFF 1, ON 2), (OFF 2, ON 3), (OFF 3, ON 4), (OFF 4, ON 5), (OFF 5, ON 1) and so on …

The list [(1,2), (2,3), …] can be obtained by zipping: [1, 2, 3, 4, 5] and [2, 3, 4, 5, 1]
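The zipping step can be checked on its own with plain numbers. This is a small sketch that runs on a desktop machine; the function name rotation_pairs is made up for illustration, and cycle().skip(1) is just one of several ways to obtain the rotated list.

```rust
/// Pairs every element with its successor, wrapping around at the end.
fn rotation_pairs(items: &[u32]) -> Vec<(u32, u32)> {
    items
        .iter()
        .copied()
        // `cycle().skip(1)` yields 2, 3, 4, 5, 1, 2, ...; `zip` stops as soon
        // as the first (finite) iterator is exhausted.
        .zip(items.iter().copied().cycle().skip(1))
        .collect()
}

fn main() {
    let pairs = rotation_pairs(&[1, 2, 3, 4, 5]);
    println!("{:?}", pairs); // prints [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)]
}
```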

The Stellaris LaunchPad has an RGB LED, which we can treat as 3 independent LEDs.

Here is the code which blinks each LED in rotation:

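    let leds = [red_led(), green_led(), blue_led()];

    loop {
        // ... (lines lost in the original: `leds` zipped with its rotated copy)
            .for_each(|(current, next)| {
                // ... (body lost in the original: switch `current` off, `next` on)
            });
    }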
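The same rotation can be simulated on a desktop machine. This is only a sketch under stated assumptions, not the code from the repository: SimLed is a hypothetical model that shares one simulated data register (a Cell), the pin numbers 1, 3 and 2 for red, green and blue are assumptions about the board wiring, and the off-then-on order follows the description above.

```rust
use std::cell::Cell;

/// Hypothetical LED model: a pin number plus a shared, simulated data register.
struct SimLed<'a> {
    pin: u32,
    data_reg: &'a Cell<u32>,
}

impl<'a> SimLed<'a> {
    /// Sets this LED's bit in the data register.
    fn on(&self) {
        self.data_reg.set(self.data_reg.get() | (1 << self.pin));
    }

    /// Clears this LED's bit in the data register.
    fn off(&self) {
        self.data_reg.set(self.data_reg.get() & !(1 << self.pin));
    }
}

/// Runs one pass over the LEDs and returns the register value after each step.
fn one_pass(leds: &[SimLed], data_reg: &Cell<u32>) -> Vec<u32> {
    let mut history = Vec::new();
    leds.iter()
        .zip(leds.iter().cycle().skip(1))
        .for_each(|(current, next)| {
            current.off();
            history.push(data_reg.get());
            next.on();
            history.push(data_reg.get());
        });
    history
}

fn main() {
    let data_reg = Cell::new(0);
    // Assumed pin numbers: red = 1, green = 3, blue = 2.
    let leds = [
        SimLed { pin: 1, data_reg: &data_reg },
        SimLed { pin: 3, data_reg: &data_reg },
        SimLed { pin: 2, data_reg: &data_reg },
    ];
    println!("{:?}", one_pass(&leds, &data_reg)); // prints [0, 8, 0, 4, 0, 2]
}
```

Each on or off call is a single bit set or bit clear on the shared register, so one pass lights green, blue and red in turn.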

But isn’t this code horribly inefficient?

This is the kind of high-level code which makes sense in languages like Python, Haskell, Scala etc. But really, won’t those iterators and closures make the code bloated and slow?

Absolutely not! If you do an optimized build of the code and disassemble it, you will see something like this:

 4e8:   6801            ldr     r1, [r0, #0]
 4ea:   f041 0108       orr.w   r1, r1, #8
 4ee:   6001            str     r1, [r0, #0]
 4f0:   6801            ldr     r1, [r0, #0]
 4f2:   f021 0102       bic.w   r1, r1, #2
 4f6:   6001            str     r1, [r0, #0]
 4f8:   6801            ldr     r1, [r0, #0]
 4fa:   f041 0104       orr.w   r1, r1, #4
 4fe:   6001            str     r1, [r0, #0]
 500:   6801            ldr     r1, [r0, #0]
 502:   f021 0108       bic.w   r1, r1, #8
 506:   6001            str     r1, [r0, #0]
 508:   6801            ldr     r1, [r0, #0]
 50a:   f041 0102       orr.w   r1, r1, #2
 50e:   6001            str     r1, [r0, #0]
 510:   6801            ldr     r1, [r0, #0]
 512:   f021 0104       bic.w   r1, r1, #4
 516:   6001            str     r1, [r0, #0]
 518:   e7e6            b.n     4e8 <_ZN11cortex_m_rt13reset_handler4main17h013016c728a33fddE+0x60>

This is as good as hand-written assembly! You don’t see any of the high-level abstractions in the generated code; it is just a sequence of bit set and bit clear operations on an I/O port. That’s the magic of Rust/LLVM!

[Note: the delay routine was not written in a way that prevents the compiler from optimizing it away, which is why no delay loop appears in the listing above.]
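One common way to keep a busy-wait delay from disappearing is to route the loop counter through volatile reads and writes, which the compiler must perform as written. The following is a minimal sketch of that idea, not the delay routine from the repository; the function name delay and the iteration count are illustrative, and the actual duration of the delay depends on the clock speed.

```rust
use std::ptr;

/// Busy-waits for `iterations` loop turns and returns the final counter value.
fn delay(iterations: u32) -> u32 {
    let mut counter: u32 = 0;
    for _ in 0..iterations {
        // Volatile accesses must be performed exactly as written, so the
        // optimizer cannot collapse or remove this loop.
        unsafe {
            let current = ptr::read_volatile(&counter);
            ptr::write_volatile(&mut counter, current + 1);
        }
    }
    counter
}

fn main() {
    println!("{}", delay(100_000)); // prints 100000
}
```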