Philosophy

Dead Code Elimination is a False Promise

Published 14 Sep 2025. Written by Jakob Kastelic.

Go through the source of any nontrivial program and chances are, most of the code in there is used very rarely, and a lot of it may never be used. This is all the more so if a program is supposed to be very portable, such as the Linux kernel. Of course, compilers will eliminate it, so there’s no problem?

The Problem

Compilers are supposed to detect what’s not being used and remove it. In practice, they often can’t. For example, processing one “compilation unit” at a time, a C compiler has no idea which functions will be referenced from other units and which are entirely dead. (If a function is declared static, this does work, so declare as many of them static as you can.)
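
To illustrate (a minimal sketch with made-up names): within a single C file, the compiler has to assume that any function with external linkage may be called from another unit, while an unused static function can be discarded:

int keep_me(void)          /* external linkage: kept, another unit might call it */
{
    return 1;
}

static int drop_me(void)   /* internal linkage: unused, so the compiler may delete it */
{
    return 2;
}

Compiling this file with an optimizing compiler (say, gcc -O2 -c) typically emits no code at all for drop_me(), while keep_me() always survives.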

Surely by the time the linker is invoked, all the function calls are clear and the rest can be stripped away? Also not likely. For example, function calls may be computed at runtime by casting integers into function pointers; if the linker removed the apparently unused functions, this mechanism would break. And as long as several functions are compiled into the same section, the linker will include all of them whenever at least one of them is used.
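
To sketch the function-pointer case (a made-up illustration, not taken from any real program): once a call goes through a function pointer computed at runtime, there is no relocation or symbol reference tying the caller to the callee, so the linker has nothing to trace:

typedef int (*handler_t)(void);

int call_handler(unsigned long addr)
{
    handler_t f = (handler_t)addr;  /* an integer becomes a function pointer */
    return f();                     /* no symbol for the linker to follow */
}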

What if we instead explicitly mark which things we would like excluded?

Conditional Compilation

With conditional compilation, you can include or exclude whatever you want. When a program has these conditional compilation switches, dead code does get entirely deleted before the compiler even sees it. Most often, though, the result is a myriad of poorly documented (more likely: entirely undocumented) switches, and you don’t know which ones you’re allowed to disable.
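
For instance (CONFIG_FANCY_LOG is a made-up switch), a typical opt-out feature looks like the sketch below, and nothing in the code tells you whether the rest of the program still builds and runs if you turn it off:

#include <stdio.h>

#ifdef CONFIG_FANCY_LOG
/* enabled by default in some config header; dead weight for everyone who
   never calls it, yet who dares to switch it off? */
void fancy_log(const char *msg)
{
    printf("[fancy] %s\n", msg);
}
#endif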

For example, the Linux kernel provides the amazing menuconfig tool to manage these inclusions and exclusions. Still, it can take days of work disabling and re-enabling things till you give up, wisely conclude that this “premature optimization” is not worth your time, and leave everything turned on as it is by default.

“Packages”

The sad reality of modern scripting languages, and even compiled ones like Rust, is that their robust ecosystems of packages and libraries encourage wholesale inclusion of code whose size is entirely out of proportion to the task they perform. (Don’t even mention shared libraries.)

As an example, let’s try out the popular Rust GUI library egui. According to its Readme, it is modular, so you can use small parts of egui as needed, and comes with “minimal dependencies”. Just what we need to make a tiny app! Okay, first we need Rust itself:

$ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
$ du -hs .rustup .cargo
1.3G    .rustup
20M     .cargo

So far so good—the entire compiler and toolchain fits inside 1.3G, and we start with 20M of packages. Now let’s clone the GUI library and compile its simple example with a couple really simple widgets:

$ git clone git@github.com:emilk/egui.git
$ cd egui/examples/hello_world_simple
$ cargo run -p hello_world_simple
$ cd && du -hs .rustup .cargo
2.6G    .rustup
210M    .cargo

Oops! How many gigabytes of code does it take to show a couple of characters and rectangles on the screen? Besides, the above took more than 20 min to complete on a machine vastly superior to the Cray-2 supercomputer. The compiled program was 236M in size, or 16M after stripping.

Every Day We Stray Further …

This is far from being a “freak” example; even the simplest tasks in Rust and Python and pretty much anything else considered “modern” will pull in gigabytes of “essential” packages.

Packages get inextricably linked with the main program, resulting in an exponential explosion of complexity (besides the linear growth in size). Once linked, the program and its libraries/packages are no longer separate modules; you cannot simply replace a library with a different one, despite the host of false promises from the OOP crowd.

This is because the interfaces between these modules are very complex: hundreds or thousands of function calls, complex object operations, &c.

The Solution

The only way I know of that works is to not have dead code to begin with. Extra features should be strictly opt-in, not opt-out. These should be implemented with separate compilation and linking; in other words, each feature is a new program, not a library.

The objection may be raised that we’re advocating an extremely inefficient paradigm, replacing the already significant overhead of function calls with the much greater one of executing new programs. As an “extreme” example, a typical Unix shell will parse each command (with few exceptions) as the name of a new program to execute. How inefficient!?
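
To make the idea concrete, here is a minimal sketch (not from the article) of a feature implemented as a separate program rather than a linked-in library: the interface is plain text over a pipe, and the sorting “feature” can be replaced by swapping in a different external program:

#include <stdio.h>

int main(void)
{
    /* Delegate sorting to the external sort(1) program instead of linking
       a sorting library; the pipe is the entire interface. */
    FILE *p = popen("sort -n", "w");
    if (p == NULL)
        return 1;

    fprintf(p, "3\n1\n2\n");          /* hand the data to the helper program */
    return pclose(p) == 0 ? 0 : 1;    /* sorted output appears on stdout */
}

The price is an extra process, but the interface between the two “modules” now fits in one sentence.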

Maintainable, replaceable code reuse can only happen when the interfaces are well specified and minimal, such as those between cooperating independent programs in a Unix pipeline.

The key to problem-solving on the UNIX system is to identify the right primitive operations and to put them at the right place. UNIX programs tend to solve general problems rather than special cases. In a very loose sense, the programs are orthogonal, spanning the space of jobs to be done (although with a fair amount of overlap for reasons of history, convenience or efficiency). Functions are placed where they will do the most good: there shouldn’t be a pager in every program that produces output any more than there should be filename pattern matching in every program that uses filenames.

One thing that UNIX does not need is more features. It is successful in part because it has a small number of good ideas that work well together. Merely adding features does not make it easier for users to do things — it just makes the manual thicker.[1]


  1. Pike, Rob, and Brian Kernighan. “Program design in the UNIX environment.” AT&T Bell Laboratories Technical Journal 63.8 (1984): 1595-1605. See also UNIX Style, or cat -v Considered Harmful. ↩︎

Linux

STM32MP135 Without U-Boot (TF-A Falcon Mode)

Published 11 Sep 2025, modified 20 Sep 2025. Written by Jakob Kastelic.

This is Part 3 in the series: Linux on STM32MP135. See other articles.

In this article, we use Arm Trusted Firmware (TF-A) to load the Linux kernel directly, without using U-Boot.[1] I have seen the idea of skipping the second-stage bootloader and having the first-stage loader boot the kernel directly referred to as “falcon mode”, since it makes the boot process (slightly) faster. However, I am primarily interested in it as a way of reducing the overall complexity of the software stack.

In this article, we will implement this in two ways. First, we modify the files as needed manually. At the end of the article, we provide an alternative method: directly integrate the changes into Buildroot.

Prerequisites

To get started, make sure to have built the default configuration as per the first article of this series. Very briefly, this entails cloning the official Buildroot repository, selecting a defconfig, and compiling:

$ git clone https://gitlab.com/buildroot.org/buildroot.git --depth=1
$ cd buildroot
$ make stm32mp135f_dk_defconfig
$ make menuconfig # add the STM32MP_USB_PROGRAMMER=1 flag to TF-A build
$ make

It is also recommended to learn how to flash the SD card over a USB connection without removing it from the board, as explained in the second article.

Tutorial

The procedure is pretty simple. All we need to do is to modify some files, adjust some build parameters, recompile, and the new SD card image is ready to test.

  1. Before making any modifications, make a backup of the file containing U-Boot.

    $ cd output/images
    $ cp fip.bin fip_uboot.bin
    

    Double check that the above fip.bin was built using the additional ATF build variable STM32MP_USB_PROGRAMMER=1, otherwise USB flashing will not work!

    Open flash.tsv, and update the fip.bin to fip_uboot.bin there as well.

    (Despite removing U-Boot from the boot process, we are still going to use it to flash the SD card image via USB using the STM32CubeProg.)

  2. Two TF-A files need to be modified, so navigate to the TF-A build directory:

    $ cd ../build/arm-trusted-firmware-lts-v2.10.5
    

    Since the kernel is much bigger than U-Boot, it takes longer to load. We need to adjust the SD card reading timeout. In drivers/st/mmc/stm32_sdmmc2.c, find the line

    timeout = timeout_init_us(TIMEOUT_US_1_S);
    

    and replace it with

    timeout = timeout_init_us(TIMEOUT_US_1_S * 5);
    

    Next, we would like to load the kernel deep enough into the memory space that relocation of the compressed image is not necessary (the zImage decompresses the kernel near the start of DDR, so if the compressed image itself sits at the DDR base, it first has to copy itself out of the way). In file plat/st/stm32mp1/stm32mp1_def.h, find the line

    #define STM32MP_BL33_BASE              STM32MP_DDR_BASE
    

    and replace it with

    #define STM32MP_BL33_BASE              (STM32MP_DDR_BASE + U(0x2008000))
    

    Finally, to allow a BL33 as large as the kernel image to be loaded, we adjust the maximum size. In the same file, find the line

    #define STM32MP_BL33_MAX_SIZE          U(0x400000)
    

    and replace it with

    #define STM32MP_BL33_MAX_SIZE          U(0x3FF8000)
    
  3. Next, we need to modify a couple of build parameters. Run make menuconfig and navigate to Bootloaders ---> ARM Trusted Firmware (ATF).

    • Under BL33, change from U-Boot to None.

    • Under Additional ATF build variables, make sure that U-Boot is not present and add the following key-value pairs:

      BL33=$(BINARIES_DIR)/zImage BL33_CFG=$(BINARIES_DIR)/stm32mp135f-dk.dtb
      

    Select “Ok” and “Esc” out of the menus, making sure to save the new configuration.

    Next, open the file board/stmicroelectronics/common/stm32mp1xx/genimage.cfg.template and increase the size of the fip partition, for example:

    partition fip {
    	image = "fip.bin"
    	size = 8M
    }
    

    Finally, since U-Boot will no longer be around to pass the Linux command line arguments, we can instead pass them through the device tree source. Open the file output/build/linux-6.12.22/arch/arm/boot/dts/st/stm32mp135f-dk.dts (you may have a different Linux version, just modify the path as appropriate) and add the bootargs into the chosen section, as follows:

    chosen {
    	stdout-path = "serial0:115200n8";
    	bootargs = "root=/dev/mmcblk0p4 rootwait";
    };
    
  4. Now we can rebuild the TF-A, the device tree blob, and regenerate the SD card image. Thanks to the magic of Buildroot, all it takes is:

    $ make linux-rebuild
    $ make arm-trusted-firmware-rebuild
    $ make
    

    Keep in mind that rebuilding TF-A is needed any time the Linux kernel or DTS or TF-A sources change, since the kernel gets packaged into the fip by the TF-A build process. In this case, the first make rebuilds the DTB, the second packages it in the fip, and the third makes sure it gets into the SD card.

  5. Set the DIP switches to serial boot (press in the upper side of all the rockers) and flash the SD card:

    $ sudo ~/cube/bin/STM32_Programmer_CLI -c port=usb1 -w output/images/flash.tsv
    

    Then reconfigure the DIP switches for SD card boot (press the bottom side of the second rocker switch from the left), and press the black reboot button.

If you watch the serial monitor carefully, you will notice that we transition from TF-A directly to OP-TEE and Linux. Success! No U-Boot in the boot process:

NOTICE:  Model: STMicroelectronics STM32MP135F-DK Discovery Board
NOTICE:  Board: MB1635 Var1.0 Rev.E-02
NOTICE:  BL2: v2.10.5(release):lts-v2.10.5
NOTICE:  BL2: Built : 20:58:52, Sep 10 2025
NOTICE:  BL2: Booting BL32
I/TC: Early console on UART#4
I/TC: 
I/TC: Embedded DTB found
I/TC: OP-TEE version: Unknown_4.3 (gcc version 14.3.0 (Buildroot 2025.08-rc3-87-gbbb0164de0)) #1 Thu Sep  4 03:06:46 UTC 2025 arm
...
(more OP-TEE messages here)
...
[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Linux version 6.12.22 (jk@Lutien) (arm-buildroot-linux-gnueabihf-gcc.br_real (Buildroot 2025.08-rc3-87-gbbb0164de0) 14.3.0, GNU ld (GNU Binutils) 2.43.1) #1 SMP PREEMPT Wed Sep  3 20:23:46 PDT 2025
[    0.000000] CPU: ARMv7 Processor [410fc075] revision 5 (ARMv7), cr=10c5387d

Buildroot integration

Instead of following the above instructions, we can automate the build process by integrating it into Buildroot. To this end, I provide the GitHub repository stm32mp135_simple that can be used as follows.

Clone the Buildroot repository. To make the procedure reproducible, let’s start from a fixed commit (latest at the time of this writing):

$ git clone https://gitlab.com/buildroot.org/buildroot.git
$ cd buildroot
$ git checkout 5b6b80bfc5237ab4f4e35c081fdac1376efdd396

Obtain the repository containing the patches we need, apply them, and copy the defconfig and the board-specific files into the Buildroot tree:

$ cd ..
$ git clone git@github.com:js216/stm32mp135_simple.git
$ cd buildroot # NOT stm32mp135_simple
$ git apply ../stm32mp135_simple/patches/add_falcon.patch
$ git apply ../stm32mp135_simple/patches/increase_fip.patch
$ cp ../stm32mp135_simple/configs/stm32mp135f_dk_falcon_defconfig configs
$ cp -r ../stm32mp135_simple/board/stm32mp135f-dk-falcon board/stmicroelectronics

Build as usual, but using the new defconfig:

$ make stm32mp135f_dk_falcon_defconfig
$ make

Flash to the SD card and boot into the new system. You should reach the login prompt exactly as in the default configuration—but without involving U-Boot.

Discussion

To port the “default” STM32MP135 setup[2] to a new board design, one is expected to be comfortable writing and modifying the drivers and device tree sources that work with each piece of the software stack: TF-A, OP-TEE, U-Boot, and the Linux kernel itself.

That is a tall order for a new embedded developer trying to get started integrating Linux in their products. To make things worse, there is at present almost no literature to be found suggesting that a simpler, saner method exists. Certainly the chip vendors themselves do not encourage it.[3]

With this article, we have begun chipping away at the unnecessary complexity. We have removed U-Boot from the boot chain. (We still use it for copying the SD card image via USB. One thing at a time!) Since our goal is to run Linux, the list above gives us a blueprint for the work that remains to be done: get rid of everything that is not Linux.

The software that you do not run is software you do not have to understand, test, debug, maintain, and be responsible for when it breaks down ten years down the line in some deeply embedded application, perhaps in outer space.

Upstreaming Status

19/12/2024: original Buildroot mailing list submission (1/1)

16/12/2025: response by Arnout Vandecappelle (link)

17/9/2025: amended submission (v2 0/2, 1/2, 2/2)

All Articles in This Series


  1. This approach is inspired by the ST wiki article How to optimize the boot time, under “Optimizing boot-time by removing U-Boot”. (cited 09/11/2025) ↩︎

  2. See the ST Wiki, OpenSTLinux distribution (cited 09/11/2025) ↩︎

  3. As per the ST forum, (cited 09/11/2025) the approach outlined in the present article is officially not supported by ST. ↩︎

Embedded

Bugs in NXP Kinetis Ethernet Driver

Published 10 Sep 2025. Written by Jakob Kastelic.

The SDK[1] drivers provided by NXP for use on the Kinetis K64 platform are extensive, well-tested and … not perfect. This article shows three bugs found in the ethernet driver. Note that none of this is original content; I merely put it together here for my future reference.

Forgetting to check for zero-length buffers

I have only seen this bug happen once in two years and have not found a way to reproduce it at will. So the analysis below may or may not be correct.

The symptom was that the firmware froze upon triggering the assertion in lwip/port/enet_ethernetif_kinetis.c:

“Buffer returned by ENET_GetRxFrame() doesn’t match any RX buffer descriptor”

After some Googling I found this forum thread, which suggests, in a roundabout way, that there is a missing check in fsl_enet.c. We have to add the following to ENET_GetRxFrame():

if (curBuffDescrip->length == 0U)
{
    /* Set LAST bit manually to let following drop error frame
       operation drop this abnormal BD.
    */
    curBuffDescrip->control |= ENET_BUFFDESCRIPTOR_RX_LAST_MASK;
    result = kStatus_ENET_RxFrameError;
    break;
}

The NXP engineer on the forum explains: “I didn’t use this logic because I never meet this corner case and consider it a redundant operation.” I was curious whether this “corner case” ever happens, so I added a breakpoint, which got triggered after about two days of constant testing.

ChatGPT seems to think this check is necessary (but then again, I seem to be able to convince it of just about anything I do or do not believe in):

If you omit the check and DMA ever delivers a BD with length == 0: Your code will think it’s still in the middle of assembling a frame. It will not see the LAST bit yet, so it will happily advance to the next BD. That means the logic walks into an inconsistent state: rxBuffer may point to nothing, your rxFrame bookkeeping goes out of sync, and later you’ll crash on a buffer underrun, invalid pointer, or corrupted frame queue.

It remains to be seen whether the missing check was behind my original crash, and whether the body of the if statement is the appropriate way to handle an unexpected zero-length buffer descriptor.

Credit: User pjanco first reported the error, while AbnerWang posted the solution. [source]

Incorrect memory deallocation

In fsl_enet.c, the function ENET_GetRxFrame() tries to free the receive buffers by passing their pointers to a callback:

while (index-- != 0U)
{
    handle->rxBuffFree(base, &rxFrame->rxBuffArray[index].buffer,
        handle->userData, ringId);
}

We first need to unpack some definitions to understand what the above means.

  1. If we dig into the rxBuffFree() function, we discover it in the file lwip/port/enet_ethernetif_kinetis.c. The buffer to be deallocated is passed in as a void *buffer pointer and released as follows:

    int idx = ((rx_buffer_t *)buffer) - ethernetif->RxDataBuff;
    ethernetif->RxPbufs[idx].buffer_used = false;
    
  2. Next, what are rxFrame and rxBuffArray? The first one is of type enet_rx_frame_struct_t, which is defined in fsl_enet.h:

    typedef struct _enet_rx_frame_struct
    {
        enet_buffer_struct_t *rxBuffArray;
        ...
    } enet_rx_frame_struct_t;
    

    This allows us to see the type of the elements of rxBuffArray:

    typedef struct _enet_buffer_struct
    {
        void *buffer;
        uint16_t length;
    } enet_buffer_struct_t;
    
  3. Finally, what is ethernetif->RxDataBuff? We find it declared in lwip/port/enet_ethernetif_kinetis.c as the static array in the function ethernetif0_init():

    SDK_ALIGN(static rx_buffer_t rxDataBuff_0[ENET_RXBUFF_NUM],
        FSL_ENET_BUFF_ALIGNMENT);
    ethernetif_0.RxDataBuff = &(rxDataBuff_0[0]);
    

    More precisely, RxDataBuff is a pointer to the first element of this array. This pointer therefore has the type rx_buffer_t*.

    That type itself is declared at the top of the same file as an aligned version of a uint8_t buffer:

    typedef uint8_t rx_buffer_t[SDK_SIZEALIGN(ENET_RXBUFF_SIZE,
        FSL_ENET_BUFF_ALIGNMENT)];
    

Now we can take a step back and ask whether the idx calculation should be handed the buffer itself, or a pointer to it. The calculation subtracts ethernetif->RxDataBuff, a pointer of type rx_buffer_t *, from buffer cast to rx_buffer_t *; for the resulting index to mean anything, buffer must point into the RxDataBuff array, i.e., it must be the data buffer itself. The call in ENET_GetRxFrame(), however, passes &rxFrame->rxBuffArray[index].buffer, which is the address of the .buffer pointer field inside the enet_buffer_struct_t, not the buffer it points to.

The corrected code should pass the buffer pointer stored in .buffer, not the address of the .buffer field (omit the &):

handle->rxBuffFree(base, rxFrame->rxBuffArray[index].buffer,
    handle->userData, ringId);
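
To see why the & matters, here is a stand-alone sketch with simplified stand-ins for the SDK types (names and sizes are made up, not the real SDK definitions): only the stored data pointer lies inside the RxDataBuff array, so only it yields a meaningful index:

#include <stdio.h>

typedef unsigned char rx_buffer_t[1536];     /* stand-in for the aligned RX data buffer */

typedef struct {
    void *buffer;                            /* stand-in for enet_buffer_struct_t */
    unsigned short length;
} buf_entry_t;

int main(void)
{
    static rx_buffer_t rx_data_buff[4];      /* stand-in for ethernetif->RxDataBuff */
    buf_entry_t elem = { rx_data_buff[2], 0 };

    /* Correct: elem.buffer points into rx_data_buff, so the subtraction
       recovers the buffer index (2 here). */
    int good = (int)((rx_buffer_t *)elem.buffer - rx_data_buff);

    /* Wrong: &elem.buffer is the address of the pointer field itself,
       nowhere near rx_data_buff, so the resulting "index" is garbage. */
    int bad = (int)((rx_buffer_t *)&elem.buffer - rx_data_buff);

    printf("good = %d, bad = %d\n", good, bad);
    return 0;
}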

Credit: This bug was found by KC on 7/31/2024.

Buffers not zero-initialized

Another bug in ethernetif0_init() in enet_ethernetif_kinetis.c: the Ethernet buffer descriptor arrays are declared static:

AT_NONCACHEABLE_SECTION_ALIGN(
    static enet_rx_bd_struct_t rxBuffDescrip_0[ENET_RXBD_NUM],
    FSL_ENET_BUFF_ALIGNMENT);
AT_NONCACHEABLE_SECTION_ALIGN(
    static enet_tx_bd_struct_t txBuffDescrip_0[ENET_TXBD_NUM],
    FSL_ENET_BUFF_ALIGNMENT);

The assumption is that since they are declared static, the descriptors will be zero-initialized at system startup. However, the macro AT_NONCACHEABLE_SECTION_ALIGN potentially places these descriptors in a special section that can bypass zero-initialization, depending on the startup code and linker script.

In that case, we need to manually zero out these buffers. I put the following at the top of ethernetif_enet_init() in enet_ethernetif_kinetis.c:

// Buffer descriptors must be initialized to zero
memset(&ethernetif->RxBuffDescrip[0], 0x00,
    ENET_RXBD_NUM * sizeof(ethernetif->RxBuffDescrip[0]));
memset(&ethernetif->TxBuffDescrip[0], 0x00,
    ENET_TXBD_NUM * sizeof(ethernetif->TxBuffDescrip[0]));

Credit: This bug was also found by KC.


  1. I am using SDK version 2.11.0 for the MK64FN1M0xxx12. ↩︎