The Mathematics of Optimal Struct Field Ordering

If you’ve spent time writing Go (or C/C++), you’ve probably come across a common piece of advice: “Arrange struct fields from largest to smallest to save memory.” Does this advice actually work? If so, why?

It’s easy to test this advice with a simple example:

```go
package main

import (
    "fmt"
    "unsafe"
)

type InefficientOrder struct {
    A bool  // 1 byte
    B int16 // 2 bytes
    C int32 // 4 bytes
    D int64 // 8 bytes
    E bool  // 1 byte
}

type EfficientOrder struct {
    D int64 // 8 bytes
    C int32 // 4 bytes
    B int16 // 2 bytes
    A bool  // 1 byte
    E bool  // 1 byte
}

func main() {
    fmt.Printf("Size of InefficientOrder struct: %d bytes\n", unsafe.Sizeof(InefficientOrder{}))
    fmt.Printf("Size of EfficientOrder struct: %d bytes\n", unsafe.Sizeof(EfficientOrder{}))
}
```

When you run this code on a typical 64-bit platform, you’ll see:

Size of InefficientOrder struct: 24 bytes
Size of EfficientOrder struct: 16 bytes

A 33% saving in memory - just by reordering fields! So the advice definitely works in this case. But how can we be certain that placing fields in decreasing order of their sizes (actually, alignment requirements) is the optimal way to pack fields all the time?
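To see where the extra bytes in `InefficientOrder` go, it can help to print each field's offset. The sketch below uses `unsafe.Offsetof` and `unsafe.Sizeof` from the standard library; the helper names `offsets` and `paddingInefficient` are made up for this illustration, and the numbers assume a typical 64-bit platform where `int64` is 8-byte aligned.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Same field order as InefficientOrder in the example above.
type InefficientOrder struct {
	A bool
	B int16
	C int32
	D int64
	E bool
}

// offsets returns the byte offset of each field, in declaration order.
func offsets() []uintptr {
	var v InefficientOrder
	return []uintptr{
		unsafe.Offsetof(v.A),
		unsafe.Offsetof(v.B),
		unsafe.Offsetof(v.C),
		unsafe.Offsetof(v.D),
		unsafe.Offsetof(v.E),
	}
}

// paddingInefficient returns the struct size minus the sum of the field
// sizes, i.e. the number of bytes spent on padding.
func paddingInefficient() uintptr {
	var v InefficientOrder
	fieldBytes := unsafe.Sizeof(v.A) + unsafe.Sizeof(v.B) + unsafe.Sizeof(v.C) +
		unsafe.Sizeof(v.D) + unsafe.Sizeof(v.E)
	return unsafe.Sizeof(v) - fieldBytes
}

func main() {
	fmt.Println("field offsets (A..E):", offsets())
	fmt.Println("padding bytes:", paddingInefficient())
}
```

On such a platform there is one padding byte between `A` and `B`, and seven more after `E` so that the total size stays a multiple of the largest alignment.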

We’ll turn to mathematics for the answer. While there are several ways to approach this proof, the following less ‘pure’ solution is the one that clicked for me. It has two steps:

Prove Adjacent Pair Optimality: Show that swapping any two adjacent fields, where the one with the smaller alignment comes first, never increases padding and may reduce it.
Apply Adjacent Pair Optimisation Repeatedly: Show that repeatedly applying this swap leads to an optimally packed struct.

Step 1: Prove Adjacent Pair Optimality

We’re trying to prove that:

If any two adjacent fields $A$ and $B$ with alignment requirements $a_A$ and $a_B$ are out of order ($a_A \lt a_B$), then swapping them reduces, or at least does not increase, the total padding.

Proof

First, let’s introduce some notation and definitions:

$M$ is the arbitrary offset in memory from which the two fields are ‘laid out’.
$a_x$ is the alignment requirement of field $x$. Typically this equals the size $s_x$ of the field, but for our proof we’ll consider them as potentially different.
$X$ and $Y$ represent offsets in memory: $X$ for the order [A, B] and $Y$ for the order [B, A].
$round(X,a)$ is the function that returns the next offset value (including $X$ itself) at which a field with alignment requirement $a$ can be placed.

Mathematically, this is $round(X,a) = X + (a - (X \mod a)) \mod a$. In other words, it is the smallest number greater than or equal to $X$ that is a multiple of $a$.
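The $round$ function translates almost directly into code. The following is a minimal sketch; the names `roundUp` and `roundUpPow2` are chosen here for illustration. The second variant uses bit masking and is only valid under the assumption that the alignment is a power of two.

```go
package main

import "fmt"

// roundUp returns the smallest multiple of a that is >= x.
// It mirrors round(X, a) from the text; a must be positive.
func roundUp(x, a uint64) uint64 {
	return x + (a-x%a)%a
}

// roundUpPow2 computes the same value with bit masking.
// It is only valid when a is a power of two.
func roundUpPow2(x, a uint64) uint64 {
	return (x + a - 1) &^ (a - 1)
}

func main() {
	for _, x := range []uint64{0, 1, 5, 8, 13} {
		fmt.Printf("round(%d, 4) = %d, round(%d, 8) = %d\n",
			x, roundUp(x, 4), x, roundUp(x, 8))
	}
}
```

For example, `roundUp(5, 4)` is 8, while `roundUp(8, 4)` stays at 8 because 8 is already a multiple of 4.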

Now let’s consider the two possible orderings, [A, B] and [B, A] (remember, $a_A \lt a_B$).

Case 1: [A, B]

Field $A$ is placed at $X_A = round(M, a_A)$

Field $B$ is placed after A, but since its offset needs to satisfy the alignment requirement $a_B$ , its actual offset is:

$$X_B = round(X_A + s_A, a_B) = X_A + s_A + \Delta_{B}$$

where

$$\Delta_{B} = (a_B - ((X_A + s_A) \mod a_B)) \mod a_B$$

Note that $0 \leq \Delta_{B} \lt a_B$.

The total end offset is

$$T_{AB} = X_B + s_B = X_A + s_A + \Delta_{B} + s_B$$

Case 2: [B, A]

Field $B$ is now at $Y_B = round(M, a_B)$

Field $A$ is placed after $B$ satisfying its alignment requirement $a_A$ , so it should be placed at:

$$Y_A = round(Y_B + s_B, a_A) = Y_B + s_B + \Delta_{A}$$

where

$$\Delta_{A} = (a_A - ((Y_B + s_B) \mod a_A)) \mod a_A$$

Here $0 \leq \Delta_{A} \lt a_A$.

The total end offset is

$$T_{BA} = Y_A + s_A = Y_B + s_B + \Delta_{A} + s_A$$
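Plugging concrete numbers into the two cases makes the end offsets easier to picture. The sketch below evaluates $T_{AB}$ and $T_{BA}$ for a single assumed setup: $M = 0$, a field $A$ with size and alignment 1 (like a `bool`), and a field $B$ with size and alignment 8 (like an `int64` on a 64-bit platform). The `field` type and the `endOffset` helper are hypothetical names used only here, and one example illustrates the formulas rather than proving anything.

```go
package main

import "fmt"

// field holds a size and an alignment requirement, both in bytes.
type field struct {
	size, align uint64
}

// roundUp returns the smallest multiple of a that is >= x.
func roundUp(x, a uint64) uint64 {
	return x + (a-x%a)%a
}

// endOffset lays out first and then second, starting at offset m,
// and returns the offset just past the second field.
func endOffset(m uint64, first, second field) uint64 {
	start1 := roundUp(m, first.align)
	start2 := roundUp(start1+first.size, second.align)
	return start2 + second.size
}

func main() {
	a := field{size: 1, align: 1} // assumed: like a bool
	b := field{size: 8, align: 8} // assumed: like an int64
	const m = 0                   // assumed starting offset

	fmt.Println("T_AB =", endOffset(m, a, b)) // A at 0, B at 8, ends at 16
	fmt.Println("T_BA =", endOffset(m, b, a)) // B at 0, A at 8, ends at 9
}
```

With these values, placing $A$ first forces 7 bytes of padding before $B$, while placing $B$ first needs none.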

Comparing the two orders

$$T_{AB} - T_{BA} = (X_A - Y_B) + (\Delta_{B} - \Delta_{A})$$

At this point, we can’t draw a general conclusion about the sign of this difference. All we know is that $\Delta_B \lt a_B$ and $\Delta_A \lt a_A$.

However, we can leverage a key insight: alignment requirements are powers of 2. This implies that:

If $a_A < a_B$ , then $a_B$ is actually a multiple of $a_A$ . Specifically, $a_B = k \cdot a_A$ for some integer $k > 1$ .
An address aligned to $a_B$ is automatically aligned to $a_A$ as well. (Because if a number is divisible by 8, it’s also divisible by 4, 2, and 1.)
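Both observations are easy to check in code. The sketch below prints the alignments Go reports for the field types used earlier via `unsafe.Alignof`, and confirms that every multiple of 8 in a small range is also a multiple of 4, 2 and 1. The helpers `isPowerOfTwo` and `alignedTo` are illustrative names, and the exact alignment values printed depend on the platform.

```go
package main

import (
	"fmt"
	"unsafe"
)

// isPowerOfTwo reports whether n is a power of two (1, 2, 4, 8, ...).
func isPowerOfTwo(n uintptr) bool {
	return n != 0 && n&(n-1) == 0
}

// alignedTo reports whether addr is a multiple of align.
func alignedTo(addr, align uintptr) bool {
	return addr%align == 0
}

func main() {
	var (
		b   bool
		i16 int16
		i32 int32
		i64 int64
	)
	aligns := []uintptr{
		unsafe.Alignof(b),
		unsafe.Alignof(i16),
		unsafe.Alignof(i32),
		unsafe.Alignof(i64),
	}
	for _, a := range aligns {
		fmt.Printf("alignment %d, power of two: %v\n", a, isPowerOfTwo(a))
	}

	// Every address aligned to 8 is also aligned to 4, 2 and 1.
	for addr := uintptr(0); addr <= 64; addr += 8 {
		fmt.Println(addr, alignedTo(addr, 4), alignedTo(addr, 2), alignedTo(addr, 1))
	}
}
```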

Using these insights:

$Y_B \geq X_A$:

Since $a_B$ is a multiple of $a_A$, any address that is a multiple of $a_B$ is also a multiple of $a_A$. This means $Y_B$ is a multiple of $a_A$ that lies at or beyond $M$. Because $X_A$ is the smallest such multiple, $Y_B$ can never be closer to $M$ than $X_A$ is, though it might be farther away.

The worst-case padding in Case 1 is larger than the worst-case padding in Case 2:

The maximum possible value of $\Delta_{B}$ is $(a_B - 1)$, which occurs when $(X_A + s_A)$ is just one byte past a multiple of $a_B$. The maximum possible value of $\Delta_{A}$ is $(a_A - 1)$, which occurs when $(Y_B + s_B)$ is just one byte past a multiple of $a_A$.

Now, for power-of-2 alignments where $a_B = k \cdot a_A$ :

The worst-case increase in starting position $(Y_B - X_A)$ is $(a_B - a_A)$
The worst-case reduction in padding is $(a_B - 1)$

Since $(a_B - 1) \geq (a_B - a_A)$ for all $a_A \geq 1$ , we can generally expect:

$$(\Delta_{B} - \Delta_{A}) \geq (Y_B - X_A)$$

which means

$$T_{AB} - T_{BA} = (X_A - Y_B) + (\Delta_{B} - \Delta_{A}) \geq 0$$

or equivalently, $T_{BA} \leq T_{AB}$. In other words, swapping the two fields (when they are out of order) does not worsen, and typically improves, the overall packing.

Step 2: Apply Adjacent Pair Optimisation Repeatedly

Step 1 gives us something powerful: swapping any adjacent pair of fields where a smaller alignment comes before a larger alignment will either improve memory usage or leave it unchanged. It never makes things worse.

So what happens if we want to optimally pack all fields in a struct? We simply need to apply this adjacent pair optimisation repeatedly across all fields. For each pair that’s out of order (smaller alignment before larger), we swap them.

If that repeated pair comparison and swapping reminds you of bubble sort - well, you are right.

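Just as bubble sort guarantees a sorted array, our repeated swapping guarantees we’ll end up with struct fields arranged in descending (non-increasing) order of alignment requirements. And since each swap either reduces the memory footprint or keeps it the same (never increases it), no other arrangement can be smaller than this final one. As a sanity check: when every field’s size is a multiple of its alignment, as with the integer and bool fields above, the descending order leaves no gaps between fields, so any remaining padding sits at the end of the struct. This is why placing the fields with larger alignment requirements first always gives a most memory-efficient ordering.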
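For a small struct, this conclusion can be checked by brute force. The sketch below simulates the layout rules from Step 1, sorts the fields by descending alignment, and compares the result against the best size found over every possible ordering. It is a toy model, not how the Go compiler lays out structs: the `field` type and the functions `structSize`, `sortedByAlign`, `minSizeOverPermutations` and `exampleFields` are made up for this illustration, and the sizes and alignments are the ones assumed for `InefficientOrder` on a 64-bit platform.

```go
package main

import (
	"fmt"
	"sort"
)

// field is a simplified model of a struct field.
type field struct {
	name        string
	size, align uint64
}

func roundUp(x, a uint64) uint64 { return x + (a-x%a)%a }

// structSize lays the fields out in the given order, starting at offset 0,
// and returns the total size including tail padding to the largest alignment.
func structSize(fields []field) uint64 {
	var offset, maxAlign uint64 = 0, 1
	for _, f := range fields {
		offset = roundUp(offset, f.align) + f.size
		if f.align > maxAlign {
			maxAlign = f.align
		}
	}
	return roundUp(offset, maxAlign)
}

// sortedByAlign returns a copy of fields in descending order of alignment.
func sortedByAlign(fields []field) []field {
	out := append([]field(nil), fields...)
	sort.SliceStable(out, func(i, j int) bool { return out[i].align > out[j].align })
	return out
}

// minSizeOverPermutations tries every ordering and returns the smallest size.
func minSizeOverPermutations(fields []field) uint64 {
	perm := append([]field(nil), fields...)
	best := structSize(perm)
	var rec func(k int)
	rec = func(k int) {
		if k == len(perm) {
			if s := structSize(perm); s < best {
				best = s
			}
			return
		}
		for i := k; i < len(perm); i++ {
			perm[k], perm[i] = perm[i], perm[k]
			rec(k + 1)
			perm[k], perm[i] = perm[i], perm[k]
		}
	}
	rec(0)
	return best
}

// exampleFields mirrors InefficientOrder (assumed 64-bit sizes and alignments).
func exampleFields() []field {
	return []field{
		{"A", 1, 1}, {"B", 2, 2}, {"C", 4, 4}, {"D", 8, 8}, {"E", 1, 1},
	}
}

func main() {
	fields := exampleFields()
	fmt.Println("declared order:", structSize(fields))
	fmt.Println("sorted by alignment:", structSize(sortedByAlign(fields)))
	fmt.Println("best over all orderings:", minSizeOverPermutations(fields))
}
```

Under these assumptions the declared order comes out at 24 bytes, while both the sorted order and the exhaustive search come out at 16 bytes, matching the `unsafe.Sizeof` results from the start of the article.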

Remember, premature optimisation is evil - or in this case, can lead to less readable code, so apply this technique judiciously where memory efficiency genuinely matters for your application’s performance characteristics.

Abhilash Meesala