Variable-Length Data

Throughout DWARF, there is a large amount of information represented using integer values, from data segment offsets to array or structure sizes, and so on. Since most integer values are small numbers that can be represented with just a few bits, this means the data consists mainly of zeros, and the corresponding bits are effectively wasted.

DWARF defines a variable-length integer encoding scheme called Little Endian Base 128 (LEB128), which can compress the actual number of bytes used and reduce the size of the encoded data. LEB128 has two variants:

ULEB128: Used for encoding unsigned integers
SLEB128: Used for encoding signed integers

ULEB128 Encoding Scheme

UELB128 encoding algorithm:

MSB ------------------ LSB
      10011000011101100101  In raw binary
     010011000011101100101  Padded to a multiple of 7 bits
 0100110  0001110  1100101  Split into 7-bit groups
00100110 10001110 11100101  Add high 1 bits on all but last (most significant) group to form bytes
    0x26     0x8E     0xE5  In hexadecimal

→ 0xE5 0x8E 0x26            Output stream (LSB to MSB)

To summarize in plain language:

Convert the number to binary representation, with these bytes arranged in little-endian order (i.e., least significant byte first)
Split the integer into groups of 7 bits
Store each 7-bit group in a byte, using the highest bit (8th bit) as a flag:
- If there are more groups to follow, set this bit to 1
- If this is the last group, set this bit to 0

For example, the ULEB128 encoding of the number Uint64 624485 is:

624485 = 0x98765

Hexadecimal number, where each digit is represented by 4 binary bits, converted to binary representation:

1001 1000 0111 0110 0101

Considering splitting into groups of 7 bits, first pad to a multiple of 7 bits:

0 1001 1000 0111 0110 0101

Then split into groups of 7 bits, note that it's little-endian byte order, so split from the right:

0 1001 10 / 00 0111 0 / 110 0101

Group 1: 110 0101, because there are more to follow, set the 8th bit to 1, final result is 1110 0101 = 0xe5
Group 2: 00 0111 0, 000 1110, set the 8th bit to 1, final result is 1000 1110 = 0x8e
Group 3: 0 1001 10, 010 0110, set the 8th bit to 0, final result is 0010 0110 = 0x26

The encoded byte sequence is: []byte{0xe5 0x8e 0x26}, using only 3 bytes in total, while using the original data type uint64 would require 8 bytes.

SLEB128 Encoding Scheme

The SLEB128 encoding rules are similar, but when handling negative numbers, sign bit extension needs to be considered. There is no difference when handling positive numbers.

SLEB128 negative number encoding algorithm:

MSB ------------------ LSB
         11110001001000000  Binary encoding of 123456
     000011110001001000000  As a 21-bit number
     111100001110110111111  Negating all bits (ones' complement)
     111100001110111000000  Adding one (two's complement)
 1111000  0111011  1000000  Split into 7-bit groups
01111000 10111011 11000000  Add high 1 bits on all but last (most significant) group to form bytes
    0x78     0xBB     0xC0  In hexadecimal

In plain language:

If it's a negative number, first convert to two's complement representation
Then split into groups of 7 bits, still following little-endian order
Set the 8th bit to 1 for all groups except the last one
Get the final result

Summary

This article introduced the ULEB128 and SLEB128 encoding schemes that are storage-friendly for small integers, which can save storage space. For positive numbers, ULEB128 and SLEB128 encoding results are the same, but for negative numbers, SLEB128 needs to convert to its two's complement before encoding. Through this encoding method, DWARF can effectively compress integer data and reduce the size of debugging information. In practical applications, most integer values can be represented using 1-2 bytes, saving significant space compared to fixed 4-byte or 8-byte storage methods.

8.5.5 Variable Length Data

Variable-Length Data

ULEB128 Encoding Scheme

SLEB128 Encoding Scheme

Summary

results matching ""

No results matching ""