Accelerated Access

More Efficient Queries

Looking up data objects or functions by name: When the tracee stops during debugging, the debugger often needs to find the debug information for a data object or function given its symbol name. That information may live in the current compilation unit or in any other one. Sometimes the debugger knows only the name of a program construct (a variable, function, type, and so on), and sometimes it knows only an address. If the only way to resolve a name were to walk the DWARF Debug Information Entries (DIEs), the debugger would have to traverse every DIE in every compilation unit, which is very time-consuming.

Looking up types by name: In some programming languages, such as C++, a given type name always refers to the same concrete type, so the compiler may choose to eliminate duplicate type definitions across compilation units. The debugger therefore needs an efficient way to locate a type definition by name. As with global data objects, doing this without an index means searching the type-related DIEs in every compilation unit of the program.

Looking up by address: To find the debug information for the subroutine at a given address, the debugger can use the low and high PC attributes (DW_AT_low_pc and DW_AT_high_pc) of each compilation unit entry to narrow the search. However, these attributes only cover the range of code addresses associated with a compilation unit entry, so finding the debug information for a data object by address still requires an exhaustive search. In addition, searching for entries across many compilation units in a large program may touch many memory pages, which can significantly degrade the debugger's performance.

To look up program entities (data objects, functions, and types) more efficiently by name and by address, a DWARF producer can emit three additional tables: .debug_pubnames, .debug_pubtypes, and .debug_aranges. Each table summarizes the debugging information entries owned by each compilation unit entry, in a format more compact than .debug_info itself.

Lookup by Name

To support efficient lookup by name, DWARF defines two additional tables: .debug_pubnames and .debug_pubtypes. The .debug_pubnames table describes the names of global objects and functions, while .debug_pubtypes describes global types. Both are essentially mappings from names to the locations of specific DIEs (Debug Information Entries).

In principle, a debugger could build the same name-to-DIE mapping itself by parsing every DIE in .debug_info up front. The .debug_pubnames and .debug_pubtypes sections are still useful for several reasons:

  1. Avoid duplicate work - The compiler has already generated these tables, so debuggers don't need to repeat this time-consuming process
  2. Memory efficiency - These tables use a more compact format, saving space compared to fully parsing .debug_info and maintaining mappings in memory
  3. On-demand loading - Debuggers can load only the relevant parts of these tables as needed, without having to load and parse all of .debug_info at once
  4. Standardization - They define a common format, so different debuggers don't each need to implement their own indexing mechanism

These sections are therefore optional optimizations that help debuggers balance performance against resource consumption. When they are present, they can make name-based lookups during debugging much faster.

The .debug_pubnames and .debug_pubtypes sections share the same layout. Each compilation unit in the program has a corresponding set (unit) in the section, and each set contains:

  1. Header Information
    • unit_length: Total length of the unit (excluding the length field itself)
    • version: Version number (2). This number is specific to the lookup table and independent of the DWARF version
    • debug_info_offset: Offset of the corresponding compilation unit in .debug_info
    • debug_info_length: Length of the corresponding compilation unit in .debug_info
  2. Name Entry list, each entry contains:
    • offset: Offset of the DIE, relative to the beginning of the compilation unit header
    • name: Null-terminated string holding the name of the global object or function (in .debug_pubtypes, the name of the type)
  3. End marker
    • An offset of 0 indicates the end of the name entry list for that compilation unit
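As a rough illustration of this layout, the Python sketch below decodes every set in a raw .debug_pubnames (or .debug_pubtypes) section. It assumes 32-bit DWARF (4-byte length and offset fields) and little-endian byte order. The function name parse_pubnames is hypothetical, and the sketch rejects 64-bit DWARF instead of handling it.

```python
import struct

def parse_pubnames(data: bytes):
    """Decode every set in a .debug_pubnames / .debug_pubtypes section.

    Sketch assumptions: 32-bit DWARF format, little-endian byte order.
    Returns a list of (debug_info_offset, [(die_offset, name), ...]).
    """
    sets = []
    pos = 0
    while pos < len(data):
        (unit_length,) = struct.unpack_from("<I", data, pos)
        if unit_length >= 0xFFFFFFF0:
            # 0xFFFFFFFF marks 64-bit DWARF; other values here are reserved.
            raise ValueError("64-bit DWARF is not handled in this sketch")
        end = pos + 4 + unit_length  # unit_length excludes its own 4 bytes
        # Header: version (2 bytes), debug_info_offset (4), debug_info_length (4)
        _version, info_off, _info_len = struct.unpack_from("<HII", data, pos + 4)
        cur = pos + 14  # first entry follows the 14-byte header
        entries = []
        while cur < end:
            (die_off,) = struct.unpack_from("<I", data, cur)
            cur += 4
            if die_off == 0:  # an offset of 0 terminates the entry list
                break
            nul = data.index(b"\0", cur)  # names are null-terminated
            entries.append((die_off, data[cur:nul].decode("utf-8")))
            cur = nul + 1
        sets.append((info_off, entries))
        pos = end
    return sets
```

Each returned pair keeps debug_info_offset next to its entries, which is all a debugger needs to jump from a name to a DIE.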

This layout lets a debugger go straight from a name to its DIE. Adding the set's debug_info_offset to the entry's offset gives the DIE's offset within .debug_info, so the debugger can load just that compilation unit instead of parsing every DIE from the beginning.
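The offset arithmetic can be sketched as follows. The sketch assumes the sets have already been decoded into a hypothetical shape of (debug_info_offset, [(die_offset, name), ...]) tuples. build_name_index is an illustrative name and is not part of any library. Nothing in the format forces a name to be unique across the table, so each name maps to a list of .debug_info offsets.

```python
from collections import defaultdict

def build_name_index(sets):
    """Map each name to the absolute .debug_info offsets of its DIEs.

    `sets` is a hypothetical pre-decoded shape:
    [(debug_info_offset, [(die_offset, name), ...]), ...]
    """
    index = defaultdict(list)
    for cu_offset, entries in sets:
        for die_offset, name in entries:
            # die_offset is relative to the start of the CU header, so adding
            # the CU's section offset yields an offset into .debug_info.
            index[name].append(cu_offset + die_offset)
    return dict(index)
```

For example, a name found at offset 0x50 in the set whose debug_info_offset is 0x100 resolves to .debug_info offset 0x150.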

Lookup by Address

To support efficient lookup by address, DWARF defines a dedicated lookup table in the .debug_aranges section. The table consists of sets of variable-length entries, one set per compilation unit, and each set records the address ranges that the unit occupies in the program's address space. Because different compilation units normally occupy non-overlapping address ranges, the table can quickly identify the compilation unit that contains a given address.

Although debuggers could also implement similar functionality by pre-analyzing all compilation unit DIEs and building their own address range index, the .debug_aranges section still has its value:

  1. It provides a standardized, optimized data structure, avoiding the need for each debugger to implement its own indexing mechanism
  2. For large programs, pre-loading and analyzing all compilation unit DIEs would consume a lot of memory and time, while .debug_aranges can be loaded on demand
  3. The compiler can apply specific optimizations when generating this table to make it more compact and efficient

Therefore, the .debug_aranges section, as part of the DWARF standard, provides debuggers with an optional but valuable performance optimization mechanism.

The data organization of the .debug_aranges section is as follows. Each compilation unit in the program has a corresponding unit in .debug_aranges, and each unit includes:

  1. Header Information
    • unit_length: Total length of the unit (excluding the length field itself)
    • version: Version number (2)
    • debug_info_offset: Offset of the corresponding compilation unit in .debug_info
    • address_size: Size of an address on the target machine (in bytes)
    • segment_size: Size of the segment selector (in bytes)
  2. Alignment padding
    • Padding bytes after the header, so that the first descriptor begins at an offset that is a multiple of the descriptor size (segment_size + 2 * address_size bytes, which is 2 * address_size when segment_size is 0)
  3. Address Range Descriptor list, each descriptor contains:
    • segment: Segment selector (if segment_size is non-zero)
    • address: Starting address of the range
    • length: Length of the range (non-zero)
    Each descriptor covers code or data belonging to some entry owned by the compilation unit. Descriptors are typically sorted by starting address and do not overlap. A descriptor whose fields are all zero (both address and length are 0) terminates the list.
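A minimal decoder for this layout might look like the sketch below. It assumes 32-bit DWARF, little-endian byte order, and segment_size == 0, and it measures the alignment padding from the start of each set. parse_aranges is a hypothetical name.

```python
import struct

def parse_aranges(data: bytes):
    """Decode every set in a .debug_aranges section.

    Sketch assumptions: 32-bit DWARF, little-endian, segment_size == 0.
    Returns a list of (debug_info_offset, [(address, length), ...]).
    """
    sets = []
    pos = 0
    while pos < len(data):
        # Header: unit_length (4), version (2), debug_info_offset (4),
        # address_size (1), segment_size (1) -> 12 bytes in total.
        unit_length, _version, info_off, addr_size, seg_size = struct.unpack_from(
            "<IHIBB", data, pos)
        if seg_size != 0:
            raise ValueError("segmented addresses are not handled in this sketch")
        end = pos + 4 + unit_length  # unit_length excludes its own 4 bytes
        addr_fmt = {4: "<I", 8: "<Q"}[addr_size]
        tuple_size = 2 * addr_size
        # Skip padding: round the 12-byte header up to a multiple of tuple_size.
        cur = pos + -(-12 // tuple_size) * tuple_size
        ranges = []
        while cur + tuple_size <= end:
            (address,) = struct.unpack_from(addr_fmt, data, cur)
            (length,) = struct.unpack_from(addr_fmt, data, cur + addr_size)
            cur += tuple_size
            if address == 0 and length == 0:  # all-zero terminating descriptor
                break
            ranges.append((address, length))
        sets.append((info_off, ranges))
        pos = end
    return sets
```

With 8-byte addresses the descriptor size is 16, so the 12-byte header is followed by 4 bytes of padding before the first descriptor.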

With this layout, a debugger can collect the descriptors from all units, sort them by starting address, and binary-search for the range that contains a given address. This locates the owning compilation unit without parsing DIEs from the beginning, and only that unit then needs to be loaded.
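The lookup step can be sketched with Python's bisect module. The input shape [(debug_info_offset, [(address, length), ...]), ...] and the AddressIndex class are hypothetical, and the sketch assumes that ranges from different units do not overlap.

```python
import bisect

class AddressIndex:
    """Map addresses to compilation units by binary search over ranges.

    `sets` is a hypothetical pre-decoded shape:
    [(debug_info_offset, [(address, length), ...]), ...]
    Assumes ranges do not overlap.
    """

    def __init__(self, sets):
        # Flatten all units' descriptors into (start, end, cu_offset) and sort.
        self._ranges = sorted(
            (addr, addr + length, cu_off)
            for cu_off, descs in sets
            for addr, length in descs
        )
        self._starts = [start for start, _, _ in self._ranges]

    def lookup(self, address):
        """Return the .debug_info offset of the CU covering `address`, or None."""
        # Index of the rightmost range whose start is <= address.
        i = bisect.bisect_right(self._starts, address) - 1
        if i >= 0:
            start, end, cu_off = self._ranges[i]
            if start <= address < end:  # ranges are half-open [start, end)
                return cu_off
        return None
```

Sorting once up front costs O(n log n). After that, each lookup is O(log n) no matter how many compilation units the program has.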

Summary

The auxiliary lookup tables described in DWARF v4 are optional. The compilation toolchain is not required to generate them, and a debugger is not required to read them. In DWARF v5, .debug_pubnames and .debug_pubtypes were replaced by a single name index, .debug_names. In practice these tables may be absent, and even when they are present a debugger may ignore them, but it is worth knowing that they exist.
