Improved debugging of a Rust program with GDB

spannerAndScrewdriver.jpg

Introduction

Over the last couple of years, I've been using Rust for my work and my pet projects. It's inevitable to come across bugs and luckily there are some tools for debugging. GDB started supporting Rust a fair bit of time ago. In addition, the Rust Foundation maintains the plugin rust-gdb. However, there is something to improve. GDB supports a scripting language and Python API. Using them I've been creating tools from project to project over the time. After a few projects, I noticed that I copy some of the tools. Those ones eventually shaped the project RustToolsGDB. The article shows how I came to each of the tool from the project.

Project structure

There are 3 kinds of the tools:

  1. Commands
  2. Convenience functions
  3. Frame filters

GDB allows a user to define a command with the internal scripting language. Convenience functions and filters are created with Python. As a result, RustToolsGDB has the file RustToolsGDB.gdb and the folder pyscripts with the Python scripts.

Testing routine

The first thing, which I wanted to improve the case when a test fails and the stack trace is emitted. Every time I had to create a GDB script with a breakpoint at the place of the failure, then run the test again. To avoid the second run, I developed a habit to run the test suite with this GDB script:

break ::core::panicking::panic_fmt

I named it init.gdb.

Unfortunately, the test suit required two commands to run it. The first command obtains the paths of the binaries:

cargo test --no-run

An example of the output:

...
Executable unittests src/lib.rs (target/debug/deps/calendar-39104aab2d2eb1a3)

The second commands run the binary within GDB:

rust-gdb -x init.gdb  --args target/debug/deps/calendar-39104aab2d2eb1a3

If test fails, the execution stops at panic_fmt. As a result, I see stack trace and able go up to the stack frame of the failed test to see the data without the need to run the tests again. The necessity of having to run two functions can be solved in a couple of ways:

  1. To make a shell script to make a shortcut.
  2. Make the shortcut with cargo-xtask.
  3. Use the project cargo-nextest which supports running with gdb.

The second thing, I wanted to improve was the stack trace itself:

#0  calendar::render::find_free_lane<calendar::render::atasco::ShortEvent> (new_event_begin=250, atasco=0x7ffff7c82da8,
    compute_end=0x55555559f480 <calendar::render::Rectangle::compute_bottom_edge>) at crates/calendar/src/render.rs:480
#1  0x000055555559fc75 in calendar::render::short_event_rectangles (short_events=&[calendar::Event](size=5) = {...}, first_date=0x7ffff7c833fc, arguments=0x7ffff7c833ec)
    at crates/calendar/src/render.rs:563
#2  0x0000555555597c40 in calendar::render::tests::atasco::test_collision () at crates/calendar/src/render.rs:797
#3  0x0000555555599087 in calendar::render::tests::atasco::test_collision::{closure#0} () at crates/calendar/src/render.rs:764
#4  0x00005555555999e6 in core::ops::function::FnOnce::call_once<calendar::render::tests::atasco::test_collision::{closure_env#0}, ()> ()
    at /home/laladrik/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:250
#5  0x00005555555debab in core::ops::function::FnOnce::call_once<fn() -> core::result::Result<(), alloc::string::String>, ()> () at library/core/src/ops/function.rs:250
#6  test::__rust_begin_short_backtrace<core::result::Result<(), alloc::string::String>, fn() -> core::result::Result<(), alloc::string::String>> () at library/test/src/lib.rs:648
#7  0x00005555555dde2e in test::run_test_in_process::{closure#0} () at library/test/src/lib.rs:671
#8  core::panic::unwind_safe::{impl#23}::call_once<core::result::Result<(), alloc::string::String>, test::run_test_in_process::{closure_env#0}> ()
    at library/core/src/panic/unwind_safe.rs:272
#9  std::panicking::catch_unwind::do_call<core::panic::unwind_safe::AssertUnwindSafe<test::run_test_in_process::{closure_env#0}>, core::result::Result<(), alloc::string::String>> ()
    at library/std/src/panicking.rs:589
#10 std::panicking::catch_unwind<core::result::Result<(), alloc::string::String>, core::panic::unwind_safe::AssertUnwindSafe<test::run_test_in_process::{closure_env#0}>> ()
    at library/std/src/panicking.rs:552
#11 std::panic::catch_unwind<core::panic::unwind_safe::AssertUnwindSafe<test::run_test_in_process::{closure_env#0}>, core::result::Result<(), alloc::string::String>> ()
    at library/std/src/panic.rs:359
#12 test::run_test_in_process () at library/test/src/lib.rs:671
#13 test::run_test::{closure#0} () at library/test/src/lib.rs:592
#14 0x00005555555a19c4 in test::run_test::{closure#1} () at library/test/src/lib.rs:622
#15 std::sys::backtrace::__rust_begin_short_backtrace<test::run_test::{closure_env#1}, ()> () at library/std/src/sys/backtrace.rs:152
#16 0x00005555555a532a in std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure#0}<test::run_test::{closure_env#1}, ()> () at library/std/src/thread/mod.rs:559
#17 core::panic::unwind_safe::{impl#23}::call_once<(), std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<test::run_test::{closure_env#1}, ()>> ()
    at library/core/src/panic/unwind_safe.rs:272
#18 std::panicking::catch_unwind::do_call<core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<test::run_test::{closure_env#1}, ()>>, ()> () at library/std/src/panicking.rs:589
#19 std::panicking::catch_unwind<(), core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<test::run_test::{closure_env#1}, ()>>> () at library/std/src/panicking.rs:552
#20 std::panic::catch_unwind<core::panic::unwind_safe::AssertUnwindSafe<std::thread::{impl#0}::spawn_unchecked_::{closure#1}::{closure_env#0}<test::run_test::{closure_env#1}, ()>>, ()> () at library/std/src/panic.rs:359
#21 std::thread::{impl#0}::spawn_unchecked_::{closure#1}<test::run_test::{closure_env#1}, ()> () at library/std/src/thread/mod.rs:557
#22 core::ops::function::FnOnce::call_once<std::thread::{impl#0}::spawn_unchecked_::{closure_env#1}<test::run_test::{closure_env#1}, ()>, ()> ()
    at library/core/src/ops/function.rs:250
#23 0x000055555561b7ff in alloc::boxed::{impl#28}::call_once<(), dyn core::ops::function::FnOnce<(), Output=()>, alloc::alloc::Global> () at library/alloc/src/boxed.rs:1966
#24 std::sys::pal::unix::thread::{impl#2}::new::thread_start () at library/std/src/sys/pal/unix/thread.rs:107
#25 0x00007ffff7cf8724 in start_thread (arg=<optimized out>) at pthread_create.c:448
#26 0x00007ffff7d7c80c in __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

It contains 26 lines, but only 4 from my project. The rest is the internals of the test runner. Most of the time you don't need the information, but it consumes the recourse which is very limited – attention span. I improved this creating a frame filter. Frame filters allow the information of a frame to be inspected and manipulated. I realized that I can create a filter which removes those frames from the standard library. The file path of those frames starts with library. Given that, the problem boiled down to removing the frames which are from those files. In addition, I didn't want to see the addresses of the frames. They are useful, but only in certain situations. I implemented the hiding of them with the filter. However, I do think that these two filters have to be separated. Eventually, after applying the filter, the stack trace became this:

#0  calendar::render::find_free_lane<calendar::render::atasco::ShortEvent>
    (new_event_begin=250, atasco=0x7ffff7c82da8, compute_end=0x55555559f480 <calendar::render::Rectangle::compute_bottom_edge>) at crates/calendar/src/render.rs:480
#1  calendar::render::short_event_rectangles (short_events=&[calendar::Event](size=5) = {...}, first_date=0x7ffff7c833fc, arguments=0x7ffff7c833ec)
    at crates/calendar/src/render.rs:563
#2  calendar::render::tests::atasco::test_collision () at crates/calendar/src/render.rs:797
#3  calendar::render::tests::atasco::test_collision::{closure#0} () at crates/calendar/src/render.rs:764
#4  core::ops::function::FnOnce::call_once<calendar::render::tests::atasco::test_collision::{closure_env#0}, ()> ()
    at /home/laladrik/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:250
#25 start_thread (arg=<optimized out>) at pthread_create.c:448
#26 __GI___clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

The last two lines survived the filter, because their paths don't match the pattern, but the stack trace became much more comprehensible, didn't it?

I named the filter RustToolsGDB.FrameFilter. GDB provides the list of the frame filters with the command:

info frame-filter

If the addresses and the functions from the standard library are needed, the filter can be disabled:

disable frame-filter global RustToolsGDB.FrameFilter

The last thing, it takes a few moments to remember the path to the function panic_fmt. As a result, I created the custom command:

define breakpanic
    break ::core::panicking::panic_fmt
end

Now, every debugging script I create for a Rust program starts with the command breakpanic.

Setting breakpoints

It happens frequently that you have to set a breakpoint which triggers for certain data. For example, a bug which is caused when some variable has some file name. The program executes the code which I want to set a breakpoint a lot of times with different values in the variable. Normally, I would set a breakpoint which triggers if the variable has the file name which cause the bug.

I had a case with one of my projects which is not under NDA. I had the structure Time and the code which parsed a string from some JSON data to create an instance of the structure:

impl nanoserde::DeJson for Time {
    fn de_json(
        state: &mut nanoserde::DeJsonState,
        input: &mut std::str::Chars,
    ) -> Result<Self, nanoserde::DeJsonErr> {
        if let nanoserde::DeJsonTok::Str = &mut state.tok {
            let s = core::mem::take(&mut state.strbuf);
            match s.as_str() {
                //  <----------------------------------I had to be here
            }
        } else {
            Err(state.err_token("time"))
        }
    }
}

The code rejected a valid string. Certainly a bug! I don't remember the root cause and the string; therefore let's assume it was "23:59". Given that, I just needed a breakpoint triggering when s equals "23:59". The GDB script to set such a breakpoint contains the following:

break crates/calendar/src/date.rs:169 if s == "23:59" # the line I pointed above

However, the breakpoint didn't trigger! It compelled me to create a test where this is the only time in the JSON data, because I wasn't ready to grind a dozen of objects. All in all, I found that the breakpoint doesn't trigger. It appeared, that GDB doesn't support string comparison for Rust. As a result, I had to create a convenience function.

Convenience functions in GDB are written in Python. Python used to be my main language for years – no problem here. However, I had to figure out how to get the internals of the strings. I knew that a string (&str) is a wide pointer. However, I didn't know how to dereference it and make a string out of the data. Luckily, the rust-gdb solved the problem. As a result, I created this helping function:

def make_string(value: gdb.Value):
    value_data_ptr = value['data_ptr']
    value_len = value['length']
    value_str = value_data_ptr.lazy_string(encoding="utf-8", length=value_len)
    return value_str.value().string()

And the definition of the convenience function:

class RustStreq(gdb.Function):
    """Compare Rust strings (&str)"""

    def __init__ (self):
        super().__init__('rust_streq')

    def invoke(self, left, right):
        if left.type.name != '&str':
            raise gdb.GdbError("The first argument must be &str")

        if right.type.name != '&str':
            raise gdb.GdbError("The second argument must be &str")

        return make_string(left) == make_string(right)

The class had two methods. The first defines the name of the convenience function. The second is essentially the body of the function.

After calling the class – sounds somewhat awkward but it's Python – the convenience function rust_streq appeared. That allowed me to set the conditional breakpoint

break crates/calendar/src/date.rs:169 if $rust_streq(s, "23:59")

The project includes a few more functions addressing the string inspection:

$rust_strpos(haystack, needle) - returns the position of needle in haystack. It returns -1 if the needle is not found.

$rust_streq(str1, str2) - returns true if str1 equals str2, otherwise returns false.

$rust_len(str) - the function returns the number of bytes in str. It initially was created for strings only, but later adapted for vectors also.

Inspecting elements

It's possible to see the entire vector with rust-gdb. However, there were a few times when I needed a single element to see in a big vector. I had results of SQL SELECT queries with LIMIT 1000, a batch of syscalls with their arguments from bpftrace, events in the calendar, dimensions of the hit boxes from the entire level. It was very easy to overlook something. I could've used the pointer of the underlying buffer of a vector, but GDB does not see the type pointer; therefore it's impossible to process the pointer as an artificial array and use the binary operator @.

GDB supports sending its output through a pipe to an external program. Example:

pipe print events | rg '\[0\]'

or the with the shortcut for "pipe"

| print events | rg '\[0\]'

It was inconvenient, because the search pattern came up in the data sometimes. I created the function $rust_get to solve the inconvenience. It takes two arguments: the vector and the index of the element to be inspected. Example of obtaining of the first element:

print $rust_get(events, 0)

It works fine. After a few times typing it, I realized that it can be improved with a command. So I added it, and it looks like this:

rust_get events 0

Typing "print" and the parenthesis is no longer necessary. Also, a command can have an alias in GDB. However, the command has a disadvantage: when a vector's element is a structure, it's impossible to get its field. It's not even clear how it would've worked. Like this?

(rust_get week_schedule.short, $i).title

But the command prints the element, it returns the result of print instead of the content it has printed.

All in all, the convenience function alleviated the debugging process a lot. There was another case, when I had a bug in the fetching of the events in my calendar. I wanted to see the title of every event from the input. With this convenience function it was simple enough:

b crates/calendar/src/obtain.rs:217
commands
    set $i = 0
    while $i < week_schedule.short.len
        p $rust_get(week_schedule.short, $i).title
        set $i += 1
    end
end

Which returned a comprehensible list of strings

$1 = "Network and awk exercises"
$2 = "Cocinar y comer"
$3 = "Práctica de lectura español"
$4 = "Read about Thread State Analysis"
$5 = "Make the notes about Thread State Analysis"
$6 = "Buscar trabajo"
$7 = "Desayuno"
$8 = "Tocar la batería"
$9 = "Buy a gift"
$10 = "Ejercicios"
$11 = "Practica de escritura táctil"
$12 = "Cocinar y comer"
$13 = "Revise the Linux Performance Tools Talk"
$14 = "Práctica de lectura inglés"
$15 = "Hosting payment"
$16 = "Clase de inglés"

Printing the entire structures in the vector would return the output which is too large to be shown within the article. To give a picture about the size of it, I piped it to wc to calculate the number of the words and characters in the output

(gdb) | print week_schedule.short | wc -wc
    752    6917

Instead of a conclusion

The state of the debugging of Rust programs is something to be improved. The authors of the language created rust-gdb which is a nice starting point. My project RustToolsGDB is a little step, I hope I'll implement things like memory comparison (alternative to $_memeq), element obtaining from hash maps and tree maps, stack frames filters for popular frameworks like Tokio, Egui, Bevy, Serde.

References

  • The picture of the article is based on this.