Extended Reading: How to Track Already Created Threads
Implementation Goal: Tracking Already Created Threads
The process being debugged is multi-threaded, and by the time we start debugging, its threads have already been created and are running. When we perform the attach operation, we do not enumerate all threads and attach each one manually. For convenience, we attach only to the process itself and hope the debugger can take care of attaching the process's other threads (all threads except the main thread).
Take Delve as an example. After dlv attach <pid> it may not immediately enumerate all threads and attach them one by one, but it should have this capability. For instance, when we want to track a specific thread, we can easily do so: run dlv> threads to view the thread list, and then dlv> thread <n> to track a particular thread.
Go programs are inherently multi-threaded, but Go gives developers goroutine-based concurrency interfaces rather than thread-related ones. Therefore, even though Delve has this capability, thread-related debugging commands are rarely used. Under the GMP scheduling model you cannot be certain what is executing on a given thread, because the goroutines it runs switch back and forth. Instead, dlv> goroutines and dlv> goroutine <n> are used far more often.
Still, we want to understand the underlying details of multi-threaded debugging. You might develop a debugger for another language in the future, right? It doesn't have to be Go. If that language exposes thread-oriented concurrency, this knowledge has real practical value.
Basic Knowledge
How do we obtain all threads within a process? We could run top -H -p <pid> to list all threads of the specified process and parse the output to get the thread IDs. However, the Linux /proc virtual file system provides a more convenient way: we just need to traverse the directory names under /proc/<pid>/task. The Linux kernel maintains one directory there per task (thread); each directory name is the thread's TID (i.e., the PID of the corresponding LWP), and each directory's contents describe that task.
For example, let's look at some information for the process with PID=1:
root🦀 ~ $ ls /proc/1/task/1/
arch_status clear_refs environ io mounts oom_score_adj sched stack uid_map
attr cmdline exe limits net pagemap schedstat stat wchan
auxv comm fd maps ns personality setgroups statm
cgroup cpuset fdinfo mem oom_adj projid_map smaps status
children cwd gid_map mountinfo oom_score root smaps_rollup syscall
The /proc virtual file system is an interface the kernel provides for interacting with it; its files can be read, and some can be written. This is not a hack but a standard method. Common tools like top and vmstat, as well as cgroup tools, implement their functionality by accessing /proc.
OK, for our debugger, we currently only need to know:
- To enumerate all threads of a process, traverse the directories under /proc/<pid>/task;
- To read its complete instruction data, read the exe file in the /proc/<pid> directory;
- To read its startup arguments (convenient for restarting the debugged process or restarting debugging), read the cmdline file in the same directory.
We can ignore the others for now.
Design Implementation
The implementation code for this part can be found in hitzhangjie/golang-debugger-lessons / 21_trace_old_threads.
First, for convenience in testing, we prepare a test program testdata/fork_noquit.c, similar to the previous section's testdata/fork.c. It creates threads and prints their PID and TID information. The difference is that the threads here never exit, which gives us more time for debugging and avoids tracking failures caused by threads exiting.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <pthread.h>

// threadfunc prints the PID and TID of the new thread, then sleeps
// forever so the thread never exits
void *threadfunc(void *arg) {
    printf("process: %d, thread: %ld\n", (int)getpid(), (long)syscall(SYS_gettid));
    while (1) {
        sleep(1);
    }
    return NULL;
}

int main() {
    printf("process: %d, thread: %ld\n", (int)getpid(), (long)syscall(SYS_gettid));

    // create a new thread every 10 seconds, 10 threads in total
    pthread_t tid;
    for (int i = 0; i < 100; i++) {
        if (i % 10 == 0) {
            int ret = pthread_create(&tid, NULL, threadfunc, NULL);
            if (ret != 0) {
                printf("pthread_create error: %d\n", ret);
                exit(-1);
            }
        }
        sleep(1);
    }

    // keep the main thread alive as well
    while (1) {
        sleep(1);
    }
}
This program can be compiled with gcc -o fork_noquit fork_noquit.c -lpthread, and then run with ./fork_noquit to observe its output.
Next, let's look at the debugger's code logic. This is mainly to demonstrate how to track already created threads in the process being debugged, and how to switch from tracking one thread to another.
The core logic of the program is as follows:
- We execute ./21_trace_old_threads $(pidof fork_noquit), which checks whether the process exists.
- Then, we enumerate the threads already created in the process by reading /proc and output all thread IDs.
- We prompt the user to input the ID of the thread to track, and then start tracking that thread.
- When tracking a thread, if another thread was tracked before, we first stop tracking (detach) the old thread and then track the new one.
package main

import (
	"fmt"
	"os"
	"runtime"
	"strconv"
	"syscall"
	"time"
)

var usage = `Usage:
go run main.go <pid>
args:
- pid: specify the pid of process to attach
`

func main() {
	// all ptrace requests for a tracee must come from the OS thread that
	// attached it, so pin the main goroutine to its current thread
	runtime.LockOSThread()

	if len(os.Args) != 2 {
		fmt.Println(usage)
		os.Exit(1)
	}

	fmt.Fprintf(os.Stdout, "===step1===: check target process existed or not\n")
	// pid
	pid, err := strconv.Atoi(os.Args[1])
	if err != nil {
		panic(err)
	}

	if !checkPid(pid) {
		fmt.Fprintf(os.Stderr, "process %d not existed\n\n", pid)
		os.Exit(1)
	}

	// enumerate all threads
	fmt.Fprintf(os.Stdout, "===step2===: enumerate created threads by reading /proc\n")

	// read dir entries of /proc/<pid>/task/
	threads, err := readThreadIDs(pid)
	if err != nil {
		panic(err)
	}
	fmt.Fprintf(os.Stdout, "threads: %v\n", threads)

	// the thread currently being traced, 0 means none
	var last int64

	// attach thread <n>, or switch thread to another one thread <m>
	for {
		fmt.Fprintf(os.Stdout, "===step3===: supposing running `dlv> thread <n>` here\n")

		// prompt user which thread to attach
		var target int64
		n, err := fmt.Fscanf(os.Stdin, "%d\n", &target)
		if n == 0 || err != nil || target <= 0 {
			panic("invalid input, thread id should > 0")
		}

		// detach the previously traced thread first, so it resumes running
		if last > 0 {
			if err := syscall.PtraceDetach(int(last)); err != nil {
				fmt.Fprintf(os.Stderr, "switch from thread %d to thread %d error: %v\n", last, target, err)
				os.Exit(1)
			}
			fmt.Fprintf(os.Stderr, "switch from thread %d thread %d\n", last, target)
		}

		// attach: PTRACE_ATTACH affects only the single thread with this TID
		err = syscall.PtraceAttach(int(target))
		if err != nil {
			fmt.Fprintf(os.Stderr, "thread %d attach error: %v\n\n", target, err)
			os.Exit(1)
		}
		fmt.Fprintf(os.Stdout, "process %d attach succ\n\n", target)

		// check target thread stopped or not; WALL (__WALL) makes wait4
		// also report non-leader threads, which are "clone" children
		var status syscall.WaitStatus
		var rusage syscall.Rusage
		_, err = syscall.Wait4(int(target), &status, syscall.WALL, &rusage)
		if err != nil {
			fmt.Fprintf(os.Stderr, "process %d wait error: %v\n\n", target, err)
			os.Exit(1)
		}
		if !status.Stopped() {
			fmt.Fprintf(os.Stderr, "process %d not stopped\n\n", target)
			os.Exit(1)
		}
		fmt.Fprintf(os.Stdout, "process %d stopped\n\n", target)

		// read the registers of the stopped thread and print its PC
		regs := syscall.PtraceRegs{}
		if err := syscall.PtraceGetRegs(int(target), &regs); err != nil {
			fmt.Fprintf(os.Stderr, "get regs fail: %v\n", err)
			os.Exit(1)
		}
		fmt.Fprintf(os.Stdout, "tracee stopped at %0x\n", regs.PC())

		last = target
		time.Sleep(time.Second)
	}
}

// checkPid check whether pid is valid process's id
//
// On Unix systems, os.FindProcess always succeeds and returns a Process for
// the given pid, regardless of whether the process exists. So we send signal
// 0 instead: nothing is delivered, but the kernel still checks that the
// process exists.
func checkPid(pid int) bool {
	err := syscall.Kill(pid, 0)
	// EPERM means the process exists but we may not signal it
	return err == nil || err == syscall.EPERM
}

// reads all thread IDs associated with a given process ID.
func readThreadIDs(pid int) ([]int, error) {
	dir := fmt.Sprintf("/proc/%d/task", pid)
	files, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}

	var threads []int
	for _, file := range files {
		tid, err := strconv.Atoi(file.Name())
		if err != nil { // Ensure that it's a valid positive integer
			continue
		}
		threads = append(threads, tid)
	}
	return threads, nil
}
Code Testing
- First, let's look at testdata/fork_noquit.c. This program creates a new pthread every 10 seconds.
The main thread and each new thread print their PID and TID (where the TID is the PID of the corresponding LWP).
Note: The difference between fork_noquit.c and fork.c is that each thread keeps calling sleep(1) and never exits. Our test takes a while, and keeping the threads alive avoids failures caused by a thread having already exited when we input its ID to attach thread or switch thread1 to thread2.
Below is the output of the program to be debugged:
zhangjie🦀 testdata(master) $ ./fork_noquit
process: 12368, thread: 12368
process: 12368, thread: 12369
process: 12368, thread: 12527
process: 12368, thread: 12599
process: 12368, thread: 12661
...
- Meanwhile, we run ./21_trace_old_threads <pid of the fork_noquit process> and observe its execution.
zhangjie🦀 21_trace_old_threads(master) $ ./21_trace_old_threads 12368
===step1===: check target process existed or not
===step2===: enumerate created threads by reading /proc
threads: [12368 12369 12527 12599 12661 12725 12798 12864 12934 13004 13075] <= created thread IDs
===step3===: supposing running `dlv> thread <n>` here
12369
process 12369 attach succ <= prompt user input and attach thread
process 12369 stopped
tracee stopped at 7f06c29cf098
===step3===: supposing running `dlv> thread <n>` here
12527
switch from thread 12369 thread 12527
process 12527 attach succ <= prompt user input and switch thread
process 12527 stopped
tracee stopped at 7f06c29cf098
===step3===: supposing running `dlv> thread <n>` here
- Above, we entered two thread IDs: first 12369, then 12527. Let's see how the thread states changed after each input.
Initially, before any input, all threads were in state S (sleeping). This is expected, because every thread keeps executing while(1) {sleep(1);}.
$ top -H -p 12368
top - 00:54:17 up 8 days, 2:10, 2 users, load average: 0.02, 0.06, 0.08
Threads: 7 total, 0 running, 7 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.1 us, 0.1 sy, 0.0 ni, 99.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 31964.6 total, 26011.4 free, 4052.5 used, 1900.7 buff/cache
MiB Swap: 8192.0 total, 8192.0 free, 0.0 used. 27333.2 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12368 zhangjie 20 0 55804 888 800 S 0.0 0.0 0:00.00 fork_noquit
12369 zhangjie 20 0 55804 888 800 S 0.0 0.0 0:00.00 fork_noquit
12527 zhangjie 20 0 55804 888 800 S 0.0 0.0 0:00.00 fork_noquit
12599 zhangjie 20 0 55804 888 800 S 0.0 0.0 0:00.00 fork_noquit
12661 zhangjie 20 0 55804 888 800 S 0.0 0.0 0:00.00 fork_noquit
12725 zhangjie 20 0 55804 888 800 S 0.0 0.0 0:00.00 fork_noquit
12798 zhangjie 20 0 55804 888 800 S 0.0 0.0 0:00.00 fork_noquit
...
After we input 12369, the state of thread 12369 changed from S to t, indicating that the thread is now being debugged by the debugger (traced state).
12369 zhangjie 20 0 88588 888 800 t 0.0 0.0 0:00.00 fork_noquit
After we input 12527, the debugger switched from tracking thread 12369 to tracking 12527: thread 12369 went back from t to S, and 12527 changed from S to t.
12369 zhangjie 20 0 88588 888 800 S 0.0 0.0 0:00.00 fork_noquit
12527 zhangjie 20 0 88588 888 800 t 0.0 0.0 0:00.00 fork_noquit
OK, press Ctrl+C to kill the ./21_trace_old_threads process, and then observe the thread states again. They automatically change from t back to S, because the kernel cleans up after the tracer exits: it detaches the tracer from all of its tracees, which then resume running.
Further Discussion
When debugging multi-threaded programs, you might track only one thread or several threads simultaneously. The final form depends on the debugger's interaction design. For example, command-line debuggers tend to track one thread at a time because of their interaction model, while some graphical IDEs prefer to offer tracking of multiple threads simultaneously (I often did this when debugging multi-threaded Java programs with Eclipse). Here we demonstrated tracking one thread and switching between threads; readers should be able to extend it to track multiple threads simultaneously on their own.
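As a starting point for that exercise, here is a minimal sketch, assuming Linux and the same /proc traversal and syscall API used above; attachAll and detachAll are hypothetical helper names. It attaches every thread present at scan time and waits for each one to stop. Threads created after the scan are not covered; catching those needs something like the PTRACE_O_TRACECLONE option.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"strconv"
	"syscall"
)

// attachAll attaches to every thread listed under /proc/<pid>/task and waits
// for each to stop. It must run on a goroutine locked to its OS thread,
// because later ptrace requests must come from the same thread. On error,
// the threads attached so far are returned so the caller can detach them.
func attachAll(pid int) ([]int, error) {
	entries, err := os.ReadDir(fmt.Sprintf("/proc/%d/task", pid))
	if err != nil {
		return nil, err
	}
	var attached []int
	for _, e := range entries {
		tid, err := strconv.Atoi(e.Name())
		if err != nil {
			continue
		}
		if err := syscall.PtraceAttach(tid); err != nil {
			if err == syscall.ESRCH {
				continue // thread exited after we read the directory
			}
			return attached, fmt.Errorf("attach %d: %w", tid, err)
		}
		// WALL is needed to wait for non-leader threads on older kernels
		var ws syscall.WaitStatus
		if _, err := syscall.Wait4(tid, &ws, syscall.WALL, nil); err != nil {
			return attached, fmt.Errorf("wait %d: %w", tid, err)
		}
		attached = append(attached, tid)
	}
	return attached, nil
}

// detachAll detaches every given thread, letting it resume running.
func detachAll(tids []int) {
	for _, tid := range tids {
		_ = syscall.PtraceDetach(tid)
	}
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: attachall <pid>")
		return
	}
	pid, err := strconv.Atoi(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	runtime.LockOSThread()
	tids, err := attachAll(pid)
	defer detachAll(tids)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
	fmt.Printf("attached threads: %v\n", tids)
}
```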