Inscrutable Coredump v. Unmoveable Grad Student

Situation: we have a core dump that is shy to reveal its inner workings. The goal is to extract some more information from this core dump, using fancier analyses. The core dump is 1.2GB in size, so I know that there is some insight in there; it is just buried.

(gdb) bt
#0  0x00007fbaa4653c30 in ?? ()
#1  0x0000000000000000 in ?? ()

Since we have an intact $pc (which refers to %rip), we can figure out the instructions it was executing.

(gdb) disassemble $pc-20,$pc+20
Dump of assembler code from 0x7fbaa4653c1c to 0x7fbaa4653c44:
   0x00007fbaa4653c1c:  add    %al,(%rax)
   0x00007fbaa4653c1e:  jmp    0x7fbaa4653a51
   0x00007fbaa4653c23:  nopl   0x0(%rax,%rax,1)
   0x00007fbaa4653c28:  mov    0x98(%r12),%rax
=> 0x00007fbaa4653c30:  cmpb   $0x48,(%rax)
   0x00007fbaa4653c33:  jne    0x7fbaa4653b90
   0x00007fbaa4653c39:  movabs $0x50f0000000fc0c7,%rdx
   0x00007fbaa4653c43:  cmp    %rdx,0x1(%rax)
End of assembler dump.

Let us inspect the registers.

(gdb) info reg
rax            0x323338342034342e  3617296722238387246
rbx            0x7fff678ebb50      140734930795344
rcx            0x7fba82086120      140439022231840
rdx            0x1                 1
rsi            0x1                 1

Okay so our %rax was clearly a gibberish address, no wonder dereferencing it failed. Now the question is what source file/line was mapped to $pc. ChatGPT says that the following can work:

(gdb) list *$pc
<no output>
(gdb) info symbol $pc
No symbol matches $pc.

ChatGPT also says that we can resolve addresses to source lines using the tools below, but first we need to know which library is mapped into our memory, and at what offset. Noting these down for later.

$ addr2line -e /path/to/your/executable 0xADDRESS
$ objdump -d -S /path/to/your/executable

Some more useful information, saving for later.

(gdb) info frame 0
Stack frame at 0x7fff678ebaf8:
 rip = 0x7fbaa4653c30; saved rip = 0x0
 called by frame at 0x7fff678ebb00
 Arglist at 0x7fff678ebae8, args:
 Locals at 0x7fff678ebae8, Previous frame's sp is 0x7fff678ebaf8
 Saved registers:
  rip at 0x7fff678ebaf0
(gdb) info frame 1
Stack frame at 0x7fff678ebb00:
 rip = 0x0; saved rip = 0x0
 caller of frame at 0x7fff678ebaf8
 Arglist at 0x7fff678ebaf0, args:
 Locals at 0x7fff678ebaf0, Previous frame's sp is 0x7fff678ebb00
 Saved registers:
  rip at 0x7fff678ebaf8
(gdb) info frame 2
No frame at level 2.

Let’s look at shared libraries and memory mappings now.

(gdb) info shared
No shared libraries loaded at this time.
(gdb) info proc mappings
Mapped address spaces:

          Start Addr           End Addr       Size     Offset objfile
      0x5585a6034000     0x5585a604b000    0x17000        0x0 /some/bin
      0x5585a604b000     0x5585a64ef000   0x4a4000    0x17000 /some/bin
      0x5585a64ef000     0x5585a6582000    0x93000   0x4bb000 /some/bin
      0x7fba3ad28000     0x7fba3c000000  0x12d8000        0x0 /dev/shm/...
      0x7fba40d29000     0x7fba40f41000   0x218000        0x0 /dev/shm/...

Alright, getting somewhere. I have no idea why info shared failed but info proc mappings did not. We want to find a mapping around the address 0x00007fbaa4653c30.

Found something.

0x7fbaa4647000     0x7fbaa4659000    0x12000     0x3000 /usr/lib/x86_64-linux-gnu/libgcc_s.so.1

The difference between the base address and our $pc is 0xcc30. Add the offset 0x3000 to get 0xfc30.
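The arithmetic above can be double-checked with a few lines of Python. This is only a sketch: the rows are copied from the info proc mappings output, the names Mapping and locate are made up for illustration, and the result is strictly a file offset. It lines up with the addresses objdump prints only if the segment's file offset and its ELF virtual address coincide, which I am assuming here.

```python
from typing import List, NamedTuple, Optional, Tuple


class Mapping(NamedTuple):
    start: int    # first mapped address
    end: int      # one past the last mapped address
    offset: int   # offset into the backing file
    objfile: str


# Rows copied from the "info proc mappings" output.
MAPPINGS = [
    Mapping(0x5585a6034000, 0x5585a604b000, 0x0, "/some/bin"),
    Mapping(0x5585a604b000, 0x5585a64ef000, 0x17000, "/some/bin"),
    Mapping(0x5585a64ef000, 0x5585a6582000, 0x4bb000, "/some/bin"),
    Mapping(0x7fbaa4647000, 0x7fbaa4659000, 0x3000,
            "/usr/lib/x86_64-linux-gnu/libgcc_s.so.1"),
]


def locate(addr: int, mappings: List[Mapping]) -> Optional[Tuple[Mapping, int]]:
    """Return the mapping containing addr and the matching file offset."""
    for m in mappings:
        if m.start <= addr < m.end:
            return m, addr - m.start + m.offset
    return None


PC = 0x7fbaa4653c30
hit = locate(PC, MAPPINGS)
if hit is not None:
    mapping, file_offset = hit
    print(mapping.objfile, hex(file_offset))
```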

$ addr2line -e /usr/lib/x86_64-linux-gnu/libgcc_s.so.1 0xfc30
??:0

Okay well thx.

$ objdump -d -S /usr/lib/x86_64-linux-gnu/libgcc_s.so.1
...
    fc30:       80 38 48                cmpb   $0x48,(%rax)
    fc33:       0f 85 57 ff ff ff       jne    fb90 <_Unwind_GetTextRelBase@@GCC_3.0+0xe40>
    fc39:       48 ba c7 c0 0f 00 00    movabs $0x50f0000000fc0c7,%rdx
    ...

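Okay this wasn’t super useful. The faulting instruction sits in libgcc’s unwinding code (objdump labels the neighborhood as _Unwind_GetTextRelBase plus a large offset, so the real function is likely an unexported one), which says nothing about where my own program went wrong. At this point, I just decide to examine the entire stack.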

(gdb) x/128xg 0x7fff678eba00
...
0x7fff678eba00: 0x00007fba82086040      0x00007fbaa465000b
0x7fff678eba10: 0x000000000000002e      0x0000000000000000
0x7fff678eba20: 0x0000000000000000      0x0000000000000000
0x7fff678eba30: 0x00007fff678eced8      0xb741446eb7f0e800
0x7fff678eba40: 0x00007fff678eceb0      0x00007fff678ebb50
0x7fff678eba50: 0x00007fff678ebbf8      0x00007fff678ebb50
0x7fff678eba60: 0x00007fff678ebe00      0x323338342034342d
...

Some patterns start to emerge. All values starting with 0x7fff are pointers to things on the stack. Things in the range of 0x7fbaa.. are probably related to instructions. We can also see a junk value starting with 0x3233 (at 0x7fff678eba68), just one off from the %rax value that was implicated in the segfault.
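The junk value deserves a closer look. A quick check, assuming the bytes sit in memory in x86-64 little-endian order, suggests that both the %rax value and the stack word are printable ASCII, which would hint that text was written over something that should have held a pointer. The helper name word_to_ascii is my own.

```python
from typing import Optional


def word_to_ascii(word: int) -> Optional[str]:
    """Decode a 64-bit word as little-endian bytes.

    Returns the text if all eight bytes are printable ASCII, else None.
    """
    raw = word.to_bytes(8, byteorder="little")
    if all(0x20 <= b <= 0x7E for b in raw):
        return raw.decode("ascii")
    return None


RAX = 0x323338342034342E          # from "info reg"
STACK_WORD = 0x323338342034342D   # from the stack dump
STACK_POINTER = 0x00007FFF678EBB50  # an ordinary stack pointer, for contrast

for name, word in [("rax", RAX), ("stack word", STACK_WORD),
                   ("stack pointer", STACK_POINTER)]:
    print(f"{name}: {word:#018x} -> {word_to_ascii(word)!r}")
```

Both junk words decode to short strings of digits and punctuation, while the genuine pointer does not decode at all.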

Two hours later …

My approach was to examine the stack visually, find pointers with prefixes that I knew to map to code I wrote, and try to dereference them to get an idea of where my program was when it crashed.

This is doable, but it is not as straightforward as you might think. The reason requires going into how ELF binaries/shared libraries are loaded into memory.

There is a /proc/<pid>/maps file corresponding to the info proc mappings output that we saw earlier.
Each ELF file is divided into segments, which are further divided into sections. Mapping happens at the granularity of a segment.
The mapped segment can sit at a different offset than the on-disk segment: the loader places each loadable segment at the library's load base plus the segment's virtual address, aligned to page boundaries, and ASLR randomizes the base. The segment sizes were also different for me between what gdb reported and what readelf/objdump showed, which is probably the same page rounding at work.
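To make the offset business concrete: addr2line and objdump work with the virtual addresses recorded in the ELF file, and a file offset converts to one through the PT_LOAD program headers. The sketch below assumes a 64-bit little-endian ELF, uses helper names of my own invention, and the segment values in the usage lines are made up for illustration.

```python
import struct
from typing import List, Optional, Tuple

PT_LOAD = 1
# ELF64 file header and program header layouts (little-endian).
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")
PHDR = struct.Struct("<IIQQQQQQ")

Segment = Tuple[int, int, int]  # (p_offset, p_vaddr, p_filesz)


def load_segments(elf_bytes: bytes) -> List[Segment]:
    """Return (p_offset, p_vaddr, p_filesz) for every PT_LOAD segment."""
    fields = EHDR.unpack_from(elf_bytes, 0)
    ident = fields[0]
    # ident[4] == 2 means ELFCLASS64, ident[5] == 1 means little-endian.
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("expected a 64-bit little-endian ELF file")
    e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]
    segments = []
    for i in range(e_phnum):
        (p_type, _flags, p_offset, p_vaddr,
         _paddr, p_filesz, _memsz, _align) = PHDR.unpack_from(
            elf_bytes, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            segments.append((p_offset, p_vaddr, p_filesz))
    return segments


def file_offset_to_vaddr(file_offset: int,
                         segments: List[Segment]) -> Optional[int]:
    """Translate a file offset into the ELF virtual address, if it is loaded."""
    for p_offset, p_vaddr, p_filesz in segments:
        if p_offset <= file_offset < p_offset + p_filesz:
            return file_offset - p_offset + p_vaddr
    return None


# Made-up segment: file offset 0x2000 is loaded at virtual address 0x3000.
example = [(0x2000, 0x3000, 0x1000)]
print(hex(file_offset_to_vaddr(0x2010, example)))
```

For a real library, the segments would come from something like load_segments(open(path, "rb").read()). When p_offset and p_vaddr happen to be equal for the text segment, the file offset can be passed to addr2line unchanged.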

As a result, I was unable to map symbol addresses from the core dump to symbols in libraries effectively. There is theoretically no reason why gdb should not be able to do this automatically, and it does, for more benign cases. But it does not seem to load the shared libraries for me for this particular crash.

Wait …

Okay, I ran gdb with this specific sequence, and suddenly it chose to load shared libraries.

$ gdb
(gdb) set auto-solib-add off # do not auto-load solibs
(gdb) set substitute-path /dev/shm /dev/null # something for shm maps
(gdb) set solib-search-path /path/to/lib
(gdb) file /path/to/my/binary
(gdb) target core /path/to/core-file
(gdb) info sharedlibrary
From                To                  Syms Read   Shared Object Library
0x00007fbaa4e98350  0x00007fbaa4eaccd1  No          /lib/libx.so 
0x00007fbaa4e21a00  0x00007fbaa4e72bc9  No          /lib/liby.so
...
(gdb) sharedlibrary /path/to/libmycode.so
Reading symbols from ...

I have no idea which of the above did the trick. Consider it a magic sequence of commands for now.

The game plan now is to go through the stack with x/64xg $sp and beyond, look for familiar addresses, and try to resolve them via the symbol table. I tried a bunch of random addresses, and finally hit the jackpot.

(gdb) info symbol 0x00005585a6470c80
Serialize[...] in section .text of /my/binary

It was a buffer overflow in a serialization routine.
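In hindsight, the trial-and-error hunt for familiar addresses could be semi-automated. The sketch below, with hypothetical names throughout, keeps only the stack words that fall inside the address range of my binary (taken from the info proc mappings output) and emits one info symbol command per candidate. The sample words are a few from the stack dump plus the address that eventually resolved.

```python
from typing import Iterable, List, Tuple

# Address range covered by the three /some/bin rows in "info proc mappings".
BINARY_RANGE = (0x5585a6034000, 0x5585a6582000)


def candidate_commands(words: Iterable[int],
                       addr_range: Tuple[int, int] = BINARY_RANGE) -> List[str]:
    """Build one 'info symbol' command per distinct word inside addr_range."""
    lo, hi = addr_range
    seen = set()
    commands = []
    for word in words:
        if lo <= word < hi and word not in seen:
            seen.add(word)
            commands.append(f"info symbol {word:#x}")
    return commands


STACK_WORDS = [
    0x00007fba82086040,
    0x00007fbaa465000b,
    0x00007fff678ebb50,
    0x00005585a6470c80,
    0x323338342034342d,
]

print("\n".join(candidate_commands(STACK_WORDS)))
```

Saved to a file, say candidates.gdb (a made-up name), the commands could be replayed with gdb -batch -x candidates.gdb /path/to/my/binary /path/to/core-file, assuming the symbols load as they did in the magic sequence above.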

Conclusions

The battle between you and a coy-acting core dump is a battle of wills. Do not blink.