Ghidra FidDb generator

How do I use it?

Pre-requisites:

Only tested with CentOS 7. Everything required should already be installed (even on a minimal install), except for the following:

yum install epel-release
yum install p7zip p7zip-plugins

To generate fidb/el7-x86.LE.32.default.fidb and fidb/el7-x86.LE.64.default.fidb, run:

./00-el-get-rpms.sh
./01-el-unpack-all-rpms.sh
./02-unpack-libs.sh lib/el7
./03-ghidra-import.sh lib/el7
./04-checklog.sh lib/el7
./05-ghidra-fidb.sh lib/el7
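
Since each step depends on the output of the previous one, it can help to chain the steps so that a failure stops the run. The following is only a sketch, not part of the repository: run_step, run_pipeline and the DRY_RUN switch are hypothetical names introduced for this example.

```shell
#!/bin/sh
# Sketch: run the numbered scripts in order and stop at the first failure.
# DRY_RUN is a hypothetical switch for this example: when set to 1 the
# commands are printed instead of executed.

run_step() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

run_pipeline() {
    target="$1"    # e.g. lib/el7
    run_step ./00-el-get-rpms.sh &&
    run_step ./01-el-unpack-all-rpms.sh &&
    run_step ./02-unpack-libs.sh "$target" &&
    run_step ./03-ghidra-import.sh "$target" &&
    run_step ./04-checklog.sh "$target" &&
    run_step ./05-ghidra-fidb.sh "$target"
}
```

Calling DRY_RUN=1 run_pipeline lib/el7 prints the six commands without running them, which is a cheap way to check the order before starting a multi-hour import.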

To additionally generate fidb/el6-x86.LE.32.default.fidb and fidb/el6-x86.LE.64.default.fidb, you only need to run:

./02-unpack-libs.sh lib/el6
./03-ghidra-import.sh lib/el6
./04-checklog.sh lib/el6
./05-ghidra-fidb.sh lib/el6

You can manually add analysis to the lib-fidb Ghidra project. To then regenerate fidb/el7-x86.LE.32.default.fidb and fidb/el7-x86.LE.64.default.fidb, run:

rm fidb/el7-*.fidb
./05-ghidra-fidb.sh lib/el7

How does this work?

How can I manually add libraries?

Add your .a and/or .lib files to the lib folder as follows:

+-- lib
|   |-- provider-name
|   |   |-- library-name
|   |   |   `-- version
|   |   |       `-- variant
|   |   |           |-- lib1.a
|   |   |           `-- lib2.lib

To extract the .a and/or .lib files run ./02-unpack-libs.sh lib/provider-name.

After this the folders should be:

+-- lib
|   |-- provider-name
|   |   |-- library-name
|   |   |   `-- version
|   |   |       `-- variant
|   |   |           |-- lib1
|   |   |           |   |-- foo.o
|   |   |           |   `-- bar.o
|   |   |           `-- lib2
|   |   |               |-- this.obj
|   |   |               `-- that.obj

(You can also add .o files directly.)
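
To illustrate the layout above, here is a rough sketch of unpacking a single archive into a sibling folder named after it. It is not the code of 02-unpack-libs.sh. The function names are hypothetical, and using ar for .a files and 7z for .lib files is an assumption (the 7z choice is suggested only by the p7zip pre-requisite).

```shell
# Sketch: derive the folder an archive is unpacked into (lib1.a -> lib1).
unpack_dir() {
    archive="$1"
    printf '%s\n' "${archive%.*}"
}

# Sketch: unpack one archive next to itself.
# Assumption: ar handles .a and 7z handles .lib; check 02-unpack-libs.sh
# for the tools it really uses.
unpack_one() {
    archive="$1"
    dest="$(unpack_dir "$archive")"
    mkdir -p "$dest" || return 1
    case "$archive" in
        *.a)   (cd "$dest" && ar x "../$(basename "$archive")") ;;
        *.lib) 7z x -o"$dest" "$archive" ;;
        *)     echo "unsupported archive: $archive" >&2; return 1 ;;
    esac
}
```

For example, unpack_dir lib/provider-name/library-name/version/variant/lib1.a yields the lib1 folder shown in the tree above.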

Then run ./03-ghidra-import.sh lib/provider-name to import this folder structure into the Ghidra project lib-fidb. After the import, run ./04-checklog.sh lib/provider-name. This reads the lib/provider-name-headless.log file written during 03-ghidra-import.sh and generates lib/provider-name-langids.txt from it. 05-ghidra-fidb.sh uses lib/provider-name-langids.txt to determine the processor architectures for which Function ID datasets should be generated.

Add the file lib/provider-name-common.txt. It lists common function names, which will be excluded from the Function ID signatures. Currently the file is simply empty, so you can create it with touch lib/provider-name-common.txt.
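
Before starting the last step, it may be worth checking that both helper files are in place. The sketch below assumes 05-ghidra-fidb.sh needs both lib/provider-name-langids.txt and lib/provider-name-common.txt to exist. check_fidb_inputs is a hypothetical helper, not a script from this repository.

```shell
# Sketch: verify the helper files for a target exist before generating
# the .fidb files. Assumption: both files are required by 05-ghidra-fidb.sh.
check_fidb_inputs() {
    target="$1"    # e.g. lib/provider-name
    missing=0
    for suffix in langids.txt common.txt; do
        if [ ! -f "${target}-${suffix}" ]; then
            echo "missing: ${target}-${suffix}" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A possible use is check_fidb_inputs lib/provider-name && ./05-ghidra-fidb.sh lib/provider-name.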

Finally, run ./05-ghidra-fidb.sh lib/provider-name to generate fidb/provider-name-PROC.ENDIAN.SIZE.VARIANT.fidb.
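
The PROC.ENDIAN.SIZE.VARIANT part of the file name mirrors the four fields of a Ghidra language ID such as x86:LE:64:default. As a sketch, and assuming the name is formed simply by replacing the colons with dots, the expected output path can be computed as follows. fidb_path is a hypothetical helper introduced for this example.

```shell
# Sketch: map a provider name and a Ghidra language ID to the expected
# .fidb path. Assumption: colons in the language ID become dots.
fidb_path() {
    provider="$1"    # e.g. el7
    langid="$2"      # e.g. x86:LE:64:default
    printf 'fidb/%s-%s.fidb\n' "$provider" "$(printf '%s' "$langid" | tr ':' '.')"
}
```

For example, fidb_path el7 x86:LE:64:default prints fidb/el7-x86.LE.64.default.fidb.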

Can I just download the .fidb files?

Yes: https://github.com/threatrack/ghidra-fidb-repo

How much disk space and time will this take?

As an example, consider el7.x86_64.fidb.

The object files in el/el7.x86_64 were 192MB. The resulting Ghidra project after running 03-ghidra-import.sh (which took 4h on an i5-2520M) was 16GB. Running 05-ghidra-fidb.sh (which took 15min) resulted in a 6.6MB fidb/el7.x86_64.fidb file. Using RepackFid.java, the final size is 5.9MB.

Stats

Here are the stats for (some) of the Function ID datasets in https://github.com/threatrack/ghidra-fidb-repo:

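.fidb           | # .o  | du .o | 03-ghidra-import.sh | du .gpr | 05-ghidra-fidb.sh | du .fidb | # Entries
----------------|-------|-------|---------------------|---------|-------------------|----------|----------
el7.x86_64.fidb | 13036 | 195M  | ~ 4h                | ~ 16GB  | ~ 15min           | 6.6M     | 57966
el7.i686.fidb   | 12600 | 132M  | ~ 8h                | ~ 16GB  | ~ 26min           | 6.6M     | 53823
el6.x86_64.fidb | 5695  | 53M   | ~ 3h                | ~ 8GB   | ~ 3min            | 2.2M     | 16912
el6.i686.fidb   | 5709  | 45M   | ~ 2h                | ~ 8GB   | ~ 4min            | 2.5M     | 21612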

(These are only ballpark figures, as the measurements may have been impacted by thermal throttling or concurrent tasks running on the system.)
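
To collect comparable numbers for your own libraries, something like the sketch below could be used. How the figures above were actually measured is not stated, so this is only one plausible approach. count_objects is a hypothetical helper.

```shell
# Sketch: count the object files (.o and .obj) below a directory.
count_objects() {
    find "$1" -type f \( -name '*.o' -o -name '*.obj' \) | wc -l | tr -d ' '
}
```

For example, count_objects lib/el7 gives the number of object files, and du -sh lib/el7 gives the disk usage of that folder.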

Known issues

Program has different compiler spec than already established

If you receive an error like the following when running 05-ghidra-fidb.sh:

ERROR REPORT SCRIPT ERROR:  /home/user/github/threatrack/ghidra-fid-generator/ghidra_scripts/AutoCreateMultipleLibraries.java : Program x86_64cpuid.o has different compiler spec (windows) than already established (gcc) (HeadlessAnalyzer) java.lang.IllegalArgumentException: Program x86_64cpuid.o has different compiler spec (windows) than already established (gcc)

You can fix it in Ghidra: in the project view, right-click the affected program (in this case x86_64cpuid.o) and change its Language to gcc (or whatever the error says it should be).

The cause seems to be that Ghidra identified the compiler wrongly on import and then complains about the mismatch when generating the .fidb.

You can use ghidra_scripts/SearchFalseCspecsInPrograms.py to search for programs in a project that do not match a desired compiler spec. You can use ghidra_scripts/SetCspecForPrograms.py to automatically force a compiler spec for all programs under a root folder.
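
If you saved the output of 05-ghidra-fidb.sh to a file, the affected programs can be listed with a small filter based on the error text shown above. This is a sketch: cspec_mismatches is a hypothetical helper, and the log file is whatever file you redirected the output to (the scripts are not known to write one for this step).

```shell
# Sketch: print "program found-cspec expected-cspec" for each compiler spec
# mismatch reported in a saved log. The log path is supplied by the caller.
cspec_mismatches() {
    sed -n 's/.*Program \([^ ]*\) has different compiler spec (\([^)]*\)) than already established (\([^)]*\)).*/\1 \2 \3/p' "$1" | sort -u
}
```

For the error above, this prints the program name followed by the detected and the expected compiler spec, which tells you what to fix by hand or which spec to force with SetCspecForPrograms.py.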

The AutoImporter could not successfully load...

On libsodium there was a problem with the auto importer:

2019-10-06 16:13:15 ERROR (HeadlessAnalyzer) The AutoImporter could not successfully load /home/ghidra/ghidra-fid-generator/lib/libsodium/libsodium/1.0.17/stable-msvc/libsodium/Win32/Debug/v100/ltcg/libsodium/D/a/1/s/obj/libsodium/Win32/Debug/v100/ltcg/stream_salsa2012.obj with the provided import parameters. Please ensure that any specified processor/cspec arguments are compatible with the loader that is used during import and try again.  
2019-10-06 16:13:15 ERROR (HeadlessAnalyzer) REPORT: Import failed for file: /home/ghidra/ghidra-fid-generator/lib/libsodium/libsodium/1.0.17/stable-msvc/libsodium/Win32/Debug/v100/ltcg/libsodium/D/a/1/s/obj/libsodium/Win32/Debug/v100/ltcg/stream_salsa2012.obj

However, only some files were affected, so the files that could not be imported were ignored ... for now.
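
To see which files were skipped, the "Import failed for file" lines can be pulled out of the import log. The sketch below assumes these messages end up in lib/provider-name-headless.log, the log written during 03-ghidra-import.sh. failed_imports is a hypothetical helper.

```shell
# Sketch: list the paths of files the headless import reported as failed.
# Assumption: the log contains lines in the format shown above.
failed_imports() {
    sed -n 's/.*REPORT: Import failed for file: //p' "$1"
}
```

For example, failed_imports lib/libsodium-headless.log | wc -l gives the number of object files missing from the resulting Function ID dataset.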

TODO