Ghidra FidDb generator
How do I use it?
Pre-requisites:
- Set the `GHIDRA_HOME` env var to your Ghidra installation, e.g. via `export GHIDRA_HOME=/home/user/ghidra/ghidra_9.0.4`
- Set the `GHIDRA_PROJ` env var to your Ghidra project directory, e.g. via `export GHIDRA_PROJ=/home/user/ghidra_projects`
- Must use `ghidra-9.1-DEV` or later due to a bug in X86_64 relocation handling (https://github.com/NationalSecurityAgency/ghidra/pull/910)
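A quick sanity check of the two environment variables can save a failed run later. The following is a minimal sketch; `check_ghidra_env` is a hypothetical helper and not part of this repository.

```shell
# Hypothetical helper: verify that GHIDRA_HOME and GHIDRA_PROJ are set
# and point to existing directories. Returns non-zero on any problem.
check_ghidra_env() {
    status=0
    for var in GHIDRA_HOME GHIDRA_PROJ; do
        # Indirectly read the value of the variable named in $var.
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "error: $var is not set" >&2
            status=1
        elif [ ! -d "$value" ]; then
            echo "error: $var=$value is not a directory" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage: check_ghidra_env || exit 1
```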
Only tested with CentOS 7. Requires:
- wget
- grep
- sed
- sort
- gzip
- 7z
- find
- rpm2cpio
- cpio
- unzip
- tar
- ar
- tee
- (maybe others; please open an issue if you have problems)
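To find out which of the tools above are missing on your system, something like the following sketch may help. `missing_tools` is a hypothetical helper, not a script shipped with this repository.

```shell
# Hypothetical helper: print every tool name given as an argument
# that cannot be found in PATH. Prints nothing if all are available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Usage:
#   missing_tools wget grep sed sort gzip 7z find rpm2cpio cpio unzip tar ar tee
```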
Everything should already be installed (even on a minimal install) except 7z, which you can install with:
yum install epel-release
yum install p7zip p7zip-plugins
To generate `fidb/el7-x86.LE.32.default.fidb` and `fidb/el7-x86.LE.64.default.fidb`, run:
./00-el-get-rpms.sh
./01-el-unpack-all-rpms.sh
./02-unpack-libs.sh lib/el7
./03-ghidra-import.sh lib/el7
./04-checklog.sh lib/el7
./05-ghidra-fidb.sh lib/el7
To additionally generate `fidb/el6-x86.LE.32.default.fidb` and `fidb/el6-x86.LE.64.default.fidb`, you only need to run:
./02-unpack-libs.sh lib/el6
./03-ghidra-import.sh lib/el6
./04-checklog.sh lib/el6
./05-ghidra-fidb.sh lib/el6
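The four per-target steps can be chained so that a failing step stops the run. This is only a sketch: `run_pipeline` and the `DRY_RUN` variable are hypothetical and not part of the repository. It assumes it is called from the repository root.

```shell
# Hypothetical wrapper: run steps 02 to 05 for one target folder and
# stop at the first step that fails. With DRY_RUN set to a non-empty
# value, the commands are only printed, not executed.
run_pipeline() {
    target=$1
    for step in 02-unpack-libs 03-ghidra-import 04-checklog 05-ghidra-fidb; do
        echo "==> ./$step.sh $target"
        if [ -z "${DRY_RUN:-}" ]; then
            "./$step.sh" "$target" || {
                echo "error: $step.sh failed for $target" >&2
                return 1
            }
        fi
    done
}

# Usage: run_pipeline lib/el6
```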
You can manually add analysis to the `lib-fidb` Ghidra project. To then regenerate `fidb/el7-x86.LE.32.default.fidb` and `fidb/el7-x86.LE.64.default.fidb`, run:
rm fidb/el7-*.fidb
./05-ghidra-fidb.sh lib/el7
How does this work?
- `00-el-get-rpms.sh`: Downloads RPMs from http://mirror.centos.org/centos/ into the folder `rpms`.
- `01-el-unpack-all-rpms.sh`: Unpacks all the RPMs from `rpms` to `lib/el{6,7}.{i686,x86_64}/libname/version/release/*.o` (calls `01-unpack-rpm.sh`).
- `02-unpack-libs.sh <library>`: Unpacks `.lib` files to `.o` files.
- `03-ghidra-import.sh <library>`: Imports (and analyzes) from the folder `<library>` into the Ghidra project `lib-fidb`.
- `04-checklog.sh <library>`: Checks the analysis log and generates `lib/library-langids.txt`. Generating `library-langids.txt` is important!
- `05-ghidra-fidb.sh <library>`: Generates `.fidb` files (one for each Language ID in `lib/library-langids.txt`) into `fidb/` with signatures for the libraries in the `<library>` folder.
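To illustrate the `libname/version/release` layout used by the unpack step, here is a sketch that derives that path from an RPM file name. `rpm_to_path` is a hypothetical helper; the real `01-unpack-rpm.sh` may do this differently. It assumes the usual `name-version-release.arch.rpm` naming.

```shell
# Hypothetical helper: map an RPM file name of the form
# name-version-release.arch.rpm to name/version/release.arch
rpm_to_path() {
    base=${1##*/}          # strip any leading directories
    base=${base%.rpm}      # strip the .rpm suffix
    release=${base##*-}    # everything after the last '-'
    rest=${base%-*}        # everything before the last '-'
    version=${rest##*-}    # last '-'-separated field of the remainder
    name=${rest%-*}        # package name (may itself contain '-')
    printf '%s/%s/%s\n' "$name" "$version" "$release"
}

rpm_to_path rpms/glibc-static-2.17-260.el7_6.3.x86_64.rpm
# prints: glibc-static/2.17/260.el7_6.3.x86_64
```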
How can I manually add libraries?
Add your `.lib` files into the `lib` folder as follows:
+-- lib
|   |-- provider-name
|   |   |-- library-name
|   |   |   `-- version
|   |   |       `-- variant
|   |   |           |-- lib1.a
|   |   |           `-- lib2.lib
- `provider-name`: The name of the provider of the libraries. This will also be the filename of the generated `.fidb` files.
- `library-name`: The name of the library.
- `version`: Version.
- `variant`: Variant or release string.
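Since the folder depth matters, it can be worth checking that every archive sits exactly at `provider-name/library-name/version/variant/`. The following sketch does that; `check_layout` is a hypothetical helper and not part of the repository.

```shell
# Hypothetical helper: report .a/.lib files below a provider folder
# that are not located at library-name/version/variant/<file>.
check_layout() {
    dir=${1%/}
    find "$dir" -type f \( -name '*.a' -o -name '*.lib' \) |
    while IFS= read -r f; do
        rel=${f#"$dir"/}               # path relative to the provider folder
        case $rel in
            */*/*/*/*) echo "misplaced: $f" ;;   # nested too deep
            */*/*/*)   ;;                        # exactly three folders: ok
            *)         echo "misplaced: $f" ;;   # not nested deep enough
        esac
    done
}

# Usage: check_layout lib/provider-name
```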
To extract the `.a` and/or `.lib` files, run `./02-unpack-libs.sh lib/provider-name`.
After this the folders should be:
+-- lib
|   |-- provider-name
|   |   |-- library-name
|   |   |   `-- version
|   |   |       `-- variant
|   |   |           |-- lib1
|   |   |           |   |-- foo.o
|   |   |           |   `-- bar.o
|   |   |           `-- lib2
|   |   |               |-- this.obj
|   |   |               `-- that.obj
(You can also add `.o` files directly.)
Then run `./03-ghidra-import.sh lib/provider-name` to import this folder structure into the Ghidra project `lib-fidb`.
After the import, run `./04-checklog.sh lib/provider-name`. This reads the `lib/provider-name-headless.log` file written during `03-ghidra-import.sh` and generates `lib/provider-name-langids.txt` from it. `05-ghidra-fidb.sh` uses `lib/provider-name-langids.txt` to determine for which processor architectures Function ID datasets should be generated.
Add the file `lib/provider-name-common.txt`. This file lists common function names that will be excluded from the Function ID signatures. Currently the file is simply empty, so a `touch lib/provider-name-common.txt` is enough.
Finally, run `./05-ghidra-fidb.sh lib/provider-name` to generate `fidb/provider-name-PROC.ENDIAN.SIZE.VARIANT.fidb`.
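To preview which `.fidb` files the last step should produce, the Language IDs can be mapped to file names. This is a sketch under two assumptions that the text above suggests but does not spell out: the langids file holds one Language ID per line (e.g. `x86:LE:64:default`), and the file name is built by replacing `:` with `.`. `fidb_names` is a hypothetical helper.

```shell
# Hypothetical helper: print the expected .fidb file name for each
# Language ID listed in a langids file (assumed: one ID per line).
fidb_names() {
    provider=$1
    langids=$2
    while IFS= read -r id; do
        [ -n "$id" ] || continue    # skip empty lines
        printf 'fidb/%s-%s.fidb\n' "$provider" "$(printf '%s' "$id" | tr ':' '.')"
    done < "$langids"
}

# Usage: fidb_names provider-name lib/provider-name-langids.txt
```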
Can I just download the .fidb files?
Yes: https://github.com/threatrack/ghidra-fidb-repo
How much disk space and time will this take?
As an example, look at el7.x86_64.fidb. It includes:
boost-static/1.53.0/27.el7.x86_64
glibc-static/2.17/260.el7_6.3.x86_64
glibc-static/2.17/260.el7_6.6.x86_64
glibc-static/2.17/260.el7.x86_64
glibc-static/2.17/292.el7.x86_64
libgo-static/4.8.5/36.el7_6.1.x86_64
libgo-static/4.8.5/36.el7.x86_64
libstdc++-static/4.8.5/36.el7.x86_64
lua-static/5.1.4/15.el7.x86_64
openssl-static/1.0.2k/16.el7_6.1.x86_64
openssl-static/1.0.2k/16.el7.x86_64
openssl-static/1.0.2k/19.el7.x86_64
protobuf-lite-static/2.5.0/8.el7.x86_64
protobuf-static/2.5.0/8.el7.x86_64
zlib-static/1.2.7/18.el7.x86_64
The object files in `el/el7.x86_64` were 192MB.
The resulting Ghidra project after running `02-ghidra-import.sh` (which took 4h on an i5-2520M) was 16GB.
Running `03-ghidra-fidb.sh` (which took 15min) resulted in a 6.6MB `fidb/el7.x86_64.fidb` file.
After repacking with `RepackFid.java`, the final size is 5.9MB.
Stats
Here are the stats for some of the Function ID datasets in https://github.com/threatrack/ghidra-fidb-repo:
.fidb | # .o | du .o | 02-ghidra-import.sh | du .gpr | 03-ghidra-fidb.sh | du .fidb | # Entries |
---|---|---|---|---|---|---|---|
el7.x86_64.fidb | 13036 | 195M | ~ 4h | ~ 16GB | ~ 15min | 6.6M | 57966 |
el7.i686.fidb | 12600 | 132M | ~ 8h | ~ 16GB | ~ 26min | 6.6M | 53823 |
el6.x86_64.fidb | 5695 | 53M | ~ 3h | ~ 8GB | ~ 3min | 2.2M | 16912 |
el6.i686.fidb | 5709 | 45M | ~ 2h | ~ 8GB | ~ 4min | 2.5M | 21612 |
(These are only ballpark figures, as the measurements may have been impacted by thermal throttling or concurrent tasks running on the system.)
Known issues
Program has different compiler spec than already established
If you receive an error like the following when running `05-ghidra-fidb.sh`:
ERROR REPORT SCRIPT ERROR: /home/user/github/threatrack/ghidra-fid-generator/ghidra_scripts/AutoCreateMultipleLibraries.java : Program x86_64cpuid.o has different compiler spec (windows) than already established (gcc) (HeadlessAnalyzer) java.lang.IllegalArgumentException: Program x86_64cpuid.o has different compiler spec (windows) than already established (gcc)
You can fix it by opening Ghidra, right-clicking the affected program in the project view (in this case `x86_64cpuid.o`), and changing its Language to `gcc` (or whatever the error says it should be).
The cause seems to be that Ghidra identified the compiler incorrectly on import and then complains about the mismatch when generating the `.fidb`.
You can use `ghidra_scripts/SearchFalseCspecsInPrograms.py` to search for programs in a project that do not match a desired compiler spec.
You can use `ghidra_scripts/SetCspecForPrograms.py` to automatically force a compiler spec for all programs under a root folder.
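If you have captured the output of `05-ghidra-fidb.sh` in a file, the affected programs can be listed from the error message format shown above. This is a sketch; `cspec_mismatches` is a hypothetical helper and the log file name is up to you.

```shell
# Hypothetical helper: list "program found-cspec expected-cspec" for
# every compiler spec mismatch reported in the given log file.
cspec_mismatches() {
    sed -n 's/.*Program \([^ ]*\) has different compiler spec (\([^)]*\)) than already established (\([^)]*\)).*/\1 \2 \3/p' "$1" |
    sort -u
}

# Usage: cspec_mismatches fidb-run.log
```

For the error shown above this would report `x86_64cpuid.o windows gcc`.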
The AutoImporter could not successfully load...
On libsodium there was a problem with the auto importer:
2019-10-06 16:13:15 ERROR (HeadlessAnalyzer) The AutoImporter could not successfully load /home/ghidra/ghidra-fid-generator/lib/libsodium/libsodium/1.0.17/stable-msvc/libsodium/Win32/Debug/v100/ltcg/libsodium/D/a/1/s/obj/libsodium/Win32/Debug/v100/ltcg/stream_salsa2012.obj with the provided import parameters. Please ensure that any specified processor/cspec arguments are compatible with the loader that is used during import and try again.
2019-10-06 16:13:15 ERROR (HeadlessAnalyzer) REPORT: Import failed for file: /home/ghidra/ghidra-fid-generator/lib/libsodium/libsodium/1.0.17/stable-msvc/libsodium/Win32/Debug/v100/ltcg/libsodium/D/a/1/s/obj/libsodium/Win32/Debug/v100/ltcg/stream_salsa2012.obj
However, only some files were affected, so the files that could not be imported are ignored for now.
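To see which files were skipped, the `Import failed for file:` lines can be pulled out of the headless log. This is a sketch based on the log lines shown above; `failed_imports` is a hypothetical helper.

```shell
# Hypothetical helper: print the path of every file for which the
# headless log reports "Import failed for file:".
failed_imports() {
    sed -n 's/.*REPORT: Import failed for file: //p' "$1"
}

# Usage: failed_imports lib/provider-name-headless.log
```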
TODO
- De-duplicate .o files. Going from one minor version to the next some .o files in a package don't change at all. Analyzing the same file multiple times wastes time.
- Re-do `el{6,7}` with new system.
- FIXME: libsodium exhibited the `AutoImporter could not successfully load...` error. Needs to be figured out and fixed.
- Add `ghidra_scripts/MergeFidb.py` to merge multiple Function ID datasets.