Introduction
This project analyzes open source projects for malware.
Due to high demand from the community, we decided to open source the code as it is now, to allow collaboration. Most of the code was last updated in May 2019, so some components may no longer work, especially those that depend on external tools (e.g. Sysdig, Airflow) or APIs (e.g. Npm).
We are actively working on testing and improvements. Please find the todo list here. For how to run commands, please refer to the howto section. For how to deploy on machines, please refer to the deploy instructions. For how to request access to the supply chain attack samples, please refer to the request instructions.
This repository is open sourced under the MIT license. If you find this repository helpful, please cite our paper:
@inproceedings{duan2021measuring,
title={Towards Measuring Supply Chain Attacks on Package Managers for Interpreted Languages},
author={Duan, Ruian and Alrawi, Omar and Kasturi, Ranjita Pai and Elder, Ryan and Saltaformaggio, Brendan and Lee, Wenke},
booktitle = {28th Annual Network and Distributed System Security Symposium, {NDSS}},
month = Feb,
year = {2021},
url = {https://www.ndss-symposium.org/wp-content/uploads/ndss2021_1B-1_23055_paper.pdf}
}
Prerequisite
Basics
- docker
- basic setup for ubuntu
sudo ./setup.sh
- for other OSes (e.g. macOS and Windows), please look at setup.sh and figure out the equivalent steps
Dependencies
- To test and run the project locally, you need dependencies. There are two ways to prepare dependencies
- build the maloss docker image and test inside it
- build docker image
sudo docker build -t maloss .
- re-build docker image without cache (used when re-building image)
sudo docker build -t maloss . --no-cache
- run the docker image and map your local source root to it
sudo docker run -it --rm -v $(pwd):/code maloss /bin/bash
- change to the mapped mounted source root and start making changes
cd /code
- install dependencies locally and test it
- the instructions are for Ubuntu 16.04. If they do not work on other systems, please fix them and commit the necessary changes. These instructions are simply copied from the Dockerfile; look into it for troubleshooting.
- for js and python static analysis development
pip install -r src/requirements.txt --user
- for the others (TODO: simplify this giant list)
sudo apt-get install -yqq curl php git ruby-full rubygems-integration nuget python python-pip python3-pip npm jq strace
sudo ./src/install_dep.sh
Development
Structure
- registries folder contains source code for mirroring package managers. To run the program, you would need 10TB for Npm, 5TB for PyPI and 5TB for RubyGems.
- src folder contains source code for static, dynamic and metadata analysis.
- main folder contains source code for dynamic orchestration.
- airflow folder contains source code for static Orchestration.
- sysdig folder contains setup and config for dynamic tracing.
- data contains honeypot setup and statistics.
- config contains config for static analysis.
- doc contains manually labeled APIs, which are used to derive config.
- testdata contains test samples.
- ref contains related work.
- benignware contains some benign packages.
- malware contains the list of malicious samples, which can be used for protection.
- maloss-samples is a private repo that contains the supply chain attack samples and is updated periodically. Please fill out the Google Form to request access. We will respond ASAP.
Instructions
- In this project, we currently use celery + rabbitmq to run our metadata and dynamic analyses in a distributed manner, and airflow + celery to run our static analyses.
- The src/ folder contains the code for each individual analysis and should be minimized and self-contained.
- In particular, for static/dynamic/metadata analysis, the jobs in the src/ folder should handle only one package and one version.
- Each individual analysis should be developed and contained in this folder.
- The main/ folder handles distributed computing for metadata and dynamic analyses.
- The master node loads the list of jobs (packages and their versions to analyze) and sends them to the rabbitmq broker.
- The slave nodes connect to the broker and fetch jobs from it.
- Each individual analysis may need to change .env in this folder.
- The airflow/ folder handles distributed computing for static analyses.
- The master node loads the DAG of jobs (packages connected by dependency relations) and sends them to the redis broker.
- The slave nodes connect to the broker and fetch jobs from it.
- Each individual analysis may need to change .env in this folder.
- In this project, we run each analysis using docker. The following steps show how to start or debug the distributed jobs for metadata and dynamic analyses.
- on worker
- create customized main/config from main/config.tmpl
- build docker image
sudo docker build -t maloss .
- re-build docker image without cache (used when re-building image)
sudo docker build -t maloss . --no-cache
- for testing, run docker image and attach to it
sudo docker run -it --rm --cap-add=SYS_PTRACE -v /tmp/result:/home/maloss/result -v /tmp/metadata:/home/maloss/metadata maloss /bin/bash
- for production, refer to DEPLOY.md
- on master
- create customized main/config from main/config.tmpl
- start rabbitmq
cd main && sudo docker-compose --compatibility -f docker-compose-master.yml up -d
- add jobs to the queue
python detector.py install -i ../data/pypi.csv
- debugging
- comment out the QUEUING = Celery line in main/config, and then the jobs should run locally and sequentially.
- the entry point for celery workers is main/celery_tasks.py and the entry point for master is main/detector.py.
- TODO: how to debug static analyses
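The master/worker split described above can be illustrated without a broker. The following is a minimal, standard-library-only sketch, not the project's Celery code: queue.Queue stands in for the rabbitmq broker, and analyze_one is a hypothetical placeholder for a job that handles exactly one package and one version.

```python
import queue
import threading


def analyze_one(name, version):
    """Hypothetical stand-in for a single-package, single-version job."""
    return "analyzed {}=={}".format(name, version)


def worker(jobs, results):
    """Fetch jobs from the shared queue until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        results.append(analyze_one(*job))


def run(job_list, num_workers=2):
    """Master side: start workers, enqueue all jobs, wait for completion."""
    jobs = queue.Queue()
    results = []
    threads = [threading.Thread(target=worker, args=(jobs, results))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for job in job_list:
        jobs.put(job)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return sorted(results)
```

Keeping analyze_one free of any queueing logic mirrors the advice that code in src/ stays self-contained, so the same function can also be called sequentially when debugging.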
HowTo
select_pm
- select the package managers to inspect based on num_pkg threshold
python main.py select_pm
select_pkg
- select popular packages based on specified criteria, such as downloads or uses
python main.py select_pkg ../data/pypi.with_stats.csv ../data/pypi.with_stats.popular.csv -n 10000
python main.py select_pkg ../data/maven.csv ../data/maven.popular.csv -n 10000 -f use_count
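As an illustration of what selecting the top N packages by a numeric field involves, here is a minimal sketch. It is not the project's implementation, and the column names name and downloads are assumptions about the CSV layout.

```python
import csv
import io


def select_top(rows, n, field="downloads"):
    """Return the n rows with the largest integer value in `field`."""
    return sorted(rows, key=lambda r: int(r[field]), reverse=True)[:n]


# Assumed CSV layout, for demonstration only.
sample = io.StringIO("name,downloads\nrequests,900\nleftpad,5\nnumpy,700\n")
top = select_top(list(csv.DictReader(sample)), n=2)
print([r["name"] for r in top])
```

Passing a different field value plays the same role as choosing another ranking criterion, such as a use count.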
crawl
- crawl the specified package manager and save the package names
python main.py crawl $package_manager $outfile
- crawl the specified package manager for package names, lookup download stats, and save to file
python main.py crawl $package_manager $outfile -s -p 24
edit_dist
- run edit distance for package names
python main.py edit_dist $source -t $target $outfile
python main.py edit_dist ../data/pypi.with_stats.csv ../data/edit_dist/pypi_edist_dist.out -a c_edit_distance_batch -p 16
python main.py edit_dist ../data/pypi.with_stats.popular.csv ../data/edit_dist/pypi_pop_vs_all.out -t ../data/pypi.with_stats.csv -a c_edit_distance_batch -p 16 --pair_outfile ../data/edit_dist/pypi_pop_vs_all.csv
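For readers unfamiliar with the metric, the sketch below shows a plain Levenshtein edit distance and how it can flag package names that sit close to a popular name. It is a pure-Python illustration only; the function names are hypothetical and it says nothing about how the c_edit_distance_batch algorithm is implemented.

```python
def edit_distance(a, b):
    """Levenshtein distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def near_misses(popular, candidates, max_dist=1):
    """Pairs (popular, candidate) that differ by 1..max_dist edits."""
    return [(p, c) for p in popular for c in candidates
            if 0 < edit_distance(p, c) <= max_dist]
```

Comparing every popular name against every candidate is quadratic, which is presumably why the commands above expose a parallelism option.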
download
- download tarball file using pip, link
pip download --no-binary :all: --no-deps package
- download tgz file using npm, link
npm pack package
- download php packages using composer
composer require -d ../testdata/php --prefer-source --no-scripts package
- download ruby packages using gem
gem fetch package
- download java packages using maven
mvn dependency:get -Dartifact=com.google.protobuf:protobuf-java:3.5.1 -Dtransitive=false && cp ~/.m2/repository/com/google/protobuf/protobuf-java/3.5.1/protobuf-java-3.5.1.jar ./
get_versions
- run get_versions job to get major versions for list of packages
python main.py get_versions ../data/pypi.with_stats.popular.csv ../data/pypi.with_stats.popular.versions.csv -l python -c /data/maloss/info/python
python main.py get_versions ../data/maven.popular.csv ../data/maven.popular.versions.csv -c /data/maloss/info/java -l java
- run get_versions job to get all versions for list of packages
python main.py get_versions ../data/2019.07/pypi.csv ../data/2019.07/pypi.versions.csv -c /data/maloss/info-2019.07/python -l python --max_num -1
- run get_versions job to get all versions for list of packages and include their time as well
python main.py get_versions ../data/2019.07/pypi.csv ../data/2019.07/pypi.versions.csv -c /data/maloss/info-2019.07/python -l python --max_num -1 --with_time
- run get_versions job to get recent versions for list of packages
python main.py get_versions ../data/2019.07/pypi.csv ../data/2019.07/pypi.versions.csv -c /data/maloss/info-2019.07/python -l python --max_num 100 --min_gap_days 1
get_author
- run get_author job to get the author for list of packages
python main.py get_author ../data/pypi.with_stats.popular.csv ../data/pypi.with_stats.with_author.popular.csv -l python -c /data/maloss/info/python
get_dep
- run get_dep job to list dependencies for python packages
python main.py get_dep -l python -n protobuf -c ../testdata
python main.py get_dep -l python -n scrapy -c ../testdata
- run get_dep job to list dependencies for javascript packages
python main.py get_dep -l javascript -n eslint -c ../testdata
- run get_dep job to list dependencies for ruby packages
python main.py get_dep -l ruby -n protobuf -c ../testdata
- run get_dep job to list dependencies for php packages
python main.py get_dep -l php -n designsecurity/progpilot -c ../testdata
- run get_dep job to list dependencies for java packages
python main.py get_dep -l java -n com.google.protobuf/protobuf-java -c ../testdata
get_stats
- get the stats for specified packages
python main.py get_stats ../malware/npmjs-mal-pkgs.june2019.txt ../malware/npmjs-mal-pkgs.june2019.with_stats.txt.new -m npmjs
- get the stats for specified packages
python main.py get_stats ../malware/pypi-mal-pkgs.txt ../malware/pypi-mal-pkgs.with_stats.txt -m pypi
build_dep
- build the dependency graph
python main.py build_dep -c /data/maloss/info/python -l python ../data/pypi.with_stats.csv ../airflow/data/pypi.with_stats.dep_graph.pickle
- build the dependency graph with versions (the --record_version option)
python main.py build_dep -c /data/maloss/info/python -v -l python ../data/pypi.with_stats.popular.versions.csv ../airflow/data/pypi.with_stats.popular.versions.dep_graph.pickle
build_author
- build the author package graph for popular packages in pypi/npmjs/rubygems/packagist
python main.py build_author ../data/author_pkg_graph.popular.pickle -i ../data/pypi.with_stats.with_author.popular.csv ../data/npmjs.with_stats.with_author.popular.csv ../data/rubygems.with_stats.with_author.popular.csv ../data/packagist.with_stats.with_author.popular.csv -l python javascript ruby php -t ../data/top_authors.popular.json
- build the author package graph for all packages in pypi/npmjs/rubygems/packagist/maven
python main.py build_author ../data/author_pkg_graph.pickle -i ../data/pypi.with_stats.with_author.csv ../data/npmjs.with_stats.with_author.csv ../data/rubygems.with_stats.with_author.csv ../data/packagist.with_stats.with_author.csv ../data/maven.with_author.csv -l python javascript ruby php java -t ../data/top_authors.json
split_graph
- split the dependency graph
- unzip the pickle files first
tar -zxf ../airflow/data/pypi.with_stats.dep_graph.pickle.tgz
- split into N copies
python main.py split_graph ../airflow/data/pypi.with_stats.dep_graph.pickle ../airflow/pypi_dags/ -d ../airflow/data/pypi_static.py -n 20
python main.py split_graph ../airflow/data/pypi.with_stats.popular.versions.dep_graph.pickle ../airflow/pypi_version_dags/ -d ../airflow/data/pypi_static_versions.py -n 10
python main.py split_graph ../airflow/data/maven.dep_graph.pickle ../airflow/maven_dags/ -d ../airflow/data/maven_static.py -n 20
- split into N copies and K folders
python main.py split_graph ../airflow/data/maven.popular.versions.dep_graph.pickle.tgz ../airflow/maven_version_dags/ -d ../airflow/data/maven_static_versions.py -n 80 -k 4
- split out the subgraph that contains seed nodes
python main.py split_graph ../airflow/data/pypi.with_stats.dep_graph.pickle ../airflow/pypi_dags/ -d ../airflow/data/pypi_static.py -s ../data/pypi.with_stats.popular.csv
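One plausible reading of splitting out the subgraph that contains seed nodes is to keep each seed together with everything it transitively depends on. The sketch below illustrates that reading only; the dict-of-lists graph representation and the function name are assumptions, and the actual split_graph job may define the subgraph differently.

```python
from collections import deque


def seed_subgraph(dep_graph, seeds):
    """Keep the seeds plus every package reachable via dependency edges.

    dep_graph maps a package name to the list of packages it depends on.
    """
    keep = set()
    todo = deque(s for s in seeds if s in dep_graph)
    while todo:
        pkg = todo.popleft()
        if pkg in keep:
            continue
        keep.add(pkg)
        todo.extend(d for d in dep_graph.get(pkg, ()) if d not in keep)
    return {p: list(dep_graph.get(p, ())) for p in keep}
```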
install
- run install job to install python packages and capture traces
python main.py install -n protobuf -l python -c ../testdata -o ../testdata
- run install job to install javascript packages and capture traces
python main.py install -n eslint -l javascript -c ../testdata -o ../testdata
- run install job to install ruby packages and capture traces
python main.py install -n protobuf -l ruby -c ../testdata -o ../testdata
- run install job to install php packages and capture traces
python main.py install -n designsecurity/progpilot -l php -c ../testdata -o ../testdata
- run install job to install java packages and capture traces
python main.py install -n com.google.protobuf/protobuf-java -l java -c ../testdata -o ../testdata
astgen
- run astgen job to compute ast for python and python3 packages
python main.py astgen ../testdata/test-eval-exec.py ../testdata/test-eval-exec.py.out -c ../config/test_astgen_python.config
python main.py astgen ../testdata/html5lib-1.0.1.tar.gz ../testdata/html5lib-1.0.1.tar.gz.out -c ../config/test_astgen_python.config
python main.py astgen ../testdata/python-taint-0.40.tar.gz ../testdata/python-taint-0.40.tar.gz.out -c ../config/test_astgen_python.config
- run astgen job to compute ast for javascript packages
python main.py astgen ../testdata/test-eval.js ../testdata/test-eval.js.out -c ../config/test_astgen_javascript.config -l javascript
python main.py astgen ../testdata/urlgrey-0.4.4.tgz ../testdata/urlgrey-0.4.4.tgz.out -c ../config/test_astgen_javascript.config -l javascript
- run astgen job to compute ast for php packages
cd static_proxy && php astgen.php -c ../../config/test_astgen_php.config.bin -i ../../testdata/test-eval-exec.php -o ../../testdata/test-eval-exec.php.out.bin && cd ..
python main.py astgen ../testdata/test-eval-exec.php ../testdata/test-eval-exec.php.out -c ../config/test_astgen_php.config -l php
python main.py astgen ../testdata/test-backtick.php ../testdata/test-backtick.php.out -c ../config/test_astgen_php.config -l php
python main.py astgen ../testdata/php/vendor/guzzlehttp/guzzle/ ../testdata/guzzlehttp_guzzle.out -c ../config/test_astgen_php.config -l php
- run astgen job to compute ast for ruby packages
cd static_proxy && ruby astgen.rb -c ../../config/test_astgen_ruby.config.bin -i ../../testdata/test-eval.rb -o ../../testdata/test-eval.rb.out.bin && cd ..
python main.py astgen ../testdata/test-eval.rb ../testdata/test-eval.rb.out -c ../config/test_astgen_ruby.config -l ruby
- run astgen job to compute ast for java packages
cd static_proxy/astgen-java && java -jar target/astgen-java-1.0.0-jar-with-dependencies.jar -help && cd ../../
cd static_proxy/astgen-java && java -jar target/astgen-java-1.0.0-jar-with-dependencies.jar -inpath ../../../testdata/Test.jar -outfile ../../../testdata/Test.jar.out -intype JAR -config ../../../config/astgen_java_smt.config -process_dir ../../../testdata/Test.jar && cd ../../
python main.py astgen ../testdata/protobuf-java-3.5.1.jar ../testdata/protobuf-java-3.5.1.jar.out -c ../config/test_astgen_java.config -l java
python main.py astgen ../testdata/Test.jar ../testdata/Test.jar.out -c ../config/astgen_java_smt.config -l java
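To give a feel for what an AST-based API lookup does, here is a minimal sketch using Python's built-in ast module. It only finds direct calls to a few named functions and is not the astgen job itself; find_calls and its default names are illustrative choices.

```python
import ast


def find_calls(source, names=("eval", "exec")):
    """Return (name, line) for each direct call to one of `names`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in names):
            hits.append((node.func.id, node.lineno))
    return sorted(hits, key=lambda h: h[1])


code = "import os\nx = eval('1 + 1')\nexec('print(x)')\n"
print(find_calls(code))  # [('eval', 2), ('exec', 3)]
```

The sketch matches only bare names, so attribute calls such as os.system would need an extra check on ast.Attribute nodes.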
astfilter
- use the configs titled ../config/astgen_XXX_smt.config for each language (e.g. ../config/astgen_javascript_smt.config) in astfilter job
- run astfilter job to evaluate api usage for python/pypi package and its dependent packages
python main.py astfilter -n protobuf -c $python_config -d ../testdata/ -o ../testdata/
- run astfilter job to evaluate api usage for javascript/npmjs package and its dependent packages
python main.py astfilter -n eslint-scope -c $javascript_config -d ../testdata/ -o ../testdata/ -l javascript
- run astfilter job to evaluate api usage for php/packagist package and its dependent packages
python main.py astfilter -n designsecurity/progpilot -c $php_config -d ../testdata/ -o ../testdata/ -l php
- run astfilter job to evaluate api usage for ruby/rubygems package and its dependent packages
python main.py astfilter -n protobuf -c $ruby_config -d ../testdata/ -o ../testdata -l ruby
- run astfilter job to evaluate api usage for java/maven package and its dependent packages
python main.py astfilter -n com.google.protobuf/protobuf-java -c $java_config -d ../testdata/ -o ../testdata -l java
taint
- run taint analysis for specific packages
python main.py taint -n json -d /data/maloss/info/ruby -o /data/maloss/result/ruby -l ruby -c ../config/astgen_ruby_smt.config
- run taint analysis for specific packages and ignore their dependencies
python main.py taint -n urllib -i ../malware/pypi-samples/urllib-1.21.1.tgz -d /data/maloss/info/python -o ./ -l python -c ../config/astgen_python_smt.config
python main.py taint -n django-server -i ../malware/pypi-samples/django-server-0.1.2.tgz -d /data/maloss/info/python -o ./ -l python -c ../config/astgen_python_smt.config
pip download --no-binary :all: --no-deps trustme && python main.py taint -n trustme -i trustme-0.5.1.tar.gz -d /data/maloss/info/python -o ./ -l python -c ../config/astgen_python_smt.config
python main.py taint -n eslint-scope -i ../malware/npmjs-samples/eslint-scope-3.7.2.tgz -d /data/maloss/info/javascript -o ./ -l javascript -c ../config/astgen_javascript_smt.config
python main.py taint -n custom8 -i static_proxy/jsprime/jsprimetests/custom8.js -d /data/maloss/info/javascript -o ./ -l javascript -c ../config/astgen_javascript_smt.config
python main.py taint -n stream-combine -i ../malware/npmjs-samples/stream-combine-2.0.2.tgz -d /data/maloss/info/javascript -o ./ -l javascript -c ../config/astgen_javascript_smt.config
python main.py taint -n test-eval-exec -i ../testdata/test-eval-exec.php -d /data/maloss/info/php -o ./ -l php -c ../config/astgen_php_smt.config
python main.py taint -n test-multiple-flows -i static_proxy/progpilot/projects/tests/tests/flows/ -d /data/maloss/info/php -o ./ -l php -c ../config/astgen_php_smt.config
python main.py taint -n test-flow -i ../testdata/test-flow.php -d /data/maloss/info/php -o ./ -l php -c ../config/astgen_php_smt.config
- run taint analysis for specific input file
python main.py taint -n active-support -l ruby -c ../config/astgen_ruby_smt.config -i ../malware/rubygems-samples/active-support-5.2.0.gem -o ./
python main.py taint -n bootstrap-sass -l ruby -c ../config/astgen_ruby_smt.config -i ../malware/rubygems-samples/bootstrap-sass-3.2.0.3.gem -o ./
python main.py taint -n brakeman-rails4 -l ruby -c ../config/astgen_ruby_smt.config -i ../testdata/rails4/ -o ./
filter_pkg
- filter packages based on the api usage or flow presence
python main.py filter_pkg ../data/pypi.with_stats.csv ../data/pypi.with_stats.with_taint_apis.csv -c ../config/astgen_python_taint_apis.config -o /data/maloss/result/python -d /data/maloss/info/python -l python
python main.py filter_pkg ../data/rubygems.with_stats.csv ../data/rubygems.with_stats.with_taint_apis.csv -c ../config/astgen_ruby_taint_apis.config -o /data/maloss/result/ruby -d /data/maloss/info/ruby -l ruby
python main.py filter_pkg ../data/npmjs.with_stats.csv ../data/npmjs.with_stats.with_taint_apis.csv -c ../config/astgen_javascript_taint_apis.config -o /data/maloss/result/javascript -d /data/maloss/info/javascript -l javascript
python main.py filter_pkg ../data/packagist.with_stats.csv ../data/packagist.with_stats.with_taint_apis.csv -c ../config/astgen_php_taint_apis.config -o /data/maloss/result/php -d /data/maloss/info/php -l php
python main.py filter_pkg ../data/maven.csv ../data/maven.with_taint_apis.csv -c ../config/astgen_java_taint_apis.config -o /data/maloss/result/java -d /data/maloss/info/java -l java
static
- run static job to perform astfilter, taint and danger analysis for python and python3 packages
python main.py static -n protobuf -c $python_config -d ../testdata/ -o ../testdata/
dynamic
- run dynamic job to install, main and exercise python packages and capture traces
python main.py dynamic -n protobuf -l python -c ../testdata -o ../testdata
interpret_trace
- run interpret trace job to parse dynamic traces and dump them into per pkg/version protobuf output files
- NOTE: sudo is needed for starting falco to parse traces
sudo python main.py interpret_trace -l python --trace_dir /data/maloss1/sysdig/pypi -c /data/maloss/info/python -o /data/maloss/result/python -p 8
compare_ast
- compare the ast of specified input files and packages for permissions, apis etc.
python main.py compare_ast -i ../malware/npmjs-samples/flatmap-stream-0.1.1.tgz ../benignware/npmjs-samples/flatmap-stream-0.1.0.tgz -o ../testdata/ ../testdata/flatmap-stream.json -l javascript -c ../config/astgen_javascript_smt.config
python main.py compare_ast -i ../testdata/test-backtick.php ../testdata/test-eval-exec.php -o tempout/ tempout/test_eval_backtick.json -l php -c ../config/astgen_php_smt.config
python main.py compare_ast -i ../malware/rubygems-samples/bootstrap-sass-3.2.0.3.gem ../benignware/rubygems-samples/bootstrap-sass-3.2.0.2.gem -l ruby -c ../config/astgen_ruby_smt.config -o ../testdata/ --outfile ../testdata/bootstrap-sass-compare.txt
python main.py compare_ast -i ../malware/rubygems-samples/active-support-5.2.0.gem ../benignware/rubygems-samples/activesupport-5.2.3.gem -c ../config/astgen_ruby_smt.config -o ../testdata/ --outfile ../testdata/activesupport-compare.txt -l ruby
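The idea of comparing two versions by the APIs they use can be sketched as a set difference over called names. This is a simplified, Python-only illustration with hypothetical function names; the real compare_ast job works from the configured API lists and supports several languages.

```python
import ast


def called_names(source):
    """Collect names of directly called functions, e.g. `eval` in eval(x)."""
    return {node.func.id for node in ast.walk(ast.parse(source))
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)}


def compare_calls(old_source, new_source):
    """Report which called names were added or removed between versions."""
    old, new = called_names(old_source), called_names(new_source)
    return {"added": sorted(new - old), "removed": sorted(old - new)}


old = "def main():\n    print('hi')\n"
new = "def main():\n    print('hi')\n    eval(input())\n"
print(compare_calls(old, new))  # {'added': ['eval', 'input'], 'removed': []}
```

A newly added sensitive call in a minor release is the kind of signal that makes a version worth a closer look.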
filter_versions
- filter package versions based on compare_ast results, to allow further analysis such as taint analysis
python main.py filter_versions ../data/2019.07/packagist.versions.with_time.csv ../data/2019.07/packagist_ast_stats.apis.json ../data/2019.07/packagist.versions.with_time.filtered_loose_apis.csv
compare_hash
- compare the hash value of same package versions across different package managers
python main.py compare_hash -i ../data/maven.csv ../data/jcenter.csv -d /data/maloss/info/java /data/maloss/info/jcenter -o ../data/maven_jcenter.json
python main.py compare_hash -i ../data/jitpack.csv ../data/jcenter.csv -d /data/maloss/info/jitpack /data/maloss/info/jcenter -o ../data/jitpack_jcenter.json
- compare the hash value of same package versions and their content hashes or api permissions across different package managers
python main.py compare_hash -i ../data/jitpack.csv ../data/jcenter.csv -d /data/maloss/info/jitpack /data/maloss/info/jcenter -o ../data/jitpack_jcenter_filtered.json --inspect_content
python main.py compare_hash -i ../data/jitpack.csv ../data/jcenter.csv -d /data/maloss/info/jitpack /data/maloss/info/jcenter -o ../data/jitpack_jcenter_filtered.json --inspect_api -c ../config/astgen_java_smt.config
python main.py compare_hash -i ../data/jitpack.csv ../data/jcenter.csv -d /data/maloss/info/jitpack /data/maloss/info/jcenter -o ../data/jitpack_jcenter_filtered_api.json --inspect_api -c ../config/astgen_java_smt.config --compare_hash_cache ../data/jitpack_jcenter_filtered.json
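The core check, whether the same package version has identical bytes in two registries, can be sketched as follows. SHA-256 and the in-memory dict layout are assumptions made for illustration; the source does not state which hash compare_hash uses.

```python
import hashlib


def sha256_of(data):
    """Hex SHA-256 digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def mismatched(registry_a, registry_b):
    """Package versions present in both registries with different digests.

    Each registry maps (name, version) -> artifact bytes.
    """
    shared = set(registry_a) & set(registry_b)
    return sorted(key for key in shared
                  if sha256_of(registry_a[key]) != sha256_of(registry_b[key]))
```

Versions that exist in only one registry are ignored here, since there is nothing to compare them against.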
interpret_result
- collect and plot api stats
python main.py interpret_result --data_type api -c /data/maloss/info/python -o /data/maloss/result/python -l python ../data/2019.01/pypi.with_stats.csv ../data/pypi_api_stats.json
python main.py interpret_result --data_type api -c /data/maloss/info/python -o /data/maloss/result/python -l python ../data/2019.01/pypi.with_stats.popular.csv ../data/pypi_pop_api_stats.json
python main.py interpret_result --data_type api -c /data/maloss/info/python -o /data/maloss/result/python -l python ../data/2019.01/pypi.with_stats.csv ../data/pypi_api_mapping.json -d --detail_filename
- collect and plot domain stats
python main.py interpret_result --data_type domain -c /data/maloss/info/python -o /data/maloss/result/python -l python ../data/2019.06/pypi.csv ../data/2019.06/pypi_domain_stats.json
python main.py interpret_result --data_type domain -c /data/maloss/info/python -o /data/maloss/result/python -l python ../data/2019.06/pypi.csv ../data/2019.06/pypi_domain_mapping.json -d
- collect the pre-generated dependency stats
python main.py interpret_result --data_type dependency -l python ../data/pypi.with_stats.popular.csv ../data/pypi_pop_dep_stats.json
- collect the cross version comparison results, can filter by permissions, apis etc.
python main.py interpret_result --data_type compare_ast -c /data/maloss/info/python -o /data/maloss/result/python -l python ../data/2019.06/pypi.with_stats.popular.csv ../data/2019.06/pypi_compare_ast_stats.json
python main.py interpret_result --data_type compare_ast -c /data/maloss/info-2019.07/javascript -o /data/maloss/result-2019.07/javascript -l javascript ../data/2019.07/npmjs.csv ../data/2019.07/npmjs_ast_stats.json --compare_ast_options_file ../data/2019.07/compare_ast_options.json
- collect metadata/static/dynamic results and dump suspicious packages
python main.py interpret_result --data_type install_with_network -c /data/maloss/info/javascript -o /data/maloss/result/javascript -l javascript -m npmjs ../data/2019.06/npmjs.csv ../data/2019.06/npmjs.install_with_network.json
- collect the reverse dependency results
python main.py interpret_result --data_type reverse_dep -l javascript -m npmjs ../airflow/data/high_impact.csv ../airflow/data/high_impact_npmjs.json
python main.py interpret_result --data_type reverse_dep -l python -m pypi ../airflow/data/high_impact.csv ../airflow/data/high_impact_pypi.json
python main.py interpret_result --data_type reverse_dep -l ruby -m rubygems ../airflow/data/high_impact.csv ../airflow/data/high_impact_rubygems.json
- collect metadata/static/compare_ast results and dump suspicious packages
python main.py interpret_result --data_type correlate_info_api_compare_ast -c /data/maloss/info-2019.07/javascript -o /data/maloss/result-2019.07/javascript -l javascript -m npmjs -s ../data/2019.07/npmjs_skip_list.json ../data/2019.07/npmjs_ast_stats.json ../data/2019.07/npmjs_correlate_info_api_compare_ast.json
python main.py interpret_result --data_type correlate_info_api_compare_ast -c /data/maloss/info-2019.07/php -o /data/maloss/result-2019.07/php -l php -m packagist -s ../data/2019.07/packagist_skip_list.json ../data/2019.07/packagist_ast_stats.json ../data/2019.07/packagist_correlate_info_api_compare_ast.json
python main.py interpret_result --data_type taint -c /data/maloss/info-2019.07/php -o /data/maloss/result-2019.07/php -l php ../data/2019.07/packagist.csv ../data/2019.07/packagist_flow_stats.json
grep_pkg
- grep through packages
python main.py grep_pkg ../data/2019.07/rubygems.csv ../data/2019.07/rubygems.csv.pastebin.com pastebin.com -l ruby -p 80
python main.py grep_pkg ../data/2019.07/npmjs.csv ../data/2019.07/npmjs.csv.pastebin.com pastebin.com -l javascript -p 20
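Searching package contents for a string such as a domain name can be sketched with the standard tarfile module. This is a minimal illustration for tar-based archives only; grep_tarball and make_tarball are hypothetical helpers, not part of the project.

```python
import io
import tarfile


def grep_tarball(fileobj, needle):
    """Return names of regular files in a tarball whose bytes contain needle."""
    matches = []
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            data = tar.extractfile(member).read()
            if needle.encode() in data:
                matches.append(member.name)
    return matches


def make_tarball(files):
    """Build an in-memory .tar.gz from {name: bytes}, for demonstration."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf
```

Reading members directly from the archive avoids extracting untrusted packages to disk.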
speedup
- measure the speedup benefits from summaries
python main.py speedup ../data/2019.01/pypi.with_stats.popular.csv speedup.log -l python
Tool
Internet-wide scanning
Statistics for different package managers
- PyPi stats
- PyPi stats of packages
- NpmJS stats
- RubyGems stats
- Nuget stats
- Packagist stats
- Maven stats (used by other packages)
Static analysis tools for different languages
- List summary
- 13 tools for checking the security risk of open-source dependencies
- Awesome Malware Analysis: a curated list of awesome malware analysis tools and resources
- A curated list of linters, code quality checkers, and other static analysis tools for various programming languages
- PMD: An extensible cross-language static code analyzer. https://pmd.github.io
- Python
- Php
- Ruby
- Check for Ruby security problems
- Locking Ruby in the Safe
- A static analysis security vulnerability scanner for Ruby on Rails applications, github
- Dawn is a static analysis security scanner for ruby written web applications. It supports Sinatra, Padrino and Ruby on Rails frameworks.
- Quality is a tool that runs quality checks on your code using community tools
- NpmJS
- 6 Tools to Scan Node.js Application for Security Vulnerability
- node security platform command-line tool https://nodesecurity.io
- a javascript static security analysis tool
- JSHint is a tool that helps to detect errors and potential problems in your JavaScript code
- A First Look at Firefox OS Security
- WALA is slow
- JSPrime is also capable of performing dataflow analysis, but the architecture is extremely difficult to extend.
- ScanJS, written in-house by Mozilla, is closest in spirit to our own.
- NodeJsScan is a static security code scanner for Node.js applications.
- Flow is a static type checker for JavaScript
- JSFlow is a security-enhanced JavaScript interpreter for fine-grained tracking of information flow.
- A tool for studying JavaScript malware.
- A Javascript malware analysis tool
- Scalable Analysis Framework for ECMAScript
- DEPRECATED: Static analysis tool for javascript code.
- Analyzing JavaScript and the Web with WALA
- Jsunpack: jsunpack-n emulates browser functionality when visiting a URL
- Collection of almost 40.000 javascript malware samples
- JSAI: a static analysis platform for JavaScript
- Static analysis of event-driven Node.js JavaScript applications
- Dynamic analysis framework for JavaScript
- Java
- CSharp
- Dependency management tools
- Dynamic analysis
- Analysis framework
AST parsers for different languages
- Python AST parser, use ast.parse
- JavaScript AST parser, use Esprima
- Estree: JavaScript Parser api specifications
- Answers refer to SpiderMonkey and Esprima
- SpiderMonkey
- Esprima, Esprima comparison
- Acorn, Acorn vs Esprima
- Babel compiler, based on acorn
- Python port of Esprima
- How npm handles the "scripts" field
- Node.js API specification
- Javascript Standard objects by category
- Ruby AST parser
- Java AST parser
- C# AST parser
- Php AST parser
- C/C++ AST parser
Resource
- Taobao mirror of NPM
- Stanford mirror of pypi
- Mirrors of registries in China
- Keeping The npm Registry Awesome: How npmjs works?
- Query npmjs registry via api
- NPM search with history versions
- numeric precision matters: how npm download counts work
- npmjs api documents
- Snyk's CLI helps you find and fix known vulnerabilities in your dependencies, both ad hoc and as part of your CI system
- Using the European npm mirror
- What I learned from analysing 1.65M versions of Node.js modules in NPM
- Archive.org snapshots websites and can be used for measuring victim websites
- Event Tracing for Windows (ETW)
- Linux Audit
- Strace
- Strace Analyzer
- python-ptrace is a Python binding of ptrace library
- pystrace -- Python tools for parsing and analysing strace output files
- analyzes strace output
- Profiling and visualizing with GNU strace
- Structured output for strace
- like strace, but for ruby code
- pytrace is a fast python tracer. it records function calls, arguments and return values.
- php-strace helps to track down segfaults in running php processes
- How to set strace output characters string width to be longer?
- the -s option specifies the maximum string size to print
- the -v option prints unabbreviated argv, stat, termios, etc. args
- Google Summer of Code for Strace output
- DTrace for linux
- osquery: Process and socket auditing with osquery
- Information collected by osquery (tables)
- A curated list of tools and resources for security incident response, aimed to help security analysts and DFIR teams.
- 1st: How are teams currently using osquery?
- 2nd: What are the current pain points of osquery?
- 3rd: What do you wish osquery could do?
- Kolide Cloud is an endpoint monitoring solution which leverages and instruments Facebook’s open-source osquery project. Try it today; completely free for your first 10 devices.
- Kolide fleet for monitoring osquery machines
- Docker support in OSQuery
- Dockerfiles for containerized osquery
- Uptycs: Securing Containers: Using osquery to Solve New Challenges Posed by Hosted Orchestration Services
- Uptycs: Docker and osquery
- osquery For Security: Introduction to osquery — Part 1
- osquery for Security — Part 2
- osquery—Windows, macOS, Linux Monitoring and Intrusion Detection
- Docker and osquery
- Intro to Osquery: Frequently Asked Questions for Beginners
- osquery configuration from Palantir
- sysdig: Linux system exploration and troubleshooting tool with first class support for containers
- SELinux, Seccomp, Sysdig Falco, and you: A technical discussion
- Prometheus Monitoring and Sysdig Monitor: A Technical Comparison
- Day 3 - So Server, tell me about yourself. An introduction to facter, osquery and sysdig
Whereas Facter and osquery are predominantly about querying infrequently changing information, Sysdig is much more suited to working with real-time data streams – for example, network or file I/O, or tracking errors in running processes.
- Container Monitoring: Prometheus and Grafana Vs. Sysdig and Sysdig Monitor
- Container monitoring with Sysdig
- Sysdig user guide
- Sysdig Falco
- Sysdig Falco rules
- Detecting Cryptojacking with Sysdig’s Falco
- Sysdig + logstash + elasticsearch
- Sysdig + ELK (potential)
- Sending Kubernetes & Docker events to Elasticsearch and Splunk using Sysdig
- Runtime Container Security – How to Implement Open Source Container Security
- WTF my container just spawned a shell
- go-auditd
- The Prometheus monitoring system and time series database.
- kubernetes: Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications.
- facter: Collect and display system facts
- Find exploitable PHP files by parameter fuzzing and function call tracing
- An OS X analyzer for Cuckoo Sandbox project
- Cuckoo Sandbox is the leading open source automated malware analysis system
- Native libraries with Maven
- Maven: Bundling and Unpacking Native Libraries
- Native ARchive plugin for Maven
- Elastic file system (EFS) mount outside of AWS
- Amazon EFS Update – On-Premises Access via Direct Connect
- The Go Programming Language
- Ruby (finally) gains in popularity, but Go plateaus
- Top languages: Java, C, C++, Python, C#, PHP, JavaScript, Ruby
- Static analysis references
- IBM AppScan allows scanning source/compiled code for vulnerabilities
- Restricted execution
- Sandboxed Python
- Ruby sandboxing vs. integrating a scripting language
- Jailed — flexible JS sandbox
- Is It Possible to Sandbox JavaScript Running In the Browser?
- Is there a way to execute php code in a sandbox from within php
- Runkit_Sandbox
- Sandboxing Java Code
- Execute a method in Java with restricted permissions
- AWS Batch Jobs
- Airflow DAG
- Security advisories
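The strace options noted in the list above (-s for the maximum string size, -v for unabbreviated arguments) can be combined into a single invocation. The following is a minimal sketch, not this project's actual tracing setup: the helper name `build_strace_command`, the output file name, and the default string length are assumptions chosen for illustration. It only builds the argument list, so it can be inspected without strace installed.

```python
import shlex


def build_strace_command(target_cmd, out_file="trace.log", max_str_len=1024):
    """Return an argv list that traces target_cmd with long, unabbreviated output.

    build_strace_command, out_file and max_str_len are hypothetical names and
    defaults used only for this sketch.
    """
    return [
        "strace",
        "-f",                    # follow child processes created by the target
        "-s", str(max_str_len),  # maximum string size to print
        "-v",                    # print unabbreviated argv, stat, termios, etc.
        "-o", out_file,          # write the trace to a file instead of stderr
        *target_cmd,             # the command to trace, with its own arguments
    ]


if __name__ == "__main__":
    # Hypothetical target command; replace with whatever should be traced.
    argv = build_strace_command(["pip", "install", "example-package"])
    print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run` on a machine where strace is available.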
Reference
- module counts
- typo-squatting website
- typo-squatting thesis
- Debian popcon (popularity contest)
- PyPI packages found to be malicious
- Python Typo Squatting
- PHP Typo Squatting
- JCenter Typo Squatting
- Rubygems typosquatting
- HUNTING MALICIOUS NPM PACKAGES
- Malicious npm packages
- list of all PyPI packages
- Crossenv malware on the npm registry
- Open source packages with malicious intent
- HOW TO TAKE OVER THE COMPUTER OF ANY JAVA (OR CLOJURE OR SCALA) DEVELOPER
- Security Corner with Snyk: Top Six Vulnerabilities in Maven and npm
- Another Linux distro poisoned with malware
- NodeJS: Remote Code Execution as a Service
- 17 Backdoored Docker Images Removed From Docker Hub
- Backdoored Python Library Caught Stealing SSH Credentials
- eslint-scope is the ECMAScript scope analyzer used in ESLint. Version 3.7.2 was identified as malicious after a possible npm account takeover. Installing the malicious package would lead to leaking the user's npm token.
- npm Acquires ^Lift Security and the Node Security Platform
- analyze pip ssh-decorate supply-chain attack
- Malicious .jar Files Hosted On Google Code
- Dissection of a Java Malware (JRAT)
- The packages potentially affected by eslint-scope
- Malicious Modules — what you need to know when installing npm packages
- Twelve malicious Python libraries found and removed from PyPI
- Malware packages on PyPI
- Plot to steal cryptocurrency foiled by the npm security team
- Vulnerability Discovered In Komodo’s Agama Wallet – This Is What You Need To Do
- PyPI malware packages
- Report projects that damage other packages, don't adhere to guidelines, or are malicious
- Collection of PHP backdoors
- Collection of Windows malware
- Snakes in the grass! Malicious code slithers into Python PyPI repository
- Cryptojacking invades cloud. How modern containerization trend is exploited by attackers
- This is the list of all packages found by @malicious-packages/core and removed from the repository by the npm team
- First Top 10 Risks for Applications Built on Serverless Architectures Research by PureSec Released
- Exploiting Developer Infrastructure Is Ridiculously Easy
- Javascript static + dynamic analysis
- PHP backdoor obfuscation techniques
- PHP obfuscation techniques
- Understanding Obfuscated Code & How to Deobfuscate PHP and JavaScript
- Joomla Plugin Constructor Backdoor
- A confusing dependency
- Exposed Docker Control API and Community Image Abused to Deliver Cryptocurrency-Mining Malware
- Malicious remote code execution backdoor discovered in the popular bootstrap-sass Ruby gem
- Backdoor in Captcha Plugin Affects 300K WordPress Sites
- Backdoor found in Webmin, a popular web-based utility for managing Unix servers
- POLA Would Have Prevented the Event-Stream Incident
- Cryptojacking Criminals Are Using Multiple Techniques to Install Coinminers
- Google Analytics and Angular in Magento Credit Card Stealing Scripts
- PSA: There is a fake version of this package on PyPI with malicious code
- Typosquatting barrage on RubyGems software repository users
- The official PyPI repository hit by poisoning with the malicious request package
- SourMint: malicious code, ad fraud, and data leak in iOS
- Dependency Hijacking Software Supply Chain Attack Hits More Than 35 Organizations
- Dependency Confusion: How I Hacked Into Apple, Microsoft and Dozens of Other Companies
  - for pip, when --extra-index-url mixes internal and external packages, the candidate with the higher version number is chosen
    - index-url extra-index-url install priority order
    - index-url extra-index-url install priority order - contd
    - pywheels for Raspberry Pi
  - for gem, gem install --source
- Package name squatting: cupy-cuda112
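The pip note under the dependency confusion entry above says that, with --extra-index-url, the version with the higher number is chosen regardless of which index serves it. The sketch below is a deliberately simplified model of that selection rule, not pip's actual resolver: the function names, the index labels, and the version numbers are all hypothetical, and only purely numeric dotted versions are handled.

```python
def parse_version(text):
    """Parse a purely numeric dotted version such as '1.2.10' into a tuple.

    Simplification: pre-releases, epochs and local versions are not handled.
    """
    return tuple(int(part) for part in text.split("."))


def pick_candidate(candidates):
    """Return the (index_name, version) pair with the highest version.

    The index a candidate came from plays no role in the choice, which is
    the behaviour the note above describes.
    """
    return max(candidates, key=lambda item: parse_version(item[1]))


# Hypothetical candidates for one package name served by two indexes.
candidates = [
    ("internal", "1.2.0"),   # the organisation's private build
    ("public", "99.0.0"),    # a same-named package uploaded to the public index
]

chosen = pick_candidate(candidates)
print(chosen)  # ('public', '99.0.0')
```

Under this model, a same-named public package only needs a higher version number than the internal one to be selected.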