Home

Awesome

DPU Utilities PyPI - Python VersionAnaconda

Build Status

This contains a set of utilities used across projects of the DPU team.

Python

Stored in the python subdirectory, published as the dpu-utils package.

Installation

pip install dpu-utils

OR via the community-maintained Conda recipe:

conda install -c conda-forge dpu-utils

Overview

Below you can find an overview of the utilities included. Detailed documentation is provided at the docstring of each class.

Generic Utilities dpu_utils.utils
General Machine Learning Utilities dpu_utils.mlutils
Code-related Utilities dpu_utils.codeutils
TensorFlow 1.x Utilities dpu_utils.tfutils

Unsorted segment operations following TensorFlow's unsorted_segment_sum operations:

TensorFlow 2.x Utilities dpu_utils.tf2utils

Unsorted segment operations following TensorFlow's unsorted_segment_sum operations:

TensorFlow Models dpu_utils.tfmodels

These models have not been tested with TF 2.0.

PyTorch Utilities dpu_utils.ptutils

Command-line tools

Approximate Duplicate Code Detection

You can use the deduplicationcli command to detect duplicates in pre-processed source code, by invoking

deduplicationcli DATA_PATH OUT_JSON

where DATA_PATH is a file containing tokenized .jsonl.gz files and OUT_JSON is the target output file. For more options look at --help.

An exact (but usually slower) version of this can be found here along with code to tokenize Java, C#, Python and JavaScript into the relevant formats.

Tests

Run the unit tests

python setup.py test

Generate code coverage reports

# pip install coverage
coverage run --source dpu_utils/ setup.py test && \
  coverage html

The resulting HTML file will be in htmlcov/index.html.

.NET

Stored in the dotnet subdirectory.

Generic Utilities:

Code-related Utilities:

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.