HugeCTR

HugeCTR is a GPU-accelerated recommender framework designed for training and inference of large deep learning models.

Design Goals:

NOTE: If you have any questions about using HugeCTR, please file an issue or join our Slack channel for a more interactive discussion.

Core Features

HugeCTR supports a variety of features, including the following:

To learn about our latest enhancements, refer to our release notes.

Getting Started

If you'd like to quickly train a model using the Python interface, do the following:

  1. Start an NGC container with your local host directory (/your/host/dir) mounted by running the following command:

    docker run --gpus=all --rm -it --cap-add SYS_NICE -v /your/host/dir:/your/container/dir -w /your/container/dir -u $(id -u):$(id -g) nvcr.io/nvidia/merlin/merlin-hugectr:24.06
    

    NOTE: The /your/host/dir directory on the host is visible inside the container as /your/container/dir, which is also your starting directory.

    NOTE: HugeCTR uses NCCL to share data between ranks, and NCCL may require shared memory for IPC and pinned (page-locked) system memory resources. It is recommended that you increase these resources by adding the following options to the docker run command:

    --shm-size=1g --ulimit memlock=-1
    
  2. Write a simple Python script to generate a synthetic dataset:

    # dcn_parquet_generate.py
    import hugectr
    from hugectr.tools import DataGeneratorParams, DataGenerator
    data_generator_params = DataGeneratorParams(
      format = hugectr.DataReaderType_t.Parquet,
      label_dim = 1,
      dense_dim = 13,
      num_slot = 26,
      i64_input_key = False,
      source = "./dcn_parquet/file_list.txt",
      eval_source = "./dcn_parquet/file_list_test.txt",
      slot_size_array = [39884, 39043, 17289, 7420, 20263, 3, 7120, 1543, 39884, 39043, 17289, 7420, 
                         20263, 3, 7120, 1543, 63, 63, 39884, 39043, 17289, 7420, 20263, 3, 7120,
                         1543 ],
      dist_type = hugectr.Distribution_t.PowerLaw,
      power_law_type = hugectr.PowerLaw_t.Short)
    data_generator = DataGenerator(data_generator_params)
    data_generator.generate()
    
  3. Generate the Parquet dataset for your DCN model by running the following command:

    python dcn_parquet_generate.py
    

    NOTE: The generated dataset will reside in the folder ./dcn_parquet, which contains training and evaluation data.

  4. Write a simple Python script for training:

    # dcn_parquet_train.py
    import hugectr
    from mpi4py import MPI
    solver = hugectr.CreateSolver(max_eval_batches = 1280,
                                  batchsize_eval = 1024,
                                  batchsize = 1024,
                                  lr = 0.001,
                                  vvgpu = [[0]],
                                  repeat_dataset = True)
    reader = hugectr.DataReaderParams(data_reader_type = hugectr.DataReaderType_t.Parquet,
                                     source = ["./dcn_parquet/file_list.txt"],
                                     eval_source = "./dcn_parquet/file_list_test.txt",
                                     slot_size_array = [39884, 39043, 17289, 7420, 20263, 3, 7120, 1543, 39884, 39043, 17289, 7420, 
                                                       20263, 3, 7120, 1543, 63, 63, 39884, 39043, 17289, 7420, 20263, 3, 7120, 1543 ])
    optimizer = hugectr.CreateOptimizer(optimizer_type = hugectr.Optimizer_t.Adam,
                                        update_type = hugectr.Update_t.Global)
    model = hugectr.Model(solver, reader, optimizer)
    model.add(hugectr.Input(label_dim = 1, label_name = "label",
                            dense_dim = 13, dense_name = "dense",
                            data_reader_sparse_param_array =
                            [hugectr.DataReaderSparseParam("data1", 1, True, 26)]))
    model.add(hugectr.SparseEmbedding(embedding_type = hugectr.Embedding_t.DistributedSlotSparseEmbeddingHash,
                               workspace_size_per_gpu_in_mb = 75,
                               embedding_vec_size = 16,
                               combiner = "sum",
                               sparse_embedding_name = "sparse_embedding1",
                               bottom_name = "data1",
                               optimizer = optimizer))
    model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Reshape,
                               bottom_names = ["sparse_embedding1"],
                               top_names = ["reshape1"],
                               leading_dim=416))
    model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Concat,
                               bottom_names = ["reshape1", "dense"], top_names = ["concat1"]))
    model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.MultiCross,
                               bottom_names = ["concat1"],
                               top_names = ["multicross1"],
                               num_layers=6))
    model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
                               bottom_names = ["concat1"],
                               top_names = ["fc1"],
                               num_output=1024))
    model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.ReLU,
                               bottom_names = ["fc1"],
                               top_names = ["relu1"]))
    model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Dropout,
                               bottom_names = ["relu1"],
                               top_names = ["dropout1"],
                               dropout_rate=0.5))
    model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.Concat,
                               bottom_names = ["dropout1", "multicross1"],
                               top_names = ["concat2"]))
    model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.InnerProduct,
                               bottom_names = ["concat2"],
                               top_names = ["fc2"],
                               num_output=1))
    model.add(hugectr.DenseLayer(layer_type = hugectr.Layer_t.BinaryCrossEntropyLoss,
                               bottom_names = ["fc2", "label"],
                               top_names = ["loss"]))
    model.compile()
    model.summary()
    model.graph_to_json(graph_config_file = "dcn.json")
    model.fit(max_iter = 5120, display = 200, eval_interval = 1000, snapshot = 5000, snapshot_prefix = "dcn")
    

    NOTE: Ensure that the paths to the synthetic dataset resolve correctly from the directory where you run this Python script. The values of data_reader_type, check_type, label_dim, dense_dim, and data_reader_sparse_param_array must be consistent with the generated dataset.

  5. Train the model by running the following command:

    python dcn_parquet_train.py
    

    NOTE: Because the dataset is randomly generated, the evaluation AUC value is not expected to be meaningful. When training finishes, files containing the dumped graph JSON, the saved model weights, and the optimizer states are generated.
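To see which files the training run produced, a short script can list everything in the working directory that shares the "dcn" prefix used for graph_config_file and snapshot_prefix. This is a minimal sketch: the helper name list_training_outputs and the KNOWN_INPUTS set are hypothetical, and the exact names of the weight and optimizer-state files are not assumed, only that they start with the prefix.

```python
from pathlib import Path

# Inputs named in this walkthrough. They also start with "dcn" but are
# not produced by training, so they are filtered out.
KNOWN_INPUTS = {"dcn_parquet", "dcn_parquet_generate.py", "dcn_parquet_train.py"}


def list_training_outputs(directory=".", prefix="dcn"):
    """Return sorted entry names in `directory` that start with `prefix`,
    excluding the known input files and folders."""
    return sorted(
        entry.name
        for entry in Path(directory).iterdir()
        if entry.name.startswith(prefix) and entry.name not in KNOWN_INPUTS
    )


if __name__ == "__main__":
    # Run from the directory where dcn_parquet_train.py was executed.
    for name in list_training_outputs():
        print(name)
```

If the list contains only dcn.json, training most likely stopped before iteration 5000, the snapshot interval passed to model.fit.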

For more information, refer to the HugeCTR User Guide.
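The consistency requirement noted in step 4 can be checked by hand before launching a run. The sketch below uses plain Python with values copied from the two scripts above; the variable names are illustrative and it does not call HugeCTR. It shows where leading_dim = 416 comes from and how wide the first Concat output is.

```python
# Values copied from dcn_parquet_generate.py and dcn_parquet_train.py above.
slot_size_array = [39884, 39043, 17289, 7420, 20263, 3, 7120, 1543,
                   39884, 39043, 17289, 7420, 20263, 3, 7120, 1543,
                   63, 63,
                   39884, 39043, 17289, 7420, 20263, 3, 7120, 1543]
slot_num = 26            # last argument of DataReaderSparseParam("data1", 1, True, 26)
embedding_vec_size = 16  # SparseEmbedding(embedding_vec_size = 16)
dense_dim = 13           # Input(dense_dim = 13)
leading_dim = 416        # Reshape(leading_dim = 416)
batchsize = 1024
max_iter = 5120

# One cardinality per slot, so the array length must equal the slot count.
assert len(slot_size_array) == slot_num

# Flattening 26 embedding vectors of width 16 gives the Reshape leading_dim.
reshape_width = slot_num * embedding_vec_size
assert reshape_width == leading_dim

# concat1 joins the flattened embeddings with the dense features.
concat1_width = reshape_width + dense_dim

# Sum of the per-slot cardinalities, and samples consumed over the whole run.
total_cardinality = sum(slot_size_array)
samples_consumed = batchsize * max_iter

print("reshape1 width:", reshape_width)
print("concat1 width:", concat1_width)
print("total categorical cardinality:", total_cardinality)
print("samples consumed:", samples_consumed)
```

If you change num_slot, slot_size_array, or embedding_vec_size, update leading_dim to match, otherwise the Reshape layer no longer lines up with the embedding output.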

HugeCTR SDK

We're able to support external developers who can't use HugeCTR directly by exporting important HugeCTR components using:

Support and Feedback

If you encounter any issues or have questions, go to https://github.com/NVIDIA/HugeCTR/issues and submit an issue so that we can provide you with the necessary resolutions and answers. To further advance the HugeCTR Roadmap, we encourage you to share all the details regarding your recommender system pipeline using this survey.

Contributing to HugeCTR

HugeCTR is an open source project, and we welcome contributions from the general public. With your contributions, we can continue to improve HugeCTR's quality and performance. To learn how to contribute, refer to our HugeCTR Contributor Guide.

Additional Resources

Webpages
NVIDIA Merlin
NVIDIA HugeCTR

Publications

Yingcan Wei, Matthias Langer, Fan Yu, Minseok Lee, Jie Liu, Ji Shi and Zehuan Wang, "A GPU-specialized Inference Parameter Server for Large-Scale Deep Recommendation Models," Proceedings of the 16th ACM Conference on Recommender Systems, pp. 408-419, 2022.

Zehuan Wang, Yingcan Wei, Minseok Lee, Matthias Langer, Fan Yu, Jie Liu, Shijie Liu, Daniel G. Abel, Xu Guo, Jianbing Dong, Ji Shi and Kunlun Li, "Merlin HugeCTR: GPU-accelerated Recommender System Training and Inference," Proceedings of the 16th ACM Conference on Recommender Systems, pp. 534-537, 2022.

Talks

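| Conference / Website | Title | Date | Speaker | Language |
| --- | --- | --- | --- | --- |
| ACM RecSys 2022 | A GPU-specialized Inference Parameter Server for Large-Scale Deep Recommendation Models | September 2022 | Matthias Langer | English |
| Short Videos Episode 1 | Merlin HugeCTR:GPU 加速的推荐系统框架 (Merlin HugeCTR: A GPU-Accelerated Recommender System Framework) | May 2022 | Joey Wang | Chinese |
| Short Videos Episode 2 | HugeCTR 分级参数服务器如何加速推理 (How the HugeCTR Hierarchical Parameter Server Accelerates Inference) | May 2022 | Joey Wang | Chinese |
| Short Videos Episode 3 | 使用 HugeCTR SOK 加速 TensorFlow 训练 (Accelerating TensorFlow Training with HugeCTR SOK) | May 2022 | Gems Guo | Chinese |
| GTC Spring 2022 | Merlin HugeCTR: Distributed Hierarchical Inference Parameter Server Using GPU Embedding Cache | March 2022 | Matthias Langer, Yingcan Wei, Yu Fan | English |
| APSARA 2021 | GPU 推荐系统 Merlin (Merlin, a GPU Recommender System) | Oct 2021 | Joey Wang | Chinese |
| GTC Spring 2021 | Learn how Tencent Deployed an Advertising System on the Merlin GPU Recommender Framework | April 2021 | Xiangting Kong, Joey Wang | English |
| GTC Spring 2021 | Merlin HugeCTR: Deep Dive Into Performance Optimization | April 2021 | Minseok Lee | English |
| GTC Spring 2021 | Integrate HugeCTR Embedding with TensorFlow | April 2021 | Jianbing Dong | English |
| GTC China 2020 | MERLIN HUGECTR :深入研究性能优化 (Merlin HugeCTR: Deep Dive Into Performance Optimization) | Oct 2020 | Minseok Lee | English |
| GTC China 2020 | 性能提升 7 倍 + 的高性能 GPU 广告推荐加速系统的落地实现 (Deploying a High-Performance GPU-Accelerated Advertising Recommendation System with a 7x+ Performance Gain) | Oct 2020 | Xiangting Kong | Chinese |
| GTC China 2020 | 使用 GPU EMBEDDING CACHE 加速 CTR 推理过程 (Accelerating CTR Inference with GPU Embedding Cache) | Oct 2020 | Fan Yu | Chinese |
| GTC China 2020 | 将 HUGECTR EMBEDDING 集成于 TENSORFLOW (Integrating HugeCTR Embedding with TensorFlow) | Oct 2020 | Jianbing Dong | Chinese |
| GTC Spring 2020 | HugeCTR: High-Performance Click-Through Rate Estimation Training | March 2020 | Minseok Lee, Joey Wang | English |
| GTC China 2019 | HUGECTR: GPU 加速的推荐系统训练 (HugeCTR: GPU-Accelerated Recommender System Training) | Oct 2019 | Joey Wang | Chinese |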

Blogs

| Conference / Website | Title | Date | Authors | Language |
| --- | --- | --- | --- | --- |
| Wechat Blog | Merlin HugeCTR 分级参数服务器系列之三:集成到TensorFlow (Merlin HugeCTR Hierarchical Parameter Server Series, Part 3: Integration with TensorFlow) | Nov. 2022 | Kingsley Liu | Chinese |
| NVIDIA Devblog | Scaling Recommendation System Inference with Merlin Hierarchical Parameter Server/使用 Merlin 分层参数服务器扩展推荐系统推理 | August 2022 | Shashank Verma, Wenwen Gao, Yingcan Wei, Matthias Langer, Jerry Shi, Fan Yu, Kingsley Liu, Minseok Lee | English/Chinese |
| NVIDIA Devblog | Merlin HugeCTR Sparse Operation Kit 系列之二 (Merlin HugeCTR Sparse Operation Kit Series, Part 2) | June 2022 | Kunlun Li | Chinese |
| NVIDIA Devblog | Merlin HugeCTR Sparse Operation Kit 系列之一 (Merlin HugeCTR Sparse Operation Kit Series, Part 1) | March 2022 | Gems Guo, Jianbing Dong | Chinese |
| Wechat Blog | Merlin HugeCTR 分级参数服务器系列之二 (Merlin HugeCTR Hierarchical Parameter Server Series, Part 2) | March 2022 | Yingcan Wei, Matthias Langer, Jerry Shi | Chinese |
| Wechat Blog | Merlin HugeCTR 分级参数服务器系列之一 (Merlin HugeCTR Hierarchical Parameter Server Series, Part 1) | Jan. 2022 | Yingcan Wei, Jerry Shi | Chinese |
| NVIDIA Devblog | Accelerating Embedding with the HugeCTR TensorFlow Embedding Plugin | Sept 2021 | Vinh Nguyen, Ann Spencer, Joey Wang and Jianbing Dong | English |
| medium.com | Optimizing Meituan’s Machine Learning Platform: An Interview with Jun Huang | Sept 2021 | Sheng Luo and Benedikt Schifferer | English |
| medium.com | Leading Design and Development of the Advertising Recommender System at Tencent: An Interview with Xiangting Kong | Sept 2021 | Xiangting Kong, Ann Spencer | English |
| NVIDIA Devblog | 扩展和加速大型深度学习推荐系统 – HugeCTR 系列第 1 部分 (Scaling and Accelerating Large Deep Learning Recommender Systems – HugeCTR Series Part 1) | June 2021 | Minseok Lee | Chinese |
| NVIDIA Devblog | 使用 Merlin HugeCTR 的 Python API 训练大型深度学习推荐模型 – HugeCTR 系列第 2 部分 (Training Large Deep Learning Recommender Models with Merlin HugeCTR's Python API – HugeCTR Series Part 2) | June 2021 | Vinh Nguyen | Chinese |
| medium.com | Training large Deep Learning Recommender Models with Merlin HugeCTR’s Python APIs – HugeCTR Series Part 2 | May 2021 | Minseok Lee, Joey Wang, Vinh Nguyen and Ashish Sardana | English |
| medium.com | Scaling and Accelerating large Deep Learning Recommender Systems – HugeCTR Series Part 1 | May 2021 | Minseok Lee | English |
| IRS 2020 | Merlin: A GPU Accelerated Recommendation Framework | Aug 2020 | Even Oldridge et al. | English |
| NVIDIA Devblog | Introducing NVIDIA Merlin HugeCTR: A Training Framework Dedicated to Recommender Systems | July 2020 | Minseok Lee and Joey Wang | English |