<!-- markdownlint-disable first-line-h1 --> <!-- markdownlint-disable html --> <!-- markdownlint-disable no-duplicate-header --> <div align="center"> <img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek-V2" /> </div> <hr> <div align="center" style="line-height: 1;"> <a href="https://www.deepseek.com/" target="_blank" style="margin: 2px;"> <img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true" style="display: inline-block; vertical-align: middle;"/> </a> <a href="https://chat.deepseek.com/" target="_blank" style="margin: 2px;"> <img alt="Chat" src="https://img.shields.io/badge/🤖%20Chat-DeepSeek%20V2-536af5?color=536af5&logoColor=white" style="display: inline-block; vertical-align: middle;"/> </a> <a href="https://huggingface.co/deepseek-ai" target="_blank" style="margin: 2px;"> <img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/> </a> </div> <div align="center" style="line-height: 1;"> <a href="https://discord.gg/Tc7c45Zzu5" target="_blank" style="margin: 2px;"> <img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&logoColor=white&color=7289da" style="display: inline-block; vertical-align: middle;"/> </a> <a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true" target="_blank" style="margin: 2px;"> <img alt="Wechat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&logoColor=white" style="display: inline-block; vertical-align: middle;"/> </a> <a href="https://twitter.com/deepseek_ai" target="_blank" style="margin: 2px;"> <img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&logoColor=white" style="display: inline-block; vertical-align: middle;"/> </a> </div> <div 
align="center" style="line-height: 1;"> <a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/LICENSE-CODE" style="margin: 2px;"> <img alt="Code License" src="https://img.shields.io/badge/Code_License-MIT-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/> </a> <a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/LICENSE-MODEL" style="margin: 2px;"> <img alt="Model License" src="https://img.shields.io/badge/Model_License-Model_Agreement-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/> </a> </div> <p align="center"> <a href="#2-model-downloads">Model Download</a> | <a href="#3-evaluation-results">Evaluation Results</a> | <a href="#5-api-platform">API Platform</a> | <a href="#6-how-to-run-locally">How to Use</a> | <a href="#7-license">License</a> | <a href="#8-citation">Citation</a> </p> <p align="center"> <a href="https://arxiv.org/pdf/2406.11931"><b>Paper Link</b>👁️</a> </p>

DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

1. Introduction

We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT-4-Turbo on code-specific tasks. DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 on an additional 6 trillion tokens. This continued pre-training substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2 while maintaining comparable performance on general language tasks. Compared to DeepSeek-Coder-33B, DeepSeek-Coder-V2 shows significant advances across code-related tasks as well as in reasoning and general capabilities. It also expands programming language support from 86 to 338 languages and extends the context length from 16K to 128K.

<p align="center"> <img width="100%" src="figures/performance.png"> </p>

In standard benchmark evaluations, DeepSeek-Coder-V2 outperforms closed-source models such as GPT-4-Turbo, Claude 3 Opus, and Gemini 1.5 Pro on coding and math benchmarks. The list of supported programming languages is available in the project repository.

2. Model Downloads

We release DeepSeek-Coder-V2 to the public in two sizes, 16B and 236B total parameters, both built on the DeepSeekMoE framework, with only 2.4B and 21B active parameters respectively. Each size is available as a base model and an instruct model.

<div align="center">

| Model | #Total Params | #Active Params | Context Length | Download |
| :--- | :---: | :---: | :---: | :---: |
| DeepSeek-Coder-V2-Lite-Base | 16B | 2.4B | 128k | 🤗 HuggingFace |
| DeepSeek-Coder-V2-Lite-Instruct | 16B | 2.4B | 128k | 🤗 HuggingFace |
| DeepSeek-Coder-V2-Base | 236B | 21B | 128k | 🤗 HuggingFace |
| DeepSeek-Coder-V2-Instruct | 236B | 21B | 128k | 🤗 HuggingFace |

</div>
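
To make the gap between total and active parameters concrete, the share of weights used per token can be computed directly from the figures above. This is only an illustrative sketch: the dictionary and function names are made up for this example and are not part of any DeepSeek tooling.

```python
# Parameter counts in billions, copied from the model table above.
MODELS = {
    "DeepSeek-Coder-V2-Lite": {"total_b": 16.0, "active_b": 2.4},
    "DeepSeek-Coder-V2": {"total_b": 236.0, "active_b": 21.0},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Return the share of parameters that is active per token in an MoE model."""
    return active_b / total_b


for name, p in MODELS.items():
    share = active_fraction(p["total_b"], p["active_b"])
    print(f"{name}: {share:.1%} of parameters active per token")
```

Under these numbers, the Lite model activates 15% of its weights per token and the full model roughly 9%.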

3. Evaluation Results

3.1 Code Generation

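| Model | #TP (total params) | #AP (active params) | HumanEval | MBPP+ | LiveCodeBench | USACO |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| **Closed-Source Models** | | | | | | |
| Gemini-1.5-Pro | - | - | 83.5 | 74.6 | 34.1 | 4.9 |
| Claude-3-Opus | - | - | 84.2 | 72.0 | 34.6 | 7.8 |
| GPT-4-Turbo-1106 | - | - | 87.8 | 69.3 | 37.1 | 11.1 |
| GPT-4-Turbo-0409 | - | - | 88.2 | 72.2 | 45.7 | 12.3 |
| GPT-4o-0513 | - | - | 91.0 | 73.5 | 43.4 | 18.8 |
| **Open-Source Models** | | | | | | |
| CodeStral | 22B | 22B | 78.1 | 68.2 | 31.0 | 4.6 |
| DeepSeek-Coder-Instruct | 33B | 33B | 79.3 | 70.1 | 22.5 | 4.2 |
| Llama3-Instruct | 70B | 70B | 81.1 | 68.8 | 28.7 | 3.3 |
| DeepSeek-Coder-V2-Lite-Instruct | 16B | 2.4B | 81.1 | 68.8 | 24.3 | 6.5 |
| DeepSeek-Coder-V2-Instruct | 236B | 21B | 90.2 | 76.2 | 43.4 | 12.1 |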

3.2 Code Completion

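| Model | #TP (total params) | #AP (active params) | RepoBench (Python) | RepoBench (Java) | HumanEval FIM |
| :--- | :---: | :---: | :---: | :---: | :---: |
| CodeStral | 22B | 22B | 46.1 | 45.7 | 83.0 |
| DeepSeek-Coder-Base | 7B | 7B | 36.2 | 43.3 | 86.1 |
| DeepSeek-Coder-Base | 33B | 33B | 39.1 | 44.8 | 86.4 |
| DeepSeek-Coder-V2-Lite-Base | 16B | 2.4B | 38.9 | 43.3 | 86.4 |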

3.3 Code Fixing

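| Model | #TP (total params) | #AP (active params) | Defects4J | SWE-Bench | Aider |
| :--- | :---: | :---: | :---: | :---: | :---: |
| **Closed-Source Models** | | | | | |
| Gemini-1.5-Pro | - | - | 18.6 | 19.3 | 57.1 |
| Claude-3-Opus | - | - | 25.5 | 11.7 | 68.4 |
| GPT-4-Turbo-1106 | - | - | 22.8 | 22.7 | 65.4 |
| GPT-4-Turbo-0409 | - | - | 24.3 | 18.3 | 63.9 |
| GPT-4o-0513 | - | - | 26.1 | 26.7 | 72.9 |
| **Open-Source Models** | | | | | |
| CodeStral | 22B | 22B | 17.8 | 2.7 | 51.1 |
| DeepSeek-Coder-Instruct | 33B | 33B | 11.3 | 0.0 | 54.5 |
| Llama3-Instruct | 70B | 70B | 16.2 | - | 49.2 |
| DeepSeek-Coder-V2-Lite-Instruct | 16B | 2.4B | 9.2 | 0.0 | 44.4 |
| DeepSeek-Coder-V2-Instruct | 236B | 21B | 21.0 | 12.7 | 73.7 |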

3.4 Mathematical Reasoning

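| Model | #TP (total params) | #AP (active params) | GSM8K | MATH | AIME 2024 | Math Odyssey |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| **Closed-Source Models** | | | | | | |
| Gemini-1.5-Pro | - | - | 90.8 | 67.7 | 2/30 | 45.0 |
| Claude-3-Opus | - | - | 95.0 | 60.1 | 2/30 | 40.6 |
| GPT-4-Turbo-1106 | - | - | 91.4 | 64.3 | 1/30 | 49.1 |
| GPT-4-Turbo-0409 | - | - | 93.7 | 73.4 | 3/30 | 46.8 |
| GPT-4o-0513 | - | - | 95.8 | 76.6 | 2/30 | 53.2 |
| **Open-Source Models** | | | | | | |
| Llama3-Instruct | 70B | 70B | 93.0 | 50.4 | 1/30 | 27.9 |
| DeepSeek-Coder-V2-Lite-Instruct | 16B | 2.4B | 86.4 | 61.8 | 0/30 | 44.4 |
| DeepSeek-Coder-V2-Instruct | 236B | 21B | 94.9 | 75.7 | 4/30 | 53.7 |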

3.5 General Natural Language

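| Benchmark | Domain | DeepSeek-V2-Lite Chat | DeepSeek-Coder-V2-Lite Instruct | DeepSeek-V2 Chat | DeepSeek-Coder-V2 Instruct |
| :--- | :---: | :---: | :---: | :---: | :---: |
| BBH | English | 48.1 | 61.2 | 79.7 | 83.9 |
| MMLU | English | 55.7 | 60.1 | 78.1 | 79.2 |
| ARC-Easy | English | 86.1 | 88.9 | 98.1 | 97.4 |
| ARC-Challenge | English | 73.4 | 77.4 | 92.3 | 92.8 |
| TriviaQA | English | 65.2 | 59.5 | 86.7 | 82.3 |
| NaturalQuestions | English | 35.5 | 30.8 | 53.4 | 47.5 |
| AGIEval | English | 42.8 | 28.7 | 61.4 | 60 |
| CLUEWSC | Chinese | 80.0 | 76.5 | 89.9 | 85.9 |
| C-Eval | Chinese | 60.1 | 61.6 | 78.0 | 79.4 |
| CMMLU | Chinese | 62.5 | 62.7 | 81.6 | 80.9 |
| Arena-Hard | - | 11.4 | 38.1 | 41.6 | 65.0 |
| AlpacaEval 2.0 | - | 16.9 | 17.7 | 38.9 | 36.9 |
| MT-Bench | - | 7.37 | 7.81 | 8.97 | 8.77 |
| Alignbench | - | 6.02 | 6.83 | 7.91 | 7.84 |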

3.6 Context Window

<p align="center"> <img width="80%" src="figures/long_context.png"> </p>

Evaluation results on the Needle In A Haystack (NIAH) tests. DeepSeek-Coder-V2 performs well across all context window lengths up to 128K.
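
For readers unfamiliar with the test, the general idea of a Needle In A Haystack prompt can be sketched as follows. This is not the evaluation harness behind the figure above: the function name, the prompt wording, and the use of character counts instead of token counts are all simplifying assumptions made for illustration.

```python
def build_niah_prompt(filler: str, needle: str, question: str,
                      context_chars: int, depth: float) -> str:
    """Place `needle` at a relative `depth` (0.0 = start, 1.0 = end) inside filler text."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0 and 1")
    # Repeat the filler until it covers the requested context size, then trim.
    repeats = context_chars // len(filler) + 1
    haystack = (filler * repeats)[:context_chars]
    # Split the haystack at the requested depth and insert the needle there.
    cut = int(len(haystack) * depth)
    context = haystack[:cut] + " " + needle + " " + haystack[cut:]
    return f"{context}\n\nQuestion: {question}\nAnswer:"


prompt = build_niah_prompt(
    filler="The sky was clear and the road was long. ",
    needle="The secret number is 7421.",
    question="What is the secret number?",
    context_chars=2000,
    depth=0.5,
)
print(len(prompt))
```

A full test would sweep `context_chars` and `depth` over a grid and check whether the model's answer contains the needle's fact at each point.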

4. Chat Website

You can chat with DeepSeek-Coder-V2 on DeepSeek's official website: coder.deepseek.com

5. API Platform

We also provide an OpenAI-compatible API on the DeepSeek Platform (platform.deepseek.com), with pay-as-you-go pricing at the rates shown below.

<p align="center"> <img width="40%" src="figures/model_price.jpg"> </p>
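
Because the API is OpenAI-compatible, a request follows the familiar chat-completions shape. The sketch below only builds such a request with the standard library and does not send it. The base URL and model name are assumptions that this README does not state, so check both against the platform documentation before use.

```python
import json
import urllib.request

# ASSUMPTIONS for illustration only; verify both values in the platform docs.
API_BASE = "https://api.deepseek.com"
MODEL_NAME = "deepseek-coder"


def build_chat_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{API_BASE}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_chat_request("write a quick sort algorithm in python.", api_key="YOUR_API_KEY")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; an OpenAI-compatible client library pointed at the same base URL should work equally well.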

6. How to Run Locally

Here we provide some examples of how to use the DeepSeek-Coder-V2-Lite model. Running the full DeepSeek-Coder-V2 in BF16 format for inference requires eight 80GB GPUs.
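
The GPU requirement can be sanity-checked from the weight footprint alone. The following is a rough back-of-the-envelope sketch, assuming 2 bytes per BF16 parameter and decimal gigabytes. It ignores the KV cache, activations, and framework overhead, all of which consume part of the remaining headroom.

```python
def bf16_weight_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes) for a model of the given size."""
    return params_billion * 1e9 * bytes_per_param / 1e9


weights_gb = bf16_weight_gb(236)  # full DeepSeek-Coder-V2, 236B total parameters
cluster_gb = 8 * 80               # eight 80GB GPUs
print(f"weights: {weights_gb:.0f} GB, cluster: {cluster_gb} GB, "
      f"headroom: {cluster_gb - weights_gb:.0f} GB")
```

By the same estimate, the 16B Lite model needs about 32 GB for its weights, which is why the examples in this section target it.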

Inference with Hugging Face's Transformers

You can use Hugging Face's Transformers directly for model inference.

Code Completion

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
input_text = "#write a quick sort algorithm"
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Code Insertion

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Base", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
input_text = """<｜fim▁begin｜>def quick_sort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[0]
    left = []
    right = []
<｜fim▁hole｜>
        if arr[i] < pivot:
            left.append(arr[i])
        else:
            right.append(arr[i])
    return quick_sort(left) + [pivot] + quick_sort(right)<｜fim▁end｜>"""
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True)[len(input_text):])
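
The insertion prompt above is the code before the gap, a hole marker, and the code after the gap, wrapped in begin and end markers. A small helper can assemble it from the two halves. This is a sketch: the helper name is hypothetical, and the sentinel strings (written with fullwidth bars and a U+2581 separator) should be confirmed against the tokenizer's vocabulary.

```python
# Fill-in-the-middle sentinel tokens used by the insertion example.
FIM_BEGIN = "<｜fim▁begin｜>"
FIM_HOLE = "<｜fim▁hole｜>"
FIM_END = "<｜fim▁end｜>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Wrap the code before and after the gap in fill-in-the-middle sentinels."""
    return f"{FIM_BEGIN}{prefix}{FIM_HOLE}{suffix}{FIM_END}"


prefix = "def add(a, b):\n"
suffix = "\n    return result"
prompt = build_fim_prompt(prefix, suffix)
print(prompt)
```

The resulting string can be passed to the tokenizer in place of `input_text`; the model is expected to generate the code that belongs at the hole.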

Chat Completion

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct", trust_remote_code=True, torch_dtype=torch.bfloat16).cuda()
messages=[
    { 'role': 'user', 'content': "write a quick sort algorithm in python."}
]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
# Greedy decoding (do_sample=False), so sampling parameters such as top_k and top_p are omitted.
# tokenizer.eos_token_id is the id of the <｜end▁of▁sentence｜> token.
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))

The complete chat template can be found in tokenizer_config.json in the Hugging Face model repository.

An example of the chat template is as follows:

<｜begin▁of▁sentence｜>User: {user_message_1}

A: {assistant_message_1}<｜end▁of▁sentence｜>User: {user_message_2}

Assistant:

You can also add an optional system message:

<ļ½œbeginā–ofā–sentenceļ½œ>{system_message}

User: {user_message_1}

Assistant: {assistant_message_1}<ļ½œendā–ofā–sentenceļ½œ>User: {user_message_2}

Assistant:

Note that in the last round of dialogue, "Assistant:" has no space after the colon. Adding a space can cause problems with the 16B-Lite model.

Older versions of Ollama had this bug (see https://github.com/deepseek-ai/DeepSeek-Coder-V2/issues/12), but it has been fixed in the latest version.
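
To show how the template fits together, here is a minimal pure-Python renderer that reproduces the layout shown above, including the final "Assistant:" with no trailing space. It is only a sketch for inspection and debugging: the function name is hypothetical, and the template in tokenizer_config.json (applied through `tokenizer.apply_chat_template`) remains the authoritative version.

```python
BOS = "<｜begin▁of▁sentence｜>"
EOS = "<｜end▁of▁sentence｜>"


def render_chat(messages, system=None):
    """Render alternating user/assistant turns, ending on an open 'Assistant:' cue."""
    parts = [BOS]
    if system is not None:
        parts.append(f"{system}\n\n")
    for msg in messages:
        if msg["role"] == "user":
            # The cue ends right after the colon: no trailing space.
            parts.append(f"User: {msg['content']}\n\nAssistant:")
        elif msg["role"] == "assistant":
            # A completed assistant turn adds a space, the reply, then EOS.
            parts.append(f" {msg['content']}{EOS}")
        else:
            raise ValueError(f"unsupported role: {msg['role']}")
    return "".join(parts)


history = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "Write quicksort."},
]
print(render_chat(history, system="You are a helpful coding assistant."))
```

Comparing this output with the string a third-party runtime actually sends to the model is a quick way to spot a stray space after the final colon.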

Inference with vLLM (recommended)

To utilize vLLM for model inference, please merge this Pull Request into your vLLM codebase: https://github.com/vllm-project/vllm/pull/4650.

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

max_model_len, tp_size = 8192, 1
model_name = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = LLM(model=model_name, tensor_parallel_size=tp_size, max_model_len=max_model_len, trust_remote_code=True, enforce_eager=True)
sampling_params = SamplingParams(temperature=0.3, max_tokens=256, stop_token_ids=[tokenizer.eos_token_id])

messages_list = [
    [{"role": "user", "content": "Who are you?"}],
    [{"role": "user", "content": "write a quick sort algorithm in python."}],
    [{"role": "user", "content": "Write a piece of quicksort code in C++."}],
]

prompt_token_ids = [tokenizer.apply_chat_template(messages, add_generation_prompt=True) for messages in messages_list]

outputs = llm.generate(prompt_token_ids=prompt_token_ids, sampling_params=sampling_params)

generated_text = [output.outputs[0].text for output in outputs]
print(generated_text)

7. License

This code repository is licensed under the MIT License. The use of DeepSeek-Coder-V2 Base/Instruct models is subject to the Model License. DeepSeek-Coder-V2 series (including Base and Instruct) supports commercial use.

8. Citation

@article{zhu2024deepseek,
  title={DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence},
  author={Zhu, Qihao and Guo, Daya and Shao, Zhihong and Yang, Dejian and Wang, Peiyi and Xu, Runxin and Wu, Y and Li, Yukun and Gao, Huazuo and Ma, Shirong and others},
  journal={arXiv preprint arXiv:2406.11931},
  year={2024}
}

9. Contact

If you have any questions, please raise an issue or contact us at service@deepseek.com.