# databricks-ml-examples
`databricks/databricks-ml-examples` is a repository that showcases machine learning examples on the Databricks platform.
Currently this repository contains:
- `llm-models/`: Example notebooks for using different state-of-the-art (SOTA) models on Databricks.
- `llm-fine-tuning/`: Scripts and notebooks for fine-tuning SOTA models on Databricks.
## SOTA LLM examples
Databricks works with thousands of customers to build generative AI applications. While you can use Databricks to work with any generative AI model, including commercial and research models, the table below lists our current model recommendations for popular use cases. Note: the table only lists open source models that are free for commercial use.
Use case | Quality-optimized | Balanced | Speed-optimized |
---|---|---|---|
Text generation following instructions | Mixtral-8x7B-Instruct-v0.1 <br> <br> Llama-2-70b-chat-hf | mistral-7b <br><br> MPT-7B-Instruct <br> MPT-7B-8k-Instruct <br> <br> Llama-2-7b-chat-hf <br> Llama-2-13b-chat-hf | phi-2 |
Text embeddings (English only) | e5-mistral-7b-instruct (7B) | bge-large-en-v1.5 (0.3B) <br> e5-large-v2 (0.3B) | bge-base-en-v1.5 (0.1B) <br> e5-base-v2 (0.1B) |
Transcription (speech to text) | whisper-large-v2 (1.6B) <br> whisper-medium (0.8B) | | |
Image generation | stable-diffusion-xl | | |
Code generation | CodeLlama-70b-hf <br> CodeLlama-70b-Instruct-hf <br> CodeLlama-70b-Python-hf (Python optimized) <br> CodeLlama-34b-hf <br> CodeLlama-34b-Instruct-hf <br> CodeLlama-34b-Python-hf (Python optimized) | CodeLlama-13b-hf <br> CodeLlama-13b-Instruct-hf <br> CodeLlama-13b-Python-hf (Python optimized) <br> CodeLlama-7b-hf <br> CodeLlama-7b-Instruct-hf <br> CodeLlama-7b-Python-hf (Python optimized) | |
- To get better performance with instructor-xl, you may follow its unified template when writing instructions.
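As a quick orientation, the sketch below shows one common way to try an instruction-following model from the table above in a Databricks notebook using Hugging Face `transformers`. The model name, dtype, and prompt are illustrative assumptions rather than this repository's official example; see the notebooks under `llm-models/` for the maintained workflows.

```python
# Minimal sketch (illustrative, not this repository's official example):
# load an instruction-tuned model from the table with Hugging Face transformers.
# Assumes `transformers`, `torch`, and `accelerate` are installed on a GPU cluster.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name = "mistralai/Mistral-7B-Instruct-v0.1"  # assumption: any causal LM from the table works similarly

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # halves memory vs. float32; needs a GPU with bfloat16 support
    device_map="auto",           # let accelerate place the weights on the available device(s)
)

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator("Write a short summary of Apache Spark.", max_new_tokens=64)[0]["generated_text"])
```

The embedding and transcription models can be loaded in a similar way with their respective `transformers` or `sentence-transformers` classes; the image-generation models use the `diffusers` library instead.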
## Model Evaluation Leaderboard
### Text generation models
The model evaluation results presented below are measured by the Mosaic Eval Gauntlet framework. This framework comprises a series of tasks specifically designed to assess the performance of language models, including widely-adopted benchmarks such as MMLU, Big-Bench, HellaSwag, and more.
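The Gauntlet itself is distributed with MosaicML's `llm-foundry`. As a rough, hypothetical sketch of how comparable benchmarks (e.g. HellaSwag, MMLU) can be run against one of these checkpoints, the snippet below uses the EleutherAI `lm-evaluation-harness` instead; the model name, task list, and batch size are assumptions, and the resulting scores will not match the Gauntlet's composite averages in the table below.

```python
# Hypothetical sketch: running standard benchmarks with EleutherAI's
# lm-evaluation-harness (pip install lm-eval). This is NOT the Mosaic Eval
# Gauntlet, so the numbers will differ from the leaderboard below.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                         # Hugging Face causal-LM backend
    model_args="pretrained=mistralai/Mistral-7B-v0.1",  # any model from the leaderboard
    tasks=["hellaswag", "mmlu"],                        # a subset of Gauntlet-style tasks
    batch_size=8,
)
print(results["results"])  # per-task accuracy and related metrics
```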
Model Name | Core Average | World Knowledge | Commonsense Reasoning | Language Understanding | Symbolic Problem Solving | Reading Comprehension |
---|---|---|---|---|---|---|
Mistral-7B-v0.1 | 0.522 | 0.558 | 0.513 | 0.555 | 0.342 | 0.641 |
falcon-40b | 0.501 | 0.556 | 0.55 | 0.535 | 0.269 | 0.597 |
falcon-40b-instruct | 0.5 | 0.542 | 0.571 | 0.544 | 0.264 | 0.58 |
Llama-2-13b-hf | 0.479 | 0.515 | 0.482 | 0.52 | 0.279 | 0.597 |
Llama-2-13b-chat-hf | 0.476 | 0.522 | 0.512 | 0.514 | 0.271 | 0.559 |
Mistral-7B-Instruct-v0.1 | 0.469 | 0.48 | 0.502 | 0.492 | 0.266 | 0.604 |
mpt-30b-instruct | 0.465 | 0.48 | 0.513 | 0.494 | 0.238 | 0.599 |
mpt-30b | 0.431 | 0.494 | 0.47 | 0.477 | 0.234 | 0.481 |
Llama-2-7b-chat-hf | 0.42 | 0.476 | 0.447 | 0.478 | 0.221 | 0.478 |
Llama-2-7b-hf | 0.401 | 0.457 | 0.41 | 0.454 | 0.217 | 0.465 |
mpt-7b-8k-instruct | 0.36 | 0.363 | 0.41 | 0.405 | 0.165 | 0.458 |
mpt-7b-instruct | 0.354 | 0.399 | 0.415 | 0.372 | 0.171 | 0.415 |
mpt-7b-8k | 0.354 | 0.427 | 0.368 | 0.426 | 0.171 | 0.378 |
falcon-7b | 0.335 | 0.371 | 0.421 | 0.37 | 0.159 | 0.355 |
mpt-7b | 0.324 | 0.356 | 0.384 | 0.38 | 0.163 | 0.336 |
falcon-7b-instruct | 0.307 | 0.34 | 0.372 | 0.333 | 0.108 | 0.38 |