
โ˜๏ธ KULLM (๊ตฌ๋ฆ„): Korea University Large Language Model

<p align="center" width="100%"> <img src="assets/kullm_logo.png" alt="NLP Logo" style="width: 50%;"> </p>

Update Logs


<br>

KULLM (구름, Korean for "cloud") is a series of Korean Large Language Models (LLMs) developed by the NLP & AI Lab at Korea University together with the HIAI Research Institute.

We are releasing KULLM3.

(For the training procedure and data of the previous models, see the kullm_v2 branch.)

<br/>

KULLM3 Conversational Performance Evaluation Results

<img src="assets/kullm3_instruction_evaluation.png" >

Conversation Examples

<img src="assets/ex1.png" alt="example 1" >
<img src="assets/ex2.png" alt="example 2">
<img src="assets/ex3.png" alt="example 3">
<img src="assets/ex4.png" alt="example 4">

Example Code for Running the KULLM Model

Streaming with the Hugging Face TextStreamer

pip install torch transformers==4.38.2 accelerate

You can try the model with the example code below.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

MODEL_DIR = "nlpai-lab/KULLM3"
# Load the model in fp16 on the GPU.
model = AutoModelForCausalLM.from_pretrained(MODEL_DIR, torch_dtype=torch.float16).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
# Print tokens to stdout as they are generated, hiding the echoed prompt and special tokens.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

s = "고려대학교에 대해서 알고 있니?"  # "Do you know about Korea University?"
conversation = [{'role': 'user', 'content': s}]
inputs = tokenizer.apply_chat_template(
    conversation,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors='pt').to("cuda")
_ = model.generate(inputs, streamer=streamer, max_new_tokens=1024, use_cache=True)

# 네, 고려대학교에 대해 알고 있습니다. 고려대학교는 대한민국 서울에 위치한 사립 대학교로, 1905년에 설립되었습니다. 이 대학교는 한국에서 가장 오래된 대학 중 하나로, 다양한 학부 및 대학원 프로그램을 제공합니다. 고려대학교는 특히 법학, 경제학, 정치학, 사회학, 문학, 과학 분야에서 높은 명성을 가지고 있습니다. 또한, 스포츠 분야에서도 활발한 활동을 보이며, 대한민국 대학 스포츠에서 중요한 역할을 하고 있습니다. 고려대학교는 국제적인 교류와 협력에도 적극적이며, 전 세계 다양한 대학과의 협력을 통해 글로벌 경쟁력을 강화하고 있습니다.
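In the streaming example, `skip_prompt=True` hides the echoed prompt tokens. When decoding the output of `generate` yourself instead of streaming, the equivalent step is slicing off the prompt length. A minimal pure-Python sketch of that slicing (the token ids are made up for illustration):

```python
def strip_prompt(output_ids, prompt_len):
    """generate() returns the prompt tokens followed by the new tokens;
    keep only the newly generated portion (what TextStreamer shows when
    skip_prompt=True)."""
    return output_ids[prompt_len:]

# Illustrative ids: 4 "prompt" tokens followed by 3 "generated" tokens.
full_output = [101, 9234, 5512, 102, 7592, 2088, 2]
generated = strip_prompt(full_output, prompt_len=4)
print(generated)  # [7592, 2088, 2]
```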
<br/>

Training

The system prompt below was used (shown with an English translation):

당신은 고려대학교 NLP&AI 연구실에서 만든 AI 챗봇입니다.
당신의 이름은 'KULLM'으로, 한국어로는 '구름'을 뜻합니다.
당신은 비도덕적이거나, 성적이거나, 불법적이거나 또는 사회 통념적으로 허용되지 않는 발언은 하지 않습니다.
사용자와 즐겁게 대화하며, 사용자의 응답에 가능한 정확하고 친절하게 응답함으로써 최대한 도와주려고 노력합니다.
질문이 이상하다면, 어떤 부분이 이상한지 설명합니다. 거짓 정보를 발언하지 않도록 주의합니다.

(Translation: You are an AI chatbot created by the NLP & AI Lab at Korea University. Your name is 'KULLM', which means 'cloud' (구름) in Korean. You do not make statements that are immoral, sexual, illegal, or socially unacceptable. You converse pleasantly with users and try to help as much as possible by responding as accurately and kindly as you can. If a question is odd, you explain which part is odd. You take care not to state false information.)
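To reuse this system prompt at inference time, one might prepend it as a system turn before calling `apply_chat_template`. A sketch, assuming the model's chat template accepts a separate `system` role (the helper name is illustrative):

```python
# Assumed sketch: build a conversation carrying the training-time system
# prompt. SYSTEM_PROMPT is abridged to the first two lines of the prompt
# above for brevity; use the full text in practice.
SYSTEM_PROMPT = (
    "당신은 고려대학교 NLP&AI 연구실에서 만든 AI 챗봇입니다. "
    "당신의 이름은 'KULLM'으로, 한국어로는 '구름'을 뜻합니다."
)

def build_conversation(user_message):
    """Return a chat-template-ready message list with the system prompt."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_message},
    ]

conv = build_conversation("고려대학교에 대해서 알고 있니?")
print(conv[0]["role"])  # system
```

The resulting list can be passed directly to `tokenizer.apply_chat_template` in place of the single-turn `conversation` used in the example above.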

Model Evaluation (Fully Reproducible)

Prompt

๋ชจ๋ธ ํ‰๊ฐ€์— ์‚ฌ์šฉํ•œ ํ”„๋กฌํ”„ํŠธ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.
์‹คํ—˜ ๊ฒฐ๊ณผ, ํ•œ๊ตญ์–ด๋ณด๋‹ค ์˜์–ด ํ”„๋กฌํ”„ํŠธ๊ฐ€ ๋” ์ •ํ™•ํ•œ ํ‰๊ฐ€ ๊ฒฐ๊ณผ๋ฅผ ๋ณด์—ฌ์ฃผ์—ˆ์Šต๋‹ˆ๋‹ค.
๋”ฐ๋ผ์„œ ํ‰๊ฐ€์˜ ์ •ํ™•์„ฑ์„ ์œ„ํ•ด ์˜์–ด ํ”„๋กฌํ”„ํŠธ๋กœ ์ง„ํ–‰ํ–ˆ์Šต๋‹ˆ๋‹ค.

You will be given evaluation instruction, input and AI-generated response.
Your task is to rate the response on given metric.
Please make sure you read and understand these instructions carefully. Please keep this document open while reviewing, and refer to it as needed.

Evaluation Criteria:
- Fluency (1-5): The quality of the language used in the translation. A high-quality response should be grammatically correct, idiomatic, and free from spelling and punctuation errors.
- Coherence (1-5): A high score indicates that the response maintains consistent context. A low score is given if the response shifts context or language inappropriately from instruction(e.g. instruction's language is Korean, but response is English).
- Accuracy (1-5) - The correctness of the answer. The answer should be factually correct and directly answer the question asked
- Completeness (1-5) - The extent to which the response covers all aspects of the question. The response should not just address one part of the question, but should provide a comprehensive response.
- Overall Quality (1-5) - The overall effectiveness and excellence of the response, integrating considerations of all above criteria.

Evaluation Steps:
1. Read the instruction and input carefully and understand what it is asking.
2. Read the AI-generated response and Evaluation Criteria.
3. Assign a score for each criterion on a scale of 1 to 5, where 1 is the lowest and 5 is the highest.

Instruction:
{instruction}

Input:
{input}

Response:
{response}

Evaluation Form (scores ONLY):
- Fluency (1-5):
- Coherence (1-5):
- Accuracy (1-5):
- Completeness (1-5):
- Overall Quality (1-5):
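The template above leaves `{instruction}`, `{input}`, and `{response}` as placeholders and asks the judge for scores only. A hypothetical sketch of filling the template and parsing such a reply (the reply string and the parsing regex are assumptions about the "scores ONLY" format, not actual judge output):

```python
import re

def fill_prompt(template, instruction, input_text, response):
    """Substitute the three placeholders used in the evaluation template."""
    return (template.replace("{instruction}", instruction)
                    .replace("{input}", input_text)
                    .replace("{response}", response))

def parse_scores(reply):
    """Extract '- Criterion (1-5): N' lines from a judge reply."""
    return {name.strip(): int(score)
            for name, score in re.findall(r"- ([A-Za-z ]+) \(1-5\):\s*([1-5])", reply)}

# Assumed example reply in the requested "scores ONLY" format.
reply = ("- Fluency (1-5): 5\n- Coherence (1-5): 5\n- Accuracy (1-5): 4\n"
         "- Completeness (1-5): 4\n- Overall Quality (1-5): 4")
print(parse_scores(reply)["Overall Quality"])  # 4
```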
<br/>

์ฃผ์˜์‚ฌํ•ญ

License

Citation

Please cite this repository if you use its data or code.

@misc{kullm3,
  author = {Kim, Jeongwook and Lee, Taemin and Jang, Yoonna and Moon, Hyeonseok and Son, Suhyune and Lee, Seungyoon and Kim, Dongjun},
  title = {KULLM3: Korea University Large Language Model 3},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/nlpai-lab/kullm}},
}
@inproceedings{lee2023kullm,
  title={KULLM: Learning to Construct Korean Instruction-following Large Language Models},
  author={Lee, SeungJun and Lee, Taemin and Lee, Jeongwoo and Jang, Yoona and Lim, Heuiseok},
  booktitle={Annual Conference on Human and Language Technology},
  pages={196--202},
  year={2023},
  organization={Human and Language Technology}
}
@misc{kullm,
  author = {NLP \& AI Lab and Human-Inspired AI Research},
  title = {KULLM: Korea University Large Language Model Project},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/nlpai-lab/kullm}},
}