
FORGE: Pre-training Open Foundation Models for Science

Contributions

FORGE models

| Model     | #Params | #Tokens | Link     |
|-----------|---------|---------|----------|
| Forge-bio | 1.44B   | 38B     | download |
| Forge-che | 1.44B   | 41B     | download |
| Forge-eng | 1.44B   | 29B     | download |
| Forge-mat | 1.44B   | 15B     | download |
| Forge-phy | 1.44B   | 32B     | download |
| Forge-soc | 1.44B   | 90B     | download |
| Forge-s1  | 1.44B   | 10B     | download |
| Forge-s2  | 1.44B   | 20B     | download |
| Forge-s3  | 1.44B   | 30B     | download |
| Forge-s4  | 1.44B   | 257B    | download |
| Forge-m1  | 13B     | 30B     | download |
| Forge-m2  | 13B     | 257B    | download |
| Forge-l   | 22.4B   | 257B    | download |
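
After downloading a checkpoint, the #Params column can be sanity-checked locally. This is a minimal sketch that assumes the Hugging Face transformers classes used in the example below and a placeholder local path:

```python
from transformers import GPTNeoXForCausalLM

# "path_to_forge_model" is a placeholder for a locally downloaded checkpoint directory
model = GPTNeoXForCausalLM.from_pretrained("path_to_forge_model")

# Total parameter count, e.g. roughly 1.44e9 for the 1.44B models in the table
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e9:.2f}B parameters")
```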

Data sources

Example usages

from transformers import GPTNeoXForCausalLM, GPTNeoXTokenizerFast

# Load a FORGE checkpoint (GPT-NeoX architecture) and its tokenizer
model = GPTNeoXForCausalLM.from_pretrained("path_to_forge_model")
tokenizer = GPTNeoXTokenizerFast.from_pretrained("path_to_forge_model")

# Encode a scientific prompt and sample a continuation
prompt = "high entropy alloy applications include"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
gen_tokens = model.generate(input_ids,
                            do_sample=True,
                            temperature=0.7,
                            max_length=100)
gen_text = tokenizer.batch_decode(gen_tokens)[0]
print(gen_text)
high entropy alloy applications include high strength steels, alloys, composites, as well some metal alloys. In recent years, there has been much interest the use of such materials for manufacturing parts, components, machinery. For example, automotive sector an increasing number applications. most widely used is steels.
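
Because the example samples with a temperature of 0.7, the continuation varies between runs. The sketch below, assuming the same placeholder checkpoint path, moves the model to a GPU when one is available and switches to greedy decoding for repeatable output:

```python
import torch
from transformers import GPTNeoXForCausalLM, GPTNeoXTokenizerFast

# "path_to_forge_model" is a placeholder for a locally downloaded checkpoint
device = "cuda" if torch.cuda.is_available() else "cpu"
model = GPTNeoXForCausalLM.from_pretrained("path_to_forge_model").to(device)
tokenizer = GPTNeoXTokenizerFast.from_pretrained("path_to_forge_model")

input_ids = tokenizer("high entropy alloy applications include",
                      return_tensors="pt").input_ids.to(device)

# Greedy decoding (do_sample=False) yields the same continuation every run
gen_tokens = model.generate(input_ids, do_sample=False, max_new_tokens=64)
print(tokenizer.batch_decode(gen_tokens)[0])
```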

Pre-processing

Training

Scientific downstream tasks

Raw performance data and plots

Reference

@inproceedings{10.1145/3581784.3613215,
  author    = {Junqi Yin and Sajal Dash and Feiyi Wang and Mallikarjun Shankar},
  title     = {{FORGE}: Pre-training Open Foundation Models for Science},
  booktitle = {SC23: International Conference for High Performance Computing, Networking, Storage and Analysis},
  year      = {2023},
  doi       = {10.1145/3581784.3613215}
}