
Cloning the repository and downloading the dataset

This repository stores the dataset file "jssp_llm_format_120k.json" with Git Large File Storage (LFS). Follow the steps below to clone the repository and retrieve the large files.

Prerequisites

Git
Git LFS (download and install from the Git LFS website)

Setup Instructions

Step 1: Install Git LFS

Make sure Git LFS is installed on your system (see Prerequisites). Once it is installed, set it up for your user account by running:

git lfs install

Step 2: Clone the Repository

git clone https://github.com/starjob42/datasetjsp.git
cd datasetjsp

Step 3: Pull LFS Objects

After cloning the repository, make sure Git LFS pulls the large files:

git lfs pull
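If the pull did not succeed, the dataset file in the working tree is only a small text pointer rather than the real data. The sketch below is one way to check for that; it relies on the fact that Git LFS pointer files begin with the line `version https://git-lfs.github.com/spec/v1`. The helper name `is_lfs_pointer` is made up for this example and is not part of the repository.

```python
from pathlib import Path

# First line of a Git LFS pointer file, as defined by the Git LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file at `path` looks like an unresolved Git LFS pointer."""
    with open(path, "rb") as handle:
        head = handle.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


if __name__ == "__main__":
    dataset = Path("jssp_llm_format_120k.json")  # file name taken from this README
    if not dataset.exists():
        print(f"{dataset} not found; run this from the repository root")
    elif is_lfs_pointer(dataset):
        print(f"{dataset} is still an LFS pointer; run `git lfs pull` again")
    else:
        print(f"{dataset} looks like real data ({dataset.stat().st_size} bytes)")
```

A pointer file is typically only a few hundred bytes, so a very small file size is another hint that the large file has not been fetched yet.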

Step 4: View the Datacard

To view the datacard, run the following command:

python read_datacard.py

If Git LFS does not work, the dataset can also be downloaded from: (google drive)

Setting Up Your Python Environment

Follow these instructions to create a virtual environment and install the necessary libraries.

Step 1: Create a Virtual Environment

python3 -m venv llm_env

Activate the Virtual Environment

After creating the virtual environment, activate it using the command for your platform:

On Windows

.\llm_env\Scripts\activate

On macOS and Linux

source llm_env/bin/activate
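To confirm that activation worked before installing anything, you can ask the interpreter itself. This is a minimal sketch, assuming the environment was created with `python3 -m venv` as shown above: inside such an environment `sys.prefix` differs from `sys.base_prefix`. The function name `in_virtualenv` is only illustrative.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix points at the base installation; otherwise they are equal.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter:", sys.executable)
```

If the check reports that no environment is active, re-run the activation command for your platform in the same shell session.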

Install the Required Libraries

pip install -r requirements.txt

Training

Make sure to pass the correct path to the training dataset. The default path is './jssp_llm_format_120k.json'.

python train_phi3_lora_jssp.py

Inference

Download checkpoint-1750.zip from (google drive), unzip it, and place the entire folder inside the checkpoints directory. Afterwards, the checkpoints directory should look like this: ./checkpoints/checkpoint-1750/. To run inference, use the following command, which uses the 'test_2000.json' test dataset:

python infer_phi3.py
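A common failure at this point is an extra nesting level after unzipping (for example checkpoints/checkpoint-1750/checkpoint-1750/). The sketch below checks only that the expected folder exists and is not empty; it does not validate the files inside, since their names are not listed in this README. The helper `checkpoint_ready` is hypothetical and not part of the repository.

```python
from pathlib import Path


def checkpoint_ready(root="checkpoints", name="checkpoint-1750"):
    """Return True if <root>/<name> exists, is a directory, and is not empty."""
    folder = Path(root) / name
    # any() stops at the first entry, so large folders are not fully listed.
    return folder.is_dir() and any(folder.iterdir())


if __name__ == "__main__":
    if checkpoint_ready():
        print("found ./checkpoints/checkpoint-1750/")
    else:
        print("missing or empty: ./checkpoints/checkpoint-1750/")
```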

JSSP LLM Format Dataset

Dataset Overview

Dataset Name: jssp_llm_format_120k.json
Number of Entries: 120,000
Number of Fields: 5

Fields Description

num_jobs
- Type: int64
- Number of Unique Values: 12
num_machines
- Type: int64
- Number of Unique Values: 12
prompt_jobs_first
- Type: object
- Number of Unique Values: 120,000
prompt_machines_first
- Type: object
- Number of Unique Values: 120,000
output
- Type: object
- Number of Unique Values: 120,000
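The summary above can be roughly reproduced with the standard library. This sketch assumes the JSON file holds a top-level list of objects, one per entry, which the README does not state explicitly. It reports Python type names (int, str) rather than the int64/object labels used above, which appear to be pandas dtypes. The function `summarize_fields` is illustrative and is not the repository's read_datacard.py.

```python
import json
from pathlib import Path


def summarize_fields(records):
    """Map each field name to (sorted Python type names, number of unique values)."""
    types, values = {}, {}
    for record in records:
        for field, value in record.items():
            types.setdefault(field, set()).add(type(value).__name__)
            # Serialize so unhashable values (lists, dicts) can be counted too.
            values.setdefault(field, set()).add(json.dumps(value, sort_keys=True))
    return {field: (sorted(types[field]), len(values[field])) for field in types}


if __name__ == "__main__":
    path = Path("jssp_llm_format_120k.json")
    if path.exists():
        with path.open(encoding="utf-8") as handle:
            records = json.load(handle)  # assumption: top level is a list of objects
        print("entries:", len(records))
        for field, (kinds, unique) in summarize_fields(records).items():
            print(f"{field}: types={kinds}, unique={unique}")
```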

Usage

This dataset can be used for training LLMs on job-shop scheduling problems (JSSP). Each entry gives the number of jobs, the number of machines, two natural-language prompt variants (prompt_jobs_first and prompt_machines_first), and the corresponding output.
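As an illustration of how an entry might feed a fine-tuning pipeline, the sketch below pairs one of the two prompt fields with the output field. The key names "prompt" and "completion" and the helper `to_training_pair` are assumptions made for this example; train_phi3_lora_jssp.py may format its inputs differently.

```python
PROMPT_FIELDS = ("prompt_jobs_first", "prompt_machines_first")


def to_training_pair(entry, prompt_field="prompt_jobs_first"):
    """Build a {'prompt', 'completion'} dict from one dataset entry."""
    if prompt_field not in PROMPT_FIELDS:
        raise ValueError(f"unknown prompt field: {prompt_field}")
    # The target text is always the entry's output field.
    return {"prompt": entry[prompt_field], "completion": entry["output"]}
```

Keeping the prompt field as a parameter makes it easy to train on either formulation, or on both by calling the function once per field.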

License

This dataset is licensed under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0). For more details, see the license description. The dataset will remain accessible for an extended period.

Citation

If you use this dataset in your research, please cite it as follows:


@dataset{jssp_for_llm,
author = {Anonymous},
title = {LLMs can Schedule},
year = {2024},
url = {https://github.com/starjob42/datasetjsp.git},
note = {Submitted to NeurIPS 2024 Datasets and Benchmarks}
}