CRySTAL: Condensed Reinforcement using Structured Training for Adaptive Learning
- Condensed Reinforcement: Fine-tuning a small model using a larger model through reinforcement learning
- Structured Training: Constraining the inputs and outputs to follow a specific format
- Adaptive Learning: Adjusting the model to focus on a specific use case instead of general use
- Yahoo Japan News (RSS): https://news.yahoo.co.jp/rss
- Pretrained model: `Qwen/Qwen2.5-0.5B-Instruct`
- Training algorithm: GRPO trainer (a minimal usage sketch follows this list)
- Fine-tuning library: Unsloth
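To make the pieces above concrete, here is a minimal sketch of GRPO fine-tuning of `Qwen/Qwen2.5-0.5B-Instruct` with TRL's `GRPOTrainer`, including a format-checking reward in the spirit of Structured Training. This is an illustration only, not this repository's training code: the prompts, the `要約:` format check, and the config values are placeholder assumptions, and the Unsloth-based model loading used in the actual pipeline is omitted.

```python
# Minimal GRPO sketch (illustrative only; not this repository's training code).
# Assumptions: placeholder prompts, a hypothetical "要約:" output format, tiny config values.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Placeholder prompts; the real pipeline would build these from the generated news data.
train_dataset = Dataset.from_dict({
    "prompt": [
        "次のニュースを一文で要約してください: ...",
        "次の記事を一文で要約してください: ...",
    ]
})

def format_reward(completions, **kwargs):
    # "Structured Training": reward completions that follow the expected output format.
    return [1.0 if c.strip().startswith("要約:") else 0.0 for c in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    reward_funcs=format_reward,
    args=GRPOConfig(
        output_dir="./train/example",
        per_device_train_batch_size=2,
        num_generations=2,          # must divide the effective generation batch size
        max_completion_length=64,
    ),
    train_dataset=train_dataset,
)
trainer.train()
```

In this repository the fine-tuning goes through Unsloth rather than the plain setup above, and training produces the `lora/` adapter directory referenced in the evaluate step below.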
- Prepare a machine with GPU capacity (such as an AWS `g4dn.xlarge` EC2 instance).
  Note: It's possible to run the program on Google Colab, but since this repository is private, that requires additional configuration, which I'm not yet familiar with.
- Prerequisites:
  - Change to the `./crystal-ai/` directory:
    ```sh
    cd ./crystal-ai/
    ```
  - Install packages and dependencies:
    ```sh
    sudo apt-get update -y
    sudo apt-get install -y cmake build-essential
    curl -LsSf https://astral.sh/uv/0.8.4/install.sh | sh
    source $HOME/.local/bin/env
    uv sync
    ```
  - Activate the virtual environment:
    ```sh
    source ./.venv/bin/activate
    ```
  - Change to the `./storage/` directory:
    ```sh
    cd ../storage/
    ```
- Run:
  - Generate data (see the feed-fetching sketch after this section):
    ```sh
    uv run ../crystal-ai/main.py data
    ```
  - Train the model (`nohup` keeps the program running even after exiting the terminal):
    ```sh
    uv run python -c 'import nltk; nltk.download("cmudict")'
    # When running the program, mysterious `core.*` files are somehow created.
    # I'm not sure what causes this and it's quite annoying.
    # Here we are temporarily disabling core dumps, but I want to take a closer look later.
    ulimit -c 0
    nohup uv run ../crystal-ai/main.py train &
    ```
  - Evaluate the model, where `${SUBDIR}` refers to the subdirectory created during training (see the adapter-loading sketch after this section):
    ```sh
    uv run ../crystal-ai/main.py evaluate --lora-dir ./train/${SUBDIR}/lora/
    ```
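The `data` step presumably builds its inputs from the Yahoo Japan News RSS feeds listed above. As a rough illustration of what such a feed looks like (not the repository's actual data code; the specific feed URL below is an assumption):

```python
# Hypothetical sketch: fetch headlines from a Yahoo Japan News RSS feed.
# The actual `main.py data` pipeline may differ; the feed URL below is an assumption.
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "https://news.yahoo.co.jp/rss/topics/top-picks.xml"

with urllib.request.urlopen(FEED_URL) as resp:
    root = ET.fromstring(resp.read())

# RSS 2.0 layout: <rss><channel><item> with <title>, <link>, <pubDate>, ...
for item in root.iter("item"):
    print(item.findtext("title"), item.findtext("link"))
```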
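For a quick manual check of a trained adapter, the LoRA directory passed to `--lora-dir` can also be loaded directly with `transformers` and `peft`. This is only a hypothetical sketch, not the `evaluate` command's implementation; the adapter path and prompt are placeholders.

```python
# Hypothetical sketch: load a trained LoRA adapter onto the base model for a quick manual check.
# Not the repository's `evaluate` implementation; the adapter path and prompt are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "Qwen/Qwen2.5-0.5B-Instruct"
ADAPTER_DIR = "./train/<SUBDIR>/lora/"  # replace <SUBDIR> with the training run's subdirectory

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = PeftModel.from_pretrained(AutoModelForCausalLM.from_pretrained(BASE), ADAPTER_DIR)

inputs = tokenizer("次のニュースを一文で要約してください: ...", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```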
I want to express my sincere thanks to:
- The Qwen team, for publishing their SOTA models, which we fine-tune for various downstream tasks.
- Hugging Face, for the GRPO trainer, which implements GRPO, the RL algorithm proposed by DeepSeek.
- Unsloth, a fast and efficient library for fine-tuning LLMs.
- And many more...