Port ConRFT #1832

s1lent4gnt · 2025-09-01T13:31:35Z

What this does

Implements the ConRFT (Consistency-based Reinforced Fine-Tuning) approach for fine-tuning Vision-Language-Action (VLA) models on robotic manipulation tasks.

NOTE: this PR depends on #1831

How to test it

Cal-ConRFT (offline)

python src/lerobot/scripts/rl/learner.py --config json/train_conrft_offline.json

HIL-ConRFT (online)

python src/lerobot/scripts/rl/learner.py --config json/train_conrft_online_learner.json

python src/lerobot/scripts/rl/actor.py --config json/train_conrft_online_actor.json

The config files are on the lilkm/configs branch: https://github.com/s1lent4gnt/lerobot/tree/lilkm/configs/json

TODO

Implement state stacking and observation masking in OctoEncodingWrapper.
Add mc_returns to the dataset.
Investigate slow training in the offline phase.
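
For the mc_returns item, here is a minimal sketch of how per-transition Monte Carlo returns could be computed before they are written to the dataset. It is an illustration only, not the code in this PR: the function name `compute_mc_returns`, the flat `rewards`/`dones` layout, and the default `gamma` are all assumptions.

```python
from typing import Sequence


def compute_mc_returns(
    rewards: Sequence[float],
    dones: Sequence[bool],
    gamma: float = 0.99,  # assumed discount; use the value from the training config
) -> list[float]:
    """Discounted return-to-go for each transition in a flat list of episodes.

    dones[t] is True on the last transition of an episode, so the running
    return is reset there and never leaks across episode boundaries.
    """
    if len(rewards) != len(dones):
        raise ValueError("rewards and dones must have the same length")

    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for t in range(len(rewards) - 1, -1, -1):
        if dones[t]:
            running = 0.0  # nothing follows the final step of an episode
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    # Two short episodes stored back to back, each with a sparse terminal reward.
    print(compute_mc_returns([0.0, 0.0, 1.0, 0.0, 1.0],
                             [False, False, True, False, True],
                             gamma=0.5))
```

A trailing episode with no done flag is treated as truncated with no bootstrap value; a real implementation may need to handle that case differently.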

s1lent4gnt added 30 commits September 1, 2025 15:25

Add Octo General Robotic Policy

aa4ebfb

Refactor normalization in OctoPolicy

b519aee

Add OctoConfig and OctoPolicy support in the policies module

99f0bff

nit

7aaf507

Add push_to_hub attribute in OctoConfig

d2e8f67

Refactor

0f25e31

debugging

f94a234

load T5 with float32

73e784c

add freezing params

466cc89

Implement selective freezing and data normalization

11446ba

Fix action handling in diffusion model

9e2b799

Update selective freezing in OctoPolicy

49ffa29

Add OctoConfig support in policy configuration

ed3f1fb

Refactor and clean up unused code

7067e2d

Disable normalization

808537c

feat(policy): Add ConRFT policy

a16e6ef

This module implements the ConRFT (Consistency-based Reinforced Fine-Tuning) approach for fine-tuning Vision-Language-Action (VLA) models in robotic manipulation tasks.

refactor

7bcf045

add config parameters and refactor input/output features

7858dcf

refactor: Improve code structure and readability in ConRFT policy implementation

e538081

refactor: Update ConRFT configuration and modeling and learner

e028edb

Update image processing in SACObservationEncoder and ReplayBuffer for varying image sizes

4e94265

Add Cal-ConRFT (offline) in learner.py

9b357f1

Update configuration_conrft.py

9321cac

Update modeling_conrft.py to match JAX implementation and follow modeling_sac.py implementation

19e86f2

Refactor

2c23c00

nit

0fdeca3

Refactor

f336c32

Update config params

b79fac7

Refactor: remove discrete critic

c1d1a4d

Update config with octo params

45965e5

s1lent4gnt added 3 commits September 1, 2025 15:25

First matching check with jax implementation

3b60c6b

Implement HIL-ConRFT (online) phase

a1b5b4b

Refactor: ruff format

99d004f

s1lent4gnt force-pushed the lilkm/port-conrft branch from ad06fcb to 99d004f Compare September 1, 2025 13:41

[pre-commit.ci] auto fixes from pre-commit.com hooks

7070b3c

for more information, see https://pre-commit.ci

s1lent4gnt mentioned this pull request Sep 1, 2025

[WIP] [HIL-SERL] Add Flow Q-learning (FQL) agent with action chunking #1818

Draft

4 tasks
