Fine-tuning · Free · Open Source

OPENRLHF

Scale RLHF and agentic RL training with Ray, vLLM, and DeepSpeed

Apache-2.0

ABOUT

Building RLHF pipelines usually means wiring together separate tools for rollout, reward modeling, distributed training, inference, and checkpoint management across many GPUs. OpenRLHF packages those pieces into one framework so teams can run scalable SFT, preference optimization, and reinforcement learning workflows with less glue code.

INSTALL
pip install "openrlhf[vllm]"

INTEGRATION GUIDE

1. Train PPO, GRPO, or REINFORCE-style alignment workflows for open LLMs
2. Fine-tune models with SFT, reward modeling, and DPO in one framework
3. Run distributed multi-node RLHF jobs with Ray, vLLM, and DeepSpeed
4. Train vision-language models with image inputs inside reinforcement learning loops
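To give a feel for the preference-optimization workflow in step 2, the standard DPO objective scores a (chosen, rejected) response pair by comparing the policy's log-probabilities against a frozen reference model. Below is a minimal sketch in plain Python; the function name, argument values, and `beta` setting are illustrative assumptions, not OpenRLHF's API.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss for a single preference pair.

    Each argument is the total log-probability a model assigns to a
    response; beta scales how strongly the policy is pushed away from
    the frozen reference model. (Hypothetical helper, not OpenRLHF code.)
    """
    # Implicit reward margins relative to the reference model
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), written in a numerically stable form
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))

# When the policy prefers the chosen response more than the reference
# does, the loss drops below log(2); with no margin it equals log(2).
print(round(dpo_loss(-10.0, -12.0, -11.0, -11.0), 4))  # → 0.5981
```

In a real OpenRLHF run the per-response log-probabilities come from batched forward passes over the policy and reference models, and this pairwise loss is averaged over the preference dataset.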

TAGS

python · fine-tuning · rlhf · reinforcement-learning · llm-training · distributed-training · ray · vllm