A PyTorch implementation of the paper "Provably Efficient RLHF Pipeline: A Unified View from Contextual Bandits". This repository provides a flexible and modular approach to Reinforcement Learning from Human Feedback (RLHF).

ZinYY/Online_RLHF
