policy

RLHF-Dialogue-Summarizer-with-PPO-and-Reward-Model

ivanluk914 · PyTorch

Overview

Name

RLHF-Dialogue-Summarizer-with-PPO-and-Reward-Model

Author

ivanluk914

Framework

PyTorch

License

MIT

Skill type

other

Evidence level

untested

Task description

This project implements reinforcement learning from human feedback (RLHF) to fine-tune a FLAN-T5 language model for dialogue summarization. The training pipeline builds on a previously trained PEFT model and covers dataset preparation, reward modeling that leverages a toxicity classifier, PPO-based reinforcement learning with the trl library, and evaluation before and after RL.
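
The reward and evaluation steps described above can be illustrated with a minimal, dependency-free sketch. It is not the project's code. It assumes the toxicity classifier returns two logits ordered as [non-toxic, toxic], that the reward is the raw non-toxic logit, and that the PPO reward is shaped by a KL penalty against a frozen reference model, as PPO-based RLHF setups commonly do. The function names, the class ordering, and the `beta` value are hypothetical choices made for illustration.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over classifier logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toxicity_reward(logits: Sequence[float], non_toxic_index: int = 0) -> float:
    """Scalar reward: the raw logit of the non-toxic class.

    Assumption: index 0 is the non-toxic class. Check the label
    mapping of whichever classifier is actually used.
    """
    return logits[non_toxic_index]


def kl_penalized_reward(
    reward: float,
    logprobs_policy: Sequence[float],
    logprobs_ref: Sequence[float],
    beta: float = 0.2,
) -> float:
    """Subtract a KL-style penalty from the classifier reward.

    The penalty is the summed per-token log-probability ratio between
    the policy being trained and the frozen reference model, scaled by
    beta (a hypothetical coefficient).
    """
    kl_estimate = sum(p - r for p, r in zip(logprobs_policy, logprobs_ref))
    return reward - beta * kl_estimate


def mean_toxicity_change(before: Sequence[float], after: Sequence[float]) -> float:
    """Relative change in mean toxic-class probability after RL.

    Negative values mean the summaries became less toxic on average.
    """
    mean_before = sum(before) / len(before)
    mean_after = sum(after) / len(after)
    return (mean_after - mean_before) / mean_before


# Example: a summary the classifier scores as clearly non-toxic.
example_logits = [3.0, -2.0]
example_reward = toxicity_reward(example_logits)
shaped = kl_penalized_reward(
    example_reward,
    logprobs_policy=[-1.0, -2.0],
    logprobs_ref=[-1.5, -2.5],
    beta=0.1,
)
```

Using the raw logit rather than the softmax probability keeps the reward unbounded, which gives PPO a stronger signal when the classifier is confident. The KL term discourages the policy from drifting far from the reference summarizer while chasing that signal.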

Spaces

Action space

other · 0-dim · 0Hz

Observation space

type: other

Links

HuggingFace repo

null

Paper (arXiv)

null

Compatible robots

3+17 mentioned but not in catalog yet

Spot (Boston Dynamics) · T1 (Booster Robotics) · Apollo (Apptronik)

Compatible environments

No environments list RLHF-Dialogue-Summarizer-with-PPO-and-Reward-Model yet.

Datasets that reference this policy

No datasets reference RLHF-Dialogue-Summarizer-with-PPO-and-Reward-Model yet.