Reinforcement-Learning-Based-Fine-Tuning-of-Large-Language-Model-

AkashBadhautiya · PyTorch

Overview

Name
Reinforcement-Learning-Based-Fine-Tuning-of-Large-Language-Model-
Author
AkashBadhautiya
Framework
PyTorch
License
unknown
Skill type
other
Evidence level
untested
Task description
Implementing a full RLHF (Reinforcement Learning from Human Feedback) pipeline to fine-tune a pre-trained transformer (GPT-2) using PPO and GRPO optimization methods. The project integrates Supervised Fine-Tuning (SFT), Reward Modeling (RM), and Reinforcement Learning (RL) stages to align model behavior.
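The two RL objectives named above can be sketched in a few lines of PyTorch. This is a minimal illustration, not code from the repository: `ppo_clip_loss` implements the standard PPO clipped surrogate objective, and `grpo_advantages` shows the group-normalized advantage used by GRPO in place of a learned critic. All function and argument names here are hypothetical.

```python
import torch


def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO clipped surrogate loss (returned as a loss to minimize)."""
    # Probability ratio between the current and the old (frozen) policy.
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    # Clipping keeps the policy update within a trust region around 1.
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the pessimistic (minimum) surrogate; negate for a loss.
    return -torch.min(unclipped, clipped).mean()


def grpo_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize rewards within each group of
    G sampled completions per prompt, avoiding a separate value network."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)
```

In a full pipeline, `logp_new`/`logp_old` would be per-token log-probabilities of the sampled completions under the current and snapshot policies, and `rewards` would come from the reward model trained in the RM stage.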

Spaces

Action space
other · 0-dim · 0Hz
Observation space
  • type: other

Links

HuggingFace repo
none
Paper (arXiv)
none

Compatible robots

20

Compatible environments

0

No environments list Reinforcement-Learning-Based-Fine-Tuning-of-Large-Language-Model- yet.

Datasets that reference this policy

0

No datasets reference Reinforcement-Learning-Based-Fine-Tuning-of-Large-Language-Model- yet.