policy

VinePPO

Asimawad · PyTorch

Overview

Name

VinePPO

Author

Asimawad

Framework

PyTorch

License

Apache-2.0

Skill type

navigation

Evidence level

untested

Task description

This repository contains an experimental implementation of **Fine-Grained Credit Assignment for RL Training (CAL)** on top of Google's Tunix framework. This research explores token-level reward assignment to improve training stability and sample efficiency in reinforcement learning for large languag

Spaces

Action space

other · 0-dim · 0Hz

Observation space

type: other

Links

HuggingFace repo

null

Paper (arXiv)

null

Compatible environments

No environments list VinePPO yet.

Datasets that reference this policy

No datasets reference VinePPO yet.
