policy

criteriaforreward

Mukullight · PyTorch

Overview

Name

criteriaforreward

Author

Mukullight

Framework

PyTorch

License

MIT

Skill type

manipulation

Evidence level

untested

Task description

This repository contains code for fine-tuning reward models on ranked human preferences. It streamlines the integration of human feedback into RL-based fine-tuning for LLM alignment.

Spaces

Action space

other · 0-dim · 0Hz

Observation space

type: other

Links

HuggingFace repo

null

Paper (arXiv)

null

Compatible robots

3, plus 17 mentioned but not in catalog yet

Spot (Boston Dynamics) · T1 (Booster Robotics) · Apollo (Apptronik)

Compatible environments

No environments list criteriaforreward yet.

Datasets that reference this policy

No datasets reference criteriaforreward yet.