policy

criteriaforreward

Mukullight · PyTorch


Overview

Name
criteriaforreward
Author
Mukullight
Framework
PyTorch
License
MIT
Skill type
manipulation
Evidence level
untested
Task description
This repository contains code for fine-tuning reward models on ranked human preferences. It streamlines the integration of human feedback into RL-based fine-tuning for LLM alignment.
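The card does not say which training objective the repository uses. As an illustration only, a common choice for learning from ranked preferences is a pairwise ranking loss: every pair of responses drawn from one ranking contributes -log sigmoid(r_preferred - r_rejected), where r is the reward model's scalar score. The sketch below uses plain Python rather than PyTorch so it runs without dependencies; the function names and the best-to-worst input convention are assumptions, not taken from the repository.

```python
import math
from itertools import combinations
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def pairwise_ranking_loss(scores_best_to_worst: Sequence[float]) -> float:
    """Mean pairwise ranking loss for one ranked list of reward scores.

    Hypothetical helper: scores must be ordered from the most preferred
    response to the least preferred one (an assumed convention).
    """
    # combinations() keeps input order, so the first element of each pair
    # is always the response that humans ranked higher.
    pairs = list(combinations(scores_best_to_worst, 2))
    if not pairs:
        raise ValueError("need at least two ranked responses")
    total = sum(log_sigmoid(preferred - rejected) for preferred, rejected in pairs)
    return -total / len(pairs)


# Scores that agree with the human ranking give a lower loss than
# scores that contradict it.
agreeing = pairwise_ranking_loss([2.0, 1.0, 0.0])
contradicting = pairwise_ranking_loss([0.0, 1.0, 2.0])
print(agreeing < contradicting)
```

In a PyTorch training loop the same quantity could be computed on batched score tensors with `torch.nn.functional.logsigmoid` and backpropagated through the reward model.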

Spaces

Action space
other · 0-dim · 0Hz
Observation space
  • type: other

Links

HuggingFace repo
null
Paper (arXiv)
null

Compatible robots

3+17 mentioned but not in catalog yet

Compatible environments

0

No environments list criteriaforreward yet.

Datasets that reference this policy

0

No datasets reference criteriaforreward yet.