policy

weibo_robert_llm

ChenChiShui · PyTorch

Overview

Name

weibo_robert_llm

Author

ChenChiShui

Framework

PyTorch

License

MIT

Skill type

other

Evidence level

untested

Task description

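Weibo Robert LLM is a training project for a Weibo comment bot, built on Qwen3-4B and the CommentR Interaction Dataset. It uses multi-stage training (SFT → Reward Model → RL) to learn to generate high-quality comment replies that match human preferences.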
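
The reward-model stage of an SFT → Reward Model → RL pipeline is commonly trained on pairs of replies where one is preferred over the other. The entry does not say which objective this project uses, so the following is only a minimal sketch, assuming a standard Bradley-Terry pairwise loss. The function name `pairwise_preference_loss` and the example scores are hypothetical and do not come from the project. Plain Python is used so the arithmetic is easy to follow; in a PyTorch training loop the same formula would be applied to tensors of reward scores.

```python
import math


def pairwise_preference_loss(chosen_rewards, rejected_rewards):
    """Mean Bradley-Terry loss: -log(sigmoid(r_chosen - r_rejected)).

    Hypothetical helper for illustration; the project's actual reward-model
    objective is not documented in this entry.
    """
    if len(chosen_rewards) != len(rejected_rewards) or not chosen_rewards:
        raise ValueError("expected two non-empty score lists of equal length")

    total = 0.0
    for r_chosen, r_rejected in zip(chosen_rewards, rejected_rewards):
        margin = r_chosen - r_rejected
        # -log(sigmoid(margin)) equals softplus(-margin); this form avoids
        # overflow in exp() for large positive or negative margins.
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / len(chosen_rewards)


# Assumed scores for two preference pairs (illustrative values only).
well_ranked = pairwise_preference_loss([2.0, 1.5], [0.0, -0.5])
badly_ranked = pairwise_preference_loss([0.0, -0.5], [2.0, 1.5])
```

The loss shrinks as the reward model scores the preferred reply higher than the rejected one, so `well_ranked` is smaller than `badly_ranked`. When both replies receive the same score, the loss is log 2.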

Action space

other · 0-dim · 0Hz

Observation space

HuggingFace repo

null

Paper (arXiv)

null

3+17 mentioned but not in catalog yet

No environments list weibo_robert_llm yet.

No datasets reference weibo_robert_llm yet.