dataset

RSVLM_SFT

RL-MIND

Overview

Name

RSVLM_SFT

Source

RL-MIND

Episodes

Robot count

Format

other

Description

MF-RSVLM FUSE-RSVLM: Feature Fusion Vision-Language Model for Remote Sensing. MF-RSVLM is a remote sensing vision-language model (VLM). It combines a CLIP vision encoder, a two-layer MLP projector, and a Vicuna-7B LLM, and is trained in two stages for modality alignment and instruction following. Visual Encoder: CLIP… See the full description on the dataset page: https://huggingface.co/datasets/RL-MIND/RSVLM_SFT.
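The description names a two-layer MLP projector sitting between the CLIP vision encoder and the Vicuna-7B LLM. As a rough illustration of what such a projector does, the sketch below maps per-patch vision features into an LLM-sized embedding space using plain Python. It is not the project's implementation: the class name `TwoLayerProjector`, the toy dimensions (8 and 16), the GELU activation, and the weight initialization are all assumptions chosen for illustration, since the card does not specify them.

```python
import math
import random


def gelu(x: float) -> float:
    # Exact GELU via the error function (activation choice is an assumption).
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def init_linear(in_dim: int, out_dim: int, rng: random.Random):
    # Small uniform weights, zero bias; weight is stored as out_dim rows of in_dim.
    scale = 1.0 / math.sqrt(in_dim)
    weight = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
              for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def linear(vec, weight, bias):
    # Computes weight @ vec + bias for a single feature vector.
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weight, bias)]


class TwoLayerProjector:
    """Hypothetical projector: Linear -> GELU -> Linear, applied per patch."""

    def __init__(self, vision_dim: int, llm_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.w1, self.b1 = init_linear(vision_dim, llm_dim, rng)
        self.w2, self.b2 = init_linear(llm_dim, llm_dim, rng)

    def __call__(self, patch_features):
        tokens = []
        for feat in patch_features:
            hidden = [gelu(h) for h in linear(feat, self.w1, self.b1)]
            tokens.append(linear(hidden, self.w2, self.b2))
        return tokens


# Toy stand-in for vision encoder output: 4 patches with 8 features each.
patches = [[0.1 * (i + j) for j in range(8)] for i in range(4)]
projector = TwoLayerProjector(vision_dim=8, llm_dim=16)
visual_tokens = projector(patches)
print(len(visual_tokens), len(visual_tokens[0]))  # prints "4 16"
```

In a real model the projected vectors would be placed alongside the text token embeddings as input to the LLM; the dimensions would match the actual encoder and LLM hidden sizes rather than the toy values used here.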

Robots used

null

Links

HuggingFace dataset

RL-MIND/RSVLM_SFT