We recommend that users download the images through the Google Earth API using the provided coordinates. Each image is named according to the following format:
Country_Id_City_ULLon_ULLat_LRLon_LRLat.jpg
# Country: Name of the country
# Id: Unique identifier for this city
# City: Name of the city
# ULLon: Longitude of the upper left corner
# ULLat: Latitude of the upper left corner
# LRLon: Longitude of the lower right corner
# LRLat: Latitude of the lower right corner
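The naming format above encodes each tile's location and bounding box directly in the filename. The following is a minimal sketch of recovering those fields in Python. It assumes that no field itself contains an underscore and that the file ends in `.jpg`. `TileInfo`, `parse_tile_name`, and the example filename are hypothetical and are not part of the released tooling.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class TileInfo:
    country: str
    tile_id: str
    city: str
    ul_lon: float  # longitude of the upper left corner
    ul_lat: float  # latitude of the upper left corner
    lr_lon: float  # longitude of the lower right corner
    lr_lat: float  # latitude of the lower right corner


def parse_tile_name(filename: str) -> TileInfo:
    """Parse Country_Id_City_ULLon_ULLat_LRLon_LRLat.jpg into its fields.

    Assumption (hypothetical): no field contains an underscore. Adjust the
    splitting if country or city names in your copy do.
    """
    name = Path(filename).name  # drop any leading directories
    if not name.lower().endswith(".jpg"):
        raise ValueError(f"expected a .jpg file: {filename}")
    stem = name[: -len(".jpg")]  # strip only the extension, keep decimal points
    parts = stem.split("_")
    if len(parts) != 7:
        raise ValueError(f"expected 7 underscore-separated fields, got {len(parts)}: {filename}")
    country, tile_id, city = parts[:3]
    ul_lon, ul_lat, lr_lon, lr_lat = (float(p) for p in parts[3:])
    return TileInfo(country, tile_id, city, ul_lon, ul_lat, lr_lon, lr_lat)


if __name__ == "__main__":
    # Hypothetical filename, for illustration only.
    print(parse_tile_name("China_0001_Beijing_116.38_39.92_116.40_39.90.jpg"))
```

Stripping the extension explicitly, rather than using `Path.stem` on an arbitrary string, avoids accidentally cutting off the decimal part of the last coordinate when a name is passed without `.jpg`.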
- RSVQA: source
- NWPU: source
- RSICD: source
- RSITMD: source
- DIOR-RSVG: source
- RSVG: source
- UCM: source
- fMoW: source
- LLaVA: Follow the instructions here (we use only a random 20K subset).
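Only a random 20K subset of the LLaVA instruction data is used. The following is a minimal sketch of drawing such a subset, assuming the source file is a top-level JSON list of samples, as in LLaVA's released instruction files. The function name, file paths, and seed are hypothetical. This sketch does not reproduce the exact subset selected by the authors.

```python
import json
import random


def sample_subset(src_json: str, dst_json: str, k: int = 20_000, seed: int = 0) -> int:
    """Write a random k-sample subset of a JSON list to dst_json.

    Assumes src_json holds a top-level JSON list (LLaVA-style). The seed is
    a hypothetical choice; it will not match the authors' subset.
    Returns the number of samples written.
    """
    with open(src_json, "r", encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON list")
    k = min(k, len(data))  # never request more samples than exist
    subset = random.Random(seed).sample(data, k)  # sampling without replacement
    with open(dst_json, "w", encoding="utf-8") as f:
        json.dump(subset, f, ensure_ascii=False)
    return k
```

A seeded `random.Random` instance keeps the subset reproducible across runs without touching the global random state.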
- Download all the images through the Google Earth API and place them in a single directory whose name ends with “_Image”.
- Download our caption files from here and place all the JSON files in a folder named “OSMCapAnn”.
Finally, your pretraining data folder should be structured as follows:
|-PretrainData
|----XXX_Image
| |---xxxxx.jpg
| ...
|----OSMCapAnn
| |---features_01.json
| ...
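Before starting pretraining, it can help to confirm that the directory matches the layout above. The following is a minimal sketch of such a check. The function name `check_pretrain_layout` and the returned summary format are hypothetical. It only verifies that at least one `*_Image` folder and a non-empty `OSMCapAnn` folder exist. It does not validate the JSON contents.

```python
from pathlib import Path


def check_pretrain_layout(root: str) -> dict:
    """Sanity-check a PretrainData directory (hypothetical helper).

    Expects at least one folder whose name ends in "_Image" and an
    "OSMCapAnn" folder containing .json caption files. Returns a summary
    of .jpg counts per image folder and the caption file names.
    """
    root_path = Path(root)
    image_dirs = sorted(
        p for p in root_path.iterdir() if p.is_dir() and p.name.endswith("_Image")
    )
    if not image_dirs:
        raise FileNotFoundError(f"no *_Image folder under {root_path}")
    cap_dir = root_path / "OSMCapAnn"
    if not cap_dir.is_dir():
        raise FileNotFoundError(f"missing caption folder {cap_dir}")
    captions = sorted(p.name for p in cap_dir.glob("*.json"))
    if not captions:
        raise FileNotFoundError(f"no .json files in {cap_dir}")
    return {
        # number of .jpg files directly inside each image folder
        "image_dirs": {d.name: sum(1 for _ in d.glob("*.jpg")) for d in image_dirs},
        "caption_files": captions,
    }
```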
- Download the Stage 2 instruction data from here, along with the corresponding images. Then name the image folders and JSON files following a format similar to the one below:
|-Stage2Data
|----RSITMD_Image
|----RSITMD.json
|----RSITMDDetail_Image
|----RSITMDDetail.json
|----UCM_Image
|----UCM.json
| ...
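The layout above follows one convention: the images for dataset `NAME` live in `NAME_Image/` and its instructions in `NAME.json`. The Stage 3 layout uses the same convention. The following is a minimal sketch of pairing the two and reporting mismatches. `pair_datasets` is a hypothetical helper and is not part of the released code.

```python
from pathlib import Path


def pair_datasets(root: str) -> dict:
    """Map each dataset name to its (image folder, JSON file) pair.

    Assumes NAME_Image/ and NAME.json sit directly under root (the
    convention shown in the layout). Raises ValueError listing any JSON
    without a matching folder, or any folder without a matching JSON.
    """
    root_path = Path(root)
    jsons = {p.stem: p for p in root_path.glob("*.json") if p.is_file()}
    folders = {
        p.name[: -len("_Image")]: p
        for p in root_path.iterdir()
        if p.is_dir() and p.name.endswith("_Image")
    }
    missing_dirs = sorted(set(jsons) - set(folders))
    missing_jsons = sorted(set(folders) - set(jsons))
    if missing_dirs or missing_jsons:
        raise ValueError(
            f"JSON without image folder: {missing_dirs}; "
            f"image folder without JSON: {missing_jsons}"
        )
    return {name: (folders[name], jsons[name]) for name in sorted(jsons)}
```

Calling `pair_datasets("Stage2Data")` or `pair_datasets("Stage3Data")` should either return the full mapping or fail with the names that break the convention. For example, a stray `RSITMDDetail.json` without `RSITMDDetail_Image/` would be reported.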
- Download the Stage 3 instruction data from here, along with the corresponding images. Then name the image folders and JSON files following a format similar to the one below:
|-Stage3Data
|----OSM_Image
|----OSM.json
|----LLaVA_Image
|----LLaVA.json
| ...
- The classification and VQA data keep their original format, so simply download them from their sources.
- Our reformatted visual grounding (VG) data can be found here.