Official implementation of the "VQAsk: a multimodal Android GPT-based application to help blind users visualize pictures" paper [AVI '24].

VQAsk

Official implementation of VQAsk: a multimodal Android GPT-based application to help blind users visualize pictures (https://doi.org/10.1145/3656650.3656677). In this project we developed a Flutter application for Visual Question Answering, a computer vision task in which a system is given a text-based question about an image and must infer the answer. We implemented different interaction modes to guarantee an enjoyable user experience and to let users choose how to exploit the application's functionalities according to their needs or preferences. In particular, the application provides the following modalities:

  • Voice Interaction
  • Haptic Interaction
  • Visual Interaction

Development of this project began during the A.Y. 2022-23 for the Multimodal Interaction course at Sapienza University of Rome, and it was then carried on as a research project.

Some example screens of the app

Getting Started

To launch the application, run `flutter run` in the terminal after connecting a physical or emulated Android device.

A few resources to get you started if this is your first Flutter project:

For help getting started with Flutter development, view the online documentation, which offers tutorials, samples, guidance on mobile development, and a full API reference.
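For reference, a minimal launch workflow might look like the following. This is a sketch of the standard Flutter CLI steps, assuming the Flutter SDK is installed and on your PATH; it is not specific to this repository beyond running from its root directory.

```shell
# From the repository root, fetch the Dart/Flutter dependencies
# declared in pubspec.yaml (standard Flutter workflow)
flutter pub get

# Confirm that a physical or emulated Android device is connected
# and visible to the Flutter toolchain
flutter devices

# Build and launch the app in debug mode on the connected device
flutter run
```

If more than one device is connected, `flutter run -d <device-id>` selects a specific target, where the device id comes from the `flutter devices` output.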

Citation

If you find this work useful, feel free to cite us:

@inproceedings{vqask2024,
author = {De Marsico, Maria and Giacanelli, Chiara and Manganaro, Clizia Giorgia and Palma, Alessio and Santoro, Davide},
title = {VQAsk: a multimodal Android GPT-based application to help blind users visualize pictures},
year = {2024},
isbn = {9798400717642},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3656650.3656677},
doi = {10.1145/3656650.3656677},
booktitle = {Proceedings of the 2024 International Conference on Advanced Visual Interfaces},
articleno = {39},
numpages = {5},
keywords = {Visual Question Answering, natural language processing and computer vision for scene interpretation, visually impaired users},
location = {Arenzano, Genoa, Italy},
series = {AVI '24}
}

Authors

  • Maria De Marsico
  • Chiara Giacanelli
  • Clizia Giorgia Manganaro
  • Alessio Palma
  • Davide Santoro
