Official implementation of the "VQAsk: a multimodal Android GPT-based application to help blind users visualize pictures" paper [AVI '24].

VQAsk

Official implementation of VQAsk: a multimodal Android GPT-based application to help blind users visualize pictures (https://doi.org/10.1145/3656650.3656677). In this project we developed a Flutter application for Visual Question Answering, a computer vision task in which a system is given a text-based question about an image and must infer the answer. We implemented different interaction modes to guarantee an enjoyable user experience and to let users choose how to exploit the application's functionalities according to their needs or preferences. In particular, the application provides the following modalities:

  • Voice Interaction
  • Haptic Interaction
  • Visual Interaction

Development of this project began during the A.Y. 2022-23 for the Multimodal Interaction course at Sapienza University of Rome, and it was then carried on as a research project.

Some example screens of the app

Getting Started

To launch the application, run `flutter run` in the terminal after connecting a physical or emulated Android device.

A few resources to get you started if this is your first Flutter project:

For help getting started with Flutter development, view the online documentation, which offers tutorials, samples, guidance on mobile development, and a full API reference.
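For reference, a minimal launch workflow might look like the following. This is a sketch of the standard Flutter CLI steps, assuming the Flutter SDK is installed and on your PATH; it is not specific to this repository beyond running from its root directory.

```shell
# From the repository root, fetch the Dart/Flutter dependencies
# declared in pubspec.yaml (standard Flutter workflow)
flutter pub get

# Confirm that a physical or emulated Android device is connected
# and visible to the Flutter toolchain
flutter devices

# Build and launch the app in debug mode on the connected device
flutter run
```

If more than one device is connected, `flutter run -d <device-id>` selects a specific target, where the device id comes from the `flutter devices` output.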

Citation

If you find this work useful, feel free to cite us:

@inproceedings{vqask2024,
author = {De Marsico, Maria and Giacanelli, Chiara and Manganaro, Clizia Giorgia and Palma, Alessio and Santoro, Davide},
title = {VQAsk: a multimodal Android GPT-based application to help blind users visualize pictures},
year = {2024},
isbn = {9798400717642},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3656650.3656677},
doi = {10.1145/3656650.3656677},
booktitle = {Proceedings of the 2024 International Conference on Advanced Visual Interfaces},
articleno = {39},
numpages = {5},
keywords = {Visual Question Answering, natural language processing and computer vision for scene interpretation, visually impaired users},
location = {Arenzano, Genoa, Italy},
series = {AVI '24}
}

Authors

  • Maria De Marsico
  • Chiara Giacanelli
  • Clizia Giorgia Manganaro
  • Alessio Palma
  • Davide Santoro
