UBC-CIC/student-advising-assistant


Student Advising Assistant

This is a prototype question-answering system for student advising at higher education institutions. It performs retrieval-augmented generation using university websites as information sources. For more information, visit the CIC Website.
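To make the retrieval-augmented generation flow concrete, here is a minimal sketch, not the project's actual implementation. It assumes a toy bag-of-words stand-in for a real embedding model, and the sample documents, the `embed`, `retrieve`, and `build_prompt` names, and the prompt wording are all hypothetical. The idea is only to show the shape of the pipeline: embed the question, rank source texts by similarity, and place the best matches in the prompt sent to the LLM.

```python
import math
from collections import Counter


def embed(text):
    """Toy 'embedding': lowercase token counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, k=2):
    """Return the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_docs):
    """Combine retrieved context and the question into one prompt string."""
    context = "\n".join(f"- {d}" for d in context_docs)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Made-up sample texts standing in for scraped university web pages.
documents = [
    "Students must complete 120 credits to graduate.",
    "The library is open from 8am to 10pm on weekdays.",
    "Course registration opens in June for the winter session.",
]
question = "How many credits to graduate?"
top_docs = retrieve(question, documents, k=1)
prompt = build_prompt(question, top_docs)
print(prompt)
```

In the deployed system the prompt would then be passed to an LLM; that step is omitted here because it depends on the model and endpoint chosen at deployment.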

| Index | Description |
| --- | --- |
| High Level Architecture | High level overview illustrating component interactions |
| Deployment | How to deploy the project |
| User Guide | The working solution |
| Developer Guide | Information for further developers |
| Changelog | Any changes post publish |
| Credits | Meet the team behind the solution |
| License | License details |

High Level Architecture

The following architecture diagram illustrates the AWS components used to deliver the solution. For an in-depth explanation of the frontend and backend stacks, refer to the Architecture Design.

[Figure: architecture diagram]

Deployment Guide

To deploy this solution, follow the steps laid out in the Deployment Guide.

User Guide

For instructions on how to navigate the web app interface, refer to the Web App User Guide.

Developer Guide

For instructions on how to develop the application, refer to the Developer Guide.

Directories

├───aws_helpers
├───backend
│   └───cdk
│       ├───bin
│       ├───lambda
│       │   ├───create_db_user
│       │   ├───fetch_feedback
│       │   ├───start_ecs
│       │   ├───store_feedback
│       │   └───trigger_lambda
│       ├───layers
│       ├───lib
│       └───test
├───document_scraping
├───embeddings
├───flask_app
│   ├───documents
│   ├───embeddings
│   ├───filters
│   ├───llms
│   ├───prompts
│   ├───retrievers
│   ├───static
│   └───templates
└───misc
  1. /aws_helpers: Contains utility classes and functions for connecting to AWS services, used across the other portions of the app
  2. /backend/cdk: Contains the deployment code for the app's AWS infrastructure
    • /lambda: Contains the scripts for all lambda functions
    • /lib: Contains the deployment code for all 4 stacks of the infrastructure
  3. /document_scraping: Contains the scripts that run to scrape text from the information source websites
  4. /embeddings: Contains the scripts that convert the scraped texts to embeddings, then upload them to the vectorstore
  5. /flask_app: Contains the inference and user interface code for the prototype question answering system
    • /documents: Functions relating to document loading
    • /embeddings: Classes relating to embeddings
    • /filters: Classes relating to document filters
    • /llms: Classes relating to LLMs, adapters to connect to LLMs, and helpers to load LLMs
    • /prompts: Prompt template definitions
    • /retrievers: Retriever classes for PGVector
    • /static: Static web content as .md or .json
    • /templates: HTML files with Jinja2 templates for the web app's pages
  6. /misc: Contains various other useful scripts, contents described in the Developer Guide
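
As an illustration of what a PGVector-backed retriever does under the hood, the sketch below builds a nearest-neighbour query using pgvector's `<=>` cosine-distance operator. It is an assumption-laden example rather than the code in /retrievers: the `documents` table, its `id`, `content`, and `embedding` columns, and both function names are hypothetical, and the project's retriever classes may rely on a library wrapper instead of raw SQL.

```python
def to_vector_literal(embedding):
    """Format a list of numbers in pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"


def build_similarity_query(table="documents"):
    """Build a parameterized top-k query ordered by cosine distance.

    The table and column names are hypothetical. The table name is checked
    because identifiers cannot be passed as query parameters.
    """
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    return (
        f"SELECT id, content FROM {table} "
        "ORDER BY embedding <=> %s::vector "
        "LIMIT %s"
    )


sql = build_similarity_query()
params = (to_vector_literal([0.1, 0.2, 0.3]), 5)
print(sql)
print(params)
```

With psycopg2, the pair would be executed as `cursor.execute(sql, params)` against a database that has the pgvector extension enabled; the connection setup is left out because it depends on the deployed environment.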

Changelog

N/A

Credits

Phase 2

Phase 2 of this application was architected and developed by Aman Prakash, with project assistance by Miranda Newell.

Phase 1

Phase 1 of this application was architected and developed by Arya Stevinson and Tien Nguyen, with project assistance by Victoria Li.

A special thanks to the UBC Cloud Innovation Centre Technical and Project Management teams for their guidance and support.

License

This project is distributed under the MIT License.

Licenses of libraries and tools used by the system are listed below:

PostgreSQL license

  • For PostgreSQL and pgvector
  • "a liberal Open Source license, similar to the BSD or MIT licenses."

BSD 3-clause

  • For networkx, Flask, pytorch, lxml, pandas, python-dotenv, dateparser, and scrapy

Apache 2.0

  • For transformers, sentence-transformers, fschat, and pyjson5

GNU Lesser General Public License

  • For psycopg2-binary

MIT License

  • For spaCy and dictdiffer

HFOIL 1.0

  • For Hugging Face text-generation-inference (TGI), used by the DLC container for the SageMaker inference endpoint
  • "HFOIL is not a true open source license because we added a restriction: to sell a hosted or managed service built on top of TGI, we now require a separate agreement."

Llama 2 Community License Agreement

  • For Vicuna 1.5, fine-tuned from Llama 2
  • Not true open source due to some restrictions regarding inappropriate use
  • "Your use of the Llama Materials must comply with applicable laws and regulations"
  • Also includes restrictions on products or services with more than "700 million monthly active users"

Llama 3 Community License Agreement

  • For Llama 3 8B Instruct and Llama 3 70B Instruct models

Mistral Legal terms and conditions

  • For Mistral 7B Instruct and Mistral Large models