Skip to content

Summarization of PDF file using LangChain and GPT 3.5

Notifications You must be signed in to change notification settings

orlovtsu/pdf_summarization

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PDF Summarizer

This project is a Streamlit application that summarizes the content of PDF files using natural language processing (NLP) techniques.

It utilizes the LangChain library and the OpenAI GPT-3.5 model for generating summaries.

This application requires OpenAI API key. Summarization will be charged by OpenAI in accordance with their prices for API requests.

Features

  • Upload a PDF file and get a concise summary of its content.
  • The summary focuses on key facts and is typically 5-7 sentences long.
  • The application uses a character-based text splitter for chunking the text.
  • Embeddings are generated using the sentence-transformers/all-MiniLM-L6-v2 model.
  • The summaries are generated using the GPT-3.5 model with a temperature setting of 0 for deterministic outputs.

Installation

Recommended way

To set up the project using Docker, follow these steps:

  1. Clone the repository:
git clone https://github.com/orlovtsu/pdf_summarization.git
  1. Navigate to the project directory and run:
docker compose build --no-cache
docker compose up

Once the docker is built and app is running, you can access it in your web browser at http://localhost:8501

Alternative way

To set up project using streamlit, follow these steps:

  1. Clone the repository:
git clone https://github.com/orlovtsu/pdf_summarization.git
  1. Navigate to the project directory and install the required dependencies:
pip install -r requirements.txt
  1. Start the application using
streamlit run app.py

Once the application is running, you can access it in your web browser at http://localhost:8501

Usage

  1. To run the application you will need your own OpenAI API Key:
  1. Once you have OpenAI key in the clipboard, paste it into the field "Enter your unique key:"
  2. Send Submit
  3. Click Browse files and select your PDF file
  4. Once the file is uploaded click "Summarize PDF" and wait for a several seconds.

After the response you will see a summarized response and information about your query and cost of it.

Tokens Used: 1924 Prompt Tokens: 1800 Completion Tokens: 124 Successful Requests: 1 Total Cost (USD): $0.005896

Contributing

Contributions are welcome! If you have any suggestions or improvements, feel free to submit a pull request or open an issue.

About

Summarization of PDF file using LangChain and GPT 3.5

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published