Text Summarizer Using Phi 3

This repository contains a text summarizer that uses a Microsoft Phi-3 family model (the examples below use microsoft/Phi-3.5-mini-instruct) to summarize textual artifacts such as transcripts. It can process large texts, write the summaries to JSON, and convert them to Markdown and PDF.


Features

Multi-GPU Support: Efficiently utilize multiple GPUs for faster processing.
Customizable Commands: Easily modify default commands and arguments.
Markdown Generation: Convert JSON summaries into well-structured Markdown documents.
PDF Conversion: Transform Markdown files into PDF documents.
Organized Summaries: Automatically organize summaries by author and generate a table of contents.


Installation

Fork the repository, then clone your fork (replace your-username with your GitHub username):

git clone https://github.com/your-username/TextSummarizer.git
cd TextSummarizer

Install the required dependencies:

pip install torch transformers nltk tqdm

Download the NLTK punkt tokenizer by running the following in a Python session:

import nltk
nltk.download('punkt')

Install the CUDA toolkit and the appropriate GPU drivers for your system to enable GPU acceleration.
Ensure you have sufficient disk space and RAM to load and run the Phi-3 model.
Run the main script llm_general_multi_process_gpu.py with the appropriate command-line arguments, as shown in the Usage section.
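
Before launching the runner, it can help to confirm that PyTorch actually sees your GPUs. The following is a minimal sketch, not part of the repository: describe_gpus is a hypothetical helper, and it assumes the indices PyTorch reports are the ones --gpu_devices expects.

```python
import importlib.util


def describe_gpus():
    """Return one description string per visible CUDA device (empty list if none)."""
    # PyTorch may not be installed yet; report that instead of crashing.
    if importlib.util.find_spec("torch") is None:
        print("PyTorch is not installed; run the pip install step first.")
        return []

    import torch

    if not torch.cuda.is_available():
        print("PyTorch is installed, but no CUDA device is visible.")
        return []

    # Assumption: these indices match what --gpu_devices expects.
    return [
        f"{index}: {torch.cuda.get_device_name(index)}"
        for index in range(torch.cuda.device_count())
    ]


if __name__ == "__main__":
    for line in describe_gpus():
        print(line)
```

If the list comes back empty on a machine with GPUs, the driver or CUDA installation is the first thing to check.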


Usage

General Runner Usage

To run the summarizer, use the following command:

python3 utility/llm_general_multi_process_gpu.py \
    --input_json ./youtube/videos.json \
    --output_dir ./youtube/partitions \
    --gpu_devices 5 6 7 \
    --processes_per_gpu 2 \
    --model_id="microsoft/Phi-3.5-mini-instruct" \
    --first_summary_field="phi_mini_summary" \
    --summary_over_summary_field="summary_over_summary" \
    --artifact_text_field="transcript" \
    --debug
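
To make the flags above concrete, here is a sketch of how such a command line could be parsed and turned into a worker plan. It is an illustration under assumptions, not the repository's code: the argument types, the defaults, and the helpers plan_workers and partition are hypothetical, and round-robin splitting is only one plausible way to divide the input across processes.

```python
import argparse


def build_parser():
    # Mirrors the flags shown in the command above; types and defaults are guesses.
    parser = argparse.ArgumentParser(description="Sketch of the runner's CLI")
    parser.add_argument("--input_json", required=True)
    parser.add_argument("--output_dir", required=True)
    parser.add_argument("--gpu_devices", type=int, nargs="+", default=[0])
    parser.add_argument("--processes_per_gpu", type=int, default=1)
    parser.add_argument("--model_id", default="microsoft/Phi-3.5-mini-instruct")
    parser.add_argument("--first_summary_field", default="phi_mini_summary")
    parser.add_argument("--summary_over_summary_field", default="summary_over_summary")
    parser.add_argument("--artifact_text_field", default="transcript")
    parser.add_argument("--debug", action="store_true")
    return parser


def plan_workers(gpu_devices, processes_per_gpu):
    """List (worker_index, gpu_id) pairs, one entry per worker process."""
    plan = []
    for gpu_id in gpu_devices:
        for _ in range(processes_per_gpu):
            plan.append((len(plan), gpu_id))
    return plan


def partition(items, worker_count):
    """Round-robin split: item i goes to worker i % worker_count."""
    return [items[start::worker_count] for start in range(worker_count)]


args = build_parser().parse_args([
    "--input_json", "./youtube/videos.json",
    "--output_dir", "./youtube/partitions",
    "--gpu_devices", "5", "6", "7",
    "--processes_per_gpu", "2",
    "--debug",
])
workers = plan_workers(args.gpu_devices, args.processes_per_gpu)
print(len(workers))  # 3 GPUs x 2 processes per GPU
```

With three GPUs and two processes per GPU, the plan contains six workers, so the input list would be split into six partitions.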


Example of Usage

Generate Markdown from JSON:

python3 utility/convert_json_with_summaries_to_md.py --input_json ./youtube/videos.json --output_md ./youtube/videos.md
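
As a rough idea of what this conversion involves, the sketch below groups records by author and emits a table of contents followed by one section per author. It does not reproduce convert_json_with_summaries_to_md.py: the function name and the "author" and "title" fields are assumptions made for illustration, and only phi_mini_summary comes from the commands in this README.

```python
from collections import defaultdict


def summaries_to_markdown(records, summary_field="phi_mini_summary"):
    """Render records as Markdown: a table of contents, then one section per author."""
    by_author = defaultdict(list)
    for record in records:
        # "author" and "title" are hypothetical context fields.
        by_author[record.get("author", "Unknown")].append(record)

    lines = ["# Summaries", "", "## Table of Contents", ""]
    for author in sorted(by_author):
        lines.append(f"- {author} ({len(by_author[author])})")
    lines.append("")

    for author in sorted(by_author):
        lines.append(f"## {author}")
        lines.append("")
        for record in by_author[author]:
            lines.append(f"### {record.get('title', 'Untitled')}")
            lines.append("")
            lines.append(record.get(summary_field, "(no summary)"))
            lines.append("")
    return "\n".join(lines)


sample = [
    {"author": "Bob", "title": "Talk B", "phi_mini_summary": "Summary B."},
    {"author": "Ada", "title": "Talk A1", "phi_mini_summary": "Summary A1."},
    {"author": "Ada", "title": "Talk A2"},
]
print(summaries_to_markdown(sample))
```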

Convert Markdown to PDF:

python3 utility/create_md_pdf.py --input_md ./youtube/videos.md --output_pdf ./youtube/videos.pdf

Generate Summaries with Custom Arguments:

python3 utility/llm_general_multi_process_gpu.py \
    --input_json ./custom_data/input.json \
    --output_dir ./custom_data/output \
    --gpu_devices 0 1 \
    --processes_per_gpu 1 \
    --model_id="microsoft/Phi-3.5-mini-instruct" \
    --first_summary_field="custom_summary" \
    --summary_over_summary_field="custom_summary_over_summary" \
    --artifact_text_field="custom_transcript" \
    --debug
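
The two summary fields in the commands above suggest a two-stage flow: summarize the text in pieces, then summarize those summaries. The sketch below illustrates that idea under stated assumptions. The chunk size, the regex sentence splitter (a stand-in for NLTK's punkt tokenizer), and the placeholder first_sentence "model" are all hypothetical, and the repository's script may work differently.

```python
import re


def split_into_chunks(text, max_chars=2000):
    """Group whole sentences into chunks of roughly max_chars characters.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    # Simple regex split as a stand-in for NLTK's punkt sentence tokenizer.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks


def summarize_record(record, summarize, text_field="transcript",
                     first_field="phi_mini_summary",
                     final_field="summary_over_summary", max_chars=2000):
    """Summarize each chunk, then summarize the joined chunk summaries."""
    chunks = split_into_chunks(record[text_field], max_chars)
    chunk_summaries = [summarize(chunk) for chunk in chunks]
    record[first_field] = "\n".join(chunk_summaries)
    record[final_field] = summarize(record[first_field])
    return record


def first_sentence(text):
    # Placeholder "model": keeps the first sentence. Swap in a real LLM call.
    return re.split(r"(?<=[.!?])\s+", text.strip())[0]


record = summarize_record(
    {"transcript": "One. Two. Three. Four."}, first_sentence, max_chars=10
)
print(record["phi_mini_summary"])
print(record["summary_over_summary"])
```

Splitting on sentence boundaries keeps each chunk readable on its own, which matters when the model sees one chunk at a time.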


Configuration

default_llm_commands.json: Contains default instructions for the summarizer. You can create a custom file to override these commands.
generation_args.json: Specifies generation arguments for the language model. Modify this file to adjust generation parameters.
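
One way to think about overriding generation parameters is "defaults plus a custom file". The sketch below shows that pattern. The default values are illustrative only and are not copied from the repository's generation_args.json; the keys are common Hugging Face text-generation options, and load_generation_args is a hypothetical helper.

```python
import json
from pathlib import Path

# Illustrative defaults only; the repository's generation_args.json may differ.
DEFAULT_GENERATION_ARGS = {
    "max_new_tokens": 500,
    "do_sample": False,
    "return_full_text": False,
}


def load_generation_args(path=None):
    """Return the defaults, overridden by any keys found in the JSON file at path."""
    merged = dict(DEFAULT_GENERATION_ARGS)
    if path is not None and Path(path).exists():
        merged.update(json.loads(Path(path).read_text(encoding="utf-8")))
    return merged


print(load_generation_args())  # no custom file: defaults only
```

A custom file then only needs to list the keys it changes, for example a smaller max_new_tokens for shorter summaries.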


Input Data Format

The input JSON file should contain a list of objects, each representing a document to be summarized. Each object should include:

The artifact's text field (specified by --artifact_text_field)
Any relevant context information
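
The format described above can be sketched as follows. The text field is named "transcript" to match the --artifact_text_field value used in the usage command; the "title" and "author" context fields, the sample documents, and the write_input_file helper are assumptions for illustration.

```python
import json
from pathlib import Path


def write_input_file(path, documents, text_field="transcript"):
    """Check that every document has non-empty text, then write the list as JSON."""
    for index, document in enumerate(documents):
        if not str(document.get(text_field, "")).strip():
            raise ValueError(f"document {index} has no '{text_field}' text")
    Path(path).write_text(
        json.dumps(documents, indent=2, ensure_ascii=False), encoding="utf-8"
    )


# Hypothetical sample input: a list of objects, one per document.
documents = [
    {
        "title": "Intro to GPUs",
        "author": "Ada",
        "transcript": "GPUs run many threads in parallel.",
    },
    {
        "title": "Tokenizers",
        "author": "Bob",
        "transcript": "A tokenizer splits text into units.",
    },
]

print(json.dumps(documents[0], indent=2))
```

Calling write_input_file("input.json", documents) would produce a file suitable for --input_json, assuming the runner is started with --artifact_text_field="transcript".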


Output

The script generates JSON files with summaries in the specified output directory. You can use the utility scripts in the utility folder to convert the JSON output to Markdown and PDF formats.
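
If you want to inspect the results as a single list, the per-file output can be merged with a few lines of Python. This sketch assumes each JSON file in the output directory holds a list of document objects; that layout is a guess, so check one file before relying on it. merge_partitions is a hypothetical helper.

```python
import json
from pathlib import Path


def merge_partitions(output_dir):
    """Concatenate every *.json file in output_dir, assuming each holds a JSON list."""
    merged = []
    # Sort for a stable order across runs.
    for path in sorted(Path(output_dir).glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        if not isinstance(data, list):
            raise ValueError(f"{path.name} does not contain a JSON list")
        merged.extend(data)
    return merged
```

For example, merge_partitions("./youtube/partitions") would collect the output of the usage command shown earlier, under the assumption above.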


Contributing

Contributions to this project are welcome! Please fork the repository and submit a pull request with your proposed changes.

Itz-Agasta/TextSummarizer
