Arabic Text Summarization

This repository contains the code for an Arabic text summarization system. The system uses the AraBART model, which is based on the Transformer architecture, to generate summaries of Arabic documents. The model is trained on a labeled dataset and is capable of both extractive and abstractive summarization.

Installation

To run the code, please follow the steps below:

Clone the repository:

git clone https://github.com/Geo-y20/Text-Summarization-in-Arabic.git

Change the directory to the project folder:

cd Text-Summarization-in-Arabic

Install the required dependencies:

pip install -r requirements.txt

Usage

Training

To train the summarization model, provide a labeled dataset in JSONL format, where each line holds one document together with its corresponding summary. Edit train.py to load your dataset and adjust the training parameters if necessary, then run the following command to start training:

python train.py
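
As a rough illustration of the JSONL layout described above, the sketch below reads (document, summary) pairs from a file with one JSON object per line. The field names "text" and "summary" and the helper name load_jsonl are assumptions made for this example; the README does not specify the actual keys, so adjust them to match your dataset.

```python
import json
from pathlib import Path


def load_jsonl(path, text_key="text", summary_key="summary"):
    """Read (document, summary) pairs from a JSONL file.

    The key names are hypothetical defaults, not taken from the repository.
    """
    pairs = []
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            try:
                pairs.append((record[text_key], record[summary_key]))
            except KeyError as exc:
                # Report which line is missing a required field.
                raise ValueError(f"line {line_no}: missing field {exc}") from exc
    return pairs


# Example usage (path taken from the directory listing in this README):
# pairs = load_jsonl("data/labeled_dataset.jsonl")
# print(len(pairs), "training examples")
```

Opening the file with an explicit UTF-8 encoding matters for Arabic text, since the platform default encoding is not always UTF-8.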

Inference

To generate summaries with the trained model, use a separate validation dataset or your own text data. Edit inference.py to load your data and adjust the inference parameters as needed, then run the following command to generate summaries:

python inference.py
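
For orientation, here is a minimal sketch of what seq2seq inference with the Hugging Face transformers library can look like. It is not the contents of inference.py. The function names, the generation settings (beam count, token limits), and the assumption that models/trained_model/ also contains the tokenizer files are all illustrative choices, not details confirmed by this repository.

```python
def summarize(texts, tokenizer, model,
              max_input_tokens=512, max_summary_tokens=128, num_beams=4):
    """Generate one summary per input text with a seq2seq model.

    All default values are illustrative assumptions, not the project's settings.
    """
    # Tokenize the batch, padding to a common length and truncating long inputs.
    inputs = tokenizer(
        texts,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=max_input_tokens,
    )
    # Beam search decoding; max_new_tokens caps the summary length.
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_summary_tokens,
        num_beams=num_beams,
    )
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)


def load_model(model_dir="models/trained_model"):
    """Load a saved tokenizer and model (assumes both live in model_dir)."""
    # Imported here so summarize() can be reused without loading transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)
    model.eval()  # disable dropout for inference
    return tokenizer, model


# Example usage:
# tokenizer, model = load_model()
# for summary in summarize(["<Arabic article text>"], tokenizer, model):
#     print(summary)
```

Passing the tokenizer and model in as arguments keeps the generation step separate from loading, so the same function works with a locally saved checkpoint or any other compatible seq2seq model.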

Directory Structure

The repository is organized as follows:

- data/
  - labeled_dataset.jsonl      # Labeled dataset for training
  - validation_dataset.jsonl   # Dataset for validation or testing
- models/
  - trained_model/             # Saved trained model
    - config.json
    - pytorch_model.bin
    - ...
- utils/
  - preprocessing.py           # Preprocessing utilities
  - evaluation.py              # Evaluation metrics
- train.py                     # Training script
- inference.py                 # Inference script
- requirements.txt             # Dependencies
- README.md                    # Project documentation
- LICENSE                      # License information

Contributing

Contributions to this project are welcome. If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.

Our Participation in a Competition and Finals

License

This project is licensed under the MIT License.

Contact

For questions, please reach out to the project maintainers:

George Youhana - [email protected]
Mostafa Magdy - [email protected]
Abdallah Alkhouly - [email protected]
Ahmed Hafez - [email protected]
Mahmoud Yasser - [email protected]
