This repository contains the code for an Arabic text summarization system. The system uses the AraBART model, which is based on the Transformer architecture, to generate summaries of Arabic text documents. The model is trained on a labeled dataset and is capable of both extractive and abstractive summarization.
To run the code, please follow the steps below:
- Clone the repository:
  git clone https://github.com/Geo-y20/Text-Summarization-in-Arabic.git
- Change the directory to the project folder:
  cd Text-Summarization-in-Arabic
- Install the required dependencies:
  pip install -r requirements.txt
To train the summarization model, you need to provide a labeled dataset in JSONL format, where each line represents one document with its corresponding summary. Modify train.py to load your labeled dataset and adjust the training parameters if necessary. Then, run the following command to start the training process:
python train.py
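The JSONL layout described above can be sketched as a small loader. The field names `text` and `summary` and the helper `load_jsonl` are assumptions for illustration; the keys expected by train.py may differ.

```python
import json
from pathlib import Path


def load_jsonl(path, text_key="text", summary_key="summary"):
    """Read a JSONL file into a list of (document, summary) pairs.

    The key names are hypothetical; adjust them to match your dataset.
    """
    pairs = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            if text_key not in record or summary_key not in record:
                raise ValueError(
                    f"line {line_number}: missing '{text_key}' or '{summary_key}'"
                )
            pairs.append((record[text_key], record[summary_key]))
    return pairs
```

For example, `load_jsonl("data/labeled_dataset.jsonl")` would return one pair per line of the training file. Reading with an explicit UTF-8 encoding avoids platform-dependent defaults when handling Arabic text.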
To generate summaries using the trained model, you can provide a separate validation dataset or test the model on your own text data. Modify inference.py to load your dataset and adjust the inference parameters as needed. Run the following command to generate summaries:
python inference.py
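The inference step above can be sketched with the Hugging Face `transformers` library. This is not the contents of inference.py: the helper names `build_summarizer` and `summarize_records`, the `text` key, and the beam and length settings are assumptions, and the sketch assumes that models/trained_model/ holds a sequence-to-sequence checkpoint saved with `save_pretrained()`.

```python
def build_summarizer(model_dir="models/trained_model",
                     max_input_tokens=512, max_summary_tokens=128):
    """Return a function that maps one Arabic document to a summary.

    Hypothetical helper: assumes model_dir contains a seq2seq checkpoint.
    """
    # Imported lazily so the rest of the module works without the heavy dependencies.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)
    model.eval()

    def summarize(text):
        # Truncate long documents to the assumed input limit.
        inputs = tokenizer(text, return_tensors="pt",
                           truncation=True, max_length=max_input_tokens)
        output_ids = model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            num_beams=4,                      # assumed beam width
            max_new_tokens=max_summary_tokens,
        )
        return tokenizer.decode(output_ids[0], skip_special_tokens=True)

    return summarize


def summarize_records(records, summarize_fn, text_key="text"):
    """Return copies of the records with a generated summary attached."""
    return [
        {**record, "generated_summary": summarize_fn(record[text_key])}
        for record in records
    ]
```

Passing the summarizer in as a function keeps the dataset handling separate from the model, so `summarize_records(records, build_summarizer())` can be tried first with a stand-in function before loading the full checkpoint.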
The repository structure is organized as follows:
- data/
  - labeled_dataset.jsonl # Labeled dataset for training
  - validation_dataset.jsonl # Dataset for validation or testing
- models/
  - trained_model/ # Saved trained model
    - config.json
    - pytorch_model.bin
    - ...
- utils/
  - preprocessing.py # Preprocessing utilities
  - evaluation.py # Evaluation metrics
- train.py # Training script
- inference.py # Inference script
- requirements.txt # Dependencies
- README.md # Project documentation
- LICENSE # License information
Contributions to this project are welcome. If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.
Our participation in a competition and finals
This project is licensed under the MIT License.
For any questions or inquiries, please feel free to reach out to the project maintainers:
- George Youhana - [email protected]
- Mostafa Magdy - [email protected]
- Abdallah Alkhouly - [email protected]
- Ahmed Hafez - [email protected]
- Mahmoud Yasser - [email protected]