Omdena Kolkata Chapter - Text Summarization

Creating a Text Summarization Tool to Combat the Overload of Information

  • Given the fast pace of today's world, many of us feel overwhelmed by the sheer volume of content in our email, the news, the internet, and many other places.
  • Text summarization, the automated process of generating a concise and coherent summary of a longer document, can come to our rescue by letting us grasp the main points of a long text quickly.
  • How does it benefit us? It saves time, increases productivity, and improves our understanding of long texts, which makes it a valuable tool.
  • Text summarization is particularly useful for a dataset such as Daily Mail, which contains a vast number of news articles.
    • News Consumption: The Daily Mail dataset contains a large volume of news articles covering diverse topics. Text summarization enables readers to quickly skim through and grasp the main points of each article, facilitating efficient news consumption. It allows users to stay informed about current events without having to read every article in its entirety.

    • Time-Saving: The Daily Mail dataset is extensive, and reading each article thoroughly would be time-consuming. By using text summarization techniques, individuals can save time by obtaining condensed versions of articles. They can quickly identify the articles that are most relevant to their interests or extract key information from multiple articles efficiently.

    • Content Indexing: Text summarization plays a crucial role in organizing and indexing the Daily Mail dataset. By generating summaries for each article, it becomes easier to categorize and search for specific information within the dataset. This indexing capability enables efficient retrieval of relevant articles based on specific keywords or topics.

    • Content Aggregation: The Daily Mail dataset includes a wide range of articles on different subjects. Text summarization helps aggregate information from multiple articles into concise summaries. By combining and condensing the key points from various sources, it becomes possible to give users a comprehensive overview of a specific topic.

    • Trend Analysis: Text summarization techniques can assist in analyzing trends and patterns within the Daily Mail dataset. By summarizing numerous articles on a particular subject, it becomes easier to identify common themes, opinions, or emerging trends. This information can be valuable for researchers, analysts, or businesses aiming to understand public opinion, market trends, or sentiment on specific topics.

    • Content Personalization: Text summarization algorithms can be used to personalize content recommendations based on user preferences within the Daily Mail dataset. By summarizing articles and understanding user preferences, it becomes possible to deliver tailored summaries or recommendations that match individual interests. This enhances the user experience by providing relevant and concise information.
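To make the idea concrete, here is a minimal sketch of one simple extractive approach: word-frequency sentence scoring. It is only an illustration, not necessarily the method used in this project. The stop-word list and the function names `split_sentences`, `tokenize`, and `summarize` are hypothetical, and the sentence splitter is deliberately naive.

```python
from __future__ import annotations

import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real pipeline would use a fuller list).
STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "has", "in",
    "is", "it", "its", "of", "on", "that", "the", "to", "was", "were", "will", "with",
}


def split_sentences(text: str) -> list[str]:
    """Naively split text into sentences after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s.strip() for s in parts if s.strip()]


def tokenize(sentence: str) -> list[str]:
    """Lower-case words, ignoring punctuation and stop words."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text: str, num_sentences: int = 2) -> str:
    """Return the num_sentences highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    if len(sentences) <= num_sentences:
        return " ".join(sentences)

    # Count content words across the whole document and normalise by the top count.
    freq = Counter(w for s in sentences for w in tokenize(s))
    if not freq:
        return " ".join(sentences[:num_sentences])
    top = max(freq.values())

    def score(sentence: str) -> float:
        # A sentence scores the sum of the normalised frequencies of its content words.
        return sum(freq[w] / top for w in tokenize(sentence))

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])  # restore document order
    return " ".join(sentences[i] for i in keep)
```

For example, `summarize(article_text, num_sentences=3)` returns the three highest-scoring sentences in their original order. Summing word scores tends to favour longer sentences. Abstractive summarizers, which generate new sentences instead of selecting existing ones, work quite differently from this sketch.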

Contribution Guidelines

  • Have a look at the Project Structure and Folder Overview below to understand where to store or upload your contribution.
  • If you're creating a task, go to the tasks folder, create a new folder following the naming convention below, and add a README.md describing the task details and goals so other contributors can understand it.
    • Task folder naming convention: task-n-taskname, where n is the task number (e.g. task-1-data-analysis, task-2-model-deployment).
    • In the README.md, include an information table listing all contributions to the task.
  • If you're contributing to a task, store your work in the relevant location and update the README.md information table with your contribution details.
  • Give your files (Jupyter notebooks, Python files, data sheets, etc.) descriptive names so others can easily identify them.
  • To avoid confusion, please do not create unnecessary folders outside the 'tasks' folder, and follow the naming convention above inside it.
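If you want to check a folder name before creating it, a small helper like the one below can help. This is a sketch under assumptions: the regular expression `TASK_FOLDER_PATTERN` and the function `parse_task_folder` are hypothetical. They assume that n is a positive integer without leading zeros and that taskname consists of lower-case words joined by hyphens, which matches the examples above but is not spelled out in the guidelines.

```python
from __future__ import annotations

import re

# Hypothetical regex for "task-n-taskname": n is a positive integer without leading zeros,
# and taskname is lower-case words joined by hyphens (an assumption based on the examples).
TASK_FOLDER_PATTERN = re.compile(r"task-(?P<number>[1-9]\d*)-(?P<name>[a-z0-9]+(?:-[a-z0-9]+)*)")


def parse_task_folder(folder_name: str) -> tuple[int, str] | None:
    """Return (task number, task name) if folder_name follows the convention, else None."""
    match = TASK_FOLDER_PATTERN.fullmatch(folder_name)
    if match is None:
        return None
    return int(match.group("number")), match.group("name")


# Example: check a few candidate names before creating the folders.
for candidate in ["task-1-data-analysis", "task-2-model-deployment", "Task_3 EDA"]:
    print(candidate, "->", parse_task_folder(candidate))
```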

Project Structure

├── LICENSE
├── README.md          <- The top-level README for developers/collaborators using this project.
├── original           <- Original source code of the challenge hosted by Omdena; can be used as reference code for the current project goal.
│
├── reports            <- Folder containing the final reports/results of this project
│   └── README.md      <- Details about final reports and analysis
│
└── src                <- Source code folder for this project
    │
    ├── data           <- Datasets used and collected for this project
    │
    ├── docs           <- Task documentation, meeting presentations, and task workflow documents and diagrams
    │
    ├── references     <- Data dictionaries, manuals, and all other explanatory references used
    │
    ├── tasks          <- Master folder for all individual task folders
    │
    ├── visualizations <- Code and visualization dashboards generated for the project
    │
    └── results        <- Folder to store final analysis and modelling results and code

Folder Overview

  • Original - Folder containing old/completed Omdena challenge code.
  • Reports - Folder to store all final reports of this project.
  • Data - Folder to store all the data collected and used for this project.
  • Docs - Folder for task documentation, meeting presentations, and task workflow documents and diagrams.
  • References - Folder to store any referenced code, research papers, and other useful documents used for this project.
  • Tasks - Master folder for all tasks
    • All task folder names should follow the naming convention given in the Contribution Guidelines (task-n-taskname).
    • All task folders should be numbered in chronological order (from 1 to n).
    • Every task folder should have a README.md file with the task details and goals, along with an info table listing all code/notebook files with their links and descriptions.
    • Update the task table whenever a task is created, and explain the purpose and goals of the task to others.
  • Visualizations - Folder to store dashboards, analysis, and visualization reports.
  • Results - Folder to store the final analysis and modelling results for the project.
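The Tasks rules above can be checked mechanically. The sketch below assumes task folders sit directly inside one directory such as src/tasks, and it uses a hypothetical helper, `check_tasks_folder`. It reports folders that don't match task-n-taskname, folders missing a README.md, and task numbers that are not exactly 1 to n. It does not check the contents of the README info tables.

```python
from __future__ import annotations

import re
from pathlib import Path

# Loose pattern used only to pull the task number out of "task-n-taskname" (an assumption).
TASK_NUMBER = re.compile(r"task-(\d+)-.+")


def check_tasks_folder(tasks_dir: Path) -> list[str]:
    """Return a list of problems found in tasks_dir (empty if everything looks fine)."""
    problems: list[str] = []
    numbers: list[int] = []
    for folder in sorted(p for p in tasks_dir.iterdir() if p.is_dir()):
        match = TASK_NUMBER.fullmatch(folder.name)
        if match is None:
            problems.append(f"{folder.name}: does not follow task-n-taskname")
            continue
        numbers.append(int(match.group(1)))
        # Every task folder is expected to carry its own README.md.
        if not (folder / "README.md").is_file():
            problems.append(f"{folder.name}: missing README.md")

    # Chronological numbering: the task numbers should be exactly 1..n, with no gaps or duplicates.
    expected = list(range(1, len(numbers) + 1))
    if sorted(numbers) != expected:
        problems.append(f"task numbers {sorted(numbers)} are not 1..{len(numbers)}")
    return problems
```

For example, running `check_tasks_folder(Path("src/tasks"))` from the repository root would list any problems, assuming the folder layout shown in the Project Structure.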

Resource

  • Omdena Local chapter Link here
  • Omdena Github Link here