In Task 1, 50 public profile PDFs are downloaded from my LinkedIn and stored in the TASK 1 folder.
In Task 2, the public profile PDFs are converted into text using the slate library. A while loop loads and converts the PDFs one at a time. The text generated for each PDF is appended to a list, the list is converted into a DataFrame, and the DataFrame is stored in a CSV file named Profile_text.
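
The Task 2 loop can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the function names, the column names "profile" and "text", and the ".csv" file extension are assumptions, and the standard csv module stands in for the DataFrame step so that the sketch runs without extra dependencies. The slate call is imported lazily and assumes slate (or slate3k on Python 3) is installed.

```python
import csv
from pathlib import Path


def extract_text_with_slate(pdf_path):
    """Extract the text of one PDF with slate (assumed to be installed)."""
    import slate  # lazy import: the rest of the sketch runs without slate

    with open(pdf_path, "rb") as fh:
        pages = slate.PDF(fh)  # list-like object, one string per page
    return " ".join(pages)


def pdfs_to_rows(pdf_paths, extract=extract_text_with_slate):
    """Convert each PDF to text with a while loop and collect the results."""
    rows = []
    i = 0
    while i < len(pdf_paths):
        path = Path(pdf_paths[i])
        rows.append({"profile": path.name, "text": extract(path)})
        i += 1
    return rows


def write_rows(rows, csv_path="Profile_text.csv"):
    """Write the collected rows to a CSV file (hypothetical column names)."""
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["profile", "text"])
        writer.writeheader()
        writer.writerows(rows)
```

With pandas installed, the last step could instead be written as pandas.DataFrame(rows).to_csv("Profile_text.csv", index=False), which matches the DataFrame route described above. Passing the extractor as a parameter keeps the loop testable without real PDFs.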
In Task 3, the text generated in Task 2 is loaded and tokenized into words. Stop words are then removed row by row. The five most frequent words of each profile are extracted, and the RAKE library is used to extract the essential keywords of each profile PDF. Finally, the result is stored in a CSV file named Profile_keyword.
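
The per-profile steps of Task 3 might look like the sketch below. It is an illustration under stated assumptions: the tiny stop word set and the regular-expression tokenizer are stand-ins (the original may well use a fuller list and tokenizer, for example from NLTK), the function names are hypothetical, and rake_nltk is assumed to be the RAKE package in use, which the description above does not confirm.

```python
import re
from collections import Counter

# Illustrative stop word set only; a real run would use a fuller list.
STOP_WORDS = {
    "a", "an", "the", "and", "or", "in", "on", "of", "to",
    "with", "for", "is", "are", "at", "by",
}


def tokenize_without_stop_words(text):
    """Lowercase the text, split it into words, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def top_words(text, n=5):
    """Return the n most frequent non-stop words of one profile."""
    counts = Counter(tokenize_without_stop_words(text))
    return [word for word, _ in counts.most_common(n)]


def rake_keywords(text):
    """Rank key phrases with RAKE (assumes rake_nltk and its NLTK data)."""
    from rake_nltk import Rake  # lazy import: optional dependency

    rake = Rake()
    rake.extract_keywords_from_text(text)
    return rake.get_ranked_phrases()
```

Applied row by row to the text column from Task 2, top_words gives the five frequent words per profile and rake_keywords gives the ranked phrases, which can then be written to Profile_keyword.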