IMDB_scrapper

This script extracts all relevant information about the top 100 movies from the website: https://www.imdb.com/search/title/?count=100&groups=top_1000&sort=user_rating%27

Step 1: The following features were considered:
    1. Date of scraping
    1. title (movie name)
    1. certification
    1. duration of the movie (time)
    1. genre
    1. release date
    1. release country
    1. rating
    1. users
    1. critic
    1. summary of the movie
    1. director
    1. writer
    1. primary-actor
    1. meta score
    1. primary image
    1. primary video
    1. other images link
    1. other video link
    1. all actors and their characters in the movie
    1. plot
    1. plot keywords
    1. languages
    1. filming_location
    1. budget
    1. opening_weekend
    1. gross_amount
    1. cumilative_gross
    1. production_company
    1. sound_mix
    1. aspect_ratio
Step 2: All of the above features are extracted for each of the top 100 movies listed on the webpage.
Step 3: After extraction, the information for each movie is stored in JSON format.
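
As a rough illustration of Step 3, the sketch below shows one way a scraped movie could be turned into a record and written out as JSON. It is not the script's actual code: the `FEATURES` subset, the `build_record` and `save_movies` helpers, the `date_of_scraping` key, and the example values are all assumptions made for illustration.

```python
import json
from datetime import date

# Hypothetical subset of the feature names listed in Step 1.
FEATURES = ["title", "certification", "genre", "rating", "director", "budget"]


def build_record(scraped: dict) -> dict:
    """Return a record holding every expected feature, with None for missing ones."""
    record = {"date_of_scraping": date.today().isoformat()}
    for name in FEATURES:
        record[name] = scraped.get(name)
    return record


def save_movies(records: list, path: str) -> None:
    """Write all movie records to a single JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(records, fh, ensure_ascii=False, indent=2)


# Placeholder values for illustration only, not real scraped data.
example = build_record({"title": "Example Movie", "genre": ["Drama"], "rating": 9.0})
```

Filling missing features with `None` keeps every record on the same set of keys, which may make the resulting JSON easier to load into a table later. Calling `save_movies([example], "movies.json")` would write the file; the file name here is also only an example.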