This exercise is based on IMDb, the popular website that provides information about movies and TV shows. IMDb makes its datasets publicly available as IMDb Datasets. Your mission, should you choose to accept it, is to write a web application in Java that meets the following requirements:
Requirement #1 (easy):
Typecasting: Given a query in which the user provides an actor/actress name, the system must determine whether that person is typecast (at least half of their work is in a single genre).
Requirement #2 (easy):
Find the coincidence: Given a query in which the user provides two actor/actress names, the app responds with a list of the movies or TV shows that both people appear in.
Requirement #3 (difficult):
Six degrees of Kevin Bacon: Given a query in which the user provides a person's name (e.g. an actor or actress), the app must return the degree of separation between that person and Kevin Bacon.
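The typecasting rule above can be sketched in a few lines. This is only an illustration, not the project's actual implementation: the class and method names are hypothetical, and it assumes each title's genres arrive as the comma-separated string used in the `genres` column of `title.basics.tsv.gz`, with `\N` marking a missing value.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of the project.
final class TypecastSketch {

    /**
     * Returns a genre that covers at least half of the given titles, if any.
     * Each list element is the raw genres value of one title, e.g. "Action,Sci-Fi".
     * Titles without genres ("\N", null or blank) still count toward the total.
     */
    static Optional<String> typecastGenre(List<String> genresPerTitle) {
        if (genresPerTitle.isEmpty()) {
            return Optional.empty();
        }
        Map<String, Integer> counts = new HashMap<>();
        for (String raw : genresPerTitle) {
            if (raw == null || raw.isBlank() || raw.equals("\\N")) {
                continue;
            }
            // A title with several genres counts once toward each of them.
            for (String genre : raw.split(",")) {
                counts.merge(genre.trim(), 1, Integer::sum);
            }
        }
        int total = genresPerTitle.size();
        return counts.entrySet().stream()
                .filter(e -> 2 * e.getValue() >= total) // "at least half"
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }
}
```

For example, `TypecastSketch.typecastGenre(List.of("Action,Sci-Fi", "Action", "Drama"))` yields `Action`, because two of the three titles carry that genre.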
- Requirement 1 [Done]
- Requirement 2 [Done]
- Requirement 3 [Not Done]
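Requirement 3 is not implemented in this project. One common way to approach the Kevin Bacon requirement is a breadth-first search over a graph in which two people are connected when they share a title. The sketch below assumes such a graph has already been built in memory as a map from person id to co-star ids; the class, method, and parameter names are hypothetical, and building the graph from the `principal` table is left out.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical helper, not part of the project.
final class BaconDegreeSketch {

    /**
     * Breadth-first search over a co-appearance graph.
     * coStars maps a person id to the ids of the people they share a title with.
     * Returns the number of hops from start to target, or -1 if target is unreachable.
     */
    static int degree(Map<String, Set<String>> coStars, String start, String target) {
        if (start.equals(target)) {
            return 0;
        }
        Map<String, Integer> distance = new HashMap<>();
        Queue<String> queue = new ArrayDeque<>();
        distance.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            int next = distance.get(current) + 1;
            for (String neighbour : coStars.getOrDefault(current, Set.of())) {
                if (distance.containsKey(neighbour)) {
                    continue; // already reached by a path that is at least as short
                }
                if (neighbour.equals(target)) {
                    return next;
                }
                distance.put(neighbour, next);
                queue.add(neighbour);
            }
        }
        return -1;
    }
}
```

Because breadth-first search visits people in order of increasing distance, the first time the target is reached gives the smallest number of hops.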
* First, get the datasets that IMDb provides publicly from these links: https://www.imdb.com/interfaces/, https://datasets.imdbws.com/
* Download the following files and put them in the data folder under resource so that our DataLoader class works correctly:
* `name.basics.tsv.gz`
* `title.akas.tsv.gz`
* `title.basics.tsv.gz`
* `title.crew.tsv.gz`
* `title.episode.tsv.gz`
* `title.principals.tsv.gz`
* `title.ratings.tsv.gz`
- If you do not have a running MySQL server, run `docker-compose up` to start MySQL.
- There are two options for importing the TSV files that IMDb provides into the database:
  - Export our database, send it to you, and import it on your side.
  - Call http://localhost:8181/data/load, for example through Swagger. The loader first unzips the files into the directory we specify and then processes them by reading them line by line. When the load is complete, the log ends with a line like this: INFO 7086 --- [pool-1-thread-1] c.c.l.movies.service.data.DataLoader : Total time to load data in ms:456223

  The load time depends on the speed of your computer. The data set is large: about 10 million records are inserted. Indexes are defined so that our tables do not have performance problems; the SQL statements are listed at the bottom.
- This project has Swagger UI integration, configured through swaggerConfig. Open localhost:8181/swagger-ui.html to see the available endpoints and test them manually.
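The line-by-line load described above can be illustrated with a small reader for the `.tsv.gz` files. This is a sketch under assumptions, not the project's DataLoader: the class and method names are hypothetical, and it decompresses on the fly with `GZIPInputStream` instead of unzipping to a directory first.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of the project.
final class TsvGzReaderSketch {

    /**
     * Streams a gzip-compressed TSV file, skips the header row, and passes
     * each data row (split on tabs) to the given consumer.
     * Returns the number of data rows read.
     */
    static long forEachRow(InputStream gzipped, Consumer<String[]> rowConsumer) throws IOException {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(gzipped), StandardCharsets.UTF_8))) {
            String line = reader.readLine(); // first line is the header row
            while ((line = reader.readLine()) != null) {
                rowConsumer.accept(line.split("\t", -1)); // -1 keeps trailing empty columns
                count++;
            }
        }
        return count;
    }
}
```

Passing rows to a consumer rather than collecting them in a list keeps memory use flat, which matters when a file holds millions of records.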
- Load data (tsv -> mysql) via HTTP endpoint:
- localhost:8181/data/load
- Requirement 1, use (GET):
localhost:8181/search/names/?name=Keanu Reeves&full-fetch=false
localhost:8181/search/names/?name=David Schwimmer&full-fetch=false
localhost:8181/search/names/?name=Steve McQueen&full-fetch=false
localhost:8181/search/names/?name=Matt LeBlanc&full-fetch=false
- Requirement 2, use:
-
POST with body:
{ "names":["Angelina Jolie", "Brad Pitt"] }
To URL:
localhost:8181/search/names/coincidence
-
POST with body:
{ "names":["Orlando Bloom", "Keira Knightley"] }
To URL:
localhost:8181/search/names/coincidence
-
POST with body:
{ "names":["David Schwimmer", "Matt LeBlanc"] }
To URL:
localhost:8181/search/names/coincidence
-
POST with body:
{ "names":["George Clooney", "Brad Pitt"] }
To URL:
localhost:8181/search/names/coincidence
-
POST with body:
{ "names":["George Clooney", "Matt Damon"] }
To URL:
localhost:8181/search/names/coincidence
-
POST with body:
{ "names":["George Clooney", "Matt Damon", "Brad Pitt"] }
To URL:
localhost:8181/search/names/coincidence
-
POST with body:
{ "names":["John Travolta", "Uma Thurman", "Samuel L. Jackson", "Bruce Willis"] }
To URL:
localhost:8181/search/names/coincidence
-
POST with body:
{ "names":["Al Pacino", "Robert De Niro", "Val Kilmer", "Jon Voight"] }
To URL:
localhost:8181/search/names/coincidence
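The coincidence requests above accept two or more names and return the titles common to all of them. How the project computes this is not shown here; a minimal sketch, assuming the title ids of each person have already been fetched (for example from the `principal` table by `nconst`), is a plain set intersection. The class and method names are hypothetical.

```java
import java.util.Collection;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of the project.
final class CoincidenceSketch {

    /**
     * Intersects the title-id sets of all given people.
     * An empty input yields an empty result.
     */
    static Set<String> sharedTitles(Collection<Set<String>> titlesPerPerson) {
        Iterator<Set<String>> it = titlesPerPerson.iterator();
        if (!it.hasNext()) {
            return new LinkedHashSet<>();
        }
        // Copy the first set so the caller's data is not modified.
        Set<String> shared = new LinkedHashSet<>(it.next());
        while (it.hasNext()) {
            shared.retainAll(it.next()); // keep only titles the next person also has
        }
        return shared;
    }
}
```

The same method covers the two-name case and the requests with three or four names, since each additional person only narrows the shared set.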
- Index definitions:
- aka -> title_id
ALTER TABLE `movies_search_service_db`.`aka` ADD INDEX `title_id_idx` (`title_id` ASC);
- basic -> primary_title
ALTER TABLE `movies_search_service_db`.`basic` ADD INDEX `primary_title_idx` (`primary_title` ASC);
- basic -> original_title
ALTER TABLE `movies_search_service_db`.`basic` ADD INDEX `original_title_idx` (`original_title` ASC);
- episode -> parent_tconst
ALTER TABLE `movies_search_service_db`.`episode` ADD INDEX `parent_tconst_idx` (`parent_tconst` ASC);
- basic -> genres
ALTER TABLE `movies_search_service_db`.`basic` ADD INDEX `genres_idx` (`genres` ASC);
- principal -> tconst
ALTER TABLE `movies_search_service_db`.`principal` ADD INDEX `principal_tconst_idx` (`tconst` ASC);
- name -> primary_name
ALTER TABLE `movies_search_service_db`.`name` ADD INDEX `primary_name_idx` (`primary_name` ASC);
- principal -> nconst
ALTER TABLE `movies_search_service_db`.`principal` ADD INDEX `principal_nconst_idx` (`nconst` ASC);
- Genre values:
- "Film-Noir"
- "Action"
- "War"
- "History"
- "Western"
- "Documentary"
- "Sport"
- "Thriller"
- "News"
- "Biography"
- "Adult"
- "Comedy"
- "Mystery"
- "Musical"
- "Short"
- "Talk-Show"
- "Adventure"
- "Horror"
- "Romance"
- "Sci-Fi"
- "Drama"
- "Music"
- "Game-Show"
- "Crime"
- "Fantasy"
- "Animation"
- "Family"
- "Reality-TV"
select distinct category from principal limit 30590275;
- Results:
- self
- director
- cinematographer
- composer
- producer
- editor
- actor
- actress
- writer
- production_designer
- archive_footage
- archive_sound