-
Notifications
You must be signed in to change notification settings - Fork 1
/
README.txt
393 lines (292 loc) · 14.2 KB
/
README.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
# Anime/Manga Recommendation System with IR Algorithm
This Flask-based Anime/Manga recommendation system utilizes an IR
similarity algorithm to provide personalized suggestions based on user
preferences, including descriptions, ratings, interactions, and viewing
history. Built with Flask, it features document search, ranking, spell
correction, and user feedback, offering Anime/Manga enthusiasts a
comprehensive platform for seamless content discovery.
## Table of Contents
- [Description](##Description)
- [Installation](##Installation)
- [Run the Application](##RunthApplication)
- [Usage](##Usage)
- [Components](##Component)
- [Features](##Features)
- [Contributing](##Contributing)
- [License](##License)
- [Note](##Note)
## Description
This web application serves as an information retrieval system for
Anime/Manga recommendations, integrating features that enable users to
input their preferences and receive personalized suggestions. The system
leverages an Information Retrieval (IR) similarity algorithm to match
user preferences with Anime/Manga in the dataset. Recommendations are
generated based on various factors, including anime/manga descriptions,
ratings, user interactions, and the user's viewing history.
Built with Flask, the application also incorporates functionalities such
as document search and ranking, spell correction for user queries, and a
user feedback system. Users can explore and interact with the system
through a web-based interface.
This integration offers a comprehensive platform for Anime/Manga
enthusiasts, providing a seamless experience for discovering new content
tailored to their tastes. Whether users are seeking recommendations
based on descriptions, ratings, or personal viewing history, the system
aims to enhance the overall Anime/Manga discovery process.
## Installation
1. **Clone the repository :**
``` bash
git clone https://github.com/ReyKan-KP/Anime-Manga-Recommendation-System-Using-IR.git
```
2. **Install dependencies :**
``` bash
pip install -r requirements.txt
```
## Run the Application
1. **Run main.py :**
``` bash
python main.py runserver
```
2. **Open index.html:**
`Click on http://127.0.0.1:5000 (as shown in terminal)`
## Usage
1. Execute the script and follow the link to access the Anime/Manga
Recommendation page.
2. Specify your user status:
- If you're an existing user,Select Yes and provide your User ID
(greater than 1).
- If you're a guest,Select No then the User ID is set to 1.
3. Based on your input, you can:
- Enter the description of the Anime/Manga you wish to explore.
- Receive recommendations based on the previous user's watched or
read list (weighted terms of the user watch or read document are
considered, and cosine similarity is calculated based on the
resulting weight index).
4. Enter the Page Number and click submit.
5. Review the ranking determined by the combined score (cosine
similarity + normalized evaluation).
- Normalized evaluation is calculated as (evaluation/totalDocs) \*
100, where evaluation is the product of feedback and rating.
6. In the feedback box, enter relevant document IDs following the
provided instructions.
7. Upon submitting feedback, the feedback count for the specified
document ID will increment by 1, and the normalized evaluation will
be updated in the dataset.
8. For the Assessment of the IR system, a PR curve and an 11-point
interpolated precision graph will be displayed for the top 10
documents based on the relevance document entered by the user in the
feedback box.
9. For a new query or recommendation, reload the page.
## Components
1. ### DataSet - Collected from My Anime List dataset from Kaggle
- **main.csv**
- This CSV file contains a list of 87,316 anime/manga entries.
- **Path:** `main dataset/main.csv`
- **Columns:**
`ID, Title, Description, Rating, Feedback, Evaluation, Normalized Evaluation`
- **user_profiles.csv**
- This CSV file contains a list of 75,989 user profiles.
- **Path:** `main dataset/user_profiles.csv`
- **Columns:** `User ID, Profile, Watch or Read`
2. ### PreProcessing
- **Lowercasing:**
- Converts all text to lowercase to ensure consistency and
avoid case-sensitive discrepancies.
- **Tokenization:**
- Breaks down sentences or paragraphs into individual words or
tokens, essential for text-based analysis.
- **Stopwords Removal:**
- Removes common words (stopwords) that do not contribute
significantly to the meaning of the text, reducing noise in
the data.
- **Stemming:**
- Reduces words to their base or root form, grouping together
words derived from the same root, even with different
inflections or suffixes.
3. ### Indexing
- **Index File Format:**
- **Path:** `index.txt`
- The index file is a `.txt` file storing term information in
the format: `term - (doc_id, term_frequency)`.
- **Indexer Function:**
- The `makeIndex()` function is crucial for indexing
anime/manga descriptions, processing each description and
creating entries in the index file.
- **Index Loading:**
- At runtime, the program loads the `index.txt` file to access
the pre-built index.
- **makeIndexMap() Function:**
- The `makeIndexMap()` function generates a dictionary
containing term, doc_ID, and frequency without reading the
entire document.
4. ### Search and Ranking
- **Types of Search:**
1. **Query Search:**
- Involves searching based on user-provided queries.
2. **Recommendation Search:**
- Provides recommendations based on the user's watch/read
history.
- **Term Frequency Calculation:**
- Term frequency is calculated using the formula:
`1 + log2(tf)`.
- **Inverse Document Frequency Calculation:**
- Inverse Document Frequency (IDF) is calculated as:
`1 + log2(total_Doc/Doc_freq)`.
- **Weight Index:**
- The weight index for each term is determined by multiplying
the term frequency (tf) and inverse document frequency
(idf): `tf * idf`.
- **Recommendation Search Workflow:**
1. **User Watch/Read History:**
- The system gathers the watch/read history of the user.
2. **Weight Index Calculation:**
- Calculates the weight index of each document that the
user has watched/read.
3. **Query Weight Index:**
- Obtains the query weight index by summing the weight
indices of the user's watch/read history.
4. **Cosine Similarity:**
- Calculates cosine similarity to determine the similarity
between each document's weight index and the query
weight index.
- **Cosine Similarity:**
- Measures the cosine of the angle between two vectors,
providing a similarity score.
- **Search Optimization:**
- Utilizes weighted indices, term frequency, and inverse
document frequency for accurate and relevant search results.
- Cosine similarity ensures efficient comparison and ranking
of documents based on their relevance.
- **Result Presentation:**
- Presents search results in a ranked order, with higher
cosine similarity scores indicating higher relevance.
5. ### Spell Checking
- The `correct_spelling` function is a spell checker designed to
identify and correct misspelled words in a given query.
- **Library Used:** The function utilizes the `enchant` library,
specifically the English (U.S.) dictionary, to check word
spelling.
- **Input:** Accepts a query (text) string as input.
- **Processing:**
- Splits the input query into individual words.
- Checks if each word is misspelled using the `enchant`
dictionary.
- If misspelled, suggests a correction using the first
suggestion from the dictionary.
- Assembles corrected words into a new query string.
- **Output:** Returns the corrected query.
6. ### Feedback
- **Feedback IDs Extraction:**
- The system accepts a list of feedback IDs from the user,
obtained from the JSON payload using
`request.json.get('feedbackIDs', [])`.
- **Feedback Function:**
- The `feedback()` function processes received feedback IDs,
updating the dataset by incrementing feedback count,
recalculating evaluation based on rating, and computing
normalized evaluation. The updated dataset is saved to 'main
dataset/main.csv', and feedback IDs are printed for
confirmation.
- **Normalized Evaluation Map:**
- The function retrieves the normalized evaluation map using
the `getNormalizedEvaluation` function from the
'preprocessingAndIndexing' module.
- By incorporating this feedback component, the system allows
users to provide feedback on specific IDs, updating the dataset
for continuous improvement in recommendations or search results.
7. ### Assessment: Information Retrieval (IR) Evaluation
- **Precision-Recall Curve Generation:**
- The `PRcurve` module generates Precision-Recall curves to
evaluate the performance of the Information Retrieval (IR)
system.
- **PRcurve Function:**
- Takes parameters including total_docs, relevant_docs_id, and
top_10 to calculate precision and recall at each rank in the
top 10. Interpolation is applied, and the resulting curve is
plotted using Matplotlib.
- **Workflow:**
1. **Relevance Annotation:**
- Determines the relevance of each document in the top 10
based on provided relevant document IDs.
2. **Precision and Recall Calculation:**
- Calculates precision and recall at each rank, providing
insights into system performance.
3. **Interpolation:**
- Interpolates precision values at 11 recall levels,
producing a smooth Precision-Recall curve.
4. **Curve Plotting:**
- Plots the Precision-Recall curve for visualization.
- **Result Interpretation:**
- Precision and recall values breakdown the performance at
each rank in the top 10.
- Interpolated precision values offer a comprehensive view of
the system's behavior across different recall levels, aiding
in evaluation and comparison of retrieval strategies.
8. ### User Interface (UI) Component
The UI component of the Anime/Manga Recommendation System employs
Flask, HTML, CSS, and JavaScript for an interactive user experience.
Key elements include a spell checker, Flask routes for processing
user input and feedback, HTML templates, and JavaScript for user
interactions.
1. **Components**
1. **Spell Checker :**
- The `correct_spelling` function corrects misspelled
words using the `enchant` library.
2. **Flask Routes :**
- `/`: Renders the main HTML page.
- `/process_user_input`: Processes user queries and
recommendations.
- `/process_feedback_input`: Handles user feedback.
3. **HTML Templates :**
- The main template (`index.html`) defines the UI
structure, including forms, loading indicators, and
result containers.
4. **JavaScript Interaction:**
- JavaScript functions handle user interactions like user
type selection, entering IDs, and form submissions.
2. **User Interaction Workflow**
1. **User Type Selection:**
- Users choose existing or guest status.
2. **User ID and Options:**
- For existing users, ID entry and dynamic options for
query or recommendations are provided.
3. **Query and Page Input:**
- Users enter queries or page numbers.
4. **Search Execution:**
- Clicking search triggers Flask to process input and
display results.
5. **Feedback Submission:**
- Users can submit feedback, updating the system dataset.
3. **Loading and Feedback**
- Loading indicators signal ongoing processes.
- Feedback prompts are presented after search results for user
contribution.
## Features
1. **Data collection for comprehensive content coverage.**
2. **Preprocessing for data optimization**
3. **Indexing for efficient data retrieval.**
4. **Document search and ranking.**
- **Search based on user query.**
- **Recommendation based on user's previous watch or read
history.**
5. **Spell correction for user queries.**
6. **User feedback handling.**
7. **Assessment Evaluation with PR Curve.**
8. **Web-based interface using Flask.**
## Contributing
We welcome contributions from the community! If you'd like to contribute
to this project, here's how:
### To contribute, follow these steps:
1. Fork the repository.
2. Create a new branch (`git checkout -b feature/your-feature-name`).
3. Commit your changes (`git commit -m 'Add some feature'`).
4. Push to the branch (`git push origin feature/your-feature-name`).
5. Open a Pull Request.
## License
This project is licensed under the [MIT License](LICENSE).
## Note
- Make sure to have the necessary CSV files ('main dataset\\main.csv'
and 'main dataset\\user_profiles.csv') in the specified directory
for data retrieval.
This README provides an overview of the Anime Recommendation System. For
more details on the code and functionality, please refer to the code
comments and documentation within the script.