phase2 Feature Engineering

Abstract

One of the classic problems in the web domain is returning the best results for a user query, which may itself be in the form of multimedia. Various techniques exist for Information Retrieval (IR) over large collections of multimedia consisting of text, images, audio, and video. In this phase, we implement a naive search engine by applying vector models and similarity/distance measures to the textual and visual descriptors of multimedia objects. We use a publicly available dataset that provides several models for representing text and images.

Search engines may yield different results depending on how their models are implemented. We rely on established similarity and distance measures that work well with text and images, such as cosine similarity and Euclidean distance, to yield the best results in the given time frame. We also apply widely used dimensionality reduction techniques: Principal Component Analysis (PCA), Singular Value Decomposition (SVD), and Latent Dirichlet Allocation (LDA).

Introduction

In this phase, we experiment with the provided dataset, which is organized around three entities: locations, users, and images. Each user has taken pictures at certain locations and tagged them. The dataset provides two categories of information. Textual descriptors are features extracted from text (image tags and titles), such as TF, DF, and TF-IDF. Visual descriptors are extracted from the images themselves using models such as CN, CM, and HOG. These descriptors define a feature vector for each entity in the dataset, and entities are compared by computing a similarity score with measures that work well with text and images, such as cosine similarity and Euclidean distance.
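The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the entity names, the tags, and the helper functions `tf_idf_vectors`, `cosine_similarity`, and `euclidean_distance` are all hypothetical, and the TF-IDF weighting shown here (raw term count times `log(N / DF)`) is an assumption that may differ from the weighting used in the dataset.

```python
import math
from collections import Counter


def tf_idf_vectors(tag_lists):
    """Map each entity id to a sparse {term: TF-IDF weight} dictionary.

    Assumed weighting: raw term count * log(N / DF). The dataset may use
    a different TF-IDF variant.
    """
    n_entities = len(tag_lists)
    # DF: number of entities whose tag list contains the term at least once.
    df = Counter()
    for tags in tag_lists.values():
        df.update(set(tags))
    vectors = {}
    for entity, tags in tag_lists.items():
        tf = Counter(tags)
        vectors[entity] = {
            term: count * math.log(n_entities / df[term])
            for term, count in tf.items()
        }
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors (higher = more similar)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between two sparse vectors (lower = more similar)."""
    terms = set(a) | set(b)
    return math.sqrt(sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in terms))


# Hypothetical toy data: three locations and the tags attached to their images.
toy_tags = {
    "location_a": ["castle", "tower", "stone", "castle"],
    "location_b": ["castle", "garden", "stone"],
    "location_c": ["beach", "sand", "sea"],
}

vectors = tf_idf_vectors(toy_tags)
sim_ab = cosine_similarity(vectors["location_a"], vectors["location_b"])
sim_ac = cosine_similarity(vectors["location_a"], vectors["location_c"])
dist_ab = euclidean_distance(vectors["location_a"], vectors["location_b"])
```

In this toy example, `location_a` and `location_b` share tags and so receive a positive cosine similarity, while `location_a` and `location_c` share none and score zero. Sparse dictionaries are used because most entities carry only a small fraction of the full tag vocabulary.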

Steps to set up the project

Create a virtualenv in the local directory and activate it
Maintain requirements.txt, but commit only the main packages to it rather than every linked package that pip freeze lists
pip install -r requirements.txt

Shivamdhar/CSE-515-FeatureEngineering
