ElasticSearch_Podcast_Search

Introduction

This is the group project for DD2477 Search Engines and Information Retrieval Systems (60034) at KTH. We implemented a small search engine that lets users search for podcast clips that match a query and fit within a time limit. We use the SPOTIFY PODCAST DATASET, which contains the podcast transcripts together with time markers. The main backend framework is Elasticsearch, which indexes the transcripts of the podcast dataset and returns ranked search results. The GUI is implemented with PyQt; users enter the query and the time limit in text boxes.
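To illustrate what a ranked full-text lookup against the indexed transcripts can look like, here is a minimal sketch of an Elasticsearch request body built in Python. The function name `build_transcript_query` and the field name `transcript` are assumptions made for this example; they are not taken from the project's code or index mapping.

```python
from typing import Any, Dict


def build_transcript_query(query: str, size: int = 10) -> Dict[str, Any]:
    """Build a request body for a full-text `match` query.

    "transcript" is a placeholder field name (an assumption), not the
    project's actual mapping.
    """
    return {
        # Number of hits to return.
        "size": size,
        # A `match` query analyzes the text and scores documents by relevance.
        "query": {"match": {"transcript": {"query": query}}},
    }


body = build_transcript_query("climate change", size=5)
print(body)
```

Elasticsearch returns hits sorted by relevance score in descending order by default, so no explicit sort clause is needed for a ranked result list.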

Members and Contribution

Minchong Li: Backend logic design
Tengfei Lu: GUI design
Zihao Xu: Data indexing and Elasticsearch query design

Setup

Make sure you have downloaded the SPOTIFY PODCAST DATASET and, most importantly, that Elasticsearch is installed on your machine.
Open config.yaml and set the meta_path and trans_root values to the dataset paths on your machine.
Launch Elasticsearch by running elasticsearch.bat.
Run index.bat if you have not indexed the dataset.
Run search.bat to start the search engine.
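
Before indexing, it can help to confirm that the two paths in config.yaml are set and exist. The sketch below is one possible check, not part of the project: the helper `find_config_problems` is hypothetical, and it takes an already loaded dictionary (for example, the result of parsing config.yaml with a YAML library) instead of reading the file itself.

```python
import os
from typing import Dict, List

# Keys named in the setup steps above.
REQUIRED_KEYS = ("meta_path", "trans_root")


def find_config_problems(cfg: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; empty if the config looks fine."""
    problems = []
    for key in REQUIRED_KEYS:
        value = cfg.get(key)
        if not value:
            problems.append(f"{key} is missing or empty")
        elif not os.path.exists(value):
            problems.append(f"{key} points to a path that does not exist: {value}")
    return problems


# Example: one valid path (the current directory) and one missing key.
for problem in find_config_problems({"meta_path": os.getcwd()}):
    print(problem)
```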

Usage

Select the search method and enter the time limit in the corresponding text box. The status bar at the bottom shows "set time limit: x min" if the time limit is set successfully.
Enter the query and press the Enter key. The result box in the middle lists the filenames of the top matching transcripts, sorted by score from high to low. The status bar shows how long the search took.
Click an item and leave your cursor on it. A floating tooltip shows the episode_name of that item.
Double-click an item to display its transcript text in the text box at the bottom.
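
As an illustration of how a time limit in minutes can bound a clip, the following sketch selects the time-marked words that fall inside a window centered on a matching word. This is an assumed approach for explanation only, not necessarily how the project builds its clips; the `Word` tuple layout and the function `clip_around` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical layout: (start_seconds, end_seconds, token).
Word = Tuple[float, float, str]


def clip_around(words: List[Word], hit_index: int, limit_min: float) -> List[Word]:
    """Return the words lying fully inside a window of `limit_min` minutes
    centered on the midpoint of words[hit_index]."""
    limit_s = limit_min * 60.0
    hit_start, hit_end, _ = words[hit_index]
    center = (hit_start + hit_end) / 2.0
    window_start = center - limit_s / 2.0
    window_end = center + limit_s / 2.0
    # Keep only words whose whole time span fits in the window.
    return [w for w in words if w[0] >= window_start and w[1] <= window_end]


words = [(0, 1, "a"), (30, 31, "b"), (60, 61, "c"), (90, 91, "d"), (120, 121, "e")]
clip = clip_around(words, hit_index=2, limit_min=2)
print([token for _, _, token in clip])
```

Because only words that fit entirely inside the window are kept, the clip never runs longer than the requested limit.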
