abhiKohar/lucene_modified: Entity search using Lucene

Purpose of the software // What is the software for?

Current Lucene and Elasticsearch implementations do not support boosting for nested span queries: a boost can be specified as an argument, but it has no effect on the search results. This project extends Lucene to support boosting in nested span queries. This is useful for entity semantic search and entity semantic document search, because it makes the scoring more relevant. For example, a query can attach a different boost to each nested span:

    Span {
        #professor
    }
    Span {
        Data mining
        Boost : 5
    }
    Span {
        Illinois
        Boost : 2
    }
    Span {
        #img
        Boost : 20
    }

The search results and scores can then be adjusted according to the boost factors above.
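As a rough illustration of how per-span boosts could feed into a score, the sketch below treats the final score as a boost-weighted sum of per-span scores. That formula is an assumption made for illustration only, not Lucene's actual scoring; `BoostedSpan`, `SpanBoostSketch`, and the base score of 0.5 are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of one nested span clause; not a Lucene class.
final class BoostedSpan {
    final String term;
    final float boost;      // 1.0f means "no boost"
    final float baseScore;  // assumed unboosted score of this span in a document

    BoostedSpan(String term, float boost, float baseScore) {
        this.term = term;
        this.boost = boost;
        this.baseScore = baseScore;
    }
}

final class SpanBoostSketch {
    // Assumed combination rule: sum of boost * baseScore over all spans.
    static float score(List<BoostedSpan> spans) {
        float total = 0f;
        for (BoostedSpan span : spans) {
            total += span.boost * span.baseScore;
        }
        return total;
    }

    public static void main(String[] args) {
        // The four spans from the example query, each with an assumed base score.
        List<BoostedSpan> spans = Arrays.asList(
                new BoostedSpan("#professor", 1f, 0.5f),
                new BoostedSpan("Data mining", 5f, 0.5f),
                new BoostedSpan("Illinois", 2f, 0.5f),
                new BoostedSpan("#img", 20f, 0.5f));
        System.out.println(score(spans));
    }
}
```

Under this assumed rule, a document that matches the `#img` span gains far more score than one that only matches `Illinois`, which is the ranking effect the boosts are meant to produce.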

Actual query examples:

Entity Search (find professors in data mining)

    GET /entity_search_cs_departments/_search_with_clusters?
    {
      "search_request": {
        "query": "#professor mining ",
        "size": 100
      }
    }

Entity-Semantic Document Search (find home pages of professors in data mining)

    GET /entity_search_cs_departments/_es_document_search?
    {
      "search_request": {
        "query": "@near ( #professor #email #phone ) @contains ( mining )",
        "size": 100
      }
    }
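The request bodies above share one shape, so a client could build them with a small helper. The following is a minimal sketch that only constructs the JSON body as a string; `SearchRequestSketch` and its methods are hypothetical names, the escaping covers only backslashes and double quotes, and sending the request to the endpoint is left out.

```java
// Hypothetical helper for building the "search_request" body; not part of this repository.
final class SearchRequestSketch {
    // Minimal JSON string escaping: handles backslash and double quote only.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds the JSON body with the same two fields used in the examples.
    static String body(String query, int size) {
        return "{ \"search_request\": { \"query\": \"" + escape(query)
                + "\", \"size\": " + size + " } }";
    }

    public static void main(String[] args) {
        // Entity search body.
        System.out.println(body("#professor mining ", 100));
        // Entity-semantic document search body.
        System.out.println(body("@near ( #professor #email #phone ) @contains ( mining )", 100));
    }
}
```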

System requirements and dependencies // What machine/OS/library requirements? How to set up the requirements/dependencies?

This has been tested on Ubuntu Linux 16.04.

Requirements:

Java – JRE/JDK 1.8

Ant – latest

Gradle – 4.4

IntelliJ IDEA – latest

Ivy-bootstrap

Execution // How to run the code?

To compile the sources, run 'ant compile' (this can also be done from IntelliJ).

To run all the tests, run 'ant test'.

To set up your IDE, run 'ant idea' (used with IntelliJ before importing the project), 'ant netbeans', or 'ant eclipse'.

For Maven info, see dev-tools/maven/README.maven

Design/Code organization

The following design and implementation changes were made to support boosted nested span queries.

SpanBoostQuery was changed so that it can tell whether the nested query is boosted. It relies on NearSpansUnordered and NearSpansOrdered, which had to be modified to accept and pass along the nested parameter. Once a boosted nested query is identified, its boost is passed to the SpanWeight class and the span scorer, so that the score and weight of each span can be adjusted accordingly when they are computed.

The changes are covered by test cases in TestSpanBoostAbhinav and TestSpanBoostQuery.
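The idea of carrying a boost down through nested span queries to the point where scores are computed can be sketched with a toy query tree. The classes below (`SpanNode`, `TermSpanNode`, `BoostSpanNode`, `NearSpanNode`) are hypothetical stand-ins, not the Lucene classes, and the additive scoring is an assumption chosen to keep the example small.

```java
import java.util.Arrays;
import java.util.List;

// Toy query tree; hypothetical types, not Lucene's span query classes.
interface SpanNode {
    // Score contribution of this node, given the boost inherited from its parents.
    float score(float inheritedBoost);
}

// Leaf: a single term with an assumed unboosted score.
final class TermSpanNode implements SpanNode {
    final float baseScore;

    TermSpanNode(float baseScore) {
        this.baseScore = baseScore;
    }

    public float score(float inheritedBoost) {
        return inheritedBoost * baseScore;
    }
}

// Wrapper that multiplies its own boost into whatever it inherited.
final class BoostSpanNode implements SpanNode {
    final SpanNode inner;
    final float boost;

    BoostSpanNode(SpanNode inner, float boost) {
        this.inner = inner;
        this.boost = boost;
    }

    public float score(float inheritedBoost) {
        return inner.score(inheritedBoost * boost);
    }
}

// Near query: passes the inherited boost to every clause and sums the results.
final class NearSpanNode implements SpanNode {
    final List<SpanNode> clauses;

    NearSpanNode(SpanNode... clauses) {
        this.clauses = Arrays.asList(clauses);
    }

    public float score(float inheritedBoost) {
        float total = 0f;
        for (SpanNode clause : clauses) {
            total += clause.score(inheritedBoost);
        }
        return total;
    }
}

final class NestedBoostSketch {
    public static void main(String[] args) {
        // near( term, boost(term, 5) ): only the second clause is boosted.
        SpanNode query = new NearSpanNode(
                new TermSpanNode(1f),
                new BoostSpanNode(new TermSpanNode(1f), 5f));
        System.out.println(query.score(1f));
    }
}
```

In this toy model a nested boost changes the final score because every node forwards the accumulated boost to its children. If the near node dropped that parameter, the boost on a nested clause would have no effect, which is the behavior the changes described above set out to fix.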
