GitHub - SiriuslySirius/positional-index-query-implementation

Positional Index Query Implementation
This program builds a positional index over a large document corpus from Project Gutenberg (over 30,000 documents) and uses it to process phrase and proximity queries. Query results are stored in CSV files so they can be read easily in spreadsheet software. Query results are bidirectional: the order of the queried words does not matter, because matches in both orders are included.
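The positional-index idea above can be sketched as follows. This is a minimal illustration, not the program's actual code. The class name PositionalIndexSketch, its methods, and the distance rule are all assumptions. Here two terms match when their token positions differ by at most k in either order, so k = 1 finds adjacent pairs. Each term maps to the documents containing it, and each document maps to the term's positions in increasing order. The proximity check uses a simple nested loop; a larger implementation would usually merge the two sorted position lists instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: term -> (docId -> positions of that term, in increasing order).
class PositionalIndexSketch {
    private final Map<String, Map<Integer, List<Integer>>> index = new HashMap<>();

    // Records every token of a document with its 0-based position.
    // Assumes each docId is added only once, so positions stay sorted.
    void addDocument(int docId, List<String> tokens) {
        for (int pos = 0; pos < tokens.size(); pos++) {
            index.computeIfAbsent(tokens.get(pos), t -> new TreeMap<>())
                 .computeIfAbsent(docId, d -> new ArrayList<>())
                 .add(pos);
        }
    }

    // Returns {docId, firstPos, secondPos} triples where the two terms are at most
    // k positions apart, in either order (bidirectional). Documents come back in
    // ascending docId order because the per-term postings are TreeMaps.
    List<int[]> proximity(String first, String second, int k) {
        List<int[]> results = new ArrayList<>();
        Map<Integer, List<Integer>> firstPostings = index.getOrDefault(first, Map.of());
        Map<Integer, List<Integer>> secondPostings = index.getOrDefault(second, Map.of());
        for (Map.Entry<Integer, List<Integer>> entry : firstPostings.entrySet()) {
            List<Integer> otherPositions = secondPostings.get(entry.getKey());
            if (otherPositions == null) {
                continue; // second term never occurs in this document
            }
            for (int a : entry.getValue()) {
                for (int b : otherPositions) {
                    int gap = Math.abs(a - b);
                    if (gap >= 1 && gap <= k) {
                        results.add(new int[] {entry.getKey(), a, b});
                    }
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        PositionalIndexSketch idx = new PositionalIndexSketch();
        idx.addDocument(1, List.of("the", "old", "man", "and", "the", "sea"));
        idx.addDocument(2, List.of("sea", "of", "old", "stories"));
        // prints "1,1,5" then "2,2,0": document 2 matches in the reverse order
        for (int[] r : idx.proximity("old", "sea", 4)) {
            System.out.println(r[0] + "," + r[1] + "," + r[2]);
        }
    }
}
```

Each returned triple corresponds to one row of the basic CSV described below (DocID, first term index, second term index).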

The program generates two CSV files. The first contains only the DocID, the first term's index, and the second term's index, as required by the assignment. The second is a detailed version that contains everything from the first file plus the file path of the text file and the exact phrase from that file. New files are created for every unique query; repeating a query overwrites its existing files, so results stay current if you switch to a different corpus or add to it. When validating against the detailed version, keep in mind that symbols, whitespace, and numbers are not included in the results. Account for this when comparing query results against what your text editor's search function finds in the document.
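The sketch below illustrates why a detailed-row phrase can differ from the raw text. It is a hypothetical illustration, not the program's code. The class CsvOutputSketch and its methods are invented, and it assumes two tokenization rules. Tokens are runs of Unicode letters, and they are lower-cased. Digits, punctuation, and whitespace act only as separators. It also assumes the phrase is rebuilt from tokens joined by single spaces, and that text fields use RFC 4180-style quoting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helpers for tokenizing text and formatting the two kinds of CSV rows.
class CsvOutputSketch {
    // Assumption: a token is a maximal run of letters (\p{L}), lower-cased;
    // symbols, whitespace, and digits are dropped and only separate tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}]+")) {
            if (!t.isEmpty()) { // a leading separator yields one empty string
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Basic row: DocID, first term index, second term index.
    static String basicRow(int docId, int firstPos, int secondPos) {
        return docId + "," + firstPos + "," + secondPos;
    }

    // Detailed row: the basic fields plus the file path and the matched phrase,
    // rebuilt from tokens (so punctuation and numbers from the source text are absent).
    static String detailedRow(int docId, int firstPos, int secondPos, String path, List<String> tokens) {
        int from = Math.min(firstPos, secondPos);
        int to = Math.max(firstPos, secondPos);
        String phrase = String.join(" ", tokens.subList(from, to + 1));
        return basicRow(docId, firstPos, secondPos) + "," + quote(path) + "," + quote(phrase);
    }

    // Wraps a field in double quotes and doubles any embedded quote (RFC 4180 style).
    static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("Chapter 1: The old man, and the SEA!");
        System.out.println(tokens);
        System.out.println(basicRow(7, 2, 6));
        System.out.println(detailedRow(7, 2, 6, "corpus/7.txt", tokens));
    }
}
```

In this example, the source text contains "old man, and the SEA!", but the detailed row records "old man and the sea". A plain text-editor search for that exact phrase would therefore miss it.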

How to Compile:
javac -O .\PositionalIndex.java

How to Run (with Parameters):
java PositionalIndex <path-to-input-files> <path-to-output-result-files> <first-word> <second-word> <int-distance-between-words>
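The five positional arguments could be read as in the sketch below. This is a hypothetical helper, not the program's actual argument handling. The record QueryArgs, its parse method, and the rule that the distance must be a positive integer are assumptions. Records require Java 16 or later.

```java
import java.nio.file.Path;

// Hypothetical holder for the five command-line arguments, in the documented order.
record QueryArgs(Path inputDir, Path outputDir, String firstWord, String secondWord, int distance) {

    static QueryArgs parse(String[] args) {
        if (args.length != 5) {
            throw new IllegalArgumentException(
                "usage: java PositionalIndex <path-to-input-files> <path-to-output-result-files> "
                    + "<first-word> <second-word> <int-distance-between-words>");
        }
        // Integer.parseInt throws NumberFormatException (an IllegalArgumentException) for non-integers.
        int distance = Integer.parseInt(args[4]);
        if (distance < 1) { // assumption: a distance of 0 or less is rejected
            throw new IllegalArgumentException("distance must be a positive integer");
        }
        return new QueryArgs(Path.of(args[0]), Path.of(args[1]), args[2], args[3], distance);
    }

    public static void main(String[] args) {
        QueryArgs q = parse(new String[] {"corpus", "results", "old", "sea", "4"});
        System.out.println(q);
    }
}
```

For example, with the bundled sample corpus, an invocation could look like java PositionalIndex .\gutenberg-corpus-sample .\results old sea 4. The output folder and query words here are only illustrative.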

Repository Contents:
gutenberg-corpus-sample
.gitattributes
.gitignore
LICENSE
PositionalIndex.java
README.md
