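A project for learning the general architecture and the inner details of a search engine.
If you like it, a star would satisfy my little vanity ^_^.
Development is finished! I'll polish this README when I have time; for detailed documentation, see the technical documentation.
If you'd like to borrow from this project, please give it a star.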
[Architecture diagram]
- Engine: manager of the whole search engine
- Tokenizer: tokenizes (segments) Chinese sentences in documents and in search queries into words
- Indexer: builds the index table and the inverted index table for all documents, ranks documents by relevance, and sorts them
- Storage: stores tables on disk and manages the cache
- Crawler: crawls pages starting from a seed page, or maintains a URL set
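The Tokenizer and Indexer roles above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the project's implementation: `tokenize` uses forward maximum matching against a hypothetical toy dictionary (`DICTIONARY`), and `search` ranks with plain TF-IDF, a technique swapped in here because the README does not name its rank algorithm.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy dictionary; a real Chinese tokenizer would load a large lexicon.
DICTIONARY = {"搜索", "引擎", "搜索引擎", "倒排", "索引", "倒排索引"}
MAX_WORD_LEN = max(len(w) for w in DICTIONARY)


def tokenize(sentence):
    """Forward maximum matching: at each position take the longest dictionary
    word that matches, falling back to a single character otherwise."""
    words, i = [], 0
    while i < len(sentence):
        for size in range(min(MAX_WORD_LEN, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in DICTIONARY:
                words.append(piece)
                i += size
                break
    return words


def build_inverted_index(docs):
    """Inverted index: map each term to {doc_id: term frequency in that doc}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def search(index, n_docs, query):
    """Score documents by summed TF-IDF over the query terms, best first."""
    scores = defaultdict(float)
    for term in set(tokenize(query)):
        postings = index.get(term, {})  # .get does not insert into the defaultdict
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Sort by descending score; break ties by ascending doc_id for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

For example, with `docs = {1: "搜索引擎", 2: "倒排索引", 3: "搜索引擎的倒排索引"}`, `tokenize(docs[3])` yields `["搜索引擎", "的", "倒排索引"]`, and searching that same sentence ranks document 3 first because it matches every query term. Only postings for query terms are touched, which is the point of an inverted index over a full scan.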
- actor model design (for highly asynchronous processing)
- synchronization in the actor system (using Future)
- file structure of the inverted index table (for fast reading)
- speeding up document search (data structures)
- ranking algorithm
- crawler
- maybe distribution? (an advantage of the actor model over the CSP model)
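The actor-plus-Future idea listed above can be illustrated with Python's `asyncio`; the choice of `asyncio` is an assumption, since the README does not say which actor framework the project uses. `IndexerActor` is hypothetical: it owns its state, handles messages one at a time from an `asyncio.Queue` mailbox, and answers each request by completing an `asyncio.Future`, which is how a caller can wait for a reply from an otherwise asynchronous actor.

```python
import asyncio


class IndexerActor:
    """Hypothetical actor: owns `postings` and processes one message at a time
    from its mailbox, so no locks are needed around its state."""

    def __init__(self):
        self.mailbox = asyncio.Queue()
        self.postings = {}

    async def run(self):
        while True:
            msg, payload, reply = await self.mailbox.get()
            if msg == "add":
                doc_id, terms = payload
                for term in terms:
                    self.postings.setdefault(term, set()).add(doc_id)
                reply.set_result(None)
            elif msg == "lookup":
                reply.set_result(sorted(self.postings.get(payload, ())))
            elif msg == "stop":
                reply.set_result(None)
                return

    def ask(self, msg, payload=None):
        """Send a message; return a Future the actor completes with its reply."""
        reply = asyncio.get_running_loop().create_future()
        self.mailbox.put_nowait((msg, payload, reply))
        return reply


async def main():
    actor = IndexerActor()
    task = asyncio.create_task(actor.run())
    # Send several "add" messages without waiting, then wait on all their Futures.
    await asyncio.gather(
        actor.ask("add", (1, ["search", "engine"])),
        actor.ask("add", (2, ["inverted", "index"])),
        actor.ask("add", (3, ["search", "index"])),
    )
    result = await actor.ask("lookup", "search")
    await actor.ask("stop")
    await task
    return result


print(asyncio.run(main()))  # [1, 3]
```

Because only the actor's own loop touches `postings`, concurrent senders need no locks, and each caller decides when to block by awaiting the returned Future.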
MIT License
Copyright (c) 2020 Chunxu Zhang