EngineActor

- Done AddRequest(response)
  - get doc id
  - ! SegDocReq(id, html)
- Done SearchRequest(word, cb)
  - ? SegSWReq(word) -> words
  - ! IndexSearchReq(words, cb)
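
The flow above as a hedged TypeScript sketch: `tell` stands in for the fire-and-forget `!` send and `ask` for the request/reply `?`. The wiring (`tokenizerTell`, `tokenizerAsk`, `indexTell`, the doc-id counter) is hypothetical glue, not code from this repo.

```ts
// Hypothetical sketch of the EngineActor's routing, not the actual implementation.
type Tell = (msg: object) => void;              // models "!" (fire-and-forget)
type Ask = (msg: object) => Promise<string[]>;  // models "?" (request/reply)

interface SearchHit { title: string; url: string; content: string }

class EngineActor {
  private nextDocId = 0;
  constructor(
    private tokenizerTell: Tell,  // target of ! SegDocReq
    private tokenizerAsk: Ask,    // target of ? SegSWReq
    private indexTell: Tell,      // target of ! IndexSearchReq
  ) {}

  // AddRequest(response): assign a doc id, hand the raw HTML to the tokenizer.
  onAddRequest(html: string): void {
    const id = this.nextDocId++;
    this.tokenizerTell({ type: "SegDocReq", id, html });
  }

  // SearchRequest(word, cb): segment the query first, then search the index.
  async onSearchRequest(word: string, cb: (hits: SearchHit[]) => void): Promise<void> {
    const words = await this.tokenizerAsk({ type: "SegSWReq", word });
    this.indexTell({ type: "IndexSearchReq", words, cb });
  }
}
```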

TokenizeActor

- Done TokenizeDocumentRequest(id, html)
  - extract the content body
  - word segmentation; the result should be:
    [ {keyword: "apple", position: [2, 3, 4]} ]
  - ! IndexRequest(id, {content, url, title}, words)
- Done TokenizeSearchWordRequest(word)
  - word segmentation
  - sender ! words
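
A sketch of the segmentation result described above. Whitespace splitting stands in for a real word segmenter, and the tag-stripping regex for real content-body extraction; both are simplifications, not the repo's tokenizer.

```ts
// Produce the shape described above: [ { keyword, position: [word offsets] } ].
interface Segment { keyword: string; position: number[] }

function segment(html: string): Segment[] {
  const body = html.replace(/<[^>]*>/g, " ");  // crude content-body extraction
  const tokens = body.toLowerCase().split(/\W+/).filter(Boolean);
  const byWord = new Map<string, number[]>();
  tokens.forEach((word, pos) => {
    const positions = byWord.get(word) ?? [];
    positions.push(pos);                       // record every occurrence position
    byWord.set(word, positions);
  });
  return [...byWord].map(([keyword, position]) => ({ keyword, position }));
}

// segment("<p>apple pie, apple tart</p>")
// -> [ { keyword: "apple", position: [0, 2] },
//      { keyword: "pie",   position: [1] },
//      { keyword: "tart",  position: [3] } ]
```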

IndexActor

- Done IndexRequest(id, content, words)
  - get hash from content; the hash should distribute uniformly
  - ? SCReq(hash, content) -> offset
  - engine.indexTable[id] = offset
  - merge into engine.invertedIndexTable; it should be (see the merge sketch below):

        {
          "word for one doc in Tokenize": [start positions of the `word` in that doc],
          "word for global docs": {
            "indices": [(docId, index of that doc's position list), ...],
            "position": [[start positions of the `word` in each doc], ...]
          },
          "apple": {"indices": [(1, 2), (3, 0), (2, 1)], "position": [[2, 3, 4], [1, 2], [4, 5]]},
          ...
        }

  - when engine.invertedIndexTable size(?) > xxx:
    - ! FlushInvertedIndexRequest()
  - when engine.indexTable size % xxx === 0:
    - ! FlushIndexRequest()
- Done IndexSearchRequest(words, cb)
  - for word in words:
    - ! FindInvertedIndexItemRequest(word) -> futureList
  - await all futureList !!!
  - merge the results
  - calculate BM25 (see the scoring sketch below)
  - sort
  - replace id with (title, url, content)
  - replace content with [first position:XXX]
  - cb( [(title, url, content), [positions]] )
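
A hedged TypeScript sketch of the merge step above. It assumes each `indices` pair is `(docId, index into the position array)`, which matches the "apple" example but is an interpretation of "pos of pos", not something the note states outright. `FLUSH_THRESHOLD` is a placeholder for the unspecified `xxx`.

```ts
// Sketch only: fold one document's { keyword -> positions } result into the
// global inverted table.
interface Posting { indices: [number, number][]; position: number[][] }
type InvertedIndex = Map<string, Posting>;

const FLUSH_THRESHOLD = 10_000; // hypothetical; the note only says "> xxx"

function mergeDoc(
  table: InvertedIndex,
  docId: number,
  segs: { keyword: string; position: number[] }[],
  tellStorage: (msg: object) => void,
): void {
  for (const { keyword, position } of segs) {
    const posting = table.get(keyword) ?? { indices: [], position: [] };
    posting.position.push(position);                            // this doc's position list
    posting.indices.push([docId, posting.position.length - 1]); // (docId, index of that list)
    table.set(keyword, posting);
  }
  if (table.size > FLUSH_THRESHOLD) {
    tellStorage({ type: "FlushInvertedIndexRequest" });         // ! FlushInvertedIndexRequest()
  }
}
```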
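The search side only says "calculate BM25". Below is the textbook Okapi BM25 per-term score, with the conventional defaults k1 = 1.2 and b = 0.75; both are assumptions, not values taken from this repo.

```ts
// Okapi BM25 score contribution of a single query term to a single document.
function bm25Term(
  tf: number,         // term frequency in the document
  df: number,         // number of documents containing the term
  docLen: number,     // document length in words
  avgDocLen: number,  // average document length across the collection
  totalDocs: number,  // N
  k1 = 1.2,
  b = 0.75,
): number {
  const idf = Math.log(1 + (totalDocs - df + 0.5) / (df + 0.5));
  return (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + b * (docLen / avgDocLen)));
}

// A document's score for a query is the sum of bm25Term(...) over the segmented
// words; sort descending, then map each doc id back to (title, url, content)
// and trim content around the first match before invoking cb.
```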

StorageActor

- StoreContentRequest(hash, content)
  - fileID <- hash % FileScale
  - get file offset
  - ! sender offset ( Future )
  - write content to the file
  - close
- update FlushIndexRequest() !!!
- update FlushInvertedIndexRequest() !!!
- update FindInvertedIndexItemRequest(word) !!!
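
A minimal sketch of StoreContentRequest as described: pick a shard file by `hash % FileScale`, append the content, and hand the offset back. The shard count, file naming, and synchronous I/O are assumptions; the note replies to the sender with the offset as a Future, which the plain return value stands in for here.

```ts
// Hypothetical storage layout; FileScale and the path scheme are not specified.
import { appendFileSync, existsSync, statSync } from "node:fs";

const FILE_SCALE = 16; // hypothetical shard count

function storeContent(hash: number, content: string): { fileId: number; offset: number } {
  const fileId = hash % FILE_SCALE;          // fileID <- hash % FileScale
  const path = `store-${fileId}.dat`;
  const offset = existsSync(path) ? statSync(path).size : 0; // current end of file
  appendFileSync(path, content);             // write content to the file (and close)
  return { fileId, offset };                 // the IndexActor records this offset
}
```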