数据地址:https://www.kaggle.com/Cornell-University/arxiv
大小:5GB左右
使用 Logstash 将 json 文件导入 Elasticsearch。
Logstash 配置如下:
input {
file {
path => "/opt/source/arxiv-metadata-oai-snapshot.json"
codec => json {
charset => "UTF-8"
}
start_position => "beginning"
tags => "arxiv-metadata-oai-snapshot"
}
}
output {
if "arxiv-metadata-oai-snapshot" in [tags] {
elasticsearch {
index => "arxiv-metadata-oai-snapshot"
hosts => ["192.168.112.157:9200"]
}
}
}
想先查询一个完全匹配的题目,可以用短语搜索:
参见:https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query-phrase.html
POST /arxiv-metadata-oai-snapshot/_search
{
"query": {
"match_phrase": {
"title": "The Stability Region of the Two-User Interference Channel"
}
}
}