feat: Set parameters for Milvus using the configuration file #998 #1001

e7217 · 2024-11-25T06:15:57Z

description
I added constructor parameters for Milvus so that `index_type` and index parameters such as `nlist`, `M`, and `efConstruction` can be set from the configuration file.

vectordb:
- name: autorag_2024_xx_xx
  db_type: milvus
  ...
  index_type: hnsw
  params:
    M: 8
    efConstruction: 50
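Milvus's `create_index` takes an index-parameter dict with `index_type`, `metric_type`, and a nested `params`. The snippet below is a minimal sketch of how config values like the ones above could be folded into that shape. `build_index_params` is a hypothetical helper for illustration, not the code added in this PR. The upper-casing assumes Milvus's upper-case names such as `HNSW` and `L2`.

```python
from typing import Any, Dict, Mapping, Optional


def build_index_params(
    index_type: str,
    similarity_metric: str,
    params: Optional[Mapping[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: turn YAML-style values into a Milvus-style
    index_params dict, e.g. "hnsw" -> "HNSW" and "l2" -> "L2"."""
    return {
        "index_type": index_type.upper(),
        "metric_type": similarity_metric.upper(),
        # Copy so the caller's config dict is never shared or mutated.
        "params": dict(params or {}),
    }


# M/efConstruction mirror the config above; "l2" is an assumed example metric.
print(build_index_params("hnsw", "l2", {"M": 8, "efConstruction": 50}))
# {'index_type': 'HNSW', 'metric_type': 'L2', 'params': {'M': 8, 'efConstruction': 50}}
```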

For reference, see the corresponding code in langchain-milvus:
https://github.com/langchain-ai/langchain-milvus/blob/288f5197f1e68d1cc98a416e0f7b8f3a6a3a9517/libs/milvus/langchain_milvus/vectorstores/milvus.py#L657
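If `params` is forwarded to Milvus as-is, a typo or a key meant for a different index type (e.g. `M` with an IVF index) is easy to miss. The snippet below is a minimal sketch of a pre-flight check. It assumes a hand-written subset of Milvus build-time parameters, not an exhaustive list. The table and `unexpected_params` are hypothetical and not part of AutoRAG or langchain-milvus.

```python
from typing import Any, Dict, List, Set

# Assumed subset of Milvus build-time index parameters (not exhaustive).
BUILD_PARAMS: Dict[str, Set[str]] = {
    "FLAT": set(),
    "IVF_FLAT": {"nlist"},
    "IVF_SQ8": {"nlist"},
    "HNSW": {"M", "efConstruction"},
}


def unexpected_params(index_type: str, params: Dict[str, Any]) -> List[str]:
    """Return the keys in `params` that are not known build parameters
    for `index_type`; an empty list means the config looks consistent."""
    allowed = BUILD_PARAMS.get(index_type.upper())
    if allowed is None:
        raise ValueError(f"unknown index_type: {index_type!r}")
    return sorted(k for k in params if k not in allowed)


print(unexpected_params("hnsw", {"M": 8, "efConstruction": 50}))  # []
print(unexpected_params("ivf_flat", {"nlist": 16384, "M": 8}))    # ['M']
```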

After modifying my configuration file as follows, the run completed successfully.

vectordb:
- name: autorag_2024_xx_xx
  db_type: milvus
  embedding_model: openai
  collection_name: autorag_2024_xx_xx
  uri: http://192.xxx.xxx.xxx:19530
  embedding_batch: 50
  similarity_metric: l2
  # index_type: hnsw
  params:
    nlist: 16384
    # M: 8
    # efConstruction: 50
node_lines:
- node_line_name: retrieve_node_line
  nodes:
    - node_type: retrieval
      strategy:
        metrics: [retrieval_f1, retrieval_ndcg, retrieval_map]
      top_k: 3
      modules:
        - module_type: bm25
          bm25_tokenizer: [ ko_kiwi, ko_okt, ko_kkma ]
        - module_type: vectordb
          vectordb: autorag_2024_xx_xx
        - module_type: hybrid_rrf
        - module_type: hybrid_cc
          normalize_method: [ mm, tmm ]
- node_line_name: post_retrieve_node_line
  nodes:
    - node_type: prompt_maker
      strategy:
        metrics:
          - metric_name: rouge
          - metric_name: sem_score
            embedding_model: openai
#          - metric_name: bert_score
#            lang: ko
        generator_modules:
          - module_type: openai_llm
            llm: gpt-4o-mini
      modules:
        - module_type: fstring
          prompt:
          - | 
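            Read the passage and answer the question. \n Question: {query} \n Passage: {retrieved_contents} \n Answer:
          - |
            Read the passage and answer the question. Think it through slowly, step by step, before answering. Base your answer strictly on the passage and do not state anything false. \n Question: {query} \n Passage: {retrieved_contents} \n Answer: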
    - node_type: generator
      strategy:
        metrics: # bert_score and g_eval are also recommended; they are left out here for a faster run.
          - metric_name: rouge
          - metric_name: sem_score
            embedding_model: openai
#          - metric_name: bert_score
#            lang: ko
      modules:
        - module_type: openai_llm
          llm: gpt-4o-mini
          temperature: [ 0.1, 1.0 ]
          batch: 16

[11/25/24 14:32:00] INFO     [evaluator.py:127] >>
                                             _        _____            _____                                                                                                        
                                  /\        | |      |  __ \     /\   / ____|                                                                                                       
                                 /  \  _   _| |_ ___ | |__) |   /  \ | |  __                                                                                                        
                                / /\ \| | | | __/ _ \|  _  /   / /\ \| | |_ |                                                                                                       
                               / ____ \ |_| | || (_) | | \ \  / ____ \ |__| |                                                                                                       
                              /_/    \_\__,_|\__\___/|_|  \_\/_/    \_\_____|                                                                                                       
                                                                                                                                                                                    
                                                                                                                                                                                    
                    INFO     [evaluator.py:128] >> Start Validation input data and config YAML file first. If you want to skip this, put the --skip_validation flag
                             or `skip_validation` at the start_trial function.
                    WARNING  [validator.py:50] >> Minimal Requested sample size (5) is larger than available records (1). Sampling will be limited to 1 records.
                    INFO     [evaluator.py:228] >> Embedding BM25 corpus...
[11/25/24 14:32:14] INFO     [evaluator.py:248] >> BM25 corpus embedding complete.
                    INFO     [_client.py:1039] >> HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
[11/25/24 14:32:17] INFO     [_client.py:1787] >> HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
[11/25/24 14:32:20] INFO     [evaluator.py:205] >> Running node line retrieve_node_line...
                    INFO     [node.py:55] >> Running node retrieval...
                    INFO     [run.py:165] >> Running retrieval node - semantic retrieval module...
                    INFO     [base.py:18] >> Initialize retrieval node - VectorDB
                    INFO     [base.py:31] >> Running retrieval node - VectorDB module...
...

close #882
close #998

bwook00 · 2024-11-25T13:23:25Z

@e7217
Thanks for your contribution!

I'll add pymilvus>=2.3.0 to requirements.txt!
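The snippet below is a minimal sketch for confirming at runtime that the installed pymilvus meets this floor, using only the standard library. The helper names are hypothetical. It compares numeric release segments only, so pre-release suffixes such as `rc1` are ignored. `packaging.version.Version` gives full PEP 440 ordering if that package is available.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple


def parse_release(v: str) -> Tuple[int, ...]:
    """Leading numeric release segments, e.g. "2.4.9rc1" -> (2, 4, 9)."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            break  # a suffix like "rc1" ends the numeric release part
    return tuple(parts)


def at_least(installed: str, minimum: str) -> bool:
    """Compare release tuples after zero-padding them to equal length."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def pymilvus_satisfies(minimum: str = "2.3.0") -> bool:
    """True if pymilvus is installed at or above `minimum`, else False."""
    try:
        return at_least(version("pymilvus"), minimum)
    except PackageNotFoundError:
        return False


print(at_least("2.4.9", "2.3.0"))   # True
print(at_least("2.2.15", "2.3.0"))  # False
```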

bwook00

LGTM!

Thank you for your contribution 👍

feat: Set parameters for Milvus using the configuration file Marker-I…

5b9c11a

…nc-Korea#998

e7217 mentioned this pull request Nov 25, 2024

fix: change default similarity_metric for milvus #1000

Closed

bwook00 self-requested a review November 25, 2024 11:40

Add pymilvus>=2.3.0 at requirements.txt

abc81df

bwook00 approved these changes Nov 25, 2024

bwook00 merged commit 07f9752 into Marker-Inc-Korea:main Nov 25, 2024
1 check passed

e7217 deleted the fix/add-config-param-for-vectordb branch November 25, 2024 13:44
