feat: Set parameters for Milvus using the configuration file #998 #1001

Merged

@e7217 e7217 commented Nov 25, 2024

Description
I added constructor parameters for the Milvus index settings, such as index_type, nlist, M, and efConstruction.

vectordb:
- name: autorag_2024_xx_xx
  db_type: milvus
  ...
  index_type: hnsw
  params:
    M: 8
    efConstruction: 50

For reference, see langchain-milvus:
https://github.com/langchain-ai/langchain-milvus/blob/288f5197f1e68d1cc98a416e0f7b8f3a6a3a9517/libs/milvus/langchain_milvus/vectorstores/milvus.py#L657
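
As a rough illustration of how the `index_type` and `params` keys could be turned into the index-parameter dictionary that Milvus expects, here is a minimal sketch. The helper name `build_index_params` is hypothetical and is not taken from this PR; only the shape of the resulting dictionary (`index_type`, `metric_type`, `params`) follows the format used by pymilvus and by the langchain-milvus code linked above.

```python
from typing import Any, Dict, Optional


def build_index_params(
    index_type: str,
    similarity_metric: str,
    params: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: map YAML-style settings to a Milvus index-params dict."""
    return {
        # Milvus documents index and metric names in upper case (e.g. HNSW, L2).
        "index_type": index_type.upper(),
        "metric_type": similarity_metric.upper(),
        # Index-specific build parameters, e.g. M / efConstruction for HNSW
        # or nlist for IVF_FLAT. Copied so the caller's dict is not mutated.
        "params": dict(params or {}),
    }


hnsw_params = build_index_params("hnsw", "l2", {"M": 8, "efConstruction": 50})
print(hnsw_params)
```

A dictionary of this shape is what would typically be handed to the Milvus client when creating the index; how this PR wires it into the constructor is not shown here.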

After modifying my configuration file as shown below, I got successful results:

vectordb:
- name: autorag_2024_xx_xx
  db_type: milvus
  embedding_model: openai
  collection_name: autorag_2024_xx_xx
  uri: http://192.xxx.xxx.xxx:19530
  embedding_batch: 50
  similarity_metric: l2
  # index_type: hnsw
  params:
    nlist: 16384
    # M: 8
    # efConstruction: 50
node_lines:
- node_line_name: retrieve_node_line
  nodes:
    - node_type: retrieval
      strategy:
        metrics: [retrieval_f1, retrieval_ndcg, retrieval_map]
      top_k: 3
      modules:
        - module_type: bm25
          bm25_tokenizer: [ ko_kiwi, ko_okt, ko_kkma ]
        - module_type: vectordb
          vectordb: autorag_2024_xx_xx
        - module_type: hybrid_rrf
        - module_type: hybrid_cc
          normalize_method: [ mm, tmm ]
- node_line_name: post_retrieve_node_line
  nodes:
    - node_type: prompt_maker
      strategy:
        metrics:
          - metric_name: rouge
          - metric_name: sem_score
            embedding_model: openai
#          - metric_name: bert_score
#            lang: ko
        generator_modules:
          - module_type: openai_llm
            llm: gpt-4o-mini
      modules:
        - module_type: fstring
          prompt:
          - | 
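            Read the passage and answer the question. \n Question : {query} \n Passage: {retrieved_contents} \n Answer :
          - |
            Read the passage and answer the question. When answering, think it through slowly and carefully, step by step. Base your answer strictly on the passage content and do not say anything false. \n Question: {query} \n Passage: {retrieved_contents} \n Answer :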
    - node_type: generator
      strategy:
        metrics: # Using bert_score and g_eval is also recommended. They are left out here for faster execution.
          - metric_name: rouge
          - metric_name: sem_score
            embedding_model: openai
#          - metric_name: bert_score
#            lang: ko
      modules:
        - module_type: openai_llm
          llm: gpt-4o-mini
          temperature: [ 0.1, 1.0 ]
          batch: 16
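
To make the role of the `vectordb` entry above more concrete, the following sketch shows one way such an entry could be split into routing keys and constructor keyword arguments once the YAML has been parsed. The helper `split_vectordb_entry` is hypothetical, and AutoRAG's actual loader may work differently; the point is only that extra keys such as `index_type` and `params` can travel to the Milvus wrapper's constructor as keyword arguments.

```python
from typing import Any, Dict, Tuple


def split_vectordb_entry(entry: Dict[str, Any]) -> Tuple[str, str, Dict[str, Any]]:
    """Hypothetical helper: separate routing keys from constructor keyword arguments."""
    kwargs = dict(entry)  # shallow copy so the parsed config is left untouched
    name = kwargs.pop("name")
    db_type = kwargs.pop("db_type")
    return name, db_type, kwargs


# The `vectordb` entry from the configuration above, as it would look after YAML parsing.
entry = {
    "name": "autorag_2024_xx_xx",
    "db_type": "milvus",
    "embedding_model": "openai",
    "collection_name": "autorag_2024_xx_xx",
    "uri": "http://192.xxx.xxx.xxx:19530",
    "embedding_batch": 50,
    "similarity_metric": "l2",
    "params": {"nlist": 16384},
}

name, db_type, kwargs = split_vectordb_entry(entry)
print(db_type, sorted(kwargs))
```

Everything left in `kwargs` (here including `params`) would then be passed to the constructor selected by `db_type`.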
[11/25/24 14:32:00] INFO     [evaluator.py:127] >>
                _        _____            _____
     /\        | |      |  __ \     /\   / ____|
    /  \  _   _| |_ ___ | |__) |   /  \ | |  __
   / /\ \| | | | __/ _ \|  _  /   / /\ \| | |_ |
  / ____ \ |_| | || (_) | | \ \  / ____ \ |__| |
 /_/    \_\__,_|\__\___/|_|  \_\/_/    \_\_____|

                    INFO     [evaluator.py:128] >> Start Validation input data and config YAML file first. If you want to skip this, put the --skip_validation flag or `skip_validation` at the start_trial function.
                    WARNING  [validator.py:50] >> Minimal Requested sample size (5) is larger than available records (1). Sampling will be limited to 1 records.
                    INFO     [evaluator.py:228] >> Embedding BM25 corpus...
[11/25/24 14:32:14] INFO     [evaluator.py:248] >> BM25 corpus embedding complete.
                    INFO     [_client.py:1039] >> HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
[11/25/24 14:32:17] INFO     [_client.py:1787] >> HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
[11/25/24 14:32:20] INFO     [evaluator.py:205] >> Running node line retrieve_node_line...
                    INFO     [node.py:55] >> Running node retrieval...
                    INFO     [run.py:165] >> Running retrieval node - semantic retrieval module...
                    INFO     [base.py:18] >> Initialize retrieval node - VectorDB
                    INFO     [base.py:31] >> Running retrieval node - VectorDB module...
...

close #882
close #998


bwook00 commented Nov 25, 2024

@e7217
Thanks for your contribution!

I'll add pymilvus>=2.3.0 to requirements.txt!


@bwook00 bwook00 left a comment


LGTM!

Thank you for your contribution 👍

@bwook00 bwook00 merged commit 07f9752 into Marker-Inc-Korea:main Nov 25, 2024
1 check passed
@e7217 e7217 deleted the fix/add-config-param-for-vectordb branch November 25, 2024 13:44