About efficiency of the model #80
Hi @AOZMH,
Thanks for the follow-up, I'll provide my code snippet and the execution logs (time costs) shortly!
Using Biencoder+Crossencoder is much slower than Biencoder only.
Since the transformer is O(n^2), we can infer that T(crossencoder) : T(biencoder) = (n+n)^2 : 2*n^2 = 2 : 1.
Roughly, then, T(cross+bi) : T(bi) = 3 : 1, ignoring the other processing.
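Spelling out that rough estimate (assuming the mention and the candidate are both of length n and that self-attention dominates): the biencoder encodes the two sequences separately, while the crossencoder encodes their concatenation, so

```latex
% rough estimate: self-attention cost ~ (sequence length)^2
T_{\mathrm{bi}} \propto n^2 + n^2 = 2n^2, \qquad
T_{\mathrm{cross}} \propto (n+n)^2 = 4n^2
\;\Longrightarrow\;
\frac{T_{\mathrm{cross}}}{T_{\mathrm{bi}}} = 2, \qquad
\frac{T_{\mathrm{cross+bi}}}{T_{\mathrm{bi}}} = \frac{4n^2 + 2n^2}{2n^2} = 3 .
```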
To wrap up, I conclude that:
Thanks for all the help! I'll be happy to follow any updates.
My code snippet was the same as the one in the README, as shown below.
Also, I changed the config as below to add the faiss index.
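The faiss-related part of that change looked roughly like the following (a sketch, not my exact file: the index type string and the index file name are placeholders that depend on which prebuilt index you use):

```python
# Sketch of the config change; only the faiss-related fields are new.
# "flat" and the index file name are placeholders for whichever index you actually load.
config = {
    # ... biencoder/crossencoder model paths, entity_catalogue, entity_encoding as in the README ...
    "faiss_index": "flat",                               # type of faiss index to load
    "index_path": models_path + "faiss_flat_index.pkl",  # path to the prebuilt index file
    "output_path": "logs/",
}
```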
Hey, how did you manually change the biencoder to GPU? Could you share the snippets?
You can simply revert this commented code to restore the transition to GPU (# .to(device) => .to(device)) and manually put the corresponding model input tensors on the GPU; that should work. It may take a few days for me to clean up my (experimental) code, so maybe you can give it a try with the aforementioned ideas; as far as I can recall, it requires fewer than 20 lines of code changes. Anyway, if you still have any problems, please feel free to reply and I'll try to share my code snippet.
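Roughly, the kind of change I mean looks like the following (a sketch rather than my exact diff; `biencoder.model` and `score_candidate` stand in for wherever the forward pass happens in the BLINK code you touch):

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# 1. Re-enable the commented-out ".to(device)" so the biencoder weights live on the GPU.
biencoder.model.to(device)

# 2. At every forward pass, move the input batches to the same device first.
context_input = context_input.to(device)
cand_input = cand_input.to(device)
with torch.no_grad():
    scores = biencoder.score_candidate(context_input, cand_input)
```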
I am facing the same memory issue you did. Could you elaborate on your third point, on how to lower memory usage?
I tried two ways as follows:
BTW, I really hope the developers of BLINK can look into this issue to solve the faiss index problem I mentioned before, thanks in advance! |
I wonder if this is also a feasible solution: splitting the candidate_encoding when you pass it to the GPU, then concatenating the split scores and continuing with the code? That way the memory passed to the GPU at each call is reduced without removing entities.
That should work properly, but the time cost would be considerable: the basic assumption is that the whole candidate_encoding cannot fit into GPU memory, so if you split it into A and B, you still cannot put both into GPU memory simultaneously. Thus, for each execution (instead of each model initialization), we need to first transfer A to the GPU and execute on A, then delete A from GPU memory, transfer B to the GPU, execute on B, and delete B from GPU memory. That is, such a splitting approach requires a transfer between main memory and GPU memory for each EXECUTION, which would be costly. However, it should be a good idea if you have multiple GPUs, e.g. putting A and B PERMANENTLY on GPU 0 and GPU 1, so that the per-execution transfer is no longer needed.
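To make that trade-off concrete, the per-call chunked scoring would look something like this (a sketch; `candidate_encoding` and the dot-product scoring stand in for the corresponding BLINK tensors):

```python
import torch

def score_in_chunks(context_emb, candidate_encoding, chunk_size=500_000, device="cuda"):
    """Score query embeddings against all candidates, moving one chunk of
    candidate_encoding to the GPU at a time (so there is a CPU->GPU copy
    for every chunk on every call)."""
    context_emb = context_emb.to(device)
    chunk_scores = []
    for start in range(0, candidate_encoding.size(0), chunk_size):
        chunk = candidate_encoding[start:start + chunk_size].to(device)  # per-call transfer
        chunk_scores.append(context_emb @ chunk.t())                     # (num_queries, chunk) scores
        del chunk                                                        # free GPU memory before the next chunk
    return torch.cat(chunk_scores, dim=1)
```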
I think a possible solution is to encode all the queries and all the candidates with the GPU and save them, then build a faiss index on the CPU to find the nearest entities. Faiss is much more efficient, with satisfactory results, but this takes considerable effort and means the pipeline has to be reconstructed. BTW, if you use one 32GB V100, the problem will not occur.
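As a rough sketch of that idea (array names and sizes are placeholders; in practice you would load the encodings saved from the GPU pass instead of generating random ones):

```python
import numpy as np
import faiss

# Placeholders: load the precomputed encodings here instead of random data.
num_entities, num_queries, dim = 100_000, 8, 1024
candidate_encoding = np.random.rand(num_entities, dim).astype("float32")
query_encoding = np.random.rand(num_queries, dim).astype("float32")

index = faiss.IndexFlatIP(dim)                    # exact inner-product search, CPU only
index.add(candidate_encoding)                     # add all candidate vectors
scores, ids = index.search(query_encoding, 10)    # top-10 nearest entities per query
```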
Hi @AOZMH, could you please share your snippets for using the GPU for the biencoder? I did that but the speed is still slow; maybe I did it wrongly...
Hey, same for me too! Changing it to GPU was actually slower than the CPU.
Hi @AOZMH, can you please share how you converted BLINK to work with FP16? I am getting errors.
I met the same problem. When I add the faiss index path, it becomes slower.
Hi,
You may want to make some changes to the codebase and add support for more sparse indexes. Currently, the BLINK codebase only supports flat indices. I am currently using a sparse index. For e.g., this is what my config looks like and how I load the models:

```python
config = {
"interactive": False,
"fast": False,
"top_k": 8,
"biencoder_model": models_path + "biencoder_wiki_large.bin",
"biencoder_config": models_path + "biencoder_wiki_large.json",
"crossencoder_model": models_path + "crossencoder_wiki_large.bin",
"crossencoder_config": models_path + "crossencoder_wiki_large.json",
"entity_catalogue": models_path + "entities_aliases_with_ids.jsonl",
"entity_encoding": models_path + "all_entities_aliases.t7",
"faiss_index": "OPQ32_768,IVF4096,PQ32x8",
"index_path": models_path + "index_opq32_768_ivf4096_pq32x8.faiss",
"output_path": "logs/", # logging directory
}
self.args = argparse.Namespace(**config)
logger.info("Loading BLINK model...")
self.models = main_dense.load_models(self.args, logger=logger)
```
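In case it helps others, building such a non-flat index offline with faiss looks roughly like this (a sketch; `entity_vectors` is a placeholder for however you load the saved entity encodings):

```python
import faiss
import numpy as np

# entity_vectors: the saved entity encodings as a (num_entities, dim) float32 matrix (placeholder).
candidate_encoding = np.ascontiguousarray(entity_vectors, dtype="float32")
dim = candidate_encoding.shape[1]

# Same factory string as in the config above; OPQ/IVF/PQ indexes need a training pass.
index = faiss.index_factory(dim, "OPQ32_768,IVF4096,PQ32x8", faiss.METRIC_INNER_PRODUCT)
index.train(candidate_encoding)
index.add(candidate_encoding)
faiss.write_index(index, "index_opq32_768_ivf4096_pq32x8.faiss")
```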
Hi,
Thanks for the great repo, I enjoy exploring it a lot!
However, when I tried to run the code from the "Use BLINK in your codebase" chapter of the README, I found the model relatively slow to run (in fast=False mode).
To be more specific, when I execute "main_dense.run", the first stage of processing proceeds relatively slowly (~2.5 seconds per item) while the later stage (printing "Evaluation") proceeds at ~5 items per second. Also, I tried adding indices as below.
However, the performance of the first stage became even worse (~20 seconds per item). I'm wondering if I'm setting something wrong (especially for the faiss index) that resulted in the low speed. Are there any corrections/methods to speed it up? Thanks for your help!
(I'll post the performance logs below if needed!)