dataset Annotation #6
Hi @tzktok, thanks for your interest! As stated in the paper, we used publicly available datasets to train UniTable. I'll share the papers for these datasets below; their annotation processes may be helpful to you! PubTabNet: https://github.com/ibm-aur-nlp/PubTabNet
I have used my own data to fine-tune the model, and the results have been very good. Thank you for your efforts. However, the inference speed does not meet my requirements. Are there any good ways to speed up inference? I have tried using TensorRT, but the improvement was not significant. Should I consider adding a KV cache to reduce the time spent on inference?
Glad to know the fine-tuning went well! Yes, UniTable was implemented with the vanilla transformer architecture. A KV cache like the one in the llama3 architecture will speed up inference considerably. Interested in opening a PR?
I will try to add this part, and when all goes well I will submit the PR~
Thanks! I would recommend starting by implementing the KV-cache logic in the pipeline notebook and comparing the speed.
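Not the repo's actual implementation, but a minimal sketch of the kind of KV-cache logic being discussed, assuming a standard PyTorch multi-head self-attention decoder; the `KVCache` class, its buffer sizes, and the helper names are illustrative only:

```python
import torch
import torch.nn.functional as F

class KVCache:
    """Pre-allocated key/value buffers for one decoder layer (illustrative only)."""

    def __init__(self, batch_size, num_heads, max_seq_len, head_dim, device="cpu"):
        shape = (batch_size, num_heads, max_seq_len, head_dim)
        self.k = torch.zeros(shape, device=device)
        self.v = torch.zeros(shape, device=device)
        self.length = 0  # number of tokens cached so far

    def update(self, new_k, new_v):
        # new_k / new_v: (batch, heads, new_tokens, head_dim) for the current decode step
        step = new_k.size(2)
        self.k[:, :, self.length:self.length + step] = new_k
        self.v[:, :, self.length:self.length + step] = new_v
        self.length += step
        return self.k[:, :, :self.length], self.v[:, :, :self.length]

def cached_self_attention(q, new_k, new_v, cache):
    """Attend the current token's query over all cached keys/values."""
    k, v = cache.update(new_k, new_v)
    # No causal mask needed at decode time: the cache holds only past tokens plus this step.
    return F.scaled_dot_product_attention(q, k, v)
```

The point is that each autoregressive step only projects the newly generated token and reuses the cached keys/values, instead of rerunning the whole prefix through the decoder.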
How did you annotate your own dataset?
I'm also interested in training on my own dataset but have no idea where to start with annotating it. Any advice? I originally tried using the full_pipeline notebook, but it did not produce an accurate table from the image.
I also want to train with a custom dataset.
@whalefa1I May I ask how much data you used for training in your scenario?
30k maybe? Only the bbox model~
Maybe as long as you find the corresponding option in the CONFIG.mk file and configure it when running the Makefile with the exp name [EXP_$*], it should work, right? Do you want to convert it into a regular training script instead of using Hydra for configuration?
Our data annotation format differs from the open-source TSR task annotation method, but both are composed of two coordinate points.
In my case the tables have around 1000 cells, so I'm not sure whether fine-tuning with only an increased maxlen will work well.
It seems that because the decoder has only 4 layers, or perhaps because of an error in my implementation, the acceleration effect is not significant, achieving only a 7% speedup (varying with the number of bboxes). Due to the differences between the custom implementation of attention and the native torch attention (the MAE between the two attention outputs is below 1e-8 in the first layer, but increases to 0.9 after subsequent cross-attention), it may be necessary to retrain the model. Additionally, I have replaced components using the llama decoder. If you are interested, I can send it to you.
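For reference, a small self-contained way to measure this kind of discrepancy between a hand-written attention and torch's fused attention; the shapes here are arbitrary and not tied to UniTable's configuration:

```python
import torch
import torch.nn.functional as F

def manual_attention(q, k, v):
    # Plain softmax attention, written out by hand for comparison.
    scale = q.size(-1) ** -0.5
    attn = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
    return attn @ v

# Random activations standing in for one decoder layer's inputs.
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 8, 16, 64)
v = torch.randn(1, 8, 16, 64)

ref = F.scaled_dot_product_attention(q, k, v)
mae = (manual_attention(q, k, v) - ref).abs().mean()
print(f"MAE between implementations: {mae.item():.2e}")
```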
Thank you for sharing.
This is an interesting issue. I am currently using the llama decoder to reproduce the model, and its special positional encoding might have some capability for length extension. However, for your case, I think it might be difficult. The out-of-distribution (OOD) phenomenon is likely to be significant, and you may need more data to support 4k-token output.
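The "special positional encoding" in llama-style decoders is rotary position embeddings (RoPE). A minimal sketch of applying it to a query or key tensor, written from the general recipe rather than any specific repo:

```python
import torch

def rotary_embedding(x, base=10000.0):
    """Apply rotary position embeddings to x of shape (batch, heads, seq_len, head_dim).

    head_dim must be even; positions are encoded by rotating pairs of channels.
    """
    b, h, seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-torch.arange(0, half, device=x.device) / half)
    angles = torch.arange(seq_len, device=x.device)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
```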
This is related to our annotation format. We generate HTML tags from bbox annotations using a set of heuristic rules, so the entire process only requires a bbox model.
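The actual rules are not shared in this thread, but a toy sketch of the general idea (group cell bboxes into rows by vertical center, sort each row left to right, and emit table tags) might look like the following; the tolerance value and grouping logic are made up for illustration:

```python
def bboxes_to_html(cell_bboxes, row_tol=10):
    """Group (x1, y1, x2, y2) cell bboxes into rows and emit HTML table tags."""
    # Sort cells top-to-bottom by their vertical centers.
    cells = sorted(cell_bboxes, key=lambda b: (b[1] + b[3]) / 2)
    rows = []
    for box in cells:
        center = (box[1] + box[3]) / 2
        # Start a new row when the cell's center is more than row_tol below the current row.
        if rows and center - (rows[-1][0][1] + rows[-1][0][3]) / 2 <= row_tol:
            rows[-1].append(box)
        else:
            rows.append([box])

    tags = ["<table>"]
    for row in rows:
        tags.append("<tr>")
        for _ in sorted(row, key=lambda b: b[0]):  # left-to-right within the row
            tags += ["<td>", "</td>"]
        tags.append("</tr>")
    tags.append("</table>")
    return tags
```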
Could you please share the process or code for the heuristic rules that generate HTML from the labelme JSON format? It would be really helpful for me.
Thank you for your reply. I would also like to ask you a question: in your scenario, what are the advantages of using UniTable, which obtains bbox coordinates autoregressively, compared to using object detection models (such as YOLO)? BTW, I added a decoder with a KV cache in PR #11, which achieves about a 30% improvement in inference speed with batch_size=1.
I checked the results on the images in the dataset/mini_pubtabnet/val directory through full_pipeline.ipynb, and based on the visualization results, the output is the same as the original model's.
Hey @whalefa1I, I'm wondering if you can assist. I have a dataset that comprises PDFs with matching XML in SVG tag format derived from D3.js. I have bboxes and tokens for all the text, but since the images have to be resized, how do I ensure that the existing annotations still correspond to the downsampled images when fine-tuning? Is the SVG tag structure useful? Would I need to add the SVG tags to the existing HTML vocab file? Also, some tables overflow onto different pages. When converting with pdf2image, how can I keep the box locations for each image consistent with the source PDF?
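On the resizing part specifically, the usual approach (generic, not specific to UniTable) is to scale the bbox coordinates by the same factors used to resize the image, along these lines:

```python
from PIL import Image

def resize_with_bboxes(image_path, bboxes, target_size=(448, 448)):
    """Resize an image and rescale its (x1, y1, x2, y2) bbox annotations to match."""
    img = Image.open(image_path)
    w, h = img.size
    tw, th = target_size
    sx, sy = tw / w, th / h  # per-axis scale factors
    resized = img.resize(target_size)
    scaled = [(x1 * sx, y1 * sy, x2 * sx, y2 * sy) for x1, y1, x2, y2 in bboxes]
    return resized, scaled
```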
Hey @whalefa1I, sure. I've shared a sample PDF with a matching XML doc (SVG tags).
Thanks, will try this out.
Hi, have you trained the bbox model with your own dataset? Can you share the specific steps?
I want to fine-tune the UniTable model on my custom dataset. How should I do the annotation process? Is there any tool available for your annotation method?
@matthewdhull @polochau @haekyu @helblazer811 @ShengYun-Peng