New research #15
We are currently working on improving the quality of the data representation, which could be much more optimized! After that, collecting more data is on our radar. Combining datasets is also interesting (for example, Mind2Web and AITW would be good datasets to add).
Right now we have WebLINX (https://arxiv.org/abs/2402.05930), but more papers will take a while! However, feel free to keep an eye on the release notes and discussions on the weblinx repo as well as here.
We have a few experiments with multimodal and image-to-text models. Pix2Act is interesting since it's very small but performs reasonably well on WebLINX evals.
I'm not sure what DRM models are. Can you expand?
Hey @xhluca! Thanks for your reply! Sorry, that was a typo: I was referring to the Dense Markup Ranking (DMR) models, the ones you mention in the paper here: https://arxiv.org/pdf/2402.05930. If you have a Discord or Telegram group, or some other way to get more involved, I would love to be part of it. I love the topic and I think it has huge potential :)
Yes, we are interested in building better DMR variants! We are still exploring different ways to approach the candidate selection problem. Regarding Discord, I think it's a great idea to create one! I will look into it and discuss it with collaborators!
Hey @xhluca! Any news on this? Are you looking into the multimodal Llama 3.2 for this? If I can help somehow, just let me know!
Hey! We are all actively working on improving WebLINX. Llama 3.2 is definitely on our radar, but we are waiting to streamline our new eval pipeline and augment the training data before proceeding. That said, if you are working on Llama 3.2 and would like to contribute a PR that adds the vision capability, I'd be happy to review the results and merge!
Hey! Is there anything new you are working on? More data? I love this project, and I think MultiOn is actually doing decent work on these kinds of tasks. I really think this is the future of agents.
Do you know of other papers or more up-to-date information on what is currently happening in this area?
Do you know of anything else that takes a computer vision approach or uses a multimodal model?
Any new research on the DMR models?
Sorry for so many questions, but I find this fascinating!