
Long-Context Experiments #29

Open · ClashLuke opened this issue May 9, 2022 · 0 comments
Labels: engineering (Software-engineering problems that don't require ML-Expertise), ML (Requires machine-learning knowledge (can be built up on the fly))
Milestone: First Release

Comments

@ClashLuke (Member)

Currently, our model can train with a context of 2 million tokens (at 1B parameters) on a v3-8. However, our demo uses only 4096 tokens (characters, at the time of writing), a significantly shorter context. Instead of showcasing such an unimpressive context, we could scale up and demonstrate that the model can few-shot learn from an entire book.
This issue tracks the progress of creating and deploying such a model.
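As a rough, back-of-the-envelope illustration (the book-size and word-length numbers below are assumptions for the sketch, not figures from this issue), one can compare how much of a typical book the 4096-character demo context covers versus the 2-million-token training context:

```python
# Illustrative comparison of context windows at character level.
# AVG_CHARS_PER_WORD and BOOK_WORDS are assumed ballpark figures.

AVG_CHARS_PER_WORD = 6   # English average, including trailing space (assumption)
BOOK_WORDS = 90_000      # a typical novel (assumption)

def contexts_per_book(context_chars: int, book_words: int = BOOK_WORDS) -> float:
    """Return how many context windows one whole book spans."""
    book_chars = book_words * AVG_CHARS_PER_WORD
    return book_chars / context_chars

demo = contexts_per_book(4096)          # current demo context
target = contexts_per_book(2_000_000)   # training context on a v3-8

print(f"demo window covers roughly 1/{demo:.0f} of a book")
print(f"a 2M window covers the whole book about {1 / target:.1f}x over")
```

Under these assumptions the demo window sees only a sliver of a book (on the order of a page or two), while a 2-million-character context holds the entire book with room to spare, which is the regime where "few-shot learn from an entire book" becomes meaningful.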

@ClashLuke ClashLuke added the engineering Software-engineering problems that don't require ML-Expertise label May 9, 2022
@ClashLuke ClashLuke added this to the First Release milestone May 9, 2022
@ClashLuke ClashLuke added the ML Requires machine-learning knowledge (can be built up on the fly) label May 9, 2022
Projects: None yet
Development: No branches or pull requests
1 participant