Long-Context Experiments #29
Labels: engineering (software-engineering problems that don't require ML expertise), ML (requires machine-learning knowledge, which can be built up on the fly)
Currently, our model can train with a 2-million-token context (at 1B parameters) on a v3-8. Our demo, however, uses a context of only 4096 tokens (in fact characters, at the time of writing), a far shorter window. Rather than showcasing such an unimpressive context length, we could scale up and demonstrate few-shot learning from an entire book.
This issue tracks the progress of creating and deploying such a model.
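To make the goal concrete, here is a minimal sketch of how a whole-book prompt could be assembled for such a demo, assuming the character-level tokenization noted above. `MAX_CONTEXT`, `build_book_prompt`, and the reserved answer budget are illustrative assumptions, not part of the existing codebase.

```python
# Hypothetical sketch: fit an entire book plus a task query into the
# model's context window, assuming one character == one token (the
# character-level tokenization mentioned in this issue).

MAX_CONTEXT = 2_000_000  # context size the 1B model trains with on a v3-8
ANSWER_BUDGET = 1024     # assumed headroom reserved for the model's output


def build_book_prompt(book_text: str, question: str) -> str:
    """Truncate the book so book + question + answer fit in the context."""
    budget = MAX_CONTEXT - len(question) - ANSWER_BUDGET
    # With character-level tokens, truncation is a simple slice.
    return book_text[:budget] + "\n\n" + question


# Example: a typical novel is well under 1M characters, so it fits
# comfortably inside a 2M-token context with room for the query.
with open("book.txt") as f:  # placeholder path
    prompt = build_book_prompt(f.read(), "Q: Who is the narrator?\nA:")
```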