[META] Synthetic data corpus generator #617

Open
gkamat opened this issue Aug 12, 2024 · 0 comments
Assignees
Labels
enhancement New feature or request

Comments

Collaborator

gkamat commented Aug 12, 2024

Is your feature request related to a problem? Please describe

Although OSB ships with a number of workloads and associated data corpora, they are limited in scope and size. The ability to generate documents synthetically from a user-provided specification would be a useful capability. It would also allow corpora of arbitrary size to be created, and would work well with the data-stream model.
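To make the idea concrete, here is a rough sketch of what specification-driven generation could look like. This is not an existing OSB API: the spec format (a mapping from field name to a callable), the function names `generate_documents` and `to_ndjson`, and the example fields are all assumptions made for illustration. Documents are yielded lazily, so the corpus size is bounded only by `count` and never has to fit in memory.

```python
import json
import random
from typing import Any, Callable, Dict, Iterable, Iterator

# Hypothetical spec format: field name -> callable that takes an RNG
# and returns a value for that field.
FieldSpec = Dict[str, Callable[[random.Random], Any]]


def generate_documents(spec: FieldSpec, count: int, seed: int = 0) -> Iterator[Dict[str, Any]]:
    """Lazily yield `count` documents built from `spec`."""
    rng = random.Random(seed)  # seeded so the same corpus can be regenerated
    for _ in range(count):
        yield {name: make(rng) for name, make in spec.items()}


def to_ndjson(docs: Iterable[Dict[str, Any]]) -> Iterator[str]:
    """Serialize each document as one JSON line."""
    for doc in docs:
        yield json.dumps(doc)


# Illustrative spec only; the field names and value ranges are made up.
example_spec: FieldSpec = {
    "status": lambda rng: rng.choice([200, 404, 500]),
    "latency_ms": lambda rng: round(rng.uniform(1.0, 500.0), 2),
    "user_id": lambda rng: rng.randint(1, 10_000),
}

if __name__ == "__main__":
    for line in to_ndjson(generate_documents(example_spec, count=3, seed=42)):
        print(line)
```

Because the generator is seeded, two runs with the same spec, count, and seed would produce identical corpora, which may matter for repeatable benchmark runs.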

Projects
Status: Next Quarter
Development

No branches or pull requests

1 participant