Symbolic Text Generation #82

Draft
wants to merge 1 commit into
base: main

HallerPatrick
Collaborator

Initial draft/concept for symbolic text generation with Outlines.

  • The draft currently covers only JSON generation (Outlines also supports integers, choices, and regexes).
  • I built a thin wrapper around the PromptNode. The default node is still the haystack one (but it is now imported with from fabricator.nodes import PromptNode).
  • The new node, GuidedPromptNode, loads a local model for text generation (I think Outlines also supports OpenAI, etc.).
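To make the constraint types in the first bullet concrete, here is a minimal sketch of what "integer", "choice", and "regex" outputs look like. It uses only the standard library and checks a finished string after the fact; Outlines itself applies such constraints while the model is decoding, so this is an illustration of the target formats, not of how the library works. All function names are hypothetical.

```python
import re
from typing import Sequence


def is_integer(text: str) -> bool:
    # The whole string must be an optionally signed run of digits.
    return re.fullmatch(r"[+-]?\d+", text) is not None


def is_choice(text: str, choices: Sequence[str]) -> bool:
    # The output must be exactly one of the allowed options.
    return text in choices


def matches_pattern(text: str, pattern: str) -> bool:
    # The whole string must match the given regular expression.
    return re.fullmatch(pattern, text) is not None


print(is_integer("42"))
print(is_choice("positive", ["positive", "negative"]))
print(matches_pattern("2023-11-02", r"\d{4}-\d{2}-\d{2}"))
```

A guided node for these types would presumably take the constraint (a choice list or a pattern) where the example script below takes the pydantic model, but that interface is not part of this draft.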

Here is an example script:

from fabricator import DatasetGenerator
from fabricator.nodes import GuidedPromptNode
from fabricator.prompts import BasePrompt

from pydantic import BaseModel


# Define a pydantic model for the output schema; this can also be just a
# JSON schema string if you don't want to use pydantic
class MovieReview(BaseModel):
    movie_title: str
    movie_review: str

prompt = BasePrompt(
    task_description="Generate a short movie review.",
)

# Guided prompt node for JSON generation, pass in the pydantic model
guided_prompt_node = GuidedPromptNode("EleutherAI/pythia-70m", MovieReview, max_length=100)

generator = DatasetGenerator(guided_prompt_node)
generated_dataset = generator.generate(
    prompt_template=prompt,
    max_prompt_calls=1,
)

print(generated_dataset[0]["text"])
# Out: '{ "movie_title": "Movie_title", "movie_review": "Low bit. MT Number"\n}'

Outlines has more features, such as validating the output format and returning an instance of the corresponding class.
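As a rough illustration of what "validate and return the corresponding class" means for the output above, the sketch below parses the generated string and builds a typed object from it. It is a standard-library stand-in (a dataclass plus json), not the Outlines or pydantic mechanism; MovieReviewRecord and parse_review are hypothetical names.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class MovieReviewRecord:
    # Mirrors the fields of the MovieReview model in the example script.
    movie_title: str
    movie_review: str


def parse_review(raw: str) -> MovieReviewRecord:
    """Parse a JSON string into a MovieReviewRecord, or raise ValueError."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {f.name for f in fields(MovieReviewRecord)}
    if set(data) != expected:
        raise ValueError(f"expected keys {sorted(expected)}, got {sorted(data)}")
    for key, value in data.items():
        if not isinstance(value, str):
            raise ValueError(f"field {key!r} must be a string")
    return MovieReviewRecord(**data)


# The sample output printed by the example script.
raw_output = '{ "movie_title": "Movie_title", "movie_review": "Low bit. MT Number"\n}'
review = parse_review(raw_output)
print(review.movie_title)
```

With guided generation the string is already guaranteed to match the schema, so a step like this would mainly serve to hand back an object instead of raw text.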

The current problem with Outlines is a dependency conflict with haystack, caused by pydantic. I still need to check whether this works with the build pipeline.
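One way to start looking into the conflict is to print which versions are actually installed in the build environment. The sketch below uses importlib.metadata from the standard library. The distribution names passed in are assumptions and may need adjusting, and the PR does not state which versions clash, so the script only reports and does not judge.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names are assumptions; adjust them to the project's requirements.
report = installed_versions(["pydantic", "farm-haystack", "outlines"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```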

HallerPatrick linked an issue on Nov 2, 2023 that may be closed by this pull request
HallerPatrick self-assigned this on Nov 2, 2023
HallerPatrick added the enhancement (New feature or request) label on Nov 2, 2023
Development

Successfully merging this pull request may close these issues.

Guided Generation for Syntax dependent tasks