in python could we sometimes use default_factory instead of default? #2076

Open
raj-open opened this issue Aug 20, 2024 · 1 comment

@raj-open

When generating definitions from a schema, if defaults are used for dictionaries or arrays, datamodel-code-generator currently creates models using the default keyword for fields. In my opinion, this should really be the default_factory keyword.

NOTE: Fortunately, when using pydantic, the use of default as opposed to default_factory does not result in the typical issue of shared instances. However, I am not entirely sure whether this is intentional or merely a happy coincidence. If we were to switch to factories, the output of datamodel-code-generator would be a tick safer.

Example

Suppose I have a definition

openapi: 3.0.3
info:
  version: x.y.z
  title: My schema
servers:
  - url: "https://company.org"
paths: {}
components:
  schemas:
    Book:
      description: |-
        A class about books
      type: object
      required:
        - title
      properties:
        title:
          type: string
        isbn:
          description: |-
            The book's ISBN
          type: string
          default: "0000"
        authors:
          description: |-
            The book author or authors
          type: array
          items:
            type: string
          default: []
        editors:
          description: |-
            The editors of the book
          type: array
          items:
            type: string
          default: ["nobody"]
      additionalProperties: false

then, when using datamodel-code-generator, I would expect the list defaults to be emitted as factories. Instead, when one calls, e.g.

python3 -m datamodel_code_generator \
    --input-file-type openapi \
    --output-model-type pydantic_v2.BaseModel \
    --encoding "UTF-8" \
    --disable-timestamp \
    --use-schema-description \
    --use-standard-collections \
    --collapse-root-models \
    --use-default-kwarg \
    --field-constraints \
    --capitalise-enum-members \
    --enum-field-as-literal one \
    --set-default-enum-member \
    --use-subclass-enum \
    --allow-population-by-field-name \
    --snake-case-field \
    --strict-nullable \
    --use-double-quotes \
    --target-python-version 3.11 \
    --input myschema.yaml \
    --output src/mymodels.py

one gets

# generated by datamodel-codegen:
#   filename:  myschema.yaml

from __future__ import annotations

from pydantic import BaseModel, ConfigDict, Field


class Book(BaseModel):
    """
    A class about books
    """

    model_config = ConfigDict(
        extra="forbid",
        populate_by_name=True,
    )
    title: str
    isbn: str = Field(default="0000", description="The book's ISBN")
    authors: list[str] = Field(default=[], description="The book author or authors")
    editors: list[str] = Field(default=["nobody"], description="The editors of the book")

The request here is that this be changed to:

    authors: list[str] = Field(default_factory=list, description="The book author or authors")
    editors: list[str] = Field(default_factory=lambda: ["nobody"], description="The editors of the book")
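
For illustration, here is a rough sketch of how a generator could choose between the two keywords. This is not datamodel-code-generator's actual code: `render_default_kwarg` is a hypothetical helper, and it assumes the schema default has already been parsed into plain Python values.

```python
from typing import Any


def render_default_kwarg(value: Any) -> str:
    """Return the Field(...) keyword text for a parsed schema default.

    Hypothetical helper, for illustration only.
    """
    # Empty containers map onto the builtin constructors.
    if isinstance(value, list) and not value:
        return "default_factory=list"
    if isinstance(value, dict) and not value:
        return "default_factory=dict"
    # Non-empty mutable containers need a fresh object on every call.
    if isinstance(value, (list, dict)):
        return f"default_factory=lambda: {value!r}"
    # Immutable values (str, int, float, bool, None) are safe to share.
    return f"default={value!r}"


# The three defaults from the Book schema: isbn, authors, editors.
for schema_default in ("0000", [], ["nobody"]):
    print(render_default_kwarg(schema_default))
```

The schema itself stays language-agnostic (a plain `default: []`), and only the Python backend decides to emit a factory. `repr` yields single-quoted strings, so a formatter or the `--use-double-quotes` option would still have to normalise the quoting.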

What could go wrong

By not using factories, two instances of a class can end up sharing the same attribute. E.g.

class Book:
    """
    A class about books
    """
    title: str
    isbn: str = "0000"
    authors: list[str] = []
    editors: list[str] = ["nobody"]

book1 = Book()
book1.title = "The Hobbit"
book1.isbn = "9780007440849"
book1.authors.append("Tolkien, JRR")

print(book1.authors) # ["Tolkien, JRR"]

book2 = Book()
book2.title = "Harry Potter and the Philosopher's Stone"
book2.isbn = "9781408855652"
book2.authors.append("Rowling, JK")

print(book1.authors) # ["Tolkien, JRR", "Rowling, JK"]
print(book2.authors) # ["Tolkien, JRR", "Rowling, JK"]

Note again that I am not claiming that this happens with the pydantic models. In fact, the above does not happen. But, as stated at the start, it is not clear to me whether this is by design (of pydantic's BaseModels/Fields) or simply a happy coincidence.
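
For comparison, a minimal sketch of the factory pattern using the standard-library dataclasses module rather than pydantic (a stand-in chosen so the example runs without third-party packages). A dataclass rejects `authors: list[str] = []` outright with a ValueError, so factories are the only way to declare list defaults there.

```python
from dataclasses import dataclass, field


@dataclass
class Book:
    """A class about books (stdlib dataclass stand-in for the generated model)."""

    title: str
    isbn: str = "0000"
    # The factory runs once per construction, so every instance owns its list.
    authors: list[str] = field(default_factory=list)
    editors: list[str] = field(default_factory=lambda: ["nobody"])


book1 = Book(title="The Hobbit")
book1.authors.append("Tolkien, JRR")

book2 = Book(title="Harry Potter and the Philosopher's Stone")
book2.authors.append("Rowling, JK")

print(book1.authors)  # ['Tolkien, JRR']
print(book2.authors)  # ['Rowling, JK']
print(book1.editors is book2.editors)  # False
```

With factories, the absence of shared state no longer depends on how the model library treats mutable defaults internally.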

Further context

  • OS: irrelevant (as same behaviour on linux, OSX, windows)
  • Language: python>=3.11
  • Packages:
    • datamodel-code-generator>=0.25.9
    • pydantic>=2.8.2
@raj-open

EDIT: I am aware that you can use e.g. (for python)

    Book:
      ...
      properties:
        ...
        authors:
          description: |-
            The book author or authors
          type: array
          items:
            type: string
          default-factory: 'list'
        editors:
          description: |-
            The editors of the book
          type: array
          items:
            type: string
          default-factory: 'lambda: ["nobody"]'

However, this is not a language-agnostic approach, so it is not, imo, an appropriate solution.
