Calling `.read()` on the `AvscReader` class accomplishes the following two things:
- read and serialize the .avsc file into a JSON object and append it to a list stored in the reader
- build a namespace tree in which each node has a name, a dictionary of file objects, and a dictionary of child nodes
  - each node represents a namespace; in the `RecordWithNestedUnion` test example, the namespace tree that is ultimately built consists of a root node, a child of the root node named `records`, and a child of `records` named `nested`
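A minimal sketch of such a node, assuming a dataclass with the three attributes listed above. The class name `Node` and the helper `get_or_create` are hypothetical illustrations, not the project's actual API. Splitting a dotted namespace such as `records.nested` on `.` yields the chain root, then `records`, then `nested`, which is exactly the shape of the test-example tree.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    """One namespace in the tree (hypothetical stand-in for the reader's node type)."""
    name: str
    files: dict[str, Any] = field(default_factory=dict)      # file name -> file object
    children: dict[str, Node] = field(default_factory=dict)  # child name -> Node

    def get_or_create(self, namespace: str) -> Node:
        """Walk the tree along a dotted namespace, creating missing nodes on the way."""
        node = self
        parts = namespace.split(".") if namespace else []    # "" means the root itself
        for part in parts:
            if part not in node.children:
                node.children[part] = Node(name=part)
            node = node.children[part]
        return node


root = Node(name="")
root.get_or_create("records")
root.get_or_create("records.nested")
print(list(root.children))                      # ['records']
print(list(root.children["records"].children))  # ['nested']
```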
Once the namespace tree has been built, it is passed to the `AvroWriter` class. This class traverses the tree and writes the information in each node's file objects to Python files as Python classes.
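The internals of `AvroWriter` are not described here, so the following is only a hedged sketch of the idea: a recursive walk that maps each node to a directory and each file object to a `<ClassName>.py` module. To keep it short, a file object is assumed to be just a mapping from class name to field names, the names `render_class` and `walk` are hypothetical, the traversal order (depth-first) is this sketch's own choice, and the result is returned as a dictionary instead of being written to disk.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Iterator


@dataclass
class Node:
    """Hypothetical node; files maps a class name to its field names."""
    name: str
    files: dict[str, list[str]] = field(default_factory=dict)
    children: dict[str, Node] = field(default_factory=dict)


def render_class(class_name: str, field_names: list[str]) -> str:
    """Render a minimal class whose __init__ takes every field as an optional argument."""
    params = "".join(f", {name}=None" for name in field_names)
    body = [f"        self.{name} = {name}" for name in field_names] or ["        pass"]
    return "\n".join([f"class {class_name}:", f"    def __init__(self{params}):", *body]) + "\n"


def walk(node: Node, parent: PurePosixPath = PurePosixPath()) -> Iterator[tuple[str, str]]:
    """Depth-first traversal yielding (relative path, source) for every file object."""
    here = parent / node.name if node.name else parent  # the root node has an empty name
    for class_name, field_names in node.files.items():
        yield str(here / f"{class_name}.py"), render_class(class_name, field_names)
    for child in node.children.values():
        yield from walk(child, here)


nested = Node("nested", files={
    "NestedUnion": ["categories"],
    "NestedUnion2": ["categories2"],
    "CommonReference": ["group", "isApproved", "index"],
})
records = Node("records", files={"RecordWithNestedUnion": ["nestedUnion", "nestedUnion2"]},
               children={"nested": nested})
root = Node("", children={"records": records})

outputs = dict(walk(root))  # a real writer would write each entry to disk instead
print(outputs["records/nested/NestedUnion.py"])
```

Mapping namespaces to directories means the generated modules mirror the namespace tree one-to-one, so `records.nested` becomes `records/nested/`.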
`AvscReader` is given an .avsc file to parse. Below is an example of the contents of `RecordWithNestedUnion.avsc`:
```json
{"type": "record", "name": "RecordWithNestedUnion", "namespace": "records", "fields": [
  {"name": "nestedUnion", "type": ["null",
    {"type": "record", "name": "NestedUnion", "namespace": "records.nested", "fields": [
      {"name": "categories", "type": ["null",
        {"type": "array", "items": {"type": "record", "name": "CommonReference", "fields": [
          {"name": "group", "type": "int"},
          {"name": "isApproved", "type": ["null", "boolean"],
            "default": null
          },
          {"name": "index", "type": ["null", "int"]}
        ]
        }
        }
      ],
      "default": null
      }
    ]
    }
  ],
  "default": null
  },
  {"name": "nestedUnion2", "type": ["null",
    {"type": "record", "name": "NestedUnion2", "namespace": "records.nested", "fields": [
      {"name": "categories2", "type": ["null",
        {
          "type": "array",
          "items": "CommonReference"
        }
      ],
      "default": null
      }
    ]
    }
  ],
  "default": null
  }
]
}
```
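`CommonReference` declares no `namespace`, yet it ends up in the `records.nested` namespace. The Avro specification's naming rules explain why: a dotted name is already a full name; otherwise an explicit `namespace` attribute is used; otherwise the namespace is inherited from the most tightly enclosing named type. The sketch below applies those rules to inline definitions; `fullname` and `named_types` are hypothetical helpers, not necessarily how `AvscReader` resolves names.

```python
from __future__ import annotations

import json
from typing import Any

NAMED = ("record", "enum", "fixed")


def fullname(schema: dict, enclosing_ns: str) -> tuple[str, str]:
    """Apply Avro's naming rules and return (fullname, namespace)."""
    name = schema["name"]
    if "." in name:                                # a dotted name is already a fullname
        return name, name.rsplit(".", 1)[0]
    ns = schema.get("namespace", enclosing_ns)     # otherwise inherit the enclosing namespace
    return (f"{ns}.{name}" if ns else name), ns


def named_types(schema: Any, enclosing_ns: str = "") -> list[str]:
    """Depth-first list of fullnames for every record/enum/fixed defined inline."""
    found: list[str] = []
    if isinstance(schema, list):                   # union: check each branch
        for branch in schema:
            found += named_types(branch, enclosing_ns)
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind in NAMED:
            full, ns = fullname(schema, enclosing_ns)
            found.append(full)
            for f in schema.get("fields", []):     # only records have fields
                found += named_types(f["type"], ns)
        elif kind == "array":
            found += named_types(schema["items"], enclosing_ns)
        elif kind == "map":
            found += named_types(schema["values"], enclosing_ns)
    return found                                   # plain strings (primitives/references) define nothing


schema = json.loads("""
{"type": "record", "name": "NestedUnion", "namespace": "records.nested", "fields": [
  {"name": "categories", "type": ["null",
    {"type": "array", "items": {"type": "record", "name": "CommonReference",
      "fields": [{"name": "group", "type": "int"}]}}]}]}
""")
print(named_types(schema))  # ['records.nested.NestedUnion', 'records.nested.CommonReference']
```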
- Each element of type `record` or `enum` in the .avsc file has its own .avsc file associated with it. Therefore, each element of type `record` or `enum` will generate a Python file and needs to be added as a file object to one of the nodes in the namespace tree.
- `AvscReader` creates these files in Breadth-First Search (BFS) order
  - in the example above, `AvscReader` creates the file objects for `RecordWithNestedUnion`, `NestedUnion`, `NestedUnion2`, and `CommonReference`, in that order
- As `AvscReader` traverses each element in the file, it adds each record it comes across to a `queue` to be processed later
  - the `queue` is initially populated with whatever was serialized by `AvscReader`. In the case of the above example, the `queue` is initially populated with `RecordWithNestedUnion`
- In general, the overall flow of building the namespace tree is as follows:
  - create an empty `root_node`
  - populate the `queue`
  - until the `queue` is empty:
    - grab the first `item` in the `queue`
    - get the `namespace` of the `item`
    - create the node for `namespace` if it does not exist yet, and set `current_node` to it
    - create a skeleton `file` for the `item`
    - if `item` is a record:
      - traverse its `fields`
        - for each `field` in `fields`:
          - get the `type` of `field`
          - get the `type` of nested elements if needed (for arrays, unions, etc.)
            - create that nested element if necessary
          - create that `field`
          - add the `field` to `file`
          - if the `field`'s type (or one of its nested elements) is a record, add that record to the `queue`
    - if `item` is an enum:
      - add the enum symbols to `file`
    - add `file` to `current_node`
  - set the `file_tree` attribute in `AvscReader` to `root_node`
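The flow above can be sketched as a queue-driven breadth-first traversal. This is a hedged illustration under several assumptions: `Node`, `nested_named_types`, and `build_tree` are hypothetical names; the skeleton `file` is a plain dict; inline enums are queued the same way as records (so that the "if `item` is an enum" branch is reachable); and dotted full names are not handled. Using `collections.deque.popleft()` makes the queue FIFO, which is what produces the BFS order described earlier.

```python
from __future__ import annotations

import json
from collections import deque
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    """Hypothetical namespace node: a name, file objects, and child nodes."""
    name: str
    files: dict[str, dict] = field(default_factory=dict)
    children: dict[str, Node] = field(default_factory=dict)


def nested_named_types(type_: Any, ns: str) -> list[tuple[dict, str]]:
    """Collect inline record/enum definitions inside a field type, with the enclosing namespace."""
    if isinstance(type_, list):                        # union: inspect every branch
        return [hit for branch in type_ for hit in nested_named_types(branch, ns)]
    if isinstance(type_, dict):
        kind = type_.get("type")
        if kind in ("record", "enum"):
            return [(type_, ns)]
        if kind == "array":
            return nested_named_types(type_["items"], ns)
        if kind == "map":
            return nested_named_types(type_["values"], ns)
    return []                                           # primitive or by-name reference


def build_tree(schema: dict) -> Node:
    root_node = Node(name="")
    queue = deque([(schema, "")])                      # (item, enclosing namespace)
    while queue:
        item, enclosing_ns = queue.popleft()            # FIFO order gives breadth-first processing
        namespace = item.get("namespace", enclosing_ns) # inherit the namespace when absent
        current_node = root_node
        parts = namespace.split(".") if namespace else []
        for part in parts:                              # create or reuse one node per segment
            current_node = current_node.children.setdefault(part, Node(name=part))
        file = {"name": item["name"], "kind": item["type"], "fields": [], "symbols": []}
        if item["type"] == "record":
            for fld in item["fields"]:
                file["fields"].append(fld["name"])
                queue.extend(nested_named_types(fld["type"], namespace))
        elif item["type"] == "enum":
            file["symbols"] = list(item["symbols"])
        current_node.files[item["name"]] = file
    return root_node


SCHEMA = json.loads("""
{"type": "record", "name": "RecordWithNestedUnion", "namespace": "records", "fields": [
  {"name": "nestedUnion", "default": null, "type": ["null",
    {"type": "record", "name": "NestedUnion", "namespace": "records.nested", "fields": [
      {"name": "categories", "default": null, "type": ["null",
        {"type": "array", "items": {"type": "record", "name": "CommonReference", "fields": [
          {"name": "group", "type": "int"},
          {"name": "isApproved", "type": ["null", "boolean"], "default": null},
          {"name": "index", "type": ["null", "int"]}]}}]}]}]},
  {"name": "nestedUnion2", "default": null, "type": ["null",
    {"type": "record", "name": "NestedUnion2", "namespace": "records.nested", "fields": [
      {"name": "categories2", "default": null,
       "type": ["null", {"type": "array", "items": "CommonReference"}]}]}]}]}
""")

root = build_tree(SCHEMA)
records = root.children["records"]
nested = records.children["nested"]
print(list(records.files))  # ['RecordWithNestedUnion']
print(list(nested.files))   # ['NestedUnion', 'NestedUnion2', 'CommonReference']
```

A depth-first traversal would instead create `CommonReference` before `NestedUnion2`; the FIFO queue is what yields the order listed above.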
- The resulting namespace tree after reading `RecordWithNestedUnion.avsc` is structured as follows:
  - root_node
    - `name=''`
    - `files={}`
    - `children={Node<'records'>}`
    - records
      - `name='records'`
      - `files={File<'RecordWithNestedUnion'>}`
      - `children={Node<'nested'>}`
      - nested
        - `name='nested'`
        - `files={File<'NestedUnion'>, File<'NestedUnion2'>, File<'CommonReference'>}`
        - `children={}`
- the current node represents the `namespace`, or directory, of the .avsc file
- the `File` class represents the contents and meta information of the .avsc file
- the `item` in the `queue` is the contents of an individual .avsc file
- everything that gets added to the `queue` will have a Python file created for it
- `get_field_type` identifies the type of the element being parsed (str, arrays, unions, etc.)
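The signature and return values of `get_field_type` are not given here, so this is a hypothetical stand-in named `classify_type`, sketched under the assumption that the type is classified purely by the shape of its JSON value: a bare string is a primitive or a reference to an already-defined named type, a list is a union, and an object carries its kind in its `"type"` key.

```python
from typing import Any

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def classify_type(type_: Any) -> str:
    """Classify an Avro field type by its JSON shape (hypothetical helper)."""
    if isinstance(type_, str):
        # A bare string is either a primitive or a by-name reference to a named type.
        return "primitive" if type_ in PRIMITIVES else "reference"
    if isinstance(type_, list):
        return "union"
    if isinstance(type_, dict):
        kind = type_.get("type")
        if kind in ("record", "enum", "fixed", "array", "map"):
            return kind
        if kind in PRIMITIVES:
            return "primitive"  # e.g. a primitive annotated with a logicalType
    raise ValueError(f"unrecognised Avro type: {type_!r}")


print(classify_type("int"))                                          # primitive
print(classify_type("CommonReference"))                              # reference
print(classify_type(["null", "boolean"]))                            # union
print(classify_type({"type": "array", "items": "CommonReference"}))  # array
```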