The Structured Object Definition Language (SODL) is a domain-specific language designed for defining data structures, relationships, and constraints in a clear and organized manner. Files written in SODL use the .sodl extension, reflecting the language's focus on structured object definitions. SODL provides a rich set of constructs that allow developers to model complex data relationships while maintaining type safety and data integrity.
The name "Structured Object Definition Language" reflects the language's core purpose: it provides a structured way to define objects and their relationships in a data model. The "Structured" aspect emphasizes its systematic approach to organizing data definitions, while "Object Definition" highlights its primary focus on defining data objects and their properties.
SODL uses several fundamental building blocks to construct data models:
- Basic Types: The language supports various numeric types (uint8 through uint64, int8 through int64, float32, float64), strings, booleans, and timestamps. These serve as the foundation for more complex data structures.
- Complex Types: SODL supports fixed-size lists ([Type; size]) and TLV (Type-Length-Value) structures (tlv<Type>), allowing for more sophisticated data organization.
- User-Defined Types: Developers can create custom types using several constructs, which we'll explore in detail.
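To make the TLV idea concrete, here is a minimal Python sketch of encoding and decoding Type-Length-Value records. The wire layout (a 1-byte type tag followed by a 2-byte big-endian length) and the names encode_tlv and decode_tlv are assumptions chosen for illustration; the text above does not say how SODL serializes tlv<Type>.

```python
import struct

# Assumed layout (not taken from SODL itself):
# 1-byte tag, 2-byte big-endian length, then the value bytes.
_HEADER = struct.Struct(">BH")


def encode_tlv(tag: int, value: bytes) -> bytes:
    """Pack one record as tag + length + value."""
    if not 0 <= tag <= 0xFF:
        raise ValueError("tag must fit in one byte")
    if len(value) > 0xFFFF:
        raise ValueError("value too long for a 2-byte length")
    return _HEADER.pack(tag, len(value)) + value


def decode_tlv(data: bytes) -> list[tuple[int, bytes]]:
    """Split a byte string into a list of (tag, value) records."""
    records = []
    offset = 0
    while offset < len(data):
        if len(data) - offset < _HEADER.size:
            raise ValueError("truncated TLV header")
        tag, length = _HEADER.unpack_from(data, offset)
        offset += _HEADER.size
        if offset + length > len(data):
            raise ValueError("truncated TLV value")
        records.append((tag, data[offset:offset + length]))
        offset += length
    return records
```

Because each record carries its own length, a reader can skip records whose tag it does not recognize, which is the usual reason to pick TLV for variable-length data.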
SODL features a modular import system that allows code reuse and organization. The syntax supports:
import { UUID, Timestamp } from "common_types"
This system allows for:
- Selective imports with specific type names
- Wildcard imports using *
- Import aliasing for naming flexibility
Enums provide a way to define a set of named constants. Each enum value can optionally have an explicit integer value:
enum UserRole {
    Admin = 1,
    Editor = 2,
    Viewer = 3,
    Guest = 4
}
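For readers who think in terms of a general-purpose language, the enum above behaves much like an integer-backed enumeration. The following Python sketch mirrors UserRole with the standard library's IntEnum; it is an analogy, not generated SODL output.

```python
from enum import IntEnum


class UserRole(IntEnum):
    """Python analogue of the SODL enum, with the same explicit values."""
    Admin = 1
    Editor = 2
    Viewer = 3
    Guest = 4


def parse_role(raw: int) -> UserRole:
    """Convert a stored integer back into a named role, rejecting unknown values."""
    try:
        return UserRole(raw)
    except ValueError:
        raise ValueError(f"unknown UserRole value: {raw}") from None
```

Explicit integer values matter mainly for stored or transmitted data: the names can be reordered or renamed without changing what is written to disk.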
Unions represent a type that could be one of several possibilities. They're particularly useful for modeling variable data types:
union ContactMethod {
    Email,
    Phone,
    Address
}
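A union like this is often consumed by checking which alternative a value holds and branching on it. The Python sketch below shows that pattern; the fields of Email, Phone, and Address here are hypothetical, since the union definition only names the member types.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Email:
    address: str  # hypothetical field


@dataclass(frozen=True)
class Phone:
    number: str  # hypothetical field


@dataclass(frozen=True)
class Address:
    street: str  # hypothetical fields
    city: str


# A ContactMethod value is exactly one of the three alternatives.
ContactMethod = Union[Email, Phone, Address]


def describe(contact: ContactMethod) -> str:
    """Branch on the alternative actually held by the value."""
    if isinstance(contact, Email):
        return f"email:{contact.address}"
    if isinstance(contact, Phone):
        return f"phone:{contact.number}"
    if isinstance(contact, Address):
        return f"address:{contact.street}, {contact.city}"
    raise TypeError(f"not a ContactMethod: {type(contact).__name__}")
```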
Structs are collections of fields, each with its own type. They're used to create compound data types:
struct Address {
    street: string,
    city: string,
    state: string,
    zipCode: string,
    country: string
}
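As a rough analogy, the Address struct corresponds to a record type with one typed attribute per field. The Python sketch below uses a dataclass and adds a runtime check, because Python does not enforce annotations on its own; the check is an illustration of the typing idea, not something the text says SODL generates.

```python
from dataclasses import dataclass, fields


@dataclass
class Address:
    """Python analogue of the SODL struct: five string fields."""
    street: str
    city: str
    state: str
    zipCode: str
    country: str

    def __post_init__(self) -> None:
        # Every field in this struct is declared as a string, so check that at runtime.
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, str):
                raise TypeError(f"{f.name} must be a string, got {type(value).__name__}")
```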
Keys define unique identifiers and indexing structures. They can include multiple fields and metadata:
key UserProfile {
    userId: type = UUID,
    username: type = string,
    email: type = string
}
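One way to picture a multi-field key is as an immutable, hashable value that can index a lookup table. The Python sketch below models the UserProfile key that way; the class name UserProfileKey and the helper build_index are hypothetical, and the text does not describe how SODL implementations actually store keys.

```python
from dataclasses import dataclass
from uuid import UUID


@dataclass(frozen=True)
class UserProfileKey:
    """Immutable composite key: equality and hashing use all three fields."""
    userId: UUID
    username: str
    email: str


def build_index(records: list[dict]) -> dict[UserProfileKey, dict]:
    """Index records by their composite key, rejecting duplicates."""
    index: dict[UserProfileKey, dict] = {}
    for record in records:
        key = UserProfileKey(record["userId"], record["username"], record["email"])
        if key in index:
            raise ValueError(f"duplicate key: {key}")
        index[key] = record
    return index
```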
Objects are the most complex and feature-rich construct in SODL. They represent entities with various properties and constraints:
object UserAccount {
    userId: type = UUID, assigned = counter, required, key;
    username: type = string, required;
    email: type = string, required;
    profile: type = UserProfile;
}
Object fields can have several properties:
- required: Field must have a value
- key: Field is part of the object's key
- assigned: Specifies automatic value assignment
- default: Provides a default value
- optional: Field may be omitted
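The interaction between these properties can be sketched in a few lines of Python. The precedence used here (an explicit value wins, then automatic assignment, then the default, then the required check) and the integer counter are assumptions made for the illustration; the names FieldSpec and materialize are hypothetical.

```python
from dataclasses import dataclass
from itertools import count
from typing import Any, Iterator, Optional

_MISSING = object()  # sentinel meaning "no default declared"


@dataclass
class FieldSpec:
    required: bool = False
    assigned: Optional[str] = None  # e.g. "counter"
    default: Any = _MISSING


def materialize(specs: dict[str, FieldSpec], values: dict[str, Any],
                counter: Iterator[int]) -> dict[str, Any]:
    """Build a record from caller-supplied values according to the field specs."""
    record: dict[str, Any] = {}
    for name, spec in specs.items():
        if name in values:
            record[name] = values[name]
        elif spec.assigned == "counter":
            record[name] = next(counter)  # assumed meaning of `assigned = counter`
        elif spec.default is not _MISSING:
            record[name] = spec.default
        elif spec.required:
            raise ValueError(f"missing required field: {name}")
        # Otherwise the field is optional and is simply left out.
    return record


USER_ACCOUNT = {
    "userId": FieldSpec(required=True, assigned="counter"),
    "username": FieldSpec(required=True),
    "email": FieldSpec(required=True),
    "profile": FieldSpec(),
    "locale": FieldSpec(default="en"),  # hypothetical field, only here to show `default`
}

ids = count(1)
account = materialize(USER_ACCOUNT, {"username": "ada", "email": "ada@example.com"}, ids)
```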
KeyMaps define relationships between different objects using their keys:
keymap UserProfile:UserAccount {
    username -> username,
    email -> email
}, primary, name = "ProfileToAccount";
KeyMaps support properties like:
- primary: Indicates a primary relationship
- name: Provides a descriptive name
- cascadeDelete: Specifies deletion behavior
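Conceptually, a KeyMap says which fields of one key line up with which fields of another. The Python sketch below follows such a mapping from a UserProfile key to matching UserAccount records; the function follow_keymap and the dictionary representation are hypothetical and only illustrate the field-to-field correspondence.

```python
# Field mapping from the ProfileToAccount keymap: source key field -> target field.
PROFILE_TO_ACCOUNT = {"username": "username", "email": "email"}


def follow_keymap(source_key: dict, targets: list[dict],
                  field_map: dict[str, str]) -> list[dict]:
    """Return the target records whose mapped fields equal the source key's values."""
    wanted = {dst: source_key[src] for src, dst in field_map.items()}
    return [
        target for target in targets
        if all(name in target and target[name] == value
               for name, value in wanted.items())
    ]
```

A relationship like this only holds together if both sides agree on every mapped field, which is why the sketch requires all of them to match rather than any one.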
SODL supports advanced type constraints:
- Range constraints: range(min, max)
- Pattern matching: pattern = "regex_pattern"
Fields can be marked as strict with explicit values:
field: string, strict = "specific_value"
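The three constraint kinds can be read as simple predicates over a field value. The Python sketch below spells them out under two assumptions that the text does not confirm: range bounds are inclusive, and a pattern must match the whole string. The function names are hypothetical.

```python
import re


def check_range(value, minimum, maximum) -> bool:
    """range(min, max): assumed to include both bounds."""
    return minimum <= value <= maximum


def check_pattern(value: str, pattern: str) -> bool:
    """pattern = "...": assumed to require a match of the entire string."""
    return re.fullmatch(pattern, value) is not None


def check_strict(value, expected) -> bool:
    """strict = "...": the field may only hold the one stated value."""
    return value == expected
```

Whether a real SODL implementation evaluates these at compile time, at runtime, or both is not stated here, so treat the sketch as a description of the intended meaning only.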
- Hierarchical Data Modeling: Create clear hierarchies using objects and relationships. The example shows this with UserAccount and UserPreferences.
- Relationship Modeling: Use KeyMaps to establish clear relationships between entities. The example demonstrates this with organization membership.
- Type Safety: Leverage the type system to ensure data integrity. The example uses specific types like UUID and Timestamp.
The provided example could be enhanced with:
- Pattern Constraints: The example doesn't show usage of pattern matching for strings (e.g., email validation).
- Range Constraints: Numeric range constraints aren't demonstrated (e.g., age limits).
- TLV Types: The example doesn't utilize TLV types, which could be useful for variable-length data.
- Nested Complex Types: While it shows arrays of structs, it doesn't demonstrate nested arrays or more complex type combinations.
Here's an example of what these missing features might look like:
object EnhancedUserProfile {
    email: string, pattern = "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$";
    age: uint8, range(13, 120);
    metadata: tlv<string>;
    skills: [string; 10];
    certifications: [[string; 2]; 5]; // Array of certification pairs
}
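The nested list type [[string; 2]; 5] reads inside-out: five entries, each a pair of strings. A small Python sketch of a shape check makes that reading explicit. It assumes a fixed-size list must hold exactly the declared number of elements, and the helper name matches_shape is hypothetical.

```python
def matches_shape(value, sizes: tuple[int, ...]) -> bool:
    """Check a nested list against sizes listed outermost first, with string leaves.

    [[string; 2]; 5] corresponds to sizes (5, 2); [string; 10] to (10,).
    """
    if not sizes:
        return isinstance(value, str)
    if not isinstance(value, list) or len(value) != sizes[0]:
        return False
    return all(matches_shape(item, sizes[1:]) for item in value)
```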
- Overuse of Optional Fields: While optional fields provide flexibility, overusing them can lead to data inconsistency.
- Complex Key Structures: Keep key structures simple and focused on natural identifying properties.
- Missing Relationships: Ensure all necessary relationships are properly modeled using KeyMaps.
- Type System
  - SODL: Provides more precise numeric types (uint8-uint64, int8-int64) and built-in support for timestamps
  - Protobuf: Offers only 32- and 64-bit integer types; timestamps come from the google.protobuf.Timestamp well-known type rather than a scalar type
- Relationship Modeling
  - SODL: First-class support for relationships through KeyMaps
  - Protobuf: Relationships must be modeled through message references
- Validation
  - SODL: Built-in support for range constraints and pattern matching
  - Protobuf: Limited validation capabilities, requires additional tooling
- Schema Evolution
  - SODL: Explicit strict/optional field marking
  - Protobuf: Backward-compatible evolution through stable field numbers
- Structure
  - SODL: More concise syntax, focused on object definitions
  - JSON Schema: More verbose, JSON-based syntax
- Type Safety
  - SODL: Strong static typing with precise numeric types
  - JSON Schema: Dynamic typing with broader type categories
- Validation
  - SODL: Built-in constraints with a focus on data integrity
  - JSON Schema: Extensive validation capabilities but more complex syntax
- Tooling
  - SODL: Purpose-built for data modeling with relationship support
  - JSON Schema: Broader ecosystem with various validation tools
- Focus
  - SODL: Object-oriented data modeling with rich relationships
  - SQL DDL: Relational data modeling with tables and foreign keys
- Constraints
  - SODL: Built-in support for range and pattern constraints
  - SQL DDL: Extensive constraint system with CHECK clauses
- Type System
  - SODL: Consistent cross-platform types
  - SQL DDL: Database-specific type systems
- Relationships
  - SODL: KeyMaps provide flexible relationship modeling
  - SQL DDL: Foreign key constraints with referential integrity
- Purpose
  - SODL: Focus on data modeling and storage
  - GraphQL: Focus on API design and query patterns
- Type System
  - SODL: Rich built-in types with constraints
  - GraphQL: Simpler type system with custom scalar support
- Relationships
  - SODL: Explicit relationship modeling through KeyMaps
  - GraphQL: Implicit relationships through field types
- Validation
  - SODL: Built-in validation constraints
  - GraphQL: Value constraints beyond type checking require custom scalars or directives
- Precision in Data Modeling
  - Fine-grained type system
  - Built-in constraint support
  - Clear relationship modeling
- Data Integrity
  - Strong validation capabilities
  - Explicit strict/optional field marking
  - Type safety across implementations
- Maintainability
  - Clear, readable syntax
  - Modular import system
  - Structured relationship definitions
- Flexibility
  - Support for various data modeling patterns
  - Rich type composition
  - Extensible through imports
While SODL and Parquet might seem very different at first glance, they both deal with data structure definition and organization, albeit for different purposes and at different stages of the data lifecycle. Let's explore their similarities and differences:
- Primary Purpose
SODL serves as a language for defining data structures and relationships at the application and service level, focusing on how data should be organized and validated during its active use. Parquet, on the other hand, is a columnar storage format designed for efficient data storage and retrieval, particularly for analytical workloads.
- Schema Definition Approach
SODL provides a rich, human-readable syntax for defining complex data structures with relationships and constraints. Here's how a simple user record might look in SODL:
object User {
    userId: type = UUID, assigned = counter, required, key;
    age: type = uint8, range(0, 120);
    email: type = string, pattern = "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$";
}
Parquet schemas are typically written as message type definitions in a syntax that resembles Protocol Buffers:
message User {
    required binary user_id (UTF8);
    required int32 age;
    required binary email (UTF8);
}
- Data Organization
SODL focuses on logical organization:
- Hierarchical structure through nested objects
- Explicit relationship definitions via KeyMaps
- Support for complex types and constraints
- Focus on data integrity and validation
Parquet focuses on physical organization:
- Columnar storage for better compression
- Column chunks and row groups
- Statistics and encoding at the column level
- Optimization for read performance
- Type System
SODL provides:
- Fine-grained numeric types (uint8 through uint64)
- Built-in support for complex types like UUID and Timestamp
- User-defined types and unions
- Constraint definitions within the type system
Parquet supports:
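- Primitive physical types (BOOLEAN, INT32, INT64, INT96, FLOAT, DOUBLE, BYTE_ARRAY, FIXED_LEN_BYTE_ARRAY)
- Logical types layered on the physical types (for example, strings and timestamps)
- Required, optional, and repeated fields
- Limited validation capabilities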
- Use Cases
SODL is ideal for:
- Application data model definition
- Service interface contracts
- Data validation rules
- Relationship modeling
- Business logic constraints
Parquet excels at:
- Big data storage
- Analytical query optimization
- Data warehousing
- Column-oriented processing
- Compression efficiency
- Schema Evolution
SODL handles evolution through:
- Explicit optional/required field marking
- Strict mode for field values
- Import system for type reuse
- Clear relationship versioning through KeyMaps
Parquet manages evolution through:
- Adding/removing optional fields
- Column addition/removal
- Schema merging capabilities
- Backward compatibility support
- Complementary Usage
SODL and Parquet often complement each other in a data pipeline:
[Application Layer]
SODL Definitions
- Define data structure
- Validate input
- Manage relationships
↓
[Processing Layer]
Data Transformation
- Convert to columnar format
- Optimize for analytics
↓
[Storage Layer]
Parquet Storage
- Efficient storage
- Fast analytical queries
- Performance Characteristics
SODL focuses on:
- Compile-time type safety
- Runtime validation efficiency
- Relationship integrity
- Memory-efficient representations
Parquet optimizes for:
- Disk space efficiency
- Read performance for analytical queries
- CPU efficiency in data processing
- Memory efficiency in column handling
- Integration Patterns
A typical integration might look like this:
// SODL definition for active data
object AnalyticsEvent {
    eventId: type = UUID, assigned = counter, required, key;
    timestamp: type = Timestamp, required;
    userId: type = UUID, required;
    eventType: type = string, required;
    properties: type = tlv<string>;
}
// This data might later be stored in Parquet format:
// message AnalyticsEvent {
//     required binary event_id (UTF8);
//     required int64 timestamp (TIMESTAMP_MILLIS);
//     required binary user_id (UTF8);
//     required binary event_type (UTF8);
//     optional group properties (MAP) {
//         repeated group key_value {
//             required binary key (UTF8);
//             required binary value (UTF8);
//         }
//     }
// }
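The "convert to columnar format" step can be sketched without any Parquet library. The Python below pivots row-oriented event records into one list per field and computes the kind of per-column statistics (minimum, maximum, null count) that Parquet keeps for column chunks. It writes no Parquet file; the function names and the sample record layout are hypothetical, and a real pipeline would hand the columns to a Parquet writer.

```python
def to_columns(rows: list[dict], field_names: list[str]) -> dict[str, list]:
    """Pivot row-oriented records into one list of values per field."""
    return {name: [row.get(name) for row in rows] for name in field_names}


def column_stats(values: list) -> dict:
    """Summarize one column: min and max over present values, plus a null count."""
    present = [v for v in values if v is not None]
    return {
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "null_count": len(values) - len(present),
    }


# Sample rows shaped like AnalyticsEvent, with timestamps in epoch milliseconds.
events = [
    {"event_id": "e1", "timestamp": 1700000000000, "event_type": "login"},
    {"event_id": "e2", "timestamp": 1700000005000, "event_type": "click"},
    {"event_id": "e3", "timestamp": None, "event_type": "click"},
]
columns = to_columns(events, ["event_id", "timestamp", "event_type"])
timestamp_stats = column_stats(columns["timestamp"])
```

Keeping values of one field together is what lets a columnar reader skip whole chunks: a query filtering on timestamp can consult the stored minimum and maximum before reading any values.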
This comparison highlights how SODL and Parquet serve different but complementary roles in a data architecture. SODL provides the rich, constrainable schema definitions needed for application-level data modeling, while Parquet offers the optimized storage format needed for analytical processing. Understanding these differences helps architects and developers choose the right tool for each part of their data pipeline.
SODL is particularly well-suited for:
- Complex data models with intricate relationships
- Systems requiring strong data validation
- Projects needing clear documentation of data structures
- Applications with strict type safety requirements
- Systems with complex business rules around data integrity
The language's focus on structured definitions makes it especially valuable in enterprise environments where data modeling clarity and maintainability are crucial.
SODL provides a robust framework for data modeling with strong typing and relationship management. Its structured approach to object definition makes it particularly well-suited for projects that require clear, maintainable data models while enforcing data integrity and structure. The .sodl file extension helps identify these definition files within a project's codebase, making it easier to manage and organize data models.