Considerations include:
- future advances in data technology
- changes to business requirements
- awareness of current state and how to migrate the design to a future state
- data modeling
- tradeoffs
- distributed systems
- schema design
Considerations include:
- future advances in data technology
- changes to business requirements
- awareness of current state and how to migrate the design to a future state
- data modeling
- tradeoffs
- system availability
- distributed systems
- schema design
- common sources of error (e.g., removing selection bias)
Considerations include:
- future advances in data technology
- changes to business requirements
- awareness of current state and how to migrate the design to a future state
- data modeling
- tradeoffs
- system availability
- distributed systems
- schema design
- capacity planning
- different types of architectures: message brokers, message queues, middleware, service-oriented
Considerations include:
- data cleansing
- batch and streaming
- transformation
- acquiring and importing data
- testing and quality control
- connecting to new data sources
Considerations include:
- provisioning resources
- monitoring pipelines
- adjusting pipelines
- testing and quality control
Considerations include:
- data collection and labeling
- data visualization
- dimensionality reduction
- data cleaning/normalization
- defining success metrics
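
As a concrete illustration of the "data cleaning/normalization" item above, here is a minimal sketch of min-max normalization, one common way to rescale a numeric feature into the range [0, 1]. The function name `min_max_normalize` is hypothetical, and the choice to map a constant column to zeros is an assumption made for this example rather than a fixed convention.

```python
def min_max_normalize(values):
    """Rescale a non-empty list of numbers linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column carries no signal, so map it to zeros
        # instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    print(min_max_normalize([2, 4, 6]))
```

Other scalings (for example, standardizing to zero mean and unit variance) may suit a given model better; this sketch only shows the general shape of the step.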
Considerations include:
- feature selection/engineering
- algorithm selection
- debugging a model
Considerations include:
- performance/cost optimization
- online/dynamic learning
Considerations include:
- working with business users
- gathering business requirements
Considerations include:
- resizing and scaling resources
- data cleansing, distributed systems
- high performance algorithms
- common sources of error (e.g., removing selection bias)
Considerations include:
- verification
- building and running test suites
- pipeline monitoring
5.2 - Assessing, troubleshooting, and improving data representations and data processing infrastructure.
- planning (e.g., fault tolerance)
- executing (e.g., rerunning failed jobs, performing retrospective re-analysis)
- stress testing data recovery plans and processes
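
The "rerunning failed jobs" item above can be sketched as a retry wrapper with exponential backoff. This is a rough illustration under stated assumptions, not a prescribed recovery mechanism: `run_with_retries`, `max_attempts`, `base_delay`, and the injectable `sleep` parameter are all hypothetical names chosen for this example.

```python
import time


def run_with_retries(job, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call job() until it succeeds or max_attempts is reached.

    Waits base_delay * 2**(attempt - 1) seconds between attempts and
    re-raises the last exception if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            sleep(base_delay * 2 ** (attempt - 1))


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_job():
        # Hypothetical job that fails twice before succeeding.
        calls["n"] += 1
        if calls["n"] < 3:
            raise RuntimeError("transient failure")
        return "done"

    print(run_with_retries(flaky_job, base_delay=0.01))
```

Passing `sleep` as a parameter keeps the wrapper easy to test without real delays. In practice, a managed orchestrator's built-in retry policy would usually be preferred over hand-rolled logic like this.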
Considerations include:
- automation
- decision support
- data summarization (e.g., translation up the chain, fidelity, trackability, integrity)
Considerations include:
- Identity and Access Management (IAM)
- data security
- penetration testing
- Separation of Duties (SoD)
- security control
Considerations include:
- legislation (e.g., Health Insurance Portability and Accountability Act (HIPAA), Children’s Online Privacy Protection Act (COPPA))
- audits