v1.3.0
- Optimized concat, batched softmax, pad, sub, broadcast, and other ops
- Refactored and improved operator splitting
- Cleaned up weight loading to support loading weights from flash, tile, or DDR
- Added memory analysis report
- Added more examples
- Updated documentation