How to contribute: feel free to open a PR that adds a question to this FAQ, or one that answers a question the FAQ does not cover yet.
See https://github.com/nvdla/doc/blob/35056106d8a13f7783ccae024710f6eec2c82b44/doc/hwarch.rst and search for the keyword `stripe_`.
See https://github.com/nvdla/doc/blob/35056106d8a13f7783ccae024710f6eec2c82b44/doc/hwarch.rst and search for the keyword `WMB`.
memory xx interface?
convolution interface?
This question can be found in nvdla/hw#14.
asked by apolonic:
It means the weights of the convolution are local and not shared. Could you enlighten me on how it could be supported?
nvdsmith replied:
This isn't one of the hardware layers targeted by the NVDLA core. Locally connected convolution can be mapped to the NVDLA, but given the large number of weights needed, bandwidth would limit MAC utilization. This might be fine in an IoT device depending on the network size. Is this something that's important to your application? Also, with batching I'd expect MAC utilization would be good with current weight bandwidth. Will need to map this out in more detail at some point though if it's important.
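A back-of-envelope comparison makes the bandwidth concern concrete. The sketch below is purely illustrative (the layer dimensions are made up and are not tied to any NVDLA configuration); it only shows how quickly weight volume grows when every output position gets its own filter:

```python
# Back-of-envelope comparison of weight volume: shared-weight convolution
# vs. a locally connected layer of the same shape. All layer dimensions
# below are illustrative assumptions, not NVDLA parameters.

def conv_weight_count(in_c, out_c, k):
    # Shared weights: one k x k x in_c filter per output channel.
    return out_c * in_c * k * k

def locally_connected_weight_count(out_h, out_w, in_c, out_c, k):
    # Locally connected: a distinct filter at every output position.
    return out_h * out_w * out_c * in_c * k * k

out_h = out_w = 56
in_c, out_c, k = 64, 64, 3

shared = conv_weight_count(in_c, out_c, k)
local = locally_connected_weight_count(out_h, out_w, in_c, out_c, k)
print(f"shared conv weights:       {shared:,}")   # 36,864
print(f"locally connected weights: {local:,}")    # 115,605,504
print(f"ratio: {local // shared}x")               # 3136x
```

Every one of those extra weights has to be streamed in from memory, which is why MAC utilization would be limited by weight bandwidth rather than by the MAC array itself.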
This question can be found in nvdla/hw#13.
asked by apolonic:
Could you enlighten me about how dilated convolution is supported? P.S. The CSC register D_DILATION_EXT takes the parameters.
nvdsmith replied:
The convolution sequence the hardware follows is updated based on this parameter. Basically, the sequencer fetches the activations corresponding to the dilation setting.
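A minimal sketch of what "fetching the activations corresponding to the dilation setting" means in terms of input coordinates (this illustrates dilated-convolution indexing in general, not the actual CSC sequencer logic):

```python
# Illustrative only: which input coordinates a dilated 2D convolution
# reads for one output position. Not the real CSC sequencer.

def dilated_taps(out_y, out_x, kernel=3, dilation=1, stride=1):
    """Return the (y, x) input coordinates read for one output pixel."""
    return [(out_y * stride + ky * dilation,
             out_x * stride + kx * dilation)
            for ky in range(kernel)
            for kx in range(kernel)]

# dilation=1 reads a contiguous 3x3 patch; dilation=2 skips every other
# activation, which is the effect the dilation parameter selects.
print(dilated_taps(0, 0, dilation=1))
print(dilated_taps(0, 0, dilation=2))
```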
This question can be found in nvdla/hw#11.
asked by apolonic:
PDP has a dedicated memory interface to fetch the input map from memory and write the output map directly to memory. Does this "memory" mean memory outside of the DLA? If it does, is it the CPU that reshapes the data? Why not fetch the input map from the convolution buffer? What would be the cost of implementing this design?
nvdsmith replied:
The SDP block has a number of operations arranged in a pipeline. So multiple operations can be chained together within the SDP block without affecting the throughput.
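To illustrate the chaining point, here is a toy per-element pipeline in the same spirit; the stage names are generic stand-ins, not the actual SDP sub-units:

```python
# Conceptual sketch: chained per-element post-processing stages. The data
# passes through every stage in one trip, so adding a stage does not add
# another round trip to memory. Stage names are generic, not SDP sub-units.

def bias_add(x, bias):
    return x + bias

def relu(x):
    return max(x, 0)

def scale(x, factor):
    return x * factor

def chained_pipeline(values, bias, factor):
    # Each element flows through all stages back-to-back.
    return [scale(relu(bias_add(v, bias)), factor) for v in values]

print(chained_pipeline([-3, 0, 5], bias=1, factor=2))  # [0, 2, 12]
```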
This question can be found in nvdla/hw#10.
nvdsmith replied:
PDP will get the output from the accumulator directly, without needing to first travel through memory. Its memory interface would be used for other data, like bias. This data could be stored in an on-chip SRAM or in DRAM.
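As a dataflow sketch of why consuming the accumulator output directly saves a round trip through memory (the function names are invented for illustration and do not correspond to NVDLA blocks):

```python
# Dataflow sketch: pooling consumes the convolution output as a stream,
# with no intermediate write/read to memory. Purely illustrative.

def accumulator_stream():
    # Stand-in for convolution results streaming out of the accumulator.
    yield from [1, 4, 2, 8, 5, 7, 3, 6]

def max_pool_stream(stream, window=2):
    # Pool on the fly instead of buffering the whole map in memory.
    buf = []
    for v in stream:
        buf.append(v)
        if len(buf) == window:
            yield max(buf)
            buf = []

print(list(max_pool_stream(accumulator_stream())))  # [4, 8, 7, 6]
```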
Not really. Is it because quantization was applied?