Hotspots in function rosMsgToPointMatcherCloud(PointCloud2, bool) #2

YoshuaNava · 2020-07-20T19:29:48Z

Hi,
As part of my efforts to benchmark libpointmatcher, I implemented a ROS node that uses libpointmatcher_ros/point_cloud (the old version from ethzasl_icp_mapping) to serialize and deserialize point cloud data. The node receives a point cloud message, deserializes it, applies a few filters, and publishes the resulting point cloud. I ran it for 100+ seconds.

I found that the most expensive method called in my program (even more expensive than a surface normal data points filter run on every iteration) was rosMsgToPointMatcherCloud(sensor_msgs::PointCloud2, bool) from point_cloud.cpp.

I used Intel VTune community edition for finding hotspots and Intel Advisor for vectorization advice. In the following lines I describe my search for hotspots and a short analysis.

Hotspots

[Profiler screenshots not reproduced here: CPU, Memory access, Memory writing, Vectorization advice]

Analysis

(The code in this repo might not be the same as the one from ethzasl_icp_mapping. I'll try to update my analysis)

I found 3 main CPU-time hotspots:

1. Casting of contiguous values from an array to fill up features/descriptors is done in a quadruple-nested loop (ROS msg fields -> point cloud height -> point cloud width -> data length). This takes ~12% of CPU time. (Line 194)
2. Pre-filling of empty point cloud containers with "padding" values. This takes ~3% of CPU time. (Line 92)

In terms of memory access, number 1 from the above list is also a strong hotspot. When it comes to memory writing, all paged memory is cleared by the function, and the allocations are neither large nor numerous (compared to other methods, e.g. ROS TCP).

Intel Advisor recommends optimizing [the "RGB loop"](https://github.com/norlab-ulaval/libpointmatcher_ros/blob/master/src/PointMatcher_ROS.cpp#L113) first, as well as the quadruple-nested loop described in point 1 of the CPU hotspots and a loop in libnabo.


YoshuaNava · 2020-07-20T20:07:15Z

Proposals

I will try to optimize 3 of the loops in the deserialization function. In order of resource consumption:

Data reading quadruple-nested loop

As the datatype of every field is known in advance (thanks to the iterator it), remove the switch case from the inner loop. The data type of the pointers would be based on the a priori knowledge of it.
Pre-declare the data field length based on the datatype.
Fill the view of the matrices by the stride of the data (row_step).
Move the declaration of fPtr out of the width-loop so that we (hopefully) keep that variable in cache.
Change or enforce using the x iterator. It is currently unused.
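
The first and fourth points above can be sketched as follows. This is only an illustration under stated assumptions, not the actual code from point_cloud.cpp: `FieldType`, `FieldLayout`, `copyFieldAs` and `copyField` are hypothetical names, and the message is reduced to a raw byte buffer plus its `row_step` and `point_step`. The datatype is resolved once per field, and the typed loops then run without a switch in the hot path.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the sensor_msgs::PointField datatype constants.
enum class FieldType : std::uint8_t {
    Int8 = 1, UInt8, Int16, UInt16, Int32, UInt32, Float32, Float64
};

// Hypothetical, simplified description of one field of a point cloud message.
struct FieldLayout {
    std::size_t offset;  // byte offset of the field inside one point
    FieldType type;
};

// Loops for one statically known scalar type: no switch inside the loops.
template <typename T>
void copyFieldAs(const std::uint8_t* data, std::size_t height, std::size_t width,
                 std::size_t rowStep, std::size_t pointStep, std::size_t offset,
                 float* out)
{
    for (std::size_t y = 0; y < height; ++y) {
        // Row pointer computed once per row, outside the width loop.
        const std::uint8_t* rowPtr = data + y * rowStep + offset;
        for (std::size_t x = 0; x < width; ++x) {
            T value;
            std::memcpy(&value, rowPtr + x * pointStep, sizeof(T));  // alignment-safe read
            *out++ = static_cast<float>(value);
        }
    }
}

// The datatype is dispatched once per field, then the typed loop does the work.
void copyField(const std::vector<std::uint8_t>& data, std::size_t height, std::size_t width,
               std::size_t rowStep, std::size_t pointStep, const FieldLayout& field,
               float* out)
{
    const std::uint8_t* d = data.data();
    const std::size_t o = field.offset;
    switch (field.type) {
        case FieldType::Int8:    copyFieldAs<std::int8_t>(d, height, width, rowStep, pointStep, o, out); return;
        case FieldType::UInt8:   copyFieldAs<std::uint8_t>(d, height, width, rowStep, pointStep, o, out); return;
        case FieldType::Int16:   copyFieldAs<std::int16_t>(d, height, width, rowStep, pointStep, o, out); return;
        case FieldType::UInt16:  copyFieldAs<std::uint16_t>(d, height, width, rowStep, pointStep, o, out); return;
        case FieldType::Int32:   copyFieldAs<std::int32_t>(d, height, width, rowStep, pointStep, o, out); return;
        case FieldType::UInt32:  copyFieldAs<std::uint32_t>(d, height, width, rowStep, pointStep, o, out); return;
        case FieldType::Float32: copyFieldAs<float>(d, height, width, rowStep, pointStep, o, out); return;
        case FieldType::Float64: copyFieldAs<double>(d, height, width, rowStep, pointStep, o, out); return;
    }
    throw std::invalid_argument("unknown field datatype");
}
```

Whether this helps in practice would have to be confirmed by profiling again; the compiler may already hoist part of the original switch.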

According to this article, Eigen matrices are stored in column-major order by default. It could be very productive to write the data in column order, since columns are stored (and loaded) contiguously. A loop of this type would also be easier for the compiler to unroll or vectorize.
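
To make the column-order idea concrete, here is a minimal sketch. It assumes a features matrix with one point per column, and it uses a hypothetical `ColMajorMatrix` struct as a stand-in for an Eigen matrix so that it compiles without Eigen. Iterating over points in the outer loop and over rows in the inner loop makes the writes walk through memory sequentially.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a column-major matrix (Eigen's default layout):
// element (row, col) lives at index col * rows + row.
struct ColMajorMatrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> values;

    ColMajorMatrix(std::size_t r, std::size_t c) : rows(r), cols(c), values(r * c, 0.0f) {}

    float& at(std::size_t row, std::size_t col) { return values[col * rows + row]; }
};

// Copies interleaved input (dim values per point) into the first dim rows of
// the matrix. Outer loop over columns (points), inner loop over rows, so
// consecutive writes touch consecutive addresses within each column.
void fillColumnOrder(const std::vector<float>& interleaved, std::size_t dim,
                     ColMajorMatrix& features)
{
    float* dst = features.values.data();
    const float* src = interleaved.data();
    for (std::size_t col = 0; col < features.cols; ++col) {
        for (std::size_t row = 0; row < dim; ++row) {
            dst[col * features.rows + row] = src[col * dim + row];
        }
    }
}
```

The opposite order (one field at a time over all points) writes with a stride of `rows` elements, which is the access pattern the current field-by-field loop produces.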

Padding after output container creation

I'm unsure about the current use of padding, so I'm going through the code atm.

RGB filling loop

Currently the x iterator is unused. I would propose to either change it (iterate at point_step) or start using it actively.
Set the value of the vector in column major order.
Remove the conditional from the loop, it could be creating a branch that prevents vectorization.
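
As an illustration of the last two points, here is a sketch of a branch-free colour unpacking loop. It is not the loop from PointMatcher_ROS.cpp: `unpackRgb` and its `normalize` flag are hypothetical, and the packing of one colour as 0x00RRGGBB in a 32-bit integer is an assumption. The flag is turned into a scale factor before the loop, so the loop body contains no conditional, and the output is written in column-major order (r, g, b of each point are contiguous).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Unpacks colours assumed to be packed as 0x00RRGGBB into a column-major
// 3 x N buffer. The normalize flag is hypothetical and only used for illustration.
void unpackRgb(const std::vector<std::uint32_t>& packed, bool normalize,
               std::vector<float>& out)
{
    out.resize(packed.size() * 3);
    // The conditional is evaluated once, outside the loop.
    const float scale = normalize ? 1.0f / 255.0f : 1.0f;
    for (std::size_t i = 0; i < packed.size(); ++i) {
        const std::uint32_t rgb = packed[i];
        out[3 * i + 0] = static_cast<float>((rgb >> 16) & 0xFFu) * scale;  // red
        out[3 * i + 1] = static_cast<float>((rgb >> 8) & 0xFFu) * scale;   // green
        out[3 * i + 2] = static_cast<float>(rgb & 0xFFu) * scale;          // blue
    }
}
```

When the condition depends on each point rather than on the whole cloud, this trick does not apply directly, and the loop would have to be restructured differently.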

YoshuaNava · 2020-07-23T13:18:55Z

I'm going to tackle this in the next 2 weeks. I'll give updates on my progress.

pomerlef · 2020-07-23T15:37:08Z

Great! Something that is awfully missing on the ROS side of libpm is unit tests. If you're writing some for this work, let us know so we can integrate them properly.

For padding, libpm uses homogeneous coordinates to apply rigid transformations to the whole point cloud using a matrix multiplication instead of a loop.
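
A minimal sketch of why the padding row matters, assuming 3D points and a 4x4 rigid transformation. The names `Transform`, `toHomogeneous` and `transformCloud` are hypothetical, and plain arrays are used here so the example compiles without Eigen; in libpm the same product would be a single matrix multiplication on Eigen matrices.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical helper type: a 4x4 transformation stored row-major.
using Transform = std::array<float, 16>;

// Builds a column-major 4 x N buffer from interleaved xyz values.
// The buffer is pre-filled with 1, so the fourth (padding) row stays at 1.
std::vector<float> toHomogeneous(const std::vector<float>& xyz)
{
    const std::size_t n = xyz.size() / 3;
    std::vector<float> features(4 * n, 1.0f);
    for (std::size_t col = 0; col < n; ++col) {
        for (std::size_t row = 0; row < 3; ++row) {
            features[4 * col + row] = xyz[3 * col + row];
        }
    }
    return features;
}

// Computes T * features for the whole cloud. The translation part of T only
// takes effect because every point carries a 1 in its fourth coordinate.
std::vector<float> transformCloud(const Transform& T, const std::vector<float>& features)
{
    const std::size_t n = features.size() / 4;
    std::vector<float> result(features.size(), 0.0f);
    for (std::size_t col = 0; col < n; ++col) {
        for (std::size_t row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < 4; ++k) {
                sum += T[4 * row + k] * features[4 * col + k];
            }
            result[4 * col + row] = sum;
        }
    }
    return result;
}
```

For example, a pure translation by (1, 2, 3) moves the point (0, 0, 0) to (1, 2, 3) and leaves the padding coordinate at 1.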
