-
Firstly, I thought the actual algorithm presented was really clever, especially the method of incrementing a register along a minimal set of edges to produce unique sums. The first thought that came to my mind was that the extra information over edge profiling would be extremely helpful for reordering blocks and setting branch fallthroughs. Edge and block profiling can help, but having concrete knowledge of which paths are hot, so they can be kept contiguous in the generated code, seems much better. One motivation I wasn't totally convinced of, however, was using this for code coverage. While ensuring path coverage in tests would certainly be exhaustive, it also seems like overkill. I think ensuring line and branch coverage would yield a much higher value-to-time ratio than path coverage and would therefore see better adherence from developers in a professional setting.
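To make that concrete for myself, here's a rough sketch of what the inserted instrumentation amounts to on a tiny two-diamond CFG. This is my own reconstruction, not code from the paper, and it doesn't attempt the minimal (spanning-tree/chord) edge placement; it just shows the register-sum idea:

```python
# My own reconstruction of the idea, not code from the paper: a register r
# starts at 0 at ENTRY, selected edges add a constant to it, and at EXIT the
# final value of r is a unique path number used to bump a counter.

NUM_PATHS = 4                  # a CFG of two back-to-back diamonds has 4 paths
count = [0] * NUM_PATHS        # one counter per potential path

def run(first_then: bool, second_then: bool) -> None:
    r = 0                      # ENTRY: reset the path register
    if first_then:
        pass                   # edge to the first "then" block: increment 0
    else:
        r += 2                 # edge to the first "else" block: increment 2
    if second_then:
        pass                   # edge to the second "then" block: increment 0
    else:
        r += 1                 # edge to the second "else" block: increment 1
    count[r] += 1              # EXIT: r is a distinct value in 0..3 per path

for a in (True, False):
    for b in (True, False):
        run(a, b)
print(count)                   # each path taken once -> [1, 1, 1, 1]
```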
-
Another comment mentioned how clever the solution was. I was also kind of surprised that it is possible to produce a distinct integer in 0..n-1 for each path. However, the proof was simple and elegant, which is always a nice result for a surprising outcome. It is also nice to see a paper deviate from the status quo and show that profiling can be done a different way (moving from edges to paths).
-
I felt that the algorithm for deriving an acyclic CFG from a cyclic one on pages 52 and 53 was clever. The key intuition is to remove backedges while maintaining a bijective map between the paths in the original CFG and those in the resulting CFG. I was also surprised that this algorithm works for any CFG, including those with irreducible loops. I wonder how applicable this reduction technique was for analysis tools that came after this paper. One part I found unsatisfying is how they deferred the handling of early termination to another paper, Bal94. It wasn't clear on my first skim of the paper why the event counting algorithm handles early termination -- all I found was:
-
As others have mentioned, the algorithm discussed in the paper is certainly elegant - even after reading into it, it's surprising that such an efficient algorithm exists for CFGs with cycles. I was unaware of the "overlapping paths" inaccuracy of edge profiling and basic block profiling, and of the fact that these inaccuracies were tolerated because path profiling was assumed to be prohibitively expensive. One of the main themes of this paper is overhead vs. accuracy. On one end of the spectrum are basic block profiling and edge profiling, which have lower accuracy but also less overhead. At the other end is the path profiling presented in this paper, which has more overhead but is also more accurate. For the use case of program optimization and performance tuning, I believe the additional overhead of path profiling would be outweighed by the performance gains over relying on heuristics. I'm not so sure about the test coverage application, for reasons mentioned by others in this thread.
-
This paper had two positive points for me:
Following the first point above, I am wondering about self-organizing, dynamic instrumentation and whether it makes sense to think about it. It would have to happen at runtime and through many iterations of executing a piece of code. In a nutshell, some runtime agent would execute the program in a controlled environment, collect data, and suggest instrumentation for the next run. The process would continue until convergence, or until some threshold of complexity or error is reached. The result would be an instrumentation of the program, which would then be ready for the actual path profiling.

A problem I had with the paper was realizing that the algorithm itself provides the uniqueness guarantee for every path starting from every vertex in the CFG. This is what forces the authors to add extra steps for the case in which loops in the CFG are broken by removing their backedges. While I understand how the extra steps help, I do not know why we need to preserve the uniqueness guarantee for every path - even ones not starting at ENTRY or not ending at EXIT.
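To at least pin down the mechanics I'm referring to, here is my own sketch (not the paper's code; the function and example graph are just made up for illustration) of the backedge step as I understand it: each backedge w -> v is replaced by two dummy edges, ENTRY -> v and w -> EXIT, which is presumably why paths that begin at a loop head or end at a loop tail need their own unique numbers:

```python
# My own sketch of the backedge transformation, not the paper's code: each
# backedge w -> v is removed and replaced by two dummy edges ENTRY -> v and
# w -> EXIT, so every execution of the original CFG decomposes into acyclic
# paths of the transformed CFG. Dummy edges may parallel existing edges; in
# the real construction they are kept as separate edges.

def remove_backedges(edges, backedges, entry="ENTRY", exit_="EXIT"):
    dag_edges = [e for e in edges if e not in backedges]
    for (w, v) in backedges:
        dag_edges.append((entry, v))   # dummy edge into the loop head
        dag_edges.append((w, exit_))   # dummy edge out of the loop tail
    return dag_edges

# Example: ENTRY -> A -> B -> C, with backedge C -> B and exit edge C -> EXIT.
edges = [("ENTRY", "A"), ("A", "B"), ("B", "C"), ("C", "B"), ("C", "EXIT")]
print(remove_backedges(edges, backedges=[("C", "B")]))
```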
-
Reading this paper reminded me of the importance of having a variety of profiling tools. For instance, when I look at a flame graph in my IDE I'm willing to accept, say, 100% overhead (i.e., a 2x factor in wall-clock time) in running my code to generate that visualization, because I don't generate it very often. I'll use this rough summary to rewrite a large amount of code (e.g., rethink my choice of data structures for a function that is taking up a lot of the execution time). I'm curious about real-world systems that use profile-directed compilation. I'm also not sure what the standards are for overhead -- it's quite impressive that the overhead numbers they reported are small, but I feel like in most cases it should be possible to use path profiles to measure execution once on one version of the code and then use the results to guide optimizations for the next version.
-
I found this research paper really interesting, and it shifted the way that I think about compilers. This paper made me realize the importance of a profile-driven compiler and how crucial profiling is for optimization. A few sections in particular stood out to me. First, the extensions section was the most exciting part of this work, since it suggests various directions for future applications. Which of these extensions do people find most promising, and do you think any of these ideas have been explored further in later research? Another part that was interesting to me was the mention of path profiling being useful in areas beyond program optimization, like testing. One part of compilers that interests me is debugging, so I wonder if this algorithm could extend to adding debug support in compilers.
-
As a JIT enthusiast, I found this an amazing read! I did have a couple of open-ended questions/thoughts:
-
I was very intrigued while reading this paper; path profiling in and of itself seems like quite a difficult task, and being able to efficiently compute and represent what could be up to an exponential number of potential paths seems near-impossible at first. The algorithm the paper presents is quite clever in that it uses the structure of the path itself to store its number, and as such takes advantage of the many cases where the number of paths in a program is manageable, while the hash table optimization further takes advantage of the cases where the number of theoretical paths is high but the number of actually executed paths is lower.

This had me thinking about a more general idea. Throughout all of my time as an undergrad CS student, worst-case complexity was emphasized when analyzing algorithms, so my first instinct when asked to analyze an algorithm is to evaluate its worst-case complexity. However, as this paper implicitly takes advantage of in its design, the worst-case complexity of an algorithm is not always indicative of its performance in practice (for instance, there are almost certainly very few real-world programs for which this path profiling algorithm needs to truncate paths in order to run successfully). When I first read the paper, my first instinct was to dismiss the algorithm as requiring exponential space "in the worst case," when in reality not only do most cases avoid this issue, but the algorithm uses clever tricks to deal with the large cases rather than being designed solely around the worst case. This paradigm of designing algorithms for the average case and handling the worst case separately seems quite useful for practical algorithms; is it the case that most algorithms for program analysis tasks like these follow this paradigm?
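To illustrate the dense-array-versus-hash-table point to myself, here is a tiny sketch; the cutoff constant is made up and not the threshold the paper's tool actually uses:

```python
# My own sketch of the fallback the paper describes: when the number of
# potential paths is small we can afford a dense counter array, and when it
# is huge we fall back to a hash table whose size tracks only the paths that
# actually execute. ARRAY_CUTOFF is an arbitrary illustrative value.

ARRAY_CUTOFF = 1 << 16

def make_path_counter(num_potential_paths):
    if num_potential_paths <= ARRAY_CUTOFF:
        counts = [0] * num_potential_paths       # O(1) increment, dense storage
        def record(path_id): counts[path_id] += 1
    else:
        counts = {}                              # storage grows only with executed paths
        def record(path_id): counts[path_id] = counts.get(path_id, 0) + 1
    return record, counts

record, counts = make_path_counter(4)
for path_id in (0, 3, 3):
    record(path_id)
print(counts)   # [1, 0, 0, 2]
```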
-
The paper is interesting to put into the context of today, where we're using more complicated and heterogeneous architectures for a lot of HPC problems. I think all the small progressions in modern computing - SoC architectures, multi-core synchronization, parallel programming, etc. - add up and make this type of profiling less useful for the optimizing compilers of today. After all, the SPEC CINT95 programs on which the paper shows the biggest improvements (in terms of % correct) are the ones with explosions in the number of control-flow paths, like the go benchmark, versus something like compression. I just don't think problems like that are very representative of the cutting edge today. Let me know if you disagree; I might have too narrow a perspective on this. Nonetheless, I definitely see the JIT argument for a profiler like this. But for AOT, my first thought is still to just use more structured control flow. That sounds easier than converting the CFG to a DAG, then to a DAG with "chord" edges, and then having different instrumentation for each edge type.
-
The authors propose an elegant algorithm and also do a great job explaining it. When I first came to the algorithm in Figure 5, it seemed hard to understand. But when I looked at Figure 6 right beside it, which demonstrates Figure 5 on an example, I quickly got the point of the algorithm. I could also add up the values on the edges in my head to test it. Figure 6 also makes it clear why the algorithm works: if there are m paths from a node to the end, then the next edge out of the same branch point has to be offset by m to leave room for those paths. It's also an elegant idea to use count[r]++ to count the number of times a path is taken. The paper was published almost 30 years ago; do we still see a lot of elegant algorithms in compiler research today, or are we seeing fewer as problems get more complicated?
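Here is how I convinced myself of that - a small sketch of my reading of Figure 5 (not the paper's actual code), where each edge out of a branch point gets the running total of the earlier successors' path counts:

```python
# My reading of the Figure 5 idea, not the paper's code: working backwards
# from EXIT, num_paths[v] is the number of v -> EXIT paths, and each outgoing
# edge gets the running total of its earlier siblings' num_paths, which is
# exactly the "leave room for m paths" observation above.

def assign_edge_values(succ, entry, exit_node):
    # succ: dict mapping each node to a list of its successors (acyclic CFG)
    num_paths = {}   # num_paths[v] = number of paths from v to EXIT
    val = {}         # val[(v, w)]  = increment placed on edge v -> w

    def visit(v):
        if v in num_paths:
            return num_paths[v]
        if v == exit_node:
            num_paths[v] = 1
            return 1
        total = 0
        for w in succ.get(v, []):
            val[(v, w)] = total   # ids total .. total + num_paths[w] - 1 go through w
            total += visit(w)
        num_paths[v] = total
        return total

    visit(entry)   # recursion stands in for the reverse-topological-order loop
    return num_paths, val

# A double diamond: 4 ENTRY -> EXIT paths, with ids 0..3 falling out of the sums.
succ = {"ENTRY": ["B", "C"], "B": ["D"], "C": ["D"],
        "D": ["E", "F"], "E": ["EXIT"], "F": ["EXIT"]}
num_paths, val = assign_edge_values(succ, "ENTRY", "EXIT")
print(num_paths["ENTRY"])   # 4
print(val)                  # summing val along any ENTRY -> EXIT path gives a unique id
```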
-
As many other comments have mentioned, I thought the algorithm described in the paper was a very neat approach to a problem that sounds pretty complicated. The results presented in the paper are also quite impressive, but I'm curious how optimizations utilizing path profiling perform compared to alternatives. I'm also wondering how these optimizations would be performed alongside some of the analyses we've discussed in class.
-
One of the most thought-provoking aspects of the paper is the trade-off between edge profiling and path profiling in terms of overhead. It mentions that path profiling has negligible overhead, or is even more performant, for procedures with a small number of potential paths, while edge profiling could be more suitable for procedures with a large number of potential paths, since path profiling's overhead grows on larger programs. To me this seems a little counterintuitive -- if this is an analysis that can lead to more powerful program optimizations, it should be most advantageous in large programs, yet it is far slower on them. How might this affect the choice of profiling technique in real-world software development projects, and what factors would you consider when making this decision? Aside from this, I felt that the algorithm for placing instrumentation and assigning unique path values was a little hard to follow, so I would love to go over it in discussion!
-
This was a really interesting topic to read about, as using profiling information to optimize code at compile time is a very novel idea to me. Beyond the neatness and simplicity of the algorithm (which others have commented on), I think some of the applications of this beyond compiler optimizations are really interesting. One that the paper touched on is performance analysis: identifying performance bottlenecks on specific paths of the program. Another that would be really interesting is providing better debugging information, like determining what path the code took before a panic or segmentation fault.
-
I really enjoyed this paper! I guess I am a bit surprised that this did not exist before 1996 - the techniques used to come up with the algorithm and the proof are covered in introductory data structures and algorithms courses. I think it is pretty neat that a reasonably clever undergraduate today could conceivably come up with something like this. I found it particularly interesting how many areas this algorithm could be applied to. Optimizations are obviously a big one, but it also helped enable advances in debugging and testing, as well as code layout (for example, placing frequently executed paths of instructions in contiguous memory locations to better utilize caching). I am particularly interested in seeing what research has been done in this area since this paper was released - I'd imagine that limitations such as loops and instrumentation complexity made it difficult to immediately adopt these results in industry. But we clearly use profilers a lot now, so I wonder what other advancements led to the technology becoming more practical.
-
If I had known about this during my CS4120 project, I would have produced a better solution for the tree register allocation we implemented in our compiler backend. In the tree-formation portion (cutting off some edges of the CFG), there are several ways to construct the tree, including EBBs (extended basic blocks), the dominator tree, and a maximum spanning tree weighted by path profiles. The path-profiling variant reportedly achieves the best performance, but we didn't have much idea how to implement it.
-
This is a discussion thread for the Profiling reading.
Please read the selected paper
Efficient Path Profiling
Thomas Ball and James R. Larus. MICRO 1996.
We will have in-class discussion on Tuesday September 19.
Some high-level questions to consider include but are certainly not limited to -
I (@rcplane) am the discussion leader and will try to answer any of your questions or direct you to useful informative resources.