Replies: 18 comments 27 replies
-
Find my implementation here. I implemented LICM.

Implementation: Pretty much just the lecture notes, with some extra hints from the 4120 lecture notes, which actually present the analysis in an inverted way (start with everything loop-invariant, and mark things as non-invariant). I had to figure out how to merge loops with the same header; I tried to do this conservatively, but ended up just merging every loop with the same header into one. There is probably a better way to do this, but this worked.

Results: Running on the benchmarks, the average speedup was 3.9%, which seems nice. Most of this comes from the programs generated by the TypeScript compiler, as they redeclare a lot of constants. The better handwritten ones would often see a slowdown of an instruction or two, since the inserted preheader caused some overhead. I guess I could have taken it out, but it caused enough headaches that I just left it.

Testing: brench + benchmarks for final testing; again, most actual debugging happened in the REPL.

Challenges: Not to sound like a broken record, but I still have a poor CFG design. If I were to do this class over, I would build a better abstraction that more easily facilitates adding/removing nodes and edges, as I have a lot of hidden invariants right now.
HOWEVER, this checks with pointer equality; if I want structural equality, what I have to write is hard to remember late at night, and this is not limited to this one case.

Another challenge was loops with the same header, and how to insert preheaders. The 4120 notes say to merge them if they are disjoint, but I thought it would be better not to when they are nested. This broke, so I ended up merging them anyway, and still got a 4% speedup. I also had some issues inserting the preheader: I naively assumed it was the unique predecessor of the header, but it is really the unique predecessor of the header from outside the loop, which I learned the hard way (this was also hella broken because of my bad, inconsistent CFG).
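The merge-loops-with-the-same-header step could be sketched like this (a minimal Python sketch with invented data shapes, not the post's actual code):

```python
def merge_loops_by_header(loops):
    """Conservatively union all natural loops that share a header.
    `loops` is a list of (header_label, set_of_block_labels) pairs."""
    merged = {}
    for header, blocks in loops:
        merged.setdefault(header, set()).update(blocks)
    return merged

# two loops sharing header "L1" become one loop
loops = [("L1", {"L1", "a"}), ("L1", {"L1", "b"}), ("L2", {"L2", "c"})]
print(merge_loops_by_header(loops))
```

With every same-header loop collapsed into one, each header needs exactly one preheader, which sidesteps the label-rewriting ambiguity at the cost of some precision.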
-
Summary: Code is here, here, and benchmarks, where the benchmarks link contains the tests/benchmarks, the full CSV with data, and a plotted bar chart. For this lesson, I implemented loop-invariant code motion. I tested the results against a small test suite I wrote, and then also used the benchmarks for the Bril core language as tests; I also used the same benchmarks to measure the impact of the optimization. The optimization itself can be run via a command in the repository. I had some challenges along the way, such as mistakenly changing backedge jumps to target the loop preheader rather than the header. This caused the optimization to perform poorly, and once I realized the mistake, fixing it also showed a performance improvement on the benchmarks. Writing the tests revealed another challenge: I had problems trying to move instructions out of the loop to the preheader. I found that I needed to move instructions in a specific order, where the operands of an instruction had to already be moved before the instruction itself could be moved. Another issue is dealing with nested loops. At the moment, I do not distinguish between inner loops and outer loops. Optimizing the inner loop first might allow instructions moved to the inner loop's preheader to be moved once again to the preheader of the outer loop, allowing further optimization. As an aside, to learn more about loop optimization, which I found interesting, I also chose to implement induction variable elimination, which I tested on some trivial examples to see whether it could detect simple instances of induction variables. Because I did not implement the optimizations to target the memory extension of the Bril language, the effect of induction variable elimination is much more limited.
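The operands-first ordering described above can be sketched as a small worklist pass (a hypothetical Python sketch; the instruction shape and names are my assumptions, not the post's code):

```python
def hoist_in_order(invariant_insts):
    """invariant_insts: dest -> {"args": [...]} for each loop-invariant
    instruction. Returns dests in a valid hoisting order: every
    instruction appears after the instructions defining its operands."""
    hoisted, order = set(), []
    changed = True
    while changed:
        changed = False
        for dest, inst in invariant_insts.items():
            if dest in hoisted:
                continue
            # ready when every operand is defined outside the loop
            # (absent from invariant_insts) or already hoisted
            if all(a not in invariant_insts or a in hoisted
                   for a in inst.get("args", [])):
                order.append(dest)
                hoisted.add(dest)
                changed = True
    return order

insts = {"c": {"args": ["b", "a"]},   # depends on both of the others
         "b": {"args": ["a"]},
         "a": {"args": []}}
print(hoist_in_order(insts))  # ['a', 'b', 'c']
```

This is just a topological sort of the invariant instructions by their def-use dependencies; anything left unordered at convergence would indicate a cycle and should not be hoisted.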
For this optimization I also faced challenges, many of them pertaining to what kinds of basic induction variables I could detect and how I could move their definitions; this forced a more basic optimization. Because it was not the focus of this week's assignment, I did only basic tests on it and did not measure its impact.

Optimization Results: I found that loop-invariant code motion achieved an average 3.5% improvement in the number of dynamic Bril instructions executed by the interpreter (with a standard deviation of 5.6%), compared to running the benchmarks without it. I used the same arguments the benchmark writer specified. The results come from evaluating the optimization on the Bril benchmarks that only use the Bril core language (excluding the floating-point and memory extensions). Because I measure the number of dynamic instructions using the interpreter, I do not have to measure many runs (unlike measuring an LLVM optimization, which, when run on a machine, would have variability in timing). A sample of the table with the data is included here, to avoid making the post too large; note that this sample is not representative across all benchmarks, it is just the first 8 in my benchmark ordering. Here, benchmark refers to the benchmark name; run is either baseline, referring to the benchmark without loop-invariant code motion applied, or licm, the benchmark with it applied; and result is the number of dynamic instructions reported when the brili interpreter is used.
A plot and the full CSV are also in the repository. Overall, across all the benchmarks evaluated, loop-invariant code motion never increased the number of dynamic instructions. Indeed, for a majority of the benchmarks (15 of the 23), the number of dynamic instructions executed was exactly the same as the baseline. For 8 of the 23 benchmarks, a decrease in the number of dynamic instructions was achieved. Where a decrease was observed, it was generally a result of pulling out instructions that define constants; these were hoisted to the loop preheader, decreasing the dynamic instruction count. One limitation of these measurements is that dynamic instruction count fails to account for other factors in a program's execution time: for instance, branching can force the instruction pipeline to be flushed, and instructions do not all take the same amount of time to execute. In summary, the loop-invariant code motion optimization improves the performance of Bril core programs by sometimes decreasing the number of dynamic instructions executed. A more interesting analysis would be to see whether further gains can be achieved by pairing loop-invariant code motion with other optimizations that eliminate dead code in loops, or whether there are different loop scenarios that can be optimized even further.
-
OK, I tried implementing LICM in LLVM and got some weird results that I would like to discuss. My initial implementation is here.

Idea: I derived the following algorithm to implement LICM:
Limitation: The biggest limitation of this approach is that it cannot move whole sub-regions of the loop when they are invariant as a whole, e.g. a branch inside the loop that is invariant together with all of its successors.

Issues: After implementing it, I found that my optimization was not moving any code on the examples I tried. The reason is that clang at -O0 keeps every variable on the stack. Now the question is why clang generates literally all explicit assignments in my program as stack loads and stores: for a C program with plain assignments, clang generates loads and stores through allocas.

So the question is how to deal with this. I understand that using the stack is totally fine for assignment operations, but how do I detect invariants in this case?

Not workable workaround: I managed to bypass the …

Upd: I tried replacing …

Upd_1: Thanks @orkosinha for the suggestion; I was able to run the optimization after doing a -mem2reg pass. Here is my list of commands for future reference:
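For future reference, a typical pipeline of this shape looks roughly like the following (the file and pass names `test.c`, `LICMPass.so`, and `my-licm` are placeholders of mine, not the author's exact commands):

```shell
# emit LLVM IR at -O0, but without the `optnone` attribute that
# would block later passes such as mem2reg
clang -O0 -Xclang -disable-O0-optnone -emit-llvm -S test.c -o test.ll

# promote stack slots (allocas) to SSA registers first
opt -passes='mem2reg' -S test.ll -o test.ssa.ll

# then run the custom LICM pass plugin on the SSA-form IR
opt -load-pass-plugin ./LICMPass.so -passes='my-licm' -S test.ssa.ll -o test.opt.ll
```

The key point is the ordering: mem2reg must run before the custom pass so that assignments become register values rather than loads and stores.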
-
My implementation of LICM is here. I learned that LLVM has some really powerful utilities, and one of them, I guess, trivializes some aspects of an LICM implementation.

LLVM Round 2: In my lesson 7 implementation, I ran into the same issue that @barabanshek ran into,
with LLVM IR looking like
going to
Awesome. Then I spent a couple of hours going back and forth trying to figure out how to integrate it into the PassManager. So some success was had...

Some brief implementation: With the all-powerful utility, the code goes from
to
So it's not very intelligent; I'm working on that.

Benchmarking: I ran embench-iot and randomly got this fluke result against this baseline. I do not think I'm running the benchmarks correctly, but I will update once I get better data. I do have some pretty basic tests in my test directory, and the loop invariants are indeed moved out of their original loops. This resulted in an average 0.97% speedup.

Summary: Still a good amount of work to do if I really want to dive deep with LLVM. My implementation of LICM should be more rigorous, and the benchmarking (outside of the fluke result) should be... also more rigorous.
-
I implemented LICM in LLVM. The code can be found here.

Implementation: One unfortunate side effect of using LLVM is that -O1 already performs LICM along with many other passes, and the -O0 option leaves the code in a faux SSA form. I was able to significantly simplify the implementation of LICM by first running the code through the LLVM mem2reg pass. This pass converts the clang-generated code that puts all variables on the stack into a more traditional SSA style with virtual registers and phi nodes. This can be done by simply adding the pass to the pass manager when registering the LICM pass. To allow optimization under -O0, clang must also be provided with the -disable-O0-optnone flag. Once the LLVM code is in "real" SSA form, LICM becomes much simpler. The pass is implemented as an LLVM loop pass and makes use of the utilities for determining whether an instruction has side effects and whether a value is loop invariant. In this setup, the code effectively follows the pseudocode provided in lecture.

Evaluation: I evaluated on a small set of benchmarks that I found on the Benchmarks Game. These benchmarks were easier to work with because they are self-contained, as opposed to a more realistic benchmark suite like Embench. For each benchmark, I profiled the application's runtime.
We can see that LICM provides a small but reasonable speedup in most cases, with an average speedup of 1.4%. It is also likely that other benchmarks would see a larger speedup, especially combined with other optimizations. The one weird case is mandelbrot, which saw an average slowdown of 2%. I don't quite understand why this is happening, other than potential variance in the results due to unpredictable behavior.

Summary: Overall, I found working with LLVM pretty pleasant. Most issues I had were well documented, with simple fixes. The hardest part was figuring out how to get the LLVM code into "real" SSA form with mem2reg.
-
Introduction: I implemented LICM using LLVM.

Setting up: When compiling sample programs to LLVM IR, I realized that clang was generating code that kept every variable on the stack.
I used
Note the -disable-O0-optnone option: `clang` adds an optnone attribute to each function at -O0, which prevents further optimization afterwards, including the mem2reg pass. To prevent that, pass -Xclang -disable-O0-optnone to clang. Now we are ready to write our pass.

Setting up the pass: I created a new pass. The first interesting thing we needed to do was to run mem2reg before our own logic; at that point we can use the usual loop analyses in the pass body.

Testing: To test, I took benchmarks from the embench-iot suite. For each of these benchmarks, I calculated the time it took with no optimization and the time it took with the optimization. The results are summarized in the following table:
You can see that some programs saw a speedup and some a slight slowdown.

Summary: Overall, I enjoyed working in LLVM; the existing architecture made my life much easier, though I do wish the documentation for LLVM were better! A next step for this implementation would be to assert the same functionality between the Base and LICM benchmarks. I was not sure how to do this, so I wrote some sample programs and checked by hand.
-
In this task, I implemented a loop reordering/interchanging pass in LLVM. My code can be found here. I reused some facilities from my loop analysis pass in Lesson 7, and again compiled and ran the program using LLVM 14 with the new pass manager.

Implementation: At first, I thought it would be easy to interchange two loops, just like cutting and pasting the for-loop header in an IDE, but I later found it was very troublesome to do in LLVM IR. Since a loop is organized as several basic blocks, as shown in the following figure (source: the LLVM Tutorial), we need to carefully move those blocks and change their inter-relationships. The inputs of my reordering pass are two loops (outer loop and inner loop). I first extract all the loop preheaders, headers, latches, and exits from the loops, and then rewire them. There is one more tricky thing here: if the loop to be reordered is the top-level loop, its preheader may be the entry block of the function. In this case, we need to split the block and create a new preheader block for that loop. The exit block may also connect with the following blocks, so it needs to be split as well. For the programming interface, I allow users to specify which loops to interchange, and multiple interchanges can be done in one pass.

Evaluation: As a golden test case, I tested the three-nested-loop GEMM on my machine, and considered all 6 possible loop permutations to see the performance of different traversal orders. I set the matrix size to 1024x1024; my CPU is an Intel Xeon Silver 4214 with 16.5MB of L3 cache.

for (int i = 0; i < SIZE; ++i)
  for (int j = 0; j < SIZE; ++j)
    for (int k = 0; k < SIZE; ++k)
      C[i][j] += A[i][k] * B[k][j];

The results are shown below. The speedups are normalized by the time of the original loop order.
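To illustrate that the interchange preserves the result (my own toy example in Python, not part of the pass), the same product can be computed in i-k-j order, whose innermost loop strides along a row of B, the cache-friendly direction for row-major storage:

```python
SIZE = 4
A = [[i + j for j in range(SIZE)] for i in range(SIZE)]
B = [[i * j + 1 for j in range(SIZE)] for i in range(SIZE)]

def gemm_ijk(A, B):
    C = [[0] * SIZE for _ in range(SIZE)]
    for i in range(SIZE):
        for j in range(SIZE):
            for k in range(SIZE):
                C[i][j] += A[i][k] * B[k][j]
    return C

def gemm_ikj(A, B):
    C = [[0] * SIZE for _ in range(SIZE)]
    for i in range(SIZE):
        for k in range(SIZE):
            for j in range(SIZE):  # innermost loop now walks a row of B
                C[i][j] += A[i][k] * B[k][j]
    return C

# the reduction is a sum, so reordering the k and j loops is legal
print(gemm_ijk(A, B) == gemm_ikj(A, B))  # True
```

The legality argument is exactly why an interchange pass is allowed to permute these loops: each C[i][j] is an order-independent sum over k.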
Based on this table, we can obtain the following speedup comparison. The conclusion is the same as the GEMM example (Fig 6.46) in the famous introductory book CSAPP: if the traversal order matches how the data is organized in memory, performance is best.
-
Loop Optimization: I also went the route of implementing LICM in LLVM. It was surprisingly easy given LLVM's utilities.

Benchmark: I wrote a Makefile to compile my programs with and without my pass.
On these three programs I do not see any significant improvement in wall-clock execution times.

Loop Optimization, Optimized Again: I sat down to implement LICM for Bril and got pretty far. There are some corner cases that I currently ignore, and possibly a bug somewhere that makes one particular benchmark (cholesky.bril) incorrect, but I am giving up for now. My code is here. The implementation was fairly close to what the lecture describes. There are some caveats though:
Feelings and Evaluation: SSA is downright pleasant to work with. I did not even need to implement a reaching-definitions pass; I just checked whether a variable was defined outside the body of the loop to see if it was trivially invariant. I ran my pass over all Bril benchmarks except the one mentioned before. I would say, though, that seeing the optimized programs with all the right instructions hoisted to the right place (especially in programs with nested loops) was more rewarding than the 3.617% average speedup.
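The trivially-invariant check under SSA can be sketched like this (a minimal Python sketch with assumed Bril-style instruction dicts, not the author's code):

```python
def trivially_invariant(inst, loop_defs):
    """Under SSA, a pure instruction is loop-invariant if every argument
    is defined outside the loop. `loop_defs` is the set of variables
    defined by instructions inside the loop body."""
    if inst.get("op") in ("phi", "br", "jmp", "print", "ret"):
        return False  # control flow and side effects are never hoisted
    return all(arg not in loop_defs for arg in inst.get("args", []))

loop_defs = {"i", "t"}
print(trivially_invariant({"op": "add", "dest": "x", "args": ["a", "b"]}, loop_defs))  # True
print(trivially_invariant({"op": "add", "dest": "y", "args": ["i", "b"]}, loop_defs))  # False
```

Because SSA gives each variable exactly one definition, "defined outside the loop" already implies "same value on every iteration", which is why no reaching-definitions analysis is needed.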
-
My code is here. Phew, I found this assignment rather challenging. I didn't actually have to write very much code, largely because I was able to use built-in LLVM methods to do the heavy lifting. However, this also meant that I was playing with lots of code that I didn't understand. There was lots of monkey-see-monkey-do using all manner of internet sources, followed by silly experimentation of my own.

Note that I didn't use the … I tinkered with …, and the following resource was a real lifesaver, although it steered me towards …. I then got into trouble with ….

I grabbed the Embench test suite and ran my pass on all of it. Some functions were still segfaulting mysteriously (I suspect I am moving other delicate things, not just branches...), but I just nixed those programs. Some very handy cherrypicking!

Finally, I don't think I deserve any points for evaluation. I have tested my pass by eye, basically just by printing out which instructions I moved up. However, I struggled to get an actual evaluation tool fully linked up and deployed. None of the existing guides for … worked for me.
-
My code is here.

LICM: Loop-Invariant Code Motion. I used LLVM to implement LICM for this task. In the beginning, I tried to follow the pseudocode for identifying loop-invariant instructions by checking reaching definitions, but then I looked through the methods LLVM provides. Therefore, the implementation simply goes over all the instructions of the loop and calls the built-in check on each one.

"Rigorously evaluate its performance impact. If you choose LLVM, select an existing (small!) benchmark suite such as Embench." I'm still figuring out how to use it.

Commands:
Discussions: As LLVM already has a helper function to handle loop-invariant detection, the hardest part of this task for me was the tricky setup around it.
-
Bril - Python: My implementation is summarized in the following files. In fact, I tried to create something that mirrors the corresponding abstractions in LLVM. The main classes are the following:

Loop: Represents a loop in the input code. It consists of the blocks that belong to that specific loop, and contains functionality like the following:
LoopPass: Similar to LLVM's loop pass.

Example: An example of my pass is included in the repository.
Experimental Evaluation: I selected a couple of benchmarks from the Bril benchmark suite.
LLVM: I also did a quick implementation in LLVM as well; however, due to its simplicity, I doubt that it's correct. My code can be found here. I am following the algorithm presented in class, which iterates until convergence. In order to check whether an instruction is invariant, I check whether all of its operands are loop invariant in the following loop:

bool invariant = true;
for (unsigned op = 0; op < inst.getNumOperands(); ++op) {
if (!L->isLoopInvariant(inst.getOperand(op))) {
invariant = false;
break;  // one non-invariant operand is enough to decide
}
}

If so, I mark the instruction as invariant as follows:

if (invariant) {
L->makeLoopInvariant(&inst, changed);
}

I took some ideas from LLVM's own LICM implementation, which looks way more complex. I did a quick performance test using this loop, but I am not seeing any improvements in terms of execution time.
-
My implementation of loop optimization is here: link.

Loop Invariant Code Motion: I chose LICM as the loop optimization pass to implement. I wrote a loop pass that checks whether an instruction is loop-invariant, and then moves the loop-invariant instructions to the preheader.

Testing with a real-life benchmark: I was able to figure out how to set up the Embench benchmark on my machine. I also modified its compilation script and added an option to run my pass during compilation:
Results: Note that the Embench test suite by default uses ….
Discussion: For most test cases, we observe that LICM consistently brings a speedup ranging from 0.01ms to 3.44ms. A few cases show a slight slowdown, which could be caused by changes in instruction alignment. The other thing I learned is that clang doesn't generate SSA-form IR by default; instead, we get pointer loads/stores. I noticed that in the last assignment but just went on with it. This time, I learned that there's an LLVM pass called mem2reg that fixes this.
-
I implemented LICM in Bril. My implementation, evaluation script, and data from the evaluation can be found here.

Running LICM: My implementation of LICM can be run simply by running:
This will produce a JSON representation of the LICM version of the original program.

Challenges and Limitations: I had a lot of challenges trying to figure out and deal with the various specifics of the algorithm: finding the natural loops, finding loop invariants (especially tracking the reaching definitions), and filtering out loop invariants that couldn't be moved out of the loop. But grappling with each of these sub-parts really helped me understand the precise definition of natural loops and what a loop invariant consists of, so I consider this exercise to have furthered my understanding! (I anticipate there are still lots of bugs in the current version, which I hope to come back to and fix after I work on SSA, as noted below.) It's also astonishing how much one has to deal with to perform a seemingly simple optimization such as LICM. An unfortunate limitation, because I am catching up right now, is that I haven't implemented SSA yet! I bet there are a lot more instructions I would be able to optimize, with cleaner approaches, if I were working on the SSA form of Bril. One will notice that my LICM pass didn't have much of an effect on the Bril benchmarks' performance, and I'm guessing the lack of SSA is a major reason for that. I hope to run this LICM pass on SSA form once I get that implemented, and fix bugs as I find them.

Evaluation: I ran both the non-LICM and LICM versions of all Bril benchmarks, and compared the number of dynamic instructions executed. The experiment can be replicated by running the command:
where
-
My terrible LICM implementation is here.

Implementation: I did loop-invariant code motion. However, judging from the benchmarks, there doesn't seem to be much motion going on... Anyway, this assignment was much more challenging than I expected, and making my main data structures immutable definitely made the job 10 times harder. My workflow is roughly:
Testing and performance analysis: I did incremental testing. First of all, if I just call …, everything is fine. Currently, if I perform LICM, a majority of the benchmarks from the bril repo fail. I suspect this is because there are dependencies between some of the loop-invariant instructions in some loops, and I didn't hoist them in the right order. Unfortunately, despite spending many hours on this assignment, the result wasn't as ideal as I hoped.

Replication: To replicate my experiment, simply run ….
-
I implemented LICM! My code is here.

Implementation: My main challenge for this implementation was getting sick. After I recovered reasonably well, I was able to finish the assignment.

Finding loops: Before I got sick, I got most of the way to finding the loops. I split this into finding the back edges, then finding the blocks in the loop for each back edge.

Finding backedges: This was pretty straightforward, since a backedge is just whenever one node A dominated by another node B has that dominating node B as one of its successors. In other words, whenever one node dominates a node which can go back to it, there is a backedge. I just used my old code to get the dominators of each node, and whenever a node's dominator was also one of its successors, we had a backedge!

Finding loop contents: This was trickier because we had a definition but not an algorithm. Then I realized that the fact that we wanted a minimum set meant that I could just start with the two elements definitely in the set (the header and the backedge node), and then add everything necessary for the definition to fit, unless I ran into anything that broke the natural loop. So I did an iterate-until-convergence of adding the predecessors for every node except the header, unless a predecessor wasn't dominated by the header. If a predecessor wasn't dominated by the header, this broke the natural loop, since it meant there were multiple entry points, and the whole set couldn't be a natural loop. If every predecessor was dominated by the header, then it was part of the natural loop. Iterating to find all such predecessors then gives the minimal set such that, for each node, all of its predecessors are in the set or it is the header.

Implementing LICM: After recovering, I actually implemented the code that does loop-invariant code motion.
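The backedge-and-grow procedure from the loop-finding part above might look like this in outline (a Python sketch over made-up CFG maps, not this post's implementation):

```python
def find_backedges(succs, dom):
    """A backedge is an edge n -> h where h dominates n.
    succs: node -> list of successors; dom: node -> set of dominators."""
    return [(n, h) for n, ss in succs.items() for h in ss if h in dom[n]]

def natural_loop(backedge, preds):
    """Smallest set containing the header and the backedge tail that is
    closed under predecessors, stopping at the header."""
    tail, header = backedge
    loop = {header, tail}
    stack = [tail]
    while stack:
        for p in preds.get(stack.pop(), []):
            if p not in loop:
                loop.add(p)
                stack.append(p)
    return loop

# tiny CFG: entry -> h; h -> body | exit; body -> h
succs = {"entry": ["h"], "h": ["body", "exit"], "body": ["h"], "exit": []}
preds = {"h": ["entry", "body"], "body": ["h"], "exit": ["h"]}
dom = {"entry": {"entry"}, "h": {"entry", "h"},
       "body": {"entry", "h", "body"}, "exit": {"entry", "h", "exit"}}
print(find_backedges(succs, dom))                  # [('body', 'h')]
print(sorted(natural_loop(("body", "h"), preds)))  # ['body', 'h']
```

The multiple-entry check described above would be an extra guard inside the growth loop: bail out if a newly added predecessor is not dominated by the header.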
There are some comments on the analysis, and then I talk about the two main parts, where I marked instructions loop invariant and then moved them if they were safe to move.

Code analysis caching: Noticing that I was passing around the whole function contents and then frequently re-analyzing it to find predecessors, successors, dominators, etc., and inspired by LLVM, I started a tiny refactor to add the option of caching and reusing analyses. The basic design is that each function has access to an analysis dict which it can use to either find an analysis that already happened or add to it. By passing around the analysis object between functions that should share a context (i.e. those analyzing the same function), things only recompute an analysis when they need to. This code is mostly here. I didn't rewrite all of my code to use this pattern, but all of this assignment is written that way.

Reachability and other analyses: First, I realized that I needed analyses of variable use and definition, so I implemented those. Then I noticed that I didn't actually know where the loop exits were, so I said a loop node was an exit node if it had any successors outside the loop. Using Python's …, I managed to reuse my reachability analysis from before. My previous reachability analysis from the dataflow assignment was just ….

Preheaders: My basic block list is set up as an OrderedDict, so it always remembers the order it read the basic blocks in, which makes it pretty easy to concatenate all the instructions back together to rebuild the code. I just represented the preheader as a list of instructions to splice in after a label when reconstructing the code.

Marking instructions loop invariant: I did a pretty direct translation of the pseudocode, with maybe-wonky control flow based on Python's ….

Moving instructions when it's safe: This was pretty okay; I don't really have comments on this.
One more thing... One question I didn't really know the answer to was how to handle the fact that the same blocks can be in multiple loops. My implementation had a bug where I would rename all of the labels pointing to a loop header from outside the loop to point to the preheader instead. This makes sense, but if two loops shared a header (like in …), one loop's backedge looked like an outside edge to the other loop and got rewritten. To fix this, whenever loops had the same header and shared any nodes, I merged them into one big loop. This seemed to work, but I wouldn't be that surprised if it decreased the amount of code that could be moved, since it might care too much about what happens in other loops. Another way to deal with this would probably just have been a different rule for which labels get rewritten.

Analysis: I finally used brench! You can run my brench evaluation with the provided config. Besides a few programs that were missing in the base case (…),
Basically it looks like whenever loop-invariant code motion can be used, it helps at least a little. Quadratic somehow gets this huge 16% improvement, probably since 3/21 instructions in its main loop are loop-invariant.
-
My implementation of LICM is here.

Implementation: I'm sorry that this is so late. Hopefully the arsonist will chill out and my room won't flood in the near future 🥴.

Natural loops: To find natural loops, I followed the pseudocode in these notes, so this task was pretty straightforward.

Reaching definitions: I should have known reaching definitions would be useful later in the semester! I did not implement this for the dataflow analysis assignment, so I had to implement it for this one. This was not too difficult, but I didn't realize until later that I should be storing the reaching definitions by line rather than by block, so I had to make some changes after I realized this.

LICM: Most of the LICM implementation wasn't too difficult, but I did have a lot of trouble moving instructions to the preheader. At first I was appending safe-to-move instructions to the end of the preheader, but this didn't work if the last instruction was a branch or return: I was getting errors for undefined values, which makes sense, because the program was branching before executing the loop-invariant instruction. Then I decided to add terminators to each block before inserting instructions as the second-to-last element in the preheader. For some reason this approach didn't work either. I tried to figure out why for an hour or two before giving up (I didn't like this approach anyway, because for programs with no loop-invariant code to move, the number of instructions executed actually increases over the baseline). I ultimately went with the approach of checking whether the last instruction is a terminator: if it is, I insert the loop-invariant instruction right before the terminator, and if not, I append it to the end of the preheader. I don't think that the benchmarks cover all the cases for determining whether an instruction is safe to move; I could have come up with some test cases to check those cases.
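The final insertion rule (before the terminator if there is one, otherwise at the end) can be sketched as follows; this is a hypothetical helper with assumed Bril-style instruction dicts, not the post's code:

```python
TERMINATORS = {"br", "jmp", "ret"}

def insert_into_preheader(preheader_insts, new_inst):
    """Insert `new_inst` just before the preheader's terminator, if the
    block ends in one; otherwise append it to the end."""
    if preheader_insts and preheader_insts[-1].get("op") in TERMINATORS:
        preheader_insts.insert(len(preheader_insts) - 1, new_inst)
    else:
        preheader_insts.append(new_inst)
    return preheader_insts

block = [{"op": "const", "dest": "n", "value": 10},
         {"op": "jmp", "labels": ["loop"]}]
insert_into_preheader(block, {"op": "const", "dest": "c", "value": 42})
print([i["op"] for i in block])  # ['const', 'const', 'jmp']
```

This keeps the hoisted instruction on every path into the loop without adding any extra instructions to programs that have nothing to hoist.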
However, I am already very late on this assignment, and all the benchmarks produce the right output, so I am happy with that.

Evaluation: I used brench to test my implementation. The command is
A lot of benchmarks have 0% improvement due to a lack of loop-invariant code, but I'm really impressed by these results!
-
My implementation of LICM in LLVM is here.

Implementation: I mainly used the analysis tools provided by LLVM.
If all the operands are loop invariant, and the instruction is not in a subloop and does not create any side effects, we should be able to perform LICM on it safely. I have also seen some other definitions of the LICM safety restrictions; however, I was not sure whether they are more restrictive, less restrictive, or equivalent to what I implemented, since the loop-invariance check is done by LLVM and is hidden behind multiple layers of LLVM implementation. There is also one rule which I believe should be added, but I couldn't figure out how: basically, we would like to make sure that the instruction we are moving out of the loop would actually have run within the loop.

Evaluation:
The performance looks horrible; it's even slower after LICM. I think the reason might be that the benefit from LICM cannot justify the extra cost. Or maybe my implementation isn't right.
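For what it's worth, the usual way to enforce the "would it actually run?" rule is to require the hoisted instruction's block to dominate every loop exit. A minimal sketch of that check (my assumption about the missing rule, not the author's code):

```python
def safe_to_hoist(inst_block, loop_exits, dom):
    """Only hoist an instruction whose block dominates every exit of
    the loop: then the instruction executes on every trip through the
    loop, so running it once in the preheader cannot introduce a fault
    or side effect that the original program could have skipped.
    dom[n] is the set of blocks dominating n."""
    return all(inst_block in dom[e] for e in loop_exits)

dom = {"exit": {"entry", "header", "exit"}}
print(safe_to_hoist("header", ["exit"], dom))   # True
print(safe_to_hoist("guarded", ["exit"], dom))  # False
```

An instruction inside a conditional in the loop body fails this test, which is exactly the case where hoisting could execute something the original program never would.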
-
My implementation of LICM can be found here.

Implementation: To hoist loop-invariant expressions, we must first identify the loops and then identify all loop-invariant expressions. To do the latter, I used a kind of dataflow analysis that begins by assuming all expressions are loop invariant, then marks as non-invariant the variables that are defined more than once within the loop, and the expressions that rely on a non-loop-invariant variable or on a variable that is live into the loop.

Evaluation: I used brench to evaluate the performance of this optimization. On average, LICM resulted in a 1.7% speedup on the benchmarks. Here are the results for the benchmarks with the largest percent increase:
A lot of the benchmarks don't see any improvement, presumably because they lack loop-invariant code, but overall I am pleased with these results!

Drawbacks: As discussed above, this implementation forfeits several opportunities for optimization. Additionally, it doesn't handle nested loops with the same header: that case would create two blocks with the same label, which could cause trouble, though I did not run into this specific issue on the benchmarks.
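One simplified reading of the inverted analysis described above, as a Python sketch (the data shapes and the live-in handling are my assumptions, not the author's code):

```python
def invariant_vars(loop_insts, live_in):
    """Start by assuming every variable defined in the loop is invariant,
    then demote: variables defined more than once, variables that are
    live into the loop yet redefined inside it, and anything that
    depends on a demoted variable. Iterate to convergence."""
    defs = {}
    for inst in loop_insts:
        d = inst.get("dest")
        if d is not None:
            defs[d] = defs.get(d, 0) + 1
    invariant = set(defs)
    for v, count in defs.items():
        if count > 1 or v in live_in:
            invariant.discard(v)
    changed = True
    while changed:
        changed = False
        for inst in loop_insts:
            d = inst.get("dest")
            if d in invariant and any(a in defs and a not in invariant
                                      for a in inst.get("args", [])):
                invariant.discard(d)
                changed = True
    return invariant

insts = [{"dest": "c", "op": "const", "value": 1},
         {"dest": "i", "op": "add", "args": ["i", "c"]},   # loop counter
         {"dest": "x", "op": "mul", "args": ["c", "c"]},   # invariant
         {"dest": "y", "op": "add", "args": ["i", "c"]}]   # depends on i
print(sorted(invariant_vars(insts, live_in={"i"})))  # ['c', 'x']
```

Starting from "everything is invariant" and demoting makes the fixpoint monotone: the invariant set only shrinks, so the loop is guaranteed to terminate.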
Beta Was this translation helpful? Give feedback.
-
Which loop optimization did you pick?