Compute and process Differentiation Request graph

Plan for dynamic graph:
- The relations between different differentiation requests can be modelled as a graph. For example, if `f_a` calls `f_b`, there will be two differentiation requests, `df_a` and `df_b`, and the edge between them can be understood as `created_because_of`. This also means that the functions the user explicitly asks to differentiate (or the `DiffRequests` created because of them) are the source nodes, i.e. they have no incoming edges. In most cases, this graph aligns with the call graph, but in some cases it depends on the internal implementation. One example is the Hessian computation, which requires creating multiple `fwd_mode` requests followed by a `rev_mode` request.
- We can use this graph to order the computation of differentiation requests. The initial recursive implementation already did this implicitly: whenever we encountered a call expression, we started differentiating the called function, much like a depth-first search.
- This caused problems, because `Clang` reported errors when it encountered a new function scope (of the derivative of the called function) in the middle of the old function scope (of the derivative of the caller function). It treated the nested one like a lambda expression. The relevant issue is #745.
- To fix this, the initial strategy was to eliminate the recursive approach, so a queue-based, breadth-first approach was implemented in PR #848.
- Although this fixed the problem, the graph traversal was still implicit. We needed a way to compute and store the complete graph and possibly optimize it, for example by converting the edges to model the `requires_derivative_of` relation. This would let us perform differentiation in topologically sorted order.
- The breadth-first approach has one caveat: although we no longer differentiate the called function recursively, we still need to declare it so that the call expression can be completed (e.g. `auto t0 = f_pushforward(...)`).
- To reach the final stage, where the complete graph is computed before differentiation starts, we need complete information on how each `DiffRequest` will be formed inside the visitors (including its arguments and `DVI` info). This approach will require running activity analysis in the first pass.
- As an incremental improvement, the first step is to implement infrastructure for modelling the graph explicitly and to use it for a breadth-first traversal (and eventually a topological ordering). This is the initial PR for capturing the differentiation plan as a graph. However, the traversal order is still breadth-first, because the first pass does not yet produce the complete graph, mainly due to missing information about the arguments required for `pushforward` and `pullback` functions. Activity analysis can help capture the complete graph in the first pass, after which the plan can be processed in topologically sorted order and the graph can be pruned for user-defined functions. I started working on this approach, and the initial experimental commit is available for future reference: vaithak@82c0b42.
vgvassilev · Apr 30, 2024 · 3c248d1
1 parent d879f1b

Showing 12 changed files with 410 additions and 67 deletions.
diff --git a/include/clad/Differentiator/DerivativeBuilder.h b/include/clad/Differentiator/DerivativeBuilder.h
@@ -38,9 +38,6 @@ namespace clad {
     class CladPlugin;
     clang::FunctionDecl* ProcessDiffRequest(CladPlugin& P,
                                             DiffRequest& request);
-    // FIXME: This function should be removed and the entire plans array
-    // should be somehow made accessible to all the visitors.
-    void AddRequestToSchedule(CladPlugin& P, const DiffRequest& request);
   } // namespace plugin
 
 } // namespace clad
@@ -87,6 +84,7 @@ namespace clad {
     plugin::CladPlugin& m_CladPlugin;
     clang::ASTContext& m_Context;
     const DerivedFnCollector& m_DFC;
+    clad::DynamicGraph<DiffRequest>& m_DiffRequestGraph;
     std::unique_ptr<utils::StmtClone> m_NodeCloner;
     clang::NamespaceDecl* m_BuiltinDerivativesNSD;
     /// A reference to the model to use for error estimation (if any).
@@ -137,7 +135,8 @@ namespace clad {
 
   public:
     DerivativeBuilder(clang::Sema& S, plugin::CladPlugin& P,
-                      const DerivedFnCollector& DFC);
+                      const DerivedFnCollector& DFC,
+                      clad::DynamicGraph<DiffRequest>& DRG);
     ~DerivativeBuilder();
     /// Reset the model used for error estimation (if any).
     /// \param[in] estModel The error estimation model, can be either
@@ -172,6 +171,16 @@ namespace clad {
     ///
     /// \returns The derived function if found, nullptr otherwise.
     clang::FunctionDecl* FindDerivedFunction(const DiffRequest& request);
+    /// Add edge from current request to the given request in the DiffRequest
+    /// graph.
+    ///
+    /// \param[in] request The request to add the edge to.
+    void AddEdgeToGraph(const DiffRequest& request);
+    /// Add edge between two requests in the DiffRequest graph.
+    ///
+    /// \param[in] from The source request.
+    /// \param[in] to The destination request.
+    void AddEdgeToGraph(const DiffRequest& from, const DiffRequest& to);
   };
 
 } // end namespace clad

diff --git a/include/clad/Differentiator/DiffMode.h b/include/clad/Differentiator/DiffMode.h
@@ -15,6 +15,42 @@ enum class DiffMode {
   reverse_mode_forward_pass,
   error_estimation
 };
+
+/// Convert enum value to string.
+inline const char* DiffModeToString(DiffMode mode) {
+  switch (mode) {
+  case DiffMode::forward:
+    return "forward";
+  case DiffMode::vector_forward_mode:
+    return "vector_forward_mode";
+  case DiffMode::experimental_pushforward:
+    return "pushforward";
+  case DiffMode::experimental_pullback:
+    return "pullback";
+  case DiffMode::experimental_vector_pushforward:
+    return "vector_pushforward";
+  case DiffMode::reverse:
+    return "reverse";
+  case DiffMode::hessian:
+    return "hessian";
+  case DiffMode::jacobian:
+    return "jacobian";
+  case DiffMode::reverse_mode_forward_pass:
+    return "reverse_mode_forward_pass";
+  case DiffMode::error_estimation:
+    return "error_estimation";
+  default:
+    return "unknown";
+  }
+}
+
+/// Returns true if the given mode is a pullback/pushforward mode.
+inline bool IsPullbackOrPushforwardMode(DiffMode mode) {
+  return mode == DiffMode::experimental_pushforward ||
+         mode == DiffMode::experimental_pullback ||
+         mode == DiffMode::experimental_vector_pushforward ||
+         mode == DiffMode::reverse_mode_forward_pass;
+}
 }
 
 #endif
diff --git a/include/clad/Differentiator/DiffPlanner.h b/include/clad/Differentiator/DiffPlanner.h
@@ -1,10 +1,11 @@
 #ifndef CLAD_DIFF_PLANNER_H
 #define CLAD_DIFF_PLANNER_H
 
-#include "clad/Differentiator/DiffMode.h"
-#include "clad/Differentiator/ParseDiffArgsTypes.h"
 #include "clang/AST/RecursiveASTVisitor.h"
 #include "llvm/ADT/SmallSet.h"
+#include "clad/Differentiator/DiffMode.h"
+#include "clad/Differentiator/DynamicGraph.h"
+#include "clad/Differentiator/ParseDiffArgsTypes.h"
 
 namespace clang {
   class ASTContext;
@@ -90,9 +91,33 @@ struct DiffRequest {
   ///   3) If no argument is provided, a default argument is used. The
   ///      function will be differentiated w.r.t. every parameter.
   void UpdateDiffParamsInfo(clang::Sema& semaRef);
+
+  /// Define the == operator for DiffRequest.
+  bool operator==(const DiffRequest& other) const {
+    // either function match or previous declaration match
+    return (Function == other.Function ||
+            Function->getPreviousDecl() == other.Function ||
+            Function == other.Function->getPreviousDecl()) &&
+           BaseFunctionName == other.BaseFunctionName &&
+           CurrentDerivativeOrder == other.CurrentDerivativeOrder &&
+           RequestedDerivativeOrder == other.RequestedDerivativeOrder &&
+           CallContext == other.CallContext && Args == other.Args &&
+           Mode == other.Mode && EnableTBRAnalysis == other.EnableTBRAnalysis &&
+           DVI == other.DVI && use_enzyme == other.use_enzyme &&
+           DeclarationOnly == other.DeclarationOnly;
+  }
+
+  // Conversion to std::string, used for printing the node.
+  operator std::string() const {
+    std::string res = BaseFunctionName + "__order_" +
+                      std::to_string(CurrentDerivativeOrder) + "__mode_" +
+                      DiffModeToString(Mode);
+    if (EnableTBRAnalysis)
+      res += "__TBR";
+    return res;
+  }
 };
 
-  using DiffSchedule = llvm::SmallVector<DiffRequest, 16>;
   using DiffInterval = std::vector<clang::SourceRange>;
 
   struct RequestOptions {
@@ -106,9 +131,9 @@ struct DiffRequest {
     ///
     DiffInterval& m_Interval;
 
-    /// The diff step-by-step plan for differentiation.
+    /// Graph to store the dependencies between different requests.
     ///
-    DiffSchedule& m_DiffPlans;
+    clad::DynamicGraph<DiffRequest>& m_DiffRequestGraph;
 
     /// If set it means that we need to find the called functions and
     /// add them for implicit diff.
@@ -120,12 +145,24 @@ struct DiffRequest {
 
   public:
     DiffCollector(clang::DeclGroupRef DGR, DiffInterval& Interval,
-                  DiffSchedule& plans, clang::Sema& S, RequestOptions& opts);
+                  clad::DynamicGraph<DiffRequest>& requestGraph, clang::Sema& S,
+                  RequestOptions& opts);
     bool VisitCallExpr(clang::CallExpr* E);
 
   private:
     bool isInInterval(clang::SourceLocation Loc) const;
   };
 }
 
+// Define the hash function for DiffRequest.
+template <> struct std::hash<clad::DiffRequest> {
+    std::size_t operator()(const clad::DiffRequest& DR) const {
+      // Hash the function (or its previous declaration, to match operator==);
+      // this separates most requests, and operator== resolves collisions.
+      if (DR.Function->getPreviousDecl())
+        return std::hash<const void*>{}(DR.Function->getPreviousDecl());
+      return std::hash<const void*>{}(DR.Function);
+    }
+};
+
 #endif
diff --git a/include/clad/Differentiator/DynamicGraph.h b/include/clad/Differentiator/DynamicGraph.h
@@ -0,0 +1,159 @@
+#ifndef CLAD_DIFFERENTIATOR_DYNAMICGRAPH_H
+#define CLAD_DIFFERENTIATOR_DYNAMICGRAPH_H
+
+#include <algorithm>
+#include <functional>
+#include <iostream>
+#include <queue>
+#include <set>
+#include <unordered_map>
+#include <unordered_set>
+#include <vector>
+
+namespace clad {
+template <typename T> class DynamicGraph {
+private:
+  // Storing nodes in the graph. The index of the node in the vector is used as
+  // a unique identifier for the node in the adjacency list.
+  std::vector<T> m_nodes;
+
+  // Map each node to a pair: the first element is a boolean indicating whether
+  // the node has been processed, and the second element is the id of the node
+  // (its index in m_nodes).
+  std::unordered_map<T, std::pair<bool, size_t>> m_nodeMap;
+
+  // Store the adjacency list for the graph. The adjacency list is a map from
+  // a node to the set of nodes that it has an edge to. We use integers inside
+  // the set to avoid copying the nodes.
+  std::unordered_map<size_t, std::set<size_t>> m_adjList;
+
+  // Set of source nodes in the graph.
+  std::set<size_t> m_sources;
+
+  // Store the id of the node being processed right now.
+  int m_currentId = -1; // -1 means no node is being processed.
+
+  // Maintain a queue of nodes to be processed next.
+  std::queue<size_t> m_toProcessQueue;
+
+public:
+  DynamicGraph() = default;
+
+  // Add an edge from src to dest
+  void addEdge(const T& src, const T& dest) {
+    std::pair<bool, size_t> srcInfo = addNode(src);
+    std::pair<bool, size_t> destInfo = addNode(dest);
+    size_t srcId = srcInfo.second;
+    size_t destId = destInfo.second;
+    m_adjList[srcId].insert(destId);
+  }
+
+  // Add a node to the graph
+  std::pair<bool, size_t> addNode(const T& node, bool isSource = false) {
+    if (m_nodeMap.find(node) == m_nodeMap.end()) {
+      size_t id = m_nodes.size();
+      m_nodes.push_back(node);
+      m_nodeMap[node] = {false, id}; // node is not processed yet.
+      m_adjList[id] = {};
+      if (isSource) {
+        m_sources.insert(id);
+        m_toProcessQueue.push(id);
+      }
+    }
+    return m_nodeMap[node];
+  }
+
+  // Adds the edge from the current node to the destination node.
+  void addEdgeToCurrentNode(const T& dest) {
+    if (m_currentId == -1)
+      return;
+    addEdge(m_nodes[m_currentId], dest);
+  }
+
+  // Set the current node to the given node, if it exists in the graph.
+  void setCurrentProcessingNode(const T& node) {
+    if (m_nodeMap.find(node) != m_nodeMap.end())
+      m_currentId = m_nodeMap[node].second;
+  }
+
+  // Mark the current node as processed.
+  void markCurrentNodeProcessed() {
+    if (m_currentId != -1) {
+      m_nodeMap[m_nodes[m_currentId]].first = true;
+      for (size_t destId : m_adjList[m_currentId])
+        if (!m_nodeMap[m_nodes[destId]].first)
+          m_toProcessQueue.push(destId);
+    }
+    m_currentId = -1;
+  }
+
+  // Get the nodes in the graph.
+  std::vector<T> getNodes() { return m_nodes; }
+
+  // Check if two nodes are connected in the graph.
+  bool isConnected(const T& src, const T& dest) {
+    if (m_nodeMap.find(src) == m_nodeMap.end() ||
+        m_nodeMap.find(dest) == m_nodeMap.end())
+      return false;
+    size_t srcId = m_nodeMap[src].second;
+    size_t destId = m_nodeMap[dest].second;
+    return m_adjList[srcId].find(destId) != m_adjList[srcId].end();
+  }
+
+  // Print the graph in a human-readable format.
+  void print() {
+    // First print the nodes with their insertion order.
+    for (const T& node : m_nodes) {
+      std::pair<bool, size_t> nodeInfo = m_nodeMap[node];
+      std::cout << (std::string)node << ": #" << nodeInfo.second;
+      if (m_sources.find(nodeInfo.second) != m_sources.end())
+        std::cout << " (source)";
+      if (nodeInfo.first)
+        std::cout << ", (done)\n";
+      else
+        std::cout << ", (unprocessed)\n";
+    }
+    // Then print the edges.
+    for (size_t i = 0; i < m_nodes.size(); i++)
+      for (size_t dest : m_adjList[i])
+        std::cout << i << " -> " << dest << "\n";
+  }
+
+  // Topological sort of the directed graph. If the graph is not a DAG, the
+  // result will be a partial order. A recursive DFS helper function is used
+  // to implement the sort. If a->b, then a will come before b in the
+  // topological sort. In reverseOrder mode, the result is in reverse
+  // topological order, i.e. if a->b, then b will come before a in the result.
+  std::vector<T> topologicalSort(bool reverseOrder = false) {
+    std::vector<T> res;
+    std::unordered_set<size_t> visited;
+
+    std::function<void(size_t)> dfs = [&](size_t node) -> void {
+      visited.insert(node);
+      for (size_t dest : m_adjList[node])
+        if (visited.find(dest) == visited.end())
+          dfs(dest);
+      res.push_back(m_nodes[node]);
+    };
+    for (size_t source : m_sources)
+      if (visited.find(source) == visited.end())
+        dfs(source);
+
+    if (reverseOrder)
+      return res;
+    std::reverse(res.begin(), res.end());
+    return res;
+  }
+
+  // Get the next node to process, or T() if the queue is empty.
+  T getNextToProcessNode() {
+    if (m_toProcessQueue.empty())
+      return T();
+    size_t nextId = m_toProcessQueue.front();
+    m_toProcessQueue.pop();
+    return m_nodes[nextId];
+  }
+};
+} // end namespace clad
+
+#endif // CLAD_DIFFERENTIATOR_DYNAMICGRAPH_H
diff --git a/lib/Differentiator/BaseForwardModeVisitor.cpp b/lib/Differentiator/BaseForwardModeVisitor.cpp
@@ -1168,8 +1168,8 @@ StmtDiff BaseForwardModeVisitor::VisitCallExpr(const CallExpr* CE) {
       // into the queue.
       pushforwardFnRequest.DeclarationOnly = false;
       pushforwardFnRequest.DerivedFDPrototype = pushforwardFD;
-      plugin::AddRequestToSchedule(m_CladPlugin, pushforwardFnRequest);
     }
+    m_Builder.AddEdgeToGraph(pushforwardFnRequest);
 
     if (pushforwardFD) {
       if (baseDiff.getExpr()) {