-
Notifications
You must be signed in to change notification settings - Fork 3
Vertex Edge Proposals
We currently allow direct modification of both Edge endpoint vertices and Vertex edgelists. This is an opportunity for graphs to become incoherent, as references are stored in both directions (Vertex <--> Edge). I propose that the constructors for Vertex and Edge take references to their enclosing entity(ies) as parameters, and add themselves automatically to the enclosing entity upon creation. If an edge is later re-attached to a different vertex, the edgelist for that vertex is automatically updated. Detaching an Edge at both ends will update the corresponding edge lists and leave it free-floating for garbage collection. There would be no other way to modify references in the enclosing entity than by going through the public methods of the enclosed entity (and perhaps doing so should be avoided after creation). Edgelists are then treated as an entirely internal detail.
Likewise, a Vertex would belong to a specific Graph at instantiation. Since the edgelists are now in the Vertex itself, each Vertex has a privileged relationship with a particular "main graph" (as opposed to overlay graph) and its internal edges are considered to be "in" that main graph. This is the graph that is passed to the Vertex constructor.
All the logic to do this can be in the Vertex and Edge abstract base classes (and Graph) which can be placed in a package apart from the concrete classes.
As a result, the (rather messy) idiom:
Graph graph = new Graph();
Vertex v1 = new SomeVertex(label, x, y, name);
v1 = graph.add(v1);
Vertex v2 = new SomeVertex(label, x, y, name);
v2 = graph.add(v2);
Edge e = new SomeEdge(v1, v2);
graph.addEdge(v1, v2);
becomes:
Graph graph = new Graph();
Vertex v1 = new SomeVertex(graph, x, y, name);
Vertex v2 = new SomeVertex(graph, x, y, name);
Edge e = new SomeEdge(v1, v2);
This is already implemented in the branch and seems to work fine.
It would be possible to just add temporary (StreetLocation) vertices and edges to the graph instead of checking a hashtable of extra edges at every iteration in the search algorithm. We would have to avoid concurrent access to edge lists. Something like Java.util.concurrent's copy-on-write arraylist would work well in our case since there should be very few modifications relative to reads.
Judging by a look through the source code, each CopyOnWriteArrayList would have its own reentrant lock object, which is a bit of a waste. Intrinsic locks on the edgelist should be sufficient - they are apparently almost as fast as explicit locks in recent JVMs, only add/remove methods would be synchronized, and we would have to be handling an immense number of requests before lock contention would be an issue anyway since edges are added/removed only once per request.
We could also provide special TemporaryVertex/Edge classes that could be cleaned out automatically if some misbehaving code left them laying around in the graph.
Related question: Do we consider the main Graph to be immutable by convention once it is serialized?
Vertex labels serve the same purpose as Java language features, but are less rigorous. The way labels are used now sometimes amounts to run-time ersatz typing, identifiers, and equality checking, and is error-prone.
The labels have their roots in Graphsever, where they provide a hashtable key and information about the type of vertex (sta- prefix for station or psv- for patternstop), and associate vertices with records in external databases. They also allow finding an equivalent, existing vertex later in the graph building process. These are not really concerns in OTP.
Additionally, label strings take up space and don't really need to be stored in the vertices. Short, unique, machine-and-human-readable labels might be of use in debugging, but rather than storing a user-defined label provided to the constructor, each Vertex should be able to build its own unique label based on its fields and its type.
I am also wary of the Graph.getVertex method that returns a reference to either a newly created vertex or an existing one in the graph, based solely on the label. Again, this makes a lot of sense in the context of minimalist, C-language Graphserver, but not so much in OTP. Uniqueness of vertices could be maintained by properly defining hashcodes and equality methods for vertices, but it's not even clear that enforcing uniqueness of vertices is a concern. The only time this is used is in graph building, during which we can maintain maps from Stop and OSM IDs to vertices. If we need to look up vertices by stopid, storing the stopID->vertex map in the graph is no problem.
In combination with proposal 1, the idiom:
IntersectionVertex tlIn =
(IntersectionVertex) graph.addVertex(
new IntersectionVertex("tl in", -74.1, 40.1));
becomes:
IntersectionVertex tlIn = new IntersectionVertex(graph, -74.1, 40.1);
I have been able to refactor graphbuilders to function without using any labels at all. Vertices can be recalled by name in tests by simply using clear Java identifiers or potentially a Map<String, Vertex>. Vertices would of course still have toString and getName methods. Together with unique integer ids these should be fine for debugging:
<Vertex id=12002345 (-122.1234, 45.3210) TransitStop agency=Trimet stopid=1234>
Edges are being cast to DirectEdges or EdgeNarratives all over the place to get their toVertex or other information. The code becomes much easier to read and understand if these three classes/interfaces are merged.
I think this change would only interfere with edges that produce multiple traversal results. I would personally prefer to stick to the definition of an edge as leading from one specific vertex to another, and address the multi-result case in a totally different way: Multiple traversal results could be handled by vertices that dynamically generate their edge iterables, or even a message-passing style where the actual method by which a particular class of vertex produces the next generation of traverse results would be encapsulated and irrelevant. See proposals 6 and 7 below.
In the mean time, awaiting discussion and consensus on a deeper solution, could an edge that does not lead to a specific vertex just have its toVertex set to null?
We currently store the geometry of each street segment twice, once for each direction. The turn-graph vertices at either end of a bidirectional bundle of turn edges could both refer to the same street segment object to avoid duplicating the geometry information.
Taking this a step further, we could keep entire OSM ways intact as objects, with their per-way safety features, names etc. Each street vertex could just contain a reference to a way object, along with a starting node offset and a length in number of nodes - positive for forward and negative for reverse. Split streets could even use non-integer lengths. Length in meters, turn costs, etc could be calculated using data in the way object and cached in the turn vertex. Methods to return street name, facility type-related data, and so on could just delegate to the way object.
If the details of edge lists are hidden (proposal 1), getOutgoing and getIncoming need not always return a static collection of edges. They could build the list on demand, or even return an iterable backed by something other than a collection, over anything implementing the Edge interface. I see this as a more truly general replacement for multi-destination edges. For example, a trip planner integrating real-time arrival information could extend TransitStop, overriding getOutgoing and getIncoming to return an iterator over objects implementing the Edge interface.
Taking proposal 6 one step further, note that a particular type of vertex often has only specific types of edges leading out of it. For example, turn vertices' edge lists could contain only turn edges, leading to other turn vertices. A given turn vertex might have 3 outgoing edges to 3 other turn vertices, all of which pass through the same street segment. If the turn vertex had a list of all adjacent turn vertices and turn restriction information, and a reference to the street segment it was positioned on, it could directly (without referencing any intervening edge objects) produce states for all adjacent vertices by traversing the street segment once, then starting from that single (temporary) result state, traversing all three of the (implicit) turns onto adjacent streets. Encapsulating the traversal mechanism allows it to be tailored to a given vertex/edge class. Often it would not even be necessary to store and reference edge objects.
As another example, consider transit stop pattern vertices. They always have one edge to the next pattern vertex, and one alight edge to an offboard stop. If pattern stop vertices had fields for the corresponding offboard stop and the next pattern stop, the actual edge objects would not even be necessary.
We could of course keep explicit edges around for the general case, but they would be hidden from the routing algorithms. The basic vertex class would still keep a list of its outgoing edges and have a method to return an iterable over the results of traversing them all. Other implementations with implicit edges would implement that method differently.
So a Vertex instance method traverseOutgoing(State s) would return Iterable. But should traverse really take a State parameter? That allows a contradiction between the initial vertex and the State passed in. Since States contain a reference to both their Vertex and the TraverseOptions that made them, including traversal direction, a new 0-arg instance method State.traverse() could return an Iterator over the results of all single-step transitions in the search space.
Beyond this one concrete change, the possibility I am raising with this proposal is that of moving core OTP toward a more truly object-oriented, message-passing style, rather than an essentially procedural one with a few hidden fields and get/set methods.