Currently we have these transitions in NFAs:

(I was confused for a few seconds here about why we have both `empty_transitions` and `end_of_input_transitions`; we should document that "empty" is epsilon.)

and these in DFAs:

In both of these, we could merge char and range transitions into a single `RangeMap` field. When the transition happens on a character, the range will include just that one character. This has a few advantages:

- It simplifies things and adds no new or special cases to consider, as range transitions can already have just one character today.

  I think this case may be handled poorly in the code generator today, e.g. maybe we're generating guards like `x >= 'a' && x <= 'a'`. We should check this.

- When we implement DFA minimization (Implement DFA minimization #38) we will have to iterate over all inputs of a partition (i.e. a set of states). There we will have to merge (or actually, take the difference of) characters and ranges.

  For example, if one state in a partition has `'a'` as input and another has `['a'-'z']` as input, we will have to test the inputs `'a'` and `['b'-'z']` on the partition. I don't know how to implement this yet, but clearly we need to consider range-range overlaps and also range-char overlaps. Removing the range vs. char distinction means fewer cases to handle.

  Actually, all states in a partition need to handle the same ranges; otherwise they can be distinguished and so should be moved to separate partitions. So we will have `[RangeMap]` (one for each state in the partition) and split the partition into new partitions where, for every state in a new partition, the `RangeMap`s have the same domain. Then we can use any of the `RangeMap`s to make sure they map the same inputs to the same partitions. Having just `RangeMap`s simplifies this.

The disadvantage is that for char transitions we will need a heap allocation for the `Vec<Range>`. Two ways to avoid that:

- Use `SmallVec<[Range; 1]>`.
- Make `RangeMap` an enum, with a variant for single-character ranges. With this we will still need to treat single-char ranges differently, but the handling will be encapsulated in the `RangeMap` module. Outside of `RangeMap` we will only have ranges (except in the codegen, where we will need to ask whether a range is single-char to generate simpler conditionals).
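To illustrate the codegen concern about guards like `x >= 'a' && x <= 'a'`, here is a minimal sketch of how a guard could be rendered once everything is a range. The `Range` struct, `is_single_char`, and `guard` are hypothetical names made up for this example; they are not the project's actual types or code generator.

```rust
/// Hypothetical inclusive character range (an assumption for this sketch,
/// not the project's real `Range` type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Range {
    pub start: char,
    pub end: char,
}

impl Range {
    /// A range that contains exactly one character.
    pub fn single(c: char) -> Range {
        Range { start: c, end: c }
    }

    pub fn is_single_char(&self) -> bool {
        self.start == self.end
    }
}

/// Render a guard expression over the variable named `var`.
/// Single-character ranges get an equality test instead of two comparisons.
pub fn guard(var: &str, range: Range) -> String {
    if range.is_single_char() {
        // `{:?}` on a `char` prints it as a quoted, escaped literal.
        format!("{} == {:?}", var, range.start)
    } else {
        format!("{} >= {:?} && {} <= {:?}", var, range.start, var, range.end)
    }
}
```

With this shape the char vs. range distinction only shows up in one place in the codegen, which is the exception the issue already anticipates.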
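The `'a'` and `['a'-'z']` example amounts to cutting a set of overlapping ranges into disjoint pieces. One possible way to do that is sketched below; it is not a worked-out minimization step. It assumes inclusive ranges with `start <= end`, represents characters as `u32` code points to keep the arithmetic simple, and uses a made-up function name, `split_ranges`.

```rust
/// Split possibly overlapping inclusive ranges into disjoint pieces so that
/// every piece lies entirely inside or entirely outside each input range.
/// Assumes `start <= end` and values within the Unicode code point range,
/// so `end + 1` cannot overflow.
pub fn split_ranges(ranges: &[(u32, u32)]) -> Vec<(u32, u32)> {
    // Every range start, and every position just past a range end,
    // is a point where the set of covering ranges can change.
    let mut cuts: Vec<u32> = Vec::new();
    for &(start, end) in ranges {
        cuts.push(start);
        cuts.push(end + 1);
    }
    cuts.sort_unstable();
    cuts.dedup();

    let mut pieces = Vec::new();
    for pair in cuts.windows(2) {
        let (lo, hi) = (pair[0], pair[1] - 1);
        // A piece never straddles a cut point, so checking its first
        // element is enough to know whether some input range covers it.
        if ranges.iter().any(|&(s, e)| s <= lo && lo <= e) {
            pieces.push((lo, hi));
        }
    }
    pieces
}
```

For the inputs `'a'` and `['a'-'z']` this yields the two pieces `'a'` and `['b'-'z']`. With a single range representation, the range-char overlap is just the case where `start == end`.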
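The idea of splitting a partition so that all states in a new partition have `RangeMap`s with the same domain could look roughly like the sketch below. Here `RangeMap` is a hypothetical stand-in (a sorted list of non-overlapping inclusive ranges with target states), and the sketch assumes two maps with the same domain list their ranges identically, which a real implementation would have to guarantee by normalizing first.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for `RangeMap`: sorted, non-overlapping inclusive
/// ranges, each mapped to a target state index.
pub type RangeMap = Vec<((char, char), usize)>;

/// Group the states of one partition by the domain of their range maps.
/// Each returned group is a candidate new partition.
pub fn split_by_domain(partition: &[(usize, RangeMap)]) -> Vec<Vec<usize>> {
    let mut groups: BTreeMap<Vec<(char, char)>, Vec<usize>> = BTreeMap::new();
    for (state, map) in partition {
        // The domain is the list of ranges, ignoring the targets.
        let domain: Vec<(char, char)> = map.iter().map(|&(range, _)| range).collect();
        groups.entry(domain).or_default().push(*state);
    }
    groups.into_values().collect()
}
```

After this step, any one `RangeMap` in a group can be used to enumerate the inputs to test for the whole group.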
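For the enum option, a rough sketch of what the encapsulation might look like follows. The variant names, the `Range` struct, and the `get` method are all assumptions made for illustration; the point is only that callers go through one interface while the single-character case stays off the heap.

```rust
/// Hypothetical inclusive character range.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Range {
    pub start: char,
    pub end: char,
}

/// Hypothetical enum-based `RangeMap`.
pub enum RangeMap<A> {
    /// A single-character transition, stored inline without a `Vec`.
    Char(char, A),
    /// The general case: non-overlapping ranges with their values.
    Ranges(Vec<(Range, A)>),
}

impl<A> RangeMap<A> {
    /// Look up the value for `c`. Callers do not need to know which
    /// variant is in use.
    pub fn get(&self, c: char) -> Option<&A> {
        match self {
            RangeMap::Char(key, value) => {
                if *key == c {
                    Some(value)
                } else {
                    None
                }
            }
            RangeMap::Ranges(ranges) => ranges
                .iter()
                .find(|(range, _)| range.start <= c && c <= range.end)
                .map(|(_, value)| value),
        }
    }
}
```

The `SmallVec<[Range; 1]>` option would avoid the extra variant at the cost of a dependency; which of the two is simpler in practice is left open here.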