-
Notifications
You must be signed in to change notification settings - Fork 149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generating strings with Instaparse #82
Comments
I've started work along these lines. I think I've covered most of the opts needed to generate all the strings from a simple ENBF. Ordered choice looks like no problem (same as alt) and +/- lookahead looks possible with a filter (just need to figure out the right way to create the appropriate predicate). The only really tricky part that seems left is generating strings based on regular expression. I think writing a compiler that takes regular expressions and turns them into combinatiors from instaparse, the same as was done in cfg.clj and a nbf.clj, should be enough to generate strings (assuming there would be no new opts to cover). |
Hi Zack, This is a really cool idea. Ever since your initial post several days ago Thanks! On Wed, Oct 1, 2014 at 6:11 PM, Citizen [email protected] wrote:
|
Excellent! When I was doing my initial research into generating strings the On Fri, Oct 3, 2014 at 12:01 AM, aengelberg [email protected]
|
https://github.com/aengelberg/instagenerate Presenting "instagenerate", a core.logic implementation of instaparse that --Alex On Fri, Oct 3, 2014 at 6:01 AM, Citizen [email protected] wrote:
|
This is nice. I've played with it some and it does what it says! It does an interesting number of things as I look closer. I don't know core.logic well enough to understand this, but it looks like it could complete partial strings of input as well. As in, you gave it both a partial input and a partial output and it could infer the results for both. Very neat! I think support for regex (and the jump to more general usefulness) has to be inside instaparse proper. Which, stepping back for a second, is an interesting project by itself. Combining regular expressions and core.logic (via instagenerate) seems like it could be an interesting crossroads. |
I've thought about regular expressions a bit, and one thing I'm not sure On Sunday, October 5, 2014, Citizen [email protected] wrote:
--Alex |
Infinity is something that has been perplexing me with regards to string generation. It's possible to get stuck on a I wonder, do you think it is possible to constrain the length of the string results and return them ordered by length? Like, yes we do have infinite strings, but we will only return the strings of length one then two then three and so on, in the spirit of iterative deepening depth first search. This seems possible with core.logic. If you add in the constraint that the string must be of length N and do something like: (->> (iterate inc 1) (map (partial generate-possible-strings-of-length-n parser)) flatten) |
I coincidentally came across this and thought I would mention that I've been working on a decently-robust test.check generator which takes a regex and generates matching strings -- I'm hoping to release it in this library in a day or two. It's occurred to me several times that a similar problem is generating strings from grammars, which was of course the original question in this thread. One thing to keep in mind wrt test.check in particular is that you get the most value when your generators are created using the built-in combinators so that you can take advantage of shrinking. My assumption is that core.logic approaches don't go this route. |
As part of research I'm working on for my PhD (browser render testing), I recently created instacheck which is basically instaparse meets test.check. It allows you to define EBNF grammars (instaparse compatible) which are translated into test.check generators. It has some command line modes for generating samples, running test cases/inputs against a test program (with shrinking), and also generating the translated clj generator code for a grammar. It even supports regex in the grammar thanks to the awesome test.chuck from @gfredericks. I just got back from presenting the project at Strange Loop. I think videos should post within the next week. One critical feature that instacheck supports is the ability to adjust the weights that are used for alternation/choice points in the grammar. This is super useful and probably critical for a lot of applications (although this wasn't obvious until I started using it in anger). Note that for the moment, instacheck requires patched version of test.check and instaparse. The test.check change to to get the frequency generator to shrink towards the more common weights rather than shrinking to whatever happens to be defined first. I should be able to get rid of this requirement by defining my own frequency generator that basically does weight sorting prior to calling the underlying test.check frequency generator. The changes I have to instaparse are to support the ability to tune the grammar weights: #183, #184. The first allows weights to be configured via comments embedded directly in the grammar. The second patch enables logging of the paths that are taken through a grammar. This allows you to tune weights according to existing test cases by parsing them with the grammar. |
Wow! That's awesome to see. Way to go, that's seriously cool, thank you for posting about it on this thread. |
Great work! Now that I understand the purpose of the pull requests, it will be easier to evaluate those. Thanks for this. |
I would like to generate a specific string from a speciifc parse tree (given a grammar, of course). Is this the right ticket for that? |
@lvh I'm not sure what you mean by that; you already have the string? and the parse tree? what "generation" is there left to do? maybe an example would help |
@gfredericks Right, the idea is you'd either generate a parse tree from scratch, or more likely modify a parse tree and then reserialize. So you can produce a semantic patch () but render it as a diff, for example. |
are you just saying you want to unparse a tree? for some grammars there would be more than a single valid output |
Sorry, lost track of this. Yes, that's what I'm saying. I'd be happy with any output; also happy to make the grammar capture more details of the input file to make the output as close as possible. |
I'm interested in generating strings with instaparse. By this, I mean I'd like to give instaparse a data structure to fill in and a grammar and have it return to me strings that could parse to something that would parse to the given data structure. We use instaparse a good deal in ECHELON and I'd like to use
test.check
to generate data for testing. I'm still learning how the QuickCheck style libraries work, but I think if we could generate strings via instaparse we could usetest.check
without too much trouble. I'm happy to develop this myself (and will continue to explore further), but I wanted to check in first to see if it is possible (or has already been quietly done).For clarification, I'd like a function that does something akin to the following:
I suppose I'd like to give instaparse a partial data structure corresponding to a "skeleton" of a parse tree and have it generate data to fill in the rest of the tree. I haven't thought through how this would play with transformations and the hiding of tags in instaparse yet. Does this seem possible? Is it a dumb thing to ask for?
The text was updated successfully, but these errors were encountered: