ParseTree is great: it accesses the runtime AST (abstract syntax tree) and makes it possible to convert any object to ruby code & S-expressions, BUT ParseTree doesn’t work on 1.9.* & JRuby.
RubyParser is great, and it works on any ruby (though it is not yet 100% compatible with 1.8.7 & 1.9.* syntax), BUT it works only with static code.
I truly enjoy using the above tools, but in my other projects, the absence of ParseTree on the different rubies has forced me to hand-bake my own solution each time to extract the proc code I need at runtime. This is frustrating: none of these solutions is ever perfect, and I’m reinventing the wheel each time just to address a particular usage pattern (using regexp kungfu).
Enough is enough, and now we have Sourcify, a unified solution to extract proc code. When ParseTree is available, it simply works as a thin wrapper around it; otherwise, it uses a home-baked Ragel-generated scanner to extract the proc code. The extracted code is further processed with RubyParser & Ruby2Ruby to ensure 100% consistency with ParseTree (yup, there is no denying that I really like ParseTree).
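The fallback described above can be pictured with a small sketch. It is hypothetical and not taken from sourcify’s source: the method name `sourcify_backend` is made up, and the only assumption about ParseTree is that it is loaded with `require 'parse_tree'`.

```ruby
# Hypothetical sketch (not sourcify's actual code): prefer ParseTree when it
# can be loaded, otherwise fall back to a static, scanner-based approach.
def sourcify_backend
  require 'parse_tree' # assumption: ParseTree's require path
  :parse_tree
rescue LoadError
  :static_scanner
end

puts sourcify_backend # prints "static_scanner" unless ParseTree is installed
```

Rescuing LoadError keeps the choice at load time, so callers never need to know which backend is active.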
The religiously standard way:
$ gem install ParseTree sourcify
Or on 1.9.* or JRuby:
$ gem install ruby_parser file-tail sourcify
Proc#to_source returns the code representation of the proc:
```ruby
require 'sourcify'

lambda { x + y }.to_source
# >> "proc { (x + y) }"

proc { x + y }.to_source
# >> "proc { (x + y) }"
```
Like it or not, a lambda is represented as a proc when converted to source (exactly as ParseTree does). To extract only the body of the proc, pass in {:strip_enclosure => true}:
```ruby
lambda { x + y }.to_source(:strip_enclosure => true)
# >> "(x + y)"

lambda {|i| i + 2 }.to_source(:strip_enclosure => true)
# >> "(i + 2)"
```
Proc#to_sexp returns the S-expression of the proc:
```ruby
require 'sourcify'

x = 1
lambda { x + y }.to_sexp
# >> s(:iter,
# >>   s(:call, nil, :proc, s(:arglist)),
# >>   nil,
# >>   s(:call, s(:lvar, :x), :+, s(:arglist, s(:call, nil, :y, s(:arglist)))))
```
To extract only the body of the proc:
```ruby
# with x = 1 still in scope, as above
lambda { x + y }.to_sexp(:strip_enclosure => true)
# >> s(:call, s(:lvar, :x), :+, s(:arglist, s(:call, nil, :y, s(:arglist))))
```
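Since S-expressions are nested lists, stripping the enclosure at this level amounts to taking the body node out of the :iter node, as the two outputs above show. The sketch below uses plain Ruby arrays in place of Sexp objects (an assumption made so it runs without any gem; sexp_processor’s Sexp is itself an Array subclass). It only illustrates the shape of the data, not how sourcify implements the option.

```ruby
# Plain arrays standing in for the Sexp shown above for lambda { x + y }
# (with x a local variable): [:iter, enclosure_call, block_args, body].
iter = [:iter,
        [:call, nil, :proc, [:arglist]],
        nil,
        [:call, [:lvar, :x], :+, [:arglist, [:call, nil, :y, [:arglist]]]]]

# In this layout the body is the last element of the :iter node.
body = iter.last
p body # prints [:call, [:lvar, :x], :+, [:arglist, [:call, nil, :y, [:arglist]]]]
```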
Proc#to_raw_source works differently: unlike Proc#to_source, which returns code retaining only the functional aspects, it returns the raw code enclosed within the proc, including fluff like comments:
```ruby
lambda do |i|
  i+1 # (blah)
end.to_raw_source
# >> "proc do |i|
# >>   i+1 # (blah)
# >> end"
```
NOTE: Since this extracts raw code, it relies on static code scanning (even when running in ParseTree mode), so the gotchas of static code scanning always apply.
Proc#source_location is natively available only on 1.9.*; sourcify adds it (as a bonus) under 1.8.* for consistency:
```ruby
# /tmp/test.rb
require 'sourcify'

lambda { x + y }.source_location
# >> ["/tmp/test.rb", 5]
```
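On 1.9.* (and later MRI releases) Proc#source_location is part of core Ruby, so the lookup in the example above needs no gem there. A minimal plain-Ruby sketch follows. The exact path and line printed depend on where the file lives.

```ruby
# Proc#source_location is a core method on MRI 1.9+; no require needed.
adder = lambda { |x, y| x + y }

file, line = adder.source_location
puts "defined in #{file} at line #{line}"
```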
IMPORTANT: Method#to_source, Method#to_sexp & Method#to_raw_source (described below) only work on MRI-1.9.2, as it is currently the only ruby that supports both (1) discovering a method’s original source location with Method#source_location, and (2) reliably determining a method’s parameters with Method#parameters. Attempting to use these methods on other rubies will raise Sourcify::PlatformNotSupportedError.
NOTE: The following work for methods defined using both def .. end & Module#define_method. However, when a method is defined using the latter approach, sourcify uses Proc#to_source to handle the processing, so the usual gotchas related to proc source extraction apply.
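The two core methods named in the IMPORTANT note above can be tried directly on a modern MRI without sourcify. The sketch mirrors the MyMath example used in this section. Method#parameters reports each parameter’s kind alongside its name, while Method#source_location gives the file and line of the def.

```ruby
class MyMath
  def self.sum(x, y)
    x + y
  end
end

m = MyMath.method(:sum)

# Method#parameters lists [kind, name] pairs; both arguments are required.
p m.parameters # prints [[:req, :x], [:req, :y]]

# Method#source_location gives [file, line] of the def.
file, line = m.source_location
puts "#{file}:#{line}"
```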
Method#to_source returns the code representation of the method:
```ruby
require 'sourcify'

class MyMath
  def self.sum(x, y)
    x + y # (blah)
  end
end

MyMath.method(:sum).to_source
# >> "def sum(x, y)
# >>   (x + y)
# >> end"
```
Just like with Proc#to_source, you can pass :strip_enclosure => true to extract only the body within.
Method#to_sexp returns the S-expression of the method:
```ruby
require 'sourcify'

class MyMath
  def self.sum(x, y)
    x + y # (blah)
  end
end

MyMath.method(:sum).to_sexp
# >> s(:defn,
# >>   :sum,
# >>   s(:args, :x, :y),
# >>   s(:scope, s(:block, s(:call, s(:lvar, :x), :+, s(:arglist, s(:lvar, :y))))))
```
Just like with Proc#to_sexp, you can pass :strip_enclosure => true to extract only the body within.
Method#to_raw_source works differently: unlike Method#to_source, which returns code retaining only the functional aspects, it returns the method’s raw code, including fluff like comments:
```ruby
require 'sourcify'

class MyMath
  def self.sum(x, y)
    x + y # (blah)
  end
end

MyMath.method(:sum).to_raw_source
# >> "def sum(x, y)
# >>   x + y # (blah)
# >> end"
```
Just like with Proc#to_raw_source, you can pass :strip_enclosure => true to extract only the body within.
Performance is embarrassing for now; benchmarking the processing of 500 procs (in the ObjectSpace of an average rails project) yields the following:
```
ruby                                   user     system      total          real
ruby-1.8.7-p299 (w ParseTree)     10.270000   0.010000  10.280000  ( 10.311430)
ruby-1.8.7-p299 (static scanner)  14.120000   0.080000  14.200000  ( 14.283817)
ruby-1.9.1-p376 (static scanner)  17.380000   0.050000  17.430000  ( 17.405966)
jruby-1.5.2 (static scanner)      21.318000   0.000000  21.318000  ( 21.318000)
```
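The harness used for these numbers is not shown. As a hypothetical sketch only, a similar measurement could be taken with Ruby’s standard `benchmark` library. `time_extraction` is a made-up name, and the stand-in extractor only calls Proc#source_location so that the snippet runs without sourcify; with sourcify loaded, you would pass `->(pr) { pr.to_source }` instead.

```ruby
require 'benchmark'

# Hypothetical harness: collect up to `limit` live procs from ObjectSpace and
# time how long `extractor` takes to process all of them.
def time_extraction(extractor, limit = 500)
  procs = []
  ObjectSpace.each_object(Proc) do |pr|
    procs << pr
    break if procs.size >= limit
  end

  Benchmark.measure do
    procs.each do |pr|
      begin
        extractor.call(pr)
      rescue StandardError
        # skip procs the extractor cannot handle
      end
    end
  end
end

# Stand-in extractor; Benchmark::Tms#to_s prints user/system/total/real.
puts time_extraction(->(pr) { pr.source_location })
```

Rescuing per proc keeps one unparseable proc from aborting the whole run, which matters when walking an entire ObjectSpace.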
Since I’m still pretty new to Ragel, the code scanner will probably become better & faster as my knowledge & skills with Ragel improve. Also, instead of generating a pure ruby scanner, we can generate native code (e.g. C, Java, or whatever). As I’m a C & Java noob, this will probably take some time to realize.
Nothing beats ParseTree’s ability to access the runtime AST; it is a very powerful feature. The scanner-based (static) implementation suffers from the following gotchas:
Since static code analysis is involved, the subject code needs to physically exist within a file, meaning Proc#source_location must return the expected *[file, lineno]*. The following will not work:
```ruby
def test
  eval('lambda { x + y }')
end

test.source_location
# >> ["(eval)", 1]

test.to_source
# >> Sourcify::CannotParseEvalCodeError
```
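The failure can be seen in the reported location itself: the file name for eval’d code is a pseudo-name, not a path a scanner could open. A hypothetical pre-check might look like the sketch below. `statically_scannable?` and `eval_lambda` are illustration-only names, not sourcify API.

```ruby
# Hypothetical helper: a proc can only be scanned statically if its
# source_location names a file that exists on disk.
def statically_scannable?(pr)
  file, _line = pr.source_location
  !file.nil? && File.file?(file)
end

# Same idea as the test method above, renamed to avoid shadowing Kernel#test.
def eval_lambda
  eval('lambda { x + y }')
end

p eval_lambda.source_location         # file part is "(eval)" or similar
p statically_scannable?(eval_lambda)  # prints false
```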
The same applies to *Blah#to_proc* & *&:blah*:
```ruby
klass = Class.new do
  def aa(&block); block ; end
  def bb; 1+2; end
end

klass.new.method(:bb).to_proc.to_source
# >> Sourcify::CannotHandleCreatedOnTheFlyProcError

klass.new.aa(&:bb).to_source
# >> Sourcify::CannotHandleCreatedOnTheFlyProcError
```
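For the &:blah case, core Ruby itself shows why there is nothing to scan: a proc built by Symbol#to_proc has no Ruby source behind it, and Proc#source_location returns nil for it. A plain-Ruby check, redefining the klass from the example so the snippet stands alone:

```ruby
klass = Class.new do
  def aa(&block); block; end
  def bb; 1 + 2; end
end

# &:bb converts the symbol into a proc on the fly (via Symbol#to_proc).
sym_proc = klass.new.aa(&:bb)

p sym_proc.source_location # prints nil: there is no file/line to scan
p sym_proc.call(klass.new) # prints 3: the proc still works when called
```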
Sometimes we may have multiple procs on a single line. Sourcify can handle this as long as the subject proc’s arity is unique among them:
```ruby
# Yup, this works as expected :)
b1 = lambda {|a| a+1 }; b2 = lambda { 1+2 }
b2.to_source
# >> "proc { (1 + 2) }"

# Nope, this won't work :(
b1 = lambda { 1+2 }; b2 = lambda { 2+3 }
b2.to_source
# >> raises Sourcify::MultipleMatchingProcsPerLineError
```
As observed, the above does not work when multiple procs on the same line share the same arity. Furthermore, this bug under 1.8.* affects the accuracy of this approach.
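The arity rule can be illustrated with a toy model. This is a hypothetical sketch, not sourcify’s scanner: `Candidate` and `pick_by_arity` are invented names, and each candidate’s arity is supplied by hand rather than derived by scanning.

```ruby
# Toy model: each proc found on a line, with its source text and arity.
Candidate = Struct.new(:source, :arity)

# Keep only candidates whose arity matches the subject proc; give up when
# that still leaves more than one.
def pick_by_arity(candidates, subject)
  matches = candidates.select { |c| c.arity == subject.arity }
  if matches.size > 1
    raise ArgumentError, "#{matches.size} procs with arity #{subject.arity} on this line"
  end
  matches.first # nil when nothing matches
end

b2 = lambda { 1+2 }
line = [Candidate.new("lambda {|a| a+1 }", 1), Candidate.new("lambda { 1+2 }", 0)]
puts pick_by_arity(line, b2).source # prints lambda { 1+2 }
```

With two zero-arity candidates on the same line, the method raises instead of guessing, mirroring the failing case above.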
To better narrow down the scanning, try:
- passing in the {:attached_to => …} option:

```ruby
x = lambda { proc { :blah } }
x.to_source
# >> Sourcify::MultipleMatchingProcsPerLineError

x.to_source(:attached_to => :lambda)
# >> "proc { proc { :blah } }"
```
- passing in the {:ignore_nested => …} option:

```ruby
x = lambda { lambda { :blah } }
x.to_source
# >> Sourcify::MultipleMatchingProcsPerLineError

x.to_source(:ignore_nested => true)
# >> "proc { lambda { :blah } }"
```
- attaching a body matcher proc:

```ruby
x, y = lambda { def secret; 1; end }, lambda { :blah }
x.to_source
# >> Sourcify::MultipleMatchingProcsPerLineError

x.to_source {|body| body =~ /^(.*\W|)def\W/ }
# >> 'proc { def secret; 1; end }'
```
Please refer to the rdoc for more details.
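The body matcher acts as a predicate over candidate bodies. The sketch below applies the regexp from the example to two hand-written bodies. Treating each body as a plain String is an assumption (inferred from the `=~` in the example), and the selection loop is illustrative rather than sourcify’s own code.

```ruby
# Hand-written stand-ins for the two proc bodies in the example above.
bodies = [" def secret; 1; end ", " :blah "]

# The matcher from the example: returns a match index (truthy) when the body
# contains a standalone `def`, nil otherwise.
matcher = lambda { |body| body =~ /^(.*\W|)def\W/ }

p bodies.select { |body| matcher.call(body) } # prints [" def secret; 1; end "]
```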
Under the hood, sourcify relies on RubyParser to yield s-expressions, and since RubyParser does not yet fully handle 1.8.7 & 1.9.* syntax, you will get a nasty Racc::ParseError whenever your code contains anything not compatible with 1.8.6.
When a lambda has been created using the lambda operator “->”, sourcify can’t handle it:
```ruby
x = ->{ :blah }
x.to_source
# >> Sourcify::NoMatchingProcError
```
The Sourcify spec suite currently passes on the following rubies:
- MRI-1.8.*, REE-1.8.7 (both ParseTree & static scanner modes)
- JRuby-1.6.*, MRI-1.9.* (static scanner ONLY)
Besides its own spec suite, sourcify has also been tested to handle:
```ruby
ObjectSpace.each_object(Proc) {|o| puts o.to_source }
```
For projects:
(TODO: the more the merrier)
Projects using sourcify include:
Sourcify is heavily inspired by many ideas gathered from the ruby community:
- www.justskins.com/forums/breaking-ruby-code-into-117453.html
- rubyquiz.com/quiz38.html (Florian Groß’s solution)
The sad fact that Proc#to_source wouldn’t be available in the near future:
- Fork the project.
- Make your feature addition or bug fix.
- Add tests for it. This is important so I don’t break it in a future version unintentionally.
- Commit, do not mess with rakefile, version, or history. (If you want to have your own version, that is fine, but bump the version in a commit by itself that I can ignore when I pull.)
- Send me a pull request. Bonus points for topic branches.
Copyright © 2010 NgTzeYang. See LICENSE for details.