diff --git a/CHANGELOG.md b/CHANGELOG.md
index bc75556..cd4b259 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,50 @@
+# v0.6
+
+**Breaking**: The rules for transforming chains were simplified.
+Before, there was the two-arg block syntax (this was the only syntax originally):
+
+```julia
+@chain x begin
+    y
+    z
+end
+```
+
+the inline syntax:
+
+```julia
+@chain x y z
+```
+
+and the one-arg block syntax:
+
+```julia
+@chain begin
+    x
+    y
+    z
+end
+```
+All of these are now a single syntax, derived from the rule that any `begin ... end` block in the inline syntax is flattened into its lines.
+This means that you can also use multiple `begin ... end` blocks, and they can be in any position, which can be nice for interactive development of a chain in the REPL.
+
+```julia
+@chain x y begin
+    x
+    y
+    z
+end u v w begin
+    g
+    h
+    i
+end
+```
+
+This is only breaking if you were using a `begin ... end` block in the inline syntax at argument 3 or higher, but you also had to be using an underscore without chaining in that begin block, which is deemed quite unlikely given the intended use of the package.
+All "normal" usage of the `@chain` macro should work as it did before.
+
+As another consequence of the refactor, chains now do not error anymore for a single argument form `@chain x` but simply return `x`.
+
 # v0.5
 
 **Breaking**: The `@chain` macro now creates a `begin` block, not a `let` block.
diff --git a/Project.toml b/Project.toml
index 002cf3d..6cf60f0 100644
--- a/Project.toml
+++ b/Project.toml
@@ -1,7 +1,7 @@
 name = "Chain"
 uuid = "8be319e6-bccf-4806-a6f7-6fae938471bc"
 authors = ["Julius Krumbiegel"]
-version = "0.5.0"
+version = "0.6.0"
 
 [compat]
 julia = "1"
diff --git a/README.md b/README.md
index 91f8926..f680eea 100644
--- a/README.md
+++ b/README.md
@@ -67,131 +67,150 @@ end
 
 ## Summary
 
-Chain.jl defines the `@chain` macro. It takes a start value and a `begin ... end` block of expressions.
+Chain.jl exports the `@chain` macro.
 
-The result of each expression is fed into the next one using one of two rules:
+This macro rewrites a series of expressions into a chain, where the result of one expression
+is inserted into the next expression following certain rules.
 
-1. **There is at least one underscore in the expression**
-    - every `_` is replaced with the result of the previous expression
-2. **There is no underscore**
-    - the result of the previous expression is used as the first argument in the current expression, as long as it is a function call, a macro call or a symbol representing a function.
+**Rule 1**
 
-Lines that are prefaced with `@aside` are executed, but their result is not fed into the next pipeline step.
-This is very useful to inspect pipeline state during debugging, for example.
+Any `expr` that is a `begin ... end` block is flattened.
+For example, these two pseudocodes are equivalent:
 
-## Motivation
+```julia
+@chain a b c d e f
 
-- The implicit first argument insertion is useful for many data pipeline scenarios, like `groupby`, `transform` and `combine` in DataFrames.jl
-- The `_` syntax is there to either increase legibility or to use functions like `filter` or `map` which need the previous result as the second argument
-- There is no need to type `|>` over and over
-- Any line can be commented out or in without breaking syntax, there is no problem with dangling `|>` symbols
-- The state of the pipeline can easily be checked with the `@aside` macro
-- The `begin ... end` block marks very clearly where the macro is applied and works well with auto-indentation
-- Because everything is just lines with separate expressions and not one huge function call, IDEs can show exactly in which line errors happened
-- Pipe is a name defined by Base Julia which can lead to conflicts
+@chain a begin
+    b
+    c
+    d
+end e f
+```
 
-## Example
+**Rule 2**
 
-An example with a DataFrame:
+Any expression but the first (in the flattened representation) will have the preceding result
+inserted as its first argument, unless at least one underscore `_` is present.
+In that case, all underscores will be replaced with the preceding result.
 
-```julia
-using DataFrames, Chain
+If the expression is a symbol, the symbol is treated equivalently to a function call.
 
-df = DataFrame(group = [1, 2, 1, 2, missing], weight = [1, 3, 5, 7, missing])
+For example, the following code block
 
-result = @chain df begin
-    dropmissing
-    filter(r -> r.weight < 6, _)
-    groupby(:group)
-    combine(:weight => sum => :total_weight)
+```julia
+@chain begin
+    x
+    f()
+    @g()
+    h
+    @i
+    j(123, _)
+    k(_, 123, _)
 end
 ```
 
-The chain block is equivalent to this:
+is equivalent to
 
 ```julia
-result = begin
-    local var"##1" = dropmissing(df)
-    local var"##2" = filter(r -> r.weight < 6, var"##1")
-    local var"##3" = groupby(var"##2", :group)
-    local var"##4" = combine(var"##3", :weight => sum => :total_weight)
+begin
+    local temp1 = f(x)
+    local temp2 = @g(temp1)
+    local temp3 = h(temp2)
+    local temp4 = @i(temp3)
+    local temp5 = j(123, temp4)
+    local temp6 = k(temp5, 123, temp5)
 end
 ```
 
-## Alternative one-argument syntax
+**Rule 3**
 
-If your initial argument name is long and / or the chain's result is assigned to a long variable, it can look cleaner if the initial value is moved into the chain.
-Here is such a long expression:
+An expression that begins with `@aside` does not pass its result on to the following expression.
+Instead, the result of the previous expression will be passed on.
+This is meant for inspecting the state of the chain.
+The expression within `@aside` will not get the previous result auto-inserted, you can use
+underscores to reference it.
 
 ```julia
-a_long_result_variable_name = @chain a_long_input_variable_name begin
-    do_something
-    do_something_else(parameter)
-    do_other_thing(parameter, _)
+@chain begin
+    [1, 2, 3]
+    filter(isodd, _)
+    @aside @info "There are \$(length(_)) elements after filtering"
+    sum
 end
 ```
 
-This is equivalent to the following expression:
+**Rule 4**
+
+It is allowed to start an expression with a variable assignment.
+In this case, the usual insertion rules apply to the right-hand side of that assignment.
+This can be used to store intermediate results.
 
 ```julia
-a_long_result_variable_name = @chain begin
-    a_long_input_variable_name
-    do_something
-    do_something_else(parameter)
-    do_other_thing(parameter, _)
+@chain begin
+    [1, 2, 3]
+    filtered = filter(isodd, _)
+    sum
 end
+
+filtered == [1, 3]
 ```
 
-## One-liner syntax
+**Rule 5**
 
-You can also use `@chain` as a one-liner, where no begin-end block is necessary.
-This works well for short sequences that are still easy to parse visually without being on separate lines.
+The `@.` macro may be used with a symbol to broadcast that function over the preceding result.
 
 ```julia
-@chain 1:10 filter(isodd, _) sum sqrt
+@chain begin
+    [1, 2, 3]
+    @. sqrt
+end
 ```
 
-## Variable assignments in the chain
-
-You can prefix any of the expressions that Chain.jl can handle with a variable assignment.
-The previous value will be spliced into the right-hand-side expression and the result will be available afterwards under the chosen variable name.
+is equivalent to
 
 ```julia
-@chain 1:10 begin
-    _ * 3
-    filtered = filter(iseven, _)
-    sum
+@chain begin
+    [1, 2, 3]
+    sqrt.(_)
 end
-
-filtered == [6, 12, 18, 24, 30]
 ```
 
-## The `@aside` macro
 
-For debugging, it's often useful to look at values in the middle of a pipeline.
-You can use the `@aside` macro to mark expressions that should not pass on their result.
-For these expressions there is no implicit first argument spliced in if there is no `_`, because that would be impractical for most purposes.
+## Motivation
 
-If for example, we wanted to know how many groups were created after step 3, we could do this:
+- The implicit first argument insertion is useful for many data pipeline scenarios, like `groupby`, `transform` and `combine` in DataFrames.jl
+- The `_` syntax is there to either increase legibility or to use functions like `filter` or `map` which need the previous result as the second argument
+- There is no need to type `|>` over and over
+- Any line can be commented out or in without breaking syntax, there is no problem with dangling `|>` symbols
+- The state of the pipeline can easily be checked with the `@aside` macro
+- Flattening of `begin ... end` blocks allows you to split your chain over multiple lines
+- Because everything is just lines with separate expressions and not one huge function call, IDEs can show exactly in which line errors happened
+- Pipe is a name defined by Base Julia which can lead to conflicts
+
+## Example
+
+An example with a DataFrame:
 
 ```julia
+using DataFrames, Chain
+
+df = DataFrame(group = [1, 2, 1, 2, missing], weight = [1, 3, 5, 7, missing])
+
 result = @chain df begin
     dropmissing
     filter(r -> r.weight < 6, _)
     groupby(:group)
-    @aside println("There are $(length(_)) groups after step 3.")
     combine(:weight => sum => :total_weight)
 end
 ```
 
-Which is again equivalent to this:
+The chain block is equivalent to this:
 
 ```julia
 result = begin
     local var"##1" = dropmissing(df)
     local var"##2" = filter(r -> r.weight < 6, var"##1")
     local var"##3" = groupby(var"##2", :group)
-    println("There are $(length(var"##3")) groups after step 3.")
     local var"##4" = combine(var"##3", :weight => sum => :total_weight)
 end
 ```
@@ -214,27 +233,3 @@ You can use this, for example, in combination with the `@aside` macro if you nee
     combine(:weight => sum => :total_weight)
 end
 ```
-
-## Rewriting Rules
-
-Here is a list of equivalent expressions, where `_` is replaced by `prev` and the new variable is `next`.
-In reality, each new variable simply gets a new name via `gensym`, which is guaranteed not to conflict with anything else.
-
-| **Before** | **After** | **Comment** |
-| :-- | :-- | :-- |
-| `sum` | `next = sum(prev)` | Symbol gets expanded into function call |
-| `sum()` | `next = sum(prev)` | First argument is inserted |
-| `sum(_)` | `next = sum(prev)` | Call expression gets `_` replaced |
-| `_ + 3` | `next = prev + 3` | Infix call expressions work the same way as other calls |
-| `+(3)` | `next = prev + 3` | Infix notation with _ would look better, but this is also possible |
-| `1 + 2` | `next = prev + 1 + 2` | This might feel weird, but `1 + 2` is a normal call expression |
-| `filter(isodd, _)` | `next = filter(isodd, prev)` | Underscore can go anywhere |
-| `@aside println(_)` | `println(prev)` | `println` without affecting the pipeline; using `_` |
-| `@aside println("hello")` | `println("hello")` | `println` without affecting the pipeline; no implicit first arg |
-| `@. sin` | `next = sin.(prev)` | Special-cased alternative to `sin.()` |
-| `sin.()` | `next = sin.(prev)` | First argument is prepended for broadcast calls as well |
-| `somefunc.(x)` | `next = somefunc.(prev, x)` | First argument is prepended for broadcast calls as well |
-| `@somemacro` | `next = @somemacro(prev)` | Macro calls without arguments get an argument spliced in |
-| `@somemacro(x)` | `next = @somemacro(prev, x)` | First argument splicing is the same as with functions |
-| `@somemacro(x, _)` | `next = @somemacro(x, prev)` | Also underscore behavior |
-
diff --git a/src/Chain.jl b/src/Chain.jl
index cda74ab..c6e10f1 100644
--- a/src/Chain.jl
+++ b/src/Chain.jl
@@ -127,57 +127,130 @@ function rewrite_chain_block(firstpart, block)
 end
 
 """
-    @chain(initial_value, block::Expr)
+    @chain(expr, exprs...)
 
-Rewrites a block expression to feed the result of each line into the next one.
-The initial value is given by the first argument.
+Rewrites a series of expressions into a chain, where the result of one expression
+is inserted into the next expression following certain rules.
 
-In all lines, underscores are replaced by the previous line's result.
-If there are no underscores and the expression is a symbol, the symbol is rewritten
-to a function call with the previous result as the only argument.
-If there are no underscores and the expression is a function call or a macrocall,
-the call has the previous result prepended as the first argument.
+**Rule 1**
 
-Example:
+Any `expr` that is a `begin ... end` block is flattened.
+For example, these two pseudocodes are equivalent:
 
+```julia
+@chain a b c d e f
+
+@chain a begin
+    b
+    c
+    d
+end e f
 ```
-x = @chain [1, 2, 3] begin
-    filter(!=(2), _)
-    sqrt.(_)
-    sum
+
+**Rule 2**
+
+Any expression but the first (in the flattened representation) will have the preceding result
+inserted as its first argument, unless at least one underscore `_` is present.
+In that case, all underscores will be replaced with the preceding result.
+
+If the expression is a symbol, the symbol is treated equivalently to a function call.
+
+For example, the following code block
+
+```julia
+@chain begin
+    x
+    f()
+    @g()
+    h
+    @i
+    j(123, _)
+    k(_, 123, _)
 end
-x == sum(sqrt.(filter(!=(2), [1, 2, 3])))
 ```
-"""
-macro chain(initial_value, block::Expr)
-    if !(block.head == :block)
-        block = Expr(:block, block)
-    end
-    rewrite_chain_block(initial_value, block)
+
+is equivalent to
+
+```julia
+begin
+    local temp1 = f(x)
+    local temp2 = @g(temp1)
+    local temp3 = h(temp2)
+    local temp4 = @i(temp3)
+    local temp5 = j(123, temp4)
+    local temp6 = k(temp5, 123, temp5)
 end
+```
 
-"""
-    @chain(initial_value, args...)
+**Rule 3**
 
-Rewrites a series of argments, either expressions or symbols, to feed the result
-of each line into the next one. The initial value is given by the first argument.
+An expression that begins with `@aside` does not pass its result on to the following expression.
+Instead, the result of the previous expression will be passed on.
+This is meant for inspecting the state of the chain.
+The expression within `@aside` will not get the previous result auto-inserted, you can use
+underscores to reference it.
 
-In all arguments, underscores are replaced by the argument's result.
-If there are no underscores and the argument is a symbol, the symbol is rewritten
-to a function call with the previous result as the only argument.
-If there are no underscores and the argument is a function call or a macrocall,
-the call has the previous result prepended as the first argument.
+```julia
+@chain begin
+    [1, 2, 3]
+    filter(isodd, _)
+    @aside @info "There are \$(length(_)) elements after filtering"
+    sum
+end
+```
 
-Example:
+**Rule 4**
+
+It is allowed to start an expression with a variable assignment.
+In this case, the usual insertion rules apply to the right-hand side of that assignment.
+This can be used to store intermediate results.
+
+```julia
+@chain begin
+    [1, 2, 3]
+    filtered = filter(isodd, _)
+    sum
+end
 
+filtered == [1, 3]
 ```
-x = @chain [1, 2, 3] filter(!=(2), _) sqrt.(_) sum
 
-x == sum(sqrt.(filter(!=(2), [1, 2, 3])))
+**Rule 5**
+
+The `@.` macro may be used with a symbol to broadcast that function over the preceding result.
+
+```julia
+@chain begin
+    [1, 2, 3]
+    @. sqrt
+end
 ```
+
+is equivalent to
+
+```julia
+@chain begin
+    [1, 2, 3]
+    sqrt.(_)
+end
+```
+
 """
 macro chain(initial_value, args...)
-    rewrite_chain_block(initial_value, Expr(:block, args...))
+    block = flatten_to_single_block(initial_value, args...)
+    rewrite_chain_block(block)
+end
+
+function flatten_to_single_block(args...)
+    blockargs = []
+    for arg in args
+        if arg isa Expr && arg.head === :block
+            append!(blockargs, arg.args)
+        else
+            push!(blockargs, arg)
+        end
+    end
+    Expr(:block, blockargs...)
 end
 
 function rewrite_chain_block(block)
@@ -234,34 +307,6 @@ function reconvert_docstrings!(args::Vector)
     args
 end
 
-"""
-    @chain(block::Expr)
-
-Rewrites a block expression to feed the result of each line into the next one.
-The first line serves as the initial value and is not rewritten.
-
-In all other lines, underscores are replaced by the previous line's result.
-If there are no underscores and the expression is a symbol, the symbol is rewritten
-to a function call with the previous result as the only argument.
-If there are no underscores and the expression is a function call or a macrocall,
-the call has the previous result prepended as the first argument.
-
-Example:
-
-```
-x = @chain begin
-    [1, 2, 3]
-    filter(!=(2), _)
-    sqrt.(_)
-    sum
-end
-x == sum(sqrt.(filter(!=(2), [1, 2, 3])))
-```
-"""
-macro chain(block::Expr)
-    rewrite_chain_block(block)
-end
-
 function replace_underscores(expr::Expr, replacement)
     found_underscore = false
 
diff --git a/test/runtests.jl b/test/runtests.jl
index 6ab075f..431d0c6 100644
--- a/test/runtests.jl
+++ b/test/runtests.jl
@@ -90,19 +90,15 @@ end
 
     # the begin block will be different from the normal chain block here
     # only the last statement matters
+    # EDIT: this has changed with the simplification in 0.6, the begin block in the middle is flattened out
     y = @chain x begin
         _ .+ 1
         _ .+ 2
     end sum
-    @test y == sum(x .+ 2)
+    @test y == sum(x .+ 1 .+ 2)
 end
 
 @testset "invalid invocations" begin
-    # just one argument
-    @test_throws LoadError eval(quote
-        @chain [1, 2, 3]
-    end)
-
     # let block
     @test_throws LoadError eval(quote
         @chain [1, 2, 3] let
@@ -436,6 +432,22 @@ end
     end
 end
 
+@testset "multiple begin end blocks" begin
+    @test "-9" == @chain 1:3 reverse begin
+        first
+    end _ ^ 2 begin
+        -
+        string
+    end
+
+    @test_throws LoadError @eval @chain 1:3 reverse begin
+        first
+        _ ^ 2
+    end begin
+        123 # this can't be inserted into, only allowed in a first begin block
+    end
+end
+
 @testset "variable assignment syntax" begin
     result = @chain 1:10 begin
         x = filter(iseven, _)