Simplify chain logic (#54)
* flatten begin blocks anywhere in the chain

* adjust readme

* remove redundant readme parts

* add test

* bump version

* add changelog entry
jkrumbiegel authored Feb 22, 2024
1 parent f9fae78 commit df30824
Showing 5 changed files with 259 additions and 160 deletions.
47 changes: 47 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,50 @@
# v0.6

**Breaking**: The rules for transforming chains were simplified.
Before, there was the two-arg block syntax (this was the only syntax originally):

```julia
@chain x begin
y
z
end
```

the inline syntax:

```julia
@chain x y z
```

and the one-arg block syntax:

```julia
@chain begin
x
y
z
end
```
All of these are now a single syntax, derived from the rule that any `begin ... end` block in the inline syntax is flattened into its lines.
This means that you can also use multiple `begin ... end` blocks, and they can be in any position, which can be nice for interactive development of a chain in the REPL.

```julia
@chain x y begin
x
y
z
end u v w begin
g
h
i
end
```

This is only breaking if you used a `begin ... end` block in the inline syntax at argument position 3 or higher and that block also used an underscore without chaining, which is deemed quite unlikely given the intended use of the package.
All "normal" usage of the `@chain` macro should work as it did before.

As another consequence of the refactor, the single-argument form `@chain x` no longer errors but simply returns `x`.
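
The equivalence of the forms described above can be checked with a small sketch. It assumes Chain.jl v0.6 or later is installed; the variable names `xs`, `a`, `b`, `c`, `d` and `e` are made up for illustration and do not appear in the package.

```julia
using Chain  # assumption: Chain.jl v0.6 or later

xs = [1, 2, 3, 4, 5]

# Two-arg block syntax
a = @chain xs begin
    filter(isodd, _)
    sum
end

# Inline syntax
b = @chain xs filter(isodd, _) sum

# One-arg block syntax
c = @chain begin
    xs
    filter(isodd, _)
    sum
end

# Mixed: inline arguments followed by a block in a later position
d = @chain xs filter(isodd, _) begin
    sum
end

# Single-argument form: returns its argument instead of erroring
e = @chain xs
```

If the flattening rule works as described, `a`, `b`, `c` and `d` should all hold the same value, and `e` should be equal to `xs`.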

# v0.5

**Breaking**: The `@chain` macro now creates a `begin` block, not a `let` block.
2 changes: 1 addition & 1 deletion Project.toml
@@ -1,7 +1,7 @@
name = "Chain"
uuid = "8be319e6-bccf-4806-a6f7-6fae938471bc"
authors = ["Julius Krumbiegel"]
version = "0.5.0"
version = "0.6.0"

[compat]
julia = "1"
179 changes: 87 additions & 92 deletions README.md
@@ -67,131 +67,150 @@ end

## Summary

Chain.jl defines the `@chain` macro. It takes a start value and a `begin ... end` block of expressions.
Chain.jl exports the `@chain` macro.

The result of each expression is fed into the next one using one of two rules:
This macro rewrites a series of expressions into a chain, where the result of one expression
is inserted into the next expression following certain rules.

1. **There is at least one underscore in the expression**
- every `_` is replaced with the result of the previous expression
2. **There is no underscore**
- the result of the previous expression is used as the first argument in the current expression, as long as it is a function call, a macro call or a symbol representing a function.
**Rule 1**

Lines that are prefaced with `@aside` are executed, but their result is not fed into the next pipeline step.
This is very useful to inspect pipeline state during debugging, for example.
Any `expr` that is a `begin ... end` block is flattened.
For example, the following two snippets of pseudocode are equivalent:

## Motivation
```julia
@chain a b c d e f

- The implicit first argument insertion is useful for many data pipeline scenarios, like `groupby`, `transform` and `combine` in DataFrames.jl
- The `_` syntax is there to either increase legibility or to use functions like `filter` or `map` which need the previous result as the second argument
- There is no need to type `|>` over and over
- Any line can be commented out or in without breaking syntax, so there is no problem with dangling `|>` symbols
- The state of the pipeline can easily be checked with the `@aside` macro
- The `begin ... end` block marks very clearly where the macro is applied and works well with auto-indentation
- Because everything is just lines with separate expressions and not one huge function call, IDEs can show exactly in which line errors happened
- Pipe is a name defined by Base Julia which can lead to conflicts
@chain a begin
b
c
d
end e f
```
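
The flattening step in Rule 1 can be pictured with a minimal sketch in plain Julia. The helper `flatten_args` is hypothetical and is not Chain.jl's actual implementation; it assumes that only top-level `begin ... end` blocks among the macro arguments are flattened and that line number nodes are discarded.

```julia
# Hypothetical helper illustrating Rule 1; not Chain.jl's actual implementation.
function flatten_args(args)
    flat = Any[]
    for arg in args
        if arg isa Expr && arg.head === :block
            # Keep the block's expressions, drop line number metadata
            append!(flat, filter(x -> !(x isa LineNumberNode), arg.args))
        else
            push!(flat, arg)
        end
    end
    return flat
end

# A quoted block stands in for a `begin ... end` argument
block = quote
    b
    c
    d
end

flat = flatten_args(Any[:a, block, :e, :f])
```

Here `flat` contains the six symbols `a` through `f` in order, which mirrors the equivalence between the two forms shown above.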

## Example
**Rule 2**

An example with a DataFrame:
Any expression but the first (in the flattened representation) will have the preceding result
inserted as its first argument, unless at least one underscore `_` is present.
In that case, all underscores will be replaced with the preceding result.

```julia
using DataFrames, Chain
If the expression is a symbol, the symbol is treated equivalently to a function call.

df = DataFrame(group = [1, 2, 1, 2, missing], weight = [1, 3, 5, 7, missing])
For example, the following code block

result = @chain df begin
dropmissing
filter(r -> r.weight < 6, _)
groupby(:group)
combine(:weight => sum => :total_weight)
```julia
@chain begin
x
f()
@g()
h
@i
j(123, _)
k(_, 123, _)
end
```

The chain block is equivalent to this:
is equivalent to

```julia
result = begin
local var"##1" = dropmissing(df)
local var"##2" = filter(r -> r.weight < 6, var"##1")
local var"##3" = groupby(var"##2", :group)
local var"##4" = combine(var"##3", :weight => sum => :total_weight)
begin
local temp1 = f(x)
local temp2 = @g(temp1)
local temp3 = h(temp2)
local temp4 = @i(temp3)
local temp5 = j(123, temp4)
local temp6 = k(temp5, 123, temp5)
end
```
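
A runnable variant of Rule 2 with concrete Base functions may make the insertion behavior easier to follow. This is only a sketch that assumes Chain.jl v0.6 or later is installed; the name `result` is chosen for illustration.

```julia
using Chain  # assumption: Chain.jl v0.6 or later

result = @chain begin
    [1, 2, 3]
    reverse           # bare symbol: treated like reverse(prev)
    push!(4)          # call without underscore: becomes push!(prev, 4)
    filter(isodd, _)  # underscore: becomes filter(isodd, prev)
end
```

Following the rule by hand, the intermediate values are `[3, 2, 1]`, then `[3, 2, 1, 4]`, and finally `[3, 1]`.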

## Alternative one-argument syntax
**Rule 3**

If your initial argument name is long and / or the chain's result is assigned to a long variable, it can look cleaner if the initial value is moved into the chain.
Here is such a long expression:
An expression that begins with `@aside` does not pass its result on to the following expression.
Instead, the result of the previous expression will be passed on.
This is meant for inspecting the state of the chain.
The expression within `@aside` will not get the previous result auto-inserted, but you can use
underscores to reference it.

```julia
a_long_result_variable_name = @chain a_long_input_variable_name begin
do_something
do_something_else(parameter)
do_other_thing(parameter, _)
@chain begin
[1, 2, 3]
filter(isodd, _)
@aside @info "There are $(length(_)) elements after filtering"
sum
end
```
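
To see that `@aside` leaves the chained value untouched, the side effect can be recorded in a variable rather than logged. The sketch below assumes Chain.jl v0.6 or later; `seen` and `total` are illustrative names, not part of the package.

```julia
using Chain  # assumption: Chain.jl v0.6 or later

seen = Int[]  # collects what the aside observes

total = @chain begin
    [1, 2, 3]
    filter(isodd, _)
    @aside push!(seen, length(_))  # result of push! is not passed on
    sum                            # receives the filtered vector
end
```

If Rule 3 holds, `sum` is applied to `[1, 3]`, so `total` is `4`, while `seen` records that two elements remained after filtering.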

This is equivalent to the following expression:
**Rule 4**

It is allowed to start an expression with a variable assignment.
In this case, the usual insertion rules apply to the right-hand side of that assignment.
This can be used to store intermediate results.

```julia
a_long_result_variable_name = @chain begin
a_long_input_variable_name
do_something
do_something_else(parameter)
do_other_thing(parameter, _)
@chain begin
[1, 2, 3]
filtered = filter(isodd, _)
sum
end

filtered == [1, 3]
```

## One-liner syntax
**Rule 5**

You can also use `@chain` as a one-liner, where no begin-end block is necessary.
This works well for short sequences that are still easy to parse visually without being on separate lines.
The `@.` macro may be used with a symbol to broadcast that function over the preceding result.

```julia
@chain 1:10 filter(isodd, _) sum sqrt
@chain begin
[1, 2, 3]
@. sqrt
end
```

## Variable assignments in the chain

You can prefix any of the expressions that Chain.jl can handle with a variable assignment.
The previous value will be spliced into the right-hand-side expression and the result will be available afterwards under the chosen variable name.
is equivalent to

```julia
@chain 1:10 begin
_ * 3
filtered = filter(iseven, _)
sum
@chain begin
[1, 2, 3]
sqrt.(_)
end

filtered == [6, 12, 18, 24, 30]
```

## The `@aside` macro

For debugging, it's often useful to look at values in the middle of a pipeline.
You can use the `@aside` macro to mark expressions that should not pass on their result.
For these expressions there is no implicit first argument spliced in if there is no `_`, because that would be impractical for most purposes.
## Motivation

If for example, we wanted to know how many groups were created after step 3, we could do this:
- The implicit first argument insertion is useful for many data pipeline scenarios, like `groupby`, `transform` and `combine` in DataFrames.jl
- The `_` syntax is there to either increase legibility or to use functions like `filter` or `map` which need the previous result as the second argument
- There is no need to type `|>` over and over
- Any line can be commented out or in without breaking syntax, so there is no problem with dangling `|>` symbols
- The state of the pipeline can easily be checked with the `@aside` macro
- Flattening of `begin ... end` blocks allows you to split your chain over multiple lines
- Because everything is just lines with separate expressions and not one huge function call, IDEs can show exactly in which line errors happened
- Pipe is a name defined by Base Julia which can lead to conflicts

## Example

An example with a DataFrame:

```julia
using DataFrames, Chain

df = DataFrame(group = [1, 2, 1, 2, missing], weight = [1, 3, 5, 7, missing])

result = @chain df begin
dropmissing
filter(r -> r.weight < 6, _)
groupby(:group)
@aside println("There are $(length(_)) groups after step 3.")
combine(:weight => sum => :total_weight)
end
```

Which is again equivalent to this:
The chain block is equivalent to this:

```julia
result = begin
local var"##1" = dropmissing(df)
local var"##2" = filter(r -> r.weight < 6, var"##1")
local var"##3" = groupby(var"##2", :group)
println("There are $(length(var"##3")) groups after step 3.")
local var"##4" = combine(var"##3", :weight => sum => :total_weight)
end
```
@@ -214,27 +233,3 @@ You can use this, for example, in combination with the `@aside` macro if you nee
combine(:weight => sum => :total_weight)
end
```

## Rewriting Rules

Here is a list of equivalent expressions, where `_` is replaced by `prev` and the new variable is `next`.
In reality, each new variable simply gets a new name via `gensym`, which is guaranteed not to conflict with anything else.

| **Before** | **After** | **Comment** |
| :-- | :-- | :-- |
| `sum` | `next = sum(prev)` | Symbol gets expanded into function call |
| `sum()` | `next = sum(prev)` | First argument is inserted |
| `sum(_)` | `next = sum(prev)` | Call expression gets `_` replaced |
| `_ + 3` | `next = prev + 3` | Infix call expressions work the same way as other calls |
| `+(3)` | `next = prev + 3` | Infix notation with _ would look better, but this is also possible |
| `1 + 2` | `next = prev + 1 + 2` | This might feel weird, but `1 + 2` is a normal call expression |
| `filter(isodd, _)` | `next = filter(isodd, prev)` | Underscore can go anywhere |
| `@aside println(_)` | `println(prev)` | `println` without affecting the pipeline; using `_` |
| `@aside println("hello")` | `println("hello")` | `println` without affecting the pipeline; no implicit first arg |
| `@. sin` | `next = sin.(prev)` | Special-cased alternative to `sin.()` |
| `sin.()` | `next = sin.(prev)` | First argument is prepended for broadcast calls as well |
| `somefunc.(x)` | `next = somefunc.(prev, x)` | First argument is prepended for broadcast calls as well |
| `@somemacro` | `next = @somemacro(prev)` | Macro calls without arguments get an argument spliced in |
| `@somemacro(x)` | `next = @somemacro(prev, x)` | First argument splicing is the same as with functions |
| `@somemacro(x, _)` | `next = @somemacro(x, prev)` | Also underscore behavior |

2 comments on commit df30824

@jkrumbiegel

@JuliaRegistrator

Registration pull request created: JuliaRegistries/General/101429

Tip: Release Notes

Did you know you can add release notes too? Just add markdown formatted text underneath the comment after the text
"Release notes:" and it will be added to the registry PR, and if TagBot is installed it will also be added to the
release that TagBot creates. i.e.

@JuliaRegistrator register

Release notes:

## Breaking changes

- blah

To add them here just re-invoke and the PR will be updated.

Tagging

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if the Julia TagBot GitHub Action is installed, or can be done manually through the github interface, or via:

git tag -a v0.6.0 -m "<description of version>" df30824b1db5321ae29c6b2e02ff1147d4ba0fb3
git push origin v0.6.0
