Add a parser for BUGS #52

Merged
merged 33 commits into from
Aug 7, 2023
Changes from 7 commits
f211466
some prototypes, not working yet
sunxd3 Jul 14, 2023
73c10f0
more functions
sunxd3 Jul 15, 2023
dbe0210
old and new progress
sunxd3 Jul 17, 2023
985a3ed
cleanup
sunxd3 Jul 17, 2023
b03ea66
add `parse` function
sunxd3 Jul 17, 2023
cccf2e3
add some comments
sunxd3 Jul 18, 2023
f46e0a8
move tests to test file, add doc
sunxd3 Jul 23, 2023
8fd97af
Merge remote-tracking branch 'origin/master' into parser
sunxd3 Jul 25, 2023
30c6b11
move `parser.jl` to `src`
sunxd3 Jul 25, 2023
b21a29d
Add type restriction to `_eval`
sunxd3 Jul 25, 2023
d62594f
Multiple fix
sunxd3 Jul 25, 2023
416cc1d
formatting
sunxd3 Jul 25, 2023
14e132e
light format tests, need more and better ones
sunxd3 Jul 25, 2023
eff7b22
formatting related to `_eval`
sunxd3 Jul 25, 2023
9b8a606
Move functions
sunxd3 Jul 26, 2023
e213daa
Add Methadone example, improve to parser
sunxd3 Jul 31, 2023
6ff3e95
add some error cases
sunxd3 Jul 31, 2023
0b7be92
Volume 1-4 from MultiBUGS work with parser
sunxd3 Jul 31, 2023
acab8b6
check Julia keywords
sunxd3 Aug 1, 2023
6cba0a1
improve code doc
sunxd3 Aug 1, 2023
fe2ad9e
Some code improvement and notes on error recovery
sunxd3 Aug 1, 2023
6520dbb
Formatting
sunxd3 Aug 1, 2023
e539d50
fix formatting
yebai Aug 2, 2023
aecbb76
add parser to includes
yebai Aug 2, 2023
83c003b
add comments on errors reporting
sunxd3 Aug 5, 2023
c601039
Merge branch 'parser' of https://github.com/TuringLang/SymbolicPPL.jl…
sunxd3 Aug 5, 2023
58c9ec1
notes on panic mode, rename `@bugsast`
sunxd3 Aug 7, 2023
580906d
Apply suggestions from code review
yebai Aug 7, 2023
ea84593
Fix parser related tests.
sunxd3 Aug 7, 2023
04a4ae6
Merge branch 'parser' of https://github.com/TuringLang/SymbolicPPL.jl…
sunxd3 Aug 7, 2023
095fbfa
Remove unwanted file
sunxd3 Aug 7, 2023
53acef4
formatting
sunxd3 Aug 7, 2023
5315921
Finish renaming `@bugsast` to `@bugs`
sunxd3 Aug 7, 2023
13 changes: 13 additions & 0 deletions docs/src/parser.md
@@ -0,0 +1,13 @@
Strictly speaking, the program is not "parsing", because it does not output a syntax tree.
What the program does is take a token stream and, following a recursive descent structure, check that the BUGS program is correct.
During the recursive descent, BUGS syntax tokens are translated into Julia syntax tokens.
Tokens that are already compatible with Julia are kept; the others are either transformed or removed, and additional tokens may be added.

The parser throws an error when given a program that is not in strict BUGS syntax.
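
To make the keep/transform/remove distinction concrete, here is a minimal sketch. It is not the parser's code: the `REWRITES` table and the `translate` function are hypothetical names introduced only for this illustration, the input is already split into string tokens, and the real implementation decides what to do with each token from its grammatical context rather than from a fixed table.

```julia
# Hypothetical rewrite table (an assumption for illustration only).
# An empty string means "remove the token".
const REWRITES = Dict(
    "model" => "begin",  # transformed
    "{" => "",           # removed
    "}" => "end",        # transformed
    "<-" => "=",         # transformed
)

# Tokens without an entry are already valid Julia and are kept unchanged.
translate(tokens) = join(get(REWRITES, t, t) for t in tokens)

bugs_tokens = ["model", " ", "{", "\n", "  ", "x", " ", "<-", " ", "1", "\n", "}"]
julia_program = translate(bugs_tokens)
```

Here `x`, `1`, and the whitespace tokens pass through untouched, while `model`, `{`, `}`, and `<-` are rewritten or dropped.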

The general idea is:
1. Use `tokenize` to get the token vector.
2. Inspect the tokens and build the Julia version of the program in the form of a vector of tokens.
3. When it is appropriate to do so, just push the token unchanged onto the Julia version of the program vector.
4. At the same time, some errors are detected and the corresponding diagnostics are pushed to the diagnostics vector; some tokens may also be deleted, combined, or replaced.
5. Error recovery is very primitive: the heuristic assumes the user forgot something rather than wrote something wrong. A slightly more sophisticated approach would try two versions: both "discard" and skip.
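
The steps above can be sketched with a toy state machine. This is a simplified stand-in, not the implementation in this PR: `ToyState`, `current`, `translate_assignment`, and the keyword arguments of `expect!` are assumptions made for the example, and tokens are plain strings instead of `JuliaSyntax` tokens.

```julia
# Toy state: input tokens, a cursor, the translated output, and collected diagnostics.
mutable struct ToyState
    tokens::Vector{String}
    index::Int
    output::Vector{String}
    diagnostics::Vector{String}
end

ToyState(tokens) = ToyState(tokens, 1, String[], String[])

# Current token, or "EOF" once the cursor has passed the end.
current(s::ToyState) = s.index > length(s.tokens) ? "EOF" : s.tokens[s.index]

# Copy the current token (or a substitute) to the output and advance.
function consume!(s::ToyState, substitute=nothing)
    push!(s.output, something(substitute, current(s)))
    s.index += 1
    return s
end

# Drop the current token without copying it.
discard!(s::ToyState) = (s.index += 1; s)

# Require `expected`. On a mismatch, record a diagnostic and do NOT advance:
# the heuristic is that the user forgot the token rather than typed a wrong one.
function expect!(s::ToyState, expected; substitute=nothing, keep=true)
    if current(s) != expected
        push!(s.diagnostics, "Expecting '$expected' at token $(s.index)")
    elseif keep
        consume!(s, substitute)
    else
        discard!(s)
    end
    return s
end

# Translate the toy grammar `model { <name> <- <value> }`.
function translate_assignment(tokens)
    s = ToyState(tokens)
    expect!(s, "model"; substitute="begin ")
    expect!(s, "{"; keep=false)
    consume!(s)                         # variable name
    expect!(s, "<-"; substitute=" = ")
    consume!(s)                         # right-hand side (a single token here)
    expect!(s, "}"; substitute=" end")
    return join(s.output), s.diagnostics
end

ok, errs = translate_assignment(["model", "{", "x", "<-", "1", "}"])
bad, bad_errs = translate_assignment(["model", "x", "<-", "1", "}"])
```

Both calls produce the same Julia text. The second also records one diagnostic, because the missing `{` is reported and translation simply continues from the same position.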
321 changes: 321 additions & 0 deletions parser/parser.jl
@@ -0,0 +1,321 @@
using JuliaSyntax
using JuliaSyntax: @K_str, @KSet_str, tokenize, untokenize, Diagnostic

# called `ProcessState` instead of `ParseState` because it's not really "parsing"
mutable struct ProcessState
    token_vec::Vector{Any}
    current_index::Int
    text::String
    julia_token_vec::Vector{Any}
    diagnostics::Vector{Diagnostic}
end

function ProcessState(text::String)
    token_vec = filter(x -> kind(x) != K"error", tokenize(text))
    return ProcessState(token_vec, 1, text, Any[], Diagnostic[])
end

# Push the current token (or `substitute`, if given) to the Julia token vector and advance.
function consume!(ps::ProcessState, substitute=nothing)
    if isnothing(substitute)
        push!(ps.julia_token_vec, ps.token_vec[ps.current_index])
    else
        push!(ps.julia_token_vec, substitute)
    end
    ps.current_index += 1
end

# Advance past the current token without emitting it.
function discard!(ps::ProcessState)
    ps.current_index += 1
end

function add_diagnostic(ps, msg::String)
    # check if the current token is EOF
    if ps.current_index > length(ps.token_vec)
        diagnostic = JuliaSyntax.Diagnostic(0, 0, :error, msg)
        @assert diagnostic ∉ ps.diagnostics
        push!(ps.diagnostics, diagnostic)
        return
    end
    low = first(ps.token_vec[ps.current_index].range)
    high = last(ps.token_vec[ps.current_index].range)
    diagnostic = JuliaSyntax.Diagnostic(low, high, :error, msg)
    if diagnostic in ps.diagnostics # TODO: this check may be too expensive
        error("Encountered a duplicate diagnostic; suspect an infinite loop. Stop and fix this first.")
    end
    push!(ps.diagnostics, diagnostic)
end

# Kind of the n-th upcoming token, or `K"EndMarker"` past the end of the input.
function peek(ps::ProcessState, n=1)
    if ps.current_index + n - 1 > length(ps.token_vec)
        return K"EndMarker"
    end
    return kind(ps.token_vec[ps.current_index + n - 1])
end

# Source text of the n-th upcoming token, or "EOF" past the end of the input.
function peek_raw(ps::ProcessState, n=1)
    if ps.current_index + n - 1 > length(ps.token_vec)
        return "EOF"
    end
    return untokenize(ps.token_vec[ps.current_index + n - 1], ps.text)
end

# Consume whitespace and comments (and newlines, unless `skip_newline` is false).
function process_trivia!(ps::ProcessState, skip_newline=true)
    trivia_kinds = collect(KSet"Whitespace Comment")
    if skip_newline
        push!(trivia_kinds, K"NewlineWs")
    end
    while peek(ps) ∈ trivia_kinds
        consume!(ps)
    end
end

function process_toplevel!(ps::ProcessState)
    expect!(ps, "model", "begin")
    expect_and_discard!(ps, "{")
    process_statements!(ps)
    expect!(ps, "}", "end")
    process_trivia!(ps)
end

function process_statements!(ps::ProcessState)
    process_trivia!(ps)
    while true
        if peek(ps) == K"for"
            process_for!(ps)
        elseif peek(ps) == K"Identifier"
            process_assignment!(ps)
        else
            break
        end
        process_trivia!(ps)
    end
end

function process_assignment!(ps::ProcessState)
    process_variable!(ps)
    process_trivia!(ps)
    # `<-` immediately followed by a minus sign arrives as the single token `<--`;
    # emit `=` and `-` in its place
    if peek(ps) == K"<--"
        discard!(ps)
        push!(ps.julia_token_vec, "=")
        push!(ps.julia_token_vec, "-")
        process_identifier_led_expression!(ps)
        return
    end

    if peek(ps) == K"~"
        consume!(ps)
    elseif peek(ps) == K"<"
        if peek(ps, 2) == K"-"
            discard!(ps)
            discard!(ps)
            push!(ps.julia_token_vec, "=")
        else
            add_diagnostic(ps, "Expecting <-")
        end
    else
        add_diagnostic(ps, "Expecting <- or ~")
    end
    process_expression!(ps)
    process_trivia!(ps) # consume newline or ;
end

function process_for!(ps)
    consume!(ps) # consume the "for"
    expect_and_discard!(ps, "(")

    process_variable!(ps)
    expect!(ps, "in")
    process_range!(ps)
    expect_and_discard!(ps, ")")

    expect_and_discard!(ps, "{")
    process_statements!(ps)
    expect!(ps, "}", "end")
end

function process_range!(ps)
    expect_and_process_atom!(ps)
    expect!(ps, ":")
    expect_and_process_atom!(ps)
end

# numerals or variables
function expect_and_process_atom!(ps)
    process_trivia!(ps)
    if peek(ps) ∈ KSet"Integer Float"
        consume!(ps)
    elseif peek(ps) == K"Identifier"
        process_variable!(ps, false)
    else
        add_diagnostic(ps, "Loop bounds must be numerals or variables")
    end
end

# sub cases
# unary +, -
# function call: f(x+z, y)
# variable: x, a.b
function process_expression!(ps::ProcessState, terminators=KSet"; NewlineWs EndMarker")
    process_trivia!(ps)
    if peek(ps) ∈ KSet"+ -" # only allow a single + or - at the beginning
        consume!(ps)
    end
    process_identifier_led_expression!(ps, terminators)
end

function process_identifier_led_expression!(ps, terminators=KSet"; NewlineWs EndMarker")
    process_trivia!(ps)
    while true
        if peek(ps) ∈ KSet"Integer Float"
            consume!(ps)
        elseif peek(ps) == K"Identifier"
            if peek(ps, 2) == K"("
                consume!(ps) # consume the function name
                consume!(ps) # consume the "("
                process_call_args!(ps)
                expect!(ps, ")")
            else
                process_variable!(ps) # "a.b(args)" falls into this case
                if peek(ps) == K"("
                    process_call_args!(ps)
                    expect!(ps, ")")
                end
            end
        elseif peek(ps) == K"("
            consume!(ps)
            process_expression!(ps, KSet")")
            expect!(ps, ")")
        else
            add_diagnostic(ps, "Expecting variable or parenthesized expressions")
        end
        process_trivia!(ps, false)
        if peek(ps) ∈ terminators
            if peek(ps) == K";" # others will be consumed by process_trivia!
                consume!(ps)
            end
            return
        end
        expect!(ps, ("+", "-", "*", "/", "^"))
        process_trivia!(ps)
    end
end

function process_variable!(ps::ProcessState, allow_indexing=true)
    process_trivia!(ps)

    if peek(ps, 2) ∉ KSet". ["
        consume!(ps)
        return
    end

    if peek(ps, 2) == K"."
        # dotted BUGS names such as `a.b` are emitted as a single Julia identifier var"a.b"
        variable_name_buffer = String[]
        while peek(ps) == K"Identifier"
            push!(variable_name_buffer, peek_raw(ps))
            discard!(ps)
            if peek(ps) != K"."
                break
            end
            push!(variable_name_buffer, ".")
            discard!(ps)
        end

        push!(ps.julia_token_vec, "var\"$(join(variable_name_buffer, ""))\"")
    else
        consume!(ps)
    end

    if !allow_indexing
        return
    end

    if peek(ps) == K"["
        process_indexing!(ps)
    end
end

function process_indexing!(ps::ProcessState)
    expect!(ps, "[")
    process_trivia!(ps)
    # `peek` returns a `Kind`, so it must be compared against K"EndMarker", not a String
    while peek_raw(ps) != "," && peek(ps) != K"EndMarker"
        process_index!(ps)
        process_trivia!(ps)
        if peek_raw(ps) != ","
            break
        end
        expect!(ps, ",")
        process_trivia!(ps)
    end
    expect!(ps, "]")
end

# index can be expression, or range
function process_index!(ps)
    process_expression!(ps, KSet": , ]")
    if peek_raw(ps) == ":"
        consume!(ps)
        process_expression!(ps, KSet", ]")
    end
end

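function process_call_args!(ps)
    process_trivia!(ps)
    # `peek` returns a `Kind`; comparing it with a String is always unequal,
    # so compare against kinds instead
    while peek(ps) != K"," && peek(ps) != K"EndMarker"
        process_expression!(ps, KSet", ) EndMarker")
        if peek(ps) == K")"
            break
        end
        expect!(ps, ",")
        process_trivia!(ps)
    end
end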

function expect!(ps::ProcessState, expected::String, substitute=nothing)
    process_trivia!(ps)
    if peek_raw(ps) != expected
        add_diagnostic(ps, "Expecting '$expected'")
    else
        consume!(ps, substitute)
    end
end

function expect!(ps::ProcessState, expected::Tuple, substitute=nothing)
    process_trivia!(ps)
    if peek_raw(ps) ∉ expected
        add_diagnostic(ps, "Expecting '$expected'")
    else
        consume!(ps, substitute)
    end
end

function expect_and_discard!(ps::ProcessState, expected::String)
    process_trivia!(ps)
    if peek_raw(ps) != expected
        add_diagnostic(ps, "Expecting '$expected'")
    else
        discard!(ps)
    end
end

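function to_julia_program(julia_token_vec, text)
    program = ""
    for t in julia_token_vec
        if t isa String
            # substituted or inserted tokens are stored as plain strings
            program *= t
        else
            # original tokens are recovered from the source text
            str = untokenize(t, text)
            program *= str
        end
    end
    return program
end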

function parse(prog::String)
    ps = ProcessState(prog)
    process_toplevel!(ps)
    if !isempty(ps.diagnostics)
        io = IOBuffer()
        JuliaSyntax.show_diagnostics(io, ps.diagnostics, ps.text)
        error("Errors in the program: \n $(String(take!(io)))")
    end
    return JuliaSyntax.parseall(JuliaSyntax.SyntaxNode, to_julia_program(ps.julia_token_vec, ps.text))
end