parslet and json 2011-05-11 home comments

Parslet is a small Ruby library for constructing parsers based on Parsing Expression Grammars (PEG). It’s written by Kaspar Schiess and various contributors.

This blog post introduces Parslet with a parser example. Since JSON has very easy to grasp railroad diagrams for its syntax, it might make for a good example.

Please note that the JSON parser here won’t compete for speed with available libraries. No benchmarks here.

Our goal is to take as input JSON strings and output the resulting value.

For the impatient, the end result is at https://gist.github.com/966020

How is an array encoded in JSON ?

How would that look in our parser ?

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
class Parser < Parslet::Parser

  rule(:spaces) { match('\s').repeat(1) }
    # at least 1 space character (space, tab, new line, carriage return)

  rule(:spaces?) { spaces.maybe }
    # a bunch of spaces or not

  rule(:comma) { spaces? >> str(',') >> spaces? }
    # a comma surrounded by optional spaces

  rule(:array) {
    str('[') >> spaces? >>
    (value >> (comma >> value).repeat).maybe.as(:array) >>
    spaces? >> str(']')
  }
end

What is this value thing ?

string or number or object or …

1
2
3
4
5
6
rule(:value) {
  string | number |
  object | array |
  str('true').as(:true) | str('false').as(:false) |
  str('null').as(:null)
}

All is good, a few parsing rules laters, we have a complete JSON parser, but wait, what does it output ?

1
2
3
4
5
6
p MyJson::Parser.new.parse(%{
  [ 1, 2, 3, null,
    "asdfasdf asdfds", { "a": -1.2 }, { "b": true, "c": false },
    0.1e24, true, false, [ 1 ] ]
})
# => {:array=>[{:number=>"1"@5}, {:number=>"2"@8}, {:number=>"3"@11}, {:null=>"null"@14}, {:string=>"asdfasdf asdfds"@25}, {:object=>{:entry=>{:val=>{:number=>"-1.2"@50}, :key=>{:string=>"a"@46}}}}, {:object=>[{:entry=>{:val=>{:true=>"true"@65}, :key=>{:string=>"b"@61}}}, {:entry=>{:val=>{:false=>"false"@76}, :key=>{:string=>"c"@72}}}]}, {:number=>"0.1e24"@89}, {:true=>"true"@97}, {:false=>"false"@103}, {:array=>{:number=>"1"@112}}]}

Oh well, that is not exactly what we want as final result. Parslet calls the output of its parser a “intermediate tree”. It separates parsing from transformation.

We need a transformer and it looks like :

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
class Transformer < Parslet::Transform

  class Entry < Struct.new(:key, :val); end

  rule(:array => subtree(:ar)) {
    ar.is_a?(Array) ? ar : [ ar ]
  }
  rule(:object => subtree(:ob)) {
    (ob.is_a?(Array) ? ob : [ ob ]).inject({}) { |h, e| h[e.key] = e.val; h }
  }

  rule(:entry => { :key => simple(:ke), :val => simple(:va) }) {
    Entry.new(ke, va)
  }

  rule(:string => simple(:st)) {
    st.to_s
  }
  rule(:number => simple(:nb)) {
    nb.match(/[eE\.]/) ? Float(nb) : Integer(nb)
  }

  rule(:null => simple(:nu)) { nil }
  rule(:true => simple(:tr)) { true }
  rule(:false => simple(:fa)) { false }
end

Patterns in the intermediate tree are indentified and replaced, producing a final output (or yet another intermediate result, it’s up to you).

The complete parser (and transformer and small test) is at https://gist.github.com/966020

There isn’t much more I could say. Ah yes, about testing. Kaspar explains it in the tricks, you can directly test parsing rules individually :

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
class MyJsonTest < Test::Unit::TestCase
  def parser
    MyJson::Parser.new
  end
  def test_parser_number_integer
    assert_equal 1, parser.number("1")
  end
  def test_parser_number_float
    assert_equal 1.0, parser.number("1.0")
  end
  def test_parser_number_not_a_number
    assert_raise Parslet::ParseFailed do
      parser.number("whatever")
    end
  end
end

Happy parsing (and transforming) !

No animals got benchmarked during the making of this blog post.

© 2011-2013 John Mettraux