Rapid Syntax Prototyping Tool

Table of Contents

I want to develop alternative syntax for Lisp. My idea is to write several possible syntaxes and see if I like them or not.

Step 1 #

I imagine it like this:

write some BNF-like syntax description
generate a parser that will return S-expressions
apply transformations similar to nanopass to get correct Lisp S-expressions
- For example, transform (a + b) to (+ a b)

Parser + transformations give me a “reader”, which I can put in my implementation of Lisp and try to run some examples.

To make it a faster process I want to:

reuse “transformers” - which should be trivial, because they are pure functions
reuse patterns in BNF similar to predefined patterns in Rosie

Step 2 #

I wonder if it is possible to generate pretty printer based on grammar and transformations. If it would be possible I can get a set of Lisp programs and generate programs in new syntax to see if it looks good.

Can we inverse transformers? Related: Bidirectionalization Transformation Based onAutomatic Derivation of View Complement Functions, Combining Syntactic and Semantic Bidirectionalization, relational programming.

Step 3 #

Given grammar and some additional annotations, it should be trivial to generate syntax highligter (inspired by Rosie).

Implementation #

I need to chose parser generator. PEG parser seems like the best candidate for my purpose (Rosie uses it).

Potential issues with PEG:

it is linear in time but needs more memory than others. I don’t think this is important for an experiment
- Possible solution: Packrat Parsing with Elastic Sliding Window, Kimio Kuramitsu, 2015
It doesn’t support left recursion
- Possible solution: Direct Left-Recursive Parsing Expression Grammars, Laurence Tratt, 2010, Pika parsing: reformulating packrat parsing as a dynamic programming algorithm solves the left recursion and error recovery problems, 2020
BNF is non-deterministic, but PEG is deterministic. So if I would naively translate BNF to PEG, there is a chance I would get the wrong parser. Is there a way to check that the two definitions are equivalent? Maybe swap the order of rules in “prioritized choice” and see if it is different grammar or not? If BNF defines an unambiguous language swapping choices should not affect. If there is only one “catch them all” rule it should go last in “prioritized choice”.
- Maybe use something like model checker to generate sentences or counterexamples?
Out of the box it doesn’t have good error messages (because of backtracking)
- Possible solution: Syntax Error Recovery in Parsing Expression Grammars, Automatic Syntax Error Reporting and Recovery in Parsing Expression Grammars.
Not a PEG problem, but if the parser will generate S-expressions instead of AST (with meta-information about line and column) and later they will be transformed it will be hard to find the source of runtime error. Maybe it would be possible to generate something like sourcemaps

Other ideas:

print decision tree (I guess it is more like a graph, rather than tree). Is this useful: good decision trees or lazy decision trees?
print stack traces, like Rosie or arborist (Arborist::GlobalDebug.enable!)
visualise parsing
Implementation of Tamias to Check Production Rules for Parsing Expression Grammar
- PetitParser: Building Modular Parsers
NezCC: A Cross-Language PEG Parser Generator, 2017
- A Declarative Extension of Parsing Expression Grammarsfor Recognizing Most Programming Languages, 2015
Add optional semantic information in grammar, for exmaple
- types: number, string etc.
- specific syntax constructs: matching brackets, comments, like in comby
Macros for Domain-Specific Languages

Tutorials for PEG parsers #

Grammars corpus #

Another approach to finding “ideal” grammar is to take a look at existing grammars and see what good and bad parts they have - to learn from their experience:

Zipf’s law, in probability, assertion that the frequencies f of certain events are inversely proportional to their rank r. The law was originally proposed by American linguist George Kingsley Zipf (1902–50) for the frequency of usage of different words in the English language; this frequency is given approximately by f(r) ≅ 0.1/r. Thus, the most common word (rank 1) in English, which is the, occurs about one-tenth of the time in a typical text; the next most common word (rank 2), which is of, occurs about one-twentieth of the time; and so forth. Another way of looking at this is that a rank r word occurs 1/r times as often as the most frequent word, so the rank 2 word occurs half as often as the rank 1 word, the rank 3 word one-third as often, the rank 4 word one-fourth as often, and so forth. Beyond about rank 1,000, the law completely breaks down.
– britannica

Is it possible to construct some common corpus of syntactic constructs (assume they follow Zipf’s law) and validate new syntax against those cases?