Parsers - ironical situation

Table of Contents

Why do we need parsers? #

The first obvious use case is computer languages. (I use the term computer language because not everybody considers HTML a programming language). This includes:

compilers
interpreters
syntax highlighting
static analysis tools, like linters and language servers

But also:

network protocols
serialization and deserialization
configuration files
command-line arguments
natural language processing
query languages (search box on the website)
validation, like email validation

The irony #

Parser generators have been studied since the ’60s. In practice, this knowledge is mostly used for compilers and interpreters. I guess this is because the subject typically introduced in course about compilers, for example cs143, 15-411/611, CS-321, 322, COP5621.

For other tasks, people typically use ad-hoc solutions and regular expressions. The obvious problem is that ad-hoc solutions are error-prone and harder to maintain.

Regular expressions (I’m talking about the first implementation - grep; not about the formal system by Kleene) were created to be used within the command-line. The command-line interface typically requires one-liners, hence terse notation (though it is possible to open editor and pipe text from the buffer).

I don’t think that author meant it to be used to validate emails, for example, see this answer on StackOverflow.

Two potentially better solutions:

rfc5322 provides formal grammar (ABNF), so it should be possible to generate a parser
Use something like Rosie - the notation is not so terse. Rosie uses PEG, which was introduced only in 2004.

Second example of ad-hoc solution is syntax highlighting with regular expressions, for example TextMatte language grammars. Compare ad-hoc solution (on the left) and real context-free language parser (on the right):