====== Building Parsers ======
The ''relision::parser'' module provides very, //very//, //**very**// simple parser primitives for building recursive descent parsers.
The module provides three key structs and one specialized type.
* ''ParseError'' encapsulates a parse error
* ''Loc'' represents a location in the parse (line and column)
* ''Parser'' provides parsing services around some stream
* ''Result'' specializes the more general result type to parsers
===== Parser Primitives =====
There are two basic parser primitives.
* You can **peek** at upcoming characters
* You can **consume** characters
A simple application of this is recognizing an integer. Here's some pseudocode.
-- The best pseudocode is written in Ada for some reason.
while is_digit(peek) loop
consume;
end loop;
Composites of these primitives are available to simplify things. For example,
the implementation of the above pseudocode might look like the code below, which is complicated by the need to have ''peek'' return one of three things: ''Err(err)'' if there is an error, ''Ok(None)'' if we hit the end of file, and ''Ok(Some(ch))'' when there is a next character.
fn parse_unsigned_integer(parser: &mut parser::Parser) -> parser::Result {
let mut result = String::new();
while let Some(ch) = parser.peek()? {
if ch.is_digit(10) {
result.push(ch);
parser.consume();
} else {
break;
}
}
match result.parse::() {
Ok(number) => Ok(number),
Err(err) => Err(parser.error(err.to_string())),
}
}
This is terrible. Fortunately ''Parser'' provides a ''take_while'' method that handles all this for us.
Here the integer parsing is broken out into a separate function so we can use the ''?'' operator.
use std::io;
use relision::parser;
fn parse_unsigned_integer(parser: &mut parser::Parser) -> parser::Result {
let result = parser.take_while(|ch| ch.is_digit(10))?;
match result.parse::() {
Ok(number) => Ok(number),
Err(err) => Err(parser.error(err.to_string())),
}
}
fn main() {
// Make a new parser and wrap the standard input.
let mut parser = parser::Parser::new("console".to_string(), std::io::stdin());
match parse_unsigned_integer(&mut parser) {
Err(err) => {
println!("{}", err);
}
Ok(value) => {
println!("{}", value);
}
}
}
Here the ''take_while'' peeks at and consumes the next character while it is a digit. It works pretty well.
If nothing is entered we get ''console:1:1: cannot parse integer from empty string''. If we enter a huge number like 99999999999999999999 (that's 20 nines) we get ''console:1:1: number too large to fit in target type''. If we enter something like ''65fred'' then we get 65, with the parser left pointing at the ''f''.
Of course, this probably isn't what we want. Suppose we have a stream of identifiers and numbers, and we want to parse these.
We already know how to parse unsigned integers. Let's see how to parse identifiers.
fn parse_identifier(parser: &mut parser::Parser) -> parser::Result {
let mut result = parser.take_while(|ch| ch.is_alphabetic())?;
result = result + parser.take_while(|ch| ch.is_alphanumeric())?.as_str();
Ok(result)
}
Okay, now let's create a type for our tokens.
#[derive(Debug)]
enum Thing {
Number(u64),
Id(String),
}
Now we can parse the sequence, and create a vector of ''Thing'' instances. Let's do that. We also add some error handling.
fn parse_sequence(parser: &mut parser::Parser) -> parser::Result> {
let mut things = Vec::new();
let _ = parser.consume_whitespace();
while !parser.at_eof() {
match parser.peek() {
Ok(Some(ch)) => {
if ch.is_digit(10) {
things.push(Thing::Number(parse_unsigned_integer(parser)?));
} else if ch.is_alphabetic() {
things.push(Thing::Id(parse_identifier(parser)?));
} else {
return Err(parser.unexpected_char("number or identifier", ch));
}
}
Ok(None) => break,
Err(err) => return Err(err),
}
let _ = parser.consume_whitespace();
}
Ok(things)
}
Now we just need a main function to pull this all together.
fn main() {
let mut parser = parser::Parser::new("console".to_string(), std::io::stdin());
match parse_sequence(&mut parser) {
Err(err) => {
println!("{}", err);
}
Ok(seq) => {
println!("{:?}", seq);
}
}
}
That's pretty much it. If it seems like there's a lot of stuff to build in, that's because there //is//. Fixing that is the purpose of //macros//.