Differences

This shows you the differences between two versions of the page.

lib:parser [2019/01/08 08:37]
sprowell created
lib:parser [2019/01/08 09:42] (current)
sprowell
Line 21: Line 21:
 <code Ada>
 -- The best pseudocode is written in Ada for some reason.
-while peek is a digit loop
+while is_digit(peek) loop
     consume;
 end loop;
Line 27: Line 27:
  
 Composites of these primitives are available to simplify things.  For example,
-the implementation of the above pseudocode might look like the code below, where the integer parsing is broken out into a separate function so we can use the ''?'' operator.
+the implementation of the above pseudocode might look like the code below, which is complicated by the need to have ''peek'' return one of three things: ''Err(err)'' if there is an error, ''Ok(None)'' if we hit the end of file, and ''Ok(Some(ch))'' when there is a next character.
 + 
 +<code Rust> 
 +fn parse_unsigned_integer<R: io::Read>(parser: &mut parser::Parser<R>) -> parser::Result<u64> {
 +    let mut result = String::new();
 +    // ''?'' propagates Err(err); Ok(None) (end of file) ends the loop.
 +    while let Some(ch) = parser.peek()? {
 +        if ch.is_digit(10) {
 +            result.push(ch);
 +            parser.consume();
 +        } else {
 +            break;
 +        }
 +    }
 +    // Convert the collected digits, reporting any failure through the parser.
 +    match result.parse::<u64>() {
 +        Ok(number) => Ok(number),
 +        Err(err) => Err(parser.error(err.to_string())),
 +    }
 +}
 +</code> 
 + 
 +This is terrible.  Fortunately ''Parser'' provides a ''take_while'' method that handles all this for us. 
 + 
 +Here the integer parsing is broken out into a separate function so we can use the ''?'' operator.
  
 <code Rust>
Line 59: Line 81:
 If nothing is entered we get ''console:1:1: cannot parse integer from empty string''.  If we enter a huge number like 99999999999999999999 (that's 20 nines) we get ''console:1:1: number too large to fit in target type''.  If we enter something like ''65fred'' then we get 65, with the parser left pointing at the ''f''.
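
These three outcomes can be reproduced without the ''Parser'' type.  The sketch below is a stand-in, not the library's implementation: ''take_digits'' and ''parse_prefix'' are hypothetical helpers built on ''std::iter::Peekable'', and the error text comes from the standard library's ''str::parse::<u64>''.  The ''console:1:1:'' prefix is presumably added by the parser and is not reproduced here.

```rust
use std::iter::Peekable;
use std::str::Chars;

// Hypothetical stand-in for the parser's take_while: collect the leading
// ASCII digits and leave the first non-digit unconsumed.
fn take_digits(input: &mut Peekable<Chars<'_>>) -> String {
    let mut digits = String::new();
    while let Some(&ch) = input.peek() {
        if !ch.is_ascii_digit() {
            break;
        }
        digits.push(ch);
        input.next();
    }
    digits
}

// Parse an unsigned integer from the front of `text`.  Returns the value (or
// the standard library's error message) and the next unread character.
fn parse_prefix(text: &str) -> (Result<u64, String>, Option<char>) {
    let mut input = text.chars().peekable();
    let digits = take_digits(&mut input);
    let value = digits.parse::<u64>().map_err(|err| err.to_string());
    (value, input.peek().copied())
}
```

With these definitions, ''parse_prefix("65fred")'' gives ''Ok(65)'' with ''f'' as the next unread character, while the empty string and the 20-nines input give the two error messages quoted above.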
  
 +Of course, this probably isn't what we want.  Suppose we have a stream of identifiers and numbers, and we want to parse these.
 +
 +We already know how to parse unsigned integers.  Let's see how to parse identifiers.
 +
 +<code Rust>
 +fn parse_identifier<R: io::Read>(parser: &mut parser::Parser<R>) -> parser::Result<String> {
 +    let mut result = parser.take_while(|ch| ch.is_alphabetic())?;
 +    result = result + parser.take_while(|ch| ch.is_alphanumeric())?.as_str();
 +    Ok(result)
 +}
 +</code>
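
The same rule (a letter, then any run of letters or digits) can be tried outside the parser on a plain string slice.  This is a minimal sketch under that assumption; ''split_identifier'' is a hypothetical helper, not part of the library.  Unlike the function above, which appears to leave the first-character check to its caller, the sketch rejects input that does not start with a letter.

```rust
// Split `text` into a leading identifier and the remaining input.
// Returns None when `text` is empty or does not start with a letter.
fn split_identifier(text: &str) -> Option<(&str, &str)> {
    let first = text.chars().next()?;
    if !first.is_alphabetic() {
        return None;
    }
    // The identifier ends at the first character that is neither a letter
    // nor a digit, or at the end of the input.
    let end = text
        .find(|ch: char| !ch.is_alphanumeric())
        .unwrap_or(text.len());
    Some(text.split_at(end))
}
```

For example, ''split_identifier("x1y2 rest")'' yields the identifier ''x1y2'' and leaves '' rest'' (with its leading space) as the remaining input.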
 +
 +Okay, now let's create a type for our tokens.
 +
 +<code Rust>
 +#[derive(Debug)]
 +enum Thing {
 +    Number(u64),
 +    Id(String),
 +}
 +</code>
 +
 +Now we can parse the sequence, and create a vector of ''Thing'' instances.  Let's do that.  We also add some error handling.
 +
 +<code Rust>
 +fn parse_sequence<R: io::Read>(parser: &mut parser::Parser<R>) -> parser::Result<Vec<Thing>> {
 +    let mut things = Vec::new();
 +    let _ = parser.consume_whitespace();
 +    while !parser.at_eof() {
 +        match parser.peek() {
 +            Ok(Some(ch)) => {
 +                if ch.is_digit(10) {
 +                    things.push(Thing::Number(parse_unsigned_integer(parser)?));
 +                } else if ch.is_alphabetic() {
 +                    things.push(Thing::Id(parse_identifier(parser)?));
 +                } else {
 +                    return Err(parser.unexpected_char("number or identifier", ch));
 +                }
 +            }
 +            Ok(None) => break,
 +            Err(err) => return Err(err),
 +        }
 +        let _ = parser.consume_whitespace();
 +    }
 +    Ok(things)
 +}
 +</code>
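
The dispatch idea in this loop (peek at one character, then pick the sub-parser) can be exercised on its own.  The sketch below is a self-contained stand-in that reads from a ''&str'' instead of a ''Parser''; ''take_while'' and ''tokenize'' are hypothetical helpers, the error type is simplified to ''String'', and ''PartialEq'' is derived on ''Thing'' only so results can be compared.

```rust
use std::iter::Peekable;
use std::str::Chars;

#[derive(Debug, PartialEq)]
enum Thing {
    Number(u64),
    Id(String),
}

// Hypothetical stand-in for Parser::take_while: consume characters while
// `pred` holds and return them as a String.
fn take_while(input: &mut Peekable<Chars<'_>>, pred: impl Fn(char) -> bool) -> String {
    let mut taken = String::new();
    while let Some(ch) = input.next_if(|&c| pred(c)) {
        taken.push(ch);
    }
    taken
}

// Peek at the next character to decide which kind of token to read.
fn tokenize(text: &str) -> Result<Vec<Thing>, String> {
    let mut input = text.chars().peekable();
    let mut things = Vec::new();
    take_while(&mut input, char::is_whitespace);
    while let Some(&ch) = input.peek() {
        if ch.is_ascii_digit() {
            let digits = take_while(&mut input, |c| c.is_ascii_digit());
            let number = digits.parse::<u64>().map_err(|err| err.to_string())?;
            things.push(Thing::Number(number));
        } else if ch.is_alphabetic() {
            things.push(Thing::Id(take_while(&mut input, char::is_alphanumeric)));
        } else {
            return Err(format!("expected number or identifier, found {:?}", ch));
        }
        // Skip the separator before looking at the next token.
        take_while(&mut input, char::is_whitespace);
    }
    Ok(things)
}
```

In this sketch, ''tokenize("  abc 42 x9 ")'' produces ''Id("abc")'', ''Number(42)'', ''Id("x9")'', and ''65fred'' produces two tokens, since nothing requires whitespace between them.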
 +
 +Now we just need a main function to pull this all together.
 +
 +<code Rust>
 +fn main() {
 +    let mut parser = parser::Parser::new("console".to_string(), std::io::stdin());
 +    match parse_sequence(&mut parser) {
 +        Err(err) => {
 +            println!("{}", err);
 +        }
 +        Ok(seq) => {
 +            println!("{:?}", seq);
 +        }
 +    }
 +}
 +</code>
 +
 +That's pretty much it.  If it seems like there's a lot of stuff to build in, that's because there //is//.  Fixing that is the purpose of //macros//.