Split A String On A Separator With Escapes In Rust With nom

-- warning

These are getting-started notes.

There's an issue with other escape characters being
picked up that needs to be addressed.

Look at the link section of the snippet.rs file
for Neopolitan for an example of what I ended up with.


-- todo

[] Look at this code to see if it can do what you need directly

-- code
-- rust

// Reads one field up to an unescaped `|`, turning each `\|` into a plain `|`.
escaped_transform(none_of("\\|"), '\\', value("|", tag("|"))),




-- h2


Original Notes To Review


-- note

This works, but the above might be a simpler approach


-- p

This is what I'm using to parse out strings that use `|``
characters as separators while allowing them to be escaped
with `\\``


-- code
-- rust

use nom::branch::alt;
use nom::bytes::complete::escaped_transform;
use nom::bytes::complete::tag;
use nom::bytes::complete::take_until;
use nom::character::complete::none_of;
use nom::combinator::eof;
use nom::combinator::rest;
use nom::combinator::value;
use nom::multi::many_till;
use nom::sequence::tuple;
use nom::IResult;
use nom::Parser;

fn main() {
    test1();
    test2();
    println!("done");
}

fn test1() {
    let source = "Lift|the|stone|up|high";
    let expected = vec!["Lift", "the", "stone", "up", "high"];
    let result = split_on_separator_with_escapes(source, "|");
    assert_eq!(expected, result.unwrap().1);
}

fn test2() {
    let source = "Dip\\|the|pail|in\\|the|water";
    let expected = vec!["Dip|the", "pail", "in|the", "water"];
    let result = split_on_separator_with_escapes(source, "|");
    assert_eq!(expected, result.unwrap().1);
}

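fn split_on_separator_with_escapes<'a>(
    source: &'a str,
    separator: &'a str,
) -> IResult<&'a str, Vec<String>> {
    // Characters that end a plain run of text: the backslash and the separator.
    let mut separator_with_escape = String::from("\\");
    separator_with_escape.push_str(separator);
    // Keep collecting items until the end of the input.
    let (_, items) = many_till(
        alt((
            // An item with escaped separators resolved, ended by a separator
            // or by the end of the input (so escapes in the last item work too).
            tuple((
                escaped_transform(
                    none_of(separator_with_escape.as_str()),
                    '\\',
                    value(separator, tag(separator)),
                ),
                alt((tag(separator), eof)),
            ))
            .map(|x| x.0.to_string()),
            // Fallback when the first branch fails (an empty item, or a backslash
            // that isn't followed by the separator): take the text up to the
            // next separator as-is.
            tuple((take_until(separator), tag(separator))).map(|x: (&str, &str)| x.0.to_string()),
            // No separator remains, so the rest of the input is the last item.
            rest.map(|x: &str| x.to_string()),
        )),
        eof,
    )(source)?;
    // many_till returns (items, eof output); only the items are needed.
    Ok(("", items.0))
}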


-- p

Seems like there's probably a more efficient way to do it.
I got this working though, so I'm rolling with it.
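
For comparison, here's a rough sketch of the same split done by hand with only the standard library, no nom involved. It's not the approach used above, and the function name split_with_escapes and its parameters are made up for this example. It assumes the separator and the escape are single characters, and that a backslash only counts as an escape when the separator comes right after it. Any other backslash is copied through untouched, which is one possible way to keep other escape sequences from being picked up.

```rust
/// Hypothetical helper (not part of nom or Neopolitan): split `source` on
/// `separator`, treating `escape` followed by `separator` as a literal separator.
/// Assumption: any other use of `escape` is left in the output unchanged.
fn split_with_escapes(source: &str, separator: char, escape: char) -> Vec<String> {
    let mut items = Vec::new();
    let mut current = String::new();
    let mut chars = source.chars().peekable();

    while let Some(c) = chars.next() {
        if c == escape && chars.peek() == Some(&separator) {
            // Escaped separator: drop the escape and keep the separator.
            chars.next();
            current.push(separator);
        } else if c == separator {
            // Unescaped separator: the current item is finished.
            items.push(std::mem::take(&mut current));
        } else {
            // Everything else, including other backslash sequences, is kept as-is.
            current.push(c);
        }
    }

    // Whatever follows the last separator is the final item.
    items.push(current);
    items
}
```

Calling it with the second test string from above, '|' as the separator, and '\\' as the escape gives back the same four items the nom version expects. Two things to be aware of with this sketch: an empty input returns a single empty item, and a trailing separator adds an empty item at the end.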


-- ref
-- id: nom
-- title: The nom Parser Combinator Library 
-- url: https://github.com/rust-bakery/nom

"Eating data byte by byte". This is the Rust library I'm using 
to process my Neopolitan documents for my site.