
Neopolitan Parser Grammar

This is where I'm working on the parser grammar that defines the AST for the Neopolitan LSP and Tree-Sitter parsers. (I built the original parser before learning how to define this grammar. Any differences will be normalized to match this grammar moving forward.)

**Primitives**

These are all the things that don't call another item. They're the pieces used to assemble the full items. They'll be created in the scanner.cc file.

[] ATTR_BOOL_VALUE -> not([' ' | '\t' | '\n'])

[] ATTR_DASHES -> ['--']

[] ATTR_KV_KEY -> not([':' | ' ' | '\t' | '\n']+)

[] ATTR_KV_SEPARATOR -> [':']

[] ATTR_KV_VALUE -> /[^\n]+/

[in progress] CONTAINER_TOKEN -> ['/']

[] EOF -> eof()

[] HTML_BODY_FOR_BASIC_SECTION -> [any_char]+ + lookahead(not(['\n'] + ['--']))

[x] LINE_ENDING -> [' ' | '\t']* + ['\n']

[] LINE_REMAINDER -> [any_char]+ + not([' '])* + ['\n' | eof]

[] NB_WHITESPACE -> [' ' | '\t']+

[x] SECTION_DASHES -> ['--']

[] SINGLE_CHARACTER_WORD -> [any_char] + lookahead(not([' ' | '\t' | '\n']))

[x] SINGLE_SPACE -> [' ']

[] TODO_BRACKET_END -> [']']

[] TODO_BRACKET_START -> ['[']
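
For experimenting with the rules outside of tree-sitter, a few of the primitives above can be approximated with regular expressions. This is only a rough Python sketch, not the scanner.cc implementation. The `PRIMITIVES` table and the `match_primitive` helper are hypothetical names made up for the example.

```python
import re

# Hypothetical regex approximations of a few primitives from the list above.
PRIMITIVES = {
    "ATTR_KV_KEY": re.compile(r"[^: \t\n]+"),
    "ATTR_KV_SEPARATOR": re.compile(r":"),
    "ATTR_KV_VALUE": re.compile(r"[^\n]+"),
    "LINE_ENDING": re.compile(r"[ \t]*\n"),
    "NB_WHITESPACE": re.compile(r"[ \t]+"),
    "SECTION_DASHES": re.compile(r"--"),
    "SINGLE_SPACE": re.compile(r" "),
}


def match_primitive(name, text, pos=0):
    """Return the text matched by primitive `name` at `pos`, or None."""
    m = PRIMITIVES[name].match(text, pos)
    return m.group(0) if m else None
```

Calling `match_primitive("ATTR_KV_KEY", "class: red")` returns the key up to, but not including, the separator.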

**SECTION TOKENS**

These tokens are used for basic sections as well as container sections. The container section start and end triggers are assembled in a later step.

[x] CODE_TOKEN -> ['code']

[x] HTML_TOKEN -> ['html']

[x] LIST_TOKEN -> ['list']

[x] P_TOKEN -> ['p']

[x] TITLE_TOKEN -> ['title']

[x] TODO_TOKEN -> ['todo']

**SECTION HEADERS**

There are "SECTION" and "CONTAINER" templates. Some section types have one, some have the other, and some have both.

[] TITLE_SECTION_TRIGGER -> SECTION_DASHES + NB_WHITESPACE + TITLE_TOKEN + LINE_ENDING

[] HTML_CONTAINER_START_TRIGGER -> SECTION_DASHES + NB_WHITESPACE + HTML_TOKEN + LINE_ENDING

[] HTML_CONTAINER_END_TRIGGER -> SECTION_DASHES + NB_WHITESPACE + HTML_TOKEN + LINE_ENDING

[] HTML_SECTION_TRIGGER -> SECTION_DASHES + NB_WHITESPACE + HTML_TOKEN + LINE_ENDING
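
Since every section trigger above has the same shape, one way to picture the assembly is a small builder that joins the primitives around a token. A minimal Python sketch, assuming simple regex forms for the primitives. The `section_trigger` helper is a hypothetical name and not part of the grammar.

```python
import re

# Regex fragments standing in for the primitives and tokens (assumed forms).
SECTION_DASHES = r"--"
NB_WHITESPACE = r"[ \t]+"
LINE_ENDING = r"[ \t]*\n"
TITLE_TOKEN = r"title"
HTML_TOKEN = r"html"


def section_trigger(token):
    """Build SECTION_DASHES + NB_WHITESPACE + <token> + LINE_ENDING."""
    return re.compile(SECTION_DASHES + NB_WHITESPACE + token + LINE_ENDING)


TITLE_SECTION_TRIGGER = section_trigger(TITLE_TOKEN)
HTML_SECTION_TRIGGER = section_trigger(HTML_TOKEN)
```

With these, `TITLE_SECTION_TRIGGER.match("-- title\n")` succeeds, while a line like `--title` (no whitespace after the dashes) does not match.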

**Full Items**

These are the things that are made from primitives, other full items, or both. They'll be assembled in the tree-sitter grammar.js file.

[] ATTRIBUTE -> BOOLEAN_ATTRIBUTE | KEY_VALUE_ATTRIBUTE

[] BOOLEAN_ATTRIBUTE -> DASHES + not[':'] + SPACE0 + SINGLE_NEWLINE

[] KEY_VALUE_ATTRIBUTE -> DASHES + not[':'] + ':' + any + SPACE0 + SINGLE_NEWLINE

[] PARAGRAPH -> PARAGRAPH_FIRST_WORD + WORD_BREAK + PARAGRAPH_BODY + EMPTY_LINE

note: this is for multi word paragraphs. single word paragraphs will
be addressed

[] PARAGRAPH_BODY -> (WORD, sep_by, WORD_BREAK)

note: this is done a little differently in tree-sitter
since there isn't really a sep_by pattern without
regex, which I haven't gotten into yet
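
One common way to get a sep_by pattern without a dedicated combinator is to rewrite it as one item followed by zero or more (separator, item) pairs. Here is a minimal Python sketch of that rewrite. The WORD and WORD_BREAK patterns are simplified stand-ins and not the final rules, and the `sep_by` helper is a hypothetical name.

```python
import re

# Simplified stand-ins: a WORD is a run of non-whitespace characters and a
# WORD_BREAK is a run of spaces/tabs or a single newline.
WORD = r"[^ \t\n]+"
WORD_BREAK = r"(?:[ \t]+|\n)"


def sep_by(item, separator):
    """Expand sep_by(item, separator) into: item (separator item)*"""
    return f"{item}(?:{separator}{item})*"


PARAGRAPH_BODY = re.compile(sep_by(WORD, WORD_BREAK))
```

Because the stand-in WORD_BREAK allows only a single newline, the match stops at an empty line. The same shape should carry over to grammar.js by nesting tree-sitter's `seq` and `repeat` functions, though I haven't tested that here.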

[] PARAGRAPH_FIRST_WORD -> INLINE_TAG | WORD_WITHOUT_LEADING_DASH | WORD_WITH_ONE_LEADING_DASH

[] WORD_WITH_ONE_LEADING_DASH -> '-' + NB_WHITESPACE + WORD

[] WORD_BREAK -> [NB_WHITESPACE | SINGLE_NEWLINE]+ + lookahead(not(LINE_ENDING))

[] INLINE_TAG -> LINK | SPAN | etc...

[] LIST_ITEM -> "-" + NB_WHITESPACE + PARAGRAPH_BODY

[] LINE_ENDING_OR_EOF -> [NB_WHITESPACE + NEWLINE] | EOF

[] LINK -> tktktkt

[] SPAN -> tktktktk

[] WORD_WITHOUT_LEADING_DASH -> not['-'] + WORD

[] WORD -> not['<' + lookahead(not['<'])] + not(NB_WHITESPACE | LINE_ENDING)

[deprecated?] INITIAL_WORD_CHARS -> NON_LT_CHAR | LT_WITH_NON_LT_CHAR

[deprecated?] LT_WITH_NON_LT_CHAR -> '<' + NON_LT_CHAR

[deprecated?] NON_LT_CHAR -> none_of['< \n\t']

[deprecated?] FOLLOWING_WORD_CHARS -> none_of[' \n\t']

[] ATTR_KV_PAIR -> ATTR_KEY + ATTR_KV_SEPARATOR + NB_WHITESPACE + ATTR_VALUE

[] ATTR -> ATTR_DASHES + [KEY_VALUE_ATTR | BOOLEAN_ATTR] + NEWLINE
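
To make the attribute rules a bit more concrete, here is a rough Python sketch that classifies a single attribute line as either a key/value pair or a boolean flag. The regexes are my approximation of the rules above. The optional whitespace after the dashes is an assumption, since the rules don't spell it out, and `parse_attr` is a hypothetical helper name.

```python
import re

# Assumed regex forms of the attribute rules. The optional whitespace after
# the dashes is an assumption that the rules above don't spell out.
ATTR_KV = re.compile(r"--[ \t]*([^: \t\n]+):[ \t]+([^\n]+)\n")
ATTR_BOOL = re.compile(r"--[ \t]*([^: \t\n]+)[ \t]*\n")


def parse_attr(line):
    """Classify one attribute line as a key/value pair or a boolean flag."""
    m = ATTR_KV.fullmatch(line)
    if m:
        # Trailing spaces are matched as part of the value, so trim them.
        return ("key_value", m.group(1), m.group(2).rstrip())
    m = ATTR_BOOL.fullmatch(line)
    if m:
        return ("boolean", m.group(1))
    return None
```

The key/value form is tried first so that a line containing the separator is never read as a boolean attribute.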

Code

CMD -> '!bear' ' ' BEAR_COMMAND_LIST
BEAR_COMMAND_LIST -> BEAR_COMMAND | BEAR_COMMAND ',' OPTIONAL_WHITESPACE BEAR_COMMAND_LIST
BEAR_COMMAND -> OP ':' OPTIONAL_WHITESPACE COLOR
OPTIONAL_WHITESPACE -> NOTHING | ' ' OPTIONAL_WHITESPACE
OP -> 'head' | 'body' | 'eyes'
COLOR -> HEXCOLOR | COLORNAME
HEXCOLOR -> '#' HEXDIGIT HEXDIGIT HEXDIGIT HEXDIGIT HEXDIGIT HEXDIGIT
COLORNAME -> 'red' | 'green' | 'blue' | 'white' | ...
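
The grammar above is small enough to parse by hand, which makes it a handy example of how production rules map to functions. This is a minimal Python sketch with one function per rule. The function names are hypothetical, HEXDIGIT is assumed to mean 0-9 and a-f in either case, and only the color names actually listed are accepted since the rest are elided.

```python
import re

OPS = ("head", "body", "eyes")
# Only the color names spelled out in the grammar; the "..." is left open.
COLOR_NAMES = ("red", "green", "blue", "white")
# Assumes HEXDIGIT means 0-9 and a-f in either case.
HEXCOLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def parse_color(text):
    """COLOR -> HEXCOLOR | COLORNAME"""
    if HEXCOLOR.fullmatch(text) or text in COLOR_NAMES:
        return text
    raise ValueError(f"bad color: {text!r}")


def parse_bear_command(text):
    """BEAR_COMMAND -> OP ':' OPTIONAL_WHITESPACE COLOR"""
    op, sep, rest = text.partition(":")
    if not sep or op not in OPS:
        raise ValueError(f"bad command: {text!r}")
    return (op, parse_color(rest.lstrip(" ")))


def parse_cmd(text):
    """CMD -> '!bear' ' ' BEAR_COMMAND_LIST"""
    prefix = "!bear "
    if not text.startswith(prefix):
        raise ValueError("expected '!bear ' prefix")
    commands = []
    for i, part in enumerate(text[len(prefix):].split(",")):
        # OPTIONAL_WHITESPACE is only allowed after a comma, so the first
        # command is passed through without stripping.
        commands.append(parse_bear_command(part.lstrip(" ") if i else part))
    return commands
```

For example, `parse_cmd("!bear head:red, eyes: #00ff00")` yields a list of (op, color) pairs, and anything that doesn't fit the rules raises a `ValueError`.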
