Neopolitan Parser Grammar

Note

All of this is in pre tags to avoid having to escape all the pipes.

This is where I'm working on the parser grammar that defines the AST for the Neopolitan LSP and Tree-sitter parsers. (I built the original parser before learning how to define a grammar like this. Any differences will be normalized to match this grammar moving forward.)

Notes

This is kind of a scratch pad since it's covering a few different things. It's not a direct mapping right now because the Tree-sitter parser involves regex and C, and I haven't consolidated those pieces yet.
The current todo check marks track adding each item to the Tree-sitter parser.
Items marked with 'ts' have been handled in Tree-sitter.

**Primitives**

These are the items that don't reference any other item. They're the pieces used to assemble the full items, and they'll be created in the scanner.cc file.

[] ATTR_BOOL_VALUE -> not([' ' | '\t' | '\n'])+

[] ATTR_DASHES -> ['--']

[] ATTR_KV_KEY -> not([':' | ' ' | '\t' | '\n'])+

[] ATTR_KV_SEPARATOR -> [':']

[] ATTR_KV_VALUE -> /[^\n]+/

[in progress] CONTAINER_TOKEN -> ['/']

[] EOF -> eof()

[] HTML_BODY_FOR_BASIC_SECTION -> [any_char]+ + lookahead(not(['\n'] + ['--']))

[x] LINE_ENDING -> [' ' | '\t']* + ['\n']

[] LINE_REMAINDER -> [any_char]+ + not([' '])* + ['\n' | eof]

[] NB_WHITESPACE -> [' ' | '\t']+

[x] SECTION_DASHES -> ['--']

[] SINGLE_CHARACTER_WORD -> [any_char] + lookahead(not([' ' | '\t' | '\n']))

[x] SINGLE_SPACE -> [' ']

[] TODO_BRACKET_END -> [']']

[] TODO_BRACKET_START -> ['[']
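Before porting the primitives to scanner.cc, it can help to sanity-check the simpler ones as regular expressions. This is a Python sketch, not the C scanner: it covers only the primitives that have a direct regex equivalent, reads ATTR_KV_KEY as one or more characters that aren't ':' or whitespace, and uses a hypothetical `PRIMITIVES` table and `match_primitive` helper that aren't part of any existing Neopolitan code.

```python
import re
from typing import Optional

# Hypothetical regex translations of the primitives that map directly onto regexes.
# Lookahead-based primitives (HTML_BODY_FOR_BASIC_SECTION, SINGLE_CHARACTER_WORD, ...)
# are intentionally left out.
PRIMITIVES = {
    "ATTR_DASHES": re.compile(r"--"),
    "ATTR_KV_KEY": re.compile(r"[^: \t\n]+"),
    "ATTR_KV_SEPARATOR": re.compile(r":"),
    "ATTR_KV_VALUE": re.compile(r"[^\n]+"),
    "CONTAINER_TOKEN": re.compile(r"/"),
    "LINE_ENDING": re.compile(r"[ \t]*\n"),
    "NB_WHITESPACE": re.compile(r"[ \t]+"),
    "SECTION_DASHES": re.compile(r"--"),
    "SINGLE_SPACE": re.compile(r" "),
    "TODO_BRACKET_END": re.compile(r"\]"),
    "TODO_BRACKET_START": re.compile(r"\["),
}


def match_primitive(name: str, text: str, pos: int = 0) -> Optional[str]:
    """Return the text that primitive `name` matches starting exactly at `pos`, or None."""
    m = PRIMITIVES[name].match(text, pos)
    return m.group(0) if m else None
```

For the line `-- title  \n`, SECTION_DASHES matches at position 0, NB_WHITESPACE at position 2, and LINE_ENDING (the two trailing spaces plus the newline) at position 8.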

**SECTION TOKENS**

These tokens are used for basic sections as well as container sections. The container section start and end triggers are assembled in a later step.

[x] CODE_TOKEN -> ['code']

[x] HTML_TOKEN -> ['html']

[x] LIST_TOKEN -> ['list']

[x] P_TOKEN -> ['p']

[x] TITLE_TOKEN -> ['title']

[x] TODO_TOKEN -> ['todo']

**SECTION HEADERS**

There are "SECTION" and "CONTAINER" templates. Some section types have one, some have the other, and some have both.

[] TITLE_SECTION_TRIGGER -> SECTION_DASHES + NB_WHITESPACE + TITLE_TOKEN + LINE_ENDING

[] HTML_CONTAINER_START_TRIGGER -> SECTION_DASHES + NB_WHITESPACE + HTML_TOKEN + CONTAINER_TOKEN + LINE_ENDING

[] HTML_CONTAINER_END_TRIGGER -> SECTION_DASHES + NB_WHITESPACE + CONTAINER_TOKEN + HTML_TOKEN + LINE_ENDING

[] HTML_SECTION_TRIGGER -> SECTION_DASHES + NB_WHITESPACE + HTML_TOKEN + LINE_ENDING
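The SECTION triggers all share the shape SECTION_DASHES + NB_WHITESPACE + token + LINE_ENDING, so a single pattern can recognize every section header and report which token it found. This Python sketch is for checking the rules, not the Tree-sitter implementation: `section_trigger` is a hypothetical helper, it assumes CODE_TOKEN is meant to match 'code', and it leaves the CONTAINER start and end triggers aside.

```python
import re
from typing import Optional

# Tokens from the SECTION TOKENS list (assumption: CODE_TOKEN matches 'code').
SECTION_TOKENS = ["code", "html", "list", "p", "title", "todo"]

# SECTION_DASHES + NB_WHITESPACE + <one section token> + LINE_ENDING
SECTION_TRIGGER = re.compile(
    r"--[ \t]+(" + "|".join(re.escape(t) for t in SECTION_TOKENS) + r")[ \t]*\n"
)


def section_trigger(line: str) -> Optional[str]:
    """Return the section token if the whole line is a section header, else None."""
    m = SECTION_TRIGGER.fullmatch(line)
    return m.group(1) if m else None
```

Because the match has to cover the entire line, `-- pre` is rejected even though it starts with the `p` token.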

**Full Items**

These are the items built from primitives, other full items, or both. They'll be assembled in the Tree-sitter grammar.js file.

[] ATTRIBUTE -> BOOLEAN_ATTRIBUTE | KEY_VALUE_ATTRIBUTE

[] BOOLEAN_ATTRIBUTE -> DASHES + not[':'] + SPACE0 + SINGLE_NEWLINE

[] KEY_VALUE_ATTRIBUTE -> DASHES + not[':'] + ':' + any + SPACE0 + SINGLE_NEWLINE
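The two attribute forms differ only in whether a ':' appears after the dashes, so a line can be classified by trying the key-value form first and falling back to the boolean form. In this Python sketch DASHES is taken to be '--', SPACE0 to be optional spaces or tabs, and SINGLE_NEWLINE to be '\n'; `parse_attribute` and the dict shapes it returns are hypothetical.

```python
import re
from typing import Optional

# KEY_VALUE_ATTRIBUTE: '--', a key with no ':', a ':', the value, SPACE0, newline.
KV_ATTR = re.compile(r"--([^:\n]+):([^\n]*?)[ \t]*\n")
# BOOLEAN_ATTRIBUTE: '--', text with no ':', SPACE0, newline.
BOOL_ATTR = re.compile(r"--([^:\n]+?)[ \t]*\n")


def parse_attribute(line: str) -> Optional[dict]:
    """Classify one full attribute line as key-value or boolean, or return None."""
    m = KV_ATTR.fullmatch(line)
    if m:
        # Only the first ':' separates key from value; later ones stay in the value.
        return {"key": m.group(1).strip(), "value": m.group(2).strip()}
    m = BOOL_ATTR.fullmatch(line)
    if m:
        return {"flag": m.group(1).strip()}
    return None
```

A line like `-- title` also fits the boolean form here; this sketch makes no attempt to separate attributes from section headers.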

[] PARAGRAPH -> PARAGRAPH_FIRST_WORD + WORD_BREAK + PARAGRAPH_BODY + EMPTY_LINE

note: this is for multi-word paragraphs. Single-word paragraphs will
be addressed later.

[] PARAGRAPH_BODY -> (WORD, sep_by, WORD_BREAK)

note: this is done a little differently in Tree-sitter,
since there isn't really a sep_by pattern without
regex, which I haven't gotten into yet.
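For reference, `sep_by` means "one item, then zero or more (separator, item) pairs", which is the same shape as the `seq(rule, repeat(seq(sep, rule)))` helper that Tree-sitter grammars often define in grammar.js. The Python sketch below builds PARAGRAPH_BODY that way from small parser functions. WORD and WORD_BREAK are simplified stand-ins (assumptions): WORD is any run of non-whitespace, and WORD_BREAK is spaces or tabs with at most one newline, not followed by a blank line or the end of input. The `regex` and `sep_by1` helpers are hypothetical.

```python
import re
from typing import Callable, List, Optional, Tuple

# A parser takes (text, pos) and returns (value, new_pos), or None if it doesn't match.
Parser = Callable[[str, int], Optional[Tuple[str, int]]]


def regex(pattern: str) -> Parser:
    """Build a parser that matches `pattern` exactly at the current position."""
    compiled = re.compile(pattern)

    def parse(text: str, pos: int) -> Optional[Tuple[str, int]]:
        m = compiled.match(text, pos)
        return (m.group(0), m.end()) if m else None

    return parse


def sep_by1(item: Parser, sep: Parser) -> Callable[[str, int], Optional[Tuple[List[str], int]]]:
    """item (sep item)*, i.e. one or more items separated by sep."""

    def parse(text: str, pos: int) -> Optional[Tuple[List[str], int]]:
        first = item(text, pos)
        if first is None:
            return None
        values, pos = [first[0]], first[1]
        while True:
            s = sep(text, pos)
            if s is None:
                break
            nxt = item(text, s[1])
            if nxt is None:
                break  # leave a trailing separator unconsumed
            values.append(nxt[0])
            pos = nxt[1]
        return values, pos

    return parse


# Simplified stand-ins for WORD and WORD_BREAK (assumptions, see above).
WORD = regex(r"[^ \t\n]+")
WORD_BREAK = regex(r"(?=[ \t\n])[ \t]*\n?[ \t]*(?![ \t]*(?:\n|\Z))")
PARAGRAPH_BODY = sep_by1(WORD, WORD_BREAK)
```

On `"alpha beta\ngamma\n\nnext"` the body stops before the blank line and returns the three words plus the position right after `gamma`.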

[] PARAGRAPH_FIRST_WORD -> INLINE_TAG | WORD_WITHOUT_LEADING_DASH | WORD_WITH_ONE_LEADING_DASH

[] WORD_WITH_ONE_LEADING_DASH -> '-' + NB_WHITESPACE + WORD

[] WORD_BREAK -> [NB_WHITESPACE | SINGLE_NEWLINE]+ + lookahead(not(LINE_ENDING))

[] INLINE_TAG -> LINK | SPAN | etc...

[] LIST_ITEM -> "-" + NB_WHITESPACE + PARAGRAPH_BODY

[] LINE_ENDING_OR_EOF -> [NB_WHITESPACE + NEWLINE] | EOF

[] LINK -> tktktkt

[] SPAN -> tktktktk

[] WORD_WITHOUT_LEADING_DASH -> not['-'] + WORD

[] WORD -> not['<' + lookahead(not['<'])] + not(NB_WHITESPACE | LINE_ENDING)

[deprecated?] INITIAL_WORD_CHARS -> NON_LT_CHAR | LT_WITH_NON_LT_CHAR

[deprecated?] LT_WITH_NON_LT_CHAR -> '<' + NON_LT_CHAR

[deprecated?] NON_LT_CHAR -> none_of['< \n\t']

[deprecated?] FOLLOWING_WORD_CHARS -> none_of[' \n\t']
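Read together, the INITIAL_WORD_CHARS, LT_WITH_NON_LT_CHAR, NON_LT_CHAR, and FOLLOWING_WORD_CHARS rules say a word may start with '<' but not with '<<', presumably so that '<<' stays available for inline tags. A Python sketch of that reading composes the rules into one regex, treating FOLLOWING_WORD_CHARS as repeating zero or more times (an assumption); `match_word` is a hypothetical helper.

```python
import re
from typing import Optional

NON_LT_CHAR = r"[^< \n\t]"                      # none_of['< \n\t']
LT_WITH_NON_LT_CHAR = "<" + NON_LT_CHAR         # '<' + NON_LT_CHAR
INITIAL_WORD_CHARS = f"(?:{NON_LT_CHAR}|{LT_WITH_NON_LT_CHAR})"
FOLLOWING_WORD_CHARS = r"[^ \n\t]"              # none_of[' \n\t']

# INITIAL_WORD_CHARS, then any number of FOLLOWING_WORD_CHARS (assumed repeating).
WORD_RE = re.compile(INITIAL_WORD_CHARS + FOLLOWING_WORD_CHARS + "*")


def match_word(text: str, pos: int = 0) -> Optional[str]:
    """Return the word starting at `pos`, or None if no word starts there."""
    m = WORD_RE.match(text, pos)
    return m.group(0) if m else None
```

Under this reading `<b>` is an ordinary word, while `<<link>>` is not matched as a word at all.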

[] ATTR_KV_PAIR -> ATTR_KV_KEY + ATTR_KV_SEPARATOR + NB_WHITESPACE + ATTR_KV_VALUE

[] ATTR -> ATTR_DASHES + [KEY_VALUE_ATTR | BOOLEAN_ATTR] + NEWLINE

Prior Work

This is an example that someone from chat helped me put together.

Code

CMD -> '!bear' ' ' BEAR_COMMAND_LIST
BEAR_COMMAND_LIST -> BEAR_COMMAND | BEAR_COMMAND ',' OPTIONAL_WHITESPACE BEAR_COMMAND_LIST
BEAR_COMMAND -> OP ':' OPTIONAL_WHITESPACE COLOR
OPTIONAL_WHITESPACE -> NOTHING | ' ' OPTIONAL_WHITESPACE
OP -> 'head' | 'body' | 'eyes'
COLOR -> HEXCOLOR | COLORNAME
HEXCOLOR -> '#' HEXDIGIT HEXDIGIT HEXDIGIT HEXDIGIT HEXDIGIT HEXDIGIT
COLORNAME -> 'red' | 'green' | 'blue' | 'white' | ...
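Each rule in this grammar maps onto one method of a recursive-descent parser. The Python sketch below does that, writing the right-recursive BEAR_COMMAND_LIST and OPTIONAL_WHITESPACE rules as loops (they accept the same strings). Two assumptions: HEXDIGIT is a single 0-9, a-f, or A-F character, and because the COLORNAME list is elided with '...', only the four listed names are accepted. `BearParser` and `ParseError` are hypothetical names.

```python
from typing import List, Sequence, Tuple

OPS = ("head", "body", "eyes")
COLOR_NAMES = ("red", "green", "blue", "white")  # the full COLORNAME list is elided
HEX_DIGITS = set("0123456789abcdefABCDEF")        # assumption: HEXDIGIT = one hex digit


class ParseError(Exception):
    pass


class BearParser:
    """Recursive-descent parser with one method per nonterminal."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def expect(self, literal: str) -> None:
        if not self.text.startswith(literal, self.pos):
            raise ParseError(f"expected {literal!r} at {self.pos}")
        self.pos += len(literal)

    def one_of(self, options: Sequence[str]) -> str:
        for option in options:
            if self.text.startswith(option, self.pos):
                self.pos += len(option)
                return option
        raise ParseError(f"expected one of {options} at {self.pos}")

    # CMD -> '!bear' ' ' BEAR_COMMAND_LIST  (and nothing may follow)
    def cmd(self) -> List[Tuple[str, str]]:
        self.expect("!bear")
        self.expect(" ")
        commands = self.bear_command_list()
        if self.pos != len(self.text):
            raise ParseError(f"unexpected input at {self.pos}")
        return commands

    # BEAR_COMMAND_LIST -> BEAR_COMMAND (',' OPTIONAL_WHITESPACE BEAR_COMMAND)*
    def bear_command_list(self) -> List[Tuple[str, str]]:
        commands = [self.bear_command()]
        while self.text.startswith(",", self.pos):
            self.pos += 1
            self.optional_whitespace()
            commands.append(self.bear_command())
        return commands

    # BEAR_COMMAND -> OP ':' OPTIONAL_WHITESPACE COLOR
    def bear_command(self) -> Tuple[str, str]:
        op = self.one_of(OPS)
        self.expect(":")
        self.optional_whitespace()
        return op, self.color()

    # OPTIONAL_WHITESPACE -> zero or more ' '
    def optional_whitespace(self) -> None:
        while self.text.startswith(" ", self.pos):
            self.pos += 1

    # COLOR -> HEXCOLOR | COLORNAME
    def color(self) -> str:
        if self.text.startswith("#", self.pos):
            digits = self.text[self.pos + 1 : self.pos + 7]
            if len(digits) != 6 or not set(digits) <= HEX_DIGITS:
                raise ParseError(f"bad hex color at {self.pos}")
            self.pos += 7
            return "#" + digits
        return self.one_of(COLOR_NAMES)
```

For example, `BearParser("!bear head: red, eyes:#00ff00").cmd()` returns the pairs `("head", "red")` and `("eyes", "#00ff00")`, while an unknown op such as `nose` raises `ParseError`.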
