Guidelines For Building A Static Site Generator

Introduction

I plan to keep working on my static site generator for the next twenty years. This is the list of things I want to keep in mind should I ever decide to start over.

The Guidelines

  • Start with the templates (e.g. make a basic set to produce a sample site and then send hard-coded data into them). The idea is to figure out the API you want to use in the templates themselves
  • Be able to access metadata about each level of content from all the levels above it. E.g. using a template structure of

    site_meta - page - sections - blocks - tokens

    The blocks template should know that it's in a page built from the music template. I didn't do this great and it's the biggest thing I'm working on changing.

    There's not really a site_meta template. That's just the metadata for the site that should be available to everything under it. That includes things like the title, as well as things like collections of links.

  • Section-level templates should be able to include other sections.
  • Stick with one way to interact with data in the templates (e.g. always use methods in minijinja instead of sometimes methods and sometimes the data directly. Doing so means there's less syntax to remember)
  • Define what should be included in the raw content source for every piece of content. I use: id, date, status, template.
  • For any section type that can wrap other sections create section_name_start and section_name_end templates. (This worked for me and I'm keeping it in mind but depending on implementation might not make much sense)
  • Provide a way to create a unique ID for each piece of content that doesn't change even if the content/title/whatever does
  • Don't use anything other than the content's ID when creating URLs. Using anything other than a base directory and the unique ID for a post adds a lot of work.
  • Use KSUIDs for the IDs. They sort by date which is nice. (I actually clip mine down to eight characters because that looks nicer, but the base is the same with the date sort followed by some randomness)
  • Don't try to categorize URLs. Just do everything for the main content with something like "/pages/a1b2c3/"
  • Add a config file early on
  • Stop at the directory level for each page. That is, do this "/pages/a1b2c3/" instead of "/pages/a1b2c3/index.html". That'll let you change up the back end tech without having to set up redirects for the URLs
  • Suck in all the content via ASTs to start
  • Load all the ASTs for all the content on the site before doing any rendering. (i.e. create a universe that you can pull stuff out of)
  • Create templates at the page level, the component level, and the token level (e.g. A page can have a title that shows up in the metadata as well as an H1 on the page itself. By having templates at the token level styling can be added to the H1 while keeping plain-text in the meta tag)
  • Create a set of test templates with various sets of features in them that are independent from the templates you use for the site output
  • Make atom/rss feed templates at the same time you're making the main HTML templates. (Helps ensure a more flexible structure than trying to bolt things on later)
  • Don't compile templates into the builder. Make sure they can be edited independently with no other code changes and applied with only a build run
  • Create an internal content tag to link pages by their ID (e.g. I use <<ilink|some text|a1b2c3>>). That provides auto linking to every page without having to use the URL.
  • Be able to style content pieces
  • Create content items in the AST with abandon
  • Centralize on standard attributes for content items/components. (e.g. id always corresponds to an id attribute and is available in basically everything)
  • Allow for custom attributes. Both data-whatever as well as completely custom things
  • Make accessing individual parts of the content AST as fine grained as possible and build up from there. (e.g. a link token would be the href, the link text, and list of attributes, each one of which is addressable individually)
  • Do the Atom/RSS feed sooner rather than later. Like right after you have the initial list of pages showing up on the home page
  • Get something showing up as soon as possible. Minimum setup for me is a home page with no content other than a list of links to individual pages and then the pages themselves. Once that base is built, you can decide where to go from there
  • Use an ID only style system for accessing images. e.g. require only the file name and let the system find the file anywhere inside a dumping ground. That'll let you organize things in a way that makes sense but not have to worry about paths when dropping stuff in
  • Figure out a way to deal with alt text so you can reuse the images in multiple places without having to enter the same text multiple times
  • Make sure your ASTs have a solid test suite
  • Make multiple template types: e.g. home_page, post, feed_post.
  • Be able to query the system to get any content inside any template (sometimes it won't make sense and give you broken looking stuff, but the goal is be able to access anything you need without having to build more cases on top of the existing code)
  • Keep templates separate from categories. (e.g. separate page level templates might exist for bookmarks, videos, and posts. Categories might exist for Rust, JavaScript, WebDev. You want to be able to tag any type of content in any template with any of the categories)
  • By using only the unique ID you can shift stuff between categories and/or templates without having to worry about changing URLs
  • Allow for pages to call custom things in the header (external CSS, JS, etc...)
  • Allow for defining explicit paths for key pages. That is, not everything would be just an ID. Landing pages for anything you want a named URL for should be possible. e.g. every page should automatically have its own ID, but you should be able to override the URL to anything you want
  • Make sure you can still link to pages internally based off their ID even if they have a custom path. (e.g. <<ilink|some text|a1b2c3>> would point to /band-names/ if the page had been updated to use that path)
  • Make sure other content can work on the site. (e.g. if you want to throw a raw HTML page up with a different foundation you should be able to just throw it in its own directory and work on it directly. The engine should leave it completely alone)
  • Test ingestion of the content and building the AST independently and first, before moving on to the template output, which should be tested on its own.
  • Make templates at the block level too. E.g. paragraphs would have their own template. So, you could have something like "body_paragraph" and "li_paragraph" that would have the ability to add different default styles to them
  • Provide a way to append classes to the template output. (e.g. if the default output for something is <div class="alfaClass"></div> make sure you can add a class in the content to append to the output so you can get <div class="alfaClass bravoClass"></div>)
  • Don't make things be in a required order in the output. This goes back to the AST. (e.g. be able to make reference sections anywhere in the content and then have them show up in their original position, or all aggregated at the bottom depending on which template you use)
  • Keep a single hard-coded UUIDv4 as the top level ID for the Atom feed (i.e. it's the ID of the feed itself)
  • Use a UUIDv5 based off the feed's UUIDv4 and the content ID for each entry's ID. That way it'll stay the same between builds and can be used to signal updates to readers.
  • Start by just doing a full site build every time a change is made. Only add in incremental builds if the times get longer than you'd like. (i.e. that's an optimization to add after the site is live)
  • Figure out how you want to handle HTML escaping in each template including code and pre sections
  • Provide blurb functionality for each page that can use explicit text or grab the first few lines.
  • Automatically use a default template for everything but allow overrides. (i.e. you don't have to explicitly call template: default or whatever in the code every time. Only use it if you want to use something else)
  • Maintain a date object for every page that you can use to generate any date format you want. (Dealing with timezones is left as an exercise for the reader)
  • Provide for including content generated at the universal level above the pages via a single call with parameters. e.g. on a posts page, be able to call site.category_links("music") to get that data.
  • Provide for internal reference links. e.g. be able to just drop in an id in a reference call and have it automatically grab the title text, blurb, and link for the page and drop it in.
  • Provide for a way to show pages in dev that don't show up in prod. (e.g. make a drafts content type with a listing page that only gets generated in dev)
  • I haven't dealt with pagination yet. Notes on that will happen when I get to it
  • Store external IDs instead of URLs where possible. E.g. with YouTube, I grab the ID and only store that. I like being able to assemble the URL in the template instead of having it be what's in the content
  • Let pages be assembled from other pages. (e.g. provide a way at the site level to create a page showing the last two pages with a template type of "music")
  • Ensure there's a way to delete files when content is removed from the content source.
  • Provide access to every section via # links
  • Provide a way to call external processes, send data to them for processing, and drop the results into the output
  • Each template should have things it expects (e.g. posts should have titles). Decide for each one if the content should be skipped when it doesn't contain them.
  • Think about content files as individual items instead of necessarily pages. e.g. there might be a piece of content with just an image and the base metadata in it. That could have its own output page, but might only be part of a different group.
  • Provide for groups of content independent of templates and categories. E.g. a "funk" group for "music" template pages that builds a collection page. This is independent from the category of funk which would cross all page types (e.g. "posts" and "bookmarks").

    I haven't really started in on this one yet. It may not become a thing, but I keep running into things where something like this would be nice.

  • Create internal link references that auto pull the title for you too. Could do something like <<ilink|some text|a1b2c3>> for linking the text "some text", and then use this <<ilink|EMPTY_SPACE_HERE_THAT_DOES_NOT_PARSE_YET|a1b2c3>> to default to just grabbing the title of the page.
  • Make an automatic reference link that works like the internal links. e.g. an -- iref section with the ID that would just make a reference with that page's title and blurb. Should be able to override the blurb though
  • Use HTML elements as the starting components and token elements. (e.g. blockquotes, and asides)
  • Allow overriding the template directly from the content. e.g. for outputting a code section, create a template with and without line numbers
  • Create a top level config page with metadata for the site
  • Provide the ability to add CSS and JavaScript in the content that gets added directly into the head of the document (in addition to being able to call external files)
  • Include span as a wrapper for text tokens
  • Provide for data payload in content. E.g. JSON blobs that would be processed by the template.
  • Provide a way to execute inline code on the page to build out content sections. (e.g. provide a -- data section with a JSON list of band names and then a -- exe section with some Python code that runs to sort the output)

    I've got this working from inside my notes, but not executing on build yet. It's one of the key reasons I built all the stuff: so that I can ensure that my code snippets actually work when I publish them

  • In the templates, wrap every section in a top level element where custom classes and attributes are applied.
  • Be able to send content to dev, prod, and whatever other output directories you want
  • When parsing things like class attributes, make sure to split each class into its own item so they can be parsed individually
  • Create a grammar for the parser to start with, beginning at the lowest level.
  • Create full inline token tags as well as shorthand ones. Identify which one was used in the AST so it can be used to output a valid version of the source with the same token set used
  • Make an output that always shows the latest edit at the same URL (e.g. /last-edit). Make sure it doesn't show up in the main feed outputs
  • Set up flags to keep individual pages out of lists and feeds
  • Generate lists dynamically as their own data feeds that can be used in their own templates
  • Provide for both an originally published and an updated date on pages
  • Don't require any given section to exist in the main content bin other than metadata which houses the ID.
  • Allow files that are not generated by the engine to be able to be mixed into the same directories as files that are. (e.g. if a content page builds out to /pages/a1b2c3/ allow for manually dropping in a file at /pages/a1b2c3/script.js)
  • Add link checking that happens after the build (e.g. generate all the content so it's ready for deploy and then check everything after that so the checks don't delay the build)
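
The URL guidelines above (ID-only URLs, stopping at the directory level, and sending output to more than one directory) can be sketched in a few lines of Python. This is only an illustration under assumed names: page_url, output_file, and the "site/dev" and "site/prod" roots are hypothetical and not taken from any real generator.

```python
from pathlib import PurePosixPath


def page_url(page_id: str, base: str = "pages") -> str:
    """Build the public URL from a base directory and the content ID only."""
    return f"/{base}/{page_id}/"


def output_file(url: str, out_root: str) -> PurePosixPath:
    """Map a directory-style URL to the index.html file written under out_root."""
    return PurePosixPath(out_root, url.strip("/"), "index.html")


# Hypothetical output roots: the same URL lands in each output directory.
url = page_url("a1b2c3")
targets = [output_file(url, root) for root in ("site/dev", "site/prod")]
```

The public URL never mentions index.html, so the file behind it can change later without redirects.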
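
The internal link guidelines above (link by ID, honor custom paths, fall back to the page title when no text is given) could look something like the following. It is a rough sketch, not the author's parser: the PAGES index, the second page ID, the regular expression, and the function names are all assumptions made for illustration, and the empty-text form is shown only as one possible way to handle the case the list says does not parse yet.

```python
import re
from html import escape

# Hypothetical page index: content ID -> title and optional custom path.
PAGES = {
    "a1b2c3": {"title": "Band Names", "path": "/band-names/"},
    "d4e5f6": {"title": "Parser Notes", "path": None},
}

# Matches <<ilink|link text|page id>>; the link text may be empty.
ILINK = re.compile(r"<<ilink\|([^|>]*)\|([^|>]+)>>")


def resolve_url(page_id: str) -> str:
    """Use the custom path when one is set, otherwise the ID-only default."""
    return PAGES[page_id]["path"] or f"/pages/{page_id}/"


def render_ilinks(text: str) -> str:
    """Replace every ilink tag in text with an HTML anchor."""

    def replace(match: re.Match) -> str:
        link_text = match.group(1).strip()
        page_id = match.group(2).strip()
        if page_id not in PAGES:
            # Leave unknown IDs untouched so a later link check can flag them.
            return match.group(0)
        label = link_text or PAGES[page_id]["title"]
        return f'<a href="{escape(resolve_url(page_id))}">{escape(label)}</a>'

    return ILINK.sub(replace, text)
```

Because the content only ever stores the ID, giving a page a custom path changes the rendered href without touching any content that links to it.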
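
For the guidelines about appending classes to template output and splitting class attributes into individual items, here is a small sketch. The function names are hypothetical, and a real setup would more likely do this inside the template engine than in plain Python.

```python
from html import escape


def merge_classes(default_classes: str, content_classes: str = "") -> list[str]:
    """Split both class strings into items and merge them without duplicates."""
    merged: list[str] = []
    for name in (default_classes + " " + content_classes).split():
        if name not in merged:
            merged.append(name)
    return merged


def open_tag(tag: str, default_classes: str, content_classes: str = "") -> str:
    """Render an opening tag whose class list is the template default plus extras."""
    class_value = " ".join(merge_classes(default_classes, content_classes))
    return f'<{tag} class="{escape(class_value)}">'


# Template default "alfaClass" with "bravoClass" appended from the content.
html = open_tag("div", "alfaClass", "bravoClass") + "</div>"
```

Keeping the classes as a list until the last moment means each one stays individually addressable for anything that needs to inspect or filter them.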
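
The two feed ID guidelines above map directly onto Python's standard uuid module. In this sketch the FEED_ID value is a made-up placeholder (in practice you would generate one once with uuid.uuid4() and hard-code it), and entry_id is a hypothetical helper name.

```python
import uuid

# Hypothetical hard-coded feed ID. Generate your own once and never change it.
FEED_ID = uuid.UUID("0b6f1e0e-6c1f-4a7e-9c53-2f6d3a1b8e44")


def entry_id(content_id: str) -> str:
    """Derive a stable entry ID from the feed's UUID and the content ID."""
    # UUIDv5 hashes the namespace and name, so the result is the same on every build.
    return uuid.uuid5(FEED_ID, content_id).urn
```

The same content ID always yields the same urn:uuid value, so feed readers can tell an updated entry from a new one.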