
I Want A Personal Database

TL;DR

I want a personal, persistent database. My own thing, in my possession and under my control, that houses all my content. Access to specific parts and pieces is available only to those I give explicit permission.

It feels inevitable. And, like it's never going to happen. But, a little bit more the former.

NOTE

The SOLID Project (https://solidproject.org) exists, which basically covers all this. I'd forgotten about it while writing this up, but what the hell, I'll post my version anyway.

The General Idea

  • A personal database
  • Can and does store anything and everything in both structured and non-structured ways (based on what it is)
  • Basically, whenever you interact with a service, you'd start with your database and then send the content to them via a feed
  • Locked down, but available to me everywhere
  • Encrypt all the things
  • Free and Open-Source
  • More than just a database. Also a protocol for sharing feeds of content
  • No access to any part for anyone else without explicit permission given to access it
  • Really good search and multiple ways to navigate to find stuff
  • Nobody gets to write to the root level. Everything is partitioned into their own sections and access to each section is controlled independently
  • Permission to read/write to any individual section can be given to arbitrary groups. (e.g. you could have several photo apps writing into one type of photos section and then distribute that to a few different aggregation services via feeds)
  • Everybody's DB would have a credentials section where each credential is locked down individually
  • Ability to generate one-time/multi-factor credentials
  • Each service would be responsible for the schema of their section
  • Schemas could be as strict as they like (e.g. a specific set of strongly typed content) or super loose (e.g. just an ID and a blob of binary)
  • Ability to have just write access to parts of the database without read access (and ability to limit read access to just portions of metadata)
  • Sections would be versioned so migrations to add/remove parts of schemas could occur
  • No code execution, just a datastore. (Like, I'm thinking you could use sqlite for this)
  • A section that's a collection of uuid type ids that can be used to verify you (e.g. a bunch of public/private key pairs so you can have several that aren't tied together explicitly)
  • Open source schemas would exist as well. (e.g. a schema for Mastodon post, music playlists, blog posts, videos)
  • HTML blobs would be one data type. Let folks make pages with whatever they want in them.
  • Plugins for schema types (e.g. if you're using some type of video feed, various plugins could provide ways to do text overlays on top of the raw video)
  • This is kinda a COPE (Create Once, Publish Everywhere)
  • Provide for feed outputs that pull from the different content types in the sections. (i.e. separate content from distribution)
  • Feeds can be made fully public, fully private, or anywhere in-between
  • Store the raw assets
  • CDN distribution of content
  • Auto sizing of output content
  • Every piece of content would have a uuid
  • Everything is done through feeds that have their own access and distribution control
  • Feeds would contain raw data and have the ability to send suggested displays. Thinking it would be kinda like sending an RSS feed with your own layout and CSS attached, but as separate things so folks could use it directly or use their own CSS if they want.
  • Some feeds would only have the content and leave it to the receiver to do the layout
  • Make real time communication a thing in it
  • Set up groups for messaging
  • Save threads from conversations (e.g. metadata connections between your content and other folks' content where it would point back to the external UUID and also cache copies locally)
  • Version all the content. UUIDs would be the base and then some type of version number attached so folks could link to the general ID and always get the latest or back to a specific version (assuming that's provided). It would have to be a thing, but something you could opt into with a given feed
  • Maybe some way for content without an ID? I'm not sure about the privacy/security of this part, but maybe that shouldn't be in the DB at all? Probably, there's some of that, and then some stuff that would be distributed with only an ID and no metadata identifying the author
  • Aggregation and curation services would sit on top of the content, but always have IDs pointing back to the source content ID and creator
  • Mapping schemas would be a thing. E.g. if you have music playlists in multiple formats (from different services for example) maps could be made between the IDs and used as your primary feed. (could bounce into things like the International Standard Recording Code)
  • I suppose at the top level it would be: your content of different types, collections of metadata on that content that can be mixed and matched, and feeds from that metadata providing the content in the specified manner
  • Have centralized systems for aggregation, but make it like Mastodon where you opt into them for where you want to publish your content
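
For what it's worth, here's a rough sketch of how a few of those bullets (independent sections, write-only access, UUIDs plus version numbers) might look on top of sqlite, using Python's built-in sqlite3 module. Every name in it (the sections, grants, and items tables, plus open_db, grant, write_item, read_item, and AccessDenied) is made up for illustration. The permission check lives in application code rather than in the database, and there's no encryption, syncing, or feeds. It's a toy to make the idea concrete, not a design.

```python
import sqlite3
import uuid


class AccessDenied(Exception):
    """Raised when an app lacks the permission an operation needs."""


def open_db(path=":memory:"):
    # Hypothetical layout: one row per section, per-app grants, versioned items.
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE sections (name TEXT PRIMARY KEY, schema_version INTEGER NOT NULL);
        CREATE TABLE grants (
            section   TEXT NOT NULL REFERENCES sections(name),
            app       TEXT NOT NULL,
            can_read  INTEGER NOT NULL DEFAULT 0,
            can_write INTEGER NOT NULL DEFAULT 0,
            PRIMARY KEY (section, app)
        );
        CREATE TABLE items (
            id      TEXT NOT NULL,
            version INTEGER NOT NULL,
            section TEXT NOT NULL REFERENCES sections(name),
            body    BLOB NOT NULL,
            PRIMARY KEY (id, version)
        );
        """
    )
    return conn


def grant(conn, section, app, read=False, write=False):
    # Create the section on first use, then set this app's permissions on it.
    conn.execute("INSERT OR IGNORE INTO sections VALUES (?, 1)", (section,))
    conn.execute(
        "INSERT OR REPLACE INTO grants VALUES (?, ?, ?, ?)",
        (section, app, int(read), int(write)),
    )


def _allowed(conn, section, app, column):
    # column is always one of two fixed names below, never user input.
    row = conn.execute(
        f"SELECT {column} FROM grants WHERE section = ? AND app = ?",
        (section, app),
    ).fetchone()
    return bool(row and row[0])


def write_item(conn, app, section, body, item_id=None):
    # Store a new item, or a new version of an existing one in the same section.
    if not _allowed(conn, section, app, "can_write"):
        raise AccessDenied(f"{app} may not write to {section}")
    item_id = item_id or str(uuid.uuid4())
    owner = conn.execute(
        "SELECT section FROM items WHERE id = ? LIMIT 1", (item_id,)
    ).fetchone()
    if owner and owner[0] != section:
        raise AccessDenied(f"{item_id} belongs to another section")
    (latest,) = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM items WHERE id = ?", (item_id,)
    ).fetchone()
    conn.execute(
        "INSERT INTO items VALUES (?, ?, ?, ?)",
        (item_id, latest + 1, section, body),
    )
    return item_id, latest + 1


def read_item(conn, app, section, item_id, version=None):
    # Return (version, body): the latest version by default, or a specific one.
    if not _allowed(conn, section, app, "can_read"):
        raise AccessDenied(f"{app} may not read from {section}")
    if version is None:
        query = ("SELECT version, body FROM items WHERE id = ? AND section = ? "
                 "ORDER BY version DESC LIMIT 1")
        args = (item_id, section)
    else:
        query = ("SELECT version, body FROM items "
                 "WHERE id = ? AND section = ? AND version = ?")
        args = (item_id, section, version)
    return conn.execute(query, args).fetchone()
```

Using it would go something like this: grant a camera app write-only access to a "photos" section and a gallery app read-only access. The camera app can call write_item as often as it likes but read_item raises AccessDenied for it, while the gallery app can fetch either the latest version of an ID or a specific older one.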

Notes

  • Again, the SOLID Project (https://solidproject.org) is a thing where folks have thought a lot more about this than me. This is just me from the outside
  • I firmly believe if we give tools to folks to make more stuff, they will, and the more folks making more stuff, the less angry we'll be at each other and the better the world will be
  • There are lots of hard problems in this wish list: syncing, privacy, accessibility, and security (to name a few) are no joke. I'm not proposing solutions, just the experience I'd like to have
  • The user interface to this thing would be no joke when it came to permissions. Not only am I not trying to solve technical details here, I'm not really digging into the user interface either
  • Yes, there are some parts up there that are contradictory
  • This is mainly an exercise in thinking about moving ownership and control of data from companies back to ourselves
  • Yeah, this opens up security issues, but it feels like it's no worse than having all your content spread across services that are using it and selling it
  • The single point of entry is a bit scary, but is it any worse than if someone gets into your email?
  • I have no idea how you'd deal with credentials and resetting them. I suppose it comes down to you having to trust someone, but that's already the case
  • Syncing and replication are hard problems