SirixDB Community Forum / Discussions

Thoughts on database versioning

In a database, many versioned resources can be inserted at the top level.
Is there a concept of a versioned database as a whole?
That is, if some resource in a database is modified/deleted/created, the database's own version changes.

It would be possible to have one top-level resource and make all other resources children of it, but I guess that is not the best or most performant way to do this (?)

(I guess my thinking was triggered by Datomic.)


Hey, great question(s) :slight_smile:

I’d probably create a special database resource that stores the resource names and their respective versions. We’d need to finish the work on adding node transactions to a database-wide transaction, though; once that transaction commits, the special resource is also kept up to date.
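To make the idea concrete, here is a minimal sketch of such a "special database resource": a catalog mapping each resource name to its latest committed revision, with the database's own version bumped on every database-wide commit. All names (`DatabaseCatalog`, `commit`, `revisionOf`) are my own illustrations, not SirixDB's actual API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "special database resource" idea:
// a catalog of resource-name -> latest committed revision.
final class DatabaseCatalog {
    private final Map<String, Integer> revisions = new HashMap<>();
    private int databaseRevision;

    /** Called once a database-wide transaction has committed successfully. */
    void commit(Map<String, Integer> committedResources) {
        revisions.putAll(committedResources);
        databaseRevision++; // the database's own version changes
    }

    int databaseRevision() {
        return databaseRevision;
    }

    Integer revisionOf(String resource) {
        return revisions.get(resource);
    }
}
```

The catalog itself would of course be versioned like any other resource, so the database version history stays queryable.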

The slightly harder thing will be to manage failed database-wide transactions, but maybe in the first step we can ignore this and add cleanup methods in a second step.

But in general I think it’s not that much work. When a special .commit file is not deleted inside a resource folder, it means a transaction started but crashed in the middle; we then need to truncate the resource back to the offset of the most recently committed revision and UberPage, otherwise some storage space would be wasted :slight_smile: To trigger this cleanup we could either run background threads periodically or wait until a read-write transaction is opened on the resource again.
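The recovery check described above could look roughly like this. The marker-file name, the method, and the offset bookkeeping are assumptions for illustration, not the actual SirixDB implementation:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: if a ".commit" marker survived in the resource
// folder, a transaction crashed mid-commit, so we truncate the data file
// back to the offset of the most recently committed revision.
final class CrashRecovery {
    static final String COMMIT_MARKER = ".commit";

    /** Returns true if recovery (truncation) was performed. */
    static boolean recoverIfNeeded(Path resourceFolder, Path dataFile,
                                   long lastCommittedOffset) throws IOException {
        Path marker = resourceFolder.resolve(COMMIT_MARKER);
        if (!Files.exists(marker)) {
            return false; // clean shutdown, nothing to do
        }
        try (RandomAccessFile file = new RandomAccessFile(dataFile.toFile(), "rw")) {
            file.setLength(lastCommittedOffset); // discard partially synced data
        }
        Files.delete(marker); // recovery done, remove the stale marker
        return true;
    }
}
```

This could be run either by the periodic background thread or lazily when the next read-write transaction opens the resource.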

We’d also have to check the special resource in its most recent revision and truncate the most recent revisions of the affected resources if a database-wide transaction fails. Maybe we can manage this, too, with a special .commit file that simply denotes whether a database-wide transaction failed or not.

I think the first part is nearly finished, but I didn’t like the API and haven’t come up with a good alternative yet. The thing is:

I’d use the existing node transactions on a resource and simply add them to a database-wide transaction. However, if someone commits a node transaction in between, it defeats the whole purpose. We’d need an internal commit method on the node transaction that is not exposed to the public.
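One standard way to hide the commit from users while the database-wide transaction can still invoke it is package-private visibility: the public interface omits `commit()`, and the database transaction (living in the same package) calls an internal method on the implementation class. All names below are illustrative, not SirixDB's real types:

```java
// Public-facing API: mutations only, no commit() method exposed.
interface NodeTrx {
    void insertNode(String value);
}

// Implementation in the same package as the database-wide transaction.
final class NodeTrxImpl implements NodeTrx {
    private boolean committed;

    @Override
    public void insertNode(String value) {
        // mutate the resource (omitted in this sketch)
    }

    // Package-private: reachable by DatabaseTrx, invisible to user code.
    void internalCommit() {
        committed = true;
    }

    boolean isCommitted() {
        return committed;
    }
}

// The database-wide transaction commits all enlisted node transactions.
final class DatabaseTrx {
    void commitAll(NodeTrxImpl... trxs) {
        for (NodeTrxImpl trx : trxs) {
            trx.internalCommit(); // users can never call this directly
        }
    }
}
```

Users only ever hold a `NodeTrx` reference, so the premature-commit problem disappears at the type level.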


Another idea would be to add a Git dependency for versioning the database-wide transaction and trigger the Git commit as a commit hook. However, I think that would be overkill.

Could we simply have a REST API that only uses database versioning, so that clients can’t commit on individual resources through any API calls?


You are referring to the currently unused function, right?


Yes, this function should simply truncate the RandomAccessFile, or the storage in general, to the most recently committed revision. That is, if the system crashes for some reason during a commit, some data might already have been synced. In the case of database-wide transactions we also have to lock the most recent revisions until all node transactions opened on the resources have committed, and we may need to truncate those back to most-recent-revision - 1 if one of the node transactions fails.

So, basically, delaying this call to unLock() to the last possible moment?

And not calling unLock() guarantees that the newest revision is not readable anywhere?


Hm, maybe by splitting the method, as we’d need a two-phase commit protocol. Yes, something like that.
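A sketch of that split, under my assumptions about the participant interface (none of these names exist in SirixDB): `prepare()` writes and syncs the new revision but keeps it locked and invisible; only when every participant has prepared does `commitPhase()` publish it (the delayed unLock() moment), otherwise `rollback()` truncates back to the previous revision.

```java
import java.util.List;

// Illustrative two-phase commit over the node transactions enlisted in a
// database-wide transaction. Names are assumptions, not SirixDB's API.
interface Participant {
    boolean prepare();   // write + fsync the new revision, keep it locked
    void commitPhase();  // publish the revision: the delayed unLock()
    void rollback();     // truncate back to the previous revision
}

final class TwoPhaseCommit {
    /** Returns true if all participants committed, false if rolled back. */
    static boolean commitAll(List<Participant> participants) {
        for (Participant p : participants) {
            if (!p.prepare()) {
                // Any prepare failure aborts the whole database-wide trx.
                participants.forEach(Participant::rollback);
                return false;
            }
        }
        // Point of no return: every revision is durable, now publish all.
        participants.forEach(Participant::commitPhase);
        return true;
    }
}
```

Since prepared-but-unpublished revisions are still locked, readers never observe a partially committed database-wide transaction.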

I think I introduced the lock, as of now, only for scheduler-based commits, so that they don’t interfere with user calls to the commit method. But good catch :slight_smile:

The line numbers also remind me that I want to split these huge transaction classes…

I had already created an issue for this in December last year :wink: