SirixDB Community Forum / Discussions

Thoughts on database versioning

In a database, many versioned resources can be inserted at the top level.
Is there a concept, to have a versioned database ?
That is, if some resource in a database is modified/deleted/created, the databases own version is changed.

It is possible to have one toplevel resource and have all resources as children of that, but I guess that this is not the best/performant way to do this (?)

(I guess my thinking was triggered by DAtomic https://www.datomic.com/)

1 Like

Hey, great question(s) :slight_smile:

I’d probably create a special database resource, which stores the names and respective versions. We’d need to finish work on adding the node transactions to a database-wide transaction though and once committed the resource is also kept up to date.

The slightly harder thing will be to manage failed database-wide transactions, but maybe in the first step we can ignore this and add cleanup methods in a second step.

But in general I think it’s not so much work (when a special .commit-file is not deleted inside a resource-folder it means a transaction started but crashed in the middle – we then need to truncate the resource back to the offset of the most recently committed revision and UberPage, otherwise some storage space would be wasted) :slight_smile: Other than that I think we either could run background threads periodically to do this or wait until a read-write transaction is opened on the resource again.

We’d also have to check the special resource in the most recent revision to truncate the most recent revisions, if a database-wide transaction fails. Maybe we can also manage this with a special .commit file which just denotes if a database-wide transaction failed or not.

I think the first part is nearly finished, but I didn’t like the API and I didn’t come up with a good idea. The thing is

I’d use the existing node transactions on a resource and simple add them to a database-wide transaction. However if someone commits the node transaction in-between it somehow destroys the whole purpose. Somehow we’d need to have an internal commit method on the node transaction but not expose this to the public.

1 Like

Another idea would be to add a Git dependency for versioning the database wide transaction and as a commit-hook add the Git transaction. However, I think that would be overkill.

simply have a rest-api that only uses database-versioning so that clients can’t commit on resources by any api-calls ?

1 Like

You are referring to the currently unused function
org.sirix.access.trx.page.PageTrxImpl#truncateTo
right ?

1 Like

Yes, this function should simply truncate the RandomAccessFile or the storage to the most recently committed revision. That is if the system crashes for some reason during a commit there might be already synced stuff. Or in the case of database-wide transactions we have to lock the most recent revisions until all node transactions opened on the resources have committed and we maybe need to truncate those to most-recent-revision - 1 if one of the node transactions fails.

so basically delaying this call to unLock() here to the last moment
org/sirix/access/trx/node/json/JsonNodeTrxImpl.java:1260

And not calling unLock() will guarantee that the newest revision is not readable anywhere ?

1 Like

Hm, maybe splitting the method as we’d need a two-phase commit protocol. Yes, something like that.

I think I introduced the lock as of now only for scheduler based commits, that they don’t interfere with calls from users to the commit-method. But good catch :slight_smile:

The line numbers also remind me that I want to split this huge transaction classes…

I had already created an issue for this in december last year :wink: