PGSQL file backend updates on files with multiple references #31

Closed
opened 2019-02-06 23:53:20 +00:00 by jamie · 7 comments
jamie commented 2019-02-06 23:53:20 +00:00 (Migrated from git.hazaar.io)

The Problem

When you store a file that is an exact copy of a file already in the database (detected via an md5sum comparison), the backend does not store the file chunks again; it simply creates another reference to the existing file contents. The problem is that when we update the contents of a file, we blindly perform a PDO UPDATE query to write the new contents, which means the contents of all references are updated, and that might not be the desired result.

The Solution

When updating file contents, we first need to check whether there are multiple references to the content. If there is only one reference, we can just do the update as normal. If there are multiple, we need to compute the MD5 sum of the new contents: if a chunk with that sum already exists, update the reference to point to it (rather than inserting new data); if it doesn't exist, INSERT the new data and update the reference to point to the new chunk.
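The update logic above can be sketched as follows. This is an illustrative Python/SQLite sketch only, not the actual hazaar-dbi PHP implementation; the table and column names (`file`, `file_chunk`, `chunk_id`, `md5`, `data`) are assumptions standing in for the real schema.

```python
import hashlib
import sqlite3

# Minimal stand-in schema: each file row references a content chunk,
# and identical contents share one chunk row (deduplicated by MD5).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE file_chunk (id INTEGER PRIMARY KEY, md5 TEXT, data BLOB);
CREATE TABLE file (id INTEGER PRIMARY KEY, name TEXT,
                   chunk_id INTEGER REFERENCES file_chunk(id));
""")

def write_file(name, data):
    """Store a file, re-using an existing chunk when the MD5 matches."""
    md5 = hashlib.md5(data).hexdigest()
    row = con.execute("SELECT id FROM file_chunk WHERE md5 = ?", (md5,)).fetchone()
    chunk_id = row[0] if row else con.execute(
        "INSERT INTO file_chunk (md5, data) VALUES (?, ?)", (md5, data)).lastrowid
    con.execute("INSERT INTO file (name, chunk_id) VALUES (?, ?)", (name, chunk_id))

def update_file(file_id, data):
    """Update one file's contents without clobbering other references."""
    (chunk_id,) = con.execute(
        "SELECT chunk_id FROM file WHERE id = ?", (file_id,)).fetchone()
    (refs,) = con.execute(
        "SELECT COUNT(*) FROM file WHERE chunk_id = ?", (chunk_id,)).fetchone()
    md5 = hashlib.md5(data).hexdigest()
    if refs == 1:
        # Sole reference: safe to update the chunk in place.
        con.execute("UPDATE file_chunk SET md5 = ?, data = ? WHERE id = ?",
                    (md5, data, chunk_id))
        return
    # Multiple references: never touch the shared chunk. Re-use an
    # existing chunk with the same MD5, or insert a new one, then
    # repoint only this file's reference.
    row = con.execute("SELECT id FROM file_chunk WHERE md5 = ?", (md5,)).fetchone()
    new_id = row[0] if row else con.execute(
        "INSERT INTO file_chunk (md5, data) VALUES (?, ?)", (md5, data)).lastrowid
    con.execute("UPDATE file SET chunk_id = ? WHERE id = ?", (new_id, file_id))
```

With this in place, updating one of two files that share a chunk leaves the other file's contents untouched.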

jamie commented 2019-02-09 08:17:29 +00:00 (Migrated from git.hazaar.io)

moved from hazaar-mvc#155

jamie commented 2019-02-10 10:27:34 +00:00 (Migrated from git.hazaar.io)

created merge request !9 to address this issue

jamie commented 2019-02-10 10:27:37 +00:00 (Migrated from git.hazaar.io)

mentioned in merge request !9

jamie commented 2019-02-11 10:32:02 +00:00 (Migrated from git.hazaar.io)

This is now working brilliantly. More testing will be done. Most importantly now though there needs to be some sort of upgrade path due to the changes to the database structure that this required.

jamie commented 2019-02-16 22:22:33 +00:00 (Migrated from git.hazaar.io)

So my upgrade path has been sorted. I have moved the filesystem init into the `Schema\Manager` class so that it can be handled automatically during a schema migration, which will also handle the upgrade if it is needed. An upgrade is triggered when the `file` and `file_chunk` tables exist and the new `hz_file` and `hz_file_chunk` tables do not. In that case the new tables are created and the existing data is transferred over, leaving the old tables in place.

The reason the old tables are left in place is that they could be included in existing schema management. That way, when the developer is ready, they can drop the tables and snapshot the database to propagate these changes.
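The detect-and-migrate step can be sketched as below. This is an illustrative Python/SQLite sketch only; the real upgrade runs inside hazaar-dbi's `Schema\Manager` against PostgreSQL, and the helper names here (`needs_upgrade`, `upgrade`) are hypothetical. Only the table names mirror the comment above.

```python
import sqlite3

def needs_upgrade(con):
    """Upgrade is needed when the old tables exist and the new ones don't."""
    have = {r[0] for r in con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    return ({"file", "file_chunk"} <= have
            and not ({"hz_file", "hz_file_chunk"} & have))

def upgrade(con):
    """Create the hz_* tables and copy the data, keeping the old tables."""
    if not needs_upgrade(con):
        return False
    # The old tables are left in place so any existing schema snapshots
    # that reference them remain valid until the developer drops them.
    con.executescript("""
    CREATE TABLE hz_file AS SELECT * FROM file;
    CREATE TABLE hz_file_chunk AS SELECT * FROM file_chunk;
    """)
    return True
```

Running `upgrade()` a second time is a no-op, since the presence of the `hz_*` tables makes `needs_upgrade()` return false.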

jamie commented 2019-02-18 01:23:49 +00:00 (Migrated from git.hazaar.io)

mentioned in commit dd3f69042c0b96382fb3ef7d1a43858fb9d1d8ac
jamie commented 2019-02-18 01:23:50 +00:00 (Migrated from git.hazaar.io)

closed via merge request !9


Reference: hazaar/hazaar-dbi#31