PGSQL file backend updates on files with multiple references #31
The Problem
When a file is stored that is an exact copy of a file already in the database (detected by comparing MD5 sums), the file chunks are not stored a second time; instead, a new reference to the existing file contents is created. The problem is that when the contents of a file are updated, we blindly run a PDO UPDATE query to write the new contents, so the contents of every reference are updated, which may not be the desired result.
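For illustration, here is a minimal sketch of the problem, assuming a simplified schema in which `file` rows point at `file_chunk` rows via a `chunk_id` column. The table and column names are assumptions for this example, not the backend's actual schema:

```php
<?php
// Assumed state: two file records deduplicated to the same chunk.
//   file:       (id=1, filename='a.txt', chunk_id=42)
//               (id=2, filename='b.txt', chunk_id=42)
//   file_chunk: (id=42, md5='...', data='original contents')

$db = new PDO('pgsql:host=localhost;dbname=app', 'user', 'pass');

$newContents = 'updated contents';

// Updating 'a.txt' blindly rewrites chunk 42 in place...
$stmt = $db->prepare('UPDATE file_chunk SET data = :data, md5 = :md5 WHERE id = :id');
$stmt->execute([
    ':data' => $newContents,
    ':md5'  => md5($newContents),
    ':id'   => 42,
]);

// ...so 'b.txt', which shares chunk 42, silently changes as well.
```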
The Solution
When updating file contents we first need to check whether there are multiple references to the content. If there is only one reference, we can perform the update as normal. If there are multiple, we check the MD5 sum of the new contents: if a chunk with that MD5 sum already exists, we update the reference to point to it rather than inserting new data. If the MD5 sum doesn't exist, we INSERT the new data and update the reference.
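A rough sketch of that copy-on-write logic, using the same assumed schema as above (again, an illustration rather than the actual backend implementation):

```php
<?php
// Copy-on-write file update: only re-point this file's reference when
// the chunk is shared, so other references keep the old contents.
function updateFileContents(PDO $db, int $fileId, string $contents): void
{
    $md5 = md5($contents);

    // Which chunk does this file point at, and how many files share it?
    $stmt = $db->prepare('SELECT chunk_id,
            (SELECT COUNT(*) FROM file WHERE chunk_id = f.chunk_id) AS refs
        FROM file f WHERE id = :id');
    $stmt->execute([':id' => $fileId]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);

    if ((int)$row['refs'] === 1) {
        // Sole reference: update the chunk in place, as before.
        $stmt = $db->prepare('UPDATE file_chunk SET data = :data, md5 = :md5 WHERE id = :cid');
        $stmt->execute([':data' => $contents, ':md5' => $md5, ':cid' => $row['chunk_id']]);
        return;
    }

    // Shared chunk: look for an existing chunk with the same MD5 sum.
    $stmt = $db->prepare('SELECT id FROM file_chunk WHERE md5 = :md5');
    $stmt->execute([':md5' => $md5]);
    $chunkId = $stmt->fetchColumn();

    if ($chunkId === false) {
        // No match: insert the new data as a fresh chunk.
        $stmt = $db->prepare('INSERT INTO file_chunk (md5, data) VALUES (:md5, :data) RETURNING id');
        $stmt->execute([':md5' => $md5, ':data' => $contents]);
        $chunkId = $stmt->fetchColumn();
    }

    // Re-point only this file's reference; other references are untouched.
    $stmt = $db->prepare('UPDATE file SET chunk_id = :cid WHERE id = :id');
    $stmt->execute([':cid' => $chunkId, ':id' => $fileId]);
}
```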
moved from hazaar-mvc#155
created merge request !9 to address this issue
mentioned in merge request !9
This is now working brilliantly. More testing will be done. Most importantly, though, there now needs to be some sort of upgrade path due to the changes this required to the database structure.
So my upgrade path has been sorted. I have moved the filesystem init into the `Schema\Manager` class so that it can be handled automatically during a schema migration. This will also handle the upgrade if it is needed. An upgrade is triggered by checking that the `file` and `file_chunk` tables exist and that the new `hz_file` and `hz_file_chunk` tables do not. In this case the new tables will be created and the existing data will be transferred over, leaving the old tables in place.

The reason the old tables are left in place is that they could be included in existing schema management. That way, when the developer is ready, they can drop the tables and snapshot the database to propagate these changes.
mentioned in commit dd3f69042c
closed via merge request !9