
Uploading and Downloading Data from Postgres: Revisited

December 22, 2013 - C#, Programming

Following my experience fiddling with this as documented here, I found the result had numerous issues. Even when run on a separate thread it still caused UI problems: it starved the UI thread even at low priority, since it never actually yielded. It also didn't report progress, which just sucked in general.

I revisited the problem and discovered Postgres's "Large Object" support. The documentation and examples were a bit basic, but I managed to piece together an example that works.

When you create and store Large Objects in the Postgres database, you are given an "Object ID", or OID. From the C# perspective, this is basically just an int; if you want to store Large Objects in your database tables, you simply add an OID field to the table and store the int there. This avoids a number of the problems the older implementation had, particularly since we no longer need sequential access at all: because all we are retrieving is an int, we don't have a massive resultset to send over the wire from the query.
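As a minimal sketch of what that looks like, here is a hypothetical table with an OID column; the table name is made up for illustration, and the column name matches the one used in the download code later:

```csharp
// Hypothetical schema: the Large Object's data lives elsewhere in the
// database; the table row only stores the int-sized OID pointing at it.
using (var cmd = new NpgsqlCommand(
    "CREATE TABLE IF NOT EXISTS Files (Name text, ObjectField oid)", conn))
{
    cmd.ExecuteNonQuery();
}
// A query like "SELECT ObjectField FROM Files WHERE Name = @name" then
// only ships an int over the wire, not the file contents themselves.
```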

The actual 'Oomph' to upload or download that data to a Large Object can be done in a chunked fashion. First, here is the sample code that adds a given file's contents as a Large Object and returns the resulting OID:
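Something along these lines; this is a sketch assuming the Npgsql 2.x-era `NpgsqlTypes.LargeObjectManager` API (ported from the JDBC driver), and the method name `UploadFile` is my own:

```csharp
using System.IO;
using Npgsql;
using NpgsqlTypes;

public static class LargeObjectHelper
{
    public static int UploadFile(NpgsqlConnection conn, string filepath)
    {
        // Large Object operations must happen inside a transaction.
        using (NpgsqlTransaction transaction = conn.BeginTransaction())
        {
            var manager = new LargeObjectManager(conn);
            // Create a new Large Object and note its OID.
            int oid = manager.Create(LargeObjectManager.READWRITE);
            LargeObject lo = manager.Open(oid, LargeObjectManager.WRITE);
            using (FileStream fs = File.OpenRead(filepath))
            {
                byte[] buffer = new byte[8192];
                int read;
                // Write the file in chunks; the loop body is a natural
                // spot to fire a progress event between iterations.
                while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
                {
                    lo.Write(buffer, 0, read);
                }
            }
            lo.Close();
            transaction.Commit();
            return oid;
        }
    }
}
```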

It's reasonably simple. There are some caveats if something like this is done in a "real" program, such as deleting any existing OID for that record, making sure you only have one LargeObjectManager per connection, and so on. Another caveat is that it all needs to be handled within a transaction. I've been testing with a local Postgres server, and I'm a bit anxious about that Commit: if the commit is the single point where all the work actually happens, that would effectively mean I did all this for nothing. It doesn't work outside a transaction, so I hope, and suspect, that the commit simply finalizes changes that were already sent over the wire. I will be a sad panda if it turns out it waits for the Commit to send anything at all, but that would also be a very silly way to implement it.

Of course, downloading back from an Object ID is similar. One caveat I found with Npgsql was that OID fields were read as a boxed long rather than an int, which meant my initial attempts to cast to int failed; I had to unbox to long and then cast to int, via (int)(long)dr["ObjectField"]. But once the ID itself was retrieved, it was a relatively simple process to reverse the Upload:
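Roughly like this; again a sketch under the same Npgsql 2.x `LargeObjectManager` assumption, with `DownloadFile` being my own name for it:

```csharp
using System.IO;
using Npgsql;
using NpgsqlTypes;

public static class LargeObjectDownloader
{
    public static void DownloadFile(NpgsqlConnection conn, int oid,
                                    string filepath)
    {
        // As with the upload, this must run inside a transaction.
        using (NpgsqlTransaction transaction = conn.BeginTransaction())
        {
            var manager = new LargeObjectManager(conn);
            LargeObject lo = manager.Open(oid, LargeObjectManager.READ);
            using (FileStream fs = File.Create(filepath))
            {
                byte[] buffer = new byte[8192];
                int read;
                // Read the Large Object in chunks and stream it to disk.
                while ((read = lo.Read(buffer, 0, buffer.Length)) > 0)
                {
                    fs.Write(buffer, 0, read);
                }
            }
            lo.Close();
            transaction.Commit();
        }
    }
}
```

The OID argument would come from the table row, e.g. `(int)(long)dr["ObjectField"]` given the boxed-long quirk mentioned above.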

Since it's pretty much the inverse of the Upload method, I didn't comment it. One notable thing about both of these is that in a concurrent situation you would likely want to put any progress updates inside the while loop. In my implementation I updated the appropriate UI by firing events from the class this method was placed in: progress events as it uploaded or downloaded, and a completion event when it finished.

My fundamental misunderstanding was that I didn't realize Large Objects are effectively database globals; the OID field isn't arbitrary byte data, it's simply an int pointer to the data within that database. The test was successful: I was able to upload and then download the data. Now it's just a matter of integrating it into the project I needed it for, which is another matter entirely.
