Following my experience fiddling with this as documented here, I found the result had numerous issues. Even when run on a separate thread it still caused UI problems, since it starved the UI thread even at low priorities; it never actually yielded. It also provided no progress reporting, which just sucked in general.
I revisited the problem and discovered Postgres’s “Large Object” support. The documentation and examples were a bit basic, but I managed to piece together an example that works.
When you create and store Large Objects in a Postgres database, you are given an “Object ID”, or OID. From the C# perspective, this is basically just an int: if you want to store large objects in your database tables, you simply add an OID field to the table and store the int there. This avoids a number of problems the older implementation had, particularly since we no longer need sequential access at all; since all the query retrieves is an int, there is no massive resultset to send over the wire.
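To make that concrete, here is a minimal sketch of what storing the returned OID might look like. The Documents table and its columns are hypothetical names I’m using for illustration, not something from my actual schema:

//Hypothetical schema, for illustration only:
//  CREATE TABLE Documents (ID serial PRIMARY KEY, Name text, Content oid);
//resultOid is the int you got back when creating the large object.
//The oid column stores it like any other value.
using (NpgsqlCommand cmd = new NpgsqlCommand(
    "INSERT INTO Documents (Name, Content) VALUES (:name, :content)", conn))
{
    cmd.Parameters.Add(new NpgsqlParameter("name", "somefile.bin"));
    cmd.Parameters.Add(new NpgsqlParameter("content", resultOid));
    cmd.ExecuteNonQuery();
}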
The actual ‘Oomph’ of uploading or downloading that data to and from a large object can be done in a chunk-like fashion. First, here is the sample code that adds a given file’s contents as a large object and returns the resulting OID:
private static int UploadLargeObject(String sFileName)
{
    NpgsqlTransaction trans = conn.BeginTransaction();
    //try block to make sure the transaction is committed.
    //of course Exceptions should be handled here too.
    try
    {
        //grab the LargeObjectManager for this connection.
        //whether you use multiple connections or just a single one, you
        //should make sure you only have one LargeObjectManager per connection.
        LargeObjectManager lom = new LargeObjectManager(conn);
        //create the new Large Object. newobj will give us the new object ID.
        int newobj = lom.Create(LargeObjectManager.READWRITE);
        //now that it's created, retrieve a reference to a LargeObject instance representing it.
        LargeObject CreateObject = lom.Open(newobj, LargeObjectManager.WRITE);
        int Chunksize = 128 * 1024; //size of chunks to upload.
        using (FileStream fs = new FileStream(sFileName, FileMode.Open, FileAccess.Read))
        {
            //read the File in chunks of up to Chunksize bytes, writing each chunk
            //to the large object, until the read comes up empty.
            byte[] buffer = new byte[Chunksize];
            int bytesread;
            while ((bytesread = fs.Read(buffer, 0, Chunksize)) > 0)
            {
                CreateObject.Write(buffer, 0, bytesread);
            }
        }
        //grab the ID of the object,
        int resultid = CreateObject.GetOID();
        //close the object,
        CreateObject.Close();
        //and return the resulting ID.
        return resultid;
    }
    finally
    {
        trans.Commit();
    }
}
It’s reasonably simple. There are some caveats if something like this is done in a “real” program, such as deleting any existing OID for that record, making sure you only have one LargeObjectManager per connection, and so on. Another caveat is that it all needs to happen within a transaction. I’ve been testing against a local Postgres server, and I was a bit anxious that the Commit might be the single point where all the work actually happens, which would effectively mean I did all this for nothing. The Large Object calls don’t work outside a transaction, so I hope, and suspect, that the Commit simply finalizes changes that were already sent over the wire. I will be a sad panda if it turns out it waits for the Commit to send anything at all, but that would also be a very silly way to implement it.
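As a sketch of that first caveat: deleting a record’s existing large object before storing a new one can be done with the server-side lo_unlink function. This is my assumption of how you’d wire it up, not something from my test program:

//Delete an existing large object by OID, using the built-in lo_unlink
//server function. Like the other operations, this belongs inside a transaction.
private static void DeleteLargeObject(int OldOid)
{
    using (NpgsqlCommand cmd = new NpgsqlCommand("SELECT lo_unlink(:oid)", conn))
    {
        cmd.Parameters.Add(new NpgsqlParameter("oid", OldOid));
        cmd.ExecuteScalar();
    }
}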
Of course, downloading back from an Object ID is similar. One caveat I found with Npgsql was that OID fields were read back as a boxed long rather than an int, which meant my initial attempts to cast directly to int failed. I had to unbox to a long and then cast to an int, via (int)(long)dr["ObjectField"]. But once the ID itself was retrieved, it was a relatively simple process to reverse the Upload:
private static void DownloadLargeObject(int Oid, String TargetFilename)
{
    NpgsqlTransaction trans = conn.BeginTransaction();
    try
    {
        LargeObjectManager lom = new LargeObjectManager(conn);
        LargeObject ReadObject = lom.Open(Oid, LargeObjectManager.READ);
        int Chunksize = 128 * 1024;
        using (FileStream outfs = new FileStream(TargetFilename, FileMode.Create, FileAccess.Write))
        {
            byte[] buffer = new byte[Chunksize];
            int readbytes;
            while ((readbytes = ReadObject.Read(buffer, 0, Chunksize)) > 0)
            {
                outfs.Write(buffer, 0, readbytes);
            }
        }
        ReadObject.Close();
    }
    finally
    {
        trans.Commit();
    }
}
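For context, pulling a stored OID back out of a table and handing it to that method would look something like this; as before, the Documents table and its columns are hypothetical, but note the (int)(long) double cast described above:

//Hypothetical lookup: Npgsql hands the oid column back as a boxed long,
//so we unbox to long and then narrow to int.
using (NpgsqlCommand cmd = new NpgsqlCommand(
    "SELECT Content FROM Documents WHERE ID = :id", conn))
{
    cmd.Parameters.Add(new NpgsqlParameter("id", 1));
    using (NpgsqlDataReader dr = cmd.ExecuteReader())
    {
        if (dr.Read())
        {
            int oid = (int)(long)dr["Content"];
            DownloadLargeObject(oid, "restored.bin");
        }
    }
}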
Since the download method is pretty much the inverse of the Upload method, I didn’t comment it. One notable thing about both of these is that in a concurrent situation you would likely want to put any progress updates inside the while loop. In my implementation, I updated the appropriate UI by firing events from the class this method was placed in: progress events as it uploaded or downloaded, and a completion event when it finished.
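As a rough sketch of what I mean; the event and argument names here are invented for illustration, not the exact ones from my implementation:

//Hypothetical progress event, raised from inside the chunk loop.
public class TransferProgressEventArgs : EventArgs
{
    public long BytesTransferred { get; private set; }
    public long TotalBytes { get; private set; }
    public TransferProgressEventArgs(long transferred, long total)
    {
        BytesTransferred = transferred;
        TotalBytes = total;
    }
}
public static event EventHandler<TransferProgressEventArgs> TransferProgress;

//then, inside the while loop of either method, something like:
//totalSoFar += bytesread;
//if (TransferProgress != null)
//    TransferProgress(null, new TransferProgressEventArgs(totalSoFar, fs.Length));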
My fundamental misunderstanding earlier was not realizing that Large Objects are effectively database globals; the OID field isn’t arbitrary byte data, it is simply an int that points to the data within that database. The test was successful: I was able to upload and then download the data intact. Now it’s just a matter of integrating it into the project I needed it for, which is another matter entirely.
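Incidentally, that “global” nature is visible directly in the system catalogs: the data lives in pg_largeobject, keyed by OID, regardless of which user tables refer to it. As a sketch, on PostgreSQL 9.0 and later you should be able to list every large object in the database like so:

//List every large object OID in the database via the system catalog.
//(pg_largeobject_metadata exists on PostgreSQL 9.0 and later.)
using (NpgsqlCommand cmd = new NpgsqlCommand(
    "SELECT oid FROM pg_largeobject_metadata", conn))
using (NpgsqlDataReader dr = cmd.ExecuteReader())
{
    while (dr.Read())
        Console.WriteLine(dr[0]);
}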