Following my experience fiddling with this as documented here, I found the result had numerous issues. Even when run on a separate thread it still caused UI problems, since it starved the UI thread even at low priorities; it never actually yielded. It also provided no progress reporting, which just sucked in general.
I revisited the problem and discovered Postgres’s “Large Object” support. The documentation and examples were a bit basic, but I managed to piece together an example that works.
When you create and store Large Objects in a Postgres database, you are given an “Object ID”, or OID. From the C# perspective, this is basically just an int: if you want to store large objects in your database tables, you simply add an OID field to the table and store the int there. This avoids a number of problems the older implementation had, particularly since we no longer need sequential access at all; since all the query retrieves is an int, there is no massive resultset to send over the wire.
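To make that concrete, here is a minimal sketch of what storing the returned OID might look like. The Documents table and its columns are hypothetical names I’m using for illustration, not something from my actual schema:

//Hypothetical schema, for illustration only:
//  CREATE TABLE Documents (ID serial PRIMARY KEY, Name text, Content oid);
//resultOid is the int you got back when creating the large object.
//The oid column stores it like any other value.
using (NpgsqlCommand cmd = new NpgsqlCommand(
    "INSERT INTO Documents (Name, Content) VALUES (:name, :content)", conn))
{
    cmd.Parameters.Add(new NpgsqlParameter("name", "somefile.bin"));
    cmd.Parameters.Add(new NpgsqlParameter("content", resultOid));
    cmd.ExecuteNonQuery();
}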
The actual ‘Oomph’ of uploading or downloading that data to and from a large object can be done in a chunk-like fashion. First, here is the sample code that adds a given file’s contents as a large object and returns the resulting OID:
private static int UploadLargeObject(String sFileName)
{
    NpgsqlTransaction trans = conn.BeginTransaction();
    //try block to make sure the transaction is committed.
    //of course Exceptions should be handled here too.
    try
    {
        //grab the LargeObjectManager for this connection.
        //whether you use multiple connections or just a single one, you
        //should make sure you only have one LargeObjectManager per connection.
        LargeObjectManager lom = new LargeObjectManager(conn);
        //create the new Large Object. newobj will give us the new object ID.
        int newobj = lom.Create(LargeObjectManager.READWRITE);
        //now that it's created, retrieve a reference to a LargeObject instance representing it.
        LargeObject CreateObject = lom.Open(newobj, LargeObjectManager.WRITE);
        int Chunksize = 128 * 1024; //size of chunks to upload.
        using (FileStream fs = new FileStream(sFileName, FileMode.Open, FileAccess.Read))
        {
            //read the File in chunks of up to Chunksize bytes, writing each chunk
            //to the large object, until the read comes up empty.
            byte[] buffer = new byte[Chunksize];
            int bytesread;
            while ((bytesread = fs.Read(buffer, 0, Chunksize)) > 0)
            {
                CreateObject.Write(buffer, 0, bytesread);
            }
        }
        //grab the ID of the object,
        int resultid = CreateObject.GetOID();
        //close the object,
        CreateObject.Close();
        //and return the resulting ID.
        return resultid;
    }
    finally
    {
        trans.Commit();
    }
}
It’s reasonably simple. There are some caveats if something like this is done in a “real” program, such as deleting any existing OID for that record, making sure you only have one LargeObjectManager per connection, and so on. Another caveat is that it all needs to happen within a transaction. I’ve been testing against a local Postgres server, and I was a bit anxious that the Commit might be the single point where all the work actually happens, which would effectively mean I did all this for nothing. The Large Object calls don’t work outside a transaction, so I hope, and suspect, that the Commit simply finalizes changes that were already sent over the wire. I will be a sad panda if it turns out it waits for the Commit to send anything at all, but that would also be a very silly way to implement it.
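As a sketch of that first caveat: deleting a record’s existing large object before storing a new one can be done with the server-side lo_unlink function. This is my assumption of how you’d wire it up, not something from my test program:

//Delete an existing large object by OID, using the built-in lo_unlink
//server function. Like the other operations, this belongs inside a transaction.
private static void DeleteLargeObject(int OldOid)
{
    using (NpgsqlCommand cmd = new NpgsqlCommand("SELECT lo_unlink(:oid)", conn))
    {
        cmd.Parameters.Add(new NpgsqlParameter("oid", OldOid));
        cmd.ExecuteScalar();
    }
}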
Of course, downloading back from an Object ID is similar. One caveat I found with Npgsql was that OID fields were read back as a boxed long rather than an int, which meant my initial attempts to cast directly to int failed. I had to unbox to a long and then cast to an int, via (int)(long)dr["ObjectField"]. But once the ID itself was retrieved, it was a relatively simple process to reverse the Upload:
private static void DownloadLargeObject(int Oid, String TargetFilename)
{
    NpgsqlTransaction trans = conn.BeginTransaction();
    try
    {
        LargeObjectManager lom = new LargeObjectManager(conn);
        LargeObject ReadObject = lom.Open(Oid, LargeObjectManager.READ);
        int Chunksize = 128 * 1024;
        using (FileStream outfs = new FileStream(TargetFilename, FileMode.Create, FileAccess.Write))
        {
            byte[] buffer = new byte[Chunksize];
            int readbytes;
            while ((readbytes = ReadObject.Read(buffer, 0, Chunksize)) > 0)
            {
                outfs.Write(buffer, 0, readbytes);
            }
        }
        ReadObject.Close();
    }
    finally
    {
        trans.Commit();
    }
}
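For context, pulling a stored OID back out of a table and handing it to that method would look something like this; as before, the Documents table and its columns are hypothetical, but note the (int)(long) double cast described above:

//Hypothetical lookup: Npgsql hands the oid column back as a boxed long,
//so we unbox to long and then narrow to int.
using (NpgsqlCommand cmd = new NpgsqlCommand(
    "SELECT Content FROM Documents WHERE ID = :id", conn))
{
    cmd.Parameters.Add(new NpgsqlParameter("id", 1));
    using (NpgsqlDataReader dr = cmd.ExecuteReader())
    {
        if (dr.Read())
        {
            int oid = (int)(long)dr["Content"];
            DownloadLargeObject(oid, "restored.bin");
        }
    }
}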
Since the download method is pretty much the inverse of the Upload method, I didn’t comment it. One notable thing about both of these is that in a concurrent situation you would likely want to put any progress updates inside the while loop. In my implementation, I updated the appropriate UI by firing events from the class this method was placed in: progress events as it uploaded or downloaded, and a completion event when it finished.
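As a rough sketch of what I mean; the event and argument names here are invented for illustration, not the exact ones from my implementation:

//Hypothetical progress event, raised from inside the chunk loop.
public class TransferProgressEventArgs : EventArgs
{
    public long BytesTransferred { get; private set; }
    public long TotalBytes { get; private set; }
    public TransferProgressEventArgs(long transferred, long total)
    {
        BytesTransferred = transferred;
        TotalBytes = total;
    }
}
public static event EventHandler<TransferProgressEventArgs> TransferProgress;

//then, inside the while loop of either method, something like:
//totalSoFar += bytesread;
//if (TransferProgress != null)
//    TransferProgress(null, new TransferProgressEventArgs(totalSoFar, fs.Length));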
My fundamental misunderstanding earlier was not realizing that Large Objects are effectively database globals; the OID field isn’t arbitrary byte data, it is simply an int that points to the data within that database. The test was successful: I was able to upload and then download the data intact. Now it’s just a matter of integrating it into the project I needed it for, which is another matter entirely.
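Incidentally, that “global” nature is visible directly in the system catalogs: the data lives in pg_largeobject, keyed by OID, regardless of which user tables refer to it. As a sketch, on PostgreSQL 9.0 and later you should be able to list every large object in the database like so:

//List every large object OID in the database via the system catalog.
//(pg_largeobject_metadata exists on PostgreSQL 9.0 and later.)
using (NpgsqlCommand cmd = new NpgsqlCommand(
    "SELECT oid FROM pg_largeobject_metadata", conn))
using (NpgsqlDataReader dr = cmd.ExecuteReader())
{
    while (dr.Read())
        Console.WriteLine(dr[0]);
}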