Menu

Npgsql 3 And the mysterious Socket Error

May 7, 2017 - .NET, C#, Programming

Upgrading library components across an entire suite of applications is not without it risks, as one may shortly learn when upgrading from Npgsql 2 to Npgsql 3. Though it applies between any version- who knows what bugs might be added or maybe even bugs were fixed that you relied upon previously, either intentionally or without even being aware that the behaviour on which you relied was in fact unintended.

As it happens, Npgsql 3 has such particulars when it comes to upgrading from Npgsql 2. On some sites, and with some workstations, we were receiving reports of Exceptions and errors after the upgrade was deployed. These were in the form of IOExceptions from within the Npgsql library when we ran a query, which failed because “a connection was forcibly closed by the remote host” form. Even adding some retry loops/attempts didn’t resolve the issue, as it would hit it’s retry limit, as if at some point it just refuses to work across the NpgsqlConnection.

After some investigation, it turned out to be a change in Npgsql 3 and how pooling is managed. In this case, a connection in the pool was being “closed” by the postgres server. This was apparently masked with previous versions because with Npgsql 2, the Pooled connections would be re-opened if they were found to be closed. Npgsql 3 changed this both for performance reasons as well as to be consistent with other Data Providers; This change meant that our code was Creating a connection and opening it- but that connection was actually a remotely closed connection from the pool, as a result attempts to query against that connection would throw exceptions.

Furthermore, because of the nature of the problem there was no clear workaround that could be used. We could trap the exception but at that point, how do we actually handle it? If we try to re-open the connection we’d just get the same closed connection back. It would be possible to disable pooling to get the connection open in that case but there isn’t much reason to have that only take place when that specific exception occurs, and it means having added handling such that we handle the error everywhere we perform an SQL query- and that exception might not specifically be caused by the pooling consideration either.The fix we used was to add new global configuration options to our software which would add parameters to the connection string to disable pooling and/or enable Npgsql’s keepalive feature. The former of which would sidestep the issue by not using pooled connections and the latter which would prevent the connections from being closed remotely (except when there was a serious problem of course, in which case we want to exception to take place). So far it has been successful in resolving the problem on affected systems.

Have something to say about this post? Comment!