
Possible Azure SQL #14

Open
adamfur opened this issue Aug 26, 2015 · 6 comments

adamfur commented Aug 26, 2015

Hi,

We have built a system that uses the PollingClient to feed around 25 commit-observing projection builders.
Occasionally in production, several of our projection builders crash simultaneously, at the very same commit checkpoint.
When we reset the checkpoint and feed the projection builders the entire event stream again, it passes perfectly.
So it appears the polling sometimes skips a few events!
It only seems to happen during highly intense read+write periods, such as data imports.

SQL Azure's default transaction isolation level is different from vanilla MSSQL's.

If I write event+0 and event+1 to the database, is it possible that event+1 is available before event+0 in SQL Azure?

In MSSQL we haven't been able to reproduce this result.

Any clues?
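
A minimal, self-contained sketch of the suspected race (illustrative names only, not NEventStore code): if the transaction holding checkpoint 5 commits after checkpoint 6 has already been observed, a naive poller advances its checkpoint past 5 and never sees it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class OutOfOrderDemo
{
    static readonly List<long> Store = new List<long>();

    static IEnumerable<long> LoadCommitsAfter(long checkpoint) =>
        Store.Where(c => c > checkpoint).OrderBy(c => c);

    static void Main()
    {
        long lastCheckpoint = 4;

        Store.Add(6);                       // writer B's checkpoint 6 commits first
        foreach (var c in LoadCommitsAfter(lastCheckpoint))
            lastCheckpoint = c;             // poller advances to 6

        Store.Add(5);                       // writer A's checkpoint 5 becomes visible late
        bool skipped = !LoadCommitsAfter(lastCheckpoint).Contains(5);
        Console.WriteLine($"Checkpoint 5 skipped forever: {skipped}"); // True
    }
}
```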

andreabalducci (Member) commented

I haven't worked with SQL Azure yet, but this kind of issue is possible in a distributed system. You should handle this "glitch" in your client:
A) polling client -> sequencer -> projection
B) modify the polling client to read with a few milliseconds of delay

I would go for A.
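
A minimal sketch of what the sequencer stage in option A might look like (illustrative names, not a NEventStore API). It assumes checkpoint numbers are contiguous, which, as noted further down the thread, they are not on Azure SQL:

```csharp
using System;
using System.Collections.Generic;

// Option A sketch: buffer commits that arrive out of order and dispatch
// them to the projections strictly by checkpoint number.
class Sequencer
{
    private readonly SortedDictionary<long, object> _pending = new SortedDictionary<long, object>();
    private readonly Action<object> _dispatch;
    private long _nextCheckpoint;

    public Sequencer(long firstCheckpoint, Action<object> dispatch)
    {
        _nextCheckpoint = firstCheckpoint;
        _dispatch = dispatch;
    }

    // Called by the polling client for every observed commit.
    public void OnCommit(long checkpoint, object commit)
    {
        _pending[checkpoint] = commit;

        // Flush the longest in-order prefix we now have.
        while (_pending.TryGetValue(_nextCheckpoint, out var next))
        {
            _pending.Remove(_nextCheckpoint);
            _dispatch(next);
            _nextCheckpoint++;
        }
    }
}
```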

adamfur commented Aug 27, 2015

I've messed around a bit and received transaction exceptions in the polling client while using the EnlistInAmbientTransaction() call during the Wireup(). Not sure if it actually solves anything, but I'm going to try it out for a few days.
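
For context, roughly the wire-up being described, assuming the NEventStore 5.x fluent API of that era (member names and order may differ between versions; "EventStore" is a placeholder connection-string name):

```csharp
using NEventStore;
using NEventStore.Persistence.Sql.SqlDialects;

class EventStoreBootstrap
{
    public static IStoreEvents Build()
    {
        return Wireup.Init()
            .UsingSqlPersistence("EventStore")   // placeholder connection-string name
            .WithDialect(new MsSqlDialect())
            .EnlistInAmbientTransaction()        // join an ambient TransactionScope
            .InitializeStorageEngine()
            .UsingJsonSerialization()
            .Build();
    }
}
```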

Regarding options A and B:
I think creating a sequencer is difficult; we expect a lot of holes in the CheckpointNumber identity column since we use several buckets, and SQL Azure also sometimes "randomly" bumps the identity by +10,000.

We implemented a version of B where we changed ObserveFrom*() to pass UtcNow - 300ms and ignore all commits newer than that, giving the infrastructure some time to catch up. A sketch of the idea:
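
A hand-rolled sketch (Commit and loadCommitsAfter are illustrative stand-ins, not NEventStore APIs). The age filter has to live inside the polling loop itself, as done here by modifying ObserveFrom*, because a downstream filter would let the checkpoint advance past the skipped commits:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

class DelayedPoller
{
    public sealed class Commit
    {
        public long CheckpointToken { get; set; }
        public DateTime CommitStamp { get; set; }
    }

    public static void Poll(Func<long, IEnumerable<Commit>> loadCommitsAfter,
                            Action<Commit> project,
                            long lastCheckpoint)
    {
        while (true)
        {
            // Only consume commits old enough for in-flight transactions to land;
            // stop at the first too-new commit so nothing behind it is skipped.
            var cutoff = DateTime.UtcNow.AddMilliseconds(-300);
            foreach (var commit in loadCommitsAfter(lastCheckpoint)
                                       .TakeWhile(c => c.CommitStamp <= cutoff))
            {
                project(commit);
                lastCheckpoint = commit.CheckpointToken;
            }
            Thread.Sleep(500); // polling interval
        }
    }
}
```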

If we settle for option B, I will eventually send a pull request with a SqlAzureDialect.

adamfur commented Sep 11, 2015

Our issues:

  • Azure SQL sometimes bumps the identity (CheckpointNumber) value up by 10,000.
  • Fetching a range of recently written commits sometimes misses a portion of the data.
  • There are unexplained gaps in CheckpointNumber, usually of 1-2.

The workaround (sketched in code after the notes below):

  • Added a predicate so we can filter on the bucketId, to decide whether we should invoke the OnNext() method for the current commit.
  • Always fetch from all buckets (the only way to know whether we have a gap).
  • Always consume commits older than 5 seconds.
  • Throw if the next commit is not the last CheckpointNumber + 1.

Notes:

  1. In case we stumble upon a gap, the clients have to tolerate a lag of 5 seconds before their projections are updated (this happens about twice a day).
  2. An invalid sequence retrieval is treated as a transient error; we retry until the next commit has aged to 5 seconds or more, or until we receive the correct sequence.
  3. In our logs we can see that it has sometimes taken almost 0.5 s before we ultimately receive the expected sequence number.
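
A sketch of the workaround with illustrative types (none of these names come from NEventStore): a non-contiguous checkpoint is treated as transient and thrown until the commit has aged 5 seconds, after which the gap is accepted as permanent and the commit is consumed.

```csharp
using System;

class GapTolerantConsumer
{
    private long _lastCheckpoint;
    private readonly Func<string, bool> _bucketPredicate;
    private readonly Action<object> _onNext;

    public GapTolerantConsumer(long lastCheckpoint, Func<string, bool> bucketPredicate, Action<object> onNext)
    {
        _lastCheckpoint = lastCheckpoint;
        _bucketPredicate = bucketPredicate;
        _onNext = onNext;
    }

    public void Handle(long checkpoint, string bucketId, DateTime commitStampUtc, object commit)
    {
        var isContiguous = checkpoint == _lastCheckpoint + 1;
        var hasAged = DateTime.UtcNow - commitStampUtc >= TimeSpan.FromSeconds(5);

        // Transient error: the missing checkpoint may still become visible.
        if (!isContiguous && !hasAged)
            throw new TransientSequenceException($"Expected {_lastCheckpoint + 1}, got {checkpoint}; retrying.");

        _lastCheckpoint = checkpoint;     // tracked across all buckets
        if (_bucketPredicate(bucketId))   // only interesting buckets are projected
            _onNext(commit);
    }
}

class TransientSequenceException : Exception
{
    public TransientSequenceException(string message) : base(message) { }
}
```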

fschmied commented

I believe this is caused by READ COMMITTED SNAPSHOT, which seems to be enabled by default in Azure SQL (https://blogs.msdn.microsoft.com/sqlcat/2013/12/26/be-aware-of-the-difference-in-isolation-levels-if-porting-an-application-from-windows-azure-sql-db-to-sql-server-in-windows-azure-virtual-machine/) and is incompatible with NEventStore.

I wonder if creating an AzureSqlDialect using READCOMMITTEDLOCK would have resolved the issue (if that hint works on Azure SQL).
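
For anyone wanting to check their own database, this standard query (valid on both SQL Server and Azure SQL) reports whether READ_COMMITTED_SNAPSHOT is enabled; the connection string is a placeholder:

```csharp
using System;
using System.Data.SqlClient;

class IsolationCheck
{
    static void Main()
    {
        using (var conn = new SqlConnection("<your connection string>"))
        {
            conn.Open();
            using (var cmd = new SqlCommand(
                "SELECT is_read_committed_snapshot_on FROM sys.databases WHERE name = DB_NAME()", conn))
            {
                Console.WriteLine($"READ_COMMITTED_SNAPSHOT on: {(bool)cmd.ExecuteScalar()}");
            }
        }
    }
}
```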

fschmied commented

I just did a bit of experimentation showing that adding the WITH (READCOMMITTEDLOCK) table hint to NEventStore's queries would probably solve the problem observed by @adamfur: it reintroduces the blocking behavior of the polling client normally seen under SQL Server but lost under Azure SQL.

fschmied commented Aug 9, 2019

We've created a subclass of MsSqlDialect that adds the READCOMMITTEDLOCK table hint for Azure SQL, and it seems to fix the main problem.
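
A sketch of the approach, with the caveat that the member names are assumptions: it assumes the dialect exposes the paging query as a virtual string property (called GetCommitsFromCheckpoint here) and that the query's FROM clause matches the Replace target, neither of which may hold for a given NEventStore version.

```csharp
using NEventStore.Persistence.Sql.SqlDialects;

public class AzureSqlDialect : MsSqlDialect
{
    // Assumed member name; check your NEventStore version for the actual
    // virtual property that exposes the checkpoint paging query.
    public override string GetCommitsFromCheckpoint
    {
        get
        {
            // Reintroduce SQL Server's blocking READ COMMITTED behavior,
            // which Azure SQL replaces with READ COMMITTED SNAPSHOT.
            // "FROM Commits" is illustrative; match the actual FROM clause.
            return base.GetCommitsFromCheckpoint
                       .Replace("FROM Commits", "FROM Commits WITH (READCOMMITTEDLOCK)");
        }
    }
}
```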

What remains is the very low likelihood of #21 occurring, but I think no one has ever actually seen this in production.
