@drawohara
published on: 2015-04-22

TL; DR;

99.9% of the web developer world believes that the correct usage of an RDBMS, along with transactions, prevents their applications from seeing bad data and introducing serious data quality errors. They are DEAD WRONG.

I read with great interest Kyle Kingsbury’s excellent article about Mongo’s consistency model at https://aphyr.com/posts/322-call-me-maybe-mongodb-stale-reads

Obviously this guy is super switched on and knows his stuff. He’s doing the work and everything about this article is insightful and well put together.

What I found astonishing, however, were the comments and what they reveal about the average professional developer:

Developers think using and RDBMS makes their data safe and they are absolutely wrong

I cannot tell you how many times I’ve gotten into arguments with ‘professional’ developers and especially silly sysadmins that actually believe that, by simply saying the word RDBMS, spinning a chicken around their head 3 times, and connecting to the magical unicorn of DBs their data will be safe and sound like, you know, … (something something about) …. banking transaction and all that (nonsense) dither about transactions and fsync. And a bunch of other stuff no developer I’ve ever met actually understands or has considered in the context of an HTTP (hint: stateless) application.

Before I proceed I’m going to issue a challenge:

Find me @ /contact or /team/ara-t-howard. Now on with it…

Riddle me this developer: what’s wrong with this code path:

  @db.transaction do
    if no_user_exists_with_conditions?
      @user = make_that_user_exists_with_those_conditions!
      deliver_an_activation_email_to!(@user)
    end
  end

Let me reveal something earth shattering to you:

THIS CODE IS TOTALLY BROKEN ON EVERY MAJOR RDBMS, AND VIRTUALLY EVERY APPLICATION, IN THE WORLD

I assure you that the email will go out twice.

Explaining transactions is beyond the scope of this article, but let me introduce you to ‘phantom reads’

http://en.wikipedia.org/wiki/Isolation_%28database_systems%29#Phantom_reads

In the above code a 2nd, concurrent, transaction can cause the following to happen:

  @db.transaction do
    if no_user_exists_with_conditions?
      # meanwhile, a 2nd transaction has created a duplicate user...
      # the following will will succeed, in __both__ transactions
      @user = make_that_user_exists_with_those_conditions!
      # both transaction will deliver the email
      deliver_an_activation_email_to!(@user)
    end
  end
  # one of the transactions will fail to commit, and go *BOOM* but, by then,
  # it is too late: the email has been sent twice and the error has been made

I know I know, you can’t believe it. But that’s just because you never bothered to RTFM when it comes to what ‘transaction’ means. Start here:

http://www.postgresql.org/docs/9.1/static/transaction-iso.html

Note that little table. Let me translate it for you:

And so I ask you which is a worse engineering decision:

ps. I’ve worked on large scale financial, realtime, and HA systems that use both Mongo and PostgreSQL. It’s damn hard either way.

pss. I tried to comment on your blog Kyle, but comments were blowing up ;-)