@drawohara
published on: 2014-06-12

Yesterday Ryan Cook and I hit one of those dreaded production only issues experienced developers know and hate on the soft launched Moshi Moshi Co product Wall Space Finder.

This one was a doozy, models would fail to save in staging only, not locally or in production. Obviously this was RAILS_ENV related, or so I thought…

First we did the obvious, looked at the code locally in development mode. Nothing, it totally worked. No problems.

Then we used the awesome

  ~> cap staging db:suck 
  
  
  ### the inverse is, you guessed it, db:blow, because development needs to be moar funi...

to suck the remote staging db into the local db, thereby replicating code, data, and the RAILS_ENV via

  ~> RAILS_ENV=production RAILS_STAGE=staging ./bin/rails server

A quick note on that: Dojo4 runs all staging deploys in RAILS_ENV=production and disambiguates staging vs. production via another environment variable (RAILS_STAGE) precisely for the reason that we like to exercise any and all production behaviors in staging/qa where possible - selectively guarding only crazy behaviors like charging credit cards via RAILS_STAGE.

So rest assured the issue was not because we had ./config/environments/staging.rb setup differently than ./config/environments/production.rb!

But still, we could not replicate. #WTF!?

Finally, I instrumented the staging deploy to use #dieüberawesomesauce pry-remote and dropped right into the BOOMing code on the staging node.

And there it was: a unique contraint was being violated in the database. Yet no unique indexes were defined in the model, or anywhere else. Hrmmmmm…

Reviewing the git logs I found that, previously, a unique index had been defined on the offending model. Problem solved I thought, a quick

  ~> rake db:mongoid:remove_indexes

and. The problem remained.

Ok. Code reading time. 3 minutes later the problem was discovered. In Mongoid 4 the remove_indexes task uses this code

      # Return the list of indexes by model that exist in the database but aren't
      # specified on the models.
      #
      # @example Return the list of unused indexes.
      # Mongoid::Tasks::Database.undefined_indexes
      #
      # @return Hash{Class => Array(Hash)} The list of undefined indexes by model.
      def undefined_indexes(models = ::Mongoid.models)
        undefined_by_model = {}

ref: https://github.com/mongoid/mongoid/blob/master/lib/mongoid/tasks/database.rb#L40

but, in Mongoid 3.x, which we are using, it has no such logic.

ref: https://github.com/mongoid/mongoid/blob/3.1.0-stable/lib/rails/mongoid.rb

So there you have it: Mongoid 4 ensures that all indexes, even those no longer defined in the code/repo are nuked when indexes are dropped, while Mongoid 3 will leave those indexes lying around in the database!

I decided to write about this experience because:

And people wonder why we engineers can’t estimate the time and effort to fix a simple bug.

P.S. Some of you readers might be wondering how I fixed this. I simply re-defined the index in the console, so Mongoid would be aware of it, and then used the model level methods to nuke it

[48] pry(#<My::SpacesConducer>)> model.save
Moped::Errors::OperationFailure: The operation: #<Moped::Protocol::Command
  @length=89
  @request_id=604
  @response_to=0
  @op_code=2004
  @flags=[]
  @full_collection_name="wall_space_finder-staging.$cmd"
  @skip=0
  @limit=-1
  @selector={:getlasterror=>1, :safe=>true}
  @fields=nil>
failed with error 11000: "E11000 duplicate key error index: wall_space_finder-staging.art_spaces.$profile.slug_1 dup key: { : null }"
[49] pry(#<My::SpacesConducer>)> ArtSpace.index({:slug => 1}, {:unique => true})
=> {:unique=>true}
[50] pry(#<My::SpacesConducer>)> ArtSpace.remove_indexes
=> true
[51] pry(#<My::SpacesConducer>)> model.save
=> true