After the healthcare.gov fiasco, where people could not sign up for their mandatory healthcare during the open enrollment period because the site kept crashing, you would have thought that the tech industry, and especially companies in domains prone to sudden spikes of traffic, would have learned something by now.
It appears that, even after the very visible eleventh-hour failure of healthcare.gov, which was scrutinized at the highest levels in the media, the local government of Philadelphia, along with its transportation department (SEPTA), has learned nothing.
For those of you who do not know, the Pope is coming to visit the city of Philadelphia for a week at the end of September. Already, there are special passes and special schedules for public transit along with road closures planned around center city.
So basically, it’s a big deal.
I was listening to the radio on the way to work the other day, and I heard that the SEPTA website selling special passes on its Regional Rail service for the Pope’s visit crashed when the tickets went on sale. Not only did it crash, but they couldn’t bring the site back up. It kept going down.
Wow. There’s a new one: a website going down when there is incredibly high demand in a relatively short amount of time for a very limited resource (the papal visit tickets). Has anyone ever heard of this before? It came out of nowhere!
You think maybe this was the same problem that healthcare.gov experienced? My guess is, most likely.
SEPTA knew this was coming, and yet they did nothing to prepare for it. Sure, they claimed they “did performance testing on the site” the weekend before the tickets went on sale, but apparently their expected metrics were way off, or they were drinking their own internal Kool-Aid and decided that “the site will do fine, who wants to go to lunch?”
What the RPC vs. asynchronous communication debate has shown time and time again is that messaging’s ability to scale when these types of high-contention scenarios unfold in a production environment is unmatched by RPC. NServiceBus allows programmers to break the cycle of “RPCs until you deadlock” with a durable, eventually consistent approach to high-contention, high-load scenarios just like this.
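To make the idea concrete, here is a minimal sketch of the queue-based shape I have in mind. It is not NServiceBus code and not what SEPTA runs: the function names are hypothetical, and Python's in-memory `queue.Queue` is only a stand-in for a durable message queue. The point is that the front end just records the request and returns, while a single consumer hands out the limited tickets one message at a time.

```python
import queue
import threading
import uuid


def accept_order(order_queue, customer):
    """Front end (hypothetical): enqueue the request and return immediately.

    Inventory is never touched here, so a traffic spike only grows the queue.
    """
    order_id = str(uuid.uuid4())
    order_queue.put((order_id, customer))
    return order_id  # means "order received", not "ticket issued"


def process_orders(order_queue, tickets_available, results):
    """Back end (hypothetical): one consumer allocates tickets sequentially."""
    remaining = tickets_available
    while True:
        message = order_queue.get()
        if message is None:  # sentinel: no more work
            break
        order_id, _customer = message
        if remaining > 0:
            remaining -= 1
            results[order_id] = "confirmed"
        else:
            results[order_id] = "sold_out"


def simulate(num_customers, tickets_available):
    """Fire a burst of concurrent requests at the queue and collect outcomes."""
    # Assumption: an in-memory queue stands in for a durable one.
    order_queue = queue.Queue()
    results = {}
    worker = threading.Thread(
        target=process_orders, args=(order_queue, tickets_available, results)
    )
    worker.start()

    senders = [
        threading.Thread(target=accept_order, args=(order_queue, f"customer-{i}"))
        for i in range(num_customers)
    ]
    for sender in senders:
        sender.start()
    for sender in senders:
        sender.join()

    order_queue.put(None)  # all requests are queued; tell the worker to stop
    worker.join()
    return results
```

Because only the consumer ever decrements the ticket count, there is no lock contention on the inventory and no overselling; customers learn the outcome a little later (eventual consistency) instead of watching the site fall over.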
As a technologist who uses NServiceBus every day, seeing this type of thing happen over and over again is incredibly frustrating. How do we, as practitioners of SOA and users of NServiceBus, “get the word out” to these companies and/or the programmers working there? It seems there is still a major lack of understanding about how to build and support software that has to hold up under this kind of intense load, and what’s even worse is that companies that know this problem is coming apparently do nothing about it and deal with it by “crossing their fingers”.
Crossing your fingers seems like a bad approach when it’s directly related to money, but apparently, SEPTA and the City of Philadelphia don’t mind doing it. Heck, it wasn’t a total loss: on Monday (the day the site initially crashed) SEPTA sold a whopping 201 tickets out of 350,000 available for sale. Now that’s e-commerce!
At this point, they probably have an entire armada of programmers going through error logs and database tables trying to figure out whose credit card was charged without a ticket being issued, finding all those “lost orders”… etc… but SEPTA has a bigger problem.
How do they bring users back to their website to continue selling tickets for the papal visit?
I’m not saying that NServiceBus is a silver bullet (no technology is a silver bullet), but building software correctly using NServiceBus and asynchronous messaging would most likely have landed all the people that wanted those tickets the day the site opened with actual tickets for the papal visit. Instead, all they got were headaches and canned excuses from SEPTA.
UPDATE – 7/29/2015
More than a week after SEPTA’s site crashed when trying to sell special regional rail tickets for this September’s papal visit to Philadelphia, it looks like they’re going to solve their problem by:
- holding a “lottery” for the chance that your ticket purchase will actually result in you getting a regional rail ticket.
- outsourcing everything to TicketLeap.com and Amazon.com, both companies that have PLENTY of experience in the “high demand for something” domain.
But this is what’s really happening… “Valid entries will be put in a database and winners selected at random by Ticketleap”.
So database inserts for people that want tickets, and some sort of algorithm on TicketLeap’s end to pick the people who are actually going to get the tickets. The insert-only approach gets rid of the high contention around the limited resource (we’re recording “intent” to get a ticket, not giving the user an actual ticket), and Amazon.com hosting it should help with the immense amount of traffic, so this is one way to “get ‘er done”. Not too sure if I agree 100% with the approach.
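For illustration only, the insert-only lottery could be sketched like this. This is my guess at the general shape, not TicketLeap’s actual algorithm: the function names, the use of an email address as the entry key, and the optional seed are all assumptions.

```python
import random


def record_entry(entries, email):
    """Insert-only write (hypothetical): record intent, never touch inventory."""
    entries.append(email)


def draw_winners(entries, tickets_available, seed=None):
    """Pick winners uniformly at random from the distinct entrants.

    Assumption: one chance per email address, so duplicate entries are collapsed.
    """
    unique_entrants = sorted(set(entries))
    if len(unique_entrants) <= tickets_available:
        return unique_entrants  # enough tickets for everyone who entered
    rng = random.Random(seed)  # a fixed seed makes a draw repeatable for audits
    return rng.sample(unique_entrants, tickets_available)
```

Every write during the rush is a plain append with nothing to fight over; the scarce resource is only touched once, later, in a single offline draw.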
I’m sure no political favors or special interests will be influencing Philadelphia-based TicketLeap.com when going to write this “algorithm” 😉