"I have one speed. I have one gear: GO." - Charlie Sheen
For months, the University of San Francisco's main marketing site suffered major downtime that recurred regularly yet seemed to strike at random. After many frustrating days of troubleshooting, the university called on Tandem to isolate and resolve the underlying issue.
Using a custom debugging module, significant load testing via BlazeMeter, and monitoring via New Relic, we analyzed a complicated Drupal 7 application with a myriad of modules, identified the bug, and delivered a fix within a quick 72-hour turnaround.
We learned that sites combining Varnish edge caching (provided via Pantheon), Amazon's CloudFront CDN, and the Drupal CDN module's duplicate content protection can sometimes cache pages in a way that produces an infinite redirect loop, effectively crashing the site.
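For readers who want to check for this kind of failure themselves, here is a minimal sketch of a redirect-loop detector. It is not the custom debugging module we used on the project. The hostnames, the `find_redirect_loop` function, and the canned `cached_responses` table are all hypothetical, and the two-hop loop is only an illustration of what a cached redirect cycle might look like, not the exact behavior we observed.

```python
from typing import Callable, Dict, List, Optional, Tuple
from urllib.parse import urljoin

# A fetcher takes a URL and returns (status code, Location header or None).
# Injecting it keeps the sketch runnable without any network access.
Fetcher = Callable[[str], Tuple[int, Optional[str]]]

REDIRECT_CODES = {301, 302, 303, 307, 308}


def find_redirect_loop(
    start_url: str, fetch: Fetcher, max_hops: int = 20
) -> Optional[List[str]]:
    """Follow redirects hop by hop.

    Returns the chain of URLs that forms a loop if any URL repeats,
    or None if the chain ends in a non-redirect response or no repeat
    is seen within max_hops.
    """
    seen: Dict[str, int] = {}  # URL -> position in the chain
    chain: List[str] = []
    url = start_url
    for _ in range(max_hops):
        if url in seen:
            # We have been here before: report the looping part of the chain.
            return chain[seen[url]:] + [url]
        seen[url] = len(chain)
        chain.append(url)
        status, location = fetch(url)
        if status not in REDIRECT_CODES or not location:
            return None  # Reached a real page (or an error): no loop.
        url = urljoin(url, location)  # Resolve relative Location headers.
    return None


# Hypothetical cached responses: each layer has cached a redirect to the other.
cached_responses: Dict[str, Tuple[int, Optional[str]]] = {
    "https://www.example.edu/": (301, "https://cdn.example.edu/"),
    "https://cdn.example.edu/": (301, "https://www.example.edu/"),
}


def fake_fetch(url: str) -> Tuple[int, Optional[str]]:
    # Anything not in the table is treated as a normal 200 response.
    return cached_responses.get(url, (200, None))


print(find_redirect_loop("https://www.example.edu/", fake_fetch))
```

To run this against a live site, `fake_fetch` could be swapped for a function that issues a real HTTP request with automatic redirect following disabled and returns the status code and `Location` header.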
To resolve the issue, we switched from CloudFront to Pantheon's own global CDN. As a consequence, we were also able to eliminate the offending CDN module.
Less stack complexity + Fewer dependencies + More stability = Mission Accomplished!