Summary.
For months the University of San Francisco was consistently, but seemingly randomly experiencing major downtime on their main marketing site. After many frustrating days of troubleshooting, they called on Tandem to isolate and resolve the underlying issue. We did, with a 72 hour turnaround from contract signed to fix deployed. Obvi.
I have one speed. I have one gear: GO
- Charlie Sheen
Turnaround:
72 hrs
BlazeMeter:
75rpm
Profiling:
New Relic
Level 1 Custom Diagnostics
Using a custom debugging module, significant load testing via BlazeMeter and monitoring via NewRelic we were able to analyze a complicated Drupal 7 application with a myriad of modules to identify the bug and provide a quick 72 hour turnaround.
We learned that sites using Pantheon's edge caching, Amazon's CloudFront CDN and the Drupal CDN module's duplicate content protection sometime cache pages in a way that results in an infinite redirect loop, effectively crashing the site.
To provide resolution we switched from CloudFront to Pantheon's own global CDN. As a consequence we were also able to eliminate the offending CDN module.
Less stack complexity + Less dependencies + More stability = Mission Accomplished!