Application Performance Management, like any kind of management, should be continuous work on your platforms, services and products. Do it frequently and you will reap the benefits. We reduced 24,772,951 executions of a few queries to a couple of hundred in a 15-minute session of chatting and looking at the graphs provided by New Relic APM.
I've been at Bemobi for the past two years, and one of my previous responsibilities was finding and fixing common code mistakes made by junior developers. A lot has changed since then, but once in a while we still open issues to look for problems like that in the platforms and services we run.
The latest episode was to look at one of our database instances and find the queries that weren't supposed to be there. Bruno Campos got a list from the DBA and we started to look into it. Ten seconds later we found a bizarre 24,772,951 hits for a query that was supposed to be cached.
Yeah, more than twenty-four million executions of something that should have been cached in the first place.
Looking at the New Relic dashboard for the platform plagued by the cache problem, we identified the queries (stacked view):
Four entry points, same queries, over and over again. Davi quickly fixed the bad configuration, but it took two deploys for the queries to vanish. We can see the last one marked as a vertical line here:
The database throughput went down as well:
So within an afternoon we fixed one of the top offenders on our database instance.
So hear me out when I say "metrics, metrics everywhere" and "log everything, all the time", because without measurement there is no control. Revisit your application and service dashboards daily and look for potential pitfalls: things that are not bugs per se, but are not the originally intended behaviour either, and which may cost you a lot of money and cause even bigger problems.
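To make the point concrete, here is a minimal sketch in Python of how a miss counter exposes this kind of problem. It is not our platform's code: `CountingCache`, `fake_query`, the `enabled` flag and the `operator-config` key are all hypothetical names made up for illustration, and the `enabled=False` case only simulates a misconfigured cache.

```python
from collections import Counter


class CountingCache:
    """Read-through cache that records hits and misses per key (hypothetical)."""

    def __init__(self, loader, enabled=True):
        self._loader = loader      # function that runs the real query
        self._enabled = enabled    # False simulates a misconfigured cache
        self._store = {}
        self.hits = Counter()
        self.misses = Counter()

    def get(self, key):
        if self._enabled and key in self._store:
            self.hits[key] += 1
            return self._store[key]
        self.misses[key] += 1      # every miss is one database execution
        value = self._loader(key)
        if self._enabled:
            self._store[key] = value
        return value


def fake_query(key):
    # Stand-in for a database call; an assumption, not a real query.
    return f"row-for-{key}"


broken = CountingCache(fake_query, enabled=False)
working = CountingCache(fake_query, enabled=True)
for _ in range(1000):
    broken.get("operator-config")
    working.get("operator-config")

print("broken cache misses:", broken.misses["operator-config"])    # 1000
print("working cache misses:", working.misses["operator-config"])  # 1
```

Both caches return the same data, so nothing looks like a bug from the outside. Only the counters show that one of them goes to the database on every single call, which is exactly the kind of thing a dashboard makes visible.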
In the next post I'll cover common issues that we had with Apache HttpClient and how easy it was to spot them in New Relic.