
A Stampede of Dinosaurs: How are API dependencies still such a problem in 2022?

A little over a decade ago I learned what an API was. In today’s world of programmatic ads, data mining, and AI-powered ad targeting systems, such technical awareness is a default expectation, but in 2010 it was unusual enough to get me labeled the “mathy marketer” by the head of brand marketing at the large e-commerce retailer where I was heading up digital growth.

Compliment? Insult? Perhaps a bit of both? It didn’t really matter. Our API engineering was the bread and butter that fed the company’s coffers, and everybody knew it. When we exceeded our Google API quota or the API-dependent shopping cart went down, everybody felt it, even the marketers, as we got paged in the middle of the night along with the engineers we depended on to keep our revenue cycle flowing nonstop. The ensuing hours of nail-biting as we watched the revenue targets drain in increments upwards of $10k would be very familiar today to teams across industries who are scrambling to troubleshoot outages triggered by cascading dependencies owned by other teams (or even other companies). And the real question I ask myself now that it’s 2022 is: why?

Why, in 2022, are the issues and problematic systems that we were struggling with in 2010 still keeping engineers up at night? Why are these issues even more prevalent across more industries than they were back then? In 2010, a conversation about SQL at the marketing team meeting was likely to veer towards whether Hollywood should stop making new sequels to Jurassic Park now that even Jeff Goldblum was finding it clichéd. Now, the culture of marketing has swung so far to the technical side that many marketers can debate the merits of SQL vs. Hadoop, and yet our sources of potential failure, including our dependencies on third-party APIs, are greater than they’ve ever been. Therein, IMHO, lies the problem.

While building an in-house API ad system in 2010 was revolutionary, the reasoning and the rewards were the same then as they are today. APIs and other modern technologies enable a scale of growth simply unachievable without some level of automation, and the companies who can build faster and better have a competitive advantage that is pivotal for success. In the case of digital marketing, billions of keywords can be managed in real time based on complex rulesets that maximize ROI and revenue in a way that is simply impossible if an individual marketer is sitting at their desk with the Google Ads interface open, poking at some bids before they head out for lunch. APIs enable marketing to function at the level of customization necessary to address the complexity of users and platforms today, a level totally unimagined twenty years ago when ads were often bought based on Nielsen ratings and blasted to huge audiences in the hope that something might stick. Expectations and OKRs today don’t mesh with the old “throw it at the wall” style. ROI is judged harshly and channels are shut down at the drop of a hat, so the only way to succeed is to rely on the sophistication that digital marketing technology enables, almost all of which leverages APIs in one way or another.

So, having acknowledged why we put up with technology that causes major headaches in the middle of the night when something goes wrong, I will return to the greater question of why the lived experience of teams working with these systems hasn’t improved in the twelve years that technology has leapt forward. Simply put: the pace of change as companies and teams move to cloud-native has reached escape velocity. The flood of data and the snowballing complexity of these technologies are moving faster than marketers or software engineers (or human brains) can keep up with. The movement towards stacks and products that are built on daisy-chained API dependencies has left major blind spots that still leave teams across companies biting their nails in the dark when something goes wrong. While engineers are frantically passing the hot potato, searching desperately for the real owner of a cascading fire that has spread so quickly that the source can’t even be easily pinpointed amongst the burning embers, people across departments are waiting helplessly to deal with the fallout. Marketers are rushing to shut down ad spend and limit impact on ROI, sales reps are bracing to explain issues and missed SLAs to angry customers, and CEOs are cracking their knuckles as they prepare to explain the failures to whoever is left in the chain of important apologies.

That is why I’m so excited about the approach that Operant is taking to solve a problem that, to me, feels like it should already be solved after twelve years of rapid improvements and exceptional innovations in software. Waiting for a Slack message from an SRE with an update while you frantically shut down ad spend shouldn’t still be a thing in 2022, and while there are always organizational complexities that compound difficulties, at its core it is a problem that *can* be solved with technology.

Operant’s data-first approach and cloud-native controls like API rate limits, circuit breakers, and canary deploys prevent major issues before full deployment to production, significantly reducing the chances of an outage (and a midnight panic attack) to begin with. When things do go wrong, the real-time visibility into failure propagation across APIs and services allows engineers to focus on the real source of a problem, even if it is caused by a third party, instead of spinning wheels for hours just trying to figure out which API went down first. One-click troubleshooting controls for common cloud-native issues make the fixes faster, so that escalations across departments are fewer and less catastrophic.
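For readers who haven't met a circuit breaker before, here is a minimal sketch of the general pattern in Python. To be clear, this is an illustration only and not Operant's implementation: the names `CircuitBreaker`, `CircuitOpenError`, `max_failures`, and `reset_after` are all hypothetical, chosen for this example. The idea is that after a few consecutive failures you stop calling the broken dependency for a while and fail fast instead, so one bad API doesn't drag everything upstream down with it.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Hypothetical, simplified breaker: illustrative names and defaults."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds
        self.clock = clock                # injectable so it can be tested
        self.failures = 0
        self.opened_at = None             # None means the breaker is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Still cooling down: fail fast without touching the dependency.
                raise CircuitOpenError("dependency marked unhealthy")
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open (or re-open) the breaker
            raise
        # Success: close the breaker and reset the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

In use, you would wrap each call to a flaky dependency, for example `breaker.call(fetch_bids)` where `fetch_bids` is whatever function talks to the third-party API, and treat `CircuitOpenError` as the signal to fall back or pause spend rather than keep hammering a service that is already down.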

And in today’s world, we could all use fewer catastrophes, right?