Identify & Escalate Risks: A Practical Framework for Software Engineers
Have you ever worked on a task that you felt would be a heavier lift than others thought? Or, during hands-on work, have you ever followed your gut, double-checked an assumption while reading a ticket, and uncovered an oversight that could derail the project? Or maybe during a planning session, you have heard the phrase “we’ll cross that bridge when we get there” one too many times?
If you answered “yes”, you probably worked on a project that did not identify or address all of its risks. In what follows, I will explain what risks are and how you can identify them in your next task or project.
What is risk? #
Wikipedia defines risk as the possibility of something bad happening. In software engineering, the definition is similar, but with a twist: the possibility of a product or project delivering a bad user experience due to a technical oversight or limitation.
Risks can take several forms in the context of a project or task, such as:
- Assumptions: by reading the landing page, the team concludes that a third-party product will provide the entire swath of functionalities the project needs. In reality, the product is still in beta, and over 50% of the promised features won’t ship until next year.
- Blind spots: the project relies on the API v1 staying reliable and well-maintained, while the API team has finished API v2 and has deprecated the old version.
- Misunderstood scope: building a reporting feature that supports only the last three months of business, while the customers need an annual report.
- Misidentified dependencies: the project to deliver this quarter depends on an adjacent team shipping a new endpoint of the API, but they will only start to work on it in the next quarter.
- Degrading UX: because of the team’s penchant for shipping fast, the additions to the checkout page cause confusion and dissatisfaction among customers.
- Security: the project uses an outdated logging library that has a vulnerability that allows arbitrary code execution, like Log4Shell.
An organic approach doesn’t work #
Just as every golfer has their signature swing, software engineers develop their own frameworks over the years. They don’t think of them as frameworks; instead, it’s “this is my approach to doing X”. It becomes an inherent part of their skillset, part of their identity. Frequently, it’s indistinguishable from their “gut feeling”.
Most engineers think this is enough. And sometimes, they might be right. But an approach rooted in experience has a significant pitfall: one can’t know what they haven’t experienced. And this applies especially to career greenhorns. As an engineer who has transitioned from smaller to larger companies a few times in the past decade, I find it astonishing to realize the plethora of effects your software can have.
Did you think about the user-facing documentation and work with a technical writer during the development of the new API endpoint? Are you synced with the marketing folks about the launch and the expected inbound traffic? Have you considered any legal demands from third parties (e.g., the government) and built in the internal levers to be pulled at the legal team’s request?
If one hasn’t experienced a regulated market or an intricate product space, it’s unlikely that such aspects will be part of their organic approach. Moreover, even if one has seen such intricacies at scale, the approach might not be transplantable to their next company. An example: an engineer from a global media company that has undergone compliance scrutiny might end up perplexed by compliance in the fintech space.
At this point, it’s clear that a structured approach that invites all individuals to contribute from their own experiences is essential to identifying (and eliminating) risks early.
Guiding principles #
Let’s set the principles that will guide us as we develop a practical framework for identifying risks.
Easy to put into action The framework should be easy to implement without too much prep or sign-off. Note that we won’t be creating a process, but rather a system that feels flexible enough to use and adapt to a given task or project. We have to be careful, though: a framework that’s too heavy won’t be used, and a framework that’s too light can be brittle.
Scales to larger groups of engineers (with little to no changes) When defining a new approach, we must remember that it should scale at least to a cross-functional team. It should work well for mobile engineers, backenders, and frontenders alike. It should not feel like a drag to use but instead feel natural.
Improves understanding of deliverables Applying a standardized approach must improve the team’s understanding of what they will deliver. In addition, the added clarity levels the playing field for all involved parties and makes it easier to flag potential sources of vagueness.
Produces artifacts The approach should improve the understanding of the involved engineers and produce a knowledge base with documentation, videos, code examples, design documents, etc. A knowledge base has many benefits, such as capturing the tradeoffs made and helping onboard newcomers.
Proposing a framework is risky: there’s a fine line between making it useful and making it process-heavy. In addition, it needs the right mixture of structure and utility to be helpful. To provide the most utility, the individual building blocks of the framework should be usable both individually and as a whole. So, do not think of these as steps but rather as complementary elements.
- Establish a good understanding of the affected components. The team planning and executing the work needs knowledge and component documentation. Build a knowledge base about the various components involved, such as frontend, backend, mobile apps, internal (back office) tooling, services, etc. Make sure the team absorbs the knowledge.
- Create a Solution Diagram. Create a low-to-medium fidelity sketch of the solution. Each subject-matter expert should pitch in and contribute part of the solution diagram. At the same time, the other members can bring up questions/suggestions/ideas to enrich the understanding of the group.
- Connect the dots. Engineers have to be able to connect the dots on all levels: product, technical, and operational. Think about how that will affect other functions of the company. Will your choice make it harder to run marketing campaigns? Will it improve the customer support workflows or make them worse? Do you have enough expertise in the team to maintain and further develop the intended solution?
Make it palpable #
Covering the techniques of addressing/eliminating risk is beyond the scope of this essay, but I would like to give you a few tips on how to crystallize and escalate risks when needed.
What matters is context. If there’s a product risk, the product manager or UX designer has to understand the pitfalls of the task. Make it very tangible. Build a mockup showing why the UX will fall apart or why the UI will become very busy. Or build a quick prototype in Figma explaining how the product flow will break. Or write down a user scenario that will explode in the user’s face. Think about all user segments - perhaps the new feature will work fine for business customers, but it will confuse retail customers.
Technical risk can come in many forms. If it’s a scaling risk, then some capacity planning can help: look at the traffic patterns, the performance footprint of the new features added, and the current resource consumption. Then extrapolate the numbers to the scale you intend to be at in 3, 6, or 12 months, explaining how the scaling math falls apart. Scaling is contextual, but a good rule of thumb for user-facing applications is to have 40% capacity available for unexpected traffic spikes.
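To show what that scaling math can look like, here is a minimal back-of-the-envelope sketch. All numbers, names, and the growth model are invented for illustration; the 40% headroom figure is the rule of thumb mentioned above.

```python
# Back-of-the-envelope capacity check. All inputs are hypothetical examples;
# the point is to turn "we might not scale" into a concrete projection.

def capacity_check(current_rps: float, capacity_rps: float,
                   monthly_growth: float, months: int,
                   headroom: float = 0.40) -> dict:
    """Extrapolate traffic and report whether the headroom survives.

    headroom=0.40 encodes the rule of thumb: keep 40% of capacity
    in reserve for unexpected traffic spikes.
    """
    projected_rps = current_rps * (1 + monthly_growth) ** months
    usable_rps = capacity_rps * (1 - headroom)  # capacity minus reserve
    return {
        "projected_rps": round(projected_rps, 1),
        "usable_rps": usable_rps,
        "at_risk": projected_rps > usable_rps,
    }

# Example: 800 req/s today, 2000 req/s capacity, 10% monthly traffic growth.
print(capacity_check(800, 2000, 0.10, 12))
# → {'projected_rps': 2510.7, 'usable_rps': 1200.0, 'at_risk': True}
```

A projection like this makes the conversation concrete: instead of “we might run out of capacity”, you can say “at 10% monthly growth, we exceed our spike reserve well before month 12”.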
If the technical risk is the cost of maintenance, think about the number of bugs reported and squashed, user complaints, and the overall reliability of the product. Explain to the stakeholders that if no quality baseline is met, reliability will decrease. Commonly, customer frustration is inversely proportional to product reliability.
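A simple way to make the maintenance-cost argument is to chart the bug backlog over time. The sprint figures below are invented, but the computation shows the shape of the argument: when bugs are reported faster than they are squashed, the backlog compounds.

```python
# Hypothetical per-sprint figures to illustrate a growing maintenance burden.
reported = [12, 15, 14, 18, 21, 24]   # bugs reported each sprint
squashed = [11, 12, 13, 13, 14, 15]   # bugs fixed each sprint

backlog = []
total = 0
for r, s in zip(reported, squashed):
    total += r - s          # net new bugs this sprint
    backlog.append(total)   # cumulative open-bug backlog

print(backlog)  # → [1, 4, 5, 10, 17, 26]: the backlog grows every sprint
```

Even this toy series tells a story a stakeholder can grasp at a glance: the gap widens each sprint, so without investment the product’s reliability will keep eroding.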
The bottom line: make the risk very palpable for stakeholders so they understand what’s at stake.
Clarify, and when needed, escalate #
The urgency with which the team addresses the risks can vary based on the team’s culture and the individuals’ background. In some extreme cases, you might be the only one who sees (or wants to see) the red flags.
We have to assume good intentions, and therefore, if others are not seeing the risks, we have to ask ourselves why. When the risks are documented and clarified through examples, it’s much easier to make the case. Here are a few tactics for communicating the red flags.
Offer examples of misaddressed risk Look at examples of how the organization has addressed risks before. Was there a case when things didn’t go to plan? Or maybe a case when the company turned a blind eye to a known risk? What were the effects and results? I’ll allow myself to speculate here: it probably went pretty badly, was escalated to leadership, and resulted in a postmortem of sorts.
Organizations usually learn from their past mistakes. If a past risk has gone wrong and harmed the company’s ability to deliver, a paper trail must exist. Use that as another argument in your arsenal. Companies that write things down might even have a list of such examples already compiled – use it to your advantage.
The point is: use the memory of a past wound to illustrate the potential impact of the risk at hand.
Substantiate Do you know what speaks louder than opinions, gut feeling, or memories? Numbers. Granted, it’s not easy to build a case around measurable impact, but it’s the holy grail of argumentation. So here are a few angles to consider:
- Impact on scalability/costs: with unchanged capacity, how many requests will be dropped with the increased traffic patterns? Or, how much will the AWS bill increase because more containers will have to be deployed?
- Impact on revenue or other relevant product KPIs: how much will the lifetime value of the customer change if the new functionality uses a browser API that is still not widely available?
- Impact on quality: will the error rate of the product increase because the team did not spend enough effort on unit testing and quality assurance? Could more bug reports trickle in because the project did not go through a bug bash?
- Impact on user trust: If the risk materializes, will it erode the users' confidence in the product? Will it put off potential users if we showcase the unaddressed risk on the sign-up page? If yes, how can we measure it?
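The first of these angles, scalability and cost, lends itself well to a quick estimate. The sketch below uses entirely invented numbers (requests per second, container throughput, container price) to show how to turn the risk into dropped requests and dollars.

```python
import math

# All figures below are hypothetical, chosen only to illustrate the estimate.

def dropped_requests_per_day(projected_rps: float, capacity_rps: float) -> float:
    """Requests exceeding capacity per day, assuming uniform traffic."""
    overflow = max(0.0, projected_rps - capacity_rps)
    return overflow * 60 * 60 * 24  # seconds in a day

def extra_container_cost(projected_rps: float, rps_per_container: float,
                         current_containers: int,
                         cost_per_container_month: float) -> float:
    """Monthly cost of the additional containers needed for projected load."""
    needed = math.ceil(projected_rps / rps_per_container)
    return max(0, needed - current_containers) * cost_per_container_month

# Scenario: 2500 req/s projected; each container handles 250 req/s;
# 8 containers running today at $55/month each.
print(dropped_requests_per_day(2500, 8 * 250))   # → 43200000.0 dropped/day
print(extra_container_cost(2500, 250, 8, 55.0))  # → 110.0 ($/month extra)
```

“With unchanged capacity we would drop 43 million requests a day, or we pay $110 more per month” is exactly the kind of measurable statement that speaks louder than gut feeling.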
Get feedback from allies Share your argumentation with folks you trust, and solicit their feedback. Do not push your beliefs; on the contrary, ask questions to get a new perspective. While building your argument, you might have formed strong opinions, so try to hold them loosely. Remember, it’s not about being right or wrong. It’s about doing right by your customers.
With that feedback, you can judge more objectively whether you have made an oversight or whether you still have a case. Ensure you share accountability, because if the potential red flags materialize, they will also impact your immediate team.
The bottom line #
In my experience, engineers often believe covering all the bases on a project is someone else’s job: maybe the tech lead’s, the product manager’s, or the engineering manager’s. They fail to realize that they themselves are the most competent and relevant individuals to do it.
Usually, one of the responsibilities of the tech lead is to assess the feasibility of a project and find the risks along the way. But I believe that all engineers are well equipped to uncover risks, as they are often solution-oriented. Their instinct is to think about the “how”, working through the development steps, which is the perfect way to uncover the vague aspects of a project.
So, as an engineer on a project, if you find something that smells fishy – use one of the techniques covered above and share it with your organization. You might be saving everyone on the project lots of frustration down the line. And very likely loads of money to the company.
Many thanks to David for his feedback and great ideas on how to improve the essay and Vlatko and Bart for reviewing early drafts.