Last week, Monika from Packt Publishing's marketing team shared Kenneth Hunt's kind video review of my book with me. First, of course, I want to thank Kenneth for taking the time to do such a detailed review and Monika for passing it on. I truly appreciate them both. You can watch the entire review here.
Second, right around the 4 minute 11 second mark, he mentioned the CAB (Change Advisory / Approval Board) process that I used to hate, and probably still do:
It is not that I don't think it is important; I understand its value. Stakeholders need to understand the business impact and weigh the business risk against the benefits of the change. Let's be honest: if a change carried no risk of business impact, there would be no CAB process to begin with. Network changes are inherently impactful, and the blast radius is potentially big. But as I have stated in the book, I think there is a lot of room for improvement in many of the places where the CAB is adopted.
One of the things I have learned in life is to focus on the things I can control. In this post, I would like to take a different perspective and focus on what we network engineers can do to improve the CAB process.
Ready? Let's get started.
The difference between junior and senior engineers is the scope they need to consider, both technical and non-technical. The non-technical side could include project management, working in large teams, and understanding the core value of the business. After all, we are making the change to solve a problem, and the problem is something that is stopping us from achieving our business goals. This applies even if you work for a non-profit or a higher education institution: you are still trying to solve problems that are holding the organization back.
The problem we are solving should also tie back to the KPIs, or Key Performance Indicators, that the business cares about. You might be upgrading a switch, but the goal is not just to have the latest software; it is to make sure the security updates are implemented in production. The main business objective is to ensure business continuity and the required security conformance. The KPI should correlate with less downtime in the long run. This leads directly to the next objective: engineers need to communicate business value.
I am as guilty of this as the next engineer. We all fall in love with the technology, the elegant solution, and the shiny new toys. But the stakeholders care about what the technology can do for the business. Can the change shorten the site's order pipeline so customers are likely to buy more items? Can it aggregate more server bandwidth so the company can save on power bills? If we don't make this change, what are the business consequences?
Speaking business lingo does not come easily or naturally to engineers. After all, we picked engineering over business school for a reason. But if we try our best to communicate business value, the effort will be evident.
After we communicate the business value, we need to back it up with data. How do we know the site will load faster after we upgrade the link from 10 Gbps to 100 Gbps? The bottleneck might not be the bandwidth, right? Can we hold off on the new software upgrade until after the peak shopping season?
In my experience, people are not afraid of making the wrong decision; we are collectively more afraid of making a decision we are not confident about, and data gives us confidence.
There might not be any data being collected that relates directly to the change we are proposing, but we should anticipate that gap and at least provide some circumstantial data to back up our proposal.
I actually stole this from the SRE (Site Reliability Engineering) world. There are times I would label changes as 'completely safe' or a 'no-brainer'. The truth is, nothing is risk-free when you touch production devices. The sooner we can recognize that and be honest with ourselves and others, the faster we can move toward the real discussion about value and trade-offs. Yes, that switch might not come back after a reload, but the trade-off is that it might otherwise go down during business hours when nobody is watching.
We should all stop aiming for perfection and instead talk about error budgets, service-level objectives, and service-level agreements (thank you, Network SRE!).
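To make the error budget idea a bit more concrete, here is a minimal sketch of the arithmetic. The 99.9% availability target, the 30-day window, and the function names are all assumptions I picked for illustration; they are not from the book or from any particular SRE tool.

```python
def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Downtime, in minutes, that an availability SLO allows over a window."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - slo_percent / 100)


def budget_remaining(slo_percent: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Minutes of budget left after the downtime already spent in the window."""
    return error_budget_minutes(slo_percent, window_days) - downtime_minutes


if __name__ == "__main__":
    # Hypothetical numbers: a 99.9% SLO over 30 days, with 10 minutes
    # of downtime already used by an earlier maintenance window.
    budget = error_budget_minutes(99.9)
    left = budget_remaining(99.9, downtime_minutes=10)
    print(f"Error budget: {budget:.1f} min, remaining: {left:.1f} min")
```

With numbers like these in hand, the CAB conversation can shift from "is this change safe?" to "can we afford to spend part of the remaining budget on this change?"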
I hope these ideas offer network engineers a different perspective on the CAB process. It really comes down to putting ourselves in the stakeholders' shoes and looking at things through their lens. Did I miss anything? What do you think we can do to improve? Leave me a comment and let me know!
Stay safe and healthy!