Posted on February 2, 2021 by Dudi Cohen

Guest post originally published on Rookout’s blog by Dudi Cohen, VP of R&D at Rookout

I have been managing R&D teams for the past 14 years or so and have learned many lessons along the way. Some of the best lessons came at the moment when our software met the real world and we found that we needed to debug remotely. Some of them were learned from my own battle scars and some were taught to me by my peers and employees. One of the most important lessons I learned – and also gave – is how to estimate time, and accordingly, how to answer questions about time estimations.

Time estimation is difficult because R&D tasks are often very tough to predict. So, let me be more exact. Development time itself is easy to predict, and research time is a bit harder, even though it can be properly scoped and limited. But when it comes to finishing a task, ensuring there are no bugs, and that everything is working perfectly? That’s the hard part. I stopped answering the question “How long will it take to develop this feature?”. Instead I answer, “Do you want to know how much time it will take to develop it? Or do you want to know how long we’ll be fixing bugs related to it?”.

Assessing your problem solving strategy

Always expect the unexpected. Our true faces are usually revealed when we encounter problems. Those problems are set in motion the moment you write your first line of code in a new feature. When you do, understand that somewhere out there in the dark a bug is waiting to surface. No matter how many tests you write and how much future-proofing you code in, those bugs will arrive in the end. So which strategy are you going to use when those bugs attack you? Since their arrival is inevitable, I advise that you devise your strategy in advance. The best way to build this strategy is to understand how efficient it will be, how many risks you can take, and how you can plan your tasks to handle it.

Join me on a journey down the rabbit hole that we call remote debugging as I describe three types of developers and their strategies, all to try and explain what you really shouldn’t do. Every company has at least one of these personalities. You can’t escape them when they’re around, and maybe you’ve even been one of them. It’s OK, there’s no shame if you have: some of my best friends and I have sinned as one of the following personas.

The Code Starer

Your code is the absolute truth of your application. Your application doesn’t have a life of its own. It behaves the way it does because you told it what to do with your code. You gave that application specific instructions and these instructions can’t be interpreted differently. The Code Starer believes that there is one holy thing: your code.

The code doesn’t lie, and if there is a bug, the only relevant thing that can be done to solve it is to look at the code. “Let me just look at the code and I’ll find that bug”. Those are the famous last words of the Code Starer, as he goes down into his cave to stare at the code. And yes, sometimes looking at the code will give you a better understanding of what the code is expected to do and what possible failures and sharp corners exist that you can stumble into. However, you or one of your colleagues wrote this code and – hopefully – the code was also peer reviewed. So why would going over the code again and again and again give you better insight?

As it is, the code isn’t the only player in this game. There is also the data being processed by your application, there are the users that make sure to create the weirdest unexpected data, and the production environment can be ever-changing. Staring at the code relentlessly and hoping that the answer will be written there is useless. Most importantly: it is very inefficient.

When trying to solve a problem, the best way to go about it is to find information that you didn’t have before. Looking at the code means looking at information that hasn’t changed and has been there all along. If you want to be efficient, don’t waste your time on existing information that will most likely give you nothing new. No one wants to be a Code Starer.

Image: Tom (from Tom and Jerry) propping his eyes open with toothpicks to keep them from shutting

The Reenactor

Do you know those guys who spent a week configuring their laptop’s IDE? It’s honestly amazing. You spend a ton of time configuring your environment and then everything that you do on your machine is magic. You’ll have a keyboard shortcut for everything, as well as a real-time linter, an autocomplete auto-predictor AI that writes the code for you, and an automated script to order lunch.

Sounds magical, right? The problem is that once you get used to your own customized laptop, you really can’t work on any other mere mortal’s laptop. A similar situation can sometimes happen to The Reenactor when he encounters a bug in the wild. The Reenactor will see and admit that there is a bug in production. However, production is different from his dev laptop. He can’t SSH into the production machine and load up his Vim configuration, because he isn’t even allowed to SSH. But he has a magical debugger, profiler, real-time code injector, and a rubber duck. The only thing that he now needs to do is to reproduce the bug and make it appear on his dev laptop. When that bug appears on The Reenactor’s laptop, everything will be clear as day and the bug will be squashed. And thus, the task of solving a bug becomes a one-man quest to reproduce the bug.

That quest now turns into a series of unplanned tasks: dumping user data from production, and trying again and again to capture a core dump of the right component in production at the right time. And when all of this fails, The Reenactor might try to draw the customer who encountered the bug into an investigation, collecting any sort of information that will allow him to retrace that bug and make it reappear on his own terms. My advice? Don’t waste your time on reenacting. Your time is valuable. You might be able to solve your bug when it appears on your laptop, but making it appear might be wasted time that you could use to develop other features.

Image: pink blob in a box, captioned "My laptop with my customized IDE. Let's fix a bug on another dev's laptop. Only IDE is notepad. Never again"

The Lumberjack

The Lumberjack understands that he needs more information. He has learned some lessons that the Code Starer hasn’t yet learned. When he encounters a bug in the wild, he doesn’t try to bring the bug home, but rather knows that a bug in the wild should be handled in the wild. So how does the Lumberjack collect more information about the bug?

The Lumberjack adds logs and removes logs. Placing the logs in the right place will take time, because the fact of the matter is that collecting the right piece of information will take time. Why, you ask? Well, because adding logs means writing code, and deploying new code takes time (write, test, review, deploy). Sometimes adding logs might even hurt your application’s performance, and that is a conversation you wouldn’t like to have with your customer.
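To make the cost of that cycle concrete, here is a minimal sketch, in Python with the standard `logging` module, of one way to soften it: place debug logs in advance and switch them on at runtime instead of redeploying. This is only an illustration and not a method described in this post. The component name `orders` and the functions `apply_discount` and `set_verbosity` are hypothetical, and how `set_verbosity` would be triggered in a real service (an admin endpoint, a signal, a config reload) is left as an assumption.

```python
import logging

# Hypothetical component name, used only for this illustration.
logger = logging.getLogger("orders")


def apply_discount(order_total: float, discount_pct: float) -> float:
    """Hypothetical business function with a debug log placed in advance."""
    # %-style arguments are only formatted if DEBUG is enabled for this logger,
    # so the line is cheap while it is switched off.
    logger.debug("apply_discount total=%s pct=%s", order_total, discount_pct)
    return order_total * (1 - discount_pct / 100)


def set_verbosity(level_name: str) -> None:
    """Change the logger's level at runtime, without a code change or redeploy."""
    logger.setLevel(getattr(logging, level_name.upper()))


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    apply_discount(100.0, 10.0)  # debug line is skipped at WARNING level
    set_verbosity("debug")
    apply_discount(100.0, 10.0)  # debug line is now emitted
```

This only helps with log lines you thought of in advance. The moment you need a line that isn't there, you are back to write, test, review, deploy.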

The Lumberjack will always believe that he is one log line away from solving the bug. And sometimes when the customer is responsible for upgrading or deploying your application, you’ll hear this from the Lumberjack: “All I need is information from that log line, let’s ask the customer to upgrade one last time”. It is easy to understand that your customer’s patience might be lost. The Lumberjack’s life is a risky one with all the going back and forth with redeploying code to get more information. As a Lumberjack, you might lose your customer’s patience pretty fast.

Image: a group of soldiers standing apart, smiling and staring at a lumberjack who is holding his lady close

Go Forth and Debug

I might have exaggerated the slightest bit, and sometimes some of these strategies are valid. That being said, you should understand that having the right tools in advance can save you a lot of time when debugging.
Simply put, don’t be a fanatic, and don’t be any of the following archetypes:

Don’t be a Code Starer – Devise a strategy that collects more information and don’t delve into the information you already have. Be efficient and focus on getting more information. Understand that you might not be able to collect it in advance and plan how it will be collected.
Don’t be a Reenactor – Understand that a bug in production or a bug in the real world will require different tools and a different approach. You can’t allow yourself to work endlessly and waste your time on reproducing the bug in your own comfort zone. You can’t bring the battle into your field. You will have to go ahead and face the battle where it happens.
Don’t be a Lumberjack – Don’t take the risk again and again, believing that all you need is just one more piece of information. Understand that every time you redeploy and change your code, you waste time and take a risk.

The most important thing is to expect the unexpected and to build a strategy that enables you and your team to be able to deal with it. The unexpected happens when you click on that “deploy” button, letting your code roam free and meet the real world.

Are you making these 3 debugging mistakes?
