Sunday, 30 September 2018

All i wanted to do is check an error code

I was feeling a little under the weather last week and did not have enough concentration to work on developing a new NetSurf feature as I had planned. Instead I decided to look at a random bug from our worryingly large collection.

This lead me to consider the HTML form submission function at which point it was "can open, worms everywhere". The code in question has a fairly simple job to explain:
  1. A user submits a form (by clicking a button or such) and the Document Object Model (DOM) is used to create a list of information in the web form.
  2. The list is then converted to the appropriate format for sending to the web site server.
  3. An HTTP request is made using the correctly formatted information to the web server.
However the code I was faced with, while generally functional, was impenetrable having accreted over a long time.

screenshot of NetSurf test form
At this point I was forced into a diversion to fix up the core URL library handling of query strings (this is used when the form data is submitted as part of the requested URL) which was necessary to simplify some complicated string handling and make the implementation more compliant with the specification.

My next step was to add some basic error reporting instead of warning the user the system was out of memory for every failure case which was making debugging somewhat challenging. I was beginning to think I had discovered a series of very hairy yaks although at least I was not trying to change a light bulb which can get very complicated.

At this point I ran into the form_successful_controls_dom() function which performs step one of the process. This function had six hundred lines of code, hundreds of conditional branches 26 local variables and five levels of indentation in places. These properties combined resulted in a cyclomatic complexity metric of 252. For reference programmers generally try to keep a single function to no more than a hundred lines of code with as few local variables as possible resulting in a CCM of 20.

I now had a choice:

  • I could abandon investigating the bug, because even if I could find the issue changing such a function without adequate testing is likely to introduce several more.
  • I could refactor the function into multiple simpler pieces.
I slept on this decision and decided to at least try to refactor the code in an attempt to pay back a little of the technical debt in the browser (and maybe let me fix the bug). After several hours of work the refactored source has the desirable properties of:

  • multiple straightforward functions
  • no function much more than a hundred lines long
  • resource lifetime is now obvious and explicit
  • errors are correctly handled and reported

I carefully examined the change in generated code and was pleased to see the compiler output had become more compact. This is an important point that less experienced programmers sometimes miss, if your source code is written such that a compiler can reason about it easily you often get much better results than the compact alternative. However even if the resulting code had been larger the improved source would have been worth it.

After spending over ten hours working on this bug I have not resolved it yet, indeed one might suggest I have not even directly considered it yet! I wanted to use this to explain a little to users who have to wait a long time for their issues to get resolved (in any project not just NetSurf) just how much effort is sometimes involved in a simple bug.