Wednesday, 27 November 2019

Twice and thrice over, as they say, good is it to repeat and review what is good.

Three years ago I wrote about using the AFL fuzzer to find bugs in several NetSurf libraries. I have repeated this exercise a couple of times since then and thought I would summarise what I found with my latest run.

I started by downloading the latest version of AFL (2.52b) and compiling it. This went as smoothly as one could hope for and I experienced no issues although having done this several times before probably helps.

libnsbmp

I started with libnsbmp, which is used to render Windows BMP and ICO files; the ICO format remains very popular for website favicons. The library was built with AFL instrumentation enabled, output directories were created for the results, and a master and four subordinate fuzzer instances were started.

vince@workshop:libnsbmp$ LD=afl-gcc CC=afl-gcc AFL_HARDEN=1 make VARIANT=debug test
afl-cc 2.52b by <lcamtuf@google.com>
afl-cc 2.52b by <lcamtuf@google.com>
afl-cc 2.52b by <lcamtuf@google.com>
 COMPILE: src/libnsbmp.c
afl-cc 2.52b by <lcamtuf@google.com>
afl-as 2.52b by <lcamtuf@google.com>
[+] Instrumented 633 locations (64-bit, hardened mode, ratio 100%).
      AR: build-x86_64-linux-gnu-x86_64-linux-gnu-debug-lib-static/libnsbmp.a
 COMPILE: test/decode_bmp.c
afl-cc 2.52b by <lcamtuf@google.com>
afl-as 2.52b by <lcamtuf@google.com>
[+] Instrumented 57 locations (64-bit, hardened mode, ratio 100%).
    LINK: build-x86_64-linux-gnu-x86_64-linux-gnu-debug-lib-static/test_decode_bmp
afl-cc 2.52b by <lcamtuf@google.com>
 COMPILE: test/decode_ico.c
afl-cc 2.52b by <lcamtuf@google.com>
afl-as 2.52b by <lcamtuf@google.com>
[+] Instrumented 71 locations (64-bit, hardened mode, ratio 100%).
    LINK: build-x86_64-linux-gnu-x86_64-linux-gnu-debug-lib-static/test_decode_ico
afl-cc 2.52b by <lcamtuf@google.com>
Test bitmap decode
Tests:1053 Pass:1053 Error:0
Test icon decode
Tests:609 Pass:609 Error:0
    TEST: Testing complete
vince@workshop:libnsbmp$ mkdir findings_dir graph_output_dir
vince@workshop:libnsbmp$ afl-fuzz -i test/ns-afl-bmp/ -o findings_dir/ -S f02 ./build-x86_64-linux-gnu-x86_64-linux-gnu-debug-lib-static/test_decode_bmp @@ /dev/null > findings_dir/f02.log >&1 &
vince@workshop:libnsbmp$ afl-fuzz -i test/ns-afl-bmp/ -o findings_dir/ -S f03 ./build-x86_64-linux-gnu-x86_64-linux-gnu-debug-lib-static/test_decode_bmp @@ /dev/null > findings_dir/f03.log >&1 &
vince@workshop:libnsbmp$ afl-fuzz -i test/ns-afl-bmp/ -o findings_dir/ -S f04 ./build-x86_64-linux-gnu-x86_64-linux-gnu-debug-lib-static/test_decode_bmp @@ /dev/null > findings_dir/f04.log >&1 &
vince@workshop:libnsbmp$ afl-fuzz -i test/ns-afl-bmp/ -o findings_dir/ -S f05 ./build-x86_64-linux-gnu-x86_64-linux-gnu-debug-lib-static/test_decode_bmp @@ /dev/null > findings_dir/f05.log >&1 &
vince@workshop:libnsbmp$ afl-fuzz -i test/ns-afl-bmp/ -o findings_dir/ -M f01 ./build-x86_64-linux-gnu-x86_64-linux-gnu-debug-lib-static/test_decode_bmp @@ /dev/null
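For anyone wanting to try the same approach on their own library, the test_decode_bmp program driven above is essentially just a small file-based harness around the decoder. The sketch below shows the general shape of such a harness; the decode_file() entry point is hypothetical and stands in for whatever the library under test actually provides (AFL substitutes the @@ on the command line with the path of each mutated input file).

/* Minimal sketch of an AFL file harness; the decode_file() entry point
 * is hypothetical and represents the library call under test. */
#include <stdio.h>
#include <stdlib.h>

extern int decode_file(const unsigned char *data, size_t length);

int main(int argc, char **argv)
{
        FILE *f;
        long len;
        unsigned char *buf;

        if (argc < 2) {
                fprintf(stderr, "Usage: %s <input file>\n", argv[0]);
                return EXIT_FAILURE;
        }

        f = fopen(argv[1], "rb");
        if (f == NULL) {
                return EXIT_FAILURE;
        }

        /* read the whole fuzzer generated input into memory */
        fseek(f, 0, SEEK_END);
        len = ftell(f);
        fseek(f, 0, SEEK_SET);
        if (len < 0) {
                fclose(f);
                return EXIT_FAILURE;
        }

        buf = malloc((size_t)len);
        if (buf == NULL || fread(buf, 1, (size_t)len, f) != (size_t)len) {
                fclose(f);
                free(buf);
                return EXIT_FAILURE;
        }
        fclose(f);

        /* any crash or hardened-build abort in here is what AFL records */
        decode_file(buf, (size_t)len);

        free(buf);
        return EXIT_SUCCESS;
}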

AFL master instance after six days
The number of subordinate fuzzer instances was selected to allow the system in question (AMD 2600X) to keep all the cores in use at a clock of 4GHz, which gave the highest number of executions per second. This might be improved with better cooling but I have not investigated this.

After five days and six hours the "cycle count" field on the master instance had changed to green, which the AFL documentation suggests means the fuzzer is unlikely to discover anything new, so the run was stopped.

Just before stopping, the afl-whatsup tool was used to examine the state of all the running instances.

vince@workshop:libnsbmp$ afl-whatsup -s ./findings_dir/
status check tool for afl-fuzz by <lcamtuf@google.com>

Summary stats
=============

       Fuzzers alive : 5
      Total run time : 26 days, 5 hours
         Total execs : 2873 million
    Cumulative speed : 6335 execs/sec
       Pending paths : 0 faves, 0 total
  Pending per fuzzer : 0 faves, 0 total (on average)
       Crashes found : 0 locally unique

Just for completeness there is also the graph of how the fuzzer performed over the run.

AFL fuzzer performance over libnsbmp run

There were no crashes at all (and none have been detected through fuzzing since the original run). The 78 reported hangs were checked and all actually decode in a reasonable time; it seems the fuzzer's default "hang" timeout is simply a little aggressive for larger images.

libnsgif

I went through a similar setup with libnsgif which is used to render the GIF image format. The run was performed on a similar system running for five days and eighteen hours. The outcome was similar to libnsbmp with no hangs or crashes.


vince@workshop:libnsgif$ afl-whatsup -s ./findings_dir/
status check tool for afl-fuzz by <lcamtuf@google.com>

Summary stats
=============

       Fuzzers alive : 5
      Total run time : 28 days, 20 hours
         Total execs : 7710 million
    Cumulative speed : 15474 execs/sec
       Pending paths : 0 faves, 0 total
  Pending per fuzzer : 0 faves, 0 total (on average)
       Crashes found : 0 locally unique

libsvgtiny

AFL fuzzer results for libsvgtiny
I then ran the fuzzer on the SVG render library using a dictionary to help the fuzzer cope with a sparse textual input format. The run was allowed to continue for almost fourteen days with no crashes or hangs detected.

In an ideal situation this run would have been allowed to continue but the system running it required a restart for maintenance.

Conclusion

The aphorism "absence of proof is not proof of absence" seems to apply to these results. While the new fuzzing runs revealed no additional failures it does not mean there are no defects in the code to find. All I can really say is that the AFL tool was unable to find any failures within the time available.

Additionally the AFL test corpus produced did not significantly change the code coverage metrics so the existing set was retained.

Will I spend the time to re-run these tests in future? Perhaps, but I think more would be gained from enabling fuzzing of the other NetSurf libraries and picking the low-hanging fruit there than from expending thousands of hours of processor time performing these runs again.

Thursday, 11 July 2019

We can make it better than it was. Better...stronger...faster.

It is not a novel observation that computers have become so powerful that a reasonably recent system has a relatively long life before obsolescence. This is in stark contrast to the period between the nineties and the teens where it was not uncommon for users with even moderate needs from their computers to upgrade every few years.

This upgrade cycle was mainly driven by huge advances in processing power, memory capacity and ballooning data storage capability. Of course the software engineers used up more and more of the available resources and with each new release ensured users needed to update to have a reasonable experience.

And then sometime in the early teens this cycle slowed almost as quickly as it had begun as systems had become "good enough". I experienced this at a time I was relocating for a new job and had moved most of my computer use to my laptop which was just as powerful as my desktop but was far more flexible.

As a software engineer I used to have a pretty good computer for myself but I was never prepared to spend the money on "top of the range" equipment because it would always be obsolete and generally I had access to much more powerful servers if I needed more resources for a specific task.

To illustrate, the system specification of my desktop PC at the opening of the millennium was:
  • Single core Pentium 3 running at 500MHz
  • Socket 370 motherboard with 100MHz Front Side Bus
  • 128 Megabytes of memory
  • A 25 Gigabyte Deskstar hard drive
  • 150MHz TNT2 graphics card
  • 10 Megabit network card
  • Unbranded 150W PSU
2013 PC build still using an awesome beige case from 1999
But by 2013 the specification had become:
  • Quad core i5-3330S Processor running at 2700MHz
  • FCLGA1155 motherboard running memory at 1333MHz
  • 8 Gigabytes of memory
  • Terabyte HGST hard drive
  • 1,050MHz Integrated graphics
  • Integrated Intel Gigabit network
  • OCZ 500W 80+ PSU
The performance change between these systems was more than tenfold in fourteen years with an upgrade roughly once every couple of years.

I recently started using that system again in my home office mainly for Computer Aided Design (CAD), Computer Aided Manufacture (CAM) and Electronic Design Automation (EDA). The one addition was to add a widescreen monitor as there was not enough physical space for my usual dual display setup.

To my surprise I increasingly returned to this setup for programming tasks. Firstly, being at my desk acts as an indicator to family members that I am concentrating, whereas the laptop no longer had that effect. Secondly, I really like the ultra-wide display for coding; it has become my preferred display and I had been saving for a UWQHD monitor.

Alas, last month the system started freezing. Sometimes it would be stable for several days and then, without warning, the mouse pointer would stop, my music would cease and a power cycle was required. I tried several things to rectify the situation: replacing the thermal compound and the CPU cooler and trying different memory, all to no avail.

As fixing the system cheaply appeared unlikely I began looking for a replacement and was immediately troubled by the size of the task. Somewhere in the last six years, while I was not paying attention, the world had moved on; after a great deal of research I managed to come to an answer.

AMD have recently staged something of a comeback with their Ryzen processors after almost a decade of very poor offerings when compared to Intel. The value for money when considering the processor and motherboard combination is currently very much weighted towards AMD.

My timing also seems fortuitous as the new Ryzen 2 processors have just been announced which has resulted in the current generation being available at a substantial discount. I was also encouraged to see that the new processors use the same AM4 socket and are supported by the current motherboards allowing for future upgrades if required.

New PC assembled and wired up
I purchased a complete new system for under five hundred pounds, comprising:
  • Hex core Ryzen 5 2600X Processor 3600MHz
  • MSI B450 TOMAHAWK AMD Socket AM4 Motherboard
  • 32 Gigabytes of PC3200 DDR4 memory
  • Aero Cool Project 7 P650 80+ platinum 650W Modular PSU
  • Integrated RTL Gigabit networking
  • Lite-On iHAS124 DVD Writer Optical Drive
  • Corsair CC-9011077-WW Carbide Series 100R Silent Mid-Tower ATX Computer Case
to which I added some recycled parts:
  • 250 Gigabyte SSD from laptop upgrade
  • GeForce GT 640 from a friend
I installed a fresh copy of Debian and all my CAD/CAM applications and have been using the system for a couple of weeks with no obvious issues.

An example of the performance difference is compiling NetSurf from clean: with an empty ccache this used to take 36 seconds and now takes 16, which is a nice improvement. However a clean build with the results already cached has gone from 6 seconds to 3, which is far less noticeable, and during development a normal edit, build, debug cycle affecting only a small number of files has gone from 400 milliseconds to 200, which simply feels instant in both cases.

My conclusion is that the new system is completely stable but that I have gained very little in common usage. Objectively the system is over twice as fast as its predecessor but aside from compiling large programs or rendering huge CAD drawings this performance is not utilised. Given this I anticipate this system will remain unchanged until it starts failing and I can only hope that will be at least another six years away.

Tuesday, 19 February 2019

A very productive weekend

I just hosted a NetSurf Developer weekend which is an opportunity for us to meet up and make use of all the benefits of working together. We find the ability to plan work and discuss solutions without losing the nuances of body language generally results in better outcomes for the project.

NetSurf Development build
Due to other commitments on our time the group has not been able to do more than basic maintenance activities in the last year which has resulted in the developer events becoming a time to catch up on maintenance rather than making progress on features.

Because of this the July and November events last year did not feel terribly productive; there were discussions about what we should be doing and bugs considered but a distinct lack of committed code.

As can be seen from our notes this time was a refreshing change. We managed to complete a good number of tasks and actually add some features while still having discussions, addressing bugs and socialising.

We opened on the Friday evening by creating a list of topics to look at over the following days and updating the wiki notes. We also reviewed the cross compiler toolchains which had been updated to include the most recent releases for things like openssl, curl etc.

As part of this review we confirmed the decision to remove the Atari platform from active support as its toolchain builds have remained broken for over two years with no sign of any maintainer coming forward.

While it is a little sad to see a platform be removed, it had become a burden on our strained resources, requiring us to maintain a CI worker with a very old OS using tooling that can no longer be replicated. The tooling issue meant a developer could not test changes locally before committing, so testing changes that affected all frontends was difficult.

Saturday saw us clear all the topics from our list which included:
  • Fixing a bug preventing compiling our reference counted string handling library.
  • Finishing the sanitizer work started the previous July
  • Fixing several bugs in the Framebuffer frontend installation.
  • Making the Framebuffer UI use the configured language for resources.
The main achievement of the day however was implementing automated system testing of the browser. This was a project started by Daniel some eight years ago but worked on by all of us so seeing it completed was a positive boost for the whole group.

The implementation consisted of a frontend named monkey. This frontend to the browser takes textual commands to perform operations (e.g. open a window or navigate to a URL) and generates results in a structured text format. Monkey is driven by a Python program named monkeyfarmer which runs a test plan, ensuring the results are as expected.

This allows us to run a complete browsing session in an automated way; previously someone would have to manually build the browser and check the tests by hand. This manual process was tedious and rarely completed across our entire test corpus, generally concentrating on just those areas that had been changed, such as the JavaScript output.

We have combined the monkey tools and our test corpus into a CI job which runs the tests on every commit giving us assurance that the browser as a whole continues to operate correctly without regression. Now we just have the task of creating suitable plans for the remaining tests. Though I remain hazy as to why, we became inordinately amused by the naming scheme for the tools.

Google webp library gallery rendered in NetSurf
We rounded the Saturday off by going out for a very pleasant meal with some mutual friends. Sunday started by adding a bunch of additional topics to consider and we made good progress addressing these.

We performed a bug triage and managed to close several issues and commit to fixing a few more. We even managed to create a statement of work of things we would like to get done before the next meetup.

My main achievement on the Sunday was to add WEBP image support. This uses the Google libwebp library to do all the heavy lifting, and adding a new image content handler to NetSurf is pretty straightforward.
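The NetSurf content handler itself is not reproduced here, but as a hedged illustration of how little the decode side involves, the libwebp calls used are roughly as below. WebPGetInfo, WebPDecodeRGBA and WebPFree are the real libwebp API; the surrounding function is just a sketch rather than the actual NetSurf code.

/* Sketch of decoding a WEBP image with libwebp; this illustrates the
 * library calls involved and is not the NetSurf content handler. */
#include <stddef.h>
#include <stdint.h>
#include <webp/decode.h>

static uint8_t *decode_webp(const uint8_t *data, size_t size,
                            int *width, int *height)
{
        /* validate the bitstream header and obtain the dimensions */
        if (!WebPGetInfo(data, size, width, height)) {
                return NULL; /* not a WEBP bitstream */
        }

        /* decode straight to a packed RGBA buffer allocated by libwebp;
         * the buffer is released with WebPFree() when no longer needed */
        return WebPDecodeRGBA(data, size, width, height);
}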

Sunday, 30 September 2018

All I wanted to do was check an error code

I was feeling a little under the weather last week and did not have enough concentration to work on developing a new NetSurf feature as I had planned. Instead I decided to look at a random bug from our worryingly large collection.

This led me to consider the HTML form submission function at which point it was "can open, worms everywhere". The code in question has a fairly simple job to explain:
  1. A user submits a form (by clicking a button or such) and the Document Object Model (DOM) is used to create a list of information in the web form.
  2. The list is then converted to the appropriate format for sending to the web site server (a sketch of this encoding step follows the list).
  3. An HTTP request is made using the correctly formatted information to the web server.
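To make step two a little more concrete, the common case is the application/x-www-form-urlencoded format, where each name and value is percent-encoded and the pairs joined with ampersands. The sketch below is a deliberately simplified illustration under that assumption, not the NetSurf code, which also has to deal with character set conversion and multipart submissions.

/* Simplified sketch of application/x-www-form-urlencoded output for one
 * name=value pair; illustrative only, not the NetSurf implementation. */
#include <ctype.h>
#include <stdio.h>

static void urlencode(const char *s, FILE *out)
{
        for (; *s != '\0'; s++) {
                unsigned char c = (unsigned char)*s;

                if (isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
                        fputc(c, out);
                } else if (c == ' ') {
                        fputc('+', out); /* spaces become '+' in this encoding */
                } else {
                        fprintf(out, "%%%02X", c); /* everything else is %XX */
                }
        }
}

static void emit_pair(const char *name, const char *value, FILE *out)
{
        urlencode(name, out);
        fputc('=', out);
        urlencode(value, out);
}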
However the code I was faced with, while generally functional, was impenetrable, having accreted over a long time.

screenshot of NetSurf test form
At this point I was forced into a diversion to fix up the core URL library handling of query strings (this is used when the form data is submitted as part of the requested URL) which was necessary to simplify some complicated string handling and make the implementation more compliant with the specification.

My next step was to add some basic error reporting instead of warning the user the system was out of memory for every failure case, which was making debugging somewhat challenging. I was beginning to think I had discovered a series of very hairy yaks although at least I was not trying to change a light bulb which can get very complicated.

At this point I ran into the form_successful_controls_dom() function which performs step one of the process. This function had six hundred lines of code, hundreds of conditional branches, 26 local variables and five levels of indentation in places. These properties combined resulted in a cyclomatic complexity metric of 252. For reference, programmers generally try to keep a single function to no more than a hundred lines of code with as few local variables as possible, resulting in a CCM of 20.

I now had a choice:

  • I could abandon investigating the bug, because even if I could find the issue, changing such a function without adequate testing is likely to introduce several more.
  • I could refactor the function into multiple simpler pieces.
I slept on this decision and decided to at least try to refactor the code in an attempt to pay back a little of the technical debt in the browser (and maybe let me fix the bug). After several hours of work the refactored source has the desirable properties of:

  • multiple straightforward functions
  • no function much more than a hundred lines long
  • resource lifetime is now obvious and explicit
  • errors are correctly handled and reported
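To give a flavour of the shape such a refactor takes, here is a hedged, generic sketch rather than the actual form submission source: every name below (struct form, the control list helpers and the error enumeration) is hypothetical, but the pattern of one job per helper, explicit error returns and resources released on every path is the one described above.

/* Generic sketch of the refactoring pattern: small helpers, explicit
 * error returns, obvious resource lifetimes. Not the NetSurf source;
 * all types and helpers here are hypothetical. */

struct form;            /* opaque placeholder types for the sketch */
struct control_list;

typedef enum {
        EXAMPLE_OK,
        EXAMPLE_NOMEM,
        EXAMPLE_BAD_INPUT
} example_error;

static void control_list_free(struct control_list *list);
static example_error gather_controls(struct form *form,
                                     struct control_list **out);
static example_error encode_controls(const struct control_list *controls,
                                     char **out);

static example_error submit_form(struct form *form, char **encoded_out)
{
        struct control_list *controls = NULL;
        char *encoded = NULL;
        example_error res;

        res = gather_controls(form, &controls);
        if (res != EXAMPLE_OK) {
                return res; /* nothing allocated yet, nothing to release */
        }

        res = encode_controls(controls, &encoded);
        control_list_free(controls); /* control list lifetime ends here */
        if (res != EXAMPLE_OK) {
                return res;
        }

        *encoded_out = encoded; /* caller owns the encoded form data */
        return EXAMPLE_OK;
}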

I carefully examined the change in generated code and was pleased to see the compiler output had become more compact. This is an important point that less experienced programmers sometimes miss: if your source code is written such that a compiler can reason about it easily you often get much better results than with the compact alternative. However even if the resulting code had been larger the improved source would have been worth it.

After spending over ten hours working on this bug I have not resolved it yet; indeed one might suggest I have not even directly considered it yet! I wanted to use this to explain a little to users who have to wait a long time for their issues to get resolved (in any project, not just NetSurf) just how much effort is sometimes involved in a simple bug.

Tuesday, 7 August 2018

The brain is a wonderful organ; it starts working the moment you get up in the morning and does not stop until you get into the office.

I fear that I may have worked in a similar office environment to Robert Frost. Certainly his description is familiar to those of us who have been subjected to modern "open plan" offices. Such settings may work for some types of job but for myself, as a programmer, they have a huge negative effect.

My old basement office
When I decided to move on from my previous job my new position allowed me to work remotely. I have worked from home before so knew what to expect. My experience led me to believe the main aspects to address when home working were:
Isolation
This is difficult to mitigate but frequent face to face meetings and video calls with colleagues can address this providing you are aware that some managers have a terrible habit of "out of sight, out of mind" management
Motivation
You are on your own a lot of the time which means you must motivate yourself to work. Mainly this is achieved through a routine. I get dressed properly, start work the same time every day and ensure I take breaks at regular times.
Work life balance
This is often more of a problem than you might expect and not in the way most managers assume. A good motivated software engineer can have a terrible habit of suddenly discovering it is long past when they should have finished work. It is important to be strict with yourself and finish at a set time.
Distractions
In my previous office testers, managers, production and support staff were all mixed in with the developers resulting in a lot of distractions; however, when you are at home there are also a great number of possible distractions. It can be difficult to avoid friends and family assuming you are available during working hours to run errands. I find I need to carefully budget time for such tasks and take it out of my working time as if I was actually in an office.
Environment
My previous office had "tired" furniture and decoration in an open plan which often had a negative impact on my productivity. When working from home I find it beneficial to partition my working space from the rest of my life and ensure family know that when I am in that space I am unavailable. You inevitably end up spending a great deal of time in this workspace and it can have a surprisingly large effect on your productivity.
Being confident I was aware of what I was letting myself into I knew I required a suitable place to work. In our previous home the only space available for my office was a four by ten foot cellar room with artificial lighting. Despite its size I was generally productive there as there were few distractions and the door let me "leave work" at the end of the day.

Garden office was assembled June 2017
This time my resources to create the space are larger and I wanted a place I would be comfortable to spend a lot of time in. Initially I considered using the spare bedroom which my wife was already using as a study. This was quickly discounted as it would be difficult to maintain the necessary separation of work and home.

Instead we decided to replace the garden shed with a garden office. The contractor ensured the structure selected met all the local planning requirements while remaining within our budget. The actual construction was surprisingly rapid. The previous structure was removed and a concrete slab base was placed in a few hours on one day and the timber building erected in an afternoon the next.

Completed office in August 2018
The building arrived separated into large sections on a truck which the workmen assembled rapidly. They then installed wall insulation, glazing and roof coverings. I had chosen to have the interior finished in hardwood plywood as it is hard wearing and easy to apply a finish to as required.

Work desk in July 2017
Although the structure could have been painted at the factory, Melodie and I did this ourselves to keep the project in budget. I laid a laminate floor suitable for high moisture areas (the UK is not generally known as a dry country) and Steve McIntyre and Andy Simpkins assisted me with various additional tasks to turn it into a usable space.

To begin with I filled the space with furniture I already had, for example the desk was my old IKEA Jerker which I have had for over twenty years.

Work desk in August 2018
Since then I have changed the layout a couple of times but have finally returned to having my work desk in the corner looking out over the garden. I replaced the Jerker with a new IKEA Skarsta standing desk, PEXIP bought me a nice work laptop and I acquired a nice print from Lesley Mitchell, but overall little has changed in my professional work area in the last year and I have a comfortable environment.

Cluttered personal work area
In addition the building is large enough that there is space for my electronics bench. The bench itself was given to me by Andy. I purchased some inexpensive kitchen cabinets and worktop (white is cheapest) to obtain a little more bench space and storage. Unfortunately all those flat surfaces seem to accumulate stuff at an alarming rate and it looks like I need a clear out again.

In conclusion I have a great work area which was created at a reasonable cost.

There are a couple of minor things I would do differently next time:
  • Position the building better with respect to the boundary fence. I allowed too much distance on one side of the structure which has resulted in an unnecessary two foot wide strip of unusable space.
  • Ensure the door was made from better materials. The first winter in the space showed that the door was a poor fit as it was not constructed to the same standard as the rest of the building.
  • The door should have been positioned on the end wall instead of the front. Use of the building showed moving the door would make the internal space more flexible.
  • Planned the layout more effectively ahead of time, ensuring I knew where services (electricity) would enter and where outlets would be placed.
  • Ensure I have an electrician on site for the first fix so electrical cables could be run inside the walls instead of surface trunking.
  • Budget for air conditioning as so far the building has needed heating in winter and cooling in summer.
In essence my main observation is that better planning of the details matters. If I had been more aware of this a year ago perhaps I would not be budgeting to replace the door and fit air conditioning now.

Wednesday, 1 August 2018

Irony is the hygiene of the mind

While Elizabeth Bibesco might well be right about the mind, software cleanliness requires a different approach.

Previously I have written about code smells which give a programmer hints where to clean up source code. A different technique, which has recently become readily available, is using tool-chain based instrumentation to perform run time analysis.

At a recent NetSurf developer weekend Michael Drake mentioned a talk he had seen at the Guadec conference which referenced the use of sanitizers for improving the security and correctness of programs.

Sanitizers differ from other code quality metrics such as compiler warnings and static analysis in that they detect issues when the program is executed rather than on the source code. There are currently two commonly used instrumentation types:
address sanitizer
This instrumentation detects several common errors when using memory such as "use after free"
undefined behaviour sanitizer
This instruments computations where the language standard has behaviour which is not clearly specified. For example left shifts of negative values (ISO 9899:2011 6.5.7 Bit-wise shift operators)
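As a concrete illustration of the kind of thing the undefined behaviour sanitizer traps (a generic example rather than one of the NetSurf fixes), building the snippet below with -fsanitize=undefined makes the negative left shift report a runtime error instead of silently producing whatever the hardware happens to do.

/* Compile with something like:
 *   gcc -fsanitize=undefined -fno-sanitize-recover=undefined shift.c
 * Left shifting a negative value is undefined behaviour
 * (ISO 9899:2011 6.5.7) and the instrumented binary reports it when
 * the line executes. */
#include <stdio.h>

int main(void)
{
        int value = -1;

        /* undefined: left shift of a negative value */
        int shifted = value << 4;

        printf("%d\n", shifted);
        return 0;
}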
As these are runtime checks it is necessary to actually execute the instrumented code. Fortunately most of the NetSurf components have good unit test coverage so Daniel Silverstone used this to add a build target which runs the tests with the sanitizer options.

The previous investigation of this technology had been unproductive because of the immaturity of support in our CI infrastructure. This time the tool chain could be updated to be sufficiently robust to implement the technique.

Jobs were then added to the CI system to build this new target for each component in a similar way to how the existing coverage reports are generated. This resulted in failed jobs for almost every component which we proceeded to correct.

An example of how most issues were addressed is provided by Daniel fixing the bitmap library. Most of the fixes ensured correct type promotion in bit manipulation; however, the address sanitizer did find a real out-of-bounds access when a malformed BMP header is processed. This is despite the library previously being run against a fuzzer and electric fence for many thousands of CPU hours.
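The type promotion class of fix deserves a hedged illustration of its own; this is a generic example of the pattern rather than Daniel's actual change. Shifting a narrow unsigned type promotes the operand to a signed int first, so assembling a 32-bit value from bytes can overflow unless the operand is widened explicitly.

#include <stdint.h>

/* Assemble a little-endian 32-bit value from a byte buffer. */
static uint32_t read_u32(const uint8_t *p)
{
        /* Wrong: p[3] is promoted to (signed) int, so p[3] << 24 can
         * overflow a 32-bit int, which the sanitizer flags:
         *
         *   return p[0] | (p[1] << 8) | (p[2] << 16) | (p[3] << 24);
         *
         * Right: widen to uint32_t before shifting. */
        return (uint32_t)p[0] |
               ((uint32_t)p[1] << 8) |
               ((uint32_t)p[2] << 16) |
               ((uint32_t)p[3] << 24);
}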

Although we did find a small number of real issues the majority of the fixes were to tests which failed to correctly clean up the resources they used. This seems to parallel what I observed with the other run time testing, like AFL and Valgrind, in that often the test environment has the largest impact on detected issues to begin with.

In conclusion it appears that an instrumented build combined with our existing unit tests gives another tool to help us improve our code quality. Given the very low amount of engineering time the NetSurf project has available automated checks like these are a good way to help us avoid introducing issues.

Friday, 1 June 2018

You can't make a silk purse from a sow's ear

Pile of network switches
I needed a small Ethernet network switch in my office so went to my pile of devices and selected an old Dell PowerConnect 2724 from the stack. This seemed the best candidate as the others were intended for data centre use and known to be very noisy.

I installed it into place and immediately ran into a problem: the switch was not quiet enough; in fact I could not concentrate at all with it turned on.

Graph of quiet office sound pressure
Believing I could not fix what I could not measure I decided to download an app for my phone that measured raw sound pressure. This would allow me to empirically examine what effects any changes to the switch made.

The app is not calibrated so can only be used to examine relative changes, which means a reference level is required. I took a reading in the office with the switch turned off but all other equipment operating to obtain a baseline measurement.

All measurements were made with the switch and phone in the same positions about a meter apart. The resulting yellow curves are the average for a thirty second sample period with the peak values in red.

The peak between 50Hz and 500Hz initially surprised me but after researching how a human perceives sound it appears we must apply the equal loudness curve to correct the measurement.

Graph of office sound pressure with switch turned on
With this in mind we can concentrate on the data between 200Hz and 6000Hz as the part of the frequency spectrum with the most impact. So in the reference sample we can see that the audio pressure is around the -105dB level.

I turned the switch on and performed a second measurement which showed a level around -75dB with peaks at the -50dB level. This is a difference of some 30dB; if we assume our reference is a "calm room" at 25dB(SPL) then the switch is raising the ambient noise level to something similar to a "normal conversation" at 55dB(SPL).

Something had to be done if I were to keep using this device so I opened the switch to examine the possible sources of noise.

Dell PowerConnect 2724 with replacement Noctua fan
There was a single 40x40x20mm 5V high capacity Sunon brand fan in the rear of the unit. I unplugged the fan and the noise level immediately returned to ambient, indicating that all the noise was being produced by this single device. Unfortunately the switch soon overheated without the cooling fan operating.

I thought the fan might be defective so purchased a high quality "quiet" NF-A4x20 replacement from Noctua. The fan has rubber mounting fixings to further reduce noise and I was hopeful this would solve the issue.

Graph of office sound pressure with modified switch turned on
The initial results were promising with noise above 2000Hz largely being eliminated. However the way the switch enclosure was designed caused the airflow to make sound which produced a level around 40dB(SPL) between 200Hz and 2000Hz.

I had the switch in service for several weeks in this configuration but eventually the device proved impractical on several points:

  • The management interface was dreadful to use.
  • The network performance was not very good especially in trunk mode.
  • The lower frequency noise became a distraction for me in an otherwise quiet office.

In the end I purchased an 8 port Zyxel switch which is passively cooled, otherwise silent in operation and has none of the other drawbacks.

From this experience I have learned some things:

  • Higher frequency noise (2000Hz and above) is much more difficult to ignore than other types of noise.
  • As I have become older my tolerance for equipment noise has decreased and it actively affects my concentration levels.
  • Some equipment has a design which means its audio performance cannot be improved sufficiently.
  • Measuring and interpreting noise sources is quite difficult.

Saturday, 18 March 2017

A rose by any other name would smell as sweet

Often I end up dealing with code that works but might not be of the highest quality. While quality is subjective I like to use the idea of "code smell" to convey what I mean; these are a list of indicators that, in total, help to identify code that might benefit from some improvement.

Such smells may include:
  • Complex code lacking comments on intended operation
  • Code lacking API documentation comments especially for interfaces used outside the local module
  • Not following style guide
  • Inconsistent style
  • Inconsistent indentation
  • Poorly structured code
  • Overly long functions
  • Excessive use of pre-processor
  • Many nested loops and control flow clauses
  • Excessive numbers of parameters
I am most certainly not alone in using this approach and Fowler et al have covered this subject in the literature much better than I can here. One point I will raise, though, is that some programmers dismiss code that exhibits these traits as "legacy" and immediately suggest a fresh implementation. There are varying opinions on when a rewrite is the appropriate solution, from never to always, but in my experience making the old working code smell nice is almost always less effort and risk than a rewrite.

Tests

When I come across smelly code, and I decide it is worthwhile improving it, I often discover the biggest smell is lack of test coverage. Now do remember this is just one code smell and on its own might not be indicative; my experience is that smelly code seldom has effective test coverage while fresh code often does.

Test coverage is generally understood to be the percentage of source code lines and decision paths used when instrumented code is exercised by a set of tests. Like many metrics developer tools produce, "coverage percentage" is often misused by managers as a proxy for code quality. Both Fowler and Marick have written about this, but suffice to say that for a developer test coverage is a useful tool which should not be misapplied.

Although refactoring without tests is possible the chances of unintended consequences are proportionally higher. I often approach such a refactor by enumerating all the callers and constructing a description of the used interface beforehand, then checking that that interface is not broken by the refactor. At that point it is probably worth writing a unit test to automate the checks.

Because of this I have changed my approach to such refactoring to start by ensuring there is at least basic API code coverage. This may not yield the fashionable 85% coverage target but is useful and may be extended later if desired.

It is widely known and equally widely ignored that for maximum effectiveness unit tests must be run frequently and developers take action to rectify failures promptly. A test that is not being run or acted upon is a waste of resources both to implement and maintain which might be better spent elsewhere.

For projects I contribute to frequently I try to ensure that the CI system is running the coverage target, and hence the unit tests, which automatically ensures any test breaking changes will be highlighted promptly. I believe the slight extra overhead of executing the instrumented tests is repaid by having the coverage metrics available to the developers to aid in spotting areas with inadequate tests.

Example

A short example will help illustrate my point. When a web browser receives an object over HTTP the server can supply a MIME type in a content-type header that helps the browser interpret the resource. However this meta-data is often problematic (sorry that should read "a misleading lie") so the actual content must be examined to get a better answer for the user. This is known as mime sniffing and of course there is a living specification.
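To make the idea concrete before looking at the real module, content sniffing at its simplest is just inspecting the first few bytes of the resource. The sketch below is a deliberately simplified, hedged illustration of that idea; it is neither the NetSurf implementation nor the full algorithm from the specification.

/* Greatly simplified sniffer: return a better MIME type guess based on
 * a few well known magic byte sequences, falling back to the server
 * supplied type. Illustrative only. */
#include <stddef.h>
#include <string.h>

static const char *sniff_type(const unsigned char *data, size_t len,
                              const char *server_type)
{
        if (len >= 8 && memcmp(data, "\x89PNG\r\n\x1a\n", 8) == 0) {
                return "image/png";
        }
        if (len >= 6 && (memcmp(data, "GIF87a", 6) == 0 ||
                         memcmp(data, "GIF89a", 6) == 0)) {
                return "image/gif";
        }
        if (len >= 2 && data[0] == 'B' && data[1] == 'M') {
                return "image/bmp";
        }
        /* nothing recognised; trust the content-type header */
        return server_type;
}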

The source code that provides this API (linked to rather than included for brevity) has a few smells:
  • Very few comments of any type
  • The API is not all well documented in its header
  • A lot of global context
  • Local static strings which should be in the global string table
  • Pre-processor use
  • Several long functions
  • Exposed API has many parameters
  • Exposed API uses complex objects
  • The git log shows the code has not been significantly updated since its implementation in 2011 but the spec has.
  • No test coverage
While some of these are obvious, the non-use of the global string table and the API complexity needed detailed knowledge of the codebase, just to highlight how subjective the sniff test can be. There is also one huge air freshener in all of this which definitely comes from experience and that is the module's author. Their name at the top of this would ordinarily be cause for me to move on, but I needed an example!

The first thing to check is the API use:

$ git grep -i -e mimesniff_compute_effective_type --or -e mimesniff_init --or -e mimesniff_fini
content/hlcache.c:              error = mimesniff_compute_effective_type(handle, NULL, 0,
content/hlcache.c:              error = mimesniff_compute_effective_type(handle,
content/hlcache.c:              error = mimesniff_compute_effective_type(handle,
content/mimesniff.c:nserror mimesniff_init(void)
content/mimesniff.c:void mimesniff_fini(void)
content/mimesniff.c:nserror mimesniff_compute_effective_type(llcache_handle *handle,
content/mimesniff.h:nserror mimesniff_compute_effective_type(struct llcache_handle *handle,
content/mimesniff.h:nserror mimesniff_init(void);
content/mimesniff.h:void mimesniff_fini(void);
desktop/netsurf.c:      ret = mimesniff_init();
desktop/netsurf.c:      mimesniff_fini();

This immediately shows me that this API is used in only a very small area; this is often not the case but the general approach still applies.

After a little investigation the usage is effectively that the mimesniff_init API must be called before the mimesniff_compute_effective_type API and the mimesniff_fini releases the initialised resources.

A simple test case was added to cover the API; this exercised the behaviour both when the init was called before the computation and when it was not, along with some simple tests for a limited number of well behaved inputs.
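A hedged sketch of the kind of test this involves is below. It assumes the check unit test framework used by the NetSurf test suites and only exercises the initialisation lifecycle; the cases driving mimesniff_compute_effective_type() need a low level cache handle to be constructed and are omitted from this snippet.

/* Sketch of a minimal lifecycle test using the check framework.
 * Assumes the NetSurf nserror type and the mimesniff init/fini API;
 * the compute_effective_type cases are omitted as they need a real
 * llcache handle. */
#include <check.h>

#include "utils/errors.h"
#include "content/mimesniff.h"

START_TEST(mimesniff_lifecycle_test)
{
        /* initialisation must succeed before any computation is attempted */
        ck_assert_int_eq(mimesniff_init(), NSERROR_OK);

        /* finalisation releases the resources acquired by init */
        mimesniff_fini();
}
END_TEST

static Suite *mimesniff_suite(void)
{
        Suite *s = suite_create("mimesniff");
        TCase *tc = tcase_create("lifecycle");

        tcase_add_test(tc, mimesniff_lifecycle_test);
        suite_add_tcase(s, tc);

        return s;
}

int main(void)
{
        SRunner *sr = srunner_create(mimesniff_suite());
        int failures;

        srunner_run_all(sr, CK_NORMAL);
        failures = srunner_ntests_failed(sr);
        srunner_free(sr);

        return failures == 0 ? 0 : 1;
}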

By changing to using the global string table the initialisation and finalisation API can be removed altogether along with a large amount of global context and pre-processor macros. This single change removes a lot of smell from the module and raises test coverage both because the global string table already has good coverage and because there are now many fewer lines and conditionals to check in the mimesniff module.

I stopped the refactor at this point but were this more than an example I probably would have:
  • made the compute_effective_type interface simpler with fewer, simpler parameters
  • ensured a solid set of test inputs
  • examined using a fuzzer to get a better test corpus.
  • added documentation comments
  • updated the implementation to the 2017 specification.

Conclusion

The approach examined here reduces the smell of code in an incremental, testable way to improve the codebase going forward. This is mainly necessary on larger complex codebases where technical debt and bit-rot are real issues that can quickly overwhelm a codebase if not kept in check.

This technique is subjective but helps a programmer to quantify and examine a piece of code in a structured fashion. However it is only a tool and should not be over applied nor used as a metric to proxy for code quality.

Monday, 13 February 2017

The minority yields to the majority!

Deng Xiaoping (who succeeded Mao) expounded this view and obviously did not depend on a minority to succeed. In open source software projects we often find ourselves implementing features of interest to a minority of users to keep our software relevant to a larger audience.

As previously mentioned I contribute to the NetSurf project and the browser natively supports numerous toolkits for numerous platforms. This produces many challenges in development to obtain the benefits of a more diverse user base. As part of the recent NetSurf developer weekend we took the opportunity to review all the frontends to make a decision on their future sustainability.

Each of the nine frontend toolkits was reviewed in turn and the results of that discussion published. This task was greatly eased because we were able to hold the discussion face to face; over time I have come to the conclusion that some tasks in open source projects greatly benefit from this form of interaction.

Netsurf running on windows showing this blog post
Coding and the day to day discussions around it can be easily accommodated via IRC and email. Decisions affecting a large area of code are much easier with the subtleties of direct interpersonal communication. An example of this is our decision to abandon the cocoa frontend (the toolkit used on Mac OS X) set against the decision to keep the windows frontend.

The cocoa frontend was implemented by Sven Weidauer in 2011; unfortunately Sven did not continue contributing to this frontend afterwards and it has become the responsibility of the core team to maintain. Because NetSurf has a comprehensive CI system that compiles the master branch on every commit, any changes that negatively affected the cocoa frontend were immediately obvious.

Thus issues with the compilation were fixed promptly, but these fixes were only ever compile tested and at some point the Mac OS X build environments changed, resulting in an application that crashes when used. Despite repeatedly asking for assistance to fix the cocoa frontend over the last eighteen months no one had come forward.

When the topic was discussed amongst the developers it quickly became apparent that no one had any objections to removing the cocoa support. In contrast we decided to keep the windows frontend, despite it having many similar issues to cocoa. There was almost immediate consensus on these decisions, despite no individual advocating any position prior to the discussion.

This was a single example but it highlights the benefits of a disparate development team having a physical meeting from time to time. However this was not the main point I wanted to discuss; this incident highlights that supporting a feature only useful to a minority of users can have a disproportionate cost.

The cost of a feature for an open source project is usually a collection of several factors:
Developer time
Arguably the greatest resource of a project is the time its developers can devote to it. Unless it is a very large, well supported project like the Kernel or libreoffice almost all developer time is voluntary.
Developer focus
Any given developer is likely to work on an area of code that interests them in preference to one that does not. This means if a developer must do work which does not interest them they may lose focus and not work on the project at all.
Developer skillset
A given developer may not have the skillset necessary to work on a feature, this is especially acute when considering minority platforms which often have very, very few skilled developers available.
Developer access
It should be obvious that software that only requires commodity hardware and software to develop is much cheaper than that which requires special hardware and software. To use our earlier example, the cocoa frontend required an Apple computer running Mac OS X to compile and test; this resource was very limited and the project only had access to two such systems via remote desktop. These systems also had to serve as CI builders and required physical system administration as they could not be virtualized.
Support
Once a project releases useful software it generally gains users outside of the developers. Supporting users consumes developer time and generally causes them to focus on things other than code that interests them.

While most developers have enough pride in what they produce to fix bugs, users must always remember that the main freedom they get from OSS is that they received the code and can change it themselves; there is no requirement for a developer to do anything for them.
Resources
A project requires a website, code repository, wiki, CI systems etc. which must all be paid for. NetSurf for example is fortunate to have Pepperfish look after our website hosting at favourable rates, Mythic Beasts provide exceptionally good rates for the CI system virtual machine along with hardware donations (our Apple Macs were donated by them) and Collabora provide physical hosting for our virtual machine server.

Despite these incredibly good deals the project still spends around 200GBP (250USD) a year on overheads. These services obviously benefit the whole project, including minority platforms, but are generally donated by users of the more popular platforms.
The benefits of a feature are similarly varied:
Developer learning
A developer may implement a feature to allow them to learn a new technology or skill
Project diversity
A feature may mean the project gets built in a new environment which reveals issues or opportunities in unconnected code. For example the Debian OS is built on a variety of hardware platforms and sometimes reveals issues in software by compiling it on big endian systems. These issues are often underlying bugs that are causing errors which are simply not observed on a little endian platform.
More users
Gaining users of the software is often a benefit and although most OSS developers are contributing for personal reasons having their work appreciated by others is often a factor. This might be seen as the other side of the support cost.

In the end the maintainers of a project often have to consider all of these factors and more to arrive at a decision about a feature, especially one only useful to a minority of users. Such decisions are rarely taken lightly as they often remove another developer's work, and the question is often: what would I think about my contributions being discarded?

As a postscript, if anyone is willing to pay the costs to maintain the NetSurf cocoa frontend I have not removed the code just yet.

Sunday, 23 October 2016

Rabbit of Caerbannog

Subsequent to my previous use of American Fuzzy Lop (AFL) on the NetSurf bitmap image library I applied it to the gif library which, after fixing the test runner, failed to produce any crashes but did result in a better test corpus, improving coverage above 90%.

I then turned my attention to the SVG processing library. This was different to the bitmap libraries in that it required parsing a much lower density text format and performing operations on the resulting tree representation.

The test program for the SVG library needed some improvement but is very basic in operation. It takes the test SVG, parses it using libsvgtiny and then uses the parsed output to write out an imagemagick mvg file.

The libsvgtiny processing uses the NetSurf DOM library which in turn uses an expat binding to parse the SVG XML text. To process this with AFL required instrumenting not only the SVG library but also the DOM library. I did not initially understand this and my first run resulted in a "map coverage" figure indicating an issue. Helpfully the AFL docs do cover this so it was straightforward to rectify.

Once the test program was written and the environment set up an AFL run was started and left to run. The next day I was somewhat alarmed to discover the fuzzer had made almost no progress and was running very slowly. I asked for help on the AFL mailing list and got a polite and helpful response; basically I needed to RTFM.

I must thank the members of the AFL mailing list for being so helpful and tolerating someone who ought to know better asking dumb questions.

After reading the fine manual I understood I needed to ensure all my test cases were as small as possible and further that the fuzzer needed a dictionary as a hint to the file format because the text file was of such low data density compared to binary formats.

Rabbit of Caerbannog. Death awaits you with pointy teeth
I crafted an SVG dictionary based on the XML one, ensured all the seed SVG files were as small as possible and tried again. The immediate result was thousands of crashes; nothing like being savaged by a rabbit to cause a surprise.

Not being in possession of the appropriate holy hand grenade I resorted instead to GDB and electric fence. Unlike the bitmap library crashes, memory bounds issues simply did not feature here. Instead they mainly centred around actual logic errors when constructing and traversing the data structures.

For example Daniel Silverstone fixed an interesting bug where the XML parser binding would try and go "above" the root node in the tree if the source closed more tags than it opened which resulted in wild pointers and NULL references.

I found and squashed several others including dealing with SVG documents which have no valid root element and division by zero errors when things like colour gradients have no points.
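As a hedged illustration of that last class of fix (a generic example rather than the actual libsvgtiny change), the pattern is simply to check for degenerate input before dividing when computing something like a gradient step.

#include <stdbool.h>

/* Guard against degenerate gradients before dividing; with fewer than
 * two stops there is no interval to divide and the caller falls back
 * to a flat fill. Illustrative only, not the libsvgtiny fix itself. */
static bool gradient_step(float start, float end,
                          unsigned int stop_count, float *step)
{
        if (stop_count < 2) {
                return false;
        }

        *step = (end - start) / (float)(stop_count - 1);
        return true;
}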

I find it interesting that the type and texture of the crashes completely changed between the SVG and binary formats. Perhaps it is just the nature of the textual formats that causes this, although it might be due to the techniques used to parse the formats.

Once all the immediately reproducible crashes were dealt with I performed a longer run. I used my monster system as previously described and ran the fuzzer for a whole week.

Summary stats
=============

       Fuzzers alive : 10
      Total run time : 68 days, 7 hours
         Total execs : 9268 million
    Cumulative speed : 15698 execs/sec
       Pending paths : 0 faves, 2501 total
  Pending per fuzzer : 0 faves, 250 total (on average)
       Crashes found : 9 locally unique

After burning almost seventy days of processor time AFL found me another nine crashes and possibly more importantly a test corpus that generates over 90% coverage.

A useful tool that AFL provides is afl-cmin. This reduces the number of test files in a corpus to only those that are required to exercise all the code paths reached by the test set. In this case it reduced the number of files from 8242 to 2612.

afl-cmin -i queue_all/ -o queue_cmin -- test_decode_svg @@ 1.0 /dev/null
corpus minimization tool for afl-fuzz by 

[+] OK, 1447 tuples recorded.
[*] Obtaining traces for input files in 'queue_all/'...
    Processing file 8242/8242...
[*] Sorting trace sets (this may take a while)...
[+] Found 23812 unique tuples across 8242 files.
[*] Finding best candidates for each tuple...
    Processing file 8242/8242...
[*] Sorting candidate list (be patient)...
[*] Processing candidates and writing output files...
    Processing tuple 23812/23812...
[+] Narrowed down to 2612 files, saved in 'queue_cmin'.

Additionally the actual information within the test files can be minimised with the afl-tmin tool. This must be run on each file individually and can take a relatively long time. Fortunately with GNU parallel one can run many of these jobs simultaneously which merely required another three days of CPU time to process. The resulting test corpus weighs in at a svelte 15 Megabytes or so against the 25 Megabytes before minimisation.

The result is yet another NetSurf library significantly improved by the use of AFL both from finding and squashing crashing bugs and from having a greatly improved test corpus to allow future library changes with a high confidence there will not be any regressions.

Tuesday, 11 October 2016

The pine stays green in winter... wisdom in hardship.

In December 2015 I saw the kickstarter for the Pine64. The project seemed to have a viable hardware design and after my experience with the hikey I decided it could not be a great deal worse.

Pine64 board in my case design
The system I acquired comprises:
  • Quad core Allwinner A64 processor clocked at 1.2GHz 
  • 2 Gigabytes of DDR3 memory
  • Gigabit Ethernet
  • two 480Mbit USB 2.0 ports
  • HDMI type A
  • micro SD card for storage.
Hardware based kickstarter projects are susceptible to several issues and the usual suspects occurred causing delays:
  • Inability to scale, several thousand backers instead of the hundred they were aiming for
  • Issues with production
  • Issues with shipping
My personal view is that PINE 64 inc. handled it pretty well, much better than several other projects I have backed and as my Norman Douglas quotation suggests I think they have gained some wisdom from this.

I received my hardware at the beginning of April only a couple of months after their initial estimated shipping date which as these things go is not a huge delay. I understand some people who had slightly more complex orders were just receiving their orders in late June which is perhaps unfortunate but still well within kickstarter project norms.

As an aside: I fear that many people simply misunderstand the crowdfunding model for hardware projects and fail to understand that they are not buying a finished product; on the other side of the debate I think many projects need to learn expectation management much better than they do. Hyping the product to get interest is obviously the point of the crowdfunding platform, but over-promising and under-delivering always causes unhappy customers.

Pine64 board dimensions
Despite the delays in production and shipping the information available for the board was (and sadly remains) inadequate. As usual I wanted to case my board and there were no useful dimension drawings so I had to make my own from direct measurements together with an STL 3D model.

Also a mental sigh for "yet another poor form factor decision" so another special case size and design. After putting together a design and fabricating with the laser cutter I moved on to the software.

This is where, once again, the story turns bleak. We find a very pretty website but no obvious link to the software (hint: scroll to the bottom and find the "support" wiki link). Once you find the wiki you will eventually discover that the provided software is either an Android 5.1.1 image (which failed to start on my board) or relies on some random guy from the forums who has put together his own OS images using a hacked up Allwinner Board Support Package (BSP) kernel.

Now please do not misunderstand me, I think the work by Simon Eisenmann (longsleep) to get a working kernel and Lenny Raposo to get viable OS images is outstanding and useful. I just feel that Allwinner and vendors like Pine64 Inc. should have provided something much, much better than they have. Even the efforts to get mainline support for this hardware are all completely volunteer community efforts and are making slow progress as a result.

Assuming I wanted to run a useful OS on this hardware and not just use it as a modern work of art I installed a basic Debian arm64 using Lenny Raposo's pine64 pro site downloads. I was going to use the system for compiling and builds so used the "Debian Base" image to get a minimal setup. After generating unique ssh keys, renaming the default user and checking all the passwords and permissions I convinced myself the system was reasonably trustworthy.

The standard Debian Jessie OS runs as expected with few surprises. The main concern I have is that there are a number of unpackaged scripts installed (prefixed with pine64_) which perform several operations from reporting system health (using sysfs entries) to upgrading the kernel and bootloader.

While I understand these scripts have been provided for novice users to reduce the support burden, doing even more of the vendor's job, I would much rather have had proper packages for these scripts, kernel and bootloader which apt could manage. This would have reduced image creation to a simple debootstrap, giving much greater confidence in the images' provenance.

The 3.10 based kernel is three years old at the time of writing and lacks a great number of features for the aarch64 ARM processors introduced since its release. However I was pleasantly surprised that kvm appeared to be available.

# dmesg|grep -i kvm
[    7.592896] kvm [1]: Using HYP init bounce page @b87c4000
[    7.593361] kvm [1]: interrupt-controller@1c84000 IRQ25
[    7.593778] kvm [1]: timer IRQ27
[    7.593801] kvm [1]: Hyp mode initialized successfully

I installed the libvirt packages (and hence all their dependencies like qemu) and created a bridge ready for the virtual machines.

I needed access to storage for the host disc images and while I could have gone the route of using USB attached SATA as with the hikey I decided to try and use network attached storage instead. Initially I investigated iSCSI but it seems the Linux target (iSCSI uses initiator for client and target for server) support is either old, broken or unpackaged.

I turned to the network block device (nbd) which is packaged and seems to have reasonable stability out of the box on modern distributions. This appeared to work well; indeed, over the gigabit Ethernet interface I managed a sustained 40 megabytes per second read and write rate in basic testing, which is better performance than a USB 2.0 attached SSD on the hikey.

I fired up the guest and perhaps I should have known better than to expect a 3.10 vendor kernel to cope. The immediate hard crashes despite tuning many variables convinced me that virtualisation was not viable with this kernel.

So, abandoning that approach, I attempted to run the CI workload directly on the system. To my dismay this also proved problematic. The processor has the bad habit of throttling due to thermal issues (despite a substantial heatsink) and, because the storage is network attached, throttling the CPU also massively impacts I/O.

These limitations meant that the workload caused the system to oscillate between high performance and almost no progress on a roughly ten second cycle. This caused a simple NetSurf recompile CI job to take over fifteen minutes. For comparison, the same task takes the armhf builder (CubieTruck) four minutes and a 64 bit x86 builder around a minute.

If the workload is tuned to a single core, which does not trip the thermal throttling, the build takes seven minutes, almost identical to the existing single core virtual machine instance running on the hikey.

In conclusion the Pine64 is an interesting bit of hardware with a fatally flawed software offering. Without Simon and Lenny providing their builds to the community the device would be practically useless rather than just performing poorly. There appears to have been no progress whatsoever on the software offering from Pine64 in the six months since I received the device and no prospect of mainline Allwinner support for the SoC either.

Effectively I have spent around 50 USD (40 for the board and 10 for the enclosure) on a failed experiment. Perhaps in the future the software will improve sufficiently for it to become useful but I do not hold out much hope that this will come from Pine64 themselves.

Saturday, 1 October 2016

Paul Hollywood and the pistoris stone

There has been a great deal of comment among my friends recently about a particularly British cookery program called "The Great British Bake Off". There has been some controversy as the program is moving from the BBC to a commercial broadcaster.

Part of this discussion comes from all the presenters, excepting Paul Hollywood, declining to sign with the new broadcaster and partly because of speculation the BBC might continue with a similar format show with a new name.

Rob Kendrick provided the start to this conversation by passing on a satirical link suggesting Samuel L Jackson might host "cakes on a plane"

This caused a large number of suggestions for alternative names, which I report below; Rob Kendrick, Vivek Das Mohapatra, Colin Watson, Jonathan McDowell, Oki Kuma, Dan Alderman, Dagfinn Ilmari Mannsåker, Lesley Mitchell and Daniel Silverstone are the ones to blame.


  • Strictly come baking
  • Stars and their pies
  • Baking with the stars
  • Bake/Off.
  • Blind Cake
  • Cake or no cake?
  • The cake is a lie
  • Bake That.
  • Bake Me On
  • Bake On Me
  • Bakin' Stevens.
  • The Winner Bakes It All
  • Bakerloo
  • Bake Five
  • Every breath you bake
  • Every bread you bake
  • Unbake my heart
  • Knead and let prove
  • Bake me up before you go-go
  • I want to bake free
  • Another bake bites the dust
  • Cinnamon whorl is not enough
  • The pie who loved me
  • The yeast you can do.
  • Total collapse of the tart
  • Bake and deliver
  • You Gotta Bake
  • Bake's Seven
  • Natural Born Bakers
  • Bake It Or Leaven It
  • Driving the last pikelet
  • Pie crust on the dancefloor
  • Tomorrow never pies
  • Murder on the pie crust
  • The pie who came in from the cold.
  • You only bake twice (everybody has to make one sweet and one savoury dish).


So that is our list, anyone else got better ideas?

Monday, 26 September 2016

I'll huff, and I'll puff, and I'll blow your house in

Sometimes it really helps to have a different view on a problem and after my recent writings on my Public Suffix List (PSL) library I was fortunate to receive a suggestion from my friend Enrico Zini.

I had asked for suggestions on reducing the size of the library further and Enrico simply suggested Huffman coding. This was a technique I had learned about long ago in connection with data compression; the intervening years had made all the details fuzzy, which explains why it had not immediately sprung to mind.

A small subset of the Public Suffix List as stored within libnspsl
Huffman coding, named for David A. Huffman, is an algorithm that produces a very efficient representation of data. In a normal array of characters every character takes the same eight bits to represent, which is the best we can do when any of the 256 possible values is equally likely. If the data is not evenly distributed this is no longer optimal; for example, if the data were English text then a value is roughly fifteen times more likely to be that for e than for k.

Every step of the Huffman encoding tree build for the example string table
So if we have some data with a non uniform distribution of probabilities we need a way for the data to be encoded with fewer bits for the common values and more bits for the rarer values. To be efficient we would need some way of having variable length representations without storing the length separately. The term for this kind of data representation is a prefix code and there are several ways to generate them.

Such is the influence of Huffman on the area of prefix codes that they are often called Huffman codes even if they were not created using his algorithm. One can dream of becoming immortalised like this; to join the ranks of those whose names are given to units or whole ideas in a field must be immensely rewarding. However, given Huffman invented his algorithm and proved it to be optimal to answer a question on a term paper in his early twenties, I fear I may already be a bit too late.

The algorithm itself is relatively straightforward. First a frequency analysis is performed, a fancy way of saying count how many of each character are in the input data. Next a binary tree is created using a priority queue initialised with the nodes sorted by frequency.

The resulting huffman tree and the binary representation of the input symbols
The counts of the two least frequent items are summed together and a node is placed in the tree with the two original entries as child nodes. This step is repeated until a single node exists with a count value equal to the length of the input.

To encode data one simply walks the tree, outputting a 0 for a left branch or a 1 for a right branch, until the original value is reached. This generates a mapping of values to bit sequences and the input is then simply converted value by value to the bit output. To decode, the encoded data is used bit by bit to walk the tree and arrive back at the values.
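
To make the steps concrete, below is a toy C sketch of the tree build and code derivation just described, run on the library's example string table. It is an illustration only, not the Perl encoder actually used for the library, and a linear scan stands in for the priority queue to keep it short.

/* Toy Huffman encoder sketch: frequency analysis, tree build by
 * repeatedly merging the two least frequent nodes, then a walk of the
 * tree to print each symbol's bit pattern. */
#include <stdio.h>

#define MAXSYM 256
#define MAXNODE (2 * MAXSYM)

struct hnode {
    unsigned count; /* combined frequency of this subtree */
    int sym;        /* symbol for leaf nodes, -1 for internal nodes */
    int left;       /* index of left child, -1 for leaves */
    int right;      /* index of right child, -1 for leaves */
    int active;     /* node is still available to be merged */
};

static struct hnode nodes[MAXNODE];
static int nnodes;

/* remove and return the active node with the smallest count */
static int take_min(void)
{
    int best = -1;

    for (int i = 0; i < nnodes; i++) {
        if (nodes[i].active &&
            (best == -1 || nodes[i].count < nodes[best].count)) {
            best = i;
        }
    }
    if (best != -1) {
        nodes[best].active = 0;
    }
    return best;
}

/* walk the tree emitting 0 for a left branch and 1 for a right branch */
static void walk(int idx, char *prefix, int depth)
{
    if (nodes[idx].sym >= 0) {
        prefix[depth] = '\0';
        printf("'%c' count %u code %s\n",
               nodes[idx].sym, nodes[idx].count, prefix);
        return;
    }
    prefix[depth] = '0';
    walk(nodes[idx].left, prefix, depth + 1);
    prefix[depth] = '1';
    walk(nodes[idx].right, prefix, depth + 1);
}

int main(void)
{
    const char *input =
        "*!asiabvcomcoopitamazonawsarsaves-the-whalescomputebasilicata";
    unsigned freq[MAXSYM] = { 0 };
    char prefix[MAXSYM];

    /* frequency analysis: count how many of each character are present */
    for (const char *p = input; *p != '\0'; p++) {
        freq[(unsigned char)*p]++;
    }

    /* one leaf node for every symbol that actually occurs */
    for (int s = 0; s < MAXSYM; s++) {
        if (freq[s] > 0) {
            nodes[nnodes] = (struct hnode){ freq[s], s, -1, -1, 1 };
            nnodes++;
        }
    }

    /* repeatedly merge the two least frequent subtrees until only the
     * root remains, then print the derived codes */
    for (;;) {
        int a = take_min();
        int b = take_min();

        if (b == -1) {
            walk(a, prefix, 0);
            break;
        }
        nodes[nnodes] = (struct hnode){
            nodes[a].count + nodes[b].count, -1, a, b, 1 };
        nnodes++;
    }
    return 0;
}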

If we perform this algorithm on the example string table *!asiabvcomcoopitamazonawsarsaves-the-whalescomputebasilicata we can reduce the 488 bits (61 * 8 bit characters) to 282 bits, a reduction of just over 40%. Obviously in a real application the Huffman tree would need to be stored as well, which would probably exceed this saving, but for larger data sets it is probable this technique would yield excellent results on this kind of data.

Once I proved this to myself I implemented the encoder within the existing conversion program. Although my Perl encoder is not very efficient it can process the entire PSL string table (around six thousand labels using 40KB or so) in less than a second, so unless the table grows massively an inelegant approach will suffice.

The resulting bits were packed into 32 bit values to improve decode performance (most systems prefer to deal with larger memory fetches less frequently) and resulted in 18KB of output, or 47% of the original size. This is a great improvement in size and means the statically linked test program is now 59KB, actually smaller than the gzipped source data.
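
For illustration, reading such a packed bitstream back might look something like the sketch below. The word packing order (most significant bit first) and the decode table layout are assumptions made for the example; they are not the actual format used in libnspsl.

/* Sketch of decoding a Huffman bitstream packed into 32 bit words. */
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

struct bitstream {
    const uint32_t *words; /* packed bit data */
    size_t bit;            /* next bit to consume */
};

/* fetch the next bit; whole 32 bit words are read so the memory system
 * deals with larger, less frequent fetches */
static unsigned next_bit(struct bitstream *bs)
{
    uint32_t word = bs->words[bs->bit / 32];
    unsigned bit = (word >> (31 - (bs->bit % 32))) & 1;

    bs->bit++;
    return bit;
}

/* decode table node: walk left on 0 and right on 1 until a leaf */
struct hdecode {
    int16_t left;  /* index of left child, -1 for a leaf */
    int16_t right; /* index of right child, -1 for a leaf */
    char symbol;   /* value held by leaf nodes */
};

static char decode_symbol(const struct hdecode *table, struct bitstream *bs)
{
    int idx = 0; /* start at the root */

    while (table[idx].left != -1) {
        idx = next_bit(bs) ? table[idx].right : table[idx].left;
    }
    return table[idx].symbol;
}

int main(void)
{
    /* tree for three symbols: 'a' = 0, 'b' = 10, 'c' = 11 */
    static const struct hdecode table[] = {
        {  1,  2,  0  }, /* 0: root */
        { -1, -1, 'a' }, /* 1: leaf */
        {  3,  4,  0  }, /* 2: internal */
        { -1, -1, 'b' }, /* 3: leaf */
        { -1, -1, 'c' }, /* 4: leaf */
    };
    /* "abc" encodes as 0 10 11, packed msb first into one word */
    static const uint32_t packed[] = { 0x58000000 };
    struct bitstream bs = { packed, 0 };

    for (int i = 0; i < 3; i++) {
        putchar(decode_symbol(table, &bs));
    }
    putchar('\n');
    return 0;
}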

$ ls -alh test_nspsl
-rwxr-xr-x 1 vince vince 59K Sep 25 23:58 test_nspsl
$ ls -al public_suffix_list.dat.gz 
-rw-r--r-- 1 vince vince 62K Sep  1 08:52 public_suffix_list.dat.gz

To be clear, the statically linked program can determine if a domain is in the PSL with no additional heap allocations and includes the entire PSL ordered tree, the domain label string table and the Huffman decode table to read it.

An unexpected side effect is that because the decode loop is small it sits in the processor cache. This appears to bring the performance of the string comparison function huffcasecmp() (which is not locale dependent because we know the data will be limited to ASCII) close to that of strcasecmp(); indeed on ARM32 systems there is a very modest improvement in performance.

I think this is as much work as I am willing to put into this library but I am pleased to have achieved a result which is on par with the best of breed (libpsl still has a data representation 20KB smaller than libnspsl but requires additional libraries for additional functionality) and I got to (re)learn an important algorithm too.

Tuesday, 20 September 2016

If I see an ending, I can work backward.

Now while I am sure Arthur Miller was referring to writing a play when he said those words they have an oddly appropriate resonance for my topic.

In the early nineties Lou Montulli applied the idea of magic cookies to HTTP to make the web stateful; I imagine he had no idea of the issues he was going to introduce for the future. Like most web technology it was a solution to an immediate problem which it has never been possible to subsequently improve.

Chocolate chip cookies are much tastier than HTTP cookies
The HTTP cookie is simply a way for a website to identify a connecting browser session so that state can be kept between retrieving pages. Due to shortcomings in the design of cookies and implementation details in browsers this has led to a selection of unwanted side effects. The specific issue that I am talking about here is the supercookie, where the super prefix in this context has similar connotations as when applied to the word villain.

Whenever the browser requests a resource (web page, image, etc.) the server may return a cookie along with the resource, which your browser remembers. The cookie has a domain name associated with it and when your browser requests additional resources, if the cookie domain matches the requested resource's domain name, the cookie is sent along with the request.

As an example, the first time you visit a page on www.example.foo.invalid you might receive a cookie with the domain example.foo.invalid, so next time you visit a page on www.example.foo.invalid your browser will send the cookie along. Indeed it will also send it along for any page on another.example.foo.invalid.

A supercookie is simply one where, instead of being limited to one sub-domain (example.foo.invalid), the cookie is set for a top level domain (foo.invalid) so that on visiting any such domain (I used the invalid name in my examples but one could substitute com or co.uk) your web browser gives out the cookie. Hackers would love to be able to set up such cookies and potentially control and hijack many sites at a time.

This problem was noted early on and browsers were not allowed to set cookie domains with fewer than two parts, so example.invalid or example.com were allowed but invalid or com on their own were not. This works fine for top level domains like .com, .org and .mil but not for countries where the domain registrar has rules about second levels, like the uk domain (uk domains must have a second level such as .co.uk).
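
The inadequacy of that early heuristic is easy to demonstrate. The toy check below (an illustration, not browser code) accepts any cookie domain with at least two labels and so happily lets co.uk through.

/* Toy version of the early heuristic: allow a cookie domain only if it
 * contains at least two labels. */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static bool cookie_domain_allowed(const char *domain)
{
    /* at least one dot means at least two labels */
    return strchr(domain, '.') != NULL;
}

int main(void)
{
    /* example.com is rightly allowed and com rightly refused, but
     * co.uk slips through: exactly the supercookie hole the Public
     * Suffix List exists to close */
    printf("example.com %d\n", cookie_domain_allowed("example.com"));
    printf("com         %d\n", cookie_domain_allowed("com"));
    printf("co.uk       %d\n", cookie_domain_allowed("co.uk"));
    return 0;
}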

NetSurf cookie manager showing a supercookie
There is no way to generate the correct set of top level domains with an algorithm, so a database is required; it is called the Public Suffix List (PSL). This database is a simple text formatted list with wildcard and inversion syntax and is, at time of writing, around 180Kb of text including comments, which compresses down to 60Kb or so with deflate.

A few years ago with ICANN allowing the great expansion of top level domains the existing NetSurf supercookie handling was found to be wanting and I decided to implement a solution using the PSL. At this point in time the database was only 100Kb source or 40Kb compressed.

I started by looking at the limited choice of existing libraries. In fact only the regdom library was adequate, but it used 150Kb of heap to load the pre-processed list. This would have had the drawback of increasing NetSurf heap usage significantly (we still have users on 8Mb systems). Because of this and the need to run a PHP script to generate the pre-processed input it was decided the library was not suitable.

Lacking other choices I came up with my own implementation which used a Perl script to construct a tree of domains from the PSL in a static array with the label strings in a separate table. At the time my implementation added 70Kb of read only data which I thought reasonable and allowed for direct lookup of answers from the database.

This solution still required a pre-processing step to generate the C source code but perl is much more readily available, is a language already used by our tooling and we could always simply ship the generated file. As long as the generated file was updated at release time as we already do for our fallback SSL certificate root set this would be acceptable.

wireshark session showing NetSurf sending a co.uk supercookie to bbc.co.uk
I put the solution into NetSurf, was pleased no-one seemed to notice and moved on to other issues. Recently, while fixing a completely unrelated issue in the display of session cookies in the management interface, I realised I had some test supercookies present in the display. After the initial "that's odd" I realised with horror there might be a deeper issue.

It quickly became evident the PSL generation was broken and had been for a long time. Even worse, somewhere along the line the "redundant" empty generated source file had been removed and the ancient fallback code path was all that had been used.

This issue had escalated somewhat from a trivial display problem. I took a moment to assess the situation a bit more broadly and came to the conclusion there were a number of interconnected causes, centred around the lack of automated testing, which could be solved by extracting the PSL handling into a "support" library.

NetSurf has several of these support libraries which can be used separately from the main browser project but are principally oriented towards it. These libraries are shipped and built in releases alongside the main browser codebase and mainly serve to make the API more obvious and modular. In this case my main aim was to have the functionality segregated into a separate module which could be tested, updated and monitored directly by our CI system, meaning the embarrassing failure I had found can never occur again.

Before creating my own library I did consider libpsl, a library which had been created since I wrote my original implementation. Initially I was very interested in using it given it managed a data representation within a mere 32Kb.

Unfortunately the library integrates a great deal of IDN and punycode handling which was not required in this use case. NetSurf already has to handle IDN and punycode translations and uses punycode encoded domain names internally only translating to unicode representations for display so duplicating this functionality using other libraries requires a great deal of resource above the raw data representation.

I put the library together based on the existing code generator Perl program and integrated the test set that comes along with the PSL. I was a little alarmed to discover that the PSL had almost doubled in size since the implementation was originally written and now the trivial test program of the library was weighing in at a hefty 120Kb.

This stemmed from two main causes:
  1. there were now many more domain label strings to be stored
  2. there were now many, many more nodes in the tree.
To address the first cause, the length of each domain label string was moved into the unused padding space within its tree node, removing a byte from each domain label and saving 6Kb. Next it occurred to me, while building the domain label string table, that if the label to be added already existed as a substring within the table it could be elided.

The domain labels were sorted from longest to shortest and added in order, searching for substring matches as the table was built; this saved another 6Kb. I am sure there are ways to reduce this further that I have missed (if you see them let me know!) but a 25% saving (47Kb to 35Kb) was a good start.
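
For anyone curious what that construction looks like, here is a small C sketch of the longest-first insertion with substring elision. The real work is done by the Perl generator; this label set is just a toy example.

/* Sketch of string table construction: labels are added longest first
 * and any label already present as a substring of the table is elided. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* sort comparator: longest string first */
static int by_length_desc(const void *a, const void *b)
{
    size_t la = strlen(*(const char * const *)a);
    size_t lb = strlen(*(const char * const *)b);

    return (la < lb) - (la > lb);
}

int main(void)
{
    const char *labels[] = { "com", "amazonaws", "aws", "basilicata", "asia" };
    size_t nlabels = sizeof(labels) / sizeof(labels[0]);
    char table[256] = "";

    qsort(labels, nlabels, sizeof(labels[0]), by_length_desc);

    for (size_t i = 0; i < nlabels; i++) {
        const char *found = strstr(table, labels[i]);

        if (found != NULL) {
            /* already present as a substring: record the existing
             * offset instead of appending the label again */
            printf("elided %-10s at offset %td\n", labels[i], found - table);
        } else {
            printf("added  %-10s at offset %zu\n", labels[i], strlen(table));
            strcat(table, labels[i]);
        }
    }
    printf("table: %s\n", table);
    return 0;
}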

The second cause was a little harder to address. The structure representing nodes in the tree I started with looked reasonable at first glance.

struct pnode {
    uint16_t label_index; /* index into string table of label */
    uint16_t label_length; /* length of label */
    uint16_t child_node_index; /* index of first child node */
    uint16_t child_node_count; /* number of child nodes */
};
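
As an aside, a lookup over a tree of such nodes simply matches the labels of a domain from right to left against each node's children. The sketch below uses a tiny hand built node and string table and ignores the PSL's wildcard and exception rules; it illustrates the traversal idea only, not the generated data or the library's actual code.

/* Sketch of walking a static tree of pnode entries for a domain. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* example data covering the suffixes "com", "uk" and "co.uk" */
static const char label_table[] = "comukco";
static const struct pnode nodes[] = {
    { 0, 0, 1, 2 }, /* 0: root with children 1 and 2 */
    { 0, 3, 0, 0 }, /* 1: "com", no children */
    { 3, 2, 3, 1 }, /* 2: "uk" with child 3 */
    { 5, 2, 0, 0 }, /* 3: "co" under "uk" */
};

/* true if every label of the domain matches a node in the tree */
static bool is_public_suffix(const char *domain)
{
    const struct pnode *node = &nodes[0];
    const char *end = domain + strlen(domain);

    while (end > domain) {
        /* find the start of the rightmost unconsumed label */
        const char *start = end;
        const struct pnode *match = NULL;

        while (start > domain && start[-1] != '.') {
            start--;
        }
        /* search the current node's children for that label */
        for (unsigned i = 0; i < node->child_node_count; i++) {
            const struct pnode *child = &nodes[node->child_node_index + i];

            if (child->label_length == (uint16_t)(end - start) &&
                memcmp(label_table + child->label_index, start,
                       child->label_length) == 0) {
                match = child;
                break;
            }
        }
        if (match == NULL) {
            return false; /* ran out of tree before the domain ended */
        }
        node = match;
        end = (start == domain) ? domain : start - 1; /* skip the dot */
    }
    return true;
}

int main(void)
{
    printf("com        %d\n", is_public_suffix("com"));
    printf("co.uk      %d\n", is_public_suffix("co.uk"));
    printf("bbc.co.uk  %d\n", is_public_suffix("bbc.co.uk"));
    return 0;
}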

I examined the generated table and observed that the majority of nodes were leaf nodes (they had no children), which makes sense given the type of data being represented. By allowing two types of node, one for labels and a second for the child node information, the node size would be halved in most cases while requiring only a modest change to the tree traversal code.

The only issue with this would be finding a way to indicate that a node has child information. It was realised that domain labels have a maximum length of 63 characters, meaning their length can be represented in six bits, so a uint16_t was excessive. The space was split into two uint8_t parts, one for the length and one for a flag to indicate that a child data node follows.

union pnode {
    struct {
        uint16_t index; /* index into string table of label */
        uint8_t length; /* length of label */
        uint8_t has_children; /* the next table entry is a child node */
    } label;
    struct {
        uint16_t node_index; /* index of first child node */
        uint16_t node_count; /* number of child nodes */
    } child;
};

static const union pnode pnodes[8580] = {
    /* root entry */
    { .label = { 0, 0, 1 } }, { .child = { 2, 1553 } },
    /* entries 2 to 1794 */
    { .label = {37, 2, 1 } }, { .child = { 1795, 6 } },

...

    /* entries 8577 to 8578 */
    { .label = {31820, 6, 1 } }, { .child = { 8579, 1 } },
    /* entry 8579 */
    { .label = {0, 1, 0 } },

};

This change reduced the node array size from 63Kb to 33Kb, almost a 50% saving. I considered using bitfields to try to pack the label length and has_children flag into a single byte but such packing will not reduce the length of a node below 32 bits because it is unioned with the child structure.

The possibility of using the spare uint8_t derived from bitfield packing to store an additional label node in three other nodes was considered, but it added a great deal of complexity to node lookup and table construction for a saving of around 4Kb, so it was not incorporated.

With the changes incorporated the test program was a much more acceptable 75Kb, reasonably close to the size of the compressed source but with the benefits of direct lookup. Integrating the library's single API call into NetSurf was straightforward and resulted in correct operation when tested.
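
For completeness, the sort of call site this gives the browser might look something like the sketch below. The psl_public_suffix() name, its signature and the hard coded stub are hypothetical stand-ins for illustration only and are not necessarily the library's real interface.

/* Hypothetical sketch of using a public suffix lookup to refuse
 * supercookies. */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* assumed interface: return a pointer to the public suffix portion of
 * hostname, or NULL if no suffix applies */
static const char *psl_public_suffix(const char *hostname);

static bool cookie_domain_acceptable(const char *cookie_domain)
{
    const char *suffix = psl_public_suffix(cookie_domain);

    /* a cookie whose domain is exactly a public suffix (com, co.uk and
     * so on) would be a supercookie and must be refused */
    return (suffix == NULL) || (strcmp(suffix, cookie_domain) != 0);
}

/* stand-in implementation with a couple of hard coded suffixes so the
 * sketch links and runs */
static const char *psl_public_suffix(const char *hostname)
{
    static const char *suffixes[] = { "co.uk", "uk", "com" };
    size_t hl = strlen(hostname);

    for (size_t i = 0; i < sizeof(suffixes) / sizeof(suffixes[0]); i++) {
        size_t sl = strlen(suffixes[i]);

        if (hl >= sl && strcmp(hostname + hl - sl, suffixes[i]) == 0 &&
            (hl == sl || hostname[hl - sl - 1] == '.')) {
            return hostname + hl - sl;
        }
    }
    return NULL;
}

int main(void)
{
    printf("example.com %d\n", cookie_domain_acceptable("example.com"));
    printf("co.uk       %d\n", cookie_domain_acceptable("co.uk"));
    printf("bbc.co.uk   %d\n", cookie_domain_acceptable("bbc.co.uk"));
    return 0;
}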

This episode reminded me of the dangers of code that can fail silently. It exposed our users to a security problem that we thought had been addressed almost six years ago and squandered the limited resources of the project. Hopefully it is a lesson we will not have to learn again any time soon. If there is a positive to take away it is that the new implementation is more space efficient, automatically built and, importantly, tested.