Vincents Random Waffle: Man cannot discover new oceans unless he has the courage to lose sight of the shore

Thursday, 7 March 2013

Man cannot discover new oceans unless he has the courage to lose sight of the shore

Continuing with my whirlwind introduction to NetSurf Development now is the time to start examining the code, how its arranged and how to interact with the existing developers.

The way the NetSurf source is structured is around the idea of the frontends each being a native browser. While this implies that there are nine separate browsers that happen to share a common code base the separation is not quite that well defined.

Each frontend provides the OS entry point (the main() function in a c program) and calls out to standard browser initialisation entry function netsurf_init() and then starts running the browsers main dispatch loop with netsurf_main_loop() when that exits the frontend cleans up with netsurf_exit().

The frontends provide a large selection of functions which are called from the core code. These routines run from running the event scheduler through to rendering graphics and text.

Finding your way around

The browsers directory layout is fairly shallow consisting of some Makefiles, the nine frontend directories and eleven others.

The Makefiles are GNU make and represent a pretty straightforward linear build system. We do not use recursive make or autotools. There are plans to use the core buildsystem that all the other components use.

The frontend directories contain the code for the frontend and the makefile fragments to build them which are included by the top level Makefile.

In addition there is:

desktop

This contains the non frontend specific code that actually behaves like a browser. For example desktop/netsurf.c contains the three primary functions we outlined in the introduction. You will also find much of the function and data structures interfaces the frontend must provide. It is unlikely someone new to the project will need to change anything in here (there be dragons) but is an important set of routines.

utils

Here you will find the utility and compatibility interfaces, things like url handling, logging, user messages, base64 handling. These are utility interfaces that do not justify splitting out their functionality to a separate library but are useful everywhere. Changing an interface in here would likely result in a major refactor.
For example one quirk is the logging macro was created before varadic C preprocessor macros were universal so it must be called as LOG(("hello %s world", world_type)) e.g. with double brackets. Fixing this and perhaps improving the logging functionality would be "nice" but the changes would be massive and potentially conflict with ongoing work.

content

This contains all the core code to handle contents i.e. html, css, javascript, images. The handling includes retrieval of the resources from URI , correct caching of the received objects and managing the objects state. It should be explicitly stated that the content handlers are separate and use this core functionality. The actual content handlers to deal with object contents such as the routines to decode image files or render html are elsewhere.

image

This is where the content handlers for the various image types are kept. The majority of these image types (jpeg, png, webp, gif, bmp) use a standard library to perform the actual decode (libjpeg, libpng etc.). One special feature used by most image handlers is that of a decoded image cache which is distinct and separate from the content cache.
The decoded image cache manages decoding of the source images (the jpegs and pngs) into frontend specific "render" bitmaps. For example the gtk frontend keeps the decoded images as a cairo surface ready for immediate plotting.
The cache uses a demand based (when the browser actually displays the image) just in time strategy which has been carefully balanced, with real world input, to reduce the overhead of unnecessary image decoding and processing against memory usage for the render bitmaps.

css

The css content handlers provide for the processing of css source text and use the NetSurf libcss library to process into a bytecode suitable for applying style selection at render time.

javascript

The javascript handlers (strictly speaking this should be named the "script" directory as all types of script are handled here) provide basic functionality to bind a javascript engine to the rest of the browser, principally the Document Object Model (DOM) accessed with libdom. The only engine that currently has bindings is Spidermonkey from the Mozilla foundation.

render

This is the heart of the browser containing the content handler for html (and plain text). This handler deals with:

Acquiring the html.
Running the base parser as data arrives which generates the DOM and hence DOM events from which additional resource (stylesheets etc.) fetches are started.
Deal with script loading
Constructing the box model used for layout
Performing the Layout and rendering of the document.

Because this module has so many jobs to do it has inevitably become very complex and involved, it is also the principle area of core development . Currently NetSurf lacks a dynamic renderer so changes made by scripts post document load event are not visible. This also has the side effect that the render is only started after the DOM has finished construction and all the resources on a page have completed their fetch which can lead to undesirable display latency.

Docs

Documentation about building and using NetSurf. If anyone wants a place to start improving NetSurf, this is it, it is very incomplete. It must be noted this is not where dynamically generated documentation is found. For the current Doxygen output the best place to look is the most recent build on the Continuous Integration system.

resources

These are runtime resources which are common to all frontends. To be strictly correct they may be the sources which get converted into runtime resources e.g. The FatMessages file which is teh message text for all frontends in all languages, this gets processed at build time into separate files ready.

!NetSurf

This is another resources directory and technically the resources for the riscos frontend. The naming and reliance on this directory are historical. To allow the RISC OS frontend to be run directly from the source directory and an inability of RISC OS to process symbolic links most common runtime resources end up in here and linked to from elsewhere.

test

These are some basic canned test programs and files, principally to test elements of the utils and perform specific exercise of various javascript components.

Getting started

Once a developer has a checked out working build environment and can run the executable for their chosen frontend (and maybe done some web browsing :-) it is time to look at contributing.

If a developer does not have a feature or bug in mind when they begin the best way to get started is often to go bug hunting. The NetSurf bugtacker has lots to choose from unfortunately. Do remember to talk to us (IRC is the best bet if you are bug hunting) about what you are up to but do not be impatient. Some of those bugs are dirty great Shelob types and are not being fixed because even the core developers are stumped!

When first getting going I cannot recommend reading the code enough, this seems to be a skill that many inexperienced open source developers have yet to acquire, especially if they are from a predominately proprietary development background. One wonderful feature of open source software is you get to see all of it, all the elegant nice code and all the "what the hell were they thinking" code too.

One important point is to use your tools well the source is in git, if you learn how to use git well you will gain a skill that is readily portable, not just for NetSurf. And not just revision control tools, learn to use your debugger well and tools like valgrind. Those skills will replay the time spent learning them themselves many times over.

When using git one thing to remember is commit early and often, it does not matter if you have lots of junk commits and dead ends, once you have something viable you can use

git rebase --interactive

and rewrite it all into a single sensible set of commits. Also please do remember to develop on a branch, or if you did not

git pull --rebase

is your friend to avoid unnecessary merges.

Playing nicely with others

The NetSurf community is a small band of unpaid volunteers. On average we manage to collectively put in, perhaps, ten hours a week with the occasional developer weekend where we often manage over twenty hours each.

The result is that developer time is exceptionally valuable, add in a mature codebase with its fair share of technical debt and you get a group who, when they get to work on NetSurf, are incredibly busy. To be fair we are just like every other small open source project in that respect.

What this means to a new contributor is that they should try and do their homework before asking the obvious questions. The documentation is there for a reason and in spite of its obvious shortcomings please read it!

When asking questions it should be noted that currently the majority of active contributors are in the Europe so if you visit the IRC channel or post questions to lists the time difference is something to keep in mind.

I carefully said contributor above and not developer, users trying the CI builds and reporting results are welcome...as long as they report useful bugs to the bug tracker. Simon Tatham has produced an excellent resource on this subject.

Also we are always happy to receive translations to new languages (diff against the FatMessages file would be outstandingly useful but anything is welcome), artwork, documentation. Just recall what I mentioned about busy developers. Surest way to get us to see something is the development mailing list, you will probably get a reply, though I will not promise how fast!

Some of the more common mistakes when interacting with the community are:

Demanding we fix or add a feature. At best we will ignore you...though merciless sarcasm is not an unusual response to this. Perhaps a polite suggestion to the users mailing list would get better response? This is simple case of misunderstanding the relationship with the developers, you got the software for free so demanding we spend our leisure time to change it for you is impolite, or at least that is how I see it (and I am British, we do polite to excess).
Request write access to the git repository without a proven track record. We are fairly open to new developers once they have a track record but initially contributions should be via patch series on the mailing list we can feed to git-am. Eventually we may give you commit access to your own personal branch space and from there extend to the rest of the repository.
Developing a feature without talking to the team first and then being upset when we reject it. This is especially aggravating for all concerned as effort is wasted all around. If you have a great idea for a feature talk to us first! And if we indicate in our typically polite way that it is not going to be accepted listen to us! Of course you are free to ignore us, just please do not be upset later on.
Non-constructive criticism. What I refer to here is finding fault in our software without logging a bug or otherwise providing something to respond to. We try to provide the best software we can and by extension have a great deal of pride in our project. This antisocial behaviour helps no one but can have a large negative impact on developer productivity.

In conclusion

Hopefully this has been of some use although I had hoped to cover more and provide deeper insights and advice on the codebase but there is only so much generalisation to be done before it is just easier for the developer to go read the code for themselves.

I look forward to lots of new contributions :-) though I fear this may all end up as more of a crib sheet for next time we do GSOC, time will tell.