Thursday, 7 March 2013

Man cannot discover new oceans unless he has the courage to lose sight of the shore

Continuing with my whirlwind introduction to NetSurf Development now is the time to start examining the code, how its arranged and how to interact with the existing developers.

The way the NetSurf source is structured is around the idea of the frontends each being a native browser. While this implies that there are nine separate browsers that happen to share a common code base the separation is not quite that well defined.

Each frontend provides the OS entry point (the main() function in a c program) and calls out to standard browser initialisation entry function netsurf_init() and then starts running the browsers main dispatch loop with  netsurf_main_loop() when that exits the frontend cleans up with netsurf_exit().

The frontends provide a large selection of functions which are called from the core code. These routines run from running the event scheduler through to rendering graphics and text.

Finding your way around

The browsers directory layout is fairly shallow consisting of some Makefiles, the nine frontend directories and eleven others.

The Makefiles are GNU make and represent a pretty straightforward linear build system. We do not use recursive make or autotools. There are plans to use the core buildsystem that all the other components use.

The frontend directories contain the code for the frontend and the makefile fragments to build them which are included by the top level Makefile.

In addition there is:
desktop
This contains the non frontend specific code that actually behaves like a browser. For example desktop/netsurf.c contains the three primary functions we outlined in the introduction. You will also find much of the function and data structures interfaces the frontend must provide. It is unlikely someone new to the project will need to change anything in here (there be dragons) but is an important set of routines.
utils
Here you will find the utility and compatibility interfaces, things like url handling, logging, user messages, base64 handling. These are utility interfaces that do not justify splitting out their functionality to a separate library but are useful everywhere. Changing an interface in here would likely result in a major refactor.
For example one quirk is the logging macro was created before varadic C preprocessor macros were universal so it must be called as LOG(("hello %s world", world_type)) e.g. with double brackets. Fixing this and perhaps improving the logging functionality would be "nice" but the changes would be massive and potentially conflict with ongoing work.
content
This contains all the core code to handle contents i.e. html, css, javascript, images. The handling includes retrieval of the resources from URI , correct caching of the received objects and managing the objects state. It should be explicitly stated that the content handlers are separate and use this core functionality. The actual content handlers to deal with object contents such as the routines to decode image files or render html are elsewhere.
image
This is where the content handlers for the various image types are kept. The majority of these image types (jpeg, png, webp, gif, bmp) use a standard library to perform the actual decode (libjpeg, libpng etc.). One special feature used by most image handlers is that of a decoded image cache which is distinct and separate from the content cache.
The decoded image cache manages decoding of the source images (the jpegs and pngs) into frontend specific "render" bitmaps. For example the gtk frontend keeps the decoded images as a cairo surface ready for immediate plotting.
The cache uses a demand based (when the browser actually displays the image) just in time strategy which has been carefully balanced, with real world input, to reduce the overhead of unnecessary image decoding and processing against memory usage for the render bitmaps.
css
The css content handlers provide for the processing of css source text and use the NetSurf libcss library to process into a bytecode suitable for applying style selection at render time.
javascript
The javascript handlers (strictly speaking this should be named the "script" directory as all types of script are handled here) provide basic functionality to bind a javascript engine to the rest of the browser, principally the Document Object Model (DOM) accessed with libdom. The only engine that currently has bindings is Spidermonkey from the Mozilla foundation.
render
This is the heart of the browser containing the content handler for html (and plain text). This handler deals with:
  • Acquiring the html.
  • Running the base parser as data arrives which generates the DOM and hence DOM events from which additional resource (stylesheets etc.) fetches are started.
  • Deal with script loading
  • Constructing the box model used for layout
  • Performing the Layout and rendering of the document.
Because this module has so many jobs to do it has inevitably become very complex and involved, it is also the principle area of core development . Currently NetSurf lacks a dynamic renderer so changes made by scripts post document load event are not visible. This also has the side effect that the render is only started after the DOM has finished construction and all the resources on a page have completed their fetch which can lead to undesirable display latency.
Docs
Documentation about building and using NetSurf. If anyone wants a place to start improving NetSurf, this is it, it is very incomplete. It must be noted this is not where dynamically generated documentation is found. For the current Doxygen output the best place to look is the most recent build on the Continuous Integration system.
resources
These are runtime resources which are common to all frontends. To be strictly correct they may be the sources which get converted into runtime resources e.g. The FatMessages file which is teh message text for all frontends in all languages, this gets processed at build time into separate files ready.
!NetSurf
This is another resources directory and technically the resources for the riscos frontend. The naming and reliance on this directory are historical. To allow the RISC OS frontend to be run directly from the source directory and an inability of RISC OS to process symbolic links most common runtime resources end up in here and linked to from elsewhere.
test
These are some basic canned test programs and files, principally to test elements of the utils and perform specific exercise of various javascript components.

Getting started

Once a developer has a checked out working build environment and can run the executable for their chosen frontend (and maybe done some web browsing :-) it is time to look at contributing. 

If a developer does not have a feature or bug in mind when they begin the best way to get started is often to go bug hunting. The NetSurf bugtacker has lots to choose from unfortunately. Do remember to talk to us (IRC is the best bet if you are bug hunting) about what you are up to but do not be impatient. Some of those bugs are dirty great Shelob types and are not being fixed because even the core developers are stumped!

When first getting going I cannot recommend reading the code enough, this seems to be a skill that many inexperienced open source developers have yet to acquire, especially if they are from a predominately proprietary development background. One wonderful feature of open source software is you get to see all of it, all the elegant nice code and all the "what the hell were they thinking" code too.

One important point is to use your tools well the source is in git, if you learn how to use git well you will gain a skill that is readily portable, not just for NetSurf. And not just revision control tools, learn to use your debugger well and tools like valgrind. Those skills will replay the time spent learning them themselves many times over.

When using git one thing to remember is commit early and often, it does not matter if you have lots of junk commits and dead ends, once you have something viable you can use
git rebase --interactive
and rewrite it all into a single sensible set of commits. Also please do remember to develop on a branch, or if you did not
git pull --rebase
is your friend to avoid unnecessary merges.

Playing nicely with others

The NetSurf community is a small band of unpaid volunteers. On average we manage to collectively put in, perhaps, ten hours a week with the occasional developer weekend where we often manage over twenty hours each.

The result is that developer time is exceptionally valuable, add in a mature codebase with its fair share of technical debt and you get a group who, when they get to work on NetSurf, are incredibly busy. To be fair we are just like every other small open source project in that respect.

What this means to a new contributor is that they should try and do their homework before asking the obvious questions. The documentation is there for a reason and in spite of its obvious shortcomings please read it!

When asking questions it should be noted that currently the majority of active contributors are in the Europe so if you visit the IRC channel or post questions to lists the time difference is something to keep in mind.

I carefully said contributor above and not developer, users trying the CI builds and reporting results are welcome...as long as they report useful bugs to the bug tracker. Simon Tatham has produced an excellent resource on this subject.

Also we are always happy to receive translations to new languages (diff against the FatMessages file would be outstandingly useful but anything is welcome), artwork, documentation. Just recall what I mentioned about busy developers. Surest way to get us to see something is the development mailing list, you will probably get a reply, though I will not promise how fast!

Some of the more common mistakes when interacting with the community are:
  • Demanding we fix or add a feature. At best we will ignore you...though merciless sarcasm is not an unusual response to this. Perhaps a polite suggestion to the users mailing list would get better response? This is simple case of misunderstanding the relationship with the developers, you got the software for free so demanding we spend our leisure time to change it for you is impolite, or at least that is how I see it (and I am British, we do polite to excess).
  • Request write access to the git repository without a proven track record. We are fairly open to new developers once they have a track record but initially contributions should be via patch series on the mailing list we can feed to git-am. Eventually we may give you commit access to your own personal branch space and from there extend to the rest of the repository.
  • Developing a feature without talking to the team first and then being upset when we reject it. This is especially aggravating for all concerned as effort is wasted all around. If you have a great idea for a feature talk to us first! And if we indicate in our typically polite way that it is not going to be accepted listen to us! Of course you are free to ignore us, just please do not be upset later on.
  • Non-constructive criticism. What I refer to here is finding fault in our software without logging a bug or otherwise providing something to respond to. We try to provide the best software we can and by extension have a great deal of pride in our project. This antisocial behaviour helps no one but can have a large negative impact on developer productivity.

In conclusion

Hopefully this has been of some use although I had hoped to cover more and provide deeper insights and advice on the codebase but there is only so much generalisation to be done before it is just easier for the developer to go read the code for themselves. 

I look forward to lots of new contributions :-) though I fear this may all end up as more of a crib sheet for next time we do GSOC, time will tell.

The way to get started is to quit talking and begin doing

When Walt Disney said that he almost certainly did not have software developers in mind. However it is still good advice, especially if you have no experience with a piece of software you want to change.

Others have written extensively on the topic of software as more than engineering and the creative aspects, comparing it to a craft, which is a description I am personally most comfortable with. As with any craft though you have to understand the material you have to work with and an existing codebase is often a huge amount of material.

While developing NetSurf we get a lot of people who talk a lot about what we "ought" or "should" do to improve the browser but few who actually get off their collective posteriors and contribute. In fact according to the ohloh analysis of the revision control system there are roughly six of us who contribute significantly on a regular basis and of those only four who have interests beyond a specific frontend.

It has been mentioned by a couple of new people who have recently compiled the browser from source that it is somewhat challenging to get started. To address this criticism I intend to give a whirlwind introduction to getting started with the NetSurf codebase, perhaps we can even get some more contributors!

This first post will cover the mechanics of acquiring and building the source and the next will look at working with the codebase and the Netsurf community.

This is my personal blog, the other developers might disagree with my approach, which is why this is in my blog and not on the NetSurf website. That being said comments are enabled and I am sure they will correct anything I get wrong.

Resources

NetSurf has a selection of resources which are useful to a new developer:

Build environment

The first thing a new developer has to consider is their build environment. NetSurf supports nine frontends on several Operating Systems (OS) but is limited on the build environment that can be used.

The developer will require a Unix like system but let's be honest, we have not tried with anything other than Linux distributions in some time or MAC OS X for the cocoa frontend because its a special snowflake.

Traditionally at this point in this kind of introduction it would be traditional to provide the command line for various packaging systems to install the build environment and external libraries. We do have documentation that does this but no one reads it, or at least it feels like that. Instead we have chosen to provide a shell fragment that encodes all the bootstrap knowledge in one place, its kept in the revision control system so it can be updated.

To use: download it, read it (hey running random shell code is always a bad idea), source it into your environment and run ns-apt-get-install on a Debain based system or ns-yum-install on Fedora. The rest of this posting will assume the functionality of this script is available, if you want to do it the hard way please refer to the script for the relevant commands and locations.

For Example:
$ wget http://git.netsurf-browser.org/netsurf.git/plain/Docs/env.sh
$ less env.sh
$ source env.sh
$ ns-apt-get-install

Historically NetSurf built on more platforms natively but the effort to keep these build environments working was extensive and no one was prepared to do the necessary maintenance work. This is strictly a build setup decision and does not impact the supported platforms.

Since the last release NetSurf has moved to the git version control system from SVN. This has greatly improved our development process and allows for proper branching and merging we previously struggled to implement.

In addition to the core requirements external libraries NetSurf depends on will need to be installed. Native frontends where the compiled output is run on the same system it was built on are pretty straightforward in that the native package management system can be used to install the libraries for that system.

For cross building to the less common frontends we provide a toolchain repository which will build the entire cross toolchain and library set (we call this the SDK) direct from source. This is what the CI system uses to generate its output so is well tested.

External Libraries

NetSurf depends upon several external development libraries for image handling, network fetching etc. The libraries for the GTK frontend are installed by default if using the development script previously mentioned.

Generally a minimum of libcurl, libjpeg and libpng are necessary along with whatever libraries are required for the toolkit.

Project Source and Internal Libraries

One important feature of NetSurf is that a lot of functionality is split out into libraries. These are internal libraries and although technically separate projects, releases bundle them all together and for development we assume they will all be built together.

The development script provides the ns-clone function which clones all the project sources directly from their various git repositories. Once cloned the ns-make script can be used to build and install all the internal libraries into a local target ready for building the browser.

For Example:

$ source env.sh
$ ns-clone
$ ns-make-libs install

Frontend selection

As I have mentioned NetSurf supports several windowing environments (toolkits if you like) however on some OS there is only one toolkit so the two get conflated together.

NetSurf currently has nine frontends to consider:
  • amiga
    This frontend is for Amiga OS 4 on the power PC architecture and is pretty mature. It is integrated into the continuous integration (CI) system and has an active maintainer. Our toolchain repository can build a functional cross build environment, the target is ppc-amigaos.
  • atari
    This frontend is for the m68k and m5475 (coldfire) architecture. It has a maintainer but is still fairly limited principally because of the target hardware platform. It is integrated into the continuous integration system. Our toolchain repository can build a functional cross build environment for both architectures.
  • beos
    This frontend is for beos and the Haiku clone. It does have a maintainer although they are rarely active. It is little more than a proof of concept port and there is no support in the CI system because there is currently no way to run the jenkins slave client or to construct a viable cross build environment. This frontend is unusual in that it is the only one written in C++ 
  • cocoa
    NetSurf Mac OS X build boxes for PPC and X86This frontend supports the cocoa, the windowing system of MacOS X, on both PPC (version 10.5) and X86 (10.6 or later). The port is usefully functional and is integrated into the CI system, built natively on Mac mini systems as a jenkins slave. The port is written in objective C and currently has no active maintainer. 
  • framebuffer
    This frontend is different to the others in that it does not depend on a system toolkit and allows the browser to be run anywhere the projects internal libnsfb library can present a linear framebuffer. It is maintained and integrated into the CI system.
  • gtk
    This frontend uses the gtk+ toolkit library and is probably the most heavily used frontend by the core developers.  The port is usefully functional and is integrated into the CI system, there is no official maintainer. 
  • monkey
    This frontend is a debugging and test framework. It can be built with no additional library dependencies but has no meaningful user interface. It is maintained and integrated into the CI system.
  • riscos
    This frontend is the oldest from which the browser evolved. The port is usefully functional and is integrated into the CI system. There is an official maintainer for this frontend although they are not active very often. Our toolchain repository can build a functional cross build environment for this target.
  • windows
    This frontend would more accurately be called the win32 frontend as it specifically targets that Microsoft API. The port is functional but suffers from a lack of a maintainer. The port is integrated into the CI system and the toolchain repository can build a functional cross build environment for this target.

Building and running NetSurf

For a developer new to the project I recommend that the gtk version be built natively which is what I describe here.

Once the internal libraries have been installed, building NetSurf itself is as simple as running make.

For Example:
$ source env.sh
$ ns-make -C ${TARGET_WORKSPACE}/${NS_BROWSER} TARGET=gtk

Though generally most developers would change into the netsurf source directory and run make there. The target (frontend) selection defaults to gtk on Linux systems so that can also be omitted  Once the browser is built it can be run from the source tree to test.

For Example:
$ source env.sh
$ cd ${TARGET_WORKSPACE}/${NS_BROWSER}
$ ns-make
$ ./nsgtk

The build can be configured by editing a Makefile.config file. An example Makefile.config.example can be copied into place and the configuration settings overridden as required. The default values can be found in Makefile.defaults which should not be edited directly.

Logging is enabled with the command line switch -v and user options can be specified on the command line, options on the command line will override those sourced from a users personal configuration, generally found in ~/.netsurf/Choices although this can be compile time configured.