Thursday 24 May 2012

Interrupt Service Routines

Something a little low level for this post. I have been asked recently how to "test" for the maximum duration of an Interrupt Service Routine (ISR) in Linux

To do this I probably ought to explain what the heck an ISR is!

A CPU executes one instruction after another and runs your programs. However early in the history of the electronic computer it soon became apparent that sometimes there were events happening, generally caused by a hardware peripheral, that required some other code to be executed without having to wait for the running program to check for the event.

This could have been solved by having a second processor to look after those exceptional events but that would have been expensive, difficult to synchronize and the designers took the view that there was a perfectly good processor already sat there just running some users program. This Interruption in the code flow became known as, well, an Interrupt (and the other approach as polling).

The hardware for supporting interrupts started out very simply, the processor would complete execution of the current instruction and when the Program Counter (PC) was about to be incremented if an Interrupt ReQest (IRQ) was pending the PC would be stored somewhere (often a "special" IRQ stack or register) and then execution started at some fixed address.

The interrupting event would be dealt with by some code and execution returned to the original program without it ever knowing the CPU just wandered off to do something else. The code that deals with the interrupt is known as the Interrupt Service Routine (ISR).

Now I have glossed over a lot of issues here (sufficient to say there are a huge number of details in which to hide the devil) but the model is good enough for my purpose. A modern CPU has a extraordinarily complex system of IRQ controllers to deal with numerous peripherals requesting the CPU stop what its doing and look after something else.

This system of controllers will ultimately cause program execution to be delivered to an ISR for that device. If we were living in the old single thread of execution world we could measure how long execution remains within an ISR, perhaps by using a physical I/O line as a semaphore and an external oscilloscope to monitor the line.

You may well ask "Why measure this?" well historically while the ISR was running nothing else could interrupt it executing which meant even if there was an event that was more important it would not get the CPU until the first ISR was complete. This was known as IRQ latency which was undesirable if you were doing something that required an IRQ to be serviced in a timely manner (like playing audio)

This is no longer how things are done while the top half runs with IRQ disabled many are threaded interrupt handlers and are preemptable (I.e. can be interrupted themselves) which leads to the first issue with measuring ISR time in that the ISR may be executed in multiple chunks if something more important interrupts. Indeed it may appear an ISR has taken many times longer one time than another because the CPU has been off servicing multiple other IRQ.

Then we have the issue that Linux kernel drivers often do as little as possible within their ISR, often only as much as is required to clear the physical interrupt line. Processing is then continued in a "bottom half" handler  this leads to ISR which take practically no time to execute but processing is still being caused elsewhere in the system.

The next issue is the world is not uniprocessor any more, how many processors does a machine have these days? even a small ARM SoC can often have two or even four cores. This makes our timing harder because  it is now possible to be servicing multiple interrupts from a single peripheral on separate cores at the same time!

In summary measuring ISR execution time is not terribly enlightening and almost certainly not what you are interested in. The actual question is much more likely that you really want to be examining something that the ISR time was an historical proxy for like IRQ latency or system overheads in locking.

2 comments:

  1. little correction : «I probably ought to explain» :)

    Very interresting article, i knew a little about interruptions, but i learnt a few things, thanks :)

    ReplyDelete