[uClinux-dev] Timer bug in all m68knommu architectures

Paul McGougan paul.mcgougan at braintree.com.au
Tue Mar 18 02:54:58 EST 2003


I've come across a timer bug that I believe affects at least all m68knommu
architectures, although I have only experimentally proven it on the 5272.

It is evident when using the gettimeofday system call, which calls the
function do_gettimeofday, shown next.

void do_gettimeofday(struct timeval *tv)
{
    extern volatile unsigned long wall_jiffies;
    unsigned long flags;
    unsigned long usec, sec, lost;

    read_lock_irqsave(&xtime_lock, flags);
    usec = mach_gettimeoffset ? mach_gettimeoffset() : 0;
    lost = jiffies - wall_jiffies;
    if (lost)
        usec += lost * (1000000/HZ);
    sec = xtime.tv_sec;
    usec += xtime.tv_usec;
    read_unlock_irqrestore(&xtime_lock, flags);

    while (usec >= 1000000) {
        usec -= 1000000;
        sec++;
    }

    tv->tv_sec = sec;
    tv->tv_usec = usec;
}

The flow of this function is designed so that:
1. do_gettimeofday is called
2. read_lock_irqsave is acquired and interrupts are locally disabled
3. The timer up-count is read from the relevant timer register
4. Our local time structure is calculated from the kernel xtime variable
plus the time differences due to any pending timer bottom halves (i.e. the
lost variable) and the upcount we previously read.

The problem is shown when the following sequence occurs:
1. do_gettimeofday is called
2. read_lock_irqsave is acquired and interrupts are locally disabled
***
2A. The relevant timer now reaches its terminal upcount and in hardware is
restarted (back at 0). Interrupts are disabled so jiffies does not get
incremented before the calculation of lost.
***
3. The timer up-count is read from the relevant timer register
4. Our local structure is calculated from the kernel xtime variable plus the
time difference due to any pending timer bottom halves (i.e. the lost
variable) and the upcount we previously read.

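The problem is that the upcount register has reset to zero, but our
calculations were based on a jiffies value from before the timer
restarted. The tick that has just elapsed is counted neither in jiffies nor
in the upcount, so the result is short by almost one full timer period.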
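
The sequence above can be modelled in a few lines of user-space C. This is
only a sketch: struct sim_clock and the sim_* functions are hypothetical
names made up for illustration, HZ is assumed to be 100, and the timer
up-count is assumed to be already scaled to microseconds. It is not kernel
code.

```c
#include <assert.h>

#define HZ            100UL              /* assumed tick rate: 10 ms per tick */
#define USEC_PER_TICK (1000000UL / HZ)

/* Hypothetical model of the state that do_gettimeofday reads. */
struct sim_clock {
    unsigned long jiffies;       /* ticks counted by the timer interrupt */
    unsigned long wall_jiffies;  /* ticks already folded into xtime */
    unsigned long xtime_usec;    /* microseconds part of xtime */
    unsigned long timer_usec;    /* hardware up-count, scaled to usec */
    int irq_pending;             /* timer wrapped, interrupt not yet serviced */
};

/* Let `usec` microseconds pass while interrupts are disabled. */
static void sim_advance_irqs_off(struct sim_clock *c, unsigned long usec)
{
    assert(usec < USEC_PER_TICK);        /* model at most one wrap per call */
    c->timer_usec += usec;
    if (c->timer_usec >= USEC_PER_TICK) {
        c->timer_usec -= USEC_PER_TICK;  /* hardware restarts the up-count */
        c->irq_pending = 1;              /* jiffies is NOT incremented */
    }
}

/* Same arithmetic as the locked section of do_gettimeofday. */
static unsigned long sim_read_usec(const struct sim_clock *c)
{
    unsigned long usec = c->timer_usec;
    unsigned long lost = c->jiffies - c->wall_jiffies;

    usec += lost * USEC_PER_TICK;
    return usec + c->xtime_usec;
}
```

Starting from xtime_usec = 230000 and timer_usec = 9752, a read gives
239752. If the timer then wraps with interrupts off (for example after
sim_advance_irqs_off(&c, 250)), the next read gives 230002, which is
earlier than the previous one.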

The result is that if you issue calls to gettimeofday at a high rate you
will sometimes get the famous "time apparently going backwards" problem,
as shown here:

Curr: 943920148.230002  Last: 943920148.239752
Curr: 943920148.570005  Last: 943920148.579755
Curr: 943920148.900000  Last: 943920148.909751
Curr: 943920149.240003  Last: 943920149.249753
Curr: 943920149.930002  Last: 943920149.939752
Curr: 943920150.270005  Last: 943920150.279755
Curr: 943920150.600001  Last: 943920150.609751

These occur because the Curr value, not the Last value, is the incorrect
one. For example, the first Curr value, 943920148.230002, should really
have been 943920148.240002.
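
Output of this kind can be produced by a small user-space loop along the
following lines. This is a sketch and not the program that generated the
log above; tv_earlier and count_backward_steps are hypothetical helper
names. It only uses the standard gettimeofday() call.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/time.h>

/* Return nonzero if timestamp a is strictly earlier than timestamp b. */
static int tv_earlier(const struct timeval *a, const struct timeval *b)
{
    if (a->tv_sec != b->tv_sec)
        return a->tv_sec < b->tv_sec;
    return a->tv_usec < b->tv_usec;
}

/*
 * Call gettimeofday() `iterations` times back to back. Print every sample
 * that is earlier than the one before it and return how many there were,
 * or -1 if gettimeofday() itself fails.
 */
static long count_backward_steps(long iterations)
{
    struct timeval last, curr;
    long backward = 0;
    long i;

    assert(iterations >= 0);
    if (gettimeofday(&last, NULL) != 0)
        return -1;
    for (i = 0; i < iterations; i++) {
        if (gettimeofday(&curr, NULL) != 0)
            return -1;
        if (tv_earlier(&curr, &last)) {
            printf("Curr: %ld.%06ld  Last: %ld.%06ld\n",
                   (long)curr.tv_sec, (long)curr.tv_usec,
                   (long)last.tv_sec, (long)last.tv_usec);
            backward++;
        }
        last = curr;
    }
    return backward;
}
```

Calling count_backward_steps() with a large iteration count from main()
should normally return 0. On an affected kernel it would be expected to
print a line each time a sample lands just after the timer wraps.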

I have attached a patch that can be applied to the latest uClinux-dist (i.e.
uClinux-dist-20030305). It only patches the 2.4.x kernel tree. It applies a
platform specific time code fix for the 5272. I believe that a similar fix
should be simple enough for the other m68knommu chips, but I don't have
enough knowledge of those architectures to produce them.
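
For readers without the attachment, one common way to close this kind of
race is sketched below. It is not the attached patch and not actual 5272
code: struct sim_timer and sim_gettimeoffset are hypothetical names, HZ is
assumed to be 100, and the hardware is assumed to expose a flag showing
that the timer has wrapped but its interrupt has not yet been serviced.
The idea is that the time-offset routine adds one full tick whenever that
flag is set.

```c
#include <assert.h>

#define HZ            100UL              /* assumed tick rate: 10 ms per tick */
#define USEC_PER_TICK (1000000UL / HZ)

/* Hypothetical stand-in for the timer registers. */
struct sim_timer {
    unsigned long count_usec;  /* up-count, already scaled to microseconds */
    int irq_pending;           /* wrapped, but interrupt not yet serviced */
};

/* Offset since the last tick that jiffies knows about, in microseconds. */
static unsigned long sim_gettimeoffset(const volatile struct sim_timer *t)
{
    unsigned long offset = t->count_usec;

    assert(offset < USEC_PER_TICK);      /* up-count stays within one period */
    if (t->irq_pending) {
        /*
         * The timer wrapped while interrupts were off. Re-read the count
         * so that it belongs to the new period, then add the tick that
         * jiffies has not counted yet.
         */
        offset = t->count_usec + USEC_PER_TICK;
    }
    return offset;
}
```

With the numbers from the first log line, an up-count of 2 with the flag
set yields an offset of 10002, and 230000 + 10002 gives the expected
.240002 instead of .230002.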

Paul McGougan
Braintree Communications
-------------- next part --------------
A non-text attachment was scrubbed...
Name: uClinux-dist-20030305.timer.problem.patch
Type: application/octet-stream
Size: 1106 bytes
Desc: not available
URL: <http://mailman.uclinux.org/pipermail/uclinux-dev/attachments/20030318/47355fab/attachment.obj>

