[uClinux-dev] Shared libraries

pauli at snapgear.com
Sun Apr 28 20:10:17 EDT 2002

Hi all,

Stefan Heinzmann wrote:

> I'm new to the list, so forgive me if I missed something important. I
> found a contribution by Pauli from SnapGear posted on 2002-04-01
> regarding shared libraries on uClinux when I was trying to find
> information on how shared libraries are implemented in systems without
> virtual memory. I hope the posting date does not indicate that it is a
> joke.

I actually posted it on the 2nd.  Most people would have received it on the 
1st due to our time zone :-)  It definitely isn't a joke.

> Pauli describes a method that may well work, but I find it wanting
> (assuming that I have understood it correctly). There are too many
> compromises in there for my taste. The worst idea IMHO is the stealing
> of address bits. That is something Apple has already regretted in the
> history of 68k Macs.

As a Mac programmer in a previous life, I'm well aware of this ;-)
The simple truth is that I couldn't think of a better way to do it without 
introducing substantial overheads.  I agree that stealing address bits 
isn't nice.  I'm satisfied that none of the other limitations is a real problem, 
and I'm willing to live with the address limitation for the moment.  I can 
always take a few bits back later -- we don't generally need full binary 
compatibility in the embedded space.  Source compatibility, on the other hand...
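To make the address-bit trade-off concrete, here is a minimal sketch of folding a library ID into the top of a 32-bit address. The 8-bit ID width, the macro names and the helper functions are assumptions chosen for illustration; the post does not say how many bits are taken.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration: the top 8 bits of a 32-bit address carry the
 * library ID (0 = the executable itself), leaving 24 bits of offset. */
#define LIB_ID_BITS  8u
#define OFFSET_BITS  (32u - LIB_ID_BITS)
#define OFFSET_MASK  ((UINT32_C(1) << OFFSET_BITS) - 1u)

/* Pack a library ID and an offset within that library into one word. */
static uint32_t encode_ref(uint32_t lib_id, uint32_t offset)
{
    assert(lib_id < (UINT32_C(1) << LIB_ID_BITS));
    assert(offset <= OFFSET_MASK);   /* offsets past 16 MiB cannot be expressed */
    return (lib_id << OFFSET_BITS) | offset;
}

/* Split a packed word back into its two parts. */
static uint32_t ref_lib_id(uint32_t ref) { return ref >> OFFSET_BITS; }
static uint32_t ref_offset(uint32_t ref) { return ref & OFFSET_MASK; }
```

Under this assumed split, encode_ref(3, 0x1234) yields 0x03001234, and every image is capped at 16 MiB of addressable offset. Taking "a few bits back later" would amount to shrinking LIB_ID_BITS.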

> The operating system assigns sequential slot numbers to each loaded
> shared library, starting from 1. If a library gets unloaded, its slot
> can be reused by the next library that gets loaded, but loaded
> libraries keep their slot number until they get unloaded.

Immediately, this means a relocation record must now contain the target library's 
name or some other reference rather than its ID, and the OS becomes responsible for 
the translation.  This will cost space and probably a small amount of time.  Our 
scheme folds these together and saves space because of it.
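A rough sketch of the space argument, comparing a folded relocation record with one that must also name its target library. Both struct layouts are hypothetical; they only illustrate that the extra field costs bytes on every record.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Folded form (assumed): one 32-bit word per relocation; the library ID lives
 * in the high bits of the value being relocated, so no extra field is needed. */
struct folded_reloc {
    uint32_t offset;        /* where in the image to patch */
};

/* Hypothetical alternative: the record must also say which library it refers
 * to, here as an index into a table of library names kept elsewhere. */
struct named_reloc {
    uint32_t offset;        /* where in the image to patch */
    uint16_t lib_name_idx;  /* index into a separate library-name table */
};

/* Bytes needed for n relocation records of each kind (not counting the
 * name table, which the named form must carry as well). */
static size_t folded_table_bytes(size_t n) { return n * sizeof(struct folded_reloc); }
static size_t named_table_bytes(size_t n)  { return n * sizeof(struct named_reloc); }
```

The named record is at least 6 bytes, and 8 on ABIs that align 32-bit fields to 4 bytes, so the relocation table grows by 50% to 100% before the name table is even counted.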

More importantly, consider the GOT entries.  These are simply addresses.  We'd now 
have to output relocation records for *all* of these too (to identify the library 
each address comes from).  This would cost a fair chunk of space.  At present, 
we know where the GOT is and relocate it automatically, thus avoiding many 
relocation records.
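The following sketch shows why knowing about the GOT saves relocation records: the loader simply walks it and fixes up every word, decoding the library from the address itself. The 8-bit ID field, the 0xffffffff end marker and the lib_base[] table are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OFFSET_BITS 24u                        /* assumption: 8-bit library ID on top */
#define OFFSET_MASK ((UINT32_C(1) << OFFSET_BITS) - 1u)
#define GOT_END     UINT32_C(0xffffffff)       /* assumed end-of-GOT marker */

/* Relocate every GOT entry in place. lib_base[id] is where library `id`
 * (0 = the executable) was loaded. No per-entry relocation record is read:
 * the loader knows where the GOT is and that every word in it is an address.
 * Returns the number of entries relocated. */
static size_t relocate_got(uint32_t *got, const uint32_t *lib_base, size_t nlibs)
{
    size_t n = 0;
    for (uint32_t *p = got; *p != GOT_END; p++, n++) {
        uint32_t id = *p >> OFFSET_BITS;       /* which library this address is in */
        assert(id < nlibs);
        *p = lib_base[id] + (*p & OFFSET_MASK);
    }
    return n;
}
```

With a GOT of {0x00000010, 0x01000020, end} and libraries loaded at 0x40000000 and 0x50000000, the entries become 0x40000010 and 0x50000020 without a single relocation record.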

> The shared library does not need to be compiled to use pc-relative
> addressing, since the dynamic linker knows where it will end up in the
> address space and can do the necessary relocations when loading the
> library. On the other hand, fewer relocations mean quicker loading, so
> pc-relative might still be a good idea.

From experience, a pc-relative executable on the ColdFire takes up less space 
than a fully relocatable one.  Our flat file loader supports both formats, and 
in general the extra relocations take more space than the overhead of 
supporting execute in place.  We get a slight performance boost by not 
executing in place, but we've not needed it yet.

Also, going to absolute code loses the biggest win we've had: execute in place 
is a godsend.  It gains more than shared libraries do in terms of cramming 
functionality into a small unit.  Sharing text segments is essential; sharing 
libraries is just an extra saving.  I don't think I can emphasise this enough.  
As soon as there is a single absolute address in your text segment, this is lost.
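A small sketch of the point about absolute addresses: a text segment can only be executed in place, and shared by every process, if the loader never has to patch it. The layout (text at image offset 0) and the function name are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A text segment can only be shared (executed in place from ROM/flash, one
 * copy for every process) if the loader never writes into it. Any relocation
 * whose target lies inside text means a per-process patch, so the text must
 * be copied to RAM instead. reloc_off[] holds image offsets that need
 * patching; text occupies [0, text_len) in this hypothetical layout. */
static bool can_execute_in_place(const uint32_t *reloc_off, size_t nrelocs,
                                 uint32_t text_len)
{
    assert(reloc_off != NULL || nrelocs == 0);
    for (size_t i = 0; i < nrelocs; i++)
        if (reloc_off[i] < text_len)
            return false;   /* one absolute address in text is enough */
    return true;
}
```

Relocations that all target the data segment leave the text shareable; a single one pointing into text forces a private copy per process.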

> The import table is set up by the dynamic linker; the application or
> the libraries only read from it. It contains pointers to the static
> data of each shared library. The offset into the table is calculated
> from the library's slot number (i.e. offset = slot_number*-4). A
> procedure in a shared library can thus get a pointer to its own static
> data with the following instruction:
>     MOVEA (offset,A5),Ax
> The dynamic linker needs to fix up the offset when loading the shared
> library (by which time it knows the slot number).
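The proposed import table can be modelled in C to check the indexing. Register A5 is simulated by a pointer, and MAX_SLOTS, struct process and the helper names are hypothetical; only the offset = slot_number*-4 rule comes from the proposal.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SLOTS 8   /* hypothetical table size for this sketch */

/* Simulated per-process import table. A5 points one past the last element,
 * so the entry for slot s sits s pointers below A5. With 4-byte pointers (as
 * on the 68k) that is byte offset -4*s, matching the proposal. */
struct process {
    void *import_table[MAX_SLOTS + 1];  /* element 0 unused: slots start at 1 */
    void **a5;                          /* stands in for register A5 */
};

/* Byte displacement the dynamic linker would patch into MOVEA (offset,A5),Ax. */
static int32_t slot_offset(int slot) { return -4 * slot; }

static void init_process(struct process *p)
{
    p->a5 = p->import_table + MAX_SLOTS + 1;   /* one past the end */
}

/* Dynamic linker: record where library `slot` put its static data. */
static void link_library(struct process *p, int slot, void *static_data)
{
    assert(slot >= 1 && slot <= MAX_SLOTS);
    p->a5[-slot] = static_data;
}

/* Library code: the C equivalent of MOVEA (offset,A5),Ax. */
static void *my_static_data(const struct process *p, int slot)
{
    return p->a5[-slot];
}
```

Every procedure in the library embeds slot_offset(slot) in its MOVEA, which is why the linker has to patch each one once the slot is known.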

Another relocation type and even more relocation records.
One per procedure in fact.  More space.

What happens if the offset exceeds 32k?  Looks like we're now limited to a 32k 
data segment.  That is definitely not acceptable.  Okay, use multiple GOTs 
located together and access them the same way.  Now we're back to a 32k 
overall GOT limit.  More acceptable, but significantly more restrictive than 
our implementation's 32k per-library limit (remember that shared libraries 
require more GOT entries than a stand-alone program).
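The 32k figure comes from the 68k (d16,An) addressing mode, whose displacement is a signed 16-bit value. This sketch compares one GOT per library with all GOTs packed behind a single base register. It assumes 4-byte entries indexed upward from the base (one half of the signed range), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define D16_MIN  (-32768)   /* 68k (d16,An) displacement range */
#define D16_MAX  32767
#define GOT_ENTRY_BYTES 4

/* Can the entry at `index` (0-based, growing upward from the base register)
 * be reached with a single 16-bit displacement? */
static bool reachable(size_t index)
{
    long disp = (long)index * GOT_ENTRY_BYTES;
    return disp >= D16_MIN && disp <= D16_MAX;
}

/* One GOT per library, each with its own base: each library gets the full
 * range to itself. */
static bool fits_per_library(const size_t *got_entries, size_t nlibs)
{
    for (size_t i = 0; i < nlibs; i++)
        if (got_entries[i] > 0 && !reachable(got_entries[i] - 1))
            return false;
    return true;
}

/* All GOTs laid out together behind one base register: the last entry of
 * the combined table must still be reachable. */
static bool fits_combined(const size_t *got_entries, size_t nlibs)
{
    size_t total = 0;
    for (size_t i = 0; i < nlibs; i++)
        total += got_entries[i];
    return total == 0 || reachable(total - 1);
}
```

Under these assumptions each table tops out at 8192 entries. Two libraries with 5000 GOT entries each fit when every library has its own GOT, but not when the tables share one base.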

Now, do we make Ax A5 or something else?  If it is A5, then we have to worry 
about callbacks into the main program (the called-back procedure has to set up 
A5).  If it isn't, then we've lost another of our fairly precious Ax registers.  
I chose to make the main program PIC too, which forces the A5 reload.
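A toy model of the callback problem, with register A5 simulated by a global pointer. All names are hypothetical, and the `reload` flag stands in for whether the main program's callback re-establishes its own A5 on entry, as PIC code would.

```c
#include <assert.h>

/* Simulated register A5: whichever module is running expects it to point at
 * its own data/GOT. */
static int main_data = 100;        /* main program's static data */
static int lib_data  = 200;        /* shared library's static data */
static int *a5 = &main_data;

/* Callback living in the main program. A PIC main program reloads A5 on
 * entry (modelled by `reload`) instead of trusting the caller's value. */
static int main_callback(int reload)
{
    int *saved = a5;
    if (reload)
        a5 = &main_data;           /* PIC prologue: re-establish own A5 */
    int value = *a5;               /* access "own" static data via A5 */
    a5 = saved;                    /* restore caller's A5 on return */
    return value;
}

/* Library routine: switches A5 to its own data, then calls back. */
static int library_call(int (*cb)(int), int reload)
{
    assert(cb != 0);
    int *saved = a5;
    a5 = &lib_data;
    int result = cb(reload);
    a5 = saved;
    return result;
}
```

Without the reload the callback reads the library's data (200) instead of its own (100), which is the failure that making the main program PIC avoids.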

> I cannot currently see where this scheme breaks. It avoids several
> problems:
> -- No limit on the size of the application (no address bits stolen)
> -- No limit on the number of shared libraries
> -- No change for code that isn't in a shared library
> -- Shared libraries pay a small price for accessing static data
> -- Static data overhead is one import table per process

-- Requires a clever dynamic loader.

-- Uses more space.

-- Requires significantly larger modifications to existing tools and to the 
flat file loader and format.  Backward compatibility becomes an issue here 
(we're trying not to break existing build trees for any architecture).

-- Requires multiple passes over libraries and executables prior to loading a 
program.  We don't necessarily know ahead of time which libraries are referenced; 
we have to scan for them.  Think of compressed binaries.  Solve this by 
breaking the data segment into multiple chunks and allocating them 
separately, and now we cannot use the proposed scheme for setting up Ax.
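A sketch of the extra pass: before the per-process import table can be sized, the loader must visit every relocation to discover which libraries are referenced. Reducing each record to a library ID, as well as MAX_LIBS and scan_libraries, are simplifications assumed for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LIBS 256   /* hypothetical upper bound for this sketch */

/* Pass 1 of a hypothetical loader: walk all relocation records to learn
 * which libraries are referenced before anything can be allocated. Each
 * record is reduced to the library ID it refers to (0 = the executable).
 * Returns the number of distinct libraries and sets *max_slot to the highest
 * ID seen, which is what the import table must be sized for. */
static size_t scan_libraries(const uint8_t *reloc_lib_ids, size_t nrelocs,
                             unsigned *max_slot)
{
    assert(max_slot != NULL);
    bool seen[MAX_LIBS] = { false };
    size_t distinct = 0;
    *max_slot = 0;
    for (size_t i = 0; i < nrelocs; i++) {
        uint8_t id = reloc_lib_ids[i];
        if (id == 0 || seen[id])
            continue;
        seen[id] = true;
        distinct++;
        if (id > *max_slot)
            *max_slot = id;
    }
    return distinct;
}
```

Only after this pass can a second pass apply the relocations. For a compressed binary that means decompressing the relocation data twice, or buffering all of it in RAM first.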

> Have I overlooked anything?

* There are a few problems.  None of them is insurmountable, but they will cause trouble and be more awkward to deal with.

* You will be paying space costs in several places.  Some of these are unlikely to be small.

* I'm probably being a little harsh with my comments ;-)

* Our scheme is already implemented.

From what I've heard, the Ridge Run shared libraries for ARM use ELF-format executables and libraries and avoid the limitations of our implementation.  If your problem is large enough that our limitations get in the way, an alternative already exists.



This message resent by the uclinux-dev at uclinux.org list server http://www.uClinux.org/
