[uClinux-dev] Shared libraries
pauli at snapgear.com
Sun Apr 28 20:10:17 EDT 2002
Stefan Heinzmann wrote:
> I'm new to the list, so forgive me if I missed something important. I
> found a contribution by Pauli from SnapGear posted on 2002-04-01
> regarding shared libraries on uClinux when I was trying to find
> information on how shared libraries are implemented in systems without
> virtual memory. I hope the posting date does not indicate that it is a
I actually posted it on the 2nd. Most people would have received it on the
1st due to our time zone :-) It definitely isn't a joke.
> Pauli describes a method that may well work, but I find it wanting
> (assuming that I have understood it correctly). There are too many
> compromises in there for my taste. The worst idea IMHO is the stealing
> of address bits. That is something Apple has already regretted in the
> history of 68k Macs.
As a Mac programmer in a previous life, I'm well aware of this ;-)
The simple truth is I couldn't think of a better way to do it without
introducing substantial overheads. I agree that the stealing of address bits
isn't nice. I'm happy that all the other limitations are not a real problem
and I'm willing to live with the address limitation for the moment. I can
always take a few bits back again later -- we don't generally need full binary
compatibility in the embedded space. Source compatibility on the other hand...
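For readers who haven't seen the trick, here is a minimal sketch of what "stealing address bits" means: a library ID is folded into the high bits of a 32-bit address, and the loader splits it back out when relocating. The 8-bit/24-bit split, the names and the `bases` array are all assumptions for illustration, not the actual loader code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: top 8 bits carry the library ID, low 24 bits the offset. */
#define LIB_ID_SHIFT 24
#define LIB_ID_MASK  0xffu
#define OFFSET_MASK  0x00ffffffu

/* Fold a library ID into the high bits of an offset. */
static uint32_t tag_address(uint32_t lib_id, uint32_t offset)
{
    assert(lib_id <= LIB_ID_MASK);   /* ID must fit in the stolen bits */
    assert(offset <= OFFSET_MASK);   /* this is the address-space limit */
    return (lib_id << LIB_ID_SHIFT) | offset;
}

static uint32_t tagged_lib_id(uint32_t tagged)
{
    return (tagged >> LIB_ID_SHIFT) & LIB_ID_MASK;
}

static uint32_t tagged_offset(uint32_t tagged)
{
    return tagged & OFFSET_MASK;
}

/* Relocate: look up the load base of the owning library and add the offset.
 * 'bases' is a hypothetical per-process table indexed by library ID. */
static uint32_t resolve(uint32_t tagged, const uint32_t *bases)
{
    return bases[tagged_lib_id(tagged)] + tagged_offset(tagged);
}
```

With this split, no single library or program can have offsets beyond 16MB, which is the limitation being discussed. Taking "a few bits back" would mean narrowing the ID field and widening the offset field.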
> The operating system assigns sequential slot numbers to each loaded
> shared library, starting from 1. If a library gets unloaded, its slot
> can be reused by the next library that gets loaded, but loaded
> libraries keep their slot number until they get unloaded.
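The slot assignment in the quoted proposal could be sketched roughly as below. This only illustrates the rule as quoted (slots start at 1, a freed slot may be reused, a loaded library keeps its slot). The table size, the names, and the choice of handing out the lowest free slot are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLOTS 8   /* hypothetical limit, for illustration only */

/* Index 0 is unused so that slot numbers start from 1. */
struct slot_table {
    const void *lib[MAX_SLOTS + 1];
};

/* Assign the lowest free slot to 'lib'. Returns the slot number,
 * or 0 if every slot is taken. */
static int slot_alloc(struct slot_table *t, const void *lib)
{
    assert(lib != NULL);
    for (int i = 1; i <= MAX_SLOTS; i++) {
        if (t->lib[i] == NULL) {
            t->lib[i] = lib;
            return i;
        }
    }
    return 0;
}

/* Release a slot on unload. Other libraries keep their slot numbers. */
static void slot_free(struct slot_table *t, int slot)
{
    assert(slot >= 1 && slot <= MAX_SLOTS);
    t->lib[slot] = NULL;
}
```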
Immediately, a relocation record must now contain the target library's name or
other reference rather than its ID and the OS is responsible for the
translation. This will cost space and probably a small amount of time. Our
scheme folds these together and saves space because of it.
More importantly, the GOT entries. These are simply addresses. We'd now have
to output relocation records for *all* of these too (to identify the library
the address comes from). This would cost a fair chunk of space. At present,
we know about the GOT and automatically relocate it thus avoiding many
> The shared library does not need to be compiled to use pc-relative
> addressing, since the dynamic linker knows where it will end up in the
> address space and can do the necessary relocations when loading the
> library. On the other hand, fewer relocations mean quicker loading, so
> pc-relative might still be a good idea.
From experience, a pc-relative executable on the ColdFire takes up less space
than a fully relocatable one. Our flat file loader supports both formats and
in general the extra relocations take more space than the overheads of
supporting execute in place. We get a slight performance boost by not
executing in place but we've not needed this yet.
Also, going to absolute code loses the biggest win we've had: execute in place
is a god-send. It gains more than shared libraries in terms of cramming
functionality into a small unit. Sharing text segments is essential. Sharing
libraries is just an extra saving. I don't think I can emphasise this enough.
As soon as you've one absolute address in your text segment, this goes.
> The import table is set up by the dynamic linker; the application or
> the libraries only read from it. It contains pointers to the static
> data of each shared library. The offset into the table is calculated
> from the library's slot number (i.e. offset = slot_number*-4). A
> procedure in a shared library can thus get a pointer to its own static
> data with the following instruction:
> MOVEA (offset,A5),Ax
> The dynamic linker needs to fix up the offset when loading the shared
> library (by which time it knows the slot number).
Another relocation type and even more relocation records.
One per procedure in fact. More space.
What happens if the offset exceeds 32k? Looks like we're now limited to a 32k
data segment. That is definitely not acceptable. Okay, use multiple GOTs
located together and access them the same way. Now we're back to a 32k
overall GOT limit. More acceptable but significantly more restrictive than
with our implementation of a 32k per library limit (remember shared libraries
require more GOT entries than a standalone program).
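To make the arithmetic concrete, here is a small C model of the proposed import table lookup. As I read it, the 32k figure comes from the signed 16-bit displacement in the (d16,An) addressing mode. The offset formula is the one from the quoted proposal. The function names and the table layout (A5 pointing just past the end of the table) are assumptions, and on a host with 8-byte pointers the byte offsets differ from the 4-byte entries on the target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset of a library's import table entry, per the quoted
 * proposal: offset = slot_number * -4. */
static int64_t import_offset(uint32_t slot)
{
    return (int64_t)slot * -4;
}

/* A (d16,An) displacement is a signed 16-bit value. */
static bool fits_disp16(int64_t off)
{
    return off >= INT16_MIN && off <= INT16_MAX;
}

/* C model of "MOVEA (offset,A5),Ax": 'a5' points just past the end of
 * the import table and slot n is found n entries below it. */
static void *own_data(void *const *a5, uint32_t slot)
{
    assert(slot >= 1 && fits_disp16(import_offset(slot)));
    return a5[-(ptrdiff_t)slot];
}
```

Under these assumptions the last reachable slot is 8192 (offset -32768). Anything beyond that needs a longer addressing sequence or a second table.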
Now, do we make Ax A5 or something else? If it is A5 then we've got the call
back into the main program to concern ourselves with (the called back
procedure has to set up A5). If it isn't then we've lost another of our
fairly precious Ax registers. I went for the main program being PIC too to
force the A5 reload.
> I can not currently see where this scheme breaks. It avoids several
> -- No limit on the size of the application (no address bits stolen)
> -- No limit on the number of shared libraries
> -- No change for code that isn't in a shared library
> -- Shared libraries pay a small price for accessing static data
> -- Static data overhead is one import table per process
-- Requires a clever dynamic loader.
-- Uses more space.
-- Requires significantly larger modifications to existing tools and to the
flat file loader and format. Backward compatibility becomes an issue here
(we're trying to not break existing build trees for any architecture).
-- Requires multiple passes over libraries and executables prior to loading a
program. We don't necessarily know what libraries are referenced ahead of
time, we have to scan for them. Think compressed binaries. Solve this by
breaking the data segment stuff into multiple chunks and allocating
separately. Now we cannot use the proposed scheme for setting Ax up.
> Have I overlooked anything?
* There are a few problems. None of them is insurmountable, but they will be troublesome to deal with.
* You will be paying space costs in several places. Some of these are unlikely to be small.
* I'm probably being a little harsh with my comments ;-)
* Our scheme is already implemented.
From what I've heard, the Ridge Run shared libraries for ARM use ELF format executables and libraries and avoid the limitations of our implementation. If your problem is large enough to find our limitations limiting, there is an alternative already.