While systems like FreeBSD and Darwin simply use a 64-bit off_t as the default, systems like Linux and Solaris do not. Instead, they implement what the largefile specification calls the "transitional API": many calls are given a 64-bit cousin, so that the C library contains both "open" and "open64", as well as "lseek" and "lseek64".
Defining -D_FILE_OFFSET_BITS=64 (often accompanied by -D_LARGEFILE_SOURCE) brings about some magic: it makes off_t a 64-bit entity and remaps the traditional calls to the transitional API, so your source code may read "open(...)" and "lseek(...)", but it will really be linked to the symbols "open64" and "lseek64".
As a result, however, it is highly dangerous to use off_t in header files on largefile-sensitive systems. Most software writers do not expect that an integral type like off_t can change its size; they simply use it in the exported interface, as in declaring a new call "off_t my_lseek(int, off_t, int)".
In reality, the situation is similar to the old DOS modes, with a "small" mode and a "large" mode for compiling source code. The library code might be compiled with a 64-bit off_t while the application code using the library is compiled with a 32-bit off_t, possibly ending up with a callframe mismatch.
A similar problem arises when off_t is used in exported structures, as these can then have different sizes, and different offsets for their member variables. A library maker should take measures to defend against improper off_t sizes, possibly providing dualmode func/func64 entry points, as the C library does. Unfortunately, many software writers have not been aware of the problem.
Another problem is described in the section of the largefile documents that deals with holes in the protection system. It stems from the fact that some file descriptors might be opened in largefile mode while others are not, and they can even be transferred from a non-largefile application into largefile libraries, and vice versa.
The 64on32 transitional API tries to support this scheme, mostly by introducing a new error code, EOVERFLOW, that is returned when a "small"file application accesses a file that has grown beyond the two gigabyte limit due to calls from other software parts compiled as "large"file.
However, most "small"file software does not expect this error code, and many software writers do not check the return value of lseek. This can easily lead to data corruption when the file pointer is not actually moved.
Most of the software problems arise on the side of "small"file applications. Generally, one should compile all software as largefile as soon as the system provides these interfaces. This is pretty easy: AC_SYS_LARGEFILE in autoconfed software can do it, or _FILE_OFFSET_BITS=64 can simply be defined somewhere.
A lot of software, however, is not aware of any need to enable largefile mode. Hundreds of Open Source applications are compiled with a 32-bit off_t by default. It has simply been forgotten, and it would take a lot of work and publicity to make everyone aware, with the only result that the next new developer would miss it again.
Because of this, we should use tooling to track down the actual problem area: compiled code from parts that support largefile mode being mixed with parts that do not yet do so.
A Perl script to do this can be fetched from http://ac-archive.sf.net/largefile/. It tries to classify binaries and libraries as "-32-" or "-64-" mode by looking for fopen() vs. fopen64() in the list of dynamic symbols. Each binary given as an argument is checked, along with its dynamic dependencies, and any mismatches are printed.
Furthermore, the script can detect when a library presents itself as dualmode, exporting both func() and func64() calls (libgstreamer is an example of a library which does this). For these, it is fine for software to be in either -32- or -64- mode when linking to them, so only three combinations are actually rejected: -64- depending on -32-, -32- depending on -64-, and 32/64 dualmode libraries depending on plain -32- libraries.
When the script is run on /usr/lib/*.so (or just /usr/bin) on a contemporary Linux system, it detects a lot of largefile mismatches. The common user will not experience any problems with that, so long as no file being handled is larger than two gigabytes. (Note that Unix98 mandates that base utilities like "cat" and "cp" be compiled with largefile support.)
Open Source OS distributions, however, carry a lot of code from many different sources. In particular, there are several graphical frontends of the filemanager type which are not compiled in largefile mode. Sooner or later, the problem will come up. It would be best if no rpm/deb/whatever binary package has a largefile mismatch in the first place.
This can be done if packagers and distro makers check binary packages while making them. It would be easy to integrate the checking routine into the set of post-%files tools (as they are called in RPM), which need to check the libraries and binaries anyway for dependent libraries (and do a "chrpath" on them, since they have been relinked in the DESTDIR staging area).
The future should see all packages compiled in largefile mode, eliminating any problems with mixing libraries from different sides. A distro maker can ensure that, and if it means a few patches, that's good, since it makes the software more portable to FreeBSD/Darwin.
At some point, one should really think about dropping the 32-bit off_t default altogether, as FreeBSD did. Linux 2.4 and glibc 2.2 should be ready for this step, leaving the days of "small"files behind.