Calc is an arbitrary-precision arithmetic system that uses a C-like language. It is useful as a calculator, an algorithm prototyping tool, and a mathematical research tool. More importantly, calc provides a machine-independent means of computation. Calc comes with a rich set of builtin mathematical and programmatic functions.
The number Perl script prints the English name of a number. It can print the names of extremely large numbers (e.g. 1e1234567). Number can be run on the command line, or as a CGI script when run as 'number.cgi'. Number prints names in both the American and European naming systems. It can also print the decimal expansion of a number in either naming system.
FNV hashes are designed to be fast while maintaining a low collision rate. The high dispersion of the FNV hashes makes them well-suited for hashing nearly identical strings such as URLs, hostnames, text, and IP addresses. FNV is used in a wide variety of applications including mdbm, DNS, web search engines, NFS, Netnews, etc.
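As a rough illustration of why FNV is fast, here is a minimal Python sketch of the 32-bit FNV-1a variant, assuming the published 32-bit parameters (offset basis 2166136261, prime 16777619). The function name `fnv1a_32` is chosen here for illustration only; real applications would normally use a C implementation.

```python
# Assumed parameters: the published 32-bit FNV offset basis and prime.
FNV_32_OFFSET_BASIS = 0x811C9DC5  # 2166136261
FNV_32_PRIME = 0x01000193         # 16777619


def fnv1a_32(data: bytes) -> int:
    """Return the 32-bit FNV-1a hash of data (illustrative sketch)."""
    h = FNV_32_OFFSET_BASIS
    for byte in data:
        h ^= byte                            # fold in the next octet
        h = (h * FNV_32_PRIME) & 0xFFFFFFFF  # multiply, keep the low 32 bits
    return h
```

Each input byte costs one XOR and one multiply, which is where the speed comes from. Hashing two nearly identical strings, such as `b"http://example.com/a"` and `b"http://example.com/b"`, should yield quite different values.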
webalizer-2.01-10 patch for your consideration
We have been successfully using a modified webalizer-2.01-10 extensively on multiple sites, from the very large to the very small, for some time now.
We have made a number of mods to the standard webalizer-2.01-10 distribution, and have built a number of tools to process multiple virtual sites and to create summary/rollup stats for all of the virtual sites on a given server.
The topic of this posting is the set of patches that we have applied to the webalizer-2.01-10 distribution. The URL:
contains 4 patches: 3 that I recommend to all webalizer users, and a 4th that is a re-package of the geolizer patch.
does the following:
* ability to process very large log files (> 2GB in size)
* countries patch
Some of the entries on the list are not countries. In some cases the nation state status is contested. In other cases the entry relates to a territory that does not claim to be a country. In some cases what some claim to be a country is disputed by another country. And entries such as .arpa are not countries at all.
I recommend that one use the term 'location' instead of
'Nation' or 'Country' to avoid the whole mess. ;-)
Added are some missing locations (from the ISO UN codes and from GeoIP's list). Some location names have been corrected or changed to their official name. Added some more TLDs.
* avoid referrer spamming (IMPORTANT)
Spammers and other low-life forms have been stuffing the "top N referrer" table in order to get webalizer to generate links to their sites ... (perhaps because they think this will improve their search engine placement or perhaps because they wish to direct people to a poisoned web page in an effort to exploit some browser bug?). Whatever the reason, we don't need to give them their links.
This patch turns the entries in the "top N referrer" table into plain text values instead of A (anchor) tags, so no links are generated.
* correctly process log entries made during a leap second
* long referrer and search patch
Quite a few referrer and search strings are between 128 and 256 chars in length. Avoid truncating them.
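The referrer spam item above comes down to printing the referrer as text rather than as a link. The following Python sketch shows only the idea; it is not the patch's code (webalizer is written in C), and the function name `referrer_cell` and the `<TD>` layout are assumptions made for illustration.

```python
import html


def referrer_cell(referrer: str) -> str:
    """Return a table cell that shows the referrer as plain text.

    The value is HTML-escaped and is NOT wrapped in an <A HREF=...> tag,
    so the report page never links to the referring site.
    """
    return "<TD>" + html.escape(referrer, quote=True) + "</TD>"
```

Escaping the value also keeps a hostile referrer string from injecting its own markup into the report.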
does the following:
* avoid 32 bit counter overflow
For very busy sites, 32 bit signed counters can overflow. This is particularly likely when using webalizer to cover a long span of time. This patch converts a few values to u_int64_t to avoid these numeric overflow problems.
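To make the overflow concrete, here is a small Python sketch that uses `ctypes` to mimic C integer widths. The helper names `add_int32` and `add_uint64` are made up for this example, and the wraparound shown for the signed case is only the behaviour typically seen in practice (signed overflow is formally undefined in C).

```python
import ctypes


def add_int32(counter: int, n: int) -> int:
    """Add n to a counter stored in a signed 32 bit integer (typical wraparound)."""
    return ctypes.c_int32(counter + n).value


def add_uint64(counter: int, n: int) -> int:
    """Add n to a counter stored in an unsigned 64 bit integer."""
    return ctypes.c_uint64(counter + n).value
```

Starting from 2147483647 (the largest signed 32 bit value), `add_int32` of one more hit goes negative, while `add_uint64` simply continues counting.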
does the following:
* extend the summary page for longer than 12 months
By default, webalizer only keeps the last 12 months of data. Moreover, at the start of a month the oldest month is discarded, resulting in only 11+ months of data.
This code gets around the 12 month limit by maintaining a history of older months in a parallel directory ../history.
See the webalizer page:
for an example of this effect.
NOTE: After the 2.hist.patch has been applied, the
should be run on a monthly basis. See the comments in the 2.hist.patch file as well as the track-hist tool itself for details.
The optional 3.geolizer.patch patch:
is for you if AND ONLY IF you use one of the MaxMind
(http://www.maxmind.com/) GeoIP databases. It is just a reapplication of the geolizer.patch that works for Un*x / Linux / GNU-Linux systems after the first 3 patches have been applied.
-=-=- in Summary -=-=-
At a minimum, I'd highly recommend the 0.basic.patch patch.
Large web sites will want the 1.64bit.patch patch (after the 0.basic.patch has been applied). It doesn't hurt smaller sites to have it either.
Sites that want to keep more than 12 months of webalizer stats need the 0.basic.patch, 1.64bit.patch and the 2.hist.patch as well as the track_hist tool.
chongo (Landon Curt Noll) /\oo/\
Share and enjoy! :-)
multisort v1.1.2 rollup patch
I use multisort in conjunction with the
Unfortunately multisort v1.1 has a few minor problems.
Several patches were sent to the author.
I do not know if he received them because I never
received a reply.
I recommend that multisort users consider looking at my
multisort v1.1.2 rollup patch page.
That page contains a patch to multisort v1.1, as well
as a revised multisort source that I call multisort v1.1.2.
multisort v1.1.2 rollup patch fixes a number of issues
related to multisort v1.1:
* Fixed sort bug where
01/Feb/2001:03:26:15 was incorrectly
* Fixed bugs related to very old dates and dates far in
* Fixed bugs related to processing empty input files
* Correctly distinguishes between file EOF and read
* Allows multisort to just process a single file (addresses
wish item in an above comment)
* Added slightly better sanity checks on timestamp
* Fixed a bug where multisort could hang on an I/O
* Correctly computes POSIX Seconds since the
Epoch values with full leap year rules
* Speedup as per
Demiddelaer's patch (see a previous comment)
* Added -m maxage which will output only
lines less than or equal to maxage
seconds old instead of all lines
* Updated the usage message
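Two of the items above, the Seconds since the Epoch computation and the -m maxage filter, can be sketched in Python as follows. This is not the multisort code (multisort is written in C). It applies the POSIX "Seconds Since the Epoch" formula to a log timestamp of the form dd/Mon/yyyy:HH:MM:SS, assuming the timestamp is in UTC and the year is 1970 or later. The names `clf_to_epoch` and `within_maxage` are hypothetical.

```python
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
# Days before the 1st of each month in a non-leap year.
DAYS_BEFORE_MONTH = (0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334)


def is_leap(year: int) -> bool:
    """Full Gregorian rule: every 4th year, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def clf_to_epoch(stamp: str) -> int:
    """Convert 'dd/Mon/yyyy:HH:MM:SS' (assumed UTC, year >= 1970) to POSIX seconds."""
    date, hh, mm, ss = stamp.split(":")
    dd, mon, yyyy = date.split("/")
    year = int(yyyy)
    month = MONTHS.index(mon)                    # 0-based month number
    yday = DAYS_BEFORE_MONTH[month] + int(dd) - 1
    if month >= 2 and is_leap(year):             # past Feb 29 in a leap year
        yday += 1
    tm_year = year - 1900                        # same meaning as C's tm_year
    return (int(ss) + int(mm) * 60 + int(hh) * 3600 + yday * 86400
            + (tm_year - 70) * 31536000
            + ((tm_year - 69) // 4) * 86400      # add a day per 4th year
            - ((tm_year - 1) // 100) * 86400     # remove century years
            + ((tm_year + 299) // 400) * 86400)  # restore every 400th year


def within_maxage(stamp: str, now: int, maxage: int) -> bool:
    """True if the line is less than or equal to maxage seconds old."""
    return now - clf_to_epoch(stamp) <= maxage
```

For example, `clf_to_epoch("01/Feb/2001:03:26:15")` gives 980997975. A maxage filter would then keep only the lines for which `within_maxage` is true at the current time.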
NOTE: My rollup v1.1.2 patch is obviously unofficial.