Artha is a handy thesaurus based on WordNet with distinctive features such as global hotkey lookup, passive desktop notifications, and regular-expression-based search. Artha may be used as a free, open-source replacement for the proprietary WordWeb Pro thesaurus (which is also based on WordNet) on Unix-like and Windows operating systems.
OpenSearchServer is a stable, high-performance search engine and a suite of full-text search algorithms. Documents can be indexed in sixteen languages. Multilingual analyzers split sentences into words, then run lemmatisation algorithms on the words based on the document's language. Numerous document formats are supported, such as XML, HTML/XHTML, PDF, Word, PowerPoint, RTF, OpenOffice, plain text, MP3/MP4, Ogg, FLAC, etc. The Web interface, built around the Zkoss framework, provides an easy way to manage OSS. Integration is quick using the PHP client or the API (XML over HTTP). The crawlers of OpenSearchServer go through Web sites, file systems, and databases to rapidly and easily build your index.
EasyGIS is a simple way to share and publish geographic data. It provides a number of core features that allow you to share and publish geographically localized data sets on the Web. It has a simple Web user interface for integrating external data and service providers to prepare so-called GIS mashups. Built-in support for Google Maps and OpenStreetMap serves as a good basis for thematic mapping. All system services are exposed via a RESTful API. With the REST Web resources, external clients can use the search and map rendering functionality and embed the resources into external applications and sites.
Associations Indexing Service (AIS) was originally created as an extension of human memory: it tags resources, URIs, bookmarks, and memos by storing them under personal keywords and associations, so that the information can be retrieved quickly in the future with the same keywords or with queries similar to those used in popular search engines. It can be seen as a local search engine and used as an automatic indexer of large file hierarchies (e.g. personal archives or file repositories). It is based on Lucene, so the application remains fast even with a large index.
x2search is a crawler based on machine learning algorithms that finds pages and documents that are similar to given positive examples and dissimilar to given negative examples. The learned classifiers can be exported and saved for later reuse. It features multiple settings for restricting the search by domain, server, etc., and has a plug-in mechanism for adding document types to be searched.
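The idea of learning from positive and negative examples can be illustrated with a very small centroid classifier. This is only a sketch under assumptions: the source does not say which learning algorithm x2search uses, and the names `train` and `is_relevant` are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(positive, negative):
    """Build one summed word-count vector per class (hypothetical model format)."""
    pos, neg = Counter(), Counter()
    for text in positive:
        pos.update(bag_of_words(text))
    for text in negative:
        neg.update(bag_of_words(text))
    return pos, neg


def is_relevant(model, text):
    """A page counts as relevant if it is closer to the positive examples."""
    pos, neg = model
    words = bag_of_words(text)
    return cosine(words, pos) > cosine(words, neg)
```

Because the model here is just two word-count tables, it could be written to disk (for example as JSON) and reloaded later, which mirrors the export-and-reuse feature in spirit only.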
bot_recognizer is a PHP class that can be used to recognize Web robots and handle them specially. It checks the IP address of the computer and the user agent of the program currently accessing the Web server against ranges of IP addresses and user agent strings known to belong to Web robots, such as search engine crawlers or even malicious crawlers. The class can call different callback functions depending on the type of crawler that was identified. It can also be run in debug mode, in which a given IP address or user agent string is used instead of the values sent by the accessing client. The Web robot information is stored in a database, which the class can load from a text data file.
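The matching and callback dispatch described above might look roughly like the following. It is written in Python rather than PHP and is not the class's actual API: the `ROBOTS` table, its field layout, and the function names are assumptions made for illustration, and the IP ranges are reserved documentation addresses.

```python
import ipaddress

# Hypothetical robot database: (name, kind, IP network, user-agent token).
ROBOTS = [
    ("examplebot", "search", "192.0.2.0/24", "ExampleBot"),
    ("badbot", "malicious", "198.51.100.0/24", "BadBot"),
]


def identify_robot(ip, user_agent, robots=ROBOTS):
    """Return the kind of the first robot whose IP range or user-agent token matches."""
    address = ipaddress.ip_address(ip)
    agent = user_agent.lower()
    for _name, kind, network, token in robots:
        if address in ipaddress.ip_network(network) or token.lower() in agent:
            return kind
    return None


def handle_request(ip, user_agent, callbacks):
    """Call the callback registered for the detected robot kind, if there is one."""
    kind = identify_robot(ip, user_agent)
    handler = callbacks.get(kind)
    return handler(ip, user_agent) if handler else None
```

Passing a fixed `ip` and `user_agent` to `identify_robot` instead of the values taken from the request plays the same role as the debug mode mentioned above.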
HogTrans provides an automatic word translation engine built on statistics gathered from the translations of free software. In essence, it provides an automatically created dictionary with multiple translations and example usages for each entry. HogTrans can import translations from standard GNU .mo files.
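A dictionary of this kind can be approximated by counting how often each translation occurs for a given source string. The sketch below is a simplification that works on whole message strings supplied as in-memory pairs; it does not parse .mo files and is not HogTrans's actual method, and `build_dictionary` is a hypothetical name.

```python
from collections import Counter, defaultdict


def build_dictionary(pairs):
    """Map each source string to its translations, most frequent first.

    `pairs` is an iterable of (source, translation) tuples, for example
    collected from the message catalogs of several programs.
    """
    table = defaultdict(Counter)
    for source, translation in pairs:
        # Normalise the key so "File" and "file" share one entry.
        table[source.strip().lower()][translation.strip()] += 1
    return {
        source: [translation for translation, _count in counts.most_common()]
        for source, counts in table.items()
    }
```

Ranking by frequency means that a translation used consistently across many programs is listed ahead of a one-off rendering.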
Search Engine Referrals Confluence Plugin is a Confluence plugin that displays the most recent Google searches that led visitors to the site. When someone enters search terms on Google and clicks on one of the search results, the search terms are sent to the found Web site in the "Referer" header of the HTTP request. This plugin collects this information and displays it to the user. A click on the search result opens the page Google has found. A click on the search engine icon at the left displays the corresponding Google search result page.
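Extracting the search terms from a referer can be sketched as follows. The example assumes the referer is a Google result URL that carries the terms in a `q` query parameter; the function name is hypothetical and the plugin's own implementation may differ.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def search_terms_from_referer(referer: str) -> Optional[str]:
    """Return the search terms from a Google referer URL, or None if absent."""
    parts = urlparse(referer)
    host = parts.hostname or ""
    # Accept hosts such as www.google.com or google.de (assumed naming scheme).
    if "google" not in host.split("."):
        return None
    values = parse_qs(parts.query).get("q")
    return values[0] if values else None
```

For example, `search_terms_from_referer("http://www.google.com/search?q=confluence+plugin&hl=en")` yields `"confluence plugin"`, because `parse_qs` decodes `+` as a space.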