
DataparkSearch

DataparkSearch is a Web search engine tool. It supports http, https, ftp, nntp, and news URLs, as well as htdb virtual URLs for indexing SQL databases. It has built-in support for the text/html, text/xml, text/plain, audio/mpeg (MP3), and image/gif MIME types, and external parsers can handle other document types. Other features include indexing of multilingual sites using content negotiation; searching of all word forms using ispell affixes and dictionaries; stopword and synonym lists; a boolean query language; sorting of results by relevancy, popularity rank, last modified time, and importance (the product of the relevancy and popularity ranks); support for various character sets; and phrase segmentation for Chinese, Japanese, Korean, and Thai. It also offers accent-insensitive search, an Apache module (mod_dpsearch), and support for internationalized domain names.
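The four sort orders, and in particular importance as the product of relevancy and popularity rank, can be illustrated with a minimal Python sketch. The `Result` type, its field names, the score scales, and the example URLs are all hypothetical; this is not DataparkSearch's internal code or API, only an illustration of the ordering the description names.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Result:
    # Hypothetical search result record; field names are illustrative only.
    url: str
    relevancy: float        # assumed relevance score for the query
    popularity: float       # assumed popularity rank of the document
    last_modified: datetime

    @property
    def importance(self) -> float:
        # Importance as described: relevancy multiplied by popularity rank.
        return self.relevancy * self.popularity


def sort_results(results, key="relevancy"):
    """Return results sorted in descending order by the chosen criterion."""
    keys = {
        "relevancy": lambda r: r.relevancy,
        "popularity": lambda r: r.popularity,
        "last_modified": lambda r: r.last_modified,
        "importance": lambda r: r.importance,
    }
    return sorted(results, key=keys[key], reverse=True)


results = [
    Result("http://a.example/", 0.9, 0.2, datetime(2009, 1, 1)),
    Result("http://b.example/", 0.5, 0.8, datetime(2008, 1, 1)),
    Result("http://c.example/", 0.3, 0.9, datetime(2010, 1, 1)),
]

print([r.url for r in sort_results(results, "importance")])
# → ['http://b.example/', 'http://c.example/', 'http://a.example/']
```

Because importance is a product, a document has to score reasonably on both measures to rank high: the most relevant result above drops to last once its low popularity is taken into account.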


Recent releases

  •  25 Jan 2010 11:59

    Release Notes: The ReverseAliasProg, ExcerptMark, SectionSQL, and MaxHrefsPerServer commands have been added. A faster hash function has been implemented. The Match command has been added for stopword files. The Esitemap command has been added to the indexer. Multithreaded result sorting has been implemented (with up to 32 threads in parallel). Support for libextractor has been added. Acronym files now support regex-based transformations. The maximum length of log records has been increased to 480 bytes, the message size that every syslog receiver MUST accept. The Limit command has been extended to accept SQL-based limits.

  •  25 Apr 2009 00:14

    Release Notes: The busy timeout for SQLite has been increased. The SkipHrefIn and SEASections commands have been added. A Disallow rule in robots.txt no longer causes a document to be removed from the database. A Quffix command has been added. Searchd now clears the search cache when the configuration is loaded or reloaded. Time zone processing has been added for the Last-Modified header and meta tag. A MakePrefixes command has been added. Several bugs have been fixed.

  •  31 Dec 2008 21:27

    Release Notes: CAS-based synchronization has been implemented for the i386 and x86_64 platforms. The ActionSQL, FastHrefCheck, SubDocCnt, and SubDocLevel commands have been added. Support for the KOI8-C charset (an extension of KOI8-R with old Russian letters) has been added. HrefSection processing has been fixed in the XML parser. A $(url.directory) meta-variable has been added. An allin<section>: operator has been added to the search query language.

  •  27 Jul 2008 06:53

    Release Notes: The strict option has been added to the Section command. A word break has been added for French-style contractions. The MaxSiteLevel command now accepts a negative argument to group URLs by subdirectory. Some German letters are automatically replaced by two-letter combinations in accent-free search mode. SQLite3 support has been added. Indexing has been fixed for documents that exist in several language versions. Relevance calculation has been improved for queries that use acronyms and abbreviations.

  •  13 Feb 2008 06:41

    Release Notes: String tokenization has been improved. A subdocument indexing technique has been implemented. The LongestTextItems command has been added. Support for the georgian-academy and georgian-ps charsets has been added. The HTML parser now skips indexing content within tags whose style attribute sets visibility to "none" or "hidden". A $*(x) template meta-variable type has been added. The PagesInGroup command has been added. The ServerWeight command has been fixed.
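The 27 Jul 2008 release notes mention that, in accent-free search mode, some German letters are replaced by two-letter combinations. A minimal Python sketch of that kind of folding follows. The exact letters DataparkSearch maps are not stated, so the mapping below (the conventional ä→ae, ö→oe, ü→ue, ß→ss) is an assumption, and `fold_german` is a hypothetical helper, not part of DataparkSearch.

```python
# Assumed mapping: the release notes only say "some German letters".
GERMAN_FOLDING = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})


def fold_german(word: str) -> str:
    """Replace umlauts and sharp s with their two-letter equivalents."""
    return word.translate(GERMAN_FOLDING)


print(fold_german("Straße"))  # Strasse
print(fold_german("Müller"))  # Mueller
```

Applying the same folding to both indexed words and query terms is what lets a query for "Mueller" match a document containing "Müller".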
