20 projects tagged "Linux"
PDFTextStream is a PDF text and metadata extraction library available for Java and .NET. It supports all versions of the PDF document specification (including v1.7, used by Acrobat 8, 9, and X), extraction of text encoded using double-byte character sets (including Chinese, Japanese, and Korean), decryption of documents encrypted using 40-bit, 128-bit, 256-bit, and variable bit length ciphers, and extraction of all document metadata provided by PDF documents (including form data, bookmarks, and annotations). Easy integration with Jakarta Lucene is included, as well as interactive form update capability.
PowerSeek SQL allows you to create, manage, and run your own search engine and directory portal with total control and ease. it is user friendly in every aspect and built for the most demanding uses and customization needs. It comes with an extensive admin panel, the ability to sell link listings, SEO friendly URLs, link reviews/ratings, content sensitive banner rotator, spam filter, broken link checker, custom data fields, mailers, crawlers, pre-designed template sets, reciprocal link checker, image/video/file uploading, RRS feeds, optional PPC functionality, and much more. It can be used for Yellow Pages, real estate, and travel directories, complex product catalogs, image galleries, and more.
Splunk is an engine for machine data. Use Splunk to collect, index, and harness the fast moving machine data generated by all your applications, servers, and devices: physical, virtual, and in the cloud. Search and analyze all your real-time and historical data from one place. Splunking your machine data lets you troubleshoot problems and investigate security incidents in minutes, not hours or days. Monitor your end-to-end infrastructure to avoid service degradation or outages. Meet compliance mandates at lower cost. Correlate and analyze complex events spanning multiple systems. Gain new levels of operational visibility and intelligence for IT and the business.
WebGlimpse is a scalable, feature-rich search engine for indexing your Web site or any collection of local and remote sites you choose. Features include customizable output formats, custom ranking/ordering of hits, fuzzy matching, boolean queries, a Web administration interface for multiple archives, logging of queries, caching of results, and more. Localized search interfaces are provided in multiple languages including Spanish, German, French, Italian, Norwegian, Finnish, Russian, Hebrew, and others. It supports 3rd party filters for indexing PDF, Word, and Excel files. It is free for academic and most nonprofit users.
Douglas Thrift's Search Engine is an indexing search engine for use on small Web sites such as personal or small business sites. It is designed to be very similar to Google for end users and its output is customizable. For indexing, it supports both the Robots Exclusion Protocol and the Robots META Tag.
PHP Content Management System (phpCMS) makes it possible to need only one template for your whole Web site. It allows you to provide dynamic menus with unlimited levels, and use templates and sub-templates without a database. It is search engine-friendly and proxy-friendly, as the pages it generates can not be distinguished from static HTML pages. PHP code can be added to any template and content file with an optional module. It supports the caching of parsed pages and gzip compression.
Amberfish is a general purpose text/XML retrieval utility. It features indexing of both free text and nested fields, built-in support for XML documents, structured queries allowing generalized field/tag paths, hierarchical result sets, automatic searching across multiple databases, efficient indexing, and relatively low memory requirements.