ASPseek is an Internet search engine written in C++ using the STL. It consists of an indexing robot, a search daemon, and a search frontend (CGI or Apache module). It can index up to a few million URLs and supports word and phrase searches, wildcards, and Boolean queries. Search results can be limited to a given time period, site, or Web space (a set of sites) and sorted by relevance (PageRanks are used) or by date. It is optimized for multiple sites (threaded index, async DNS lookups, grouping results by site, and Web spaces), but can be used for searching a single site as well. It can work with multiple languages and encodings at once (including multi-byte encodings such as those used for Chinese) thanks to an optional Unicode storage mode. Other features include stopwords and ispell support, a charset and language guesser, HTML templates for search results, excerpts, and query word highlighting.
Active PHP Bookmarks (APB) is a web-based program that allows you to store your bookmarks and display them in many useful ways. It will sort your bookmarks with usability in mind, keeping often-used bookmarks at your fingertips. It has a bookmark search, private/public bookmarks, nested groups, usage rankings, popularity sorting, and a quick add feature.
Ajaqs is a Web application that organizes FAQs on a per-project basis. The UI is templatized, the content is internationalized, and the styles are highly configurable. Secure login is provided via webapp security constraints. The backend uses an O-R mapping tool so that it does not depend on database-specific queries. RSS feeds provide subscribers with continuous updates on a per-project or per-FAQ basis. FAQs are served dynamically as HTML pages and can optionally be streamed to clients as PDF content.
Alexandria is a GNOME application to help manage a book collection. It retrieves book information (including cover pictures) from several online libraries and lets you search for a book by EAN/ISBN, title, authors, or keyword. It can import and export data in ONIX, Tellico, and EAN/ISBN-list formats, generate Web pages from your libraries, and mark books as loaned. It saves data in the YAML format, features an HIG-compliant user interface, shows books in different views that can be filtered or sorted, and handles book ratings and notes.
Nutch is highly scalable Web search software built on top of Apache Hadoop and Lucene Java. Key features include a Web crawler, an indexer, crawl management tools, parsers for HTML, PDF, DOC, and several other document formats, and an extensible architecture that allows you to plug in additional functionality such as document parsers, custom scoring algorithms, custom content parsers, protocols, and more.
Solr is an enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g. Word and PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites. Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Tomcat. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language. Solr's powerful external configuration allows it to be tailored to almost any type of application without Java coding, and it has an extensive plugin architecture when more advanced customization is required.