Redland is a set of C libraries providing a high-level API for the Resource Description Framework (RDF), allowing RDF data to be stored, parsed, serialized, queried, and manipulated. It has an object-based, modular design and comes with detailed reference documentation and examples. Redland works with RDF vocabularies such as FOAF, RSS 1.0, Dublin Core, DOAP, and OWL, supports the query languages SPARQL and RDQL, and handles RDF syntaxes including Turtle, RDF/XML, RDF/JSON, RSS, Atom, RDFa, and GRDDL.
Solr is an enterprise search platform from the Apache Lucene project. Its major features include full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and handling of rich documents (e.g., Word and PDF). Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites. Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Tomcat. It uses the Lucene Java search library at its core for full-text indexing and search, and exposes REST-like HTTP/XML and JSON APIs that make it usable from virtually any programming language. Its external configuration allows it to be tailored to almost any type of application without Java coding, and an extensive plugin architecture is available when more advanced customization is required.
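As a rough illustration of what "usable from virtually any programming language" means in practice, the sketch below builds a query URL for Solr's select handler using only the Python standard library. The host, the core name `docs`, and the field names are hypothetical, and the `/solr/<core>/select` path layout is an assumption that varies with Solr version and deployment; `q`, `rows`, `wt`, `facet`, `facet.field`, `hl`, and `hl.fl` are standard Solr query parameters.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, core, query, facet_field=None,
                    highlight_field=None, rows=10):
    """Build a URL for Solr's select handler (no network access is made).

    base_url and core are deployment-specific assumptions,
    e.g. "http://localhost:8983/solr" and "docs".
    """
    params = [("q", query), ("rows", rows), ("wt", "json")]
    if facet_field:
        # Faceted search: ask Solr to count documents per field value.
        params += [("facet", "true"), ("facet.field", facet_field)]
    if highlight_field:
        # Hit highlighting: return matching snippets from this field.
        params += [("hl", "true"), ("hl.fl", highlight_field)]
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical core and field names, for illustration only.
    print(solr_select_url("http://localhost:8983/solr", "docs",
                          "title:rdf", facet_field="author"))
```

The resulting URL can be fetched with any HTTP client, and the JSON response parsed with the client language's standard JSON library.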
Nutch is highly scalable Web search software that builds on Apache Hadoop and Lucene Java. Key features include a Web crawler, an indexer, crawl management tools, parsers for HTML, PDF, DOC, and several other document formats, and an extensible architecture that lets you plug in additional functionality such as custom document parsers, scoring algorithms, protocols, and more.
Python Web Graph Generator is a threaded Web graph (power-law random graph) generator. It can generate a synthetic Web graph of about one million nodes in a few minutes on a desktop machine. It supports both directed and undirected graphs and implements a threaded variant of the RMAT algorithm. With small adjustments it can produce graphs representing social networks or community networks. It can also output the connected components of a graph.
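For readers unfamiliar with RMAT, here is a minimal single-threaded sketch of the basic recursive-matrix idea. It is not the generator's own code, and it leaves out the threading the tool adds. The function names and the quadrant probabilities (a=0.57, b=0.19, c=0.19, with the remainder going to the fourth quadrant) are illustrative assumptions. Each edge is placed by repeatedly choosing one quadrant of the adjacency matrix, and the skewed probabilities give the heavy-tailed degree distribution.

```python
import random


def rmat_edge(scale, rng, a=0.57, b=0.19, c=0.19):
    """Pick one (src, dst) pair in a 2**scale by 2**scale adjacency matrix."""
    src = dst = 0
    for _ in range(scale):
        r = rng.random()
        src <<= 1
        dst <<= 1
        if r < a:
            pass                 # top-left quadrant: both bits stay 0
        elif r < a + b:
            dst |= 1             # top-right quadrant
        elif r < a + b + c:
            src |= 1             # bottom-left quadrant
        else:
            src |= 1             # bottom-right quadrant
            dst |= 1
    return src, dst


def rmat_graph(scale, num_edges, directed=True, seed=None):
    """Return a set of distinct edges with no self-loops (illustrative helper)."""
    n = 1 << scale
    max_edges = n * (n - 1) if directed else n * (n - 1) // 2
    if num_edges > max_edges:
        raise ValueError("num_edges exceeds the number of possible edges")
    rng = random.Random(seed)
    edges = set()
    while len(edges) < num_edges:
        u, v = rmat_edge(scale, rng)
        if u == v:
            continue             # drop self-loops
        if not directed and u > v:
            u, v = v, u          # store undirected edges as (low, high)
        edges.add((u, v))
    return edges


if __name__ == "__main__":
    g = rmat_graph(scale=10, num_edges=5000, directed=False, seed=42)
    print(len(g), "edges over", 1 << 10, "nodes")
```

Changing the quadrant probabilities changes how skewed the graph is, which is presumably the kind of adjustment that yields social-network-like or community-like graphs.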