Erudite is an application for training and testing backpropagation neural networks using the ANNeML (Artificial Neural Network Markup Language) XML format. It supports testing and training neural nets with CSV files and has support for randomized training sets, an optional adaptive learning rate, sigmoid or hyperbolic tangent transfer functions, optional bias and weight adjustment locking, and more.
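As a rough illustration of the two transfer functions and the adaptive learning rate mentioned above, here is a minimal Python sketch. It is not taken from Erudite; the function names and the grow/shrink factors are assumptions chosen only for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic transfer function; output lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(y: float) -> float:
    """Derivative of the sigmoid, written in terms of its output y."""
    return y * (1.0 - y)


def tanh_derivative(y: float) -> float:
    """Derivative of tanh, written in terms of its output y = tanh(x)."""
    return 1.0 - y * y


def adapt_learning_rate(rate: float, prev_error: float, error: float,
                        grow: float = 1.05, shrink: float = 0.7) -> float:
    """Hypothetical adaptive rule: speed up while the error falls,
    slow down when it rises. The factors are illustrative assumptions."""
    return rate * grow if error < prev_error else rate * shrink
```

Backpropagation multiplies each neuron's error by the derivative of its transfer function, which is why both derivatives are expressed in terms of the already-computed output.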
Midao JDBC simplifies development with Java JDBC. It is flexible, customizable, and intuitive to use, and provides a lot of functionality: transactions, working with metadata, type handling, profiling, input/output processing and conversion, support for pooled datasource libraries, cached/lazy query execution, named parameters, multiple vendor support out of the box, custom exception handling, and overrides. With a single jar, it supports both JDBC 3.0 (Java 5) and JDBC 4.0 (Java 6). Midao JDBC is well tested: it has around 700 unit and functional tests, and it is also tested with the latest drivers of Derby, MySQL (MariaDB), PostgreSQL, Microsoft SQL Server, and Oracle. Midao is a data-centric project. Its goal is to shield Java developers from the nuances of vendor implementations and from standard boilerplate code. Midao JDBC is the first library released under it.
Piggydb is a flexible and scalable knowledge building platform that supports a heuristic or bottom-up approach to discover new concepts or ideas based on your input. You can begin with using it as a flexible outliner, diary or notebook, and as your database grows, Piggydb helps you to shape or elaborate your own knowledge. Piggydb is a Web application provided as a self-contained package that contains a Web server and database engine.
Weed-FS is a simple and highly scalable distributed file system. There are two objectives: to store billions of files, and to serve the files fast! Instead of supporting full POSIX file system semantics, it implements only a key-file mapping. Instead of managing all file metadata in a central master, it manages file volumes in the central master and lets volume servers manage files and the metadata. This relieves concurrency pressure from the central master and spreads file metadata into volume servers' memories, allowing faster file access with just one disk read operation. It is modelled on Facebook's Haystack design paper. Only 40 bytes of disk storage are required for each file's metadata, and disk reads are O(1).
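The division of labour described above can be sketched in a few lines of Python. This is a toy model, not Weed-FS code: the class and method names are hypothetical, and an in-memory buffer stands in for the large on-disk volume file.

```python
import io


class VolumeServer:
    """Toy volume server: one append-only volume plus an in-memory index."""

    def __init__(self) -> None:
        self.volume = io.BytesIO()  # stand-in for one large volume file on disk
        self.index = {}             # key -> (offset, size), kept in memory

    def write(self, key: int, data: bytes) -> None:
        offset = self.volume.seek(0, io.SEEK_END)  # always append at the end
        self.volume.write(data)
        self.index[key] = (offset, len(data))

    def read(self, key: int) -> bytes:
        offset, size = self.index[key]  # O(1) lookup, no disk access
        self.volume.seek(offset)
        return self.volume.read(size)   # the single read from the volume


class Master:
    """Toy master: knows only which server holds which volume."""

    def __init__(self, servers: dict) -> None:
        self.servers = servers          # volume id -> VolumeServer

    def locate(self, volume_id: int) -> VolumeServer:
        return self.servers[volume_id]
```

A client would ask the master where a volume lives and then talk to that volume server directly, so the master never handles per-file metadata.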
Duke is a fast and flexible record linkage engine. It does not use the traditional blocking (sort by key) approach, but instead relies on Lucene. This makes it high-performance (able to process 1,000,000 records in ~10 minutes). Duke can be run from the command line, but also has an API allowing incremental linking applications to be built easily. It supports reading data from CSV, JDBC, SPARQL, and NTriples, and also supports a number of string comparators and string normalizers.
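To show what a string normalizer and a string comparator do in record linkage, here is a small Python sketch. It does not use Duke's API and does not cover the Lucene-based candidate lookup; the function names are hypothetical, and edit distance is just one possible comparator.

```python
def normalize(value: str) -> str:
    """Lower-case the value and collapse runs of whitespace."""
    return " ".join(value.lower().split())


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] of two normalized values; 1.0 means identical."""
    a, b = normalize(a), normalize(b)
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest
```

A linkage engine would typically combine such per-field similarities into an overall match score and compare it against a threshold.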