18 projects tagged "English"
jsoup is a Java library for working with real-world HTML. It can parse HTML from a URL, file, or string. It can find and extract data, using DOM traversal or CSS selectors. The HTML elements, attributes, and text can be manipulated. It can clean user-submitted content against a safe white-list. jsoup is designed to deal with all varieties of HTML found in the wild, from pristine and validating to invalid tag-soup; jsoup will create a sensible parse tree.
phoneutria is a multi-threaded, scalable, high-performance, extensible, and polite Web crawler. It can be used to crawl, index, load-test, or even download any Web or enterprise domain, and it is configured through an XML configuration file. Because the level of politeness is configurable, Phoneutria can be used either to check the links of a Web site or to load-test it. It provides a plug-in mechanism for further extensions.
BorderFlow implements a general-purpose graph clustering algorithm. It maximizes the inner to outer flow ratio from the border of each cluster to the rest of the graph. The main advantage of the algorithm is that it does not need parametrization to compute results of high accuracy.
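The ratio mentioned above can be illustrated with a small sketch. This is not BorderFlow's own code: the class name `BorderFlowSketch`, its methods, and the exact definition used here (summed edge weight from a cluster's border nodes into the cluster, divided by the summed edge weight from those border nodes to nodes outside it) are assumptions made for illustration only.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; not part of the BorderFlow distribution.
final class BorderFlowSketch {
    // Undirected weighted graph: node -> (neighbour -> edge weight).
    private final Map<String, Map<String, Double>> adj = new HashMap<>();

    void addEdge(String u, String v, double weight) {
        adj.computeIfAbsent(u, k -> new HashMap<>()).put(v, weight);
        adj.computeIfAbsent(v, k -> new HashMap<>()).put(u, weight);
    }

    // Assumed definition: inner flow / outer flow, measured from border nodes only.
    // A border node is a cluster member with at least one neighbour outside the cluster.
    // An edge between two border nodes is counted once from each endpoint.
    double flowRatio(Set<String> cluster) {
        double inner = 0.0;
        double outer = 0.0;
        for (String u : cluster) {
            Map<String, Double> neighbours =
                    adj.getOrDefault(u, Collections.<String, Double>emptyMap());
            boolean onBorder = false;
            for (String v : neighbours.keySet()) {
                if (!cluster.contains(v)) {
                    onBorder = true;
                    break;
                }
            }
            if (!onBorder) {
                continue; // interior nodes do not contribute to either flow
            }
            for (Map.Entry<String, Double> e : neighbours.entrySet()) {
                if (cluster.contains(e.getKey())) {
                    inner += e.getValue();
                } else {
                    outer += e.getValue();
                }
            }
        }
        // A cluster with no outgoing edges has no border; treat its ratio as unbounded.
        return outer == 0.0 ? Double.POSITIVE_INFINITY : inner / outer;
    }
}
```

On a triangle a-b-c with a tail c-d-e (all weights 1), the cluster {a, b, c} scores 2.0 under this definition, while {a, b} and {a, b, c, d} both score 1.0, so a procedure that maximizes the ratio would prefer the triangle.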
Zynaptic Reaction is a flexible asynchronous programming framework for Java that can be used to implement complex event-driven applications. It is heavily influenced by the Twisted programming framework developed by TwistedMatrix Labs for the Python programming language. The Reaction library focuses on the concurrency and callback model and is therefore application-neutral. It can be used to manage many concurrent I/O operations or to farm out compute-intensive tasks to multicore processors. As well as being usable as a basic Java library, Reaction can also run as an independent OSGi service and integrate into any GUI framework you choose.
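The callback style described above can be sketched with the JDK's own `CompletableFuture` as a stand-in. This does not use or reproduce Reaction's API; the class `DeferredStyleSketch` and its methods are hypothetical names chosen only to show the general pattern of handing a compute-intensive task to a worker pool and attaching a result callback and an error callback.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration using only the JDK; not the Reaction API.
final class DeferredStyleSketch {

    // Stand-in for a compute-intensive task.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    // Run the task on the pool, then chain a success callback and an error callback.
    static CompletableFuture<String> submit(int n, ExecutorService pool) {
        return CompletableFuture
                .supplyAsync(() -> sumOfSquares(n), pool)
                .thenApply(result -> "sum of squares up to " + n + " = " + result)
                .exceptionally(err -> "failed: " + err.getMessage());
    }

    public static void main(String[] args) {
        // One worker per available core, so independent tasks can run in parallel.
        ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            System.out.println(submit(10, pool).join());
        } finally {
            pool.shutdown();
        }
    }
}
```

The caller never blocks inside `submit`; it only waits at `join()`, which an event-driven application would typically replace with a further callback.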
terp is a modular template engine that integrates tightly into Ant. It provides a portable C++ compiler task (aCC, g++, icc, msvc++, SUN CC, xlC) for many platforms and processor architectures, a collection iterator task, a full-featured expression language with host introspection, formatters, selectors, and transformers for expressions, and much more. It can be embedded into Ant, used as a stand-alone or embedded templating engine, or used as a batch or interactive expression evaluator. It is extremely flexible: you can add your own types, overload operators, and add properties and methods to types.
Markdown Doclet is a replacement for the standard Sun Java Doclet that allows developers to use Markdown syntax in their Javadoc comments rather than embedding unreadable HTML. Because Markdown syntax lets HTML pass through, the Markdown Doclet can be applied to any existing codebase whose Javadoc comments already contain HTML. It also includes a patched version of UMLGraph which calls the Markdown Doclet instead of the standard Sun doclet. The doclet also writes a more modern stylesheet for more attractive Javadocs.