webStraktor

webStraktor is a programmable World Wide Web data extraction client. It features a scripting language that facilitates the collection, extraction, and storage of information available on the Web, including images. The scripting language uses elements of regular expression and XPath syntax. The standard webStraktor output format is XML-based, encoded as ASCII, UTF-8, or ISO-8859-1 (Latin-1). webStraktor adheres to the Robots Exclusion Protocol and can be configured to operate anonymously by connecting through proxy servers. It also provides exhaustive logging and tracing information.
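The general techniques named above (a robots.txt check, routing through a proxy, XPath-style and regular-expression extraction, and XML output in a chosen encoding) can be illustrated with a short Python sketch. It does not reproduce webStraktor's own scripting language, which is not shown here. The functions `allowed`, `make_opener`, `extract`, and `to_xml`, the sample robots.txt rules, the sample page, and the `sketch-bot` user agent are all hypothetical. The sketch uses only the Python standard library. Note that ElementTree supports only a limited subset of XPath.

```python
import re
import urllib.request
import urllib.robotparser
import xml.etree.ElementTree as ET

# Hypothetical robots.txt and XHTML page, used instead of live network fetches.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

SAMPLE_PAGE = """\
<html><body><ul>
  <li><a href="/item/1">Widget</a> <span>12.50 EUR</span></li>
  <li><a href="/item/2">Crème</a> <span>7 EUR</span></li>
</ul></body></html>
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt rules (Robots Exclusion Protocol)."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def make_opener(proxy_url: str, user_agent: str) -> urllib.request.OpenerDirector:
    """Build an opener that routes HTTP(S) requests through a proxy server."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    )
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


def extract(xhtml: str) -> list[tuple[str, str, str]]:
    """Pull (name, href, price) records using an XPath subset plus a regex."""
    root = ET.fromstring(xhtml)
    records = []
    # ElementTree understands only a limited subset of XPath, e.g. ".//li".
    for li in root.iterfind(".//li"):
        link = li.find("a")
        price_text = li.findtext("span", default="")
        # Regular expression: first decimal number in the price cell.
        match = re.search(r"\d+(?:\.\d+)?", price_text)
        if link is not None and match:
            records.append((link.text or "", link.get("href", ""), match.group()))
    return records


def to_xml(url: str, records, encoding: str = "utf-8") -> bytes:
    """Serialize records as XML in the requested encoding."""
    root = ET.Element("result", url=url)
    for name, href, price in records:
        item = ET.SubElement(root, "item", href=href, price=price)
        item.text = name
    # Characters the encoding cannot represent become numeric character references.
    return ET.tostring(root, encoding=encoding, xml_declaration=True)


if __name__ == "__main__":
    page_url = "http://example.com/catalog"
    if allowed(ROBOTS_TXT, "sketch-bot", page_url):
        print(to_xml(page_url, extract(SAMPLE_PAGE), "iso-8859-1").decode("iso-8859-1"))
```

When the sketch runs as a script, it prints the extracted records as Latin-1 XML. With `encoding="us-ascii"`, the non-ASCII name is written as the character reference `&#232;`. In this sketch, the XPath subset locates the page structure and the regular expression isolates a field-level pattern inside it. That division of labour is one reasonable reading of "elements of regular expression and XPath syntax", not a description of how webStraktor combines them.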

Recent releases

  •  21 Apr 2014 15:20

    Release Notes: Initial Freecode announcement.