screen-scraper

screen-scraper is a tool for extracting data from Web sites. It works much like a database, providing access to information on the Web. Its graphical interface lets you designate URLs, the data elements to extract, and scripting logic for traversing pages and working with the scraped data. Once these items have been created, screen-scraper can be invoked from external languages and platforms such as .NET, Java, PHP, and Active Server Pages. It can also be scheduled to scrape information at periodic intervals and can automatically write extracted data to CSV files.

Recent releases

  •  15 Jan 2007 18:50

    Release Notes: This release contains a number of feature enhancements and bugfixes, including being able to drag and drop objects into folders, several new features in the logging window such as automatic scrolling, being able to call scripts from other scripts, backing up the database automatically, and the addition of a new library used to facilitate saving scraped data as XML.

  •  15 Jan 2007 18:49

    Release Notes: Several bugfixes and minor features have been added, including automatic backup of the database, enhanced HTML rendering and HTML stripping, fixing an error that caused duplicate scripts to appear at times on import, and fixing multiple errors relating to international character sets and non-ASCII characters.

  •  28 Mar 2006 19:00

    Release Notes: The http-client library has been updated to accept all SSL certificates. In certain situations, the database was closed prematurely when screen-scraper was invoked from the command line.

  •  28 Mar 2006 01:27

    Release Notes: This release fixes a particularly annoying bug that slipped into version 2.7 related to running from the command line. It also contains a few other minor bugfixes.

  •  14 Mar 2006 14:08

    Release Notes: screen-scraper can now generate RSS feeds from scraped data. The session.addToSessionVariable method was added. Log messages have been enhanced and clarified. All of screen-scraper's ports may now be set in the properties file. A number of miscellaneous bugfixes have been made.
