Beautiful Soup

Beautiful Soup is a self-contained parser that makes screen-scraping easy. It parses both good and bad HTML and XML and offers methods for traversing the parse tree and extracting specific parts of a document.
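
As a rough illustration of the description above, the sketch below parses deliberately sloppy HTML (unclosed `<li>` and `<p>` tags) and pulls out specific parts of the tree. It assumes the modern `bs4` package and Python's built-in `html.parser` backend, which differ from the releases listed on this page; the helper names `extract_links` and `first_heading` and the sample markup are hypothetical.

```python
# Assumption: the modern "bs4" distribution of Beautiful Soup is installed
# (pip install beautifulsoup4). The releases listed on this page predate it.
from bs4 import BeautifulSoup

# Deliberately imperfect markup: the <li> and <p> tags are never closed.
BAD_HTML = """
<html><body>
<h1>Links</h1>
<ul>
  <li><a href="/a">First</a>
  <li><a href="/b">Second</a>
</ul>
<p>Trailing paragraph
</body></html>
"""


def extract_links(markup):
    """Return the href of every <a> tag, in document order (hypothetical helper)."""
    soup = BeautifulSoup(markup, "html.parser")
    return [a.get("href") for a in soup.find_all("a")]


def first_heading(markup):
    """Return the text of the first <h1>, or None if there is none (hypothetical helper)."""
    soup = BeautifulSoup(markup, "html.parser")
    heading = soup.find("h1")
    return heading.get_text(strip=True) if heading else None


if __name__ == "__main__":
    print(extract_links(BAD_HTML))
    print(first_heading(BAD_HTML))
```

Here `find_all` walks the whole parse tree, while `find` stops at the first match; both work even though the input markup is not well-formed.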

Recent releases

  •  07 Jun 2006 04:47

    Release Notes: Beautiful Soup can now convert invalid HTML or XML into something approaching XHTML or valid XML.

  •  02 Jun 2006 18:00

    Release Notes: This release escapes all special XML characters contained in attribute values. 2.x method names have been reintroduced for backwards compatibility. There are other minor bugfixes.

  •  29 May 2006 01:14

    Release Notes: Beautiful Soup now autodetects document encodings and converts them to Unicode. Methods have been added for manipulating the parse tree. You can now parse only part of a document, saving time. The API has been cleaned.

  •  19 Sep 2005 16:08

    Release Notes: Several parsing bugfixes and a fix for a serious performance problem were made.

  •  05 May 2005 05:13

    Release Notes: Several new ways to search a parse tree were added. Some minor bugs were fixed. Search performance was improved.

