10 projects tagged "Parser"
csvgrep is a commandline program which enables users to execute searches on text-delimited files using a rudimentary query language. Its query language is bound to simplicity and expressivity, to be easily comprehensible. It aims at replacing both grep and awk when you are challenged to retrieve information from a text-delimited file based on the content of a specific field (or column). You can get what you want using the semantic already in the file’s underlying structure.
Apache OpenNLP is a machine learning based toolkit for the processing of natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services.
Flexc++ is a tool for generating scanners based on regular expressions. Flexc++ is highly comparable to the programs flex and flex++. The goal was to create a similar program, but to implement it completely in C++. Most flex and flex++ grammars should be usable with flexc++ with minor adjustments.
The Parsing Expression Grammar Template Library (PEGTL) is a C++0x library for creating parsers according to a Parsing Expression Grammar (PEG). Grammars are embedded as regular C++ code, created with template programming (not template meta programming). These hierarchies naturally correspond to the inductive definition of PEGs. The library extends on the subject of PEGs with new expression types, actions that can be attached to grammar rules, and mechanisms to ensure helpful diagnostics in case of parsing errors. PEGs are superficially similar to Context-Free Grammars (CFGs).
YAJL (Yet Another JSON Library) is a small event-driven (SAX-style) JSON parser written in ANSI C, and a small validating JSON generator. It's highly portable, data representation independent, fast, generates verbose error messages including context of where the error occurs in the input text, can parse JSON data incrementally off a stream, and is tiny.