C-ICAP Classify

C-ICAP Classify is a module that classifies (labels) Web pages, images, and, in a future release, video based on their content. Labels are placed in HTTP headers, and any PICS-Label META tags found in a page are also exported into HTTP headers. This lets users build very flexible filters from their own rules, using the ACLs of the ICAP-enabled proxy. Because it is not a URL filter, deploying it alongside sslBump or similar proxy technologies makes it very difficult to bypass. Text classification is done using Fast Hyperspace (based on Hyperspace from CRM114) and/or Fast Naive Bayes. Images and video (once implemented) use Haar feature detection from the OpenCV library.
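This page does not describe how the Fast Naive Bayes classifier is implemented. As a rough illustration of the general technique only, the Python sketch below scores a tokenized page against per-label word counts using multinomial Naive Bayes with add-one (Laplace) smoothing. The labels, the training tokens, and the `train`/`classify` helpers are hypothetical; they are not the module's code or the fnb_learn training format.

```python
import math
from collections import Counter


def train(docs_by_label):
    """Build per-label word counts from {label: [token list, ...]} (hypothetical format)."""
    model = {}
    for label, docs in docs_by_label.items():
        counts = Counter()
        for tokens in docs:
            counts.update(tokens)
        model[label] = counts
    return model


def classify(model, tokens):
    """Return (best_label, scores) using summed log-probabilities per label."""
    # Shared vocabulary across all labels, used for add-one smoothing.
    vocab = set()
    for counts in model.values():
        vocab.update(counts)

    scores = {}
    for label, counts in model.items():
        total = sum(counts.values())
        # A uniform prior adds the same constant to every label, so it is omitted.
        score = 0.0
        for tok in tokens:
            if tok not in vocab:
                continue  # words never seen in training carry no evidence
            # Laplace smoothing keeps unseen (label, word) pairs from giving log(0).
            score += math.log((counts[tok] + 1) / (total + len(vocab)))
        scores[label] = score

    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical training data and page tokens, for illustration only.
model = train({
    "gambling": [["casino", "poker", "poker", "bet"]],
    "hobbies": [["garden", "recipe", "garden", "knit"]],
})
label, scores = classify(model, ["poker", "casino", "garden"])
print(label)
```

Working in log space avoids floating-point underflow on long documents, since it sums logarithms instead of multiplying many small probabilities. In the module itself, the resulting label is what ends up in an HTTP header for the proxy's ACLs to act on; the header's name and format are not covered here.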


Recent releases

  •  29 May 2014 16:44

    Release Notes: This release fixes a rarely triggered memory leak and adds optimizations that can significantly reduce classification time for larger documents. Iterative training with fnb_learn is also much faster, especially with the -d option.

  •  07 Apr 2014 12:46

    Release Notes: Copyright dates were updated in several files. html.[ch] now removes HTML more thoroughly, which may speed up processing for some pages and slow it down for others. Because of this change, you must retrain your data; if you are using the smart training tools, a full re-stage may or may not be necessary. The regex was further modified for speed and accuracy. Various fixes address bugs and performance issues when stripping HTML/XML wrapping from text. In srv_classify.c, the reg->args zero-length check was fixed.

  •  30 Dec 2013 21:14

    Release Notes: With c-icap-0.3.2, c-icap-modules-0.3.2, and this release of C-ICAP Classify, all remaining known bugs have been fixed. If you use C-ICAP Classify, upgrade to these releases. The changes: hybrid radix/binary search to avoid unnecessary cache misses; better grouping of functions to avoid cache misses; clearer and more accurate comments; avoids problems in computeOSBHashes when too little data is present; fixes locking on error conditions; better handling of the maximum memory size for in-memory objects; and better handling of moving objects from memory to disk and from disk to memory.

  •  02 Nov 2013 17:34

    Release Notes: This release fixes several stability issues. The documentation has also been updated to cover more parts of the program, how to use it, and how to train it on text data.
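The 30 Dec 2013 release notes mention a hybrid radix/binary search but do not say how it is built. One common form of the idea, shown here only as an assumption-based sketch, keeps 32-bit feature hashes in a sorted array, uses the top few bits of a hash to jump directly to a small bucket (the radix step), and then binary-searches only inside that bucket. The class name `RadixBinaryIndex` and the `radix_bits` parameter are hypothetical. Python cannot show the cache behaviour, only the lookup logic; a C version would use the same two flat arrays.

```python
from bisect import bisect_left


class RadixBinaryIndex:
    """Hypothetical lookup table for sorted 32-bit hashes: radix step, then binary search."""

    def __init__(self, hashes, radix_bits=8):
        self.keys = sorted(hashes)          # all hashes must satisfy 0 <= h < 2**32
        self.shift = 32 - radix_bits        # top radix_bits of a hash select its bucket
        nbuckets = 1 << radix_bits
        # starts[b] = index of the first key whose bucket is >= b,
        # so keys in bucket b occupy keys[starts[b]:starts[b + 1]].
        self.starts = [0] * (nbuckets + 1)
        i = 0
        for b in range(nbuckets):
            while i < len(self.keys) and (self.keys[i] >> self.shift) < b:
                i += 1
            self.starts[b] = i
        self.starts[nbuckets] = len(self.keys)

    def find(self, h):
        """Return the position of hash h in self.keys, or -1 if it is absent."""
        b = h >> self.shift
        lo, hi = self.starts[b], self.starts[b + 1]
        # Binary search restricted to one bucket touches far fewer memory locations
        # than a search over the whole array.
        i = bisect_left(self.keys, h, lo, hi)
        return i if i < hi and self.keys[i] == h else -1


idx = RadixBinaryIndex([0xDEADBEEF, 0x00000001, 0x12345678, 0xFFFFFFFF, 0x80000000])
print(idx.find(0x12345678), idx.find(0x12345679))
```

Choosing `radix_bits` is a trade-off: more bits give smaller buckets and shorter binary searches, at the cost of a larger `starts` table.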
