Solr-Connector-Files crawls directories and files on your filesystem (anything that can be mounted on Linux) and indexes them into Apache Solr. It extracts file contents with Apache Tika, which pulls metadata and text from many document and file formats. It also integrates automatic text recognition (OCR) for images, photos, and PDFs using Tesseract OCR.
|Tags||Solr, Files, directories, OCR, PDF, Images, commandline|
|Operating Systems||Linux, Debian GNU/Linux, Ubuntu Linux|
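
The crawl-extract-index flow described above can be sketched roughly as follows. This is not the connector's actual code: the function names, the Solr field names (`title_s`, `content_txt`), and the plain-text extractor are assumptions for illustration only. The real tool delegates extraction to Tika and Tesseract, which this sketch replaces with a simple UTF-8 read.

```python
import json
import os
import urllib.request


def read_plain_text(path):
    """Stand-in extractor (assumption): decode the file as UTF-8 text.

    The real connector hands files to Tika and Tesseract instead.
    """
    with open(path, "rb") as f:
        return f.read().decode("utf-8", errors="replace")


def crawl(root, extract=read_plain_text):
    """Walk `root` recursively and yield one Solr-style document per file."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            yield {
                # Field names are hypothetical; adapt them to your schema.
                "id": "file://" + os.path.abspath(path),
                "title_s": name,
                "content_txt": extract(path),
            }


def index(docs, update_url):
    """POST the documents as a JSON array to a Solr update handler."""
    body = json.dumps(list(docs)).encode("utf-8")
    request = urllib.request.Request(
        update_url,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Assuming a local Solr instance with a core named `files` (a hypothetical name), the sketch could be driven with `index(crawl("/mnt/documents"), "http://localhost:8983/solr/files/update?commit=true")`. Passing the extractor as a parameter keeps the crawl logic independent of whichever text-extraction backend is plugged in.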