Apitron PDF Rasterizer is a .NET component that performs high-quality conversion of PDF files to images. It supports complex PDF content, including text (embedded, externally linked, standard, simple, and composite fonts), images (including masked ones), complex paths and fills, PDF Forms, annotation objects of various types, all blending modes, tiling patterns, shading patterns (function-based, axial, radial), transparency groups, masked content (stencil masks, color-key masks, soft masks), and all color spaces specified by the PDF standard. It also handles files created with Adobe Illustrator and supports PDF bookmarks, page navigation, and text search and highlighting (including non-Latin alphabets).
Solr-Connector-Files crawls directories and files on your filesystem (anything that can be mounted on Linux) and indexes them into Apache Solr. It extracts file contents with Tika, which pulls metadata and text from many document and file formats. It also integrates automatic text recognition (OCR) for images, photos, and PDFs using Tesseract OCR.
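The crawl-and-index idea can be illustrated with a minimal Python sketch. This is not the connector's own code: the function names, the field names (`file_name_s`, `file_size_l`, which assume Solr's default dynamic-field suffixes), and the update URL in the usage note are all hypothetical, and content extraction with Tika or Tesseract is left out.

```python
import json
import os
import urllib.request


def build_docs(root):
    """Walk `root` and return one Solr-style document (a dict) per file.

    Hypothetical helper: only filesystem metadata is collected here;
    a real connector would also attach extracted text and metadata.
    """
    docs = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            stat = os.stat(path)
            docs.append({
                "id": os.path.abspath(path),   # absolute path as unique key
                "file_name_s": name,           # assumed *_s string field
                "file_size_l": stat.st_size,   # assumed *_l long field
            })
    return docs


def post_to_solr(docs, update_url):
    """POST the documents as a JSON array to a Solr update handler.

    `update_url` is supplied by the caller; nothing is sent until
    this function is called.
    """
    request = urllib.request.Request(
        update_url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Assuming a local Solr instance with a core named `files` (an invented name), usage might look like `post_to_solr(build_docs("/mnt/share"), "http://localhost:8983/solr/files/update?commit=true")`.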