OCR Xpress [2.x]
Text and Font Style Recognition
Perform OCR on a digital image, delivering the text in:
- The appropriate Serif, Sans Serif, or Monospace font style
- A font that is closest in form to the recognized font in normal, bold, italic, or bold italic
- Scaled over a wide range of font sizes
Language Recognition
- Recognize text in English, French, German, Italian, Spanish, Portuguese, Danish, Dutch, Swedish, Norwegian, Hungarian, Polish, and Finnish
- Recognize one language at a time
- Includes dictionaries for all supported languages
- Accepts and applies user-defined words in a custom dictionary
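As an illustration of how a user dictionary can support recognition, the sketch below merges user-defined words into a language word list and snaps an unrecognized word to its closest entry. It is a generic technique built on Python's standard difflib module, not the OCR Xpress API; the function names build_lexicon and correct_word and the 0.8 similarity cutoff are assumptions made for the example.

```python
from difflib import get_close_matches


def build_lexicon(language_words, user_words):
    """Combine the language dictionary with user-defined words (lowercased)."""
    return {w.lower() for w in language_words} | {w.lower() for w in user_words}


def correct_word(word, lexicon, cutoff=0.8):
    """Return the word unchanged if known, else the closest lexicon entry.

    cutoff is a hypothetical similarity threshold (0..1); words with no
    sufficiently close match are returned as recognized.
    """
    lower = word.lower()
    if lower in lexicon:
        return word
    # sorted() keeps the candidate order deterministic across runs
    matches = get_close_matches(lower, sorted(lexicon), n=1, cutoff=cutoff)
    return matches[0] if matches else word


lexicon = build_lexicon(["recognition", "image"], ["NotateXpress"])
print(correct_word("recogniti0n", lexicon))
print(correct_word("NotateXpress", lexicon))
```

Adding product names or domain terms to the user word list keeps them from being "corrected" into ordinary dictionary words.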
Auto Rotation
- Accepts input images in any orientation and automatically rotates 0, 90, 180, or 270 degrees
- Returns the amount of rotation applied
- Uses the text to determine orientation
- Highly optimized for speed
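The behavior listed above (try the four right-angle orientations, keep the one where the text reads best, and report the rotation applied) can be sketched as follows. This is a minimal illustration, not the product's algorithm: the image is a plain row-major list of lists, and the score callable is a hypothetical stand-in for whatever text-based measure decides that an orientation is upright.

```python
def rotate90(pixels):
    """Rotate a row-major 2D list 90 degrees clockwise."""
    return [list(row) for row in zip(*pixels[::-1])]


def auto_rotate(pixels, score):
    """Try 0, 90, 180, and 270 degrees; return (best_image, rotation_applied).

    score is a hypothetical callable that rates how "upright" the text in
    an image looks; higher is better.
    """
    best = None
    candidate = [list(row) for row in pixels]
    for angle in (0, 90, 180, 270):
        current = score(candidate)
        if best is None or current > best[0]:
            best = (current, angle, candidate)
        candidate = rotate90(candidate)
    return best[2], best[1]


# Toy example: the "upright" image is the one with a 1 in the top-left corner.
sideways = [[3, 1], [4, 2]]
image, applied = auto_rotate(sideways, lambda img: 1 if img[0][0] == 1 else 0)
print(image, applied)
```

Returning the applied angle alongside the image lets the caller map character positions back to the original scan.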
Character Position Information
- Returns character position information for all characters (recognized with high and low confidence)
- Use this feature to redact or highlight text in the original image using the included NotateXpress component.
- Use this feature to build your own PDF files, using the position information to place the hidden text in the correct location.
- Combine the per-character recognition confidence reported by OCR Xpress with the results of other OCR engines, such as SmartZone, to perform voting and improve accuracy beyond what either engine achieves alone.
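Confidence-based voting between two engines can be sketched as below. The example assumes each engine's output has already been reduced to a list of (character, confidence) pairs aligned position by position; in practice the character position information would be used to do that alignment. The data layout and the vote function are hypothetical and do not reflect the OCR Xpress or SmartZone APIs.

```python
def vote(result_a, result_b):
    """Merge two aligned OCR results, keeping the more confident character.

    Each result is a list of (char, confidence) tuples with confidence in
    the range 0..1. Both lists are assumed to have the same length.
    """
    merged = []
    for (ch_a, conf_a), (ch_b, conf_b) in zip(result_a, result_b):
        if ch_a == ch_b:
            # Engines agree: keep the character with the higher confidence
            merged.append((ch_a, max(conf_a, conf_b)))
        elif conf_a >= conf_b:
            merged.append((ch_a, conf_a))
        else:
            merged.append((ch_b, conf_b))
    return merged


engine_a = [("c", 0.9), ("1", 0.4), ("t", 0.8)]
engine_b = [("c", 0.7), ("l", 0.6), ("t", 0.9)]
print("".join(ch for ch, _ in vote(engine_a, engine_b)))
```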
Text Correction Capability
- Identify characters recognized with low confidence
- Make corrections to text prior to outputting to the document
- Build text proofing and character replacement functions into applications
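A proofing step of this kind can be sketched as: flag every character whose confidence falls below a threshold, let the user supply replacements, then emit the corrected text. The (character, confidence) layout, the function names, and the 0.6 threshold are assumptions for illustration only, not the product's interface.

```python
def find_suspects(chars, threshold=0.6):
    """Return the indexes of characters recognized below the threshold."""
    return [i for i, (_, conf) in enumerate(chars) if conf < threshold]


def apply_corrections(chars, corrections):
    """Build the output text, replacing characters at corrected indexes.

    corrections maps a character index to its replacement string.
    """
    return "".join(corrections.get(i, ch) for i, (ch, _) in enumerate(chars))


recognized = [("O", 0.95), ("C", 0.97), ("8", 0.35)]
suspects = find_suspects(recognized)
print(suspects)
print(apply_corrections(recognized, {2: "R"}))
```

A user interface would typically highlight the suspect indexes and collect the corrections dictionary from the operator.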
Image Binarization
- Create black-and-white images from 24-bit color and 8-bit grayscale image file formats, with image input and conversion support provided by ImagXpress Document v8
- Retain non-text color regions for reinsertion into the output document
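To make the conversion concrete, here is a minimal sketch of binarizing 24-bit color pixels: each RGB triple is reduced to an 8-bit gray level using the common ITU-R BT.601 luma weights, then compared against a fixed global threshold. The fixed threshold of 128 and the function names are assumptions for the example; the method actually used by the product is not described here and may be adaptive.

```python
def to_gray(rgb):
    """Convert an (R, G, B) triple of 0..255 values to an 8-bit gray level."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def binarize(pixels, threshold=128):
    """Map each RGB pixel to 1 (white) or 0 (black) using a global threshold."""
    return [[1 if to_gray(p) >= threshold else 0 for p in row] for row in pixels]


image = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (200, 200, 200)],
]
print(binarize(image))
```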
Deskew
- Full-page deskew on images with up to 15 degrees of skew
Image Input
- OCR Xpress includes ImagXpress Document v8 for image input (including TIFF, JPEG, JBIG2, and more)
- Input uncompressed in-memory image data
File Output Formats
The output from OCR Xpress is a digital file containing unformatted text, formatted text, or formatted text plus image data, delivered in a variety of file formats. OCR Xpress Professional outputs all file types listed below, including PDF. OCR Xpress Standard outputs all file types listed below except PDF.
- ASCII
- ASCII with no line breaks
- ASCII with line breaks
- ASCII with smart formatting (positioned with spaces)
- ASCII, comma-delimited (one line per field)
- ASCII, tab-delimited
- Excel v2.x (compatible with later versions)
- HTML, with a sub-folder of the same name containing images
- PDFI
- PDF – Searchable Image (Original Image with Hidden Text), PDF version 1.4 file (Professional edition only)
- PDF – Formatted Text and Graphics (Normal), PDF version 1.4 file (Professional edition only)
- PDF – Image only, PDF version 1.4 file (Professional edition only)
- RTF – Used for import to Word, WordPerfect, etc.
- WordPerfect 5.0, WordPerfect 5.1
- All image-only file formats supported by ImagXpress (see "File Format Support" of ImagXpress Document)
Segmentation
- Automatically or manually locate regions of the input image and identify them as either images (whose color can be preserved) or areas containing recognizable text
- Access various regions separately, or recombine into fully-formatted documents such as RTF or PDF files
Create Searchable Text
Easily build a custom system for conversion of scanned images to searchable PDF or RTF files.
Auto Rotation
Automatically correct the orientation of document images, even if they are not to be converted to text.
Image PDF to Text PDF
Add PDF Xpress to open image-only PDF files, then convert them to image-over-text or standard PDFs.
Advanced Preprocessing
Use the industry-leading image cleanup facilities of ImagXpress and optional ScanFix Xpress to yield superior OCR results.
Recognition Dictionaries
Use the included dictionaries for each language, plus a user-customizable dictionary, to improve overall OCR accuracy.
Suspect Character Identification
Easily create a user interface to replace low-confidence characters identified during OCR.