Class TesseractOcrProvider
Represents an OCR provider that uses the Tesseract engine to extract text from an image.
Inheritance
Inherited Members
Namespace: Telerik.Windows.Documents.TesseractOcr
Assembly: Telerik.Windows.Documents.TesseractOcr.dll
Syntax
public class TesseractOcrProvider : IOcrProvider
Constructors
TesseractOcrProvider(String)
Creates a new instance of the TesseractOcrProvider class.
Declaration
public TesseractOcrProvider(string dataPath)
Parameters
System.String dataPath: The path to the parent directory containing the tessdata directory. Ignored if the TESSDATA_PREFIX environment variable is set. If set to ".", the tessdata directory should be in the same directory as the executable.
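The following is a minimal sketch of constructing the provider, not a definitive setup. It assumes the Telerik.Windows.Documents.TesseractOcr assembly is referenced and that a tessdata folder sits next to the executable. The environment check is only a debugging aid based on the parameter description above.

```csharp
using System;
using Telerik.Windows.Documents.TesseractOcr;

class Program
{
    static void Main()
    {
        // Assumption: a "tessdata" folder sits next to the executable,
        // so "." is passed as its parent directory.
        var provider = new TesseractOcrProvider(".");

        // dataPath is documented as ignored when TESSDATA_PREFIX is set,
        // so reporting the variable can help when the wrong data is loaded.
        var prefix = Environment.GetEnvironmentVariable("TESSDATA_PREFIX");
        Console.WriteLine(prefix == null
            ? "TESSDATA_PREFIX is not set; dataPath is used."
            : "TESSDATA_PREFIX is set; dataPath is ignored.");
    }
}
```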
Properties
DataPath
The path to the parent directory containing the tessdata directory. Ignored if the TESSDATA_PREFIX environment variable is set. Defaults to ".", in which case the tessdata directory should be in the same directory as the executable.
Declaration
public string DataPath { get; set; }
Property Value
System.String
LanguageCodes
The language codes to use for the Tesseract OCR engine. You can find the corresponding trained data for each language and their codes here: https://github.com/tesseract-ocr/tessdata
Declaration
public List<string> LanguageCodes { get; set; }
Property Value
System.Collections.Generic.List<System.String>
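As an illustration of the two properties above, the sketch below configures the data path and the recognition languages. It assumes the Telerik.Windows.Documents.TesseractOcr assembly is referenced and that eng.traineddata and deu.traineddata have been downloaded into the tessdata folder. The choice of "eng" and "deu" is only an example.

```csharp
using System.Collections.Generic;
using Telerik.Windows.Documents.TesseractOcr;

class Program
{
    static void Main()
    {
        var provider = new TesseractOcrProvider(".");

        // DataPath can also be changed after construction.
        // Assumption: "." contains a "tessdata" subfolder.
        provider.DataPath = ".";

        // Assumption: eng.traineddata and deu.traineddata exist in tessdata.
        // A new list is assigned rather than relying on any default value.
        provider.LanguageCodes = new List<string> { "eng", "deu" };
    }
}
```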
ParseLevel
Indicates the level of parsing that the OCR processor will perform.
Declaration
public OcrParseLevel ParseLevel { get; set; }
Property Value
OcrParseLevel
Implements
Methods
GetAllTextFromImage(Byte[])
Extracts all text from an image and returns it as a single string.
Declaration
public string GetAllTextFromImage(byte[] imageBytes)
Parameters
System.Byte[] imageBytes: The bytes of the image.
Returns
System.String: The entire text as a string.
Implements
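A minimal usage sketch for GetAllTextFromImage follows. The file name scan.png is hypothetical, and the example assumes the Telerik.Windows.Documents.TesseractOcr assembly is referenced and the English trained data is available in the tessdata folder.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Telerik.Windows.Documents.TesseractOcr;

class Program
{
    static void Main()
    {
        var provider = new TesseractOcrProvider(".");
        provider.LanguageCodes = new List<string> { "eng" };

        // "scan.png" is a hypothetical image file used for illustration.
        byte[] imageBytes = File.ReadAllBytes("scan.png");

        // Returns all recognized text as one string.
        string text = provider.GetAllTextFromImage(imageBytes);
        Console.WriteLine(text);
    }
}
```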
GetTextFromImage(Byte[])
Extracts the text from an image and returns the words and their bounding rectangles.
Declaration
public Dictionary<Rectangle, string> GetTextFromImage(byte[] imageBytes)
Parameters
System.Byte[] imageBytes: The bytes of the image.
Returns
System.Collections.Generic.Dictionary<Rectangle, System.String>: The words with their corresponding bounding rectangles.
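The sketch below shows one way to consume the returned dictionary. The file name scan.png is hypothetical, and the example assumes the Telerik.Windows.Documents.TesseractOcr assembly is referenced. It uses var for each entry so that no assumption is made about which namespace the Rectangle type comes from.

```csharp
using System;
using System.IO;
using Telerik.Windows.Documents.TesseractOcr;

class Program
{
    static void Main()
    {
        var provider = new TesseractOcrProvider(".");

        // "scan.png" is a hypothetical image file used for illustration.
        byte[] imageBytes = File.ReadAllBytes("scan.png");

        // Each key is a bounding rectangle; each value is the word inside it.
        var words = provider.GetTextFromImage(imageBytes);
        foreach (var entry in words)
        {
            Console.WriteLine($"{entry.Value} at {entry.Key}");
        }
    }
}
```

Because the result is keyed by rectangle, the same word can appear several times under different keys, once for each location where it was recognized.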