Namespace: BitMiracle.Docotic.Pdf
public PdfCharacterCodeToUnicodeMapper UnmappedCharacterCodeHandler { get; set; }
Public Property UnmappedCharacterCodeHandler As PdfCharacterCodeToUnicodeMapper Get Set
PDF font objects usually define how to map character codes to corresponding Unicode values. However, some PDF producers create PDF files where font objects do not include such data. Using this property, you can instruct the library on how to map character codes from incomplete font objects.
For example, you can save the glyph defined by the character code as an image using a PdfTextRasterizer object, and then perform OCR on that image. See the OCR PDF in C# and VB.NET article for ideas on how to implement OCR.
The default handler returns the input character code itself as the Unicode value:
(charCode) => ((char)charCode.Value).ToString(CultureInfo.InvariantCulture);
You can use the following handler to map character codes to a fixed Unicode value ('?' in this example):
(charCode) => "?";
Use the following handler if you do not want to extract text for unmapped character codes:
(charCode) => null;
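The three handlers above can be compared side by side in a small, self-contained sketch. To keep it compilable without the library, it uses hypothetical stand-ins: `CharCodeStandIn` plays the role of the handler's argument (assumed here to expose an integer `Value`), and `CodeToUnicodeMapper` plays the role of `PdfCharacterCodeToUnicodeMapper`. These names and the `int` type are assumptions for illustration, not part of the library's API.

```csharp
using System;
using System.Globalization;

// Hypothetical stand-in for the handler's argument type.
// Assumption: the real type exposes a numeric Value, as the default handler suggests.
public readonly struct CharCodeStandIn
{
    public CharCodeStandIn(int value) { Value = value; }
    public int Value { get; }
}

// Hypothetical stand-in for PdfCharacterCodeToUnicodeMapper.
public delegate string CodeToUnicodeMapper(CharCodeStandIn charCode);

public static class HandlerDemo
{
    // Same shape as the default handler: treat the code as a UTF-16 code unit.
    public static readonly CodeToUnicodeMapper Default =
        charCode => ((char)charCode.Value).ToString(CultureInfo.InvariantCulture);

    // Map every unmapped code to a fixed placeholder.
    public static readonly CodeToUnicodeMapper Placeholder = charCode => "?";

    // Return null so that no text is extracted for unmapped codes.
    public static readonly CodeToUnicodeMapper Skip = charCode => null;

    public static void Main()
    {
        var code = new CharCodeStandIn(0x41);

        Console.WriteLine(Default(code));      // prints "A"
        Console.WriteLine(Placeholder(code));  // prints "?"
        Console.WriteLine(Skip(code) == null); // prints "True"
    }
}
```

In real code, a lambda of the same shape would be assigned to UnmappedCharacterCodeHandler instead of to a local field. The default mapping only yields meaningful text when the font's character codes happen to coincide with Unicode code points, which is why the placeholder and null variants can be preferable for fonts with arbitrary encodings.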