Bengali character recognition using feature extraction
Abstract
The character recognition problem can be viewed as a classification
task in which an image (or a portion of one) is assigned a label from a set of
possible labels that represent the characters under consideration. This is the
fundamental idea behind feature extraction techniques. This generic formulation may
lead to quite different settings. If the images of the characters are
obtained optically, we speak of “Optical Character Recognition” (OCR), as
opposed to settings in which the input data is obtained by other means. OCR
itself can be considered a subtask of the more general problem of “Document
Analysis or Understanding”, where the goal is to obtain a symbolic representation
of a digital image of the document under consideration that includes not only the
recognized text (characters) but also other document components and their
relationships. In this thesis I discuss various feature extraction techniques and
then show how zoning can be used to build an efficient Bengali character
recognition system.
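As a rough illustration of the zoning idea, the sketch below splits a binary character image into a grid of zones and uses the fraction of foreground pixels in each zone as a feature. It is a minimal sketch, not the system built in this thesis: the function name `zoning_features`, the grid size, and the choice of pixel density as the per-zone feature are all assumptions made for the example.

```python
def zoning_features(image, rows=4, cols=4):
    """Return the foreground-pixel density of each zone of a binary image.

    `image` is a list of equal-length rows holding 0 (background) or
    1 (foreground). The name and the default grid size are hypothetical.
    """
    height = len(image)
    width = len(image[0])
    features = []
    for r in range(rows):
        # Integer boundaries keep the zones contiguous and covering the
        # whole image even when its size is not a multiple of the grid.
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            area = (bottom - top) * (right - left)
            ink = sum(image[y][x]
                      for y in range(top, bottom)
                      for x in range(left, right))
            # An empty zone (grid finer than the image) contributes 0.0.
            features.append(ink / area if area else 0.0)
    return features


# Toy 4x4 "character" split into a 2x2 grid of zones.
glyph = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
features = zoning_features(glyph, rows=2, cols=2)
```

The resulting feature vector has one entry per zone, in row-major order, and could be passed to any classifier. Other per-zone measurements could be substituted for pixel density without changing the overall structure.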
Different feature extraction techniques are used to recognize different
representations of characters, for example binary characters, character contours,
skeletons (thinned characters), or gray-level subimages of each individual
character. Feature extraction methods are distinguished by their
invariance properties, reconstructability, and the distortions and variability
expected in the characters. When choosing a feature extraction method, we need to consider
how efficiently the resulting system can be applied and how much time it takes to build
such a system.