Optical character recognition is usually abbreviated as OCR. The object of OCR is the automatic reading of optically sensed document text to translate human-readable characters into machine-readable codes. Today, reasonably efficient and inexpensive OCR packages are commercially available to recognize printed text in widely used languages such as English, Chinese, and Japanese. These systems can process documents that are typewritten or printed. They can recognize characters in different fonts and sizes, as well as different formats, including intermixed text and graphics.

While a large amount of literature is available on the recognition of Roman, Chinese and Japanese characters, relatively little work is reported on the recognition of Indian language scripts. OCR for Indian scripts remains a challenging research task. Recently, work has been done on developing such packages for Indian languages. This is reported in the Proceedings of the International Conference on Document Analysis and Recognition, 1999.

A script is a system of characters used for writing or printing a natural language. Latin script is used for Western European languages, Devanagari for Hindi, and the Telugu, Kannada and Tamil scripts for Dravidian languages. The Devanagari script is the most widely used Indian script. It is used as the writing system for over 28 languages, including Sanskrit, Hindi, Kashmiri, Marathi and Nepali. The Telugu script is the second most prominent script in India. Indian scripts do not distinguish upper case from lower case, so the concept of capitalization does not exist. Machine processing for document categorization demands establishing a relation between coded sequences of characters and human perception of the language.
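To make the link between coded character sequences and scripts concrete, here is a minimal Python sketch that guesses the dominant script of a string from Unicode code point ranges and checks whether the text has any upper-case or lower-case distinction. It is an illustration only, not the method of any OCR package mentioned here: the helper names `script_of`, `dominant_script` and `has_case` are hypothetical, and the Latin ranges are a rough approximation.

```python
from collections import Counter

# Unicode block ranges (inclusive) for the scripts named in the text.
# The Latin ranges are an approximation chosen for this sketch.
SCRIPT_RANGES = {
    "Latin": [(0x0041, 0x005A), (0x0061, 0x007A), (0x00C0, 0x024F)],
    "Devanagari": [(0x0900, 0x097F)],
    "Tamil": [(0x0B80, 0x0BFF)],
    "Telugu": [(0x0C00, 0x0C7F)],
    "Kannada": [(0x0C80, 0x0CFF)],
}


def script_of(ch):
    """Return the script name for one character, or None if unknown."""
    cp = ord(ch)
    for name, ranges in SCRIPT_RANGES.items():
        if any(lo <= cp <= hi for lo, hi in ranges):
            return name
    return None


def dominant_script(text):
    """Return the script that covers the most characters in text."""
    counts = Counter(s for s in map(script_of, text) if s is not None)
    return counts.most_common(1)[0][0] if counts else None


def has_case(text):
    """True if upper-casing and lower-casing the text give different results."""
    return text.upper() != text.lower()


if __name__ == "__main__":
    for sample in ["Hello", "हिन्दी", "తెలుగు"]:
        print(sample, dominant_script(sample), has_case(sample))
```

A real document categorizer would work on recognized glyphs rather than clean Unicode input, so this only shows the final mapping step from character codes to a script label.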