|
|||||||
Artificial-Intelligence Algorithms for Converting Marathi Inscript to Marathi Text |
|||||||
Paper Id : 18228
Submission Date : 2023-10-11
Acceptance Date : 2023-10-22
Publication Date : 2023-10-25
This is an open-access research paper/article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. DOI: 10.5281/zenodo.10185567. For verification of this paper, please visit
http://www.socialresearchfoundation.com/innovation.php#8
|
|||||||
| |||||||
Abstract |
This paper develops an advanced technique for converting Marathi inscriptions to Marathi text. The proposed system uses machine learning techniques such as deep learning and advanced language processing. The research involved collecting a large amount of Marathi inscription input, paired with the corresponding Marathi text output, to guide the learning algorithm. Various machine learning models have been explored, including convolutional neural networks (CNNs) and transformer models, to capture the relationships in the data and the details required for accurate transformation. Many OCR technologies are available on the market, but they are not effective at recognizing inscription characters, and more research is needed on identifying the text. The challenges in recognizing Marathi inscriptions are also discussed in this paper. |
||||||
---|---|---|---|---|---|---|---|
Keywords | Artificial-Intelligence, deep learning, CNNs, OCR recognition. | ||||||
Introduction | Marathi is an Indo-Aryan language spoken by about 83 million people (as of 2011) in India's central and western regions. In 1966 it was designated the official language of the state of Maharashtra, and it is one of India's 22 scheduled languages. Its geographical range stretches from north-west of Mumbai along the western coast through Goa and eastward over the Deccan. Many inscriptions have been found in ancient Indian cities [1]. These inscriptions are very important documents of ancient India, from which we can learn historical information about a time and area and trace the evolution of language over the centuries [2].
Review of Marathi Language:
Among all the languages in the world, Marathi ranks tenth. Marathi also has some ancient literature. Marathi is one of the languages of Sanskrit origin. The Marathi language first appears in stone inscriptions originating from the 11th century. From the 13th century to the middle of the 20th century it was written in the Modi script, and from 1950 in the script used for Sanskrit. People use language as a communication tool. While the purpose of language is to express one's own thoughts and understand those of others, the purpose of writing is to record language in characters. Marathi is written in the Devanagari script. It consists of 12 vowels and 36 consonants; the vowels, consonants and numerals are given in Figures 1, 2 and 3 respectively. |
||||||
Objective of study | 1. Reliably replace Marathi inscription characters with their modern Marathi equivalents.
2. Ensure accuracy and fairness in the transformation process.
3. Handle the variations and context-specific mappings inherent in Marathi inscriptions.
4. Develop an efficient and scalable algorithm for large volumes of text.
5. Contribute to the preservation and promotion of the linguistic and cultural heritage of Marathi inscriptions.
6. Convert Marathi inscript to Marathi text. |
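Objectives 1 and 3 can be read as a lookup from recognized inscription glyphs to modern Devanagari characters, with a context rule overriding the default where needed. The following is a minimal sketch under that reading. The glyph labels (`old_ka`, `old_ma`, and so on), both tables and the single context rule are hypothetical illustrations, not data or code from this paper.

```python
# Hypothetical glyph labels (as an upstream recognizer might emit them)
# mapped to modern Devanagari characters. Illustrative values only.
DEFAULT_MAP = {
    "old_ka": "\u0915",       # DEVANAGARI LETTER KA
    "old_ma": "\u092e",       # DEVANAGARI LETTER MA
    "old_ra": "\u0930",       # DEVANAGARI LETTER RA
    "old_aa_sign": "\u093e",  # DEVANAGARI VOWEL SIGN AA
}

# Hypothetical context-specific rule: (previous label, current label) -> output.
# Assumed example: RA directly after KA is emitted as virama + RA (a conjunct).
CONTEXT_MAP = {
    ("old_ka", "old_ra"): "\u094d\u0930",
}


def to_modern(labels, unknown="?"):
    """Convert a sequence of glyph labels to a modern Marathi string.

    A context rule, if one matches the previous and current label,
    takes priority over the default one-to-one mapping. Labels with
    no mapping are replaced by the `unknown` marker.
    """
    out = []
    prev = None
    for label in labels:
        if (prev, label) in CONTEXT_MAP:
            out.append(CONTEXT_MAP[(prev, label)])
        else:
            out.append(DEFAULT_MAP.get(label, unknown))
        prev = label
    return "".join(out)
```

Keeping the context rules in a separate table means new variants can be added without touching the default mapping.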
||||||
Review of Literature | A programme that can read characters from old Sinhala inscriptions using optical character recognition (OCR) has been presented. The OCR module makes use of artificial neural network (ANN) and convolutional neural network (CNN) technology. The evaluation includes the recognition rates obtained on genuine test photos, on pre-processed test data, and during training. CNN was selected as the OCR solution since it performs better than ANN. Because there is not enough data, the CNN can efficiently identify only a single character [3].
Another article reviews methods for deciphering ancient Marathi scripts from stone inscriptions and focuses on difficulties such as background noise and the similarity between script and background. It does not provide any concrete findings but makes the case for future research on enhancing the precision of recognition [4].
Abhishek Tomar, Minu Choudhary and Amit Yerpude [5] performed a survey centred on character identification in Indian inscriptions and carried out an analysis of prehistoric inscriptions by comprehending the script and language in the photo.
Anush Goel and Akash Sehrawat [6] proposed a system that provides automatic reading for blind people using a Raspberry Pi. It uses OCR technology to recognize printed characters using image detection devices and Python programming, and converts images to audio using OCR and Text-to-Speech (TTS). The conversion was built on a Raspberry Pi and implemented using the Tesseract library and Python. The OpenCV library is used to process the text and complete the audio output.
Shuping Liu, Yantuan Xian, Huafeng Li and Zhengtao Yu report a new detection model based on morphological component analysis (MCA) and different representations. It provides two useful results for text search. The first is the capacity to summarize and search for articles using the MCA method. In addition, the method improves document search performance [7].
For ancient document recognition using statistical feature extraction techniques, preprocessing (noise removal, binarization, etc.) is performed using local or global thresholding, followed by feature extraction and a CNN (88.95). The character frame is divided into sections, including overlapping and non-overlapping zones. A density value may then be calculated for each zone as a pixel count divided by the number of pixels in the zone [8]. This is the next level of segmentation.
In their 2015 research, Sachin S. Bhat and H.V. Balachandra Achar sought to use cutting-edge identification algorithms to pinpoint the historical era of various ancient Kannada scripts. Their technique included image processing elements such as character set segmentation, noise reduction, feature extraction and classification; applying these to characters from inscriptions was an inventive element. Their MATLAB-based studies, which had an accuracy rate of 80%, proved the viability of their strategy [9].
The research of Soumya A and G Hemantha Kumar focuses on improving the quality of ancient manuscript images using preprocessing and segmentation techniques. They convert scanned images into digital format. To enhance results, they present three filtering techniques: Gaussian blur, Laplacian filter, and USM filter. To get better results in terms of image quality and classification accuracy, these filters can be used in a variety of combinations, including non-uniform filters [10]. |
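The zone-based density feature summarized for [8] can be illustrated with a short sketch. It assumes a binarized character image stored as rows of 0/1 values, non-overlapping zones on a regular grid, and density taken as the number of foreground pixels in a zone divided by the number of pixels in that zone. The function name and the sample glyph are illustrative and are not taken from the cited work.

```python
def zone_densities(image, zone_rows, zone_cols):
    """Split a binary image into a grid of non-overlapping zones and
    return the foreground density of each zone, row by row.

    `image` is a list of equal-length rows containing 0 (background)
    or 1 (foreground). The image size must divide evenly by the grid.
    """
    height = len(image)
    width = len(image[0])
    if height % zone_rows or width % zone_cols:
        raise ValueError("image size must be divisible by the zone grid")
    zone_h = height // zone_rows
    zone_w = width // zone_cols
    features = []
    for zr in range(zone_rows):
        for zc in range(zone_cols):
            # Count foreground pixels inside this zone.
            ink = sum(
                image[r][c]
                for r in range(zr * zone_h, (zr + 1) * zone_h)
                for c in range(zc * zone_w, (zc + 1) * zone_w)
            )
            features.append(ink / (zone_h * zone_w))
    return features


# A 4x4 sample glyph split into a 2x2 grid of zones.
glyph = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
print(zone_densities(glyph, 2, 2))  # [1.0, 0.0, 0.0, 0.75]
```

The resulting vector has one value per zone and could serve as input to a classifier; overlapping zones would need a stride smaller than the zone size.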
||||||
Main Text |
Inscription: Historical or other information written on brickwork, stone and other brittle surfaces is called an inscription. Indian inscriptions may be categorised into four groups: stone, cave, pillar and wall. Words written on cave walls are called cave inscriptions. Information written on stones in their original state is known as a stone inscription. Later inscriptions, or pillar inscriptions, were carved on artificially finished stone slabs, pillars and other stone materials. Figure 4 shows one of the oldest known Marathi inscriptions, which was unearthed at the foot of the Bahubali statue at the Jain temple at Shravanabelagola.
Fig 4. Marathi inscription |
||||||
Analysis | Architecture: This section gives a succinct overview of the architecture for converting Marathi inscript to Marathi text, as shown in Figure 5. The system converts Marathi inscript to Marathi text and stores the output in a text file (with a given file name). The architecture is divided into three main parts: a. Pre-processing module, b. Text recognition module, and c. Post-processing module
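The three-part architecture can be pictured as a chain of three functions whose final output is written to a text file. The sketch below shows only that wiring. The `recognize` function is a placeholder that returns fixed sample results, and the assumed result shape of (text, confidence) pairs and the 0.5 confidence cut-off are illustrative choices, not details given in this paper.

```python
from pathlib import Path


def preprocess(image):
    """Pre-processing module (placeholder): returns the image unchanged."""
    return image


def recognize(image):
    """Text recognition module (placeholder).

    Assumed to return (text, confidence) pairs. A real system would call
    an OCR engine here; these fixed sample values are for illustration.
    """
    return [("मराठी", 0.93), ("??", 0.21), ("शिलालेख", 0.88)]


def postprocess(results, min_confidence=0.5):
    """Post-processing module: drop low-confidence pieces and join the rest."""
    return " ".join(text for text, conf in results if conf >= min_confidence)


def convert(image, output_path):
    """Run the three modules in order and store the text in a file."""
    text = postprocess(recognize(preprocess(image)))
    Path(output_path).write_text(text, encoding="utf-8")
    return text
```

A call such as `convert(image, "output.txt")` would leave the recognized text in `output.txt`, matching the description of the output being stored under a given file name.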
Fig 5. Architecture of converting Marathi inscript to Marathi text
1. Pre-processing module: The pre-processing module is responsible for preparing the input image before it is fed into the text recognition module. Its main purpose is to improve image quality and OCR accuracy. The preliminary steps are: Image resizing: increase the image size for better performance; Binarization: simplify text extraction by converting images to binary format; Noise reduction: remove unwanted noise and imperfections in the image that can be introduced during imaging.
2. Text recognition module: The text recognition module is the core of the OCR system. It takes the pre-processed image as input and performs character and word recognition. There are many ways to recognize text, including: Optical character recognition (OCR): OCR algorithms analyze images to recognize and identify the characters in the text; Document layout analysis: understanding the layout of a document to determine its reading order and the distribution of its content. Modern text recognition often uses deep learning techniques such as convolutional neural networks (CNNs) to achieve high character recognition accuracy.
3. Post-processing module: The post-processing module takes the output of the text recognition module and further optimizes it to improve overall accuracy and to correct anything that was not legible in the text during recognition. Finishing operations include language model integration, context analysis, and confidence scoring. The post-processing module is required to solve OCR problems caused by factors such as poor image quality, difficult text, or illegible characters. |
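The three pre-processing steps named for module 1 (resizing, binarization, noise reduction) can be sketched in plain Python on a greyscale image stored as rows of integers from 0 to 255. The specific techniques are assumptions chosen for illustration, since the paper does not name them: nearest-neighbour enlargement, Otsu's global threshold, and a 3x3 median filter. Treating dark pixels as ink is also an assumption.

```python
from statistics import median_low


def upscale(image, factor):
    """Nearest-neighbour enlargement: each pixel becomes a factor x factor block."""
    return [
        [pixel for pixel in row for _ in range(factor)]
        for row in image
        for _ in range(factor)
    ]


def otsu_threshold(image):
    """Return the grey level (0-255) that maximises between-class variance."""
    hist = [0] * 256
    for row in image:
        for pixel in row:
            hist[pixel] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_level, best_var = 0, -1.0
    weight_low = 0   # number of pixels at or below the candidate level
    sum_low = 0      # sum of their grey values
    for level in range(256):
        weight_low += hist[level]
        sum_low += level * hist[level]
        if weight_low == 0:
            continue
        weight_high = total - weight_low
        if weight_high == 0:
            break
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        variance = weight_low * weight_high * (mean_low - mean_high) ** 2
        if variance > best_var:
            best_var, best_level = variance, level
    return best_level


def binarize(image, dark_is_ink=True):
    """Convert to 0/1 using Otsu's threshold; 1 marks ink pixels."""
    threshold = otsu_threshold(image)
    return [[int((p <= threshold) == dark_is_ink) for p in row] for row in image]


def median_filter(image):
    """3x3 median filter; border pixels use the neighbours that exist."""
    height, width = len(image), len(image[0])
    out = []
    for r in range(height):
        row = []
        for c in range(width):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(height, r + 2))
                for cc in range(max(0, c - 1), min(width, c + 2))
            ]
            row.append(median_low(window))
        out.append(row)
    return out
```

One possible order is `binarize(median_filter(upscale(image, 2)))`, which removes isolated specks before thresholding; the best order for inscription photographs would need to be determined experimentally.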
||||||
Result and Discussion |
The user interface is developed using Python as the front end and OCR as the back end, enabling the computer to read a Marathi inscription and convert it to the equivalent Marathi text, as shown in Figures 6, 7 and 8 respectively. The text detection process consists of the following steps.
Pre-processing: The input image is pre-processed to improve its quality and eliminate noise, which improves the quality of the following steps.
Edge detection: An edge detection technique is used to detect high-contrast areas, which are often associated with text borders.
Text region suggestion: Methods such as sliding windows or anchors are used to propose regions containing text.
Text field validation: The proposed regions are subjected to a validation process to determine whether they contain text. This step usually involves classifying text and non-text regions using machine learning or deep learning models.
Text localization: After a text region has been verified, a bounding box or outline is drawn around the text to mark its position.
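The detection steps listed above (edge detection, region suggestion, validation, localization) can be illustrated with a small pure-Python sketch on a greyscale image stored as rows of integers from 0 to 255. Every choice here is an assumption made for illustration: edges are simple intensity jumps between neighbouring pixels, regions are fixed-size sliding windows, and validation is an edge-density check standing in for the machine learning classifier the text mentions.

```python
def edge_map(image, threshold=100):
    """Mark pixels whose intensity jump from the left or upper neighbour
    exceeds `threshold`."""
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            dx = abs(image[r][c] - image[r][c - 1]) if c > 0 else 0
            dy = abs(image[r][c] - image[r - 1][c]) if r > 0 else 0
            if max(dx, dy) > threshold:
                edges[r][c] = 1
    return edges


def propose_windows(height, width, size, stride):
    """Sliding-window region suggestions as (top, left, bottom, right)."""
    return [
        (r, c, r + size, c + size)
        for r in range(0, height - size + 1, stride)
        for c in range(0, width - size + 1, stride)
    ]


def validate(edges, window, min_density=0.5):
    """Stand-in for a text/non-text classifier: keep edge-dense windows."""
    top, left, bottom, right = window
    count = sum(
        edges[r][c] for r in range(top, bottom) for c in range(left, right)
    )
    return count / ((bottom - top) * (right - left)) >= min_density


def localize(windows):
    """Merge accepted windows into one bounding box, or None if empty."""
    if not windows:
        return None
    return (
        min(w[0] for w in windows),
        min(w[1] for w in windows),
        max(w[2] for w in windows),
        max(w[3] for w in windows),
    )


def detect_text(image, size=2, stride=2):
    """Run edge detection, region suggestion, validation and localization."""
    edges = edge_map(image)
    accepted = [
        window
        for window in propose_windows(len(image), len(image[0]), size, stride)
        if validate(edges, window)
    ]
    return localize(accepted)


sample = [
    [255, 0, 255, 255],
    [0, 255, 255, 255],
    [255, 255, 255, 255],
    [255, 255, 255, 255],
]
print(detect_text(sample))  # (0, 0, 2, 2)
```

Merging all accepted windows into a single box is the simplest localization rule; an image with several separate text lines would need the windows grouped into connected clusters first.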
|
||||||
Conclusion |
The development of smart algorithms to convert Marathi inscriptions into Marathi text represents a significant step in preserving and deciphering the language and culture of Marathi inscriptions. Built on artificial intelligence, EasyOCR and CNNs, the algorithm can translate historical Marathi inscript into Marathi text, paving the way for scholars, historians and language lovers to read and understand ancient Marathi texts. The algorithm bridges the gap between the past and the present by addressing issues of translation, context-sensitive mapping and efficiency, unlocking the rich heritage of Marathi inscriptions and promoting a deeper understanding of the Marathi language and its historical origins. |
||||||
References | 1. K.G.N.D. Karunarathne, K.V. Liyanage, D.A.S. Ruwanmini, G.K.A. Dias, S.T. Nandasara, "Recognizing ancient Sinhala Inscription Characters using Neural Network Technologies", International Journal of Scientific Engineering and Applied Science (IJSEAS), ISSN: 2395-3470, 3(1), Jan 2017. Available at: https://ijseas.com/volume3/v3i1/ijseas20170104.pdf
2. P. Nikhi, V. Jayakumar, S. Kolkure [2015], "Optical Character Recognition: An Encompassing Review" [Online]. Available at: http://esatjournals.net/ijret/2015v04/i01/IJRET20150401062.pdf
3. K.G.N.D. Karunarathne, K.V. Liyanage, D.A.S. Ruwanmini, G.K.A. Dias, S.T. Nandasara [2017], "Recognizing ancient Sinhala Inscription Characters using Neural Network Technologies". Available at: https://ijseas.com/volume3/v3i1/ijseas20170104.pdf
4. Bapu Chendage, Rajivkumar Mente, Vikas Magar [2017], "A Survey on Ancient Marathi Script Recognition from Stone Inscriptions". Available at: https://www.researchgate.net/publication/343808844_A_Survey_on_Ancient_Marathi_Script_Recognition_from_Stone_Inscriptions
5. Abhishek Tomar, Minu Choudhary, Amit Yerpude [2015], "Ancient Indian Scripts Image Pre-Processing and Dimensionality Reduction for Feature Extraction and Classification: A Survey" [Online]. Available at: http://www.ijcttjournal.org/2015/Volume21/number2/IJCTT-V21P116.pdf
6. Anush Goel, Akash Sehrawat, Ankush Patil, Prashant Chougule, Supriya Khatavkar (2018), "Raspberry Pi based reader for blind people", International Research Journal of Engineering and Technology (IRJET), vol. 5, issue 6, pp. 1639-1642. Available at: https://www.researchgate.net/publication/362491389_Raspberry_Pi_Based_Reader_for_Blind_People
7. Shuping Liu, Yantuan Xian, Huafeng Li, Zhengtao Yu, "Text Detection in Natural Scene Images Using Morphological Component Analysis and Laplacian Dictionary", IEEE/CAA Journal of Automatica Sinica, 2016. Available at: https://www.ieeejas.net/article/doi/10.1109/JAS.2017.7510427
8. Pritpal Singh, Sumit Budhiraja, "Feature Extraction and Classification Techniques in O.C.R. Systems for Handwritten Gurmukhi Script: A Survey", International Journal of Engineering Research and Applications (IJERA). Available at: https://www.ijera.com/papers/Vol%201%20issue%204/BQ01417361739.pdf
9. Sachin S Bhat, H.V. Balachandra Achar [2015], "Character recognition and Period prediction of ancient Kannada Epigraphical scripts" [Online]. Available at: http://www.ijarcce.com/upload/2016/si/nCORETech-16/nCORETech%2024.pdf
10. Soumya A, G Hemantha Kumar, "Preprocessing of Camera Captured Inscriptions and Segmentation of Handwritten Kannada text", International Journal of Advanced Research in Computer and Communication Engineering, vol. 3(5), May 2014. Available at: https://ijarcce.com/wp-content/uploads/2012/03/IJARCCE9G-s-soumya-Preprocessing-of-Camera.pdf
11. Namrata Dave, "Segmentation Methods for Hand-Written Character Recognition", International Journal of Signal Processing, Image Processing and Pattern Recognition, vol. 8(4), 2015.
12. G. Bhuvaneswari, V. Subbiah Bharathi, "An efficient method for digital imaging of ancient stone inscriptions", Current Science, vol. 110(2), 25 January 2016.
13. Sonika Narang, M K Jindal, Munish Kumar, "Devanagari ancient documents recognition using statistical feature extraction techniques", Springer, 2019.
14. D.A.S. Ruwanmini, K.V. Liyanage, K.G.N.D. Karunarathne, G.K.A. Dias, S.T. Nandasara, "An Architecture for an Inscription Recognition System for Sinhala Epigraphy", International Journal of Research – GRANTHAALAYAH, 4(12), December 2016.
15. Asad Iqbal Khan, Anang Hudaya Muhamad Amin, "Real-Time Multi Oriented Ancient Script Recognition using Single Layer Hierarchical Graph Neuron (SLHGN)", International Conference on Applied Science and Technology (ICAST), 2018.
16. David Rivest-Hénault, Reza Farrahi Moghaddam, Mohamed Cheriet, "A local linear level set method for the binarization of degraded historical document images", IJDAR (2012) 15:101–124, DOI 10.1007/s10032-011-0157-5, Springer-Verlag, 2011. |