Image-to-text conversion is normally handled by OCR-based tools and applications. However, if you find these tools inaccurate, or you are not comfortable handing them your images, you can take things into your own hands and do the conversion yourself in code.
In this post, we will look at the Python code you can use to convert an image to text without relying on those ready-made applications. We will use Tesseract OCR for Windows for this process.
Steps to Extract Text from Image Using Python
The following are the steps to extract text from an image using a Python application:
Install Tesseract OCR on Windows
You can download the Tesseract installer for Windows, either 32-bit or 64-bit, depending on your operating system. Next, add the Tesseract installation folder to the Path entry in the Environment Variables window, under System Variables.
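One way to confirm that the PATH change took effect is to look the executable up from Python. The following is a minimal sketch using only the standard library; the helper name find_tesseract is made up for illustration and is not part of any Tesseract tooling.

```python
import shutil


def find_tesseract(executable="tesseract"):
    """Return the full path of the executable if it is on PATH, otherwise None."""
    return shutil.which(executable)


path = find_tesseract()
if path is None:
    print("tesseract was not found on PATH; re-check the Environment Variables setting")
else:
    print(f"tesseract found at: {path}")
```

If the lookup prints the "not found" message right after you edited the variable, opening a new terminal is usually worth trying, since already-open terminals keep the old PATH.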
Installing Modules
After installing Tesseract, you need two Python modules: pytesseract and opencv-python. The Tesseract engine itself is the program you installed in the previous step, so it does not come from pip. Pytesseract is a wrapper that provides a Python interface to the Tesseract-OCR engine. It can also be used as a standalone invocation script for Tesseract.
Pytesseract can read all image formats supported by the Pillow and Leptonica libraries, including PNG, JPEG, GIF, and BMP. You can install both packages with the pip tool.
pip install opencv-python
pip install pytesseract
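To check that the installation worked before writing any OCR code, you can ask Python whether the modules can be found. This is a small sketch under the assumption that opencv-python is imported as cv2 and pytesseract as pytesseract; the helper name missing_modules is hypothetical.

```python
import importlib.util


def missing_modules(names=("cv2", "pytesseract")):
    """Return the module names from `names` that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are installed.")
```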
Python Code to Extract Text From Image Using Tesseract
Let us assume that you have a test image, quotes.jpg, located in the working directory. As a first step, create a Python file and import the required modules.
# text recognition
import cv2
import pytesseract
As a next step, you need to use the imread() function in order to load the test image from the specified location.
# read image
img = cv2.imread('quotes.jpg')
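Note that cv2.imread() does not raise an error when the file is missing or unreadable; it returns None, and the failure only shows up later. A possible guard is sketched below. The function name load_image is an illustrative assumption, and the OpenCV import sits inside the function only so that the path check can run on its own.

```python
from pathlib import Path


def load_image(path):
    """Load an image with OpenCV, raising a clear error instead of returning None."""
    image_path = Path(path)
    if not image_path.is_file():
        raise FileNotFoundError(f"No such image file: {image_path}")

    # Imported here so the path check above works even before OpenCV is installed.
    import cv2

    img = cv2.imread(str(image_path))
    if img is None:
        # cv2.imread signals an unreadable or unsupported file by returning None.
        raise ValueError(f"OpenCV could not decode: {image_path}")
    return img
```

With this helper, the read step would become img = load_image('quotes.jpg').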
Now, set the custom options for the Tesseract configuration.
# configurations
config = ('-l eng --oem 1 --psm 3')
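The configuration string is made of three Tesseract command-line options: -l selects the language, --oem 1 selects the LSTM neural network engine, and --psm 3 selects fully automatic page segmentation (the default mode). Each long option starts with two plain hyphens. If you expect to try different values, a small helper such as the hypothetical build_config below can assemble the string for you.

```python
def build_config(lang="eng", oem=1, psm=3):
    """Build a Tesseract option string from language, engine mode, and segmentation mode."""
    return f"-l {lang} --oem {oem} --psm {psm}"


config = build_config()
print(config)  # -l eng --oem 1 --psm 3

# Example: treat the image as a single uniform block of text instead.
print(build_config(psm=6))  # -l eng --oem 1 --psm 6
```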
If you have not added Tesseract to your system PATH variable, point pytesseract at the executable directly.
# pytesseract
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files/Tesseract-OCR/tesseract.exe'
As a next step, convert the image to a string using the image_to_string() function.
text = pytesseract.image_to_string(img, config=config)
In the final step, you can print the text that has been extracted from the image.
# print text
text = text.split('\n')
print(text)
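Splitting on newlines keeps blank entries in the list, and OCR output tends to contain quite a few of them. If you only want the lines that carry text, a small filter like the sketch below can help. The name clean_lines and the sample string are made up for illustration; they are not real Tesseract output.

```python
def clean_lines(text):
    """Split OCR output into lines, dropping blank lines and surrounding whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


sample = "First line\n\n  Second line  \n"
print(clean_lines(sample))  # ['First line', 'Second line']
```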
Complete Code: Extract Text from Image Using Python
Now let’s merge all the above code and run it.
# text recognition
import cv2
import pytesseract
# read image
img = cv2.imread('quotes.jpg')
# configurations
config = ('-l eng --oem 1 --psm 3')
# pytesseract
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files/Tesseract-OCR/tesseract.exe'
text = pytesseract.image_to_string(img, config=config)
# print text
text = text.split('\n')
print(text)
As a result of the above code, the following output is produced:
The output is a list containing each line of text recognized in the image. Tesseract works best on clean, scanned documents, and the extracted text can easily be saved to Word or any other format you need.
Extract Text from Image and Save to a Text File
In this example, you first convert the image to grayscale, threshold it, and specify the kernel’s shape and size for dilation. Next, you locate the contours and loop over them, cropping the bounding rectangle of each one. Finally, you pass each cropped area to Pytesseract to extract the text and write it to a text file.
# import modules
import cv2
import pytesseract
# read image
img = cv2.imread('quotes.png')
# set configurations
config = ('-l eng --oem 1 --psm 3')
# Convert the image to gray scale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Apply Otsu thresholding (inverted, so the text becomes white on black)
ret, threshimg = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU | cv2.THRESH_BINARY_INV)
# Specifying kernel size and structure shape.
rect_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (18, 18))
# Applying dilation on the threshold image so nearby characters merge into blocks
dilation = cv2.dilate(threshimg, rect_kernel, iterations=1)
# getting contours (OpenCV 4.x returns two values; OpenCV 3.x returns three)
img_contours, hierarchy = cv2.findContours(dilation, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
# Loop over contours, crop each text block, and extract its text
for cnt in img_contours:
    x, y, w, h = cv2.boundingRect(cnt)
    # Drawing a rectangle
    rect = cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
    # Cropping the text block
    cropped_img = img[y:y + h, x:x + w]
    # Open the text file in append mode
    file = open("recognized.txt", "a")
    # Applying tesseract OCR on the cropped image
    text = pytesseract.image_to_string(cropped_img, config=config)
    # Appending the text into file
    file.write(text)
    file.write("\n")
    # Close the file
    file.close()
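One thing to keep in mind with this approach is that OpenCV does not promise to return contours in reading order, so the blocks written to recognized.txt may appear out of sequence. A rough way to deal with that is to collect the bounding boxes first and sort them before running OCR. The sketch below works on plain (x, y, w, h) tuples; the helper name sort_reading_order and the row_tolerance value are assumptions chosen for illustration, and the banding is only a heuristic.

```python
def sort_reading_order(boxes, row_tolerance=10):
    """Sort (x, y, w, h) boxes top to bottom, then left to right within a row."""
    # Boxes whose top edges fall in the same `row_tolerance`-pixel band count as one row.
    return sorted(boxes, key=lambda b: (b[1] // row_tolerance, b[0]))


boxes = [(10, 200, 50, 20), (120, 12, 50, 20), (10, 15, 50, 20)]
print(sort_reading_order(boxes))
# [(10, 15, 50, 20), (120, 12, 50, 20), (10, 200, 50, 20)]
```

In the contour loop, the boxes could be gathered with cv2.boundingRect(cnt) for every contour and then iterated in sorted order. Two boxes on the same visual row can still land in different bands if their top edges straddle a band boundary, so the tolerance may need tuning for your images.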
Here is an example of what the extraction script above will produce:
Conclusion
And there you have it. That is the code you can use to convert images to text using Python. If you find the process difficult or time-consuming, you can do the same thing with a ready-made OCR tool. But then again, for some users, the privacy concerns around these tools can be a deal-breaker.