Removing watermarks from scanned documents can be challenging, especially when the watermark overlaps the text and has similar color values. To answer your question, “Is there any solution to remove it without affecting the text quality?”, here is one possible workflow in MATLAB.
First, load the scanned document into MATLAB using the imread function. Make sure the image is saved in a format that imread supports, such as JPEG or PNG.
% Load the scanned document
image = imread('scanned_document.jpg');
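The rgb2gray call used for the grayscale conversion expects a three-channel color scan, and it fails on scans that were saved as grayscale. A minimal sketch for accepting both, where `scan` is a hypothetical placeholder for the array returned by imread (requires Image Processing Toolbox):

```matlab
% Hypothetical stand-in for the result of imread; use your own variable here.
scan = repmat(uint8(200), 40, 60);   % single-channel (grayscale) scan

% rgb2gray expects an M-by-N-by-3 array, so only call it for color scans.
if size(scan, 3) == 3
    gray_scan = rgb2gray(scan);
else
    gray_scan = scan;                % already single-channel
end

% Rescale to double in [0, 1] so later thresholds do not depend on bit depth.
gray_scan = im2double(gray_scan);
```

Newer releases also provide im2gray, which performs the same channel check in a single call.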
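To remove the watermark, you first need to detect it and separate it from the text. One approach is to use image segmentation techniques such as thresholding or edge detection. Note that intensity thresholding can only separate the two when the watermark differs from the text in gray level (for example, a light-gray watermark behind black text); if both truly have the same values, you will need additional cues such as the watermark's shape or position.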
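% Convert the image to grayscale (assumes an RGB scan)
gray_image = rgb2gray(image);
% Compute a global threshold with Otsu's method
threshold = graythresh(gray_image);
% imbinarize returns true for pixels brighter than the threshold (the paper),
% so this mask still has to be refined to keep only the watermark pixels
binary_image = imbinarize(gray_image, threshold);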
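A single global Otsu threshold splits the page into light and dark pixels, so on its own it does not isolate the watermark. If the watermark is printed in a lighter gray than the text, one option is a two-threshold (band) mask. The sketch below builds a small synthetic page so that it runs on its own; `gray_page`, `text_level`, and `background_level` are hypothetical names and values that need tuning for real scans (requires Image Processing Toolbox):

```matlab
% Synthetic stand-in for a grayscale scan in [0, 1]: white paper, a light-gray
% diagonal watermark band, and a dark horizontal "text" stroke crossing it.
gray_page = ones(100, 100);
for k = 1:100
    gray_page(k, max(1, k - 5):min(100, k + 5)) = 0.7;  % watermark band
end
gray_page(40:45, 10:90) = 0.1;                          % text stroke

% Hypothetical intensity levels; tune these for your own scans.
text_level = 0.35;        % darker than this -> text
background_level = 0.9;   % lighter than this -> paper

% Band threshold: watermark pixels are neither text nor paper.
watermark_mask = gray_page > text_level & gray_page < background_level;

% Remove small specks (e.g., anti-aliased text edges), then grow the mask by
% one pixel so the soft edges of the watermark are covered as well.
watermark_mask = bwareaopen(watermark_mask, 20);
watermark_mask = imdilate(watermark_mask, strel('disk', 1));

% Never let the mask cover text pixels, so later filling cannot erase text.
watermark_mask = watermark_mask & gray_page > text_level;
```

On a real document you would start from your own grayscale image (converted with im2double) instead of `gray_page`, and pass `watermark_mask` to inpaintExemplar in place of binary_image.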
Once you have segmented the watermark, you can remove it from the document. One way to do this is to inpaint the detected watermark region, that is, fill those pixels with content synthesized from the surrounding image. For more information on this function, please refer to the inpaintExemplar documentation.
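% Inpaint the watermark region; the mask must be a 2-D logical array the same
% size as the image, true only at the watermark pixels
clean_image = inpaintExemplar(image, binary_image);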
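If the page background is plain, a simpler alternative to exemplar-based inpainting is to paint the watermark pixels with the paper color. This is a different technique from inpaintExemplar, sketched here on synthetic data; `gray_page` and the thresholds are hypothetical:

```matlab
% Synthetic stand-in for a grayscale scan in [0, 1]: white paper, a vertical
% light-gray watermark stroke, and a dark text line drawn across it.
gray_page = ones(60, 60);
gray_page(10:50, 28:32) = 0.7;   % watermark stroke
gray_page(20:24, 5:55) = 0.1;    % text line (overwrites the watermark where they cross)

% Hypothetical watermark mask: light-gray pixels only, text excluded.
watermark_mask = gray_page > 0.35 & gray_page < 0.9;

% Estimate the paper color as the median of pixels that are neither
% watermark nor text, then paint the watermark pixels with that value.
paper_level = median(gray_page(~watermark_mask & gray_page >= 0.35));
cleaned = gray_page;
cleaned(watermark_mask) = paper_level;
```

Because text pixels are excluded from the mask, they are left untouched where the watermark crosses the text. On textured or unevenly lit backgrounds, a single fill value usually looks worse than inpainting.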
After removing the watermark, it is essential to enhance the text quality to improve OCR accuracy. You can apply image enhancement techniques like contrast adjustment or noise reduction to make the text more legible.
% Enhance text quality (e.g., contrast adjustment)
% imadjust with a single argument expects a grayscale image, so convert first
enhanced_image = imadjust(rgb2gray(clean_image));
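The contrast adjustment covers only one of the enhancement options. A sketch of the noise-reduction option followed by adaptive binarization, run on a synthetic noisy page; the 3-by-3 filter size and the Sensitivity value are hypothetical starting points (requires Image Processing Toolbox):

```matlab
% Synthetic stand-in for an enhanced grayscale page with salt-and-pepper noise.
page = ones(80, 120);
page(30:40, 10:110) = 0;                        % dark text band
rng(0);                                         % reproducible noise
noisy = imnoise(page, 'salt & pepper', 0.02);

% A 3x3 median filter removes isolated specks while keeping stroke edges;
% 'symmetric' padding avoids darkening the page border.
denoised = medfilt2(noisy, [3 3], 'symmetric');

% Adaptive thresholding copes with uneven illumination across the scan;
% 'ForegroundPolarity','dark' tells imbinarize the text is darker than the paper.
bw = imbinarize(denoised, 'adaptive', ...
    'ForegroundPolarity', 'dark', 'Sensitivity', 0.4);
```

Either `denoised` or `bw` can then be passed to ocr; comparing the recognized text for both is a quick way to check which works better for your scans.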
With the watermark removed and the text enhanced, you can run OCR on the preprocessed document using the ocr function from Computer Vision Toolbox to extract its text. For more information, please refer to the ocr documentation.
% Perform OCR on the preprocessed document
results = ocr(enhanced_image);
recognized_text = results.Text;
disp(recognized_text);
Hope this helps resolve your problem.