Image processing - Are there any techniques to remove these marks?

Hello folks, hope you are doing well.
I have a scanned document that I want to pre-process (clean up) before jumping into OCR, but I couldn't remove the watermark because it shares the same color value as the text and sometimes overlaps the text.
Is there any solution to remove it without affecting the text quality?

Answers (2)

Umar
Umar on 5 Aug 2024

Removing watermarks from scanned documents can be challenging, especially when they overlap the text and share its color values. To address your question, "Is there any solution to remove it without affecting the text quality?", the steps below outline one possible workflow.

First, load the scanned document into MATLAB using the imread function. Ensure that the image is in a format that MATLAB can read, such as JPEG or PNG.

% Load the scanned document

image = imread('scanned_document.jpg');

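To remove the watermark, you first need to detect it and segment it from the text. One approach is to use image segmentation techniques such as thresholding or edge detection. Note that a global threshold only separates dark pixels from bright ones, so when the watermark has the same intensity as the text, as in your case, the resulting mask must be refined further (for example by the size or shape of the connected regions) before it isolates the watermark.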

% Convert the image to grayscale

gray_image = rgb2gray(image);

% Apply Otsu thresholding; imbinarize marks bright pixels as true

threshold = graythresh(gray_image);

% Invert so that the dark marks, not the bright paper, form the mask

binary_image = ~imbinarize(gray_image, threshold);

Once you have segmented the watermark, you can proceed to remove it from the document. One way to achieve this is by inpainting the detected watermark region. For more information on this function, please refer to the inpaintExemplar documentation.

% Inpaint the watermark region

clean_image = inpaintExemplar(image, binary_image);
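
As a small check on how the mask argument is interpreted, the sketch below runs inpaintExemplar on a synthetic image. The image, the dark square, and the variable names (page, mask, filled) are made up for illustration and are not taken from the scanned document. The only point is that the true pixels of the mask mark the region to be filled.

```matlab
% Synthetic 64x64 grayscale "page": bright paper with a dark square
% standing in for a watermark (hypothetical data, for illustration only)
page = 220 * ones(64, 64, 'uint8');
page(20:30, 20:30) = 40;

% Logical mask: true marks the pixels that inpaintExemplar should fill
mask = page < 128;

% Fill the masked region using patches from the unmasked surroundings
filled = inpaintExemplar(page, mask);

% Compare the input and the result side by side
figure; imshowpair(page, filled, 'montage');
```

If the mask were inverted (true on the paper instead of on the mark), the function would try to fill almost the whole page, so it is worth displaying the mask with imshow before inpainting.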

After removing the watermark, it is essential to enhance the text quality to improve OCR accuracy. You can apply image enhancement techniques like contrast adjustment or noise reduction to make the text more legible.

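% Enhance text quality (e.g., contrast adjustment)

% Passing the limits from stretchlim explicitly also works for RGB images

enhanced_image = imadjust(clean_image, stretchlim(clean_image), []);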
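
To make the effect of contrast adjustment concrete, here is a minimal sketch on a synthetic low-contrast image. The image and the names low, limits, and stretched are assumptions made for this example; tol = 0 is chosen so that stretchlim uses the exact minimum and maximum rather than clipping a percentage of pixels.

```matlab
% Synthetic low-contrast grayscale image with values between 100 and 150
% (hypothetical data, for illustration only)
low = uint8(repmat(linspace(100, 150, 64), 64, 1));

% Normalized [low; high] limits; tol = 0 uses the exact min and max
limits = stretchlim(low, 0);

% Map the occupied range onto the full output range
stretched = imadjust(low, limits, []);

% Show the range before and after the adjustment
fprintf('before: %d..%d, after: %d..%d\n', ...
    min(low(:)), max(low(:)), min(stretched(:)), max(stretched(:)));
```

On a real scan, the default tolerance of stretchlim (which saturates a small fraction of pixels) is usually a more robust starting point than tol = 0.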

With the watermark removed and the text quality enhanced, you can now proceed with OCR using MATLAB's ocr function to extract text from the preprocessed document. For more information on the ocr function, please refer to the ocr documentation.

% Perform OCR on the preprocessed document

results = ocr(enhanced_image);

recognized_text = results.Text;

disp(recognized_text);

I hope this helps resolve your problem.


Matt J
Matt J on 5 Aug 2024
Edited: Matt J on 5 Aug 2024
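% Read the uploaded scan and convert it to grayscale
A=im2gray(imread('https://www.mathworks.com/matlabcentral/answers/uploaded_files/1746876/image.jpeg'));
% Invert so that ink is bright and paper is dark (with 255+ added, B>50 is always true)
B=double(imcomplement(A));
% Keep only the 400 largest connected components of the ink mask
C=bwareafilt(B>50,400);
% Display the result as dark ink on a white background
imshow(imcomplement(C),[]);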
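
The size-based idea behind bwareafilt in the snippet above can be seen in isolation on a synthetic mask. This is only a sketch: the mask, the blob sizes, and the names ink, largest, and specks are invented for the example, and it assumes the unwanted marks form smaller connected components than the characters, which may not hold where a mark touches the text.

```matlab
% Synthetic ink mask: two large blobs (stand-ins for characters) and
% two small specks (stand-ins for unwanted marks); hypothetical data
ink = false(50, 50);
ink(5:20, 5:20)   = true;   % 256-pixel blob
ink(30:39, 30:39) = true;   % 100-pixel blob
ink(45, 45)       = true;   % 1-pixel speck
ink(48, 10:11)    = true;   % 2-pixel speck

% Keep the 2 largest connected components; everything smaller is dropped
largest = bwareafilt(ink, 2);

% The pixels that were discarded by the filter
specks = ink & ~largest;

fprintf('kept %d pixels, removed %d pixels\n', nnz(largest), nnz(specks));
```

On a real page, the number of components to keep (400 in the answer above) has to be tuned to the document, since it depends on how many characters the page contains.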


Release

R2024a