How to automatically digitize a plot image?

% read in the PNG image
close all;
clear all
myimage = imread('wave.png');
% myimage = myimage(:,:,1);
% allocate space for thresholded image
image_thresholded = zeros(size(myimage));
% loop over all rows and columns
for ii=1:size(myimage,1)
for jj=1:size(myimage,2)
color=[myimage(ii,jj,1),myimage(ii,jj,2),myimage(ii,jj,3)];
cmax=max(color);
cmin=min(color);
if (cmax<200 && cmin >80)
new_pixel=0;
else
new_pixel=255;
end
image_thresholded(ii,jj)=new_pixel;
end
end
% figure()
% % subplot(1,2,1)
% imshow(myimage)
% title('original image')
% figure()
% % subplot(1,2,2)
% imshow(image_thresholded)
% title('thresholded image')
Data=[];
% Xmin=input('\nEnter Xmin:')
% Xmax=input('\nEnter Xmax:')
% Ymin=input('\nEnter Ymin:')
% Ymax=input('\nEnter Ymax:')
Xmin=0;Xmax=600;Ymin=-3000;Ymax=3000;
for ii=1:size(myimage,2)
for jj=5:size(myimage,1)
pixel=image_thresholded(jj,ii);
if pixel==0
Data=[Data;ii,size(myimage,1)-jj]; % flip the row index (number of rows minus row) so y increases upward
end
end
end
imxmax=max(Data(:,1));imxmin=min(Data(:,1));
imymax=max(Data(:,2));imymin=min(Data(:,2));
figure()
scatter(Data(:,1),Data(:,2))
title('Extracted Original Image Co-ordinate data')
% Scrape out extra lines: drop points within 5 pixels of the data extents
Test=Data(Data(:,1)>imxmin+5,:);
Test=Test(Test(:,1)<imxmax-5,:);
Test=Test(Test(:,2)>imymin+5,:);
Test=Test(Test(:,2)<imymax-5,:);
X=Test(:,1);Y=Test(:,2);
% X=Data(:,1);Y=Data(:,2);
X1=[(X-imxmin)/(imxmax-imxmin)].*(Xmax-Xmin)+Xmin;
Y1=[(Y-imymin)/(imymax-imymin)].*(Ymax-Ymin)+Ymin;
XData=[(Data(:,1)-imxmin)/(imxmax-imxmin)].*(Xmax-Xmin)+Xmin;
YData=[(Data(:,2)-imymin)/(imymax-imymin)].*(Ymax-Ymin)+Ymin;
figure()
plot(X1,Y1);
hold on
scatter(XData,YData);
% title('Final Data extracted to plot scale')
Hi all, I have tried to automatically extract the data from a plot image (attached).
My idea is to check the color of each pixel and, if it falls within a specific range, store the pixel's x,y coordinates in a matrix.
I then transform the data in this matrix from image coordinates to plot coordinates.
I am relatively new to MATLAB coding, so sorry for my rough syntax.
The problem is that for the same image with a minor variation, the transformation from image coordinates to plot coordinates no longer works.
Please help me understand where I am going wrong.
Thanks & Regards
Sriramakrishna Turaga
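
The approach described in the question (threshold by color range, collect pixel coordinates, rescale to the axis limits) can be sketched without loops. This is a minimal sketch, not the original code: it runs on a small synthetic image so that it is self-contained, and it assumes the detected pixels span exactly the axis limits Xmin..Xmax and Ymin..Ymax, which is the same assumption the code above makes. Stray pixels near the image borders would violate that assumption and shift the result.

```matlab
% Synthetic 20x30 RGB image: white background with one gray diagonal line.
% (A stand-in for wave.png so the sketch is self-contained.)
img = 255 * ones(20, 30, 3, 'uint8');
for k = 1:20
    img(k, k + 5, :) = 120;          % gray pixel inside the 80..200 range
end

% Threshold without loops: per-pixel max and min over the color channels.
cmax = max(img, [], 3);
cmin = min(img, [], 3);
mask = (cmax < 200) & (cmin > 80);   % true where the curve is

% Collect image coordinates of the curve pixels.
[r, c] = find(mask);
xPix = c;                            % column index grows to the right
yPix = size(img, 1) - r;             % flip rows so y grows upward

% Rescale from pixel extents to plot units (limits taken from the question).
Xmin = 0; Xmax = 600; Ymin = -3000; Ymax = 3000;
xPlot = (xPix - min(xPix)) / (max(xPix) - min(xPix)) * (Xmax - Xmin) + Xmin;
yPlot = (yPix - min(yPix)) / (max(yPix) - min(yPix)) * (Ymax - Ymin) + Ymin;
```

Because the scaling uses min and max of the detected pixels, any extra dark pixel outside the curve changes the mapping for every point.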

 Accepted Answer

Well, the plot doesn't have enough resolution to get all the data. For example, in some of the noisier areas there are a bunch of dark values in a single column. Which do you want to pick? The top one? The bottom one? The average? All of them?
I'd just scan the columns looking for dark values:
[rows, columns, numberOfColorChannels] = size(rgbImage)
grayImage = min(rgbImage, [], 3);
binaryImage = grayImage < 128;
topRows = zeros(1, columns);
bottomRows = zeros(1, columns);
meanRows = zeros(1, columns);
for col = 1 : columns
nonZeroRows = find(binaryImage(:, col));
if ~isempty(nonZeroRows)
topRows(col) = nonZeroRows(1);
bottomRows(col) = nonZeroRows(end);
meanRows(col) = mean(nonZeroRows);
end
end
% Now make x and y
x = 1 : columns;
y = topRows; % or bottomRows or meanRows, whatever you want.
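
The x and y above are still in pixel units, with row numbers increasing downward. A minimal sketch of mapping them to plot units follows. The limits 0..600 and -3000..3000 come from the question; the pixel positions of the axis box (colLeft, colRight, rowTop, rowBottom) are hypothetical values that would have to be read off the actual image, for example with impixelinfo.

```matlab
% Hypothetical pixel positions of the plot's axis box (read from the image).
colLeft = 40;  colRight = 640;     % columns where x = Xmin and x = Xmax
rowTop = 20;   rowBottom = 420;    % rows where y = Ymax and y = Ymin

% Axis limits in plot units (values from the question).
Xmin = 0; Xmax = 600; Ymin = -3000; Ymax = 3000;

% Example pixel coordinates, standing in for x and topRows.
x = [40 340 640];
y = [20 220 420];

% Linear map; the row axis is flipped because row 1 is the top of the image.
xPlot = Xmin + (x - colLeft) / (colRight - colLeft) * (Xmax - Xmin);
yPlot = Ymax - (y - rowTop) / (rowBottom - rowTop) * (Ymax - Ymin);
```

Anchoring the map to fixed axis-box positions, rather than to the extents of the detected pixels, keeps the result unchanged when stray pixels appear near the image borders.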

4 Comments

Dear Image Analyst,
Thank you very much for the quick response. With the method I have used, I am able to get all the possible data from the plot (see the attached image).
My method may be long, but I am not losing any data, because I take a range of colors rather than just binary black and white. The problem is transforming this data into the actual plot coordinate system. My method worked for one image (wave.png), but for the same image with a few extra pixels on all borders (wave1.png) the coordinate transformation didn't work.
If I follow the method you suggested, then, as you mentioned, I lose a lot of data and end up with low-resolution extracted data, so I am afraid this method will not work for me.
Uh, alright so you think your code works. But don't you realize that you can only get as much data as there are columns in your image? It does not matter if the plot has colors or shades, perhaps due to anti-aliasing. Let's say that your screen was 1920 pixels across and your data has a million data points, and you plotted it and saved it as a 1920x1080 image. You can only get 1920 data points, not the complete million, so you're only getting like 1 in every 500 points (for that example).
And you don't want to follow my method (below) because your image's resolution is not high enough. Well that's not the problem of the algorithm. That's the problem of your image. If you want more resolution, get a higher resolution image. Or else you can make up data for the missing data if you want, but it would just be a guess.
clc; % Clear the command window.
close all; % Close all figures (except those of imtool.)
clear; % Erase all existing variables. Or clearvars if you want.
workspace; % Make sure the workspace panel is showing.
format long g;
format compact;
fontSize = 22;
%--------------------------------------------------------------------------------------------------------
% READ IN IMAGE
folder = pwd;
baseFileName = 'wave.png';
grayImage = imread(baseFileName);
% Get the dimensions of the image.
% numberOfColorChannels should be = 1 for a gray scale image, and 3 for an RGB color image.
[rows, columns, numberOfColorChannels] = size(grayImage)
if numberOfColorChannels > 1
% It's not really gray scale like we expected - it's color.
% Take the minimum over ALL channels to create a gray scale image.
grayImage = min(grayImage, [], 3);
end
%--------------------------------------------------------------------------------------------------------
% Display the image.
subplot(2, 2, 1);
imshow(grayImage);
axis('on', 'xy');
title('Gray Scale Image', 'FontSize', fontSize, 'Interpreter', 'None');
impixelinfo;
hFig = gcf;
hFig.WindowState = 'maximized'; % May not work in earlier versions of MATLAB.
drawnow;
set(gca,'ColorScale','log')
subplot(2, 2, 2);
imhist(grayImage);
% Threshold the image.
threshold = 190; % Whatever.
binaryImage = grayImage < threshold;
subplot(2, 2, 2);
imshow(binaryImage);
axis('on', 'xy');
title('Binary Image', 'FontSize', fontSize, 'Interpreter', 'None');
drawnow;
topRows = zeros(1, columns);
bottomRows = zeros(1, columns);
meanRows = zeros(1, columns);
for col = 1 : columns
nonZeroRows = find(binaryImage(:, col));
if ~isempty(nonZeroRows)
topRows(col) = nonZeroRows(1);
bottomRows(col) = nonZeroRows(end);
meanRows(col) = mean(nonZeroRows);
end
end
% Now make x and y
x = 1 : columns;
y = topRows; % or bottomRows or meanRows, whatever you want.
% Get rid of any data that was not assigned.
missingColumns = topRows == 0;
x(missingColumns) = [];
y(missingColumns) = [];
% Plot remaining good data.
subplot(2, 2, 3:4);
plot(x, y, 'b-', 'LineWidth', 2);
grid on;
xlabel('x', 'FontSize', fontSize);
ylabel('y', 'FontSize', fontSize);
title('topRows', 'FontSize', fontSize);
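
The code above drops the columns where no curve pixel was found. If estimated values are wanted for those columns, one option is linear interpolation between the neighboring known points. As noted earlier, such values are only a guess. A minimal sketch on made-up example data, using interp1:

```matlab
% Example data: columns 1..10 with columns 4, 5 and 8 missing.
columns = 10;
x = [1 2 3 6 7 9 10];
y = [10 12 14 20 22 26 28];

% Estimate y at every column by linear interpolation between known points.
xAll = 1 : columns;
yAll = interp1(x, y, xAll, 'linear');
```

Columns outside the range of known x values would come back as NaN with this call, since interp1 does not extrapolate by default for the 'linear' method.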
Here's what your algorithm does if I replace scatter with plot:
Oh great, this is exactly what I am looking for. I know my resolution is limited by the image pixel density, but I have to live with it.
One more problem: this code still does not work when I change the image from wave.png to wave1.png.
Both are screen grabs of the same image. Can you please look into it and help me, as I couldn't understand where it is going wrong?
I tried different threshold values, as I guess that is the only parameter we can play with. Can you please shed some light on how to make an intelligent guess for this threshold?
That's because wave1.png has some white lines (garbage) at the top and bottom, like in the top and bottom 3 or 4 lines. Erase those if you don't want them.
clc; % Clear the command window.
close all; % Close all figures (except those of imtool.)
clear; % Erase all existing variables. Or clearvars if you want.
workspace; % Make sure the workspace panel is showing.
format long g;
format compact;
fontSize = 22;
%--------------------------------------------------------------------------------------------------------
% READ IN IMAGE
folder = pwd;
baseFileName = 'wave1.png';
grayImage = imread(baseFileName);
% Get the dimensions of the image.
% numberOfColorChannels should be = 1 for a gray scale image, and 3 for an RGB color image.
[rows, columns, numberOfColorChannels] = size(grayImage)
if numberOfColorChannels > 1
% It's not really gray scale like we expected - it's color.
% Take the minimum over ALL channels to create a gray scale image.
grayImage = min(grayImage, [], 3);
end
%--------------------------------------------------------------------------------------------------------
% Display the image.
subplot(2, 2, 1);
imshow(grayImage);
axis('on', 'xy');
title('Gray Scale Image', 'FontSize', fontSize, 'Interpreter', 'None');
impixelinfo;
hFig = gcf;
hFig.WindowState = 'maximized'; % May not work in earlier versions of MATLAB.
drawnow;
set(gca,'ColorScale','log')
subplot(2, 2, 2);
imhist(grayImage);
% Threshold the image.
threshold = 190; % Whatever.
binaryImage = grayImage < threshold;
% Erase garbage at top and bottom
binaryImage(1:50, :) = false;
binaryImage(end-50:end, :) = false;
subplot(2, 2, 2);
imshow(binaryImage);
impixelinfo;
axis('on', 'xy');
title('Binary Image', 'FontSize', fontSize, 'Interpreter', 'None');
drawnow;
topRows = zeros(1, columns);
bottomRows = zeros(1, columns);
meanRows = zeros(1, columns);
for col = 1 : columns
nonZeroRows = find(binaryImage(:, col));
if ~isempty(nonZeroRows)
topRows(col) = nonZeroRows(1);
bottomRows(col) = nonZeroRows(end);
meanRows(col) = mean(nonZeroRows);
end
end
% Now make x and y
x = 1 : columns;
y = topRows; % or bottomRows or meanRows, whatever you want.
% Get rid of any data that was not assigned.
missingColumns = topRows == 0;
x(missingColumns) = [];
y(missingColumns) = [];
% Plot remaining good data.
subplot(2, 2, 3:4);
plot(x, y, 'b-', 'LineWidth', 2);
grid on;
xlabel('x', 'FontSize', fontSize);
ylabel('y', 'FontSize', fontSize);
title('topRows', 'FontSize', fontSize);
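
On the question of how to guess the threshold: the value 190 above was picked by hand. One common alternative, which is not used in this thread, is Otsu's method, available as graythresh in the Image Processing Toolbox (the same toolbox that provides imshow and imhist). A minimal sketch on a synthetic image, assuming the plot is dark on a light background:

```matlab
% Synthetic grayscale image: light background (230) with a dark band (60).
grayImage = 230 * ones(50, 80, 'uint8');
grayImage(20:25, :) = 60;

% graythresh returns Otsu's threshold as a fraction of the intensity range.
level = graythresh(grayImage);
threshold = level * 255;             % convert to the uint8 scale used above

% Dark pixels (the curve) become true, as in the code above.
binaryImage = grayImage < threshold;
```

An automatic threshold does not remove the garbage rows at the top and bottom of wave1.png; those still need to be erased as shown above.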
