New Features for High-Performance Image Processing in MATLAB

By Brett Shoelson, MathWorks and Bruce Tannenbaum, MathWorks

Advances in satellite, medical, hyperspectral, and other imaging systems are producing bigger and more numerous images than ever before. Recent PC-based computational architectures, such as multicore CPUs and clusters, have imposed additional complexities on programming for high-performance imaging applications.

Recent enhancements to MATLAB® and Image Processing Toolbox™ address these challenges, with image processing speed increased in some cases by orders of magnitude. This article describes several of these enhancements and then uses a segmentation example to demonstrate the performance improvements that can result from using them.

Accelerated Image Processing

Over the past few years, image processing users have seen performance enhancements in MATLAB math functions and Image Processing Toolbox functions. Since Release 2008a, MATLAB has incorporated major performance improvements to some of the math engines that underlie its matrix mathematics. These include upgrades to the Basic Linear Algebra Subprograms (BLAS) libraries, the Linear Algebra Package (LAPACK) library, and processor-specific math libraries. MATLAB now includes object-oriented computational geometry tools, as well as a library of computational geometry algorithms for improved robustness, performance, and memory efficiency.

In Image Processing Toolbox, one of the key methods introduced to improve performance was implicit multithreading. Figure 1 plots the speedup of the toolbox function imresize by release. imresize was improved by both core algorithm enhancements and implicit multithreading. Improvements to the core algorithms in R2007b gave imresize a substantial speedup, and the multithreading introduced in R2010a gave it a far larger one. R2010a of Image Processing Toolbox brought equally significant performance improvements to more than 50 functions, including imfilter, imopen, imclose, and edge.

Figure 1. The imresize function, showing significant speedup with MATLAB R2007b and even more gains with R2010a.

Displaying Large Images

With medical and satellite images now routinely in the gigapixel range, the ability to view large images that do not fit into the MATLAB workspace has become imperative. The enhanced function imshow lets you display a subsampled version of TIFF images. Additionally, a workflow introduced in R2009b lets you create spatially tiled, multiresolution image pyramids to display and navigate very large images. With the rsetwrite function you can create an R-Set file from large TIFF or NITF image files. You can open and explore R-Set files with imtool, just as you would a smaller image.

Block Processing Large Images

When images are larger than available DRAM, it is impractical to load and process them. To address this challenge, the blockproc function, added in release R2009b, lets you operate locally on smaller sections of an image without loading the entire image into a MATLAB variable. Because this function supports a file-to-file workflow, neither the input nor the output image needs to be contained entirely in memory. Consequently, the block processing approach extends naturally to arbitrarily large images. Because the processing of each block is an independent operation, block processing also lends itself to processing in parallel.
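The file-to-file workflow described above can be sketched as follows. This is a minimal illustration, not the authors' code: the file names are hypothetical, a small random image stands in for a genuinely large TIFF, and the per-block operation (a simple inversion) is chosen only because it is easy to verify.

```matlab
% Build a small stand-in for a large TIFF (hypothetical file names).
src = fullfile(tempdir, 'blockproc_demo_in.tif');
dst = fullfile(tempdir, 'blockproc_demo_out.tif');
img = randi([0 255], 512, 512, 'uint8');
imwrite(img, src);
if exist(dst, 'file'), delete(dst); end

% Invert each 128x128 block. blockproc reads blocks from src and writes
% results to dst, so the full image never has to sit in one variable.
invertBlock = @(block_struct) 255 - block_struct.data;
blockproc(src, [128 128], invertBlock, 'Destination', dst);

% Read the result back only to inspect it (feasible here because it is small).
result = imread(dst);
```

Because the 'Destination' parameter is given, blockproc is called without an output argument; the processed image exists only in the output file.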

Parallel Computing

Many image processing operations are inherently parallelizable, as they perform localized and independent computations. While many MATLAB math and Image Processing Toolbox functions leverage multiple cores implicitly, Parallel Computing Toolbox™ provides functionality for explicit parallelization. Notably, the parfor function lets you specify that a for-loop is to be processed in parallel on up to 12 processors on your local computer and on more processors when connected to a cluster.

The benefits of using parallel computing tools can be overshadowed by the overhead of passing large data, such as images, back and forth. For this reason, it is often helpful to remove visualizations from the processing loop. Parallelizing a loop in MATLAB can then be as easy as changing the keyword for to the keyword parfor. Parallel computing functionality can be used on clusters, clouds, and other nonlocal hardware resources using MATLAB Parallel Server™.

Parallel computing and block processing can easily be used together, since the processing of each block is independent. You can use blockproc to process an image with a pool of MATLAB workers by setting the 'UseParallel' parameter to true. Note, however, that running a job in parallel can add significant overhead to smaller computations. The potential benefits of parallelization improve as the computational expense of each block increases.
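As a rough sketch of combining the two, the call below runs the same block function serially and with 'UseParallel' set to true, then checks that the results agree. It assumes Parallel Computing Toolbox is installed and a worker pool is available; the block size and the per-block thresholding function are illustrative choices rather than recommendations.

```matlab
img = imresize(imread('rice.png'), 4);
fun = @(block_struct) im2bw(block_struct.data, graythresh(block_struct.data));

% Serial reference result.
serialResult = blockproc(img, [128 128], fun);

% Same call, with blocks handed to the worker pool
% (assumes Parallel Computing Toolbox and an available pool).
parallelResult = blockproc(img, [128 128], fun, 'UseParallel', true);

% Each block is processed independently, so the two results should match.
sameResult = isequal(serialResult, parallelResult);
```

For a per-block operation this cheap, the parallel call may well be slower than the serial one because of the data-transfer overhead noted above.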

Demonstrating the Performance Improvements: An Image Segmentation Example

In this example, the goal is to segment an image of rice grains (Figure 2). As we step through the problem, we will review the timing of certain function calls over several versions of MATLAB and compare different approaches to segmenting the image. (All timings come from the same computer, using MATLAB R2011a under 32-bit Windows 7.)

Figure 2. Original image of rice grains.
tic; img = imresize(imread('rice.png'),4); toc
Elapsed time is 0.014 seconds.
whos img
Name         Size                Bytes  Class
img       1024x1024            1048576  uint8

In this example, imread and imresize took 14 milliseconds. As the plot in Figure 1 showed, the imresize command is nearly 100 times faster in R2011a than in earlier releases.

The next step is to isolate the rice grains using segmentation. The simplest approach is to apply a threshold using im2bw. First, however, we must account for the fact that the background illumination varies across the image. To normalize the background, we apply a tophat filter and then use graythresh to calculate a global threshold value, which we apply uniformly to the image using im2bw.

im2 = imtophat(img,strel('disk',60));
bw = im2bw(im2, graythresh(im2));
Figure 3. Left: The speed-up in tophat filtering over eight releases, showing actual times rather than relative speeds. Center: Rice grain image after tophat filtering but before segmentation. Right: Rice grain image after segmentation.

The imtophat filtering plus im2bw segmentation ran in 0.594 seconds in R2011a; in R2009a it took more than 3 seconds. The speedup is partly due to the use of the imdilate and imerode functions within imtophat, both of which saw significant performance improvements in R2009b and R2010a.
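To see why faster imerode and imdilate speed up imtophat, it may help to write the top-hat filter out by hand. The sketch below is an illustration of the standard definition (the image minus its morphological opening), not a copy of the toolbox implementation.

```matlab
img = imresize(imread('rice.png'), 4);
se  = strel('disk', 60);

% Opening = erosion followed by dilation with the same structuring element.
opened = imdilate(imerode(img, se), se);

% Top-hat = original minus its opening. This removes the slowly varying
% background and keeps features smaller than the structuring element.
manualTophat = img - opened;

% Compare against the toolbox function.
matchesBuiltin = isequal(manualTophat, imtophat(img, se));
```

Nearly all of the work is in the erosion and dilation with the large disk, which is why improvements to those two functions carry over directly.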

Because tophat filtering can take a long time with very large images, we need an alternative to using a global threshold. One option is to use block processing, which lets us calculate a local threshold and perform segmentation separately for each block. While block processing uses a different algorithm, it achieves similar segmentation results.

In the following code, we block-process the original image using graythresh to calculate a unique threshold in each 128x128 block.

fun = @(block_struct) im2bw(block_struct.data,graythresh(block_struct.data));
segmented = blockproc(img, [128 128],fun,'bordersize',[3 3],...

The resulting image is almost identical to the globally thresholded image, but the code ran in 0.03 seconds using R2011a—nearly 20 times faster than the global threshold approach. These results show that it is often possible to come up with an algorithm that is more efficient at solving the problem simply by looking at the problem differently.
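The comparison can be reproduced along the following lines. This is a sketch under one stated assumption: the blockproc call is closed immediately after the 'BorderSize' argument, since the listing in the article is cut off at that point. Timings depend on the machine and release, so none are quoted here.

```matlab
img = imresize(imread('rice.png'), 4);

% Global approach: flatten the background, then apply one threshold.
tic
flat     = imtophat(img, strel('disk', 60));
bwGlobal = im2bw(flat, graythresh(flat));
tGlobal  = toc;

% Block approach: one threshold per 128x128 block.
% Assumption: no name-value arguments beyond 'BorderSize'.
fun = @(block_struct) im2bw(block_struct.data, graythresh(block_struct.data));
tic
bwBlock = blockproc(img, [128 128], fun, 'BorderSize', [3 3]);
tBlock  = toc;

% Fraction of pixels on which the two segmentations agree.
agreement = mean(bwGlobal(:) == bwBlock(:));
fprintf('global: %.3f s, block: %.3f s, agreement: %.1f%%\n', ...
    tGlobal, tBlock, 100*agreement);
```

The agreement figure gives a quick numerical check on the claim that the two results are almost identical, alongside a visual comparison.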

So far, we have been working with only one image file; however, most image processing workflows involve processing many images—sometimes thousands. To accelerate the processing of many images, we can use the parfor function.

To demonstrate, let's assume that we incorporated our image segmentation algorithm into the function MySegmentation:

function x = MySegmentation(imgName, parameters)
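The article does not show the body of MySegmentation. One possible body, offered purely as a hypothetical sketch, thresholds the image block by block and returns the number of connected regions as the scalar result. The field name parameters.blockSize and the choice of return value are assumptions made for illustration.

```matlab
function x = MySegmentation(imgName, parameters)
% Hypothetical body: count the connected regions left after block-wise
% thresholding. parameters.blockSize is an assumed field, e.g. [128 128].
img = imread(imgName);
if size(img, 3) == 3
    img = rgb2gray(img);   % threshold on intensity only
end
fun = @(block_struct) im2bw(block_struct.data, graythresh(block_struct.data));
bw  = blockproc(img, parameters.blockSize, fun);
cc  = bwconncomp(bw);
x   = cc.NumObjects;       % scalar, so it can be stored as x(ii) in a loop
end
```

Saved as MySegmentation.m, it could be called as x = MySegmentation('rice.png', struct('blockSize', [64 64])).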

We can process multiple user-selected image files by calling this function in a for-loop:

[filenames,pathname] = uigetfile('*.tif','Select file(s)','multiselect','on');
% Serial loop over files
x = zeros(numel(filenames),1);
for ii = 1:length(filenames)
    x(ii) = MySegmentation(fullfile(pathname,filenames{ii}), parameters);
end

With a call to matlabpool to start the parallel processes and a change from for to parfor, we can achieve an up-to-twelvefold increase in processing speed. This approach is particularly effective with a multiprocessor machine, such as a quad-core, dual-processor computer.

matlabpool open 12
% Parallel loop over files
x = zeros(numel(filenames),1);
parfor ii = 1:length(filenames)
    x(ii) = MySegmentation(fullfile(pathname,filenames{ii}), parameters);
end
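A self-contained variant of the parallel loop, runnable without a file dialog, might look like the sketch below. It is illustrative only: the temporary file names are hypothetical, the file list comes from dir rather than uigetfile, and a simple global threshold with a region count stands in for MySegmentation.

```matlab
% Create three small test files (hypothetical names) in a temporary folder.
folder = tempname;
mkdir(folder);
base = imread('rice.png');
for k = 1:3
    imwrite(base, fullfile(folder, sprintf('rice_%d.tif', k)));
end

% Build the file list without a dialog. With uigetfile, a single selection
% comes back as a char array, so it would need wrapping in a cell first.
listing   = dir(fullfile(folder, '*.tif'));
filenames = {listing.name};

% Each iteration is independent and writes only to x(ii), so the loop can
% be a parfor. Without Parallel Computing Toolbox it runs serially.
x = zeros(numel(filenames), 1);
parfor ii = 1:numel(filenames)
    img   = imread(fullfile(folder, filenames{ii}));
    bw    = im2bw(img, graythresh(img));
    cc    = bwconncomp(bw);
    x(ii) = cc.NumObjects;
end
```

The loop body keeps all visualization out of the iterations and returns only one scalar per image, which keeps the data passed back from each worker small.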

In this article, we have described several methods for handling large images and improving image processing performance in MATLAB. We also suggested ways that parallel processing can be used to speed up image processing code, and found that a block-processing algorithm for segmentation was faster than a global thresholding algorithm.

Published 2012 - 92003v00