How to find error in experimental data using MATLAB code?

Question

Safi ullah on 7 Aug 2018


Direct link to this question

https://nl.mathworks.com/matlabcentral/answers/413961-how-to-find-error-in-experimental-data-using-matlab-code

Commented: Safi ullah on 9 Aug 2018

Hi everyone, I downloaded raw experimental data and analysed it with a standard analysis software package (GUISDAP) to extract the electron density, Ne = 1×1000. Then I used the following MATLAB code to get the data from some specific columns:

A=Ne(1, 90:90:1000); %%this gives A=1×11

This code gives me every 90th data point. In this case, how do I estimate or minimise the error/noise in the data? Someone suggested that I find the standard deviation. I do not actually know what noise means in this case or how to deal with it. Any guidance will be appreciated. Thanks


Answer 1

Adam Danz on 7 Aug 2018


Direct link to this answer

https://nl.mathworks.com/matlabcentral/answers/413961-how-to-find-error-in-experimental-data-using-matlab-code#answer_331950

Edited: Adam Danz on 8 Aug 2018


"Someone suggested that I find the standard deviation. I do not actually know what noise means in this case or how to deal with it."

Are you asking what 'noise' means in regard to data or are you asking how to interpret the noise in your data?

Noise is just variability of a measurement. If I step on a digital scale 5 times my weight might be [79.9, 80.1, 80.0, 79.8, 80.1] kg. The variation might be explained in how I was centered on the scale each time, my posture, or the lousy mechanics within the scale. Of course I have a True (capital T) weight at any point in time but the 'noisy' scale is the only way to estimate that True weight and along with the estimates comes noise (variability).

For example, let's say my True weight (which isn't knowable) is 80.0001425458541254789 kg. The mean of my 5 measurements is 79.98 and the standard deviation is 0.13038.

x = [79.9, 80.1, 80.0, 79.8, 80.1]; 
m = mean(x)
s = std(x)    %  <- That's how to calculate standard deviation

If your data are normally distributed, ~68% of your data will be within +/- 1 standard deviation of the mean. If the scale continues to behave the same way and I quickly weigh myself 1000 more times, ~68% of the measurements will be between 79.85 and 80.11 kg.

upper = m + 1*s
lower = m - 1*s

If your data are not approximately normally distributed, you should not rely on standard deviations, but you could use a nonparametric method such as bootstrapped confidence intervals. It might be difficult to determine the distribution of your data since you only seem to have 11 data points in 'A'.
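As a rough illustration of the bootstrapped confidence interval mentioned above, here is a minimal sketch that uses only base MATLAB functions. The number of resamples (nBoot) and the simple percentile method are my own assumptions for the example, not a recommendation for any particular dataset, and five points are far too few for a trustworthy interval.

```matlab
% Percentile bootstrap for the mean of a small sample (base MATLAB only).
x = [79.9, 80.1, 80.0, 79.8, 80.1];   % the five scale readings from above
nBoot = 5000;                          % number of resamples (arbitrary choice)
n = numel(x);

% Each row of idx is one resample of indices, drawn with replacement
idx = randi(n, nBoot, n);
bootMeans = mean(x(idx), 2);           % mean of each resample (nBoot-by-1)

% 95% percentile interval: 2.5th and 97.5th percentiles of the bootstrap means
sortedMeans = sort(bootMeans);
ciLow  = sortedMeans(max(1, round(0.025 * nBoot)));
ciHigh = sortedMeans(round(0.975 * nBoot));

fprintf('mean = %.3f, 95%% bootstrap CI = [%.3f, %.3f]\n', mean(x), ciLow, ciHigh);
```

If the Statistics and Machine Learning Toolbox is available, its bootci function offers more refined interval types than this hand-rolled percentile version.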

I'm not certain that this explanation is what you were looking for so please feel free to follow up or tell me how far off base my interpretation of your noisy question was ;)

25 Comments

Adam Danz on 8 Aug 2018

Edited: Adam Danz on 8 Aug 2018

Given a vector of noisy data, you can't really 'remove' the noise unless you know what the signal should be in the first place. You can, however, clean up the data to produce a less noisy estimate of the signal. The more data you have, the better. You seem to have 20 data points which might be enough depending on what method you use. I'll list some ideas below.

Without seeing your data I can't recommend a particular method. You could embed a screenshot of plot(A) which might help.

If all 20 points in 'A' are 20 repetitions of the same measure (ie, I step on the scale 20 times to estimate my weight), just take the mean (if normally distributed) or median (otherwise).

If your 20 points in 'A' are a time series, you've got several options (even more than I have listed):

Do a moving average
or use a moving average filter
Smooth your data (examples)
or apply a filter
Detect outliers and remove them, maybe even prior to the steps above
Here's an example of an outlier detection and smoothing combo
Additional methods of outlier detection and smoothing
If you know what your data should look like, you could do curve fitting.
Don't remove the noise. Instead, plot the error using error bars and let the reader interpret it (this is the best solution in some cases)

These methods and the parameters you choose will provide different results which reinforces my statement that you really can't just remove noise - you can only reduce it to produce an estimate of the signal.
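To make two of the options above concrete, here is a small sketch that combines outlier removal with a moving average. The data are made up for illustration (they are not the poster's A), and the 2-std rule and 5-point window are arbitrary choices that would need to be justified for real data.

```matlab
% Hypothetical time series standing in for A: slow trend + noise + one spike
t = 1:20;
A = 0.5 + 0.01*t + 0.05*randn(1, 20);
A(12) = 2;                               % an artificial spike

% Step 1: flag points more than 2 std from the mean and drop them
isOut  = abs(A - mean(A)) > 2*std(A);
tClean = t(~isOut);
Aclean = A(~isOut);

% Step 2: 5-point moving average of the cleaned series (movmean needs R2016a+)
Asmooth = movmean(Aclean, 5);

fprintf('%d point(s) flagged as outliers\n', sum(isOut));
```

Changing the threshold or the window length changes Asmooth, which is the point made above: each choice yields a different estimate of the signal.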

What does it mean when people write "here only those data points are considered which are two times of standard deviation"?

First some background info. The standard deviation std() is a measure of dispersion. "Within one standard deviation" means all data that are between mean-std and mean+std. So if the mean is 20 and the std is 5, all data between 15 and 25 are within 1 std. If the data are normally distributed, 68.2% of the data will be within 1 std of the mean.

"Within 2 standard deviations" refers to all data between mean-2*std and mean+2*std. For a mean of 20 and std of 5, that's all data between 10 and 30 which will account for 95.4% of the data.

Finally, 3 standard deviations account for about 99.7% of the data. For intuition on these percentages, see the first figure in this wiki article.

To drive this message home, below is a figure showing 1000 random numbers pulled from a normal distribution with mean 20 and std 5. The histogram shows the distribution along the x axis and the vertical reference lines show the first 3 standard deviations.

The green lines show where the 2nd std falls. Data points along the x axis outside of those green lines are ignored if we're only considering data within 2 std of the mean.
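A figure like the one described can be reproduced along the following lines. This is only a sketch: the bin count and colours (other than green for the 2nd std) are guesses, and the lines are placed at the true mean and std rather than the sample estimates, which may differ from the original figure.

```matlab
mu = 20; sigma = 5;
x = mu + sigma*randn(1000, 1);           % 1000 draws from N(20, 5^2)

% Fraction of samples within 1, 2 and 3 std of the mean
frac = zeros(1, 3);
for k = 1:3
    frac(k) = mean(abs(x - mu) <= k*sigma);
end
fprintf('within 1/2/3 std: %.1f%% / %.1f%% / %.1f%%\n', 100*frac);

% Keep only the data within 2 std of the mean (everything else is ignored)
within2 = x(abs(x - mu) <= 2*sigma);

% Histogram with vertical reference lines at +/- 1, 2 and 3 std
histogram(x, 30); hold on
yl = ylim;
colors = 'rgm';                          % green marks the 2nd std
for k = 1:3
    for side = [-1 1]
        v = mu + side*k*sigma;
        plot([v v], yl, colors(k), 'LineWidth', 1.5);
    end
end
hold off
xlabel('value'); ylabel('count');
```

Because the sample is random, the printed fractions will only be close to the theoretical percentages, not equal to them.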

Safi ullah on 8 Aug 2018


@ Dear Adam Danz, thanks for the detailed guidance. I learned a lot from your answer but have still not reached the point I want. This time I want to explain my question clearly, step by step. [1]. I have different data sets from the same experiment. For each data set I used the same code

A=Ne(1, 90:90:1800); %%this gives A=1×20

[2]. Finally, I combined all the “A” vectors and got A=1×350. The plot(A) is given below, where A=1×350

I am confused about the following. (a) Do we need to find the standard deviation for each dataset, or only for the final A=1×350? (b) What does it simply mean to minimise noise in the data? In my case, of the mean, variance and standard deviation, which one should I use to show that this was the noise? (c) It would be helpful if you could explain the two sentences below, which I have taken from papers in my research field. The sentences are “The requirement for points to be included is that the value of Z is twice or more than that of the background noise.” and “In Figure 5, the Z is plotted only if the backscatter signal is larger than two times the standard deviation of the background signal”. Thanks

Adam Danz on 8 Aug 2018

Your question A)

I'm a bit confused since (a) has 20 elements and (A) has 350 elements but 350 is not divisible by 20.

Anyway, your question is impossible to answer without knowing what (a) or (A) represents. Is (A) a continuous signal that was split up into multiple datasets? Are all of the (a)s repeated measures of the same thing? I need more info about what (a) and/or (A) are. Also, what are the units of the x and y axes? For example, is x time (seconds)? Is y a spike rate (spikes / second)?

Your question B)

That again depends on what (a)/(A) are.

Your question C)

“The requirement for points to be included is that the value of Z is twice or more than that of the background noise.”

Without more info I can only guess by the verbiage that the data were z-scored, the background noise was estimated using standard deviation, and any data points that did not reach 2 z-scores were ignored.

"In Figure 5, the Z is plotted only if the backscatter signal is larger than two times the standard deviation of the background signal"

I explained this in my previous comment and I provided a plot to demonstrate this concept. Please read through that again. I don't know what the "Z" or the "backscatter signal" is but the data were only plotted if the data exceed 2 standard deviations of the mean.
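One possible reading of the second quoted sentence is a simple threshold on the signal. The sketch below assumes that a stretch of background-only samples is available to estimate the noise from; the variable names (background, signal, Zplot) and all the values are invented for illustration and may not match what the papers actually did.

```matlab
% Hypothetical data: a background-only stretch and a few measured values
background = 0.1*randn(1, 200);          % samples assumed to contain noise only
signal = [0.05 0.4 0.15 0.9 0.1 0.3];    % made-up backscatter values

% "larger than two times the standard deviation of the background signal"
threshold = 2*std(background);
keep = signal > threshold;

% Replace rejected points with NaN so that plot() skips them
Zplot = signal;
Zplot(~keep) = NaN;

fprintf('threshold = %.3f, kept %d of %d points\n', threshold, sum(keep), numel(signal));
```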

Adam Danz on 8 Aug 2018

Q1) No. Given a vector of noisy data and no other data, you cannot know what is noise and what is signal. In fact, all of it is noise unless you have more information about the data. You can only estimate what the signal is amid the noise. In order to estimate that signal, you may want to get rid of outliers. But then the question is, what is an outlier? As an experimenter or analyst, you must define what an outlier is on a case-by-case basis. For example, "all data outside of 2 std of the mean are outliers". The rest of the data can be considered to estimate the signal.

In some cases you may have more information which will help you estimate the signal from the noise. Let's say that your data (A) is from some device with a known minimum output of about 0.3 and you measured the output with a noisy sensor. Looking at your plot, you could eliminate all data under 0.3 and you can confidently call that "noise". The rest of the data, however, is not purely "signal". It's "signal + noise".
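The known-minimum example can be written in a couple of lines. The values below are invented, not taken from the plot in this thread, and the 0.3 floor is the assumed device minimum from the example above.

```matlab
A = [0.42 0.05 0.38 0.51 0.12 0.47];     % made-up sensor readings
floorValue = 0.3;                         % assumed known minimum output of the device

belowFloor = A < floorValue;              % readings the device could not have produced
Akept = A(~belowFloor);                   % what remains is still signal + noise

fprintf('discarded %d reading(s) below %.1f\n', sum(belowFloor), floorValue);
```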

Q2) Can you use std() for what purpose? std() can be used to set thresholds that identify outliers but I'm not sure that's what you want to do. If you attach a mat file with the (A) variable and clarify what you want to do, I'd be glad to continue helping you. I hope it's clear, though, why you can't just separate signal and noise unless you have more information about what the signal should be.

Adam Danz on 8 Aug 2018

Edited: Adam Danz on 8 Aug 2018

Data.m

Hi Safi, poking around with your data was a good idea and I have some more feedback for you.

First, you say two different things in your most recent comment. You say you want to 'reduce the noise' from A (ie, smoothing) and then later you say you want to remove outliers. So I'm still not quite sure which one you want to do, or both.

Judging from your last sentence, this is what I understand: you want to detect outliers in A and remove them, and you consider the outliers to be the spikes in your data.

Open the embedded plot by right-clicking and opening it in a new tab (it will be bigger).

Your data (A) is plotted in the first subplot below and the distribution of (A) is plotted in the 2nd subplot. 'A' is definitely not from a normal distribution even though you said it was normal in your first comment under my answer.

Nevertheless, in the 1st and 2nd subplots I show you the mean and the 1st and 2nd standard deviations from the mean (see legend). You could choose one of these as a threshold for this dataset since the mean is so far from the spikes.

If you're using MATLAB R2017a or later, please read the documentation for the isoutlier() function.

Using that function, I identified outliers greater than 2 std from the mean of (A) and labeled them with blue dots in the first subplot. Then I re-plotted (A) without the outliers which is shown in the 3rd subplot. Note that the mean of (A) has changed since the outliers are no longer pulling it.

I attached the code used to produce these plots.

To answer your questions explicitly, if you remove the outliers as I have done, you have removed what you perceive to be some of the noise which gives you a better estimate of the signal.
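For readers without the attachment, the 2-std outlier step can be sketched as follows. This is not the attached Data.m, and the vector below is invented to mimic a flat signal with two spikes. The manual rule is written out so it runs on older releases; on R2017a or later, isoutlier(A, 'mean', 'ThresholdFactor', 2) should flag the same points.

```matlab
% Made-up data: a roughly flat signal with two spikes
A = [0.31 0.29 0.33 0.30 2.10 0.32 0.28 0.31 0.30 1.95 ...
     0.29 0.33 0.30 0.31 0.32 0.29 0.30 0.31 0.28 0.33];

% Flag points more than 2 standard deviations from the mean
isOut = abs(A - mean(A)) > 2*std(A);

% Remove the flagged points and compare the mean before and after
Aclean = A(~isOut);
fprintf('mean with outliers:    %.3f\n', mean(A));
fprintf('mean without outliers: %.3f\n', mean(Aclean));
```

The mean drops once the spikes are removed, which mirrors the shift described for the 3rd subplot.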

Adam Danz on 9 Aug 2018

Hi Safi, your goal is very unclear to us and I think that's partially because it's unclear to you. Methods of reducing noise and detecting outliers differ depending on what type of data you're working with and you've been unclear about what your data is, what you perceive to be noise, and your goal (detect outliers, smooth the data, etc).

I can't give any recommendations to you before you can explicitly and precisely describe your data and your goal. Providing a picture, as @ImageAnalyst recommended, is a good idea.

I recommend reading some basic chapters on "signal and noise" and "outliers". There's lots of free literature and videos out there if you search for it.

I just want to reiterate a final point I've made several times and I hope this helps with your understanding of the basic concept of noisy data. The plot below shows a noisy sine curve in blue. The red line is the actual sine curve but we cannot access that data because our sensors were noisy -- we only have access to the blue noisy signal. The 4 different subplots show different levels of noise. You can see that the signal is more identifiable in the last subplot because it is less noisy. There are several ways we can estimate the signal (the red line) but we can't simply "remove" the noise because the blue line is all we've got to work with.
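The noisy sine example can be imitated with the sketch below. The four noise levels and the 15-point moving average are arbitrary assumptions; the moving average is just one of several ways to estimate the signal, and it needs R2016a or later for movmean.

```matlab
t = linspace(0, 2*pi, 200);
trueSignal = sin(t);                      % the "red line": unknown in a real experiment
noiseLevels = [1.0 0.5 0.25 0.1];         % arbitrary noise std values, noisiest first

rmsRaw = zeros(size(noiseLevels));        % error of the raw noisy data
rmsEst = zeros(size(noiseLevels));        % error of the smoothed estimate
for k = 1:numel(noiseLevels)
    noisy = trueSignal + noiseLevels(k)*randn(size(t));   % the "blue line"
    estimate = movmean(noisy, 15);                         % one possible estimate of the signal

    rmsRaw(k) = sqrt(mean((noisy - trueSignal).^2));
    rmsEst(k) = sqrt(mean((estimate - trueSignal).^2));

    subplot(2, 2, k);
    plot(t, noisy, 'b', t, trueSignal, 'r', t, estimate, 'k');
    title(sprintf('noise std = %.2f', noiseLevels(k)));
end
```

The errors can only be computed here because the true sine is known in a simulation; with real measurements there is nothing to compare the estimate against.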

Now consider this example where our sensor was quite precise and our signal was clean except for a burst of noise. In this example, it is reasonable to attempt to "remove" the noise from the signal. We could interpolate the window of noise or we could use curve-fitting or other techniques.
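Interpolating over a known window of noise might look like this. The burst location (samples 80 to 100) is assumed to be known in advance, which is the hard part in practice, and linear interpolation is only one choice (interp1 also accepts 'pchip' or 'spline').

```matlab
t = linspace(0, 2*pi, 200);
y = sin(t);                               % clean signal from a precise sensor
burst = 80:100;                           % hypothetical indices of the noisy window
y(burst) = y(burst) + 0.8*randn(size(burst));

% Mark the samples we trust, then fill the burst from its clean neighbours
good = true(size(t));
good(burst) = false;
yFixed = y;
yFixed(burst) = interp1(t(good), y(good), t(burst), 'linear');

fprintf('max error after interpolation: %.3f\n', max(abs(yFixed - sin(t))));
```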

Lastly, consider this example where our sensor was a little noisy but there were several samples that were far from the rest of the distribution. These are outliers (marked with a red 'x') and are fairly easy to detect and remove, although the method will vary depending on the distribution of your data and how you decide to define an outlier.

I hope this helps conceptually and I recommend spending some time with the literature or videos so these concepts become clearer to you. Even 1/2 a day of learning will make a big difference.

Adam Danz on 9 Aug 2018

My pleasure. Be sure to read the documentation on isoutlier() and choose parameter values that make sense for your data and that you can explain in your paper.

Safi ullah on 9 Aug 2018

@ Dear Adam, sure I will.
