Large datasets: Any way to perform regression analyses on select variables within a large table based on row name?
4 views (last 30 days)
Show older comments
I have a large dataset of soil profiles. I am trying to calculate regressions of organic carbon and profile depth. The data set is a csv with columns for 'profile_name', 'top_depth', 'bottom_depth' and 'organic_carbon'. There are other columns for spatial data that I shouldn't have to mention.
The data is organized so there are multiple rows for one profile, so the 'profile name' value is the same for anywhere from 2 to 10 rows while the 'top_depth' and 'bottom_depth' change to reflect the sample interval within the soil profile, and the 'organic_carbon' represents how much carbon is in the soil.
What I want to do is write a script that will run linear and/or logarithmic regressions of 'organic carbon' and the 'Bottom Depth' values within each distinct 'profile_name'. I might want to go further with some calculations but I think that would be the best start. The hurdle for me is sort of binning the profile data by 'profile_name'. Any clues would be greatly appreciated!
0 Comments
Answers (2)
Tom Lane
on 5 Jan 2013
If you have the Statistics Toolbox, you might find it handy to use "dataset" to read in the csv file and create a dataset array from it. Then I recommend that you convert the profile_name variable to a nominal variable. The following illustrates how you can operate on different subsets based on values of a nominal variable:
d = dataset(Origin,Displacement,Weight); % you would read this from a file
d.Origin = nominal(d.Origin); % convert text to nominal
org = unique(d.Origin);
for j=1:length(org);
t = d.Origin==org(j);
p = polyfit(d.Weight(t),d.Displacement(t),1);
fprintf('%s: %s\n',char(org(j)),num2str(p))
end
0 Comments
See Also
Categories
Find more on Linear Regression in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!