Accelerate Linear Model Fitting on GPU
This example shows how you can accelerate regression model fitting by running functions on a graphics processing unit (GPU). The example compares the time required to fit a model on a central processing unit (CPU) with the time required to fit the same model on a GPU. Using a GPU requires Parallel Computing Toolbox™ and a supported GPU device. For information about supported devices, see GPU Computing Requirements (Parallel Computing Toolbox).
Create a table of airline sample data from the file airlinesmall.csv by using the readtable function. Remove table rows corresponding to cancelled flights, and convert UniqueCarrier to a categorical variable by using the categorical function.
A = readtable("airlinesmall.csv");
A = A(A.Cancelled~=1,:);
A.UniqueCarrier = categorical(A.UniqueCarrier)
A=121171×29 table
Year Month DayofMonth DayOfWeek DepTime CRSDepTime ArrTime CRSArrTime UniqueCarrier FlightNum TailNum ActualElapsedTime CRSElapsedTime AirTime ArrDelay DepDelay Origin Dest Distance TaxiIn TaxiOut Cancelled CancellationCode Diverted CarrierDelay WeatherDelay NASDelay SecurityDelay LateAircraftDelay
____ _____ __________ _________ _______ __________ _______ __________ _____________ _________ _______ _________________ ______________ _______ ________ ________ _______ _______ ________ ______ _______ _________ ________________ ________ ____________ ____________ ________ _____________ _________________
1987 10 21 3 642 630 735 727 PS 1503 {'NA'} 53 57 {'NA'} 8 12 {'LAX'} {'SJC'} 308 {'NA'} {'NA'} 0 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'}
1987 10 26 1 1021 1020 1124 1116 PS 1550 {'NA'} 63 56 {'NA'} 8 1 {'SJC'} {'BUR'} 296 {'NA'} {'NA'} 0 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'}
1987 10 23 5 2055 2035 2218 2157 PS 1589 {'NA'} 83 82 {'NA'} 21 20 {'SAN'} {'SMF'} 480 {'NA'} {'NA'} 0 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'}
1987 10 23 5 1332 1320 1431 1418 PS 1655 {'NA'} 59 58 {'NA'} 13 12 {'BUR'} {'SJC'} 296 {'NA'} {'NA'} 0 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'}
1987 10 22 4 629 630 746 742 PS 1702 {'NA'} 77 72 {'NA'} 4 -1 {'SMF'} {'LAX'} 373 {'NA'} {'NA'} 0 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'}
1987 10 28 3 1446 1343 1547 1448 PS 1729 {'NA'} 61 65 {'NA'} 59 63 {'LAX'} {'SJC'} 308 {'NA'} {'NA'} 0 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'}
1987 10 8 4 928 930 1052 1049 PS 1763 {'NA'} 84 79 {'NA'} 3 -2 {'SAN'} {'SFO'} 447 {'NA'} {'NA'} 0 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'}
1987 10 10 6 859 900 1134 1123 PS 1800 {'NA'} 155 143 {'NA'} 11 -1 {'SEA'} {'LAX'} 954 {'NA'} {'NA'} 0 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'}
1987 10 20 2 1833 1830 1929 1926 PS 1831 {'NA'} 56 56 {'NA'} 3 3 {'LAX'} {'SJC'} 308 {'NA'} {'NA'} 0 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'}
1987 10 15 4 1041 1040 1157 1155 PS 1864 {'NA'} 76 75 {'NA'} 2 1 {'SFO'} {'LAS'} 414 {'NA'} {'NA'} 0 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'}
1987 10 15 4 1608 1553 1656 1640 PS 1907 {'NA'} 48 47 {'NA'} 16 15 {'LAX'} {'FAT'} 209 {'NA'} {'NA'} 0 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'}
1987 10 21 3 949 940 1055 1052 PS 1939 {'NA'} 66 72 {'NA'} 3 9 {'LGB'} {'SFO'} 354 {'NA'} {'NA'} 0 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'}
1987 10 22 4 1902 1847 2030 1951 PS 1973 {'NA'} 88 64 {'NA'} 39 15 {'LAX'} {'OAK'} 337 {'NA'} {'NA'} 0 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'}
1987 10 16 5 1910 1838 2052 1955 TW 19 {'NA'} 162 137 {'NA'} 57 32 {'STL'} {'DEN'} 770 {'NA'} {'NA'} 0 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'}
1987 10 2 5 1130 1133 1237 1237 TW 59 {'NA'} 187 184 {'NA'} 0 -3 {'STL'} {'PHX'} 1262 {'NA'} {'NA'} 0 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'}
1987 10 30 5 1400 1400 1920 1934 TW 102 {'NA'} 200 214 {'NA'} -14 0 {'SNA'} {'STL'} 1570 {'NA'} {'NA'} 0 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'}
⋮
The table A contains data for 121,171 flights. The table variables Year, Month, and DayofMonth contain the year, month, and day of the month on which each flight departed, respectively. ArrDelay contains the delay in minutes between each flight's scheduled and actual arrival time. UniqueCarrier identifies the airline that operated each flight.
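Before fitting, it can help to check how many rows have a usable response value. The following is a minimal sketch, assuming that readtable imports the NA entries of ArrDelay as NaN (the numeric ArrDelay column in the display above suggests this, but it is not verified here). The variable names nMissing and nUsable are illustrative only.

```matlab
% Sketch: count rows whose response value is missing.
% Assumption: NA entries in ArrDelay are imported as NaN.
A = readtable("airlinesmall.csv");
A = A(A.Cancelled~=1,:);                 % drop cancelled flights, as in the example
nMissing = sum(ismissing(A.ArrDelay));   % rows with no recorded arrival delay
nUsable = height(A) - nMissing           % rows with an observed arrival delay
```

The fitlm function omits rows with missing values, so a nonzero nMissing would be consistent with the model summary later in this example reporting 120866 observations for a table with 121171 rows.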
Measure Time to Fit Linear Model on CPU
Measure the time required to fit a linear regression model on a CPU to the predictor variables Year, Month, DayofMonth, and UniqueCarrier, and the response variable ArrDelay. This example uses an Intel(R) Xeon(R) CPU E5-2623 v4 @ 2.60GHz.
Create a second table from the variables Year, Month, DayofMonth, UniqueCarrier, and ArrDelay in the table A.
tblCPU = table(A.Year,A.Month,A.DayofMonth,A.UniqueCarrier,A.ArrDelay, ...
    VariableNames=["Year" "Month" "DayofMonth" "UniqueCarrier" "ArrDelay"])
tblCPU=121171×5 table
Year Month DayofMonth UniqueCarrier ArrDelay
____ _____ __________ _____________ ________
1987 10 21 PS 8
1987 10 26 PS 8
1987 10 23 PS 21
1987 10 23 PS 13
1987 10 22 PS 4
1987 10 28 PS 59
1987 10 8 PS 3
1987 10 10 PS 11
1987 10 20 PS 3
1987 10 15 PS 2
1987 10 15 PS 16
1987 10 21 PS 3
1987 10 22 PS 39
1987 10 16 TW 57
1987 10 2 TW 0
1987 10 30 TW -14
⋮
Create an anonymous function that uses the fitlm function to fit a linear regression model to the variables in tblCPU. Measure the time required to run the anonymous function by using the timeit function.
cpufit = @() fitlm(tblCPU,CategoricalVars=4);
tcpu = timeit(cpufit)
tcpu = 0.3605
tcpu contains the time, in seconds, required to fit the linear regression model on the CPU.
Measure Time to Fit Linear Model on GPU
Verify that a GPU device is available by using the gpuDevice (Parallel Computing Toolbox) function.
gpuDevice
ans =
  CUDADevice with properties:
    Name: 'NVIDIA GeForce RTX 2080 SUPER'
    Index: 1
    ComputeCapability: '7.5'
    SupportsDouble: 1
    GraphicsDriverVersion: '512.15'
    DriverModel: 'WDDM'
    ToolkitVersion: 11.2000
    MaxThreadsPerBlock: 1024
    MaxShmemPerBlock: 49152 (49.15 KB)
    MaxThreadBlockSize: [1024 1024 64]
    MaxGridSize: [2.1475e+09 65535 65535]
    SIMDWidth: 32
    TotalMemory: 8589606912 (8.59 GB)
    AvailableMemory: 6979084692 (6.98 GB)
    CachePolicy: 'balanced'
    MultiprocessorCount: 48
    ClockRateKHz: 1815000
    ComputeMode: 'Default'
    GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 1
    CanMapHostMemory: 1
    DeviceSupported: 1
    DeviceAvailable: 1
    DeviceSelected: 1
The output shows that this example uses an NVIDIA GeForce RTX 2080 SUPER GPU.
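If the same script must also run on machines without a GPU, the availability check can be made programmatic. The following is a minimal sketch that uses the canUseGPU function to pick a code path. The variable names useGPU and d are illustrative, and the messages are placeholders rather than part of the original example.

```matlab
% Sketch: choose a code path depending on GPU availability.
useGPU = canUseGPU;   % true if a supported GPU and the required software are available
if useGPU
    d = gpuDevice;    % query the currently selected device
    fprintf("Using GPU: %s\n",d.Name);
else
    disp("No supported GPU found; running on the CPU.");
end
```

Guarding on useGPU lets later steps fall back to the CPU table instead of erroring when gpuArray is unavailable.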
Create a third table, tblGPU, as a copy of tblCPU. Copy the table's numeric and logical variables to GPU memory by using the gpuArray (Parallel Computing Toolbox) function.
tblGPU = tblCPU;
for ii = 1:width(tblCPU)
    % Move only numeric and logical variables to the GPU
    if isnumeric(tblCPU.(ii)) || islogical(tblCPU.(ii))
        tblGPU.(ii) = gpuArray(tblCPU.(ii));
    end
end
tblGPU
tblGPU=121171×5 table
Year Month DayofMonth UniqueCarrier ArrDelay
____ _____ __________ _____________ ________
1987 10 21 PS 8
1987 10 26 PS 8
1987 10 23 PS 21
1987 10 23 PS 13
1987 10 22 PS 4
1987 10 28 PS 59
1987 10 8 PS 3
1987 10 10 PS 11
1987 10 20 PS 3
1987 10 15 PS 2
1987 10 15 PS 16
1987 10 21 PS 3
1987 10 22 PS 39
1987 10 16 TW 57
1987 10 2 TW 0
1987 10 30 TW -14
⋮
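The tblGPU table contains gpuArray objects, each representing an array stored in GPU memory. The categorical variable UniqueCarrier is neither numeric nor logical, so the loop leaves it in CPU memory.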
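To confirm which table variables ended up on the GPU, you can test the class of each variable. The following is a minimal sketch on a small hypothetical table T (not the airline data), so that it also runs on a machine without a GPU. The names T, x, g, and onGPU are assumptions made for illustration.

```matlab
% Sketch: report which table variables are stored as gpuArray objects.
% T is a small hypothetical table, not the airline data.
T = table([1;2;3],categorical(["a";"b";"a"]),VariableNames=["x" "g"]);
if canUseGPU
    T.x = gpuArray(T.x);   % move the numeric variable to the GPU when possible
end
% One logical value per variable: true if the variable is a gpuArray
onGPU = varfun(@(v) isa(v,"gpuArray"),T,OutputFormat="uniform")
```

Applied to tblGPU, the same varfun call would be expected to return true for the numeric variables and false for UniqueCarrier.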
When you pass gpuArray inputs to the fitlm function, it automatically runs on the GPU. Measure the time required to fit the regression model on the GPU by using the gputimeit (Parallel Computing Toolbox) function. This function is preferable to timeit because it ensures that all operations on the GPU are complete before it records the time. The function also compensates for the overhead of working on a GPU.
gpufit = @() fitlm(tblGPU,CategoricalVars=4);
tgpu = gputimeit(gpufit)
tgpu = 0.1167
tgpu contains the time, in seconds, required to fit the linear regression model on the GPU. On this hardware, the GPU fit is over three times faster than the CPU fit.
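The speedup is the ratio of the two measured times. The following sketch reproduces the arithmetic with the timings reported in this example. The values are hard-coded here for illustration, and timings on other hardware will differ.

```matlab
% Sketch: compute the CPU-to-GPU speedup from the timings reported above.
tcpu = 0.3605;          % seconds, CPU fit (value reported in this example)
tgpu = 0.1167;          % seconds, GPU fit (value reported in this example)
speedup = tcpu/tgpu     % ratio of CPU time to GPU time, approximately 3.09
```

In a live session, use the tcpu and tgpu values returned by timeit and gputimeit instead of the hard-coded numbers.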
Determine Statistical Significance
To investigate the statistical significance of the linear regression model terms, fit the regression model on the GPU.
mdl = fitlm(tblGPU,CategoricalVars=4)
mdl =
Linear regression model:
    ArrDelay ~ 1 + Year + Month + DayofMonth + UniqueCarrier

Estimated Coefficients:
                            Estimate      SE          tStat       pValue
                            _________     ________    ________    __________
    (Intercept)             -178.29       33.358      -5.3447     9.0739e-08
    Year                    0.091448      0.0166      5.5089      3.6173e-08
    Month                   -0.076407     0.02558     -2.987      0.0028176
    DayofMonth              0.037318      0.010005    3.7301      0.00019147
    UniqueCarrier_AA        2.4741        1.3894      1.7807      0.074968
    UniqueCarrier_AQ        -4.0286       2.8177      -1.4297     0.1528
    UniqueCarrier_AS        3.4784        1.4799      2.3505      0.018749
    UniqueCarrier_B6        6.6857        1.7369      3.8493      0.00011853
    UniqueCarrier_CO        2.623         1.4094      1.861       0.062748
    UniqueCarrier_DH        2.5145        1.7972      1.3991      0.16177
    UniqueCarrier_DL        3.0716        1.3883      2.2125      0.026932
    UniqueCarrier_EA        4.6196        1.7333      2.6652      0.0076942
    UniqueCarrier_EV        4.8318        1.5507      3.1159      0.0018342
    UniqueCarrier_F9        3.1991        2.1563      1.4836      0.13792
    UniqueCarrier_FL        4.2868        1.6087      2.6647      0.0077066
    UniqueCarrier_HA        -6.7512       2.2984      -2.9374     0.0033104
    UniqueCarrier_HP        3.2339        1.4606      2.2141      0.026822
    UniqueCarrier_ML (1)    -3.7757       3.9288      -0.96105    0.33653
    UniqueCarrier_MQ        3.7391        1.4448      2.5881      0.009653
    UniqueCarrier_NW        0.93463       1.3994      0.66788     0.50421
    UniqueCarrier_OH        2.5523        1.5813      1.6141      0.10651
    UniqueCarrier_OO        0.65096       1.4665      0.44388     0.65713
    UniqueCarrier_PA (1)    1.6532        2.2161      0.74601     0.45566
    UniqueCarrier_PI        7.6092        1.7396      4.3741      1.2206e-05
    UniqueCarrier_PS        1.854         3.6504      0.50789     0.61153
    UniqueCarrier_TW        3.2357        1.4622      2.213       0.026902
    UniqueCarrier_TZ        -3.1902       2.4863      -1.2831     0.19946
    UniqueCarrier_UA        3.9331        1.3927      2.8241      0.0047426
    UniqueCarrier_US        2.4076        1.3928      1.7285      0.083898
    UniqueCarrier_WN        0.73928       1.3833      0.53442     0.59305
    UniqueCarrier_XE        3.6054        1.4992      2.405       0.016176
    UniqueCarrier_YV        7.0361        1.7228      4.0842      4.426e-05

Number of observations: 120866, Error degrees of freedom: 120834
Root Mean Squared Error: 30.5
R-squared: 0.00259, Adjusted R-Squared: 0.00233
F-statistic vs. constant model: 10.1, p-value = 2.38e-48
mdl contains the formula for the linear regression model and statistics about the estimated model coefficients. The table output contains a row for the intercept, a row for each continuous term, and a row for each value in UniqueCarrier other than the reference category. You can determine if a term or value has a statistically significant effect on the arrival delay by comparing its p-value to the significance level of 0.05.
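Rather than scanning the coefficient table by eye, you can filter it by p-value. The following is a minimal sketch that refits the same model on the CPU, so that it does not require a GPU, and keeps the rows of the Coefficients table with p-values below 0.05. The names tbl, mdlCPU, coefs, and sigTerms are illustrative.

```matlab
% Sketch: list the coefficients whose p-values fall below 0.05.
A = readtable("airlinesmall.csv");
A = A(A.Cancelled~=1,:);
A.UniqueCarrier = categorical(A.UniqueCarrier);
tbl = A(:,["Year" "Month" "DayofMonth" "UniqueCarrier" "ArrDelay"]);
mdlCPU = fitlm(tbl,CategoricalVars=4);    % CPU fit; last variable is the response
coefs = mdlCPU.Coefficients;              % table with Estimate, SE, tStat, pValue
sigTerms = coefs(coefs.pValue < 0.05,:)   % rows significant at the 0.05 level
```

The same indexing should work on the Coefficients property of a model fitted from tblGPU, although the values may need to be gathered from the GPU first.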
To determine whether UniqueCarrier contains a value that has a statistically significant effect on the arrival delay, perform an ANOVA with a 95% confidence level by using the anova function. When you pass gpuArray inputs to the anova function, it automatically runs on the GPU.
aov = anova(mdl)
aov=5×5 table
SumSq DF MeanSq F pValue
__________ __________ ______ ______ __________
Year 28309 1 28309 30.348 3.6173e-08
Month 8322.7 1 8322.7 8.9223 0.0028176
DayofMonth 12979 1 12979 13.914 0.00019147
UniqueCarrier 2.4305e+05 28 8680.5 9.3059 1.6331e-39
Error 1.1271e+08 1.2083e+05 932.79
aov contains the results of the ANOVA. The p-value in the row corresponding to UniqueCarrier is smaller than the significance level of 0.05, indicating that at least one value in UniqueCarrier has a statistically significant effect on the arrival delay.
See Also
gpuDevice (Parallel Computing Toolbox) | gputimeit (Parallel Computing Toolbox) | timeit | fitlm | anova
Related Topics
- Analyze and Model Data on GPU
- Measure and Improve GPU Performance (Parallel Computing Toolbox)
- Run MATLAB Functions on a GPU (Parallel Computing Toolbox)