## Valuation with Missing Data

### Introduction

The Capital Asset Pricing Model (CAPM) is a venerable but often maligned tool to characterize comovements between asset and market prices. Although many issues arise in CAPM implementation and interpretation, one problem that practitioners face is to estimate the coefficients of the CAPM with incomplete stock price data.

The following example shows how to use the missing data regression functions to estimate the coefficients of the CAPM.

### Capital Asset Pricing Model

Given a host of assumptions that can be found in the references
(see Sharpe [11], Lintner [6], Jarrow [5], and Sharpe et al. [12]),
the CAPM concludes that asset returns have a linear relationship with
market returns. Specifically, given the return of all stocks that
constitute a market, denoted as $M$, and the return
of a riskless asset, denoted as $C$, the CAPM states
that the return $R_i$ of each asset in the market has the expectational form

$$E[{R}_{i}]={\alpha}_{i}+C+{\beta}_{i}(E[M]-C)$$

for assets $i = 1, \ldots, n$,
where $\beta_i$ is a parameter
that specifies the degree of comovement between a given asset and
the underlying market. In other words, apart from the intercept $\alpha_i$, the expected return of each
asset is equal to the return on a riskless asset plus a risk-adjusted
expected market return net of riskless asset returns. The parameters
$\beta_1, \ldots, \beta_n$ are called asset betas.

The beta of an asset has the form

$${\beta}_{i}=\frac{\mathrm{cov}\left({R}_{i},M\right)}{\mathrm{var}\left(M\right)},$$

which is the covariance between asset and market returns divided by the variance
of market returns. *Beta* describes the price volatility of a financial
instrument relative to the price volatility of a market or index as a whole. Beta is
commonly used with respect to equities. A high-beta instrument carries more market
(systematic) risk than a low-beta instrument. If an asset has a beta of 1, the asset is said to move with the
market; if an asset has a beta greater than 1, the asset is said to be more volatile than
the market. Conversely, if an asset has a beta less than 1, the asset is said to be less
volatile than the market.
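The following MATLAB snippet is a minimal sketch of the beta formula. It computes $\beta_i$ as the sample covariance of asset and market returns divided by the sample variance of market returns. The return vectors `M` and `R` are hypothetical values chosen for illustration and do not come from any dataset in this example. They are constructed so that the asset moves 1.5-for-1 with the market. The covariance and variance are written out explicitly so that each part of the ratio is visible.

```matlab
% Hypothetical daily returns (illustration only): market M and one asset R,
% built so that R = 0.0005 + 1.5*M exactly.
M = [0.010; -0.020; 0.015; 0.005; -0.010];
R = 0.0005 + 1.5 * M;

n = numel(M);

% Sample covariance between asset and market returns (n-1 normalization)
covRM = sum((R - mean(R)) .* (M - mean(M))) / (n - 1);

% Sample variance of market returns (same normalization, so it cancels)
varM = sum((M - mean(M)).^2) / (n - 1);

% Beta is the ratio cov(R, M) / var(M)
beta = covRM / varM;
fprintf('beta = %.4f\n', beta);
```

This prints `beta = 1.5000`. The constant 0.0005 shifts `R` without changing how it comoves with `M`, so it affects neither the covariance nor the beta. In a regression it would appear in the intercept (alpha) instead.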

### Estimation of the CAPM

The standard CAPM is a linear model with additional parameters
for each asset to characterize residual errors. For each of $n$ assets
with $m$ samples of observed asset returns $R_{k,i}$,
market returns $M_k$, and riskless asset returns $C_k$, the estimation model has the form

$${R}_{k,i}={\alpha}_{i}+{C}_{k}+{\beta}_{i}({M}_{k}-{C}_{k})+{V}_{k,i}$$

for samples $k = 1, \ldots, m$ and
assets $i = 1, \ldots, n$. Here $\alpha_i$ is
a parameter that specifies the nonsystematic return of an asset,
$\beta_i$ is the asset beta, and $V_{k,i}$ is the residual error for each asset with associated random variable $V_i$.

The parameters $\alpha_1, \ldots, \alpha_n$ are
called asset alphas. The strict form of the CAPM specifies that alphas
must be zero and that deviations from zero are the result of temporary
disequilibria. In practice, however, assets may have nonzero alphas,
and much of active investment management is devoted to the search
for assets with exploitable nonzero alphas.

To allow for the possibility of nonzero alphas, the estimation model generally seeks to estimate alphas and to test whether they are statistically different from zero.
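For a single asset with complete data, the estimation model is an ordinary linear regression. The dependent variable is the excess return $R_{k,i} - C_k$, and the regressors are an intercept and the excess market return $M_k - C_k$. Under normal residuals, the maximum likelihood estimates of $\alpha_i$ and $\beta_i$ coincide with least squares. The sketch below uses plain MATLAB with no toolbox calls. It estimates both coefficients and forms the $t$-statistics used to test whether they differ from zero. The series `M`, `C`, and `V` are synthetic assumptions for illustration only. They are built from deterministic sine and cosine terms so that the result is reproducible. The sketch uses the unbiased residual variance, which divides by $m - 2$. A maximum likelihood estimate would divide by $m$ instead.

```matlab
% Synthetic data (hypothetical): m daily observations
m = 250;
k = (1:m)';
C = 0.0001 * ones(m, 1);                 % riskless returns C_k
M = 0.0002 + 0.01 * sin(0.7 * k);        % market returns M_k
V = 0.002 * cos(1.3 * k);                % residual errors V_k
R = 0.0003 + C + 1.2 * (M - C) + V;      % true alpha = 0.0003, beta = 1.2

% Regress excess asset returns on [1, excess market returns]
y = R - C;
X = [ones(m, 1), M - C];
b = X \ y;                               % least squares: b = [alpha; beta]

% Residual variance and standard errors of the coefficients
res = y - X * b;
s2 = (res' * res) / (m - 2);             % unbiased residual variance
covb = s2 * ((X' * X) \ eye(2));         % coefficient covariance matrix
se = sqrt(diag(covb));

tstat = b ./ se;                         % t-statistics for alpha and beta
fprintf('alpha = %.5f (t = %.2f)\n', b(1), tstat(1));
fprintf('beta  = %.4f (t = %.2f)\n', b(2), tstat(2));
```

The design matrix `X` has the same two columns that the per-asset regressions later in this example place in `TestDesign`.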

The residual errors $V_i$ are
assumed to have moments

$$E\left[{V}_{i}\right]=0$$

and

$$E\left[{V}_{i}{V}_{j}\right]={S}_{ij}$$

for assets $i, j = 1, \ldots, n$,
where the parameters $S_{11}, \ldots, S_{nn}$ are
called residual or nonsystematic variances/covariances.

The square root of the residual variance of each asset, that is, $\sqrt{S_{ii}}$ for $i = 1, \ldots, n$, is said to be the residual or nonsystematic risk of the asset. It characterizes the residual variation in asset prices that is not explained by variations in market prices.
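The moment conditions above suggest a direct estimator. Arrange the residuals $V_{k,i}$ in an $m \times n$ matrix. Using the zero-mean assumption, $S_{ij}$ can then be estimated by $\frac{1}{m}\sum_{k=1}^{m} V_{k,i}V_{k,j}$. The residual risk of each asset is the square root of the corresponding diagonal entry. The small residual matrix below is hypothetical. Each column has mean zero so that the arithmetic can be checked by hand.

```matlab
% Hypothetical residuals: m = 4 samples (rows) for n = 2 assets (columns)
V = [ 0.01  -0.02;
     -0.01   0.02;
      0.02   0.00;
     -0.02   0.00];
m = size(V, 1);

% Residual covariance estimate: S(i,j) = (1/m) * sum_k V(k,i) * V(k,j)
S = (V' * V) / m;

% Residual (nonsystematic) risk of each asset: sqrt(S(i,i))
residualRisk = sqrt(diag(S));
disp(S);
disp(residualRisk');
```

This data gives $S_{11} = 2.5 \times 10^{-4}$, $S_{22} = 2 \times 10^{-4}$, and $S_{12} = -10^{-4}$. The residual risks are therefore about 0.0158 and 0.0141.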

### Estimation with Missing Data

Betas can be estimated for companies with sufficiently long histories of asset returns, but they are difficult to estimate for recent IPOs. However, suppose a collection of sufficiently observable companies exists whose stock prices can be expected to correlate to some degree with the new company's stock price, such as companies within the same industry. In that case, the missing-data regression routines can produce imputed estimates for the new company's beta.
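The following is a simplified view of why correlated companies help. The missing-data routines assume multivariate normal returns. Under that assumption, an EM-type algorithm (see Dempster, Laird, and Rubin [3]; ECM is a variant, see Meng and Rubin [8]) fills in the missing entries of each sample with their conditional expectation given the observed entries. Let $Z_k$ denote the vector of asset returns in sample $k$, introduced here only as notation for this illustration. Partition $Z_k$ into missing ($\mathrm{mis}$) and observed ($\mathrm{obs}$) components. Let $\mu_k$ be the current mean estimate, which in the regression setting is the design for sample $k$ times the current parameter estimate, and let $S$ be the residual covariance. The conditional expectation is then

$$E\left[Z_{k,\mathrm{mis}} \mid Z_{k,\mathrm{obs}}\right] = \mu_{k,\mathrm{mis}} + S_{\mathrm{mis},\mathrm{obs}}\,S_{\mathrm{obs},\mathrm{obs}}^{-1}\left(Z_{k,\mathrm{obs}} - \mu_{k,\mathrm{obs}}\right)$$

A new company whose returns correlate with its industry peers has sizable entries in $S_{\mathrm{mis},\mathrm{obs}}$, so the observed peer returns carry information about its missing returns. The E-step also adds the conditional covariance $S_{\mathrm{mis},\mathrm{mis}} - S_{\mathrm{mis},\mathrm{obs}}\,S_{\mathrm{obs},\mathrm{obs}}^{-1}\,S_{\mathrm{obs},\mathrm{mis}}$ to the sufficient statistics. This term prevents imputed values from being treated as if they had been observed.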

### Estimation of Some Technology Stock Betas

To illustrate how to use the missing-data regression routines, estimate betas for 12 technology stocks, where a single stock (GOOG) is an IPO.

Load dates, total returns, and ticker symbols for the 12 stocks from the MAT-file `CAPMuniverse`.

```matlab
load CAPMuniverse
whos Assets Data Dates
```

```
  Name         Size              Bytes  Class     Attributes

  Assets       1x14               1568  cell
  Data      1471x14             164752  double
  Dates     1471x1               11768  double
```

The assets in the model have the following symbols, where the last two series are proxies for the market and the riskless asset:

```matlab
Assets(1:7)
Assets(8:14)
```

```
ans =
    'AAPL'    'AMZN'    'CSCO'    'DELL'    'EBAY'    'GOOG'    'HPQ'

ans =
    'IBM'    'INTC'    'MSFT'    'ORCL'    'YHOO'    'MARKET'    'CASH'
```

The data consists of daily total returns from January 3, 2000 (the first trading day of 2000) to November 7, 2005. Two stocks in this universe (AMZN and GOOG) have missing values, which are represented by `NaN` values. One of the two stocks had an IPO during this period and so has significantly less data than the other stocks.

Compute separate regressions for each stock. The estimates for the stocks with missing data reflect their reduced observability.

```matlab
[NumSamples, NumSeries] = size(Data);
NumAssets = NumSeries - 2;

StartDate = Dates(1);
EndDate = Dates(end);

fprintf(1,'Separate regressions with ');
fprintf(1,'daily total return data from %s to %s ...\n', ...
    datestr(StartDate,1),datestr(EndDate,1));
fprintf(1,' %4s %-20s %-20s %-20s\n','','Alpha','Beta','Sigma');
fprintf(1,' ---- -------------------- ');
fprintf(1,'-------------------- --------------------\n');

for i = 1:NumAssets
    % Set up separate asset data and design matrices
    TestData = zeros(NumSamples,1);
    TestDesign = zeros(NumSamples,2);

    TestData(:) = Data(:,i) - Data(:,14);
    TestDesign(:,1) = 1.0;
    TestDesign(:,2) = Data(:,13) - Data(:,14);

    % Estimate CAPM for each asset separately
    [Param, Covar] = ecmmvnrmle(TestData, TestDesign);

    % Estimate ideal standard errors for covariance parameters
    [StdParam, StdCovar] = ecmmvnrstd(TestData, TestDesign, ...
        Covar, 'fisher');

    % Estimate sample standard errors for model parameters
    StdParam = ecmmvnrstd(TestData, TestDesign, Covar, 'hessian');

    % Set up results for output
    Alpha = Param(1);
    Beta = Param(2);
    Sigma = sqrt(Covar);

    StdAlpha = StdParam(1);
    StdBeta = StdParam(2);
    StdSigma = sqrt(StdCovar);

    % Display estimates
    fprintf(' %4s %9.4f (%8.4f) %9.4f (%8.4f) %9.4f (%8.4f)\n', ...
        Assets{i},Alpha(1),abs(Alpha(1)/StdAlpha(1)), ...
        Beta(1),abs(Beta(1)/StdBeta(1)),Sigma(1),StdSigma(1));
end
```

This code fragment generates the following table.

```
Separate regressions with daily total return data from 03-Jan-2000 to 07-Nov-2005 ...
      Alpha                Beta                 Sigma
 ---- -------------------- -------------------- --------------------
 AAPL    0.0012 (  1.3882)    1.2294 ( 17.1839)    0.0322 (  0.0062)
 AMZN    0.0006 (  0.5326)    1.3661 ( 13.6579)    0.0449 (  0.0086)
 CSCO   -0.0002 (  0.2878)    1.5653 ( 23.6085)    0.0298 (  0.0057)
 DELL   -0.0000 (  0.0368)    1.2594 ( 22.2164)    0.0255 (  0.0049)
 EBAY    0.0014 (  1.4326)    1.3441 ( 16.0732)    0.0376 (  0.0072)
 GOOG    0.0046 (  3.2107)    0.3742 (  1.7328)    0.0252 (  0.0071)
  HPQ    0.0001 (  0.1747)    1.3745 ( 24.2390)    0.0255 (  0.0049)
  IBM   -0.0000 (  0.0312)    1.0807 ( 28.7576)    0.0169 (  0.0032)
 INTC    0.0001 (  0.1608)    1.6002 ( 27.3684)    0.0263 (  0.0050)
 MSFT   -0.0002 (  0.4871)    1.1765 ( 27.4554)    0.0193 (  0.0037)
 ORCL    0.0000 (  0.0389)    1.5010 ( 21.1855)    0.0319 (  0.0061)
 YHOO    0.0001 (  0.1282)    1.6543 ( 19.3838)    0.0384 (  0.0074)
```

The `Alpha` column contains alpha estimates for each stock, which are near zero as expected. In addition, the *t*-statistics (enclosed in parentheses) generally fail to reject the hypothesis that the alphas are zero at the 99.5% confidence level.

The `Beta` column contains beta estimates for each stock, again with *t*-statistics enclosed in parentheses. For all stocks but GOOG, the hypothesis that the betas are zero is rejected at the 99.5% confidence level. GOOG, however, does not appear to have enough data for a meaningful beta estimate: its *t*-statistic (1.7328) is too small to reject the hypothesis of a zero beta.

The `Sigma` column contains residual standard deviations, that is, estimates for nonsystematic risks. For this column, the parentheses enclose the standard errors of the residual standard deviations rather than *t*-statistics.
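The significance statements above can be checked with a short computation. The text does not state the test convention, so the sketch below assumes a two-sided test with a normal approximation to the $t$ distribution. Under that assumption, the critical value at the 99.5% confidence level is $\sqrt{2}\,\mathrm{erf}^{-1}(0.995) \approx 2.807$. The $t$-statistics are copied from the separate-regression table for three representative stocks.

```matlab
% t-statistics copied from the separate-regression table
names  = {'AAPL', 'GOOG', 'IBM'};
tAlpha = [1.3882, 3.2107, 0.0312];
tBeta  = [17.1839, 1.7328, 28.7576];

% Two-sided critical value at the 99.5% confidence level (normal approximation):
% the 0.9975 quantile of the standard normal, about 2.807
level = 0.995;
zCrit = sqrt(2) * erfinv(level);

% Reject "coefficient = 0" when |t| exceeds the critical value
rejectAlphaZero = abs(tAlpha) > zCrit;
rejectBetaZero  = abs(tBeta)  > zCrit;

for j = 1:numel(names)
    fprintf('%4s  alpha ~= 0: %d   beta ~= 0: %d\n', ...
        names{j}, rejectAlphaZero(j), rejectBetaZero(j));
end
```

In the full table, GOOG's alpha is the only one that exceeds this critical value. That is why the alphas are described as *generally* indistinguishable from zero. GOOG's beta is likewise the only beta that falls short of the critical value.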

### Grouped Estimation of Some Technology Stock Betas

To estimate stock betas for all 12 stocks, set up a joint regression
model that groups all 12 stocks within a single design. (Since each
stock has the same design matrix, this model is actually an example
of seemingly unrelated regression.) The routine to estimate model
parameters is `ecmmvnrmle`, and
the routine to estimate standard errors is `ecmmvnrstd`.
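In the grouped model, each sample $k$ has its own $n \times 2n$ design matrix. Row $i$ has a 1 in column $2i-1$ (the alpha of asset $i$) and the excess market return $M_k - C_k$ in column $2i$ (its beta), with zeros elsewhere. The sketch below builds that matrix row by row, the same way the grouped-regression loop in this example does. It uses a hypothetical case of three assets and one made-up excess market return. It then checks that the result equals `kron(eye(n), [1, x])`, a compact way to see the block-diagonal structure.

```matlab
% Hypothetical sizes and data for one sample k
NumAssets = 3;
x = 0.004;                       % excess market return M_k - C_k (made up)
NumParams = 2 * NumAssets;

% Build the per-sample design row by row, as in the grouped loop
Design = zeros(NumAssets, NumParams);
for i = 1:NumAssets
    Design(i, 2*i - 1) = 1.0;    % column for alpha_i
    Design(i, 2*i) = x;          % column for beta_i
end

% Same matrix as a Kronecker product: block diagonal with [1, x] blocks
DesignKron = kron(eye(NumAssets), [1, x]);
disp(Design);
```

The grouped code stores one such matrix per sample in the cell array `TestDesign`. Because every asset shares the same regressor $M_k - C_k$, the blocks differ only in their position.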

Because GOOG has a significant number of missing values, a direct use of the missing-data
routine `ecmmvnrmle` takes 482 iterations to
converge, which can take a long time to compute. For the sake of brevity, the
parameter and covariance estimates after the first 480 iterations are stored in the
MAT-file `CAPMgroupparam` and are used as initial estimates to compute stock betas.

```matlab
load CAPMgroupparam
whos Param0 Covar0
```

```
  Name         Size            Bytes  Class     Attributes

  Covar0      12x12             1152  double
  Param0      24x1               192  double
```

Now estimate the parameters for the collection of 12 stocks.

```matlab
fprintf(1,'\n');
fprintf(1,'Grouped regression with ');
fprintf(1,'daily total return data from %s to %s ...\n', ...
    datestr(StartDate,1),datestr(EndDate,1));
fprintf(1,' %4s %-20s %-20s %-20s\n','','Alpha','Beta','Sigma');
fprintf(1,' ---- -------------------- ');
fprintf(1,'-------------------- --------------------\n');

NumParams = 2 * NumAssets;

% Set up grouped asset data and design matrices
TestData = zeros(NumSamples, NumAssets);
TestDesign = cell(NumSamples, 1);
Design = zeros(NumAssets, NumParams);

for k = 1:NumSamples
    for i = 1:NumAssets
        TestData(k,i) = Data(k,i) - Data(k,14);
        Design(i,2*i - 1) = 1.0;
        Design(i,2*i) = Data(k,13) - Data(k,14);
    end
    TestDesign{k} = Design;
end

% Estimate CAPM for all assets together with initial parameter
% estimates
[Param, Covar] = ecmmvnrmle(TestData, TestDesign, [], [], [], ...
    Param0, Covar0);

% Estimate ideal standard errors for covariance parameters
[StdParam, StdCovar] = ecmmvnrstd(TestData, TestDesign, Covar, ...
    'fisher');

% Estimate sample standard errors for model parameters
StdParam = ecmmvnrstd(TestData, TestDesign, Covar, 'hessian');

% Set up results for output
Alpha = Param(1:2:end-1);
Beta = Param(2:2:end);
Sigma = sqrt(diag(Covar));

StdAlpha = StdParam(1:2:end-1);
StdBeta = StdParam(2:2:end);
StdSigma = sqrt(diag(StdCovar));

% Display estimates
for i = 1:NumAssets
    fprintf(' %4s %9.4f (%8.4f) %9.4f (%8.4f) %9.4f (%8.4f)\n', ...
        Assets{i},Alpha(i),abs(Alpha(i)/StdAlpha(i)), ...
        Beta(i),abs(Beta(i)/StdBeta(i)),Sigma(i),StdSigma(i));
end
```

This code fragment generates the following table.

```
Grouped regression with daily total return data from 03-Jan-2000 to 07-Nov-2005 ...
      Alpha                Beta                 Sigma
 ---- -------------------- -------------------- --------------------
 AAPL    0.0012 (  1.3882)    1.2294 ( 17.1839)    0.0322 (  0.0062)
 AMZN    0.0007 (  0.6086)    1.3673 ( 13.6427)    0.0450 (  0.0086)
 CSCO   -0.0002 (  0.2878)    1.5653 ( 23.6085)    0.0298 (  0.0057)
 DELL   -0.0000 (  0.0368)    1.2594 ( 22.2164)    0.0255 (  0.0049)
 EBAY    0.0014 (  1.4326)    1.3441 ( 16.0732)    0.0376 (  0.0072)
 GOOG    0.0041 (  2.8907)    0.6173 (  3.1100)    0.0337 (  0.0065)
  HPQ    0.0001 (  0.1747)    1.3745 ( 24.2390)    0.0255 (  0.0049)
  IBM   -0.0000 (  0.0312)    1.0807 ( 28.7576)    0.0169 (  0.0032)
 INTC    0.0001 (  0.1608)    1.6002 ( 27.3684)    0.0263 (  0.0050)
 MSFT   -0.0002 (  0.4871)    1.1765 ( 27.4554)    0.0193 (  0.0037)
 ORCL    0.0000 (  0.0389)    1.5010 ( 21.1855)    0.0319 (  0.0061)
 YHOO    0.0001 (  0.1282)    1.6543 ( 19.3838)    0.0384 (  0.0074)
```

The results for the complete-data stocks are the same. However, the estimates for AMZN and GOOG (the two stocks with missing values) differ from the estimates derived for each stock separately. Since AMZN has few missing values, the differences in its estimates are small; its beta changes from 1.3661 to 1.3673. With GOOG, however, the differences are more pronounced: its beta estimate rises from 0.3742 to 0.6173.

The *t*-statistic for the beta estimate of GOOG (3.1100) is now significant at the
99.5% confidence level. These *t*-statistics are based on standard errors from the
sample Hessian, which, unlike the Fisher information matrix, accounts for the increased
uncertainty in an estimate due to missing values. If the *t*-statistic is instead computed
from the more optimistic Fisher information matrix, the *t*-statistic for
GOOG is `8.25`. Thus, despite the increase in uncertainty due to
missing data, GOOG nonetheless has a statistically significant estimate for
beta.

Finally, note that the beta estimate for GOOG is `0.62`, a value that
may require some explanation. The market has been volatile over this period
with sideways price movements, while GOOG has steadily appreciated in value. As a result, GOOG is
less tightly correlated with the market, which implies that it is less volatile than the
market in the beta sense (beta < 1).
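The link between correlation and beta follows from the beta formula. Write the covariance as $\mathrm{cov}(R_i, M) = \rho_{i,M}\,\sigma_i\,\sigma_M$, where $\rho_{i,M}$ is the correlation between asset and market returns, and $\sigma_i$ and $\sigma_M$ are their standard deviations. Then

$$\beta_i = \frac{\mathrm{cov}(R_i, M)}{\mathrm{var}(M)} = \frac{\rho_{i,M}\,\sigma_i\,\sigma_M}{\sigma_M^2} = \rho_{i,M}\,\frac{\sigma_i}{\sigma_M}.$$

A weak correlation with the market can therefore pull beta below 1, even when the asset's own return volatility $\sigma_i$ is as large as or larger than the market's $\sigma_M$.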

### References

[1] Caines, Peter E. *Linear Stochastic Systems.*
John Wiley & Sons, Inc., 1988.

[2] Cramér, Harald. *Mathematical Methods of
Statistics.* Princeton University Press, 1946.

[3] Dempster, A. P., N. M. Laird, and D. B. Rubin. “Maximum Likelihood
from Incomplete Data via the EM Algorithm.” *Journal of the
Royal Statistical Society, Series B.* Vol. 39, No. 1, 1977, pp.
1-37.

[4] Greene, William H. *Econometric Analysis.* 5th
ed., Pearson Education, Inc., 2003.

[5] Jarrow, R.A. *Finance Theory.* Prentice-Hall,
Inc., 1988.

[6] Lintner, J. “The Valuation of Risk Assets and the Selection of
Risky Investments in Stock Portfolios and Capital Budgets.” *Review of Economics and
Statistics.* Vol. 47, No. 1, 1965, pp. 13-37.

[7] Little, Roderick J. A. and Donald B. Rubin. *Statistical
Analysis with Missing Data.* 2nd ed., John Wiley & Sons,
Inc., 2002.

[8] Meng, Xiao-Li and Donald B. Rubin. “Maximum Likelihood
Estimation via the ECM Algorithm.” *Biometrika.*
Vol. 80, No. 2, 1993, pp. 267-278.

[9] Sexton, Joe and Anders Rygh Swensen. “ECM Algorithms that
Converge at the Rate of EM.” *Biometrika.* Vol. 87,
No. 3, 2000, pp. 651-662.

[10] Schafer, J. L. *Analysis of Incomplete Multivariate
Data.* Chapman & Hall/CRC, 1997.

[11] Sharpe, W. F. “Capital Asset Prices: A Theory of Market
Equilibrium Under Conditions of Risk.” *Journal of
Finance.* Vol. 19, 1964, pp. 425-442.

[12] Sharpe, W. F., G. J. Alexander, and J. V. Bailey.
*Investments.* 6th ed., Prentice-Hall, Inc., 1999.

## See Also

`mvnrmle` | `mvnrstd` | `mvnrfish` | `mvnrobj` | `ecmmvnrmle` | `ecmmvnrstd` | `ecmmvnrfish` | `ecmmvnrobj` | `ecmlsrmle` | `ecmlsrobj` | `ecmnmle` | `ecmnstd` | `ecmnfish` | `ecmnhess` | `ecmnobj` | `convert2sur` | `ecmninit`