General Problem Objectives:
Often in industry, a worker has to test numerical procedures before selecting which one will be used in production. In MATLAB you can fit dependent ordinate data (e.g., in y) as a function of the independent coordinate x by three (3) methods, of which you will use two (2), as long as the function being fitted is a polynomial. The model polynomials will be cubics (degree 3) and will be used to fit, separately, the quarterly Stock Market Index Log-Return Means and the quarterly Log-Return Volatilities (Standard Deviations) for Standard and Poor's 500 data for the 16 years from 1988 to 2003.
Choose at least one of the first two fit methods in addition to SVD:
The objective is to compare two of three methods by fitting to existing data to see which one of the two you should recommend to the boss.
Problem Statement:
The problem is to compare two of the three methods on the Standard and Poor's 500 Stock Index Data for the 16 years from 1988 to 2003 by fitting both the quarterly stock log-return means and log-return volatilities (standard deviations).
The data in raw form is available at Yahoo! Finance, "S&P 500, Symbol ^GSPC, Historical Prices," starting at URL:
then Set Date Range -> Get Prices -> Download to Spreadsheet. The data is a table of Date, Open, High, Low, Close, Volume, Adj. Close items in reverse chronological order, but has been converted and edited to plain text format listing the date and closings only at
with convenient slash field delimiters for the date, along with other MS Excel manipulations to simplify the input. As a sample of the records, the first three lines of the date-closing file are:
31/12/2003 1111.92
30/12/2003 1109.64
29/12/2003 1109.48
and the last three lines:
05/01/1988 258.64
04/01/1988 255.94
31/12/1987 247.09
representing the "Day/Month/Year Close" fields, respectively. The very last line is for the last trading day in 1987 to allow calculation of the change for the first day of 1988. Also, the data comes in backward time order, so that the data will have to be put in forward order.
The number of trading days is about 250 per year but varies slightly, since the market is closed on weekends and holidays; these factors need to be taken into account to avoid bias when the data is used in investment models. Hence, the days of each year must be renumbered to exclude non-trading days and renormalized as the fraction of trading days for each year. Reading the needed data is also complicated, so if desired the following MATLAB script can be used to read the numerical data fields separately:
fiddateclose = fopen('sp500y88toy03dateclosings.txt','r');
[Dates,ndata] = fscanf(fiddateclose,'%2f/%2f/%4f %*f',[3 inf]); % form in 3 rows;
frewind(fiddateclose);
[Closes,ndata] = fscanf(fiddateclose,'%*2f/%*2f/%*4f %f',[1 inf]); % form in 1 row;
statusdateclose = fclose(fiddateclose);
[ndaterow,ndatecol] = size(Dates); % Dates array size;
ndays = ndatecol; % Number of total trading days.
Note that the closings are stored as one long single row vector of length ndatecol, which is the total number of trading days, and the dates as a 3-row array with the same number of columns. It is suggested that you separate these long arrays into numerical vectors for each of Day, Month, Year and Close in proper time order.
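The separation suggested above might be sketched as follows. This is only an illustration on the three sample records shown earlier, standing in for the arrays returned by fscanf; the names DayF, MonthF, YearF and CloseF are hypothetical and are chosen here only to mark the forward-ordered copies.

```matlab
% Stand-in for the fscanf output (newest record first), using the three
% sample lines from the head of the file.
Dates  = [31 30 29; 12 12 12; 2003 2003 2003];   % rows: day, month, year
Closes = [1111.92 1109.64 1109.48];

% Reverse the column order so that index 1 is the oldest trading day.
% The "F" suffix (forward) is a hypothetical naming choice.
DayF   = fliplr(Dates(1,:));
MonthF = fliplr(Dates(2,:));
YearF  = fliplr(Dates(3,:));
CloseF = fliplr(Closes);
```

With the full file, the same four lines apply unchanged to the complete Dates and Closes arrays.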
The objective in this problem is to compute the log-returns (roughly the relative returns for small changes) of the entire closing vector, for example
LogReturn = log(Closes(1,ndatecol-1:-1:1)) - log(Closes(1,ndatecol:-1:2));
% Note: Closes is the vector as read, in reverse time order, and the
% result has one less element due to the difference.
% Correction: Log-Return corrected to forward time order.
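As a small check of the indexing, the sketch below applies the formula to the last three sample records of the file (newest first) and compares it with an equivalent form that first reverses the closings and then uses diff. The names CloseF, LogReturnAlt and RelReturn are hypothetical and are used only for this illustration.

```matlab
% Last three sample records of the file, newest first.
Closes   = [258.64 255.94 247.09];
ndatecol = numel(Closes);

% Formula from the text: differences of logs, delivered in forward time order.
LogReturn = log(Closes(1,ndatecol-1:-1:1)) - log(Closes(1,ndatecol:-1:2));

% Equivalent form: reverse the closings first, then difference the logs.
CloseF       = fliplr(Closes);
LogReturnAlt = diff(log(CloseF));

% Relative returns, which the log-returns approximate for small changes.
RelReturn = diff(CloseF)./CloseF(1:end-1);
```

Both forms give one less element than the number of closings, and the first element is the return for the first trading day of 1988 relative to the last trading day of 1987.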
For each of the 4*16 = 64 quarters take the midpoint of the quarter
TM = (1988-1) + iy + 0.25/2 + 0.25*(iqy-1),
noting "(1988-1)" has replaced "1988" since iy=1 should start TM with 1988, for iy = 1:16 years and iqy = 1:4 quarters (Jan-Mar, Apr-Jun, Jul-Sep, Oct-Dec) for each year. A better calculation would just count official trading days and take the fraction representing the midpoint of the quarter's trading days converted into fractions of a year, but the above formula should be used for simplicity.
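The mid-quarter times can be generated with a double loop over years and quarters, as in the sketch below. The variable nyears is a hypothetical name for the number of years (16 here), and the storage index iqy + 4*(iy-1) simply numbers the quarters consecutively from 1 to 64.

```matlab
nyears = 16;                 % years 1988 through 2003
TM = zeros(1, 4*nyears);     % one mid-quarter time per quarter
for iy = 1:nyears
    for iqy = 1:4
        % Midpoint of quarter iqy in year (1988-1)+iy, in units of years.
        TM(iqy + 4*(iy-1)) = (1988-1) + iy + 0.25/2 + 0.25*(iqy-1);
    end
end
```

The first entry is 1988.125 (middle of Jan-Mar 1988) and the last is 2003.875 (middle of Oct-Dec 2003).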
The ultimate objective is to fit the means for each quarter, computed using the MATLAB mean function, and the volatilities for each quarter, computed using the MATLAB std function.
Two cubic polynomial fit models are required, one for quarterly log-return means and one for quarterly log-return volatilities. Assign the quarterly means and volatilities to the time at
TM(iqy + 4*(iy-1)).
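One possible way to collect the quarterly statistics is to select, for each quarter, the log-returns whose year and month fall in that quarter. The sketch below uses made-up data (three "trading days" per month for two years, not S&P 500 values) so that it runs on its own; YearF, MonthF, QMean, QVol and inq are hypothetical names, and YearF and MonthF are assumed to hold the date of each log-return in forward time order.

```matlab
% Synthetic stand-in data (NOT S&P 500 values): 3 days per month for
% 1988-1989, already in forward time order.
YearF = []; MonthF = [];
for y = 1988:1989
    for m = 1:12
        YearF  = [YearF,  y y y];
        MonthF = [MonthF, m m m];
    end
end
LogReturn = 0.001*sin(1:numel(YearF));   % made-up returns

nyears = 2;                    % would be 16 for the full data set
QMean = zeros(1, 4*nyears);    % quarterly log-return means
QVol  = zeros(1, 4*nyears);    % quarterly log-return volatilities
for iy = 1:nyears
    for iqy = 1:4
        % Logical mask: days of year (1988-1)+iy whose month is in quarter iqy.
        inq = (YearF == (1988-1) + iy) & (ceil(MonthF/3) == iqy);
        k = iqy + 4*(iy-1);    % same index as used for TM
        QMean(k) = mean(LogReturn(inq));
        QVol(k)  = std(LogReturn(inq));
    end
end
```

Because QMean(k) and QVol(k) share the index of TM(k), each statistic is automatically paired with its mid-quarter time.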
Using polyfit, MATLAB will rightly complain that the fit is ill-conditioned, so you must center (c) and scale (s) the quarterly time at mid-quarter, such that
tpoly = (TM-mean(TM))./std(TM);
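To see why the centering and scaling matters, the condition number of the cubic design matrix can be compared before and after the transformation. This is only a diagnostic sketch; Araw, Ascl, condRaw and condScl are hypothetical names, and TM is rebuilt here with the colon operator so the example is self-contained.

```matlab
TM    = 1988.125:0.25:2003.875;          % the 64 mid-quarter times
tpoly = (TM - mean(TM))./std(TM);        % centered and scaled times

t = TM(:);  s = tpoly(:);                % column vectors
Araw = [ones(size(t)) t t.^2 t.^3];      % cubic design matrix, raw times
Ascl = [ones(size(s)) s s.^2 s.^3];      % cubic design matrix, scaled times

condRaw = cond(Araw);                    % very large: ill-conditioned
condScl = cond(Ascl);                    % modest: well-conditioned
```

Since tpoly has mean zero and unit standard deviation, its powers stay of order one, whereas the cubes of the raw times are of order 10^9 to 10^10 and are nearly proportional to the lower powers.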
Some Hints on Methods:
>> help ones
>> help size
>> A=[ones(size(x)) x x.^2 x.^3]; % for cubic model fitting
noting that array or component-wise exponentiation operation (.^) is needed rather than regular matrix exponentiation (^) (WATCH those periods!).
"a = apoly",
"x = tpoly"
"y = y-data"
vector data, then use polyval to compute the predicted values of y, say ypoly (see help polyval and help polyfit).
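A minimal sketch of the polyfit and polyval calls is given below, using a made-up exact cubic in place of the quarterly data so that the recovered coefficients can be checked. The values in tpoly and y are synthetic assumptions, not S&P 500 statistics.

```matlab
% Synthetic stand-in data: an exact cubic sampled at 64 scaled times.
tpoly = linspace(-1.7, 1.7, 64);
y     = 0.5 - 0.2*tpoly + 0.1*tpoly.^3;

apoly = polyfit(tpoly, y, 3);    % cubic fit; coefficients in DESCENDING powers
ypoly = polyval(apoly, tpoly);   % predicted values at the same times
```

Note that polyfit returns the coefficient of the cubic term first and the constant last, the reverse of the column order in a matrix such as [ones(size(x)) x x.^2 x.^3], so the coefficient vectors must be flipped before comparing them across methods.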
>> aslash = A\y;
say, since when A has more rows (m) than columns (n) the back-slash finds the least squares solution instead of applying the inverse as it does when m = n. Do not forget that you have to find the predicted values y=yslash given tpoly.
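The back-slash approach might look like the following sketch, again on a made-up exact cubic rather than the quarterly data. Here x plays the role of tpoly stored as a column vector, which the matrix construction requires.

```matlab
% Synthetic stand-in data: an exact cubic at 64 scaled times (column vectors).
x = linspace(-1.7, 1.7, 64)';
y = 0.5 - 0.2*x + 0.1*x.^3;

A      = [ones(size(x)) x x.^2 x.^3];   % 64-by-4 design matrix, ASCENDING powers
aslash = A\y;                           % least squares solution since m > n
yslash = A*aslash;                      % predicted values at the same times
```

The predicted values follow from a single matrix-vector product, because each row of A holds the powers of one time point.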
A = U*D*V',
>> asvd = V*(D\(U'*y));
Again you need to find the predicted y-values, say ysvd, alongside the corresponding values from the other methods, say ypoly or yslash.
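A sketch of the SVD route is shown below on made-up noisy cubic data, with the back-slash solution computed alongside for comparison. The economy-size call svd(A,0) is a choice made here so that D is square (4-by-4); the data values are synthetic assumptions.

```matlab
% Synthetic stand-in data: a cubic plus a small deterministic perturbation.
x = linspace(-1.7, 1.7, 64)';
y = 0.5 - 0.2*x + 0.1*x.^3 + 0.01*sin(7*x);

A = [ones(size(x)) x x.^2 x.^3];   % design matrix, ascending powers

[U, D, V] = svd(A, 0);             % economy size: U is 64-by-4, D and V are 4-by-4
asvd = V*(D\(U'*y));               % least squares coefficients via the SVD
ysvd = A*asvd;                     % predicted values

aslash = A\y;                      % back-slash solution for comparison
```

For a well-conditioned A the two coefficient vectors should agree to roughly rounding error, so any recommendation between the methods would rest on residuals, conditioning and convenience rather than on large differences in the fitted values.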
General Instructions:
For the two methods used to fit cubic polynomial models to both quarterly means and quarterly volatilities, you must also present documented output for
Project Report:
Your professional, individual report needs the following parts:
Project Resources:
MCS 507 HomePage: http://www.math.uic.edu/~hanson/mcs507/
Email Comments or Questions (MCS 507 only please) to Professor Hanson, hanson A T uic edu