spatial temporal autoregressive model for prdictio: January 2012

.LOG
-can we assign the bettas to the lm.starlag, lm.starerr, lm.stargen
-^correlation between variables
-listw2mat for the STAR weights.
-different random selection
-wald test
-different start value of optimization
optimiztaion does not respond to many running. all of them are the

same.
-different Lphi function
-what if the matrices are not lower triangular
-why minus in phis
-is there a minus in logliklihood?
"Tue Jan 17 10:52:16 2012"
we rand the experiments on lag,error and general. the result shows

that the star model is incompetent in the prediction process and is

always behind the other models and in the best situation is works like

the base model. We should try and remove the heteroskedasticity.

Also there is a big problem and it is that since our star model is

constructed based on literature, and it differs with other models in that

it dose not come from lm(), this could be the problem. Since some of

the preliminery results for the error model showed good stats

(pdpi>50) whihc is good. However for the case of clarity and prove

that our model works we will proceed.
"Tue Jan 17 10:58:00 2012"
time to make a backup of our data.
"Tue Jan 17 11:00:56 2012"
now we will proceed with another probable cause of variation which

is the likelihood function. we will try to see the variation results.
the issue of no neighbor for time and in face some of the weights that

are zero can be annoying.

"Tue Jan 17 13:00:37 2012"
definitly our weight matrices are not making any difference. we shuld

then try to see
-at least one different neighbour
-at least one different weight
-remove triangulation and see
"Tue Jan 17 13:27:56 2012"
nope, the matrices of time and space are not making any difference at

all.... the result shows that my star model is not effective at all and in

the best scenario,
"Tue Jan 17 13:59:27 2012"
at least the result shows that -since the result of error star model

result is the same as the lm.base model, the code for making the test

results is correct. and of course all the computations does not make

any difference. so now I am going to test if remving triangulation

works.
the time matrix is lower triangular by itself since we order ascending.

But the mulitiplication is not lower triangular.
TS is row standardized with exactly 11 zeros:
11 regions with no links:
1 2 3 20 21 23 125 142 146 174 175
ST is not row standardized. does not have any zeros. but it has a

combination of 1s and other less than 1 values. it has 170 of 1 and 54

other values.
-test remove lower triangularity
the result showed not improvement with T only in the optimization.

Now repeating with full parameters.
"Tue Jan 17 14:39:42 2012"
nopes, it didnt have any impact whatsoever.
checking to see how the true residual will work in the prediction.
"Tue Jan 17 14:57:20 2012"
nope true residual also did not work reducing heteroskedasticity.

definitly there was an impact though since (I-W) has some.
-checking (I-W): it is row-standardized and of course contains

negative records.

Next: what if we say we want to not work on pur y and instead we

want to work on pure e so that we will workout the prediciton

formula that has nothing attached to e. and we will compute the yhat

again and see what will be the impact.

"Tue Jan 17 16:22:01 2012"
The break through was that the betas coming from the regression

model (errorsarlm () style) was completely conformant to the betas

that we computed manually.
now lets see how this will help our prediction.
nope. it didnt. the %pdp is 25 that it was.

"Tue Jan 17 16:57:23 2012"
my study shows that there is perfect relationshp betwee the lm.starerr

and lm.base. their residulas are conforming to a strait line, their

coefficients are almost the same. their fitted values are also almost the

same. These results are from error model with full matrix neighbours

(not lower triangular spatial).

Next we want to check if using only S will be the same as the

lm.errorsarlm results?
this will be a good test of our procedure since we will only optimze

on S (which is the rho for error sarlm) .
the result shows that using only S in our procedure does not create a

result close to errorsarlm().
between IV%*%(yV -

(ter1RIorigV_S1aWT1W_STARW_ERR8%*%yV))
and (IV%*%(yV - (listw2mat(ter1RIorigV_S1aW)%*%yV)))-yV
is alot of difference. that mean my weight matrix does not actually add

anything and if fact it is nothing.

Next: what is the difference between ter1RIorigV_S1aW_matrix and

ter1RIorigV_STARW_matrix?
can we use the later in the errorsarlm?
I did use mat2listw for the W matrix that I created with only S

parameter and it worked (probably because we dont have negtive

weights on the final W) and I insert it to the errorsarlm. Got a lot of

warnings and the result was the same as using the lagged varible

inside errorsarlm.
As if my W does not make any difference at all whatsoever.
"Tue Jan 17 18:12:00 2012"
the conclusion is that our laged varibles are not working well. we

need to check the other neighbours.
MFA is minus in coefficient while for us and lm.base it is positive.
> lm.errorsarlm

Call:
errorsarlm(formula = yV ~ XV - 1, listw = ter1RIorigV_S1aW,

method = "spam")
Type: error

Coefficients:
lambda XVLA XVMFA XVAFA XVCBDdist
0.851 230.696 -17.694 636.777 2.814

Log likelihood: -1111
> lm(yV~XV-1)

Call:
lm(formula = yV ~ XV - 1)

Coefficients:
XVLA XVMFA XVAFA XVCBDdist
329.778 606.487 827.800 0.315

> lm.starerr

Call:
lm(formula = IV %*% (yV -

(ter1RIorigV_S1aWT1W_STARW_ERR8 %*%
yV)) ~ IV %*% (XV - ter1RIorigV_S1aWT1W_STARW_ERR8

%*% XV) -
1)

Coefficients:
IV %*% (XV - ter1RIorigV_S1aWT1W_STARW_ERR8 %*%

XV)LA
329.778
IV %*% (XV - ter1RIorigV_S1aWT1W_STARW_ERR8 %*%

XV)MFA
606.485
IV %*% (XV - ter1RIorigV_S1aWT1W_STARW_ERR8 %*%

XV)AFA
827.799
IV %*% (XV - ter1RIorigV_S1aWT1W_STARW_ERR8 %*%

XV)CBDdist
0.315

"Tue Jan 17 18:16:18 2012"
checking diferent time neighbours to see if it will make a difference.

"Tue Jan 17 22:29:34 2012"
next we did for T3 of 1 year or 365 days to see how it goes. for T2

that was 200 days the pdpi droped to 24 from original 25.
T3 shows only 24 again. and the same results for heteroskedasticity.

all of the above was for error model.
all the codes that we have done till now should be corrrected because

we have changed the code for the starerr model and now we are

using the lm.starerr instead of doing the manual computation over the

AIC, bptest and real estate indicators.

"Wed Jan 18 13:08:22 2012"
running ter1.r to make different samples for ter1RIorigV and M. to

see the effect. and will also run test.r which is basically ERR1 with

lower triangularity of S not removed but it is row-standardized since

we used style W. Code for making sure that the S and T are lower

triangular and rowstandardized in commened out. these are test

results for making the raw data ter1RIorigM and V using different

random sampling.
tab
AIC bp.pvalue bp.statistic Median Ri COD PRD %PDP
[1,] 2614 0.00e+00 6.09e-32 1.15 23.7 1.24 28.1
[2,] 2578 2.71e-07 3.34e+01 1.08 59.4 1.23 15.6
[3,] 2579 2.61e-07 3.34e+01 1.11 57.1 1.23 15.6
[4,] 2578 2.71e-07 3.34e+01 1.09 59.4 1.23 15.6
AIC bp.pvalue bp.statistic Median Ri COD PRD %PDP
[1,] 2257 0.0000 1.87e-33 0.990 18.9 1.05 42.7
[2,] 2275 0.0291 9.01e+00 0.940 23.4 1.04 22.9
[3,] 2226 0.8131 9.51e-01 0.986 13.6 1.03 55.2
[4,] 2276 0.0264 9.23e+00 0.932 23.5 1.04 22.9
AIC bp.pvalue bp.statistic Median Ri COD PRD %PDP
[1,] 2268 0.000 3.62e-31 0.985 25.5 1.10 34.4
[2,] 2291 0.119 5.85e+00 0.971 26.2 1.07 26.0
[3,] 2240 0.425 2.79e+00 0.972 17.5 1.06 55.2
[4,] 2291 0.121 5.82e+00 0.942 26.5 1.07 28.1
AIC bp.pvalue bp.statistic Median Ri COD PRD %PDP
[1,] 2368 0.0000 4.98e-34 1.008 39.5 1.26 40.6
[2,] 2377 0.0784 6.80e+00 0.992 31.8 1.13 26.0
[3,] 2367 0.0666 7.17e+00 1.000 28.8 1.15 32.3
[4,] 2377 0.0694 7.08e+00 0.958 32.3 1.12 25.0
AIC bp.pvalue bp.statistic Median Ri COD PRD %PDP
[1,] 2612 0.00e+00 3.91e-32 1.14 27.8 1.27 35.4
[2,] 2580 2.81e-06 2.85e+01 1.23 49.9 1.23 17.7
[3,] 2578 2.50e-06 2.88e+01 1.17 50.1 1.20 16.7
[4,] 2580 2.81e-06 2.85e+01 1.23 49.9 1.23 17.7
AIC bp.pvalue bp.statistic Median Ri COD PRD %PDP
[1,] 2297 0.000 9.24e-32 1.007 26.3 1.10 32.3
[2,] 2295 0.227 4.35e+00 1.004 22.5 1.05 30.2
[3,] 2238 0.979 1.94e-01 0.999 14.4 1.04 56.2
[4,] 2295 0.230 4.31e+00 1.009 22.6 1.05 28.1
AIC bp.pvalue bp.statistic Median Ri COD PRD %PDP
[1,] 2297 0.000 9.24e-32 1.007 26.3 1.10 32.3
[2,] 2295 0.227 4.35e+00 1.004 22.5 1.05 30.2
[3,] 2238 0.979 1.94e-01 0.999 14.4 1.04 56.2
[4,] 2295 0.230 4.31e+00 1.009 22.6 1.05 28.1
AIC bp.pvalue bp.statistic Median Ri COD PRD %PDP
[1,] 2358 0.000 4.23e-32 1.04 23.1 1.11 35.4
[2,] 2364 0.457 2.60e+00 1.04 23.7 1.08 26.0
[3,] 2353 0.403 2.92e+00 1.04 18.5 1.07 37.5
[4,] 2364 0.455 2.61e+00 1.03 23.7 1.08 25.0
AIC bp.pvalue bp.statistic Median Ri COD PRD %PDP
[1,] 2290 0.0000 9.94e-32 1.03 21.2 1.07 35.4
[2,] 2308 0.0984 6.29e+00 1.02 23.1 1.05 20.8
[3,] 2267 0.4758 2.50e+00 1.01 15.1 1.04 50.0
[4,] 2308 0.0983 6.29e+00 1.02 23.1 1.05 20.8

"Wed Jan 18 15:25:46 2012"
Yes the MRA and STAR are the same. That means my weigth lists

are not making any difference.
Repeat with the code that were commented out is not back to the

game.
AIC bp.pvalue bp.statistic Median Ri COD PRD %PDP
[1,] 2290 0.0000 9.94e-32 1.03 21.2 1.07 35.4
[2,] 2308 0.0984 6.29e+00 1.02 23.1 1.05 20.8
[3,] 2267 0.4758 2.50e+00 1.01 15.1 1.04 50.0
[4,] 2308 0.1024 6.20e+00 1.01 23.2 1.05 22.9

NEXT: cleanly see if we can reproduce the result of "sptial only" using

lagsarlm and errorsarlm using the spatial only weighting matrix and

optim or nlminb. This is the bottleneck and if we can solve it, I guess

we will be able to have good results.
The Idea is to recreate the
1) optimization parameter or rho and lambda that is done inside the

errorsarlm and lagsarlm with our procedure. so that we could see if

the sptatial autoregressive parameter coming out of the errorsarlm

and lagsarlm is equal to our optimization parameter.
2)later we should see if the betas are the same.

"Fri Jan 20 14:00:00 2012"
Starting with the imposition 1. we will only convert the listw opject

create by the nb2listw to matrix using mat2listw and then we will use

the that in optimiztion and in the errorsarlm and lag sarlm to see what

is the value for rho and then we will compare it.
by studying the lagsarlm and errorsarlm it turns out that this code

does seperate w from phi*w in the code. wo we are going to create a

wx and wy and will use the rho or lambad inside the regression which

is now simply called phi.
ok the result. showed that the optimize function and nlminb functions

create the same opjective (3134 for optimize there was a minus

before that) and the parameter is 0.21. This however was different

than the lagsarlm which showed 0.13 and log-liklihood was -2925.
increasing the tol.solve in lagsarlm made possible using eigen method

but the result is the same as before.

"Fri Jan 20 16:56:58 2012"
I couldnt find out how the lagsarlm could do the optimization. it

stocks at ..... So I gues we need to check out how the optimizatio is

done for error and lag in the literature. The candidates will be the

Anseline book, the the lesage tutorial with matlab and the Lesage

book.
Also changing lagsarlm and errorsarlm to accomodate ours and also

ingonring the problem and continuting with the experiment and see

how it will show up .
Finall we should ask roger himself for the problem.

"Fri Jan 20 18:47:28 2012"
it seems that the models with intercept are more accurate in terms of

sign of the betas.
Call:
lm(formula = yV ~ 1)

Coefficients:
(Intercept)
133799

Call:
lm(formula = yV ~ XV - 1)

Coefficients:
XVLA XVMFA XVAFA XVCBDdist
407.538 548.856 644.434 0.442

Call:
lm(formula = yV ~ XV)

Coefficients:
(Intercept) XVLA XVMFA XVAFA XVCBDdist
155480.59 221.58 -138.44 538.92 -2.62

Call:
lagsarlm(formula = yV ~ XV - 1, listw = ter1RIorigV_S1aW,

tol.solve = 1e-40)
Type: lag

Coefficients:
rho XVLA XVMFA XVAFA XVCBDdist
0.626 237.221 86.271 579.197 -0.301

Log likelihood: -1121

Call:
lagsarlm(formula = yV ~ XV, listw = ter1RIorigV_S1aW, tol.solve =

1e-40)
Type: lag

Coefficients:
rho (Intercept) XVLA XVMFA XVAFA

XVCBDdist
2.66e-01 1.03e+05 2.12e+02 -1.04e+02 5.47e+02 -

1.90e+00

Log likelihood: -1114

my own: rho=0.21
(solve(t(XV)%*%XV))%*%(t(XV))%*%(IV-

ter1RIorigV_S1aWT1W_STARW_LAG1)%*%yV
[,1]
LA 350.408
MFA 393.690
AFA 622.551
CBDdist 0.193

I read the manual for the le sage matlab code and it was interesting

that it had a procedure special to each model to find the parameters.

for example for error model it first runs a simple regression, then it

will take betas and residuals for that simple "base" regression to find

the best lambda (for us are phis). the criteria was to reduce the error

model close to error of the base model. I dont know why.
second the interesting things was as I reduced number of days two

observations were apart, the criteria for pdp was increaseing so that

for 2 days apart I had 34 for pdp and 23 for cod. this shows that the

combinations for the spatial and temporal arrangements should have

an impact on the results:
AIC bp.pvalue bp.statistic Median Ri COD PRD %PDP
[1,] 2297 0.000 9.24e-32 1.007 26.3 1.10 32.3
[2,] 2295 0.227 4.35e+00 1.004 22.5 1.05 30.2
[3,] 2227 0.970 5.36e-01 1.019 13.6 1.04 55.2
[4,] 2287 0.294 3.71e+00 0.917 23.4 1.05 34.4
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
the model with intercept and 2 days aprat:
> tab
AIC bp.pvalue bp.statistic Median Ri COD PRD %PDP
[1,] 2297 0.000 9.24e-32 1.007 26.3 1.10 32.3
[2,] 2234 0.961 6.20e-01 1.023 14.9 1.05 52.1
[3,] 2227 0.970 5.36e-01 1.019 13.6 1.04 55.2
[4,] 2231 0.954 6.77e-01 0.931 15.3 1.04 49.0
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
changing the weight matrices for both time and space to B and 2 days

:
AIC bp.pvalue bp.statistic Median Ri COD PRD %PDP
[1,] 2297 0.000 9.24e-32 1.007 26.3 1.10 32.29
[2,] 2234 0.961 6.20e-01 1.023 14.9 1.05 52.08
[3,] 2229 0.941 7.78e-01 1.018 13.9 1.04 56.25
[4,] 2245 0.797 1.67e+00 0.729 17.0 1.04 8.33

"Mon Jan 23 00:25:58 2012"
1- Prof bivand answer
He gave me his maximum likelihood function and also introduced me

to the problem of nlminb for estimating the parameters. I should see

my likelihood and check if something is missing there. I guess it

should be i-eigw.
2- splm package
this package tries to extend fixed effect models with panel data and

then will try to introduce the lag, error and general models to this kind

of problem. I read quite a log of stuff about the random and fixed

effect models as well mixed models that will help me to comprehend

better.
my observation is that I might be able to see and incorporate the

code for my lag, error and general model from splm and more

importantly, I might be able to see how their ml function is formed .
3.- I noticed that splm uses intercept model. why not we try model

with intercept or just add another option for our grand comparison.
4. see if we can implement Husman test in our thing.
5- extend the the lesage methodology (using his FAR code) to see if

we can improve our LAG models.

"Wed Jan 25 13:42:01 2012"
will start on the answer of bivand to see how we can improve our

prediction. Till now we could only improve the prediction with

reducing the number of days the transactions are different. we are

developing on the "test for imposition 2 lesage way" whihc will be our

benchmark for the problem.

"Wed Jan 25 14:12:30 2012"
I checked the code for lagsarlm. I think eventually the code will

resemble the Lesage case where both perform regression on (y~x-1)

and (wy~wx-1) and then tries to converge both residuals to find the

best set of parameters. The code for Bivand is very complicated and

he only gave me a clue about his lagsarlm likelihood function. I need

to see how I can skip the do_idet part.

10:27 PM 1/23/2012
the bivand ways need more knowledge and search since it uses a lot

of measures and also most of these measures are for speeding up the

process of finding the best parameter. So for now I think we will skip

looking at the bivand code and will see if we can come back after

some time and check again. especially after reading the Le sage book

on spatial econometrics.
"Wed Jan 25 14:49:42 2012"
the result for
Spatial weight matrix: lower-triangular and row-sochastic,

timed=2days, all models with intercept:
tab
AIC bp.pvalue bp.statistic Median Ri COD PRD %PDP
[1,] 2297 0.000 9.24e-32 1.007 26.3 1.10 32.3
[2,] 2234 0.961 6.20e-01 1.023 14.9 1.05 52.1
[3,] 2227 0.970 5.36e-01 1.019 13.6 1.04 55.2
[4,] 2231 0.954 6.77e-01 0.931 15.3 1.04 49.0
this means that lowertriangular and rowstandardization does not have

an impact on the result, time however still does and the results are

behind. repeating the experiment with all models whithout intercept.
AIC bp.pvalue bp.statistic Median Ri COD PRD %PDP
[1,] 2297 0.000 9.24e-32 1.007 26.3 1.10 32.3
[2,] 2295 0.227 4.35e+00 1.004 22.5 1.05 30.2
[3,] 2238 0.979 1.94e-01 0.999 14.4 1.04 56.2
[4,] 2287 0.294 3.71e+00 0.917 23.4 1.05 34.4
model without intercept actually reduced our resul dramatically.
LOOKS LIKE THERE IS A GOOD RELATIONSHIP BETWEEN

THE GOOD PDPI RESULTS AND THE AMOUNT OF

HETERSKEDASTICITY THAT IS REMOVED BY THE

MODELS.
NEXT: Add coeficients to the table.
"Wed Jan 25 16:45:53 2012"
added the r-squared and also the betas.
AIC bp.pvalue bp.statistic Median Ri COD PRD %PDP R-

squared
[1,] 2297 0.000 9.24e-32 1.007 26.3 1.10 32.3 0.000
[2,] 2234 0.961 6.20e-01 1.023 14.9 1.05 52.1 0.525
[3,] 2227 0.970 5.36e-01 1.019 13.6 1.04 55.2 0.569
[4,] 2231 0.954 6.77e-01 0.931 15.3 1.04 49.0 0.519

intercept LA MFA AFA CBDdist
[1,] 125910 NA NA NA NA
[2,] 137469 335 -40.88 318 -2.98
[3,] 126266 337 2.93 361 -2.66
[4,] 120221 350 -30.42 362 -2.84
NEXT: Correct for the LAG model based on the lasage code and

also the previous T3ERR1 code.
"Wed Jan 25 23:15:57 2012"
The imposition 1 was designed for one parameter so I am going to

have to change the LAG1 into the new process which is
1) change the optimization algorithm
2) add the intercept to the models
3) add the rsquare and betas to the code
4).....
DONT FORGET TO EQUIP THE CODE FOR ERROR AND

GENERAL TOO.
The lag code also gets the betas that we dont need. remove it from

the code.
"Thu Jan 26 00:27:49 2012"
reading La sage manual for optimzing the code for the lag.

MAIN ERR code: test of imposition 2 the le sage way.R
MAIN LAG code: test of impostition 1 lag 2.R

FACTFACT: if the matrix is not row standardized the minimum and

maximum values for the autocorrelation parameters are set by min

and max of the eigenvalues of the W. So we need to set that if our

weight matrix is not syle=W.
FACT: Lesage also return a negative of the log likelihood. So we

need to put a negative in front of the objective.
FACT: least squar is biased. we use ML to find parameters. The

dispersion of the parameters estimated using ML are made with

partial derivative over all parameters in the Log likelihood. Fisher

information matrix is the covariance matrix of all the estimated

paraemters: phis, sigma and betas. so here we will have for example:

4+1+4.
IDEA: follow the code from Le sage to create a t-statistics for our

estimate p. 62.
IDEA: see if we can make the information matrix for our parameters

using the R partial derivative and theta=c(phis,sigma,betas). or

whithoug betas since in our likelihood we will have only phis and

sigma.
IDEA: use temporal weight for error and spatial weights for error.

W1=spatial, W2=temporal. like the general spatial autoregressive

model.
IDEA:see if we can implement Husman test in our thing.
x_IDEA: add the coeficients to the table.
IDEA: read the damn Lesage book
x_IDEA: R square
IDEA: death to kappa
x_IDEA: make the lower boundary of phis as 0 instead of 1 and: this

has resolved since I now get good coeficients for phis see what

happens.
PROBLEM: should ask roger why mat2listw with the same listw that

has been converted to matrix gives error in errorsarlm.
PROBLEM: what is the difference between (I-W) and (I-eigW)?I

guess this is for increasing the speed of computation only.
x_NEXT: check to see if change yhat will make any differene: yes it

somehow does and we decided to continue with the procedure in the

errorsarlm and lagsarlm codes to find the rho and betas. and we dont

need to do it since the regression takes care of it.
IDEA-test add some value to no neighbour in time.

1:40 PM 1/25/2012

1:59 PM 1/25/2012

5:02 PM 1/25/2012

10:11 PM 1/26/2012

12:55 PM 1/27/2012

5:31 PM 1/27/2012

4:19 PM 1/28/2012

spatial temporal autoregressive model for prdictio

Saturday, January 28, 2012

Total Pageviews