Dear StataListers and Brilliant Fellow Travellers:
I have run into difficulties using GMM for a two-equation system. To be clear, I am using the standard GMM command, not the more advanced system GMM commands; I merely have a two-equation system to which I am applying GMM. As I shall demonstrate, there appears to be a problem within GMM in handling factor variables. By way of explanation, I start by running a Reg3 version of my model. That runs on the first try, including all the factor variables that cause trouble in my GMM models. Of course, Reg3 assumes homoskedastic errors and uses all the instruments in every equation (see Theil, 1971). Those restrictions are the reason for my GMM specifications, where I allow for clustering (my data are a panel) and where I allow the instruments to differ between the two equations. Please let me show you three models, two of which run, while the third does not. After I show you the models, I'll editorialize a little.
First model, using Reg3. This runs without any difficulties.
reg3 (eq1: L2.Res_Count_All i.Fieldn i.Pref_Namen i.Interval
wdin_cit other_fields other_locs dmilgovrd dmfgrd avedist
res_all_MSA_1) (eq2: Frac_Citations_5 L2.Res_All_Sq i.Fieldn
i.Pref_Namen i.Interval), endog(wdin_cit L2.Res_All_Sq)
exog(swdift_cit owdift_cit pop_1980) vce(cluster Node_Name);
Second model, using GMM. This runs, but as others on this forum have noted, the factor variables cause the command to execute slowly.
gmm (L2.Res_Count_All - {xb: i.Fieldn i.Interval
wdin_cit other_fields other_locs dmilgovrd dmfgrd avedist
res_all_MSA_1 _cons}) (Frac_Citations_5 - {xc: i.Fieldn
i.Interval L2.Res_All_Sq _cons}),
winitial(unadjusted, independent) wmatrix(cluster Node_Name)
twostep instruments(1: i.Fieldn i.Interval
other_fields other_locs dmilgovrd dmfgrd avedist res_all_MSA_1
swdift_cit owdift_cit) instruments(2: i.Fieldn
i.Interval swdift_cit owdift_cit pop_1980)
variables(L2.Res_Count_All i.Fieldn
i.Interval other_fields other_locs dmilgovrd dmfgrd avedist
res_all_MSA_1 swdift_cit owdift_cit L2.Res_All_Sq pop_1980);
Third model, using GMM. This does not run and gives the well-known but inscrutable error message r(498).
gmm (L2.Res_Count_All - {xb: i.Fieldn i.Pref_Namen i.Interval
wdin_cit other_fields other_locs dmilgovrd dmfgrd avedist
res_all_MSA_1 _cons}) (Frac_Citations_5 - {xc: i.Fieldn
i.Pref_Namen i.Interval L2.Res_All_Sq _cons}),
winitial(unadjusted, independent) wmatrix(cluster Node_Name)
twostep instruments(1: i.Fieldn i.Pref_Namen i.Interval
other_fields other_locs dmilgovrd dmfgrd avedist res_all_MSA_1
swdift_cit owdift_cit) instruments(2: i.Fieldn i.Pref_Namen
i.Interval swdift_cit owdift_cit pop_1980)
variables(L2.Res_Count_All i.Fieldn i.Pref_Namen
i.Interval other_fields other_locs dmilgovrd dmfgrd avedist
res_all_MSA_1 swdift_cit owdift_cit L2.Res_All_Sq pop_1980);
The second model omits the factor variable i.Pref_Namen, while the third model includes it. i.Pref_Namen expands to a long vector of about 250 dummy variables, and it is part of the story that I want to tell. By way of contrast, the factor variables i.Interval and i.Fieldn contain 4 and 19 dummy variables respectively, far fewer than i.Pref_Namen.
I have been over the data many times and the data are fine: the Reg3 model and the simpler GMM model both run on them. I don't think that an upload of my data will make any difference. And by the way, Stata does not care that a couple of variables are lagged two periods; I have tested that, and I tsset the data before lagging.
I don't think that I have a data problem. It has occurred to me that I could manually create all the dummy variables, but why should I have to do that? And in any event, would it work if I did?
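For concreteness, the manual route could look roughly like the sketch below. The stub pn_ and the local macro pnlist are placeholder names chosen for illustration, and the sketch assumes the default carriage-return delimiter rather than the semicolon delimiter used in the commands above. Whether gmm behaves any differently with hand-made indicators is exactly the open question.

```stata
* Sketch only: pn_ and pnlist are placeholder names for illustration.
* Assumes the default delimiter, not #delimit ;

* Create one 0/1 indicator per level of Pref_Namen: pn_1, pn_2, ...
quietly tabulate Pref_Namen, generate(pn_)

* Omit the lowest level as the base category, as i.Pref_Namen does by default
drop pn_1

* Collect the remaining indicator names in a local macro
unab pnlist : pn_*

* Report how many indicators are left after dropping the base
display "`: word count `pnlist'' indicators created"
```

The contents of the local pnlist would then stand in wherever i.Pref_Namen appears in the third gmm command: in both {xb:} and {xc:}, in both instruments() options, and in variables().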
If anyone at Stata has come across this kind of issue, I am all ears concerning what you have found. Any suggestions would be greatly appreciated.
Sincerely,
James Adams