  • GMM for a Two Equation System

    Dear StataListers and Brilliant Fellow Travellers:

    I have run into difficulties using GMM for a two equation system. To be clear, I am using the standard gmm command, not the more advanced system GMM commands; I merely have a two equation system to which I am applying GMM. As I shall demonstrate, there appears to be a problem within gmm in handling factor variables. By way of explanation, I start by running a reg3 version of my model. That runs on the first try, including all the factor variables that cause trouble in my GMM models. Of course, reg3 assumes homoskedastic errors and uses all the instruments in every equation (see Theil, 1971). Those restrictions are the reason for my GMM specifications, where I allow for clustering (my data are a panel) and allow the instruments to differ between the two equations. Below are three models: the first two run, while the third does not. After I show you the models, I'll editorialize a little.

    First model, using Reg3. This runs without any difficulties.

    reg3 (eq1: L2.Res_Count_All i.Fieldn i.Pref_Namen i.Interval
    wdin_cit other_fields other_locs dmilgovrd dmfgrd avedist
    res_all_MSA_1) (eq2: Frac_Citations_5 L2.Res_All_Sq i.Fieldn
    i.Pref_Namen i.Interval), endog(wdin_cit L2.Res_All_Sq)
    exog(swdift_cit owdift_cit pop_1980) vce(cluster Node_Name);

    Second model, using GMM. This runs, but as others on this forum have noted, the factor variables cause the command to execute slowly.

    gmm (L2.Res_Count_All - {xb: i.Fieldn i.Interval
    wdin_cit other_fields other_locs dmilgovrd dmfgrd avedist
    res_all_MSA_1 _cons}) (Frac_Citations_5 - {xc: i.Fieldn
    i.Interval L2.Res_All_Sq _cons}),
    winitial(unadjusted, independent) wmatrix(cluster Node_Name)
    twostep instruments(1: i.Fieldn i.Interval
    other_fields other_locs dmilgovrd dmfgrd avedist res_all_MSA_1
    swdift_cit owdift_cit) instruments(2: i.Fieldn
    i.Interval swdift_cit owdift_cit pop_1980)
    variables(L2.Res_Count_All i.Fieldn
    i.Interval other_fields other_locs dmilgovrd dmfgrd avedist
    res_all_MSA_1 swdift_cit owdift_cit L2.Res_All_Sq pop_1980);

    Third model, using GMM. This does not run and gives the well-known but inscrutable error message r(498).

    gmm (L2.Res_Count_All - {xb: i.Fieldn i.Pref_Namen i.Interval
    wdin_cit other_fields other_locs dmilgovrd dmfgrd avedist
    res_all_MSA_1 _cons}) (Frac_Citations_5 - {xc: i.Fieldn
    i.Pref_Namen i.Interval L2.Res_All_Sq _cons}),
    winitial(unadjusted, independent) wmatrix(cluster Node_Name)
    twostep instruments(1: i.Fieldn i.Pref_Namen i.Interval
    other_fields other_locs dmilgovrd dmfgrd avedist res_all_MSA_1
    swdift_cit owdift_cit) instruments(2: i.Fieldn i.Pref_Namen
    i.Interval swdift_cit owdift_cit pop_1980)
    variables(L2.Res_Count_All i.Fieldn i.Pref_Namen
    i.Interval other_fields other_locs dmilgovrd dmfgrd avedist
    res_all_MSA_1 swdift_cit owdift_cit L2.Res_All_Sq pop_1980);

    The second model omits the factor variable i.Pref_Namen while the third model includes i.Pref_Namen. i.Pref_Namen forms a long vector of about 250 dummy variables and it is part of the story that I want to tell. By way of contrast the factor variables i.Interval and i.Fieldn respectively contain 4 and 19 dummy variables, much shorter than i.Pref_Namen.

    I have been over the data many times and the data are fine: the reg3 model and the simpler GMM model both run on the same data. I don't think that an upload of my data will make any difference. Incidentally, Stata does not object to the two variables that I lag by two periods; I have tested that, and I tsset the data before lagging.

    I don't think that I have a data problem. It has occurred to me that I could manually create all the dummy variables, but why should I have to do that and in any event, would it work if I did it?

    If anyone at Stata has come across this kind of issue, I am all ears concerning what you have found. Any suggestions would be greatly appreciated.

    Sincerely,
    James Adams
    Last edited by James Adams; 18 Aug 2024, 17:16.

  • #2
    Hi, in response to my own post on problems running the gmm command, I went ahead and created dummy variables in place of the factor variables (see my post), and I inserted the dummies everywhere that the factor variables (i.Fieldn i.Pref_Namen i.Interval) appeared in the third model that failed to execute (again, see my post). Creating the dummy variables is easy using the numeric indexes used by the factor variables. For example, Pref_Namen runs from 1 to 255 with no gaps. So I run a foreach loop that creates dummies di1 to di255. I drop di1 from my model to avoid the dummy variable trap. And so on.
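    For readers who want to try the same thing, here is a minimal sketch of the dummy-creation step described above. It assumes, as stated in the post, that Pref_Namen takes the integer values 1 to 255 with no gaps; it uses forvalues rather than foreach, which is only a stylistic choice and not necessarily how the original loop was written.

```stata
* Sketch only: build indicator variables di1 ... di255 from Pref_Namen.
* Assumes Pref_Namen is coded 1..255 with no gaps, as described in the post.
forvalues k = 1/255 {
    * The indicator is 1 when Pref_Namen equals k, 0 otherwise,
    * and missing wherever Pref_Namen itself is missing.
    generate byte di`k' = (Pref_Namen == `k') if !missing(Pref_Namen)
}

* Drop the base category to avoid the dummy variable trap.
drop di1
```

    The remaining variables can then be listed as di2-di255 wherever i.Pref_Namen appeared in the gmm command. As an alternative to the loop, tabulate Pref_Namen, generate(di) should produce an equivalent set of indicators in one step.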

    It turns out that replacing the factor variables with the dummy variables allowed me to diagnose my problem. As I suspected, the problem was the role played by my institutional dummies (di1-di255) in the second equation of my model. The multidimensional surface (or manifold) of the gmm criterion is flat with respect to the institutional dummies: the estimation went through more than 60 iterations without converging, with only minor changes to the gmm criterion from one iteration to the next. The solution is to remove the institutional dummies and replace them with more informative variables. If you get error message r(498) in a model with indicator variables, creating your own dummy variables could help you diagnose what r(498) means. It worked for me.
