
  • Reghdfe presenting different results every time the code is run?

    Dear all,

    I am estimating the following model using the reghdfe command:

    reghdfe log(dowry) age age_marriage educ_years, absorb(religion wave urbanrural) cluster(district)

    Every time I run the code, a different number of clusters appears (between 76 and 90), even though the code being run is always the same. The coefficients and their significance also change, even after using "set seed 1212". I have a panel data structure with the id being group(district year of marriage wave), as I want to capture the variation at the district, year-of-marriage, and wave levels. I have also tried to use the Vce option, but Stata stated that this is not allowed. If I use VCE(cluster district), the results still change every time I run the same code.

    Any help would be appreciated. Apologies if this is a rather simple question; I am new to Stata.

    Thank you in advance,

    Enrique
    Last edited by Enrique Alameda; 08 Mar 2023, 20:23.

  • #2
    To answer the question you posed, there is no reason why -reghdfe- itself should produce indeterminate results. I suppose it is possible that it has some bug that has gone undetected despite widespread use for a long time, but that seems highly unlikely. What is far more likely is that something in the code before you reach the -reghdfe- command is indeterminate. For example, a command with a -by varlist, sort:- prefix where the varlist does not uniquely identify observations produces an indeterminate, randomized sort order within the groups defined by the varlist. If the command governed by that prefix depends on the order of observations, then its results will be indeterminate. You can verify that this is what's going on by adding a command like -summ dowry age age_marriage educ_years if e(sample)- after the -reghdfe-. Run the entire do-file several times, and you will see that not only are the results changing from run to run, but so are the data in the estimation sample. As you have not shown any of that code, nothing more can be said about the source of the indeterminacy.
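
    As a minimal sketch of the kind of indeterminacy described above, the following do-file builds a small toy dataset and then keeps "the first" observation per district with an incomplete sort key. The data, the sample sizes, and the -keep- step are assumptions made up for illustration, since the original poster's preparation code was not shown. Official -regress- is swapped in for -reghdfe- so that the sketch runs without installing anything. The -summarize ... if e(sample)- check works the same way after either command.

    ```stata
    * Toy data (hypothetical): 10 districts with 6 observations each
    clear
    set obs 60
    set seed 1212
    generate district = ceil(_n / 6)
    generate age      = 18 + floor(20 * runiform())
    generate dowry    = 1000 + 50 * age + 200 * rnormal()

    * Indeterminate step: district does not uniquely identify observations,
    * so which row counts as "first" within a district is arbitrary.
    * -set seed- above does not govern how -sort- orders tied observations.
    bysort district: keep if _n == 1

    * Stand-in for the -reghdfe- call (an assumption made for this sketch)
    regress dowry age

    * Diagnostic: summarize the estimation sample. If these numbers change
    * when the whole do-file is rerun, the data feeding the regression are
    * changing, not the estimator.
    summarize dowry age if e(sample)
    ```

    Rerunning the whole file several times and comparing the -summarize- output is the check; the -regress- output alone cannot tell you whether the estimator or the data changed.
    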

    The use of -set seed- does not really tell you much here. The problem of indeterminate sort order would be recognized by using -set sortseed-, as it is this seed that determines the randomization of -sort-ing. Also, if the problem is due to a randomized (incomplete) sort, setting the sortseed will suppress that variation. But that does not solve the problem. It just hides the problem. You can use it as a diagnostic to confirm that some command that sorts the data is the source of your trouble. But then you have to go back and review all of your commands that involve sorting to determine which one(s) cause the indeterminacy, and then fix those commands by specifying a complete sort key that uniquely specifies the sort order, or you have to remove the dependency of the commands that use the randomly sorted data on the sort order.
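
    The diagnostic and the fix described above might look roughly like the sketch below. The variable name person_id and the toy data are hypothetical; in a real do-file the complete key would be whatever set of variables uniquely identifies an observation.

    ```stata
    * Toy data (hypothetical names): person_id is unique, district is not
    clear
    set obs 60
    generate person_id = _n
    generate district  = ceil(_n / 6)

    * Diagnostic only: fixing the sort seed makes the ordering of tied
    * observations reproducible. If results stop changing after this, some
    * sorting command is the source, but the dependency is still there.
    set sortseed 1212

    * Check whether a candidate sort key uniquely identifies observations
    capture isid district
    display "isid district return code: " _rc   // nonzero means not unique

    * No error here means the key is unique
    isid district person_id

    * Fix: sort on the complete key before any order-dependent command
    sort district person_id
    by district: generate byte first_in_district = (_n == 1)
    ```

    Once the fix is in place, the -set sortseed- line can be removed; a complete sort key leaves no ties for the seed to affect.
    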
    Last edited by Clyde Schechter; 08 Mar 2023, 22:14.


    • #3
      This makes sense. Thank you very much Clyde.
