  • First differencing in Seasonal Agriculture Data

    I am using Stata 16.1 and trying to estimate the following household-level (hhid) model.

    reg D.y peak D.x peak#D.x

    (I actually run reg diff_y peak diff_x peak_diff_x, where diff_y is D.y, diff_x is D.x, and peak_diff_x is peak*diff_x. peak is a dummy. I have to run it this way because the D. operator is not allowed with factor variables.)

    I want to calculate the effect of D.x in the peak (agricultural) season by adding the coefficients on D.x and peak#D.x.

    However, it seems to me that Stata is giving me the wrong first difference for my purpose with seasonal data.
    Where am I going wrong?

    I obtained the first differences as follows:

    xtset hhid wave

    gen diff_y = D.y
    gen diff_x= D.x


    input int hhid float wave byte peak float(y diff_y x diff_x)
    2 1 0 2.6390574 . 1.568616 .
    2 1 1 2.833213 . 1.568616 .
    2 2 0 4.0943446 1.2611313 1.609438 .04082203
    2 2 1 4.477337 1.6441236 1.609438 .04082203
    2 3 0 5.252274 .7749367 1.609438 0
    2 3 1 6.042633 1.565296 1.609438 0
    3 1 0 4.4426513 . 1.0296195 .
    3 1 1 5.105946 . 1.0296195 .
    3 2 0 3.583519 -1.5224266 1.0986123 .06899285
    3 2 1 4.158883 -.9470625 1.0986123 .06899285
    3 3 0 4.553877 .3949938 1.0986123 0
    3 3 1 5.631212 1.4723287 1.0986123 0
    end
    Last edited by Ishwor Adhikari; 12 Jun 2021, 16:13.

  • #2
    (I actually run reg diff_y peak diff_x peak_diff_x, where diff_y is D.y, diff_x is D.x, and peak_diff_x is peak*diff_x. peak is a dummy. I have to run it this way because the D. operator is not allowed with factor variables.)
    That is not true. See below. There is no problem combining factor variable notation with the D operator.

    Code:
    . clear*
    
    .
    . input int hhid float wave byte peak float(y diff_y x diff_x)
    
             hhid       wave      peak          y     diff_y          x     diff_x
      1. 2 1 0 2.6390574 . 1.568616 .
      2. 2 1 1 2.833213 . 1.568616 .
      3. 2 2 0 4.0943446 1.2611313 1.609438 .04082203
      4. 2 2 1 4.477337 1.6441236 1.609438 .04082203
      5. 2 3 0 5.252274 .7749367 1.609438 0
      6. 2 3 1 6.042633 1.565296 1.609438 0
      7. 3 1 0 4.4426513 . 1.0296195 .
      8. 3 1 1 5.105946 . 1.0296195 .
      9. 3 2 0 3.583519 -1.5224266 1.0986123 .06899285
     10. 3 2 1 4.158883 -.9470625 1.0986123 .06899285
     11. 3 3 0 4.553877 .3949938 1.0986123 0
     12. 3 3 1 5.631212 1.4723287 1.0986123 0
     13. end
    
    .
    . version
    version 16.1
    
    .
    . egen panel = group(hhid peak)
    
    . xtset panel wave
           panel variable:  panel (strongly balanced)
            time variable:  wave, 1 to 3
                    delta:  1 unit
    
    . regress D.y i.peak##c.D.x
    
          Source |       SS           df       MS      Number of obs   =         8
    -------------+----------------------------------   F(3, 4)         =      1.86
           Model |  4.73713736         3  1.57904579   Prob > F        =    0.2778
        Residual |  3.40416089         4  .851040224   R-squared       =    0.5819
    -------------+----------------------------------   Adj R-squared   =    0.2683
           Total |  8.14129826         7  1.16304261   Root MSE        =    .92252
    
    ------------------------------------------------------------------------------
             D.y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          1.peak |   .4625329   .8952924     0.52   0.633    -2.023197    2.948263
                 |
               x |
             D1. |  -21.88472   15.79409    -1.39   0.238    -65.73614    21.96671
                 |
       peak#cD.x |
              1  |  -7.648223   22.33622    -0.34   0.749     -69.6635    54.36706
                 |
           _cons |   1.281927   .6330674     2.02   0.113    -.4757499    3.039604
    ------------------------------------------------------------------------------
    However, it seems to me that Stata is giving me the wrong first difference for my purpose with seasonal data.
    Where am I going wrong?
    Since you do not show what Stata is giving you, nor give any explanation of which of the infinitely many ways in which it might be "wrong" you are seeing, I don't think anybody can help you.

    Moreover, you cannot possibly have done -xtset hhid wave- because you have two observations for every combination of hhid and wave, one with peak = 0 and another with peak = 1. Stata would throw an error message if you attempted that command. So you really need to explain a lot more about what you have done in order for people to help you out with this.
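
    A quick way to confirm the duplication is sketched below. It assumes the example data from this post are in memory in long format and uses only the variable names shown above. No output is reproduced here.

    ```stata
    * Count how many copies of each hhid-wave combination exist.
    duplicates report hhid wave
    * isid exits with an error if hhid and wave do not uniquely identify rows.
    capture noisily isid hhid wave
    * With two rows per hhid-wave, xtset should refuse with a
    * "repeated time values within panel" error.
    capture noisily xtset hhid wave
    ```

    Running these right after any -reshape- shows whether an earlier -xtset- declaration still matches the layout of the data.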

    Added: Also, shouldn't you be using -xtreg- or some other panel estimator instead of -regress- here?
    Last edited by Clyde Schechter; 12 Jun 2021, 17:17.



    • #3

      Since you do not show what Stata is giving you, nor give any explanation of which of the infinitely many ways in which it might be "wrong" you are seeing, I don't think anybody can help you.
      My regression model of interest is:
      [Image: Empirical Model.PNG]


      Moreover, you cannot possibly have done -xtset hhid wave- because you have two observations for every combination of hhid and wave, one with peak = 0 and another with peak = 1. Stata would throw an error message if you attempted that command. So you really need to explain a lot more about what you have done in order for people to help you out with this.

      Stata did not throw an error message because I declared -xtset hhid wave- while my data were still wide in terms of season. I then reshaped to long to calculate the first differences, and Stata did not ask me to declare -xtset- again. This is the mistake. It produced the first differences I posted, which are wrong in the sense that the first difference of y for the slack season (peak=0) is computed by subtracting the previous wave's peak-season (peak=1) y from the slack-season y.

      Thank you for helping me to navigate the problem.

      Your command seems to correct that, i.e. now it is differencing y within each season.
      Code:
      egen panel = group(hhid peak)
      xtset panel wave
      Does differencing this way serve the purpose of the empirical model I proposed? In my view it does. As you suspected, -xtset hhid wave- would have thrown an error message had I run it on the long data. Stata also did not ask me to run -xtset- again before -regress D.y i.peak##c.D.x-. Now I know what went wrong.
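
      One way to check the within-season differences by hand is sketched below. It assumes panel has been created and declared with -xtset panel wave- as in the code above. The names diff_y_season and diff_y_check are hypothetical. For hhid 2 in the slack season, wave 2 minus wave 1 is 4.0943446 - 2.6390574, about 1.455, whereas the posted diff_y of 1.261 is 4.0943446 minus the wave 1 peak-season value 2.833213.

      ```stata
      * First difference using the panel declared on household-by-season.
      gen diff_y_season = D.y
      * Manual check: subtract the previous wave's value for the same
      * household and season, only when waves are consecutive.
      bysort hhid peak (wave): gen diff_y_check = y - y[_n-1] if wave == wave[_n-1] + 1
      * Report where the two versions agree or differ.
      compare diff_y_season diff_y_check
      ```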

      That is not true. See below. There is no problem combining factor variable notation with the D operator.
      Yes, I missed the i. and c. operators. Thank you for pointing that out.
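
      With the interaction in factor-variable form, the peak-season effect asked about in #1 (the coefficient on D.x plus the interaction coefficient) can be obtained with -lincom-. This is a minimal sketch, assuming the data are declared with -xtset panel wave- as in #2. The names dy and dx are hypothetical stored copies of the differences, used so that the coefficient names are unambiguous.

      ```stata
      * Store the first differences as ordinary variables.
      gen dy = D.y
      gen dx = D.x
      * Same specification as in #2, written with the stored differences.
      regress dy i.peak##c.dx
      * Peak-season slope: main effect of dx plus the peak interaction.
      lincom dx + 1.peak#c.dx
      ```

      -lincom- reports the sum together with its standard error and confidence interval, which adding the two coefficients by hand does not give.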


      My other question is about clustering at the primary sampling unit (PSU) level. I created the cluster variable as
      Code:
      egen psu_wave = group(psu wave)
      regress D.y i.peak##c.D.x , cluster(psu_wave)
      Is this the right way to cluster at the PSU level, or do I need to do something else? And do I need to use survey weights? (Weight is the same for all households in all waves.)

      Thank you.
      Last edited by Ishwor Adhikari; 13 Jun 2021, 06:31.



      • #4
        I'm not sure what you mean when you say "weight is the same for all household in all waves." If you mean that the variable weight is a single constant number throughout the data set, then there is no point in using it. If you mean that for each specific household, the weight doesn't change from one wave to the next, but different households have different weights, then, yes, you should be using it as a survey weight. Use the -svyset- command to specify it (and also specify any stratification, psu's or other sampling units). Then use the -svy:- prefix with your -regress- command.

        If each household keeps the same weight across waves, and remains in the same psu across waves (which it definitely should do), then I think your cluster would just be the psu, not psu_wave. But that said, if you use the survey design information via the -svy:- prefix, I believe there is no reason to also use cluster-robust standard errors. The -svy:- prefix calculates design based standard errors, which should be appropriate for the purpose and adequately account for within sampling-unit dependencies. (I don't have an in-depth knowledge of survey data analysis, so if somebody who does disagrees with me on this, do speak up!)
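
        A rough sketch of both options follows. It assumes the data are declared with -xtset panel wave- as in #2, that the PSU identifier is called psu as in #3, and that the weight variable is called weight, which is a guess at the name. No strata are declared because none are mentioned in this thread. The names fd_y and fd_x are hypothetical stored differences.

        ```stata
        * Declare the survey design: PSU as sampling unit, weight as sampling weight.
        svyset psu [pweight = weight]
        * Store the first differences as ordinary variables.
        gen fd_y = D.y
        gen fd_x = D.x
        * Design-based standard errors via the svy: prefix.
        svy: regress fd_y i.peak##c.fd_x
        * Alternative without the survey prefix: cluster on the PSU alone.
        regress fd_y i.peak##c.fd_x, vce(cluster psu)
        ```

        If the design also has strata, they would go in the strata() option of -svyset-.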



        • #5
          Thank you so much. It's really helpful.
