
  • Cross-Sectional Data: Generating growth with respect to education groups + Regression Analysis

    Hello everyone,


    I am trying to generate growth in income and consumption for each education group across age cohorts (referred to in the data as cbin). I want to construct the data so that I can compute growth by moving across the cbins within an education group. In particular, I want the growth in income for edu_group = 1 from cbin 1 to cbin 2, then from cbin 2 to cbin 3. Next, I want to repeat the same process for edu_group = 2.
    After constructing growth this way, I want to regress growth in consumption on growth in income. Please let me know whether this is possible. Here is a data example.






    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float cbin byte edu_group float(c_age consumption year income ln_consumption ln_income)
    1 1 28 21.870466 2014  44.33569  3.085137   3.79179
    1 2 28 27.506866 2014  52.84018  3.314436  3.967272
    1 3 28  42.82582 2014  96.49571  3.757141 4.5694985
    2 1 34  22.46051 2014  45.76359  3.111759  3.823489
    2 2 34  28.74256 2014  59.81278  3.358379 4.0912194
    2 3 34  43.57341 2014 108.69667  3.774447  4.688561
    3 1 38  24.04302 2014  53.68461 3.1798446 3.9831264
    3 2 38  30.22904 2014 71.952835  3.408803  4.276011
    3 3 38  45.09255 2014 120.06631  3.808717  4.788044
    4 1 43  23.76835 2014  55.13097  3.168355 4.0097117
    4 2 43  32.20723 2014  84.55011  3.472191 4.4373446
    4 3 43  44.01722 2014 129.79472  3.784581  4.865954
    5 1 48  26.04756 2014  58.91539  3.259924 4.0761023
    5 2 48 34.204468 2014  91.61779  3.532356 4.5176253
    5 3 48  52.10867 2014  165.1864  3.953331  5.107074
    end







  • #2
    I don't understand what you want here. Do you mean, for example, something like this:

    Code:
    by edu_group (cbin), sort: gen income_growth = income/income[_n-1] - 1
    If so, there are a couple of potential complications to deal with. First, are there ever any gaps in the edu_group or cbin variables? If so, referencing income[_n-1] won't get you the preceding cbin's income but some earlier cbin's. So would you want to just leave the result missing there, or would you want to "annualize" the growth over the gap?
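A minimal sketch of both options for gaps, using the income values from the data example in the first post (only the variables needed are entered). Those data have no gaps, so one observation is dropped purely for illustration. The name income_growth_step is hypothetical, and "annualizing" is interpreted here as an assumption: the growth is spread evenly over the number of cbin steps, not over calendar years.

```stata
* Sketch: growth across consecutive cbins within each edu_group, with a gap guard
clear
input float cbin byte edu_group float income
1 1  44.33569
1 2  52.84018
1 3  96.49571
2 1  45.76359
2 2  59.81278
2 3 108.69667
3 1  53.68461
3 2 71.952835
3 3 120.06631
4 1  55.13097
4 2  84.55011
4 3 129.79472
5 1  58.91539
5 2  91.61779
5 3  165.1864
end

* Illustration only: create a gap by dropping cbin 3 of edu_group 1
drop if edu_group == 1 & cbin == 3

* Option 1: leave growth missing unless the previous row is exactly cbin - 1
by edu_group (cbin), sort: gen income_growth = income/income[_n-1] - 1 if cbin == cbin[_n-1] + 1

* Option 2 (hypothetical per-step rate): compound the growth evenly over the gap
by edu_group (cbin): gen income_growth_step = (income/income[_n-1])^(1/(cbin - cbin[_n-1])) - 1

list edu_group cbin income income_growth income_growth_step, sepby(edu_group) noobs
```

With no gaps, both options give the same result as the one-line command above; they differ only at rows that follow a missing cbin.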

    Also, can it ever happen that income = 0? In that situation the growth cannot be calculated.



    • #3
      I wanted to know whether it is possible to treat cbin, which is a cohort bin, as a time variable.

      For instance, can I create the growth by taking the difference between cbin 1 and cbin 2 (to get the growth in income and the growth in consumption) for education group 1, and then do the same for education groups 2 and 3? This is just one year of data, and I want to see cross-sectional variation in income growth and consumption growth, so I thought I could treat the cohort as a time variable. Cbin 1 means those born between 1984 and 1988, and cbin 2 means those born between 1979 and 1983.
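One way to make this concrete, sketched under the assumption that cbin takes consecutive integer values within each education group: declare edu_group as the panel identifier and cbin as the "time" variable with -xtset-, then use the lag operator L. Because L.income is missing whenever cbin - 1 does not exist for that edu_group, gaps are handled automatically. Since a higher cbin is an older birth cohort, this "growth" compares adjacent cohorts within the single survey year; it is not growth for the same people over time.

```stata
* Sketch: treat cbin as the time variable within each edu_group
clear
input float cbin byte edu_group float(income consumption)
1 1  44.33569 21.870466
1 2  52.84018 27.506866
1 3  96.49571  42.82582
2 1  45.76359  22.46051
2 2  59.81278  28.74256
2 3 108.69667  43.57341
3 1  53.68461  24.04302
3 2 71.952835  30.22904
3 3 120.06631  45.09255
4 1  55.13097  23.76835
4 2  84.55011  32.20723
4 3 129.79472  44.01722
5 1  58.91539  26.04756
5 2  91.61779 34.204468
5 3  165.1864  52.10867
end

* edu_group is the panel identifier, cbin plays the role of time (also sorts the data)
xtset edu_group cbin

* L.x is x at cbin - 1 in the same edu_group: missing for cbin 1 and after any gap
gen income_growth      = income/L.income - 1
gen consumption_growth = consumption/L.consumption - 1

list edu_group cbin income_growth consumption_growth, sepby(edu_group) noobs
```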



      • #4
        You are right: when we create growth the way you describe, it will be missing for every cbin 1, because I am now treating cbin as time. But that is always the case when constructing growth; the first value is always missing. Since the data are cross-sectional, I want to regress growth in consumption on growth in income, along with controls like education and cbin or c_age. What do you suggest: should I run three separate regressions, one for each education group?



        • #5
          You could do three separate regressions. Or you can just do a single regression that includes an edu_group#c.income_growth interaction term, followed by -margins-. If there are no other variables included in the analysis, the results will be essentially the same. (The coefficients and robust standard errors will be identical; the p-values will differ because the combined analysis has more residual degrees of freedom, and hence greater statistical power, so it will give lower p-values.) You can compare these approaches in the following example, built on your data example:

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input float cbin byte edu_group float(c_age consumption year income ln_consumption ln_income)
          1 1 28 21.870466 2014  44.33569  3.085137   3.79179
          1 2 28 27.506866 2014  52.84018  3.314436  3.967272
          1 3 28  42.82582 2014  96.49571  3.757141 4.5694985
          2 1 34  22.46051 2014  45.76359  3.111759  3.823489
          2 2 34  28.74256 2014  59.81278  3.358379 4.0912194
          2 3 34  43.57341 2014 108.69667  3.774447  4.688561
          3 1 38  24.04302 2014  53.68461 3.1798446 3.9831264
          3 2 38  30.22904 2014 71.952835  3.408803  4.276011
          3 3 38  45.09255 2014 120.06631  3.808717  4.788044
          4 1 43  23.76835 2014  55.13097  3.168355 4.0097117
          4 2 43  32.20723 2014  84.55011  3.472191 4.4373446
          4 3 43  44.01722 2014 129.79472  3.784581  4.865954
          5 1 48  26.04756 2014  58.91539  3.259924 4.0761023
          5 2 48 34.204468 2014  91.61779  3.532356 4.5176253
          5 3 48  52.10867 2014  165.1864  3.953331  5.107074
          end
          
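          * Growth between consecutive cbins within each edu_group (missing for cbin 1)
          by edu_group (cbin), sort: gen income_growth = income/income[_n-1] - 1
          by edu_group (cbin), sort: gen consumption_growth = consumption/consumption[_n-1] - 1
          
          * One combined regression with an interaction, then the slope for each edu_group
          regress consumption_growth c.income_growth##i.edu_group, robust
          margins edu_group, dydx(income_growth)
          
          * Three separate regressions, one per edu_group, for comparison
          by edu_group: regress consumption_growth income_growth, robust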
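To check that the combined regression reproduces the separate slopes, here is a sketch that fits the same model in a reparameterized form: i.edu_group#c.income_growth without the c.income_growth main effect, so each edu_group's slope appears as its own coefficient. It then compares the edu_group 2 slope with a separate regression. The scalar names are hypothetical. -testparm- on the ## form tests whether the slopes are equal across groups.

```stata
* Sketch: combined vs. separate slopes on the data example
clear
input float cbin byte edu_group float(income consumption)
1 1  44.33569 21.870466
1 2  52.84018 27.506866
1 3  96.49571  42.82582
2 1  45.76359  22.46051
2 2  59.81278  28.74256
2 3 108.69667  43.57341
3 1  53.68461  24.04302
3 2 71.952835  30.22904
3 3 120.06631  45.09255
4 1  55.13097  23.76835
4 2  84.55011  32.20723
4 3 129.79472  44.01722
5 1  58.91539  26.04756
5 2  91.61779 34.204468
5 3  165.1864  52.10867
end
by edu_group (cbin), sort: gen income_growth = income/income[_n-1] - 1
by edu_group (cbin), sort: gen consumption_growth = consumption/consumption[_n-1] - 1

* Same fit as c.income_growth##i.edu_group, but each group's slope is one coefficient
regress consumption_growth i.edu_group i.edu_group#c.income_growth, robust
scalar slope2_combined = _b[2.edu_group#c.income_growth]

* Separate regression for edu_group 2 only
regress consumption_growth income_growth if edu_group == 2, robust
scalar slope2_separate = _b[income_growth]
display "combined: " slope2_combined "   separate: " slope2_separate

* Joint test that the income-growth slope is the same in all three groups
regress consumption_growth i.edu_group##c.income_growth, robust
testparm i.edu_group#c.income_growth
```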
          Another advantage of using the combined regression approach is that it enables you to also estimate the effects of edu_group itself, if that is of interest to you. And if you do in fact have other variables you want to adjust for in the model, the combined regression approach enables you to get results adjusted for those, and it gives you the flexibility to either constrain the effects of these covariates to be the same across all three groups or to allow them to vary freely across the three groups. (The separate regressions approach makes it difficult to constrain them to be the same across all three groups.)
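A sketch of the two covariate specifications just described, using c_age from the data example as an illustrative covariate (choosing c_age is an assumption; cbin would be handled the same way). Entering c_age once constrains its effect to be equal across the three groups; interacting it with i.edu_group lets it vary freely. -margins, dydx(edu_group)- then reports the adjusted effect of edu_group itself. With only 12 usable observations, the fully interacted model has 9 parameters, so treat this as a syntax illustration rather than a recommended specification.

```stata
* Sketch: covariate effect constrained vs. free across edu_groups
clear
input float cbin byte edu_group float(c_age income consumption)
1 1 28  44.33569 21.870466
1 2 28  52.84018 27.506866
1 3 28  96.49571  42.82582
2 1 34  45.76359  22.46051
2 2 34  59.81278  28.74256
2 3 34 108.69667  43.57341
3 1 38  53.68461  24.04302
3 2 38 71.952835  30.22904
3 3 38 120.06631  45.09255
4 1 43  55.13097  23.76835
4 2 43  84.55011  32.20723
4 3 43 129.79472  44.01722
5 1 48  58.91539  26.04756
5 2 48  91.61779 34.204468
5 3 48  165.1864  52.10867
end
by edu_group (cbin), sort: gen income_growth = income/income[_n-1] - 1
by edu_group (cbin), sort: gen consumption_growth = consumption/consumption[_n-1] - 1

* c_age entered once: its effect is constrained to be the same in all groups
regress consumption_growth i.edu_group##c.income_growth c.c_age, robust

* c_age interacted with edu_group: its effect may differ across groups
regress consumption_growth i.edu_group##(c.income_growth c.c_age), robust

* Adjusted effect of edu_group itself (difference from edu_group 1)
margins, dydx(edu_group)
* Income-growth slope within each education group
margins edu_group, dydx(income_growth)
```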

          Added: To learn more about the -margins- command, I recommend the excellent handout by Richard Williams, https://www3.nd.edu/~rwilliam/stats/Margins01.pdf. It is the clearest explanation out there and has worked examples similar to your situation.



          • #6
            Thank you so much, sir, for clearing up my confusion. Thanks for being my mentor in learning Stata.
