  • count by

    Dear professors and colleagues,

    I would like to structure my dataset so that the following statement holds. Please share your ideas with me. Thanks.
    'The unit of observation is one region in one year.'

    year: 2010-2020
    region: 7 regions
    firm ID: NPC_FIC. Each firm ID appears at most once per year, but the same firm can appear in several years.
    Code:
    clear
    input double(year NPC_FIC) float region
    2010 500135017 1
    2019 501301917 1
    2010 501833633 1
    2020 501102337 1
    2010 501022911 1
    2014 502207708 2
    2011 501129116 2
    2012 501077767 2
    2012 502230825 2
    2018 501081500 2
    2019 501346223 2
    2017 501023486 2
    2011 501829556 2
    2016 501205066 2
    2020 501170028 2
    2018 501032576 3
    2011 501031781 3
    2020 501179930 3
    2011 502216695 3
    2011 501273750 3
    2016 502228955 3
    2010 502485654 3
    2011 500985340 3
    end


    Cheers,
    Paris

  • #2
    Paris:
    I am not sure I got your question right, so please consider what follows as a tentative answer:
    Code:
    . egen wanted=group( region year)
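    Note that -egen, group()- only labels each region-year combination; it leaves the firm-level rows in place. A minimal sketch on a few rows taken from the example in #1 (the variable name n_in_cell is hypothetical, introduced here only for illustration):

    ```stata
    clear
    input double(year NPC_FIC) float region
    2010 500135017 1
    2010 501833633 1
    2010 501022911 1
    2019 501301917 1
    2011 501129116 2
    2011 501829556 2
    end

    * group() assigns one integer code per region-year combination;
    * it does not drop or merge any rows
    egen wanted = group(region year)

    * n_in_cell (hypothetical name): number of firm-level rows in each region-year cell
    bysort region year: generate n_in_cell = _N

    list, sepby(wanted)
    count
    ```

    After this runs the dataset still has six observations, one per firm-year, so the unit of observation has not changed yet.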
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Thank you so much, Carlo.


      • #4
        I am going to estimate a panel-data model in which 'the unit of observation is one region in one year'.
        Having run
        Code:
        egen wanted=group( region year)
        should I apply the same procedure to the other variables in the panel as well? Here the dependent variable is the number of firms (n_firms) and the explanatory variable is the immigrant share (immi_sh).
        Is this what I should do?
        Code:
        egen n_firms = count(NPC_FIC), by(region year)
        egen immi_sh_year = total(immi_sh), by(region year)


        • #5
          Paris:
          if you have panel data, you have a sample of units (panels) that are measured on the very same variables at (theoretically) equally spaced time intervals.
          Therefore, each region is measured on the same set of variables each year.
          That said, I created a -depvar- and propose the following answer to your question:
          Code:
          . g depvar=runiform()*100000
          
          . xtset region year
          repeated time values within panel
          r(451);
          
          . xtset region
          
          Panel variable: region (unbalanced)
          
          . xtreg depvar  i.year, fe
          
          Fixed-effects (within) regression               Number of obs     =         23
          Group variable: region                          Number of groups  =          3
          
          R-squared:                                      Obs per group:
               Within  = 0.4308                                         min =          5
               Between = 0.1999                                         avg =        7.7
               Overall = 0.2672                                         max =         10
          
                                                          F(8,12)           =       1.14
          corr(u_i, Xb) = -0.3574                         Prob > F          =     0.4068
          
          ------------------------------------------------------------------------------
                depvar | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                  year |
                 2011  |   39895.72   24782.69     1.61   0.133    -14101.13    93892.57
                 2012  |   26147.39   32056.69     0.82   0.431    -43698.14    95992.91
                 2014  |   69350.06   38210.67     1.81   0.095    -13903.84      152604
                 2016  |   65855.42   30179.24     2.18   0.050     100.4962    131610.3
                 2017  |   2703.408   38210.67     0.07   0.945    -80550.49     85957.3
                 2018  |   23746.54   30179.24     0.79   0.447    -42008.38    89501.47
                 2019  |   43534.61   26814.96     1.62   0.130    -14890.17    101959.4
                 2020  |   36116.05   24273.75     1.49   0.163    -16771.92    89004.02
                       |
                 _cons |   14984.22   18857.21     0.79   0.442    -26102.11    56070.55
          -------------+----------------------------------------------------------------
               sigma_u |  18427.016
               sigma_e |  29408.298
                   rho |  .28192802   (fraction of variance due to u_i)
          ------------------------------------------------------------------------------
          F test that all u_i=0: F(2, 12) = 1.43                       Prob > F = 0.2768
          
          .
          You obviously have to add more predictors to the right-hand side of your regression equation.
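
          The r(451) error in the log arises because several firms share the same region and year, so region and year together do not identify an observation. A hedged sketch of how one might check this, using a few rows from the example in #1:

          ```stata
          clear
          input double(year NPC_FIC) float region
          2010 500135017 1
          2010 501833633 1
          2010 501022911 1
          2019 501301917 1
          2011 501129116 2
          2011 501829556 2
          end

          * how many region-year combinations occur more than once?
          duplicates report region year

          * xtset with a time variable requires unique panel-time pairs;
          * capture noisily shows the error without stopping a do-file
          capture noisily xtset region year
          display "xtset return code: " _rc
          ```

          Declaring only the panel variable, as in -xtset region-, avoids the error, but the rows are then still firm-level observations within each region.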
          Kind regards,
          Carlo
          (Stata 19.0)


          • #6
            Prof. Carlo, thank you for the clarification.

            To aggregate, that is, to sum the firms in the same region, should I -collapse- by region? Or has your approach already aggregated the data?
            I ask because the main point is estimation at the aggregated district-year level.


            • #7
              Paris:
              first, Carlo is enough. Thanks.
              Second, you need to -collapse-.
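
              As a rough sketch of what that could look like on a few rows from the example in #1: the immi_sh values below are random placeholders (the variable is not part of the posted data), and taking the mean of immi_sh is only an assumption; whether a mean, a sum, or a weighted share is appropriate depends on how immi_sh is defined.

              ```stata
              clear
              input double(year NPC_FIC) float region
              2010 500135017 1
              2010 501833633 1
              2010 501022911 1
              2019 501301917 1
              2011 501129116 2
              2011 501829556 2
              end

              * placeholder values only: immi_sh is not in the posted example
              set seed 12345
              generate immi_sh = runiform()

              * one row per region-year: count the firms and average the share
              collapse (count) n_firms = NPC_FIC (mean) immi_sh, by(region year)

              * region and year now identify each observation, so xtset accepts both
              isid region year
              xtset region year
              list, sepby(region)

              * on the full data one might then try, for example:
              * xtreg n_firms immi_sh i.year, fe
              ```

              Because -collapse- replaces the data in memory, it is worth saving the firm-level file (or using -preserve-) beforehand.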
              Last edited by Carlo Lazzaro; 30 Apr 2023, 11:04.
              Kind regards,
              Carlo
              (Stata 19.0)
