
  • collapse & index

    Hi

    I have 7 regions, 17 sectors. I created an index, and I am going to partition sectors and regions according to it. As you can see in the list, the values are 1.32, 2.99, ... but sectors are 1,2,3,4,5,6,7. So where did these numbers come from? Could anyone assist me, please?

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(region sector index)
    4 3  7
    4 7 11
    4 7 11
    4 7 11
    4 7 11
    4 7 11
    4 7 11
    4 7 11
    4 7 11
    4 7 11
    4 7 11
    4 7 11
    4 7 11
    4 7 11
    4 7 11
    4 7 11
    4 7 11
    4 7 11
    4 7 11
    4 7 11
    4 7 11
    5 3  8
    5 3  8
    5 3  8
    5 3  8
    5 3  8
    5 3  8
    5 3  8
    5 3  8
    5 3  8
    5 3  8
    5 3  8
    5 3  8
    5 3  8
    5 3  8
    5 3  8
    5 3  8
    5 3  8
    5 3  8
    5 3  8
    5 3  8
    5 3  8
    5 3  8
    5 3  8
    5 3  8
    5 3  8
    5 3  8
    5 3  8
    5 3  8
    5 3  8
    end
    
    generate index = region + sector
    collapse sector, by(index)
    
    . list
    
         +------------------+
         | index     sector |
         |------------------|
      1. |     1          1 |
      2. |     2   1.315152 |
      3. |     3   2.990105 |
      4. |     4   3.018043 |
      5. |     5   3.930294 |
         |------------------|
      6. |     6   3.256762 |
      7. |     7   6.395171 |
      8. |     8   6.788081 |
      9. |     9   7.196252 |
     10. |    10   7.250002 |
         |------------------|
     11. |    11   7.106759 |
     12. |    12   8.343517 |
     13. |    13   11.18524 |
     14. |    14   9.862413 |
     15. |    15   13.13492 |
         |------------------|
     16. |    16   13.46302 |
     17. |    17   13.52856 |
     18. |    18   15.02726 |
     19. |    19   14.79059 |
     20. |    20   16.00764 |
         |------------------|
     21. |    21   16.91695 |
     22. |    23         16 |
     23. |    24         17 |
         +------------------+

    Listed 50 out of 1996970 observations

  • #2
    The command -collapse sector, by(index)- tells Stata to calculate the average value of sector for each value of index.

    Also, "but sectors are 1,2,3,4,5,6,7" is inconsistent with "I have 7 regions, 17 sectors." So the values of the sector variable range between 1 and 17, not 1 and 7. And the averaged values of sector are, accordingly, also in that range.

    My question to you is what you are actually trying to do here. Regions 1 through 7 and sectors 1 through 17: these numbers are just arbitrary. They are the numeric values that we use for what are actually categorical variables. So I do not understand what you are trying to do when you calculate an index as region + sector: you are adding two arbitrary numbers and producing arbitrary results. Addition is not meaningful for numbers that merely represent categories. Similarly, calculating the average value of the sector numbers makes no sense.

    So please explain what you actually want to do.
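A small illustration of both points above, on a made-up four-row dataset (hypothetical values, not the data posted in #1): summing the codes sends three different region-sector pairs to the same index, and -collapse- without a statistic averages sector within each index, which is how non-integer "sector" values like those in the listing arise.

```stata
* Hypothetical toy data: three different region-sector pairs
* whose codes happen to add up to the same number
clear
input float(region sector)
4 7
5 6
1 10
4 3
end

* Adding two category codes: the first three rows all get 11
generate sum_index = region + sector
list region sector sum_index, noobs

* -collapse- with no statistic specified uses (mean), so the "sector"
* value for sum_index 11 is (7 + 6 + 10)/3, which is not a sector code
collapse sector, by(sum_index)
list, noobs
```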



    • #3
      Code:
      egen sid = group(region sector)

      Then -collapse- by sid.



      • #4
        Originally posted by Clyde Schechter
        The command -collapse sector, by(index)- tells Stata to calculate the average value of sector for each value of index.

        Also, "but sectors are 1,2,3,4,5,6,7" is inconsistent with "I have 7 regions, 17 sectors." So the values of the sector variable range between 1 and 17, not 1 and 7. And the averaged values of sector are, accordingly, also in that range.

        Code:
        . tab sector
        
             sector |      Freq.     Percent        Cum.
        ------------+-----------------------------------
                  1 |      9,105        0.46        0.46
                  2 |      3,078        0.15        0.61
                  3 |    723,880       36.25       36.86
                  4 |      9,932        0.50       37.36
                  5 |      7,512        0.38       37.73
                  6 |     53,207        2.66       40.40
                  7 |    993,336       49.74       90.14
                  8 |     47,803        2.39       92.53
                  9 |     22,743        1.14       93.67
                 10 |     36,639        1.83       95.51
                 11 |     10,124        0.51       96.01
                 12 |     17,933        0.90       96.91
                 13 |     39,926        2.00       98.91
                 14 |      3,720        0.19       99.10
                 15 |      4,684        0.23       99.33
                 16 |      9,467        0.47       99.81
                 17 |      3,881        0.19      100.00
        ------------+-----------------------------------
              Total |  1,996,970      100.00
        
        . tab region
        
             region |      Freq.     Percent        Cum.
        ------------+-----------------------------------
                  1 |    510,172       39.33       39.33
                  2 |     19,183        1.48       40.81
                  3 |    306,266       23.61       64.41
                  4 |    405,586       31.27       95.68
                  5 |     39,439        3.04       98.72
                  6 |      5,040        0.39       99.11
                  7 |     11,562        0.89      100.00
        ------------+-----------------------------------
              Total |  1,297,248      100.00
        Originally posted by Clyde Schechter
        So please explain what you actually want to do.
        Well, I have a model:
        [Image: the model equation (image_29997.png)]

        Y: firm output
        i: firms (some thousands)
        s: 17 industrial sectors
        d: 7 regions
        t: 2010-2021
        f_i: firm fixed effects
        F_sT: sector-by-period fixed effects (where T = 2)
        F_rT: region-by-period fixed effects
        X_it: control variables for firm i at time t
        S_dt: explanatory variable: immigrants' share in region d at time t

        Since the dimensions of Y, S_dt and X_it differ from each other, I decided to make an index of sector & region and then partition Y, FI, S and X according to that index. OLS is easy to run in that setting.



        • #5
          Since the dimensions of Y, S_dt and X_it differ from each other, I decided to make an index of sector & region and then partition Y, FI, S and X according to that index.
          I'm not really sure I understand what you're doing here, but perhaps what you want is a variable that indicates the combination of sector and region. You would get that with:
          Code:
          egen index = group(sector region)
          That will give you a variable, index, taking consecutive values from 1 up to at most 119 (= 7 * 17), with each value corresponding to one combination of sector and region that occurs in your data. That will still not, however, make it sensible to -collapse sector, by(index)-, as it is meaningless to perform any arithmetic operations on the sector variable. I still don't know what you hoped to accomplish with that command.
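A minimal sketch on hypothetical toy data showing what -egen, group()- produces and one way to check it; the variable names follow the thread, the values are invented. Unlike adding the codes, each distinct sector-region pair gets its own number, and only pairs that actually occur are numbered.

```stata
* Hypothetical toy data (not the poster's dataset)
clear
input float(region sector)
4 7
5 6
1 10
4 7
end

* -egen, group()- assigns consecutive integers 1, 2, 3, ... to the
* sector-region combinations that actually occur in the data
egen index = group(sector region)

* Sanity check: each sector-region pair maps to exactly one index value
bysort sector region (index): assert index[1] == index[_N]

* The number of distinct cells equals the largest index value
* (at most 7*17 = 119 in general; here only 3 combinations occur)
summarize index, meanonly
display "distinct sector-region cells: " r(max)
```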



          • #6
            Let me put it another way, Prof. Schechter. I have that econometric model and I am going to estimate it, but first I need to prepare the dataset. Given the information above, what would you suggest? Please assist me. I am used to estimating Y_itd, but not Y_isdt.



            • #7
              I don't think I can answer that question. It seems that you are taking a model that is defined in terms of effects at the firm, sector, and region level and you want to marginalize it to just firm and region. But you do have the data disaggregated by firm, sector, and region. So there could be two general approaches. One would be to aggregate the data itself up to the firm and region level. For some variables that might mean adding up the sector-specific values within region, and for others it might mean averaging, or taking medians, or some other way of doing it. The choice would depend on the meaning of the actual variables. These choices require econometric understanding that I do not have.

              The other approach is to estimate the original model including all of its original effects (including sector), and then to obtain average marginal effects at the firm and region level. The calculations can be done with the -margins- command. But there is the issue of whether this overall approach is even valid: again, an econometric question that I have no ability to answer. And there is also the issue of how representative the distribution of sectors in your data is compared to the real-world distribution of sectors within regions and firms within sectors, which might require some kind of weighting instead of simple average marginal effects. So again, this comes down to econometric knowledge.

              If somebody else following this thread feels comfortable grappling with those issues, I would welcome their joining the discussion and contributing. But I can't take this any farther as it is way beyond issues that are just statistical or Stata.
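As a rough sketch of the first approach above (aggregating the data up to the firm and region level, here also keeping year), -collapse- can apply a different statistic to each variable. The variables output and share and all values are hypothetical, and whether each variable should be summed, averaged or handled some other way is exactly the econometric choice left open here.

```stata
* Hypothetical toy data: firm 1 is active in two sectors of region 4
clear
input float(firm region sector year output share)
1 4 3 2015 10 .2
1 4 7 2015 30 .4
2 5 3 2015 20 .1
end

* Aggregate to firm-region-year: add up output across sectors and
* average the share (the statistic chosen per variable is an assumption)
collapse (sum) output (mean) share, by(firm region year)
list, noobs
```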



              • #8
                Actually, one way, or let's say an easy one, would be to put all variables in the same dimension. That's why I thought of making an index of sector and region. Why? Well, suppose I make, e.g., sector 1 and region 1 = index 11, sector 1 and region 2 = index 12, ..., sector 17 and region 7 = index 177 (hypothetically). Then the rest would be easy, like the customary Y_ijt. I mean that, with this index, when it comes to F_s,t I would take s from the index and t = 2010-2021. For F_r,t, I could get r from the index (it does not matter that the index also contains s; I only pick up r at this point), and so on.
                I guess if we could do this process in Stata, the puzzle might be solved.
                Last edited by Paris Rira; 27 Jan 2023, 15:13.
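The coding scheme described in #8 can be written directly in Stata. A minimal sketch on hypothetical toy data; it assumes region codes run from 1 to 7, so each region fits in a single digit, which is what keeps the combined codes from colliding. The region recovered from the code is identical to the existing region variable.

```stata
* Hypothetical toy data; region codes are 1-7 and sector codes 1-17
clear
input float(region sector)
1 1
2 1
7 17
end

* Scheme from #8: sector 1 & region 1 -> 11, sector 1 & region 2 -> 12, ...,
* sector 17 & region 7 -> 177. It is collision-free only because every
* region code is a single digit (region < 10).
generate index = 10*sector + region

* Recover each component from the combined code
generate sector_from_index = floor(index/10)
generate region_from_index = mod(index, 10)

assert sector_from_index == sector & region_from_index == region
list, noobs
```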



                • #9
                  Well, if you are confident that approach will work, the code shown in #3 or #5 (it's the same code) will create the index. And you won't need to "extract" the region from the index, because you already have a separate region variable to work with.



                  • #10
                    Originally posted by Clyde Schechter
                    And you won't need to "extract" the region from the index, because you already have a separate region variable to work with.
                    That is exactly the point: not to use region alone or sector alone. By making an index, I want to put all variables at the same level.
                    I ran the code in #3, but I could not go further. I understand it in theory, but how to implement it is still challenging for me.



                    • #11
                      So something like:
                      Code:
                      reghdfe Y S X, absorb(firm index#time)
                      would do that.

                      Note: -reghdfe- is written by Sergio Correia and is available from SSC.
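Since -reghdfe- is community-contributed, here is a hedged illustration using only official Stata on simulated data (all names and values hypothetical) of what index#time inside absorb() stands for: one fixed effect per index-by-time cell. With a single set of such effects, official -areg- can absorb them and gives the same slope as entering the cell dummies explicitly; -reghdfe- extends this to several sets at once, such as firm plus index#time.

```stata
* Hypothetical simulated data: 6 sector-region cells ("index")
* observed in 2 periods ("time"); all 12 combinations occur
clear
set seed 2023
set obs 200
generate index = mod(_n, 6) + 1
generate time  = mod(floor(_n/6), 2) + 1
generate x     = rnormal()
generate y     = 0.5*x + index*time + rnormal()

* One identifier per index-by-time cell (what index#time refers to)
egen cell_time = group(index time)

* Absorbing the cell effects vs. estimating them as explicit dummies
areg y x, absorb(cell_time)
scalar b_absorbed = _b[x]
regress y x i.cell_time

* Relative difference between the two slope estimates (should be ~0)
display reldif(b_absorbed, _b[x])
```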



                      • #12
                        Originally posted by Clyde Schechter
                        That will still not, however, make it sensible to -collapse sector, by(index)-, as it is meaningless to perform any arithmetic operations on the sector variable.
                        I am still thinking about your point on collapsing by index. If it is not valid to do that, then I am really unable to estimate the model, because first I would need to collapse sector and region by index, and then use sector#index#time.
                        I would adjust sector (region) according to the index, and afterwards by time. My knowledge of econometrics is shallow; I picked up this idea during a discussion with an econometrics professor.



                        • #13
                          One more thing: I have access to the syntax, but I really have no idea how to prepare its data. Here is the code to run the model in #4.
                          Code:
                          tsset id year
                          local control "TG_F foreign_affil D1984 D1990 D1995 D2000 D2005 "
                          local variable mig_jump 
                          
                          foreach x of local variable {
                          xi: reg            `x'  immi_sh                                                  `control'  i.region         i.sec            if year>=1996 , vce(cluster dep) 
                          outreg2              immi_sh                                                `control'                                   using `x'.xls, nocons bdec(3) replace
                          
                          xi: xtreg        `x'  immi_sh                                                  `control'  i.region        i.sec            if year>=1996 , fe cluster (dep) 
                          outreg2              immi_sh                                                `control'                                   using `x'.xls, nocons bdec(3) append
                          
                          log using iv_`x'.log, replace
                          xi: xtivreg2    `x' (immi_sh = impu_sh_origin )                             `control'  i.region*i.per i.sec*i.per    if year>=1996 , fe first cluster (dep) 
                          outreg2              immi_sh                                                `control'                                using `x'.xls, nocons bdec(3) append
                          
                          xi: xtivreg2    `x' (immi_sh S_tfp_op_sh= impu_sh_origin S_tfp_op_iv )         `control'  i.region*i.per i.sec*i.per    if year>=1996 , fe first cluster (dep) 
                          outreg2              immi_sh S_tfp_op_sh                                    `control'                                using `x'.xls, nocons bdec(3) append
                          
                          xi: xtivreg2    `x' (immi_sh S_emplo_sh= impu_sh_origin S_emplo_iv )         `control'  i.region*i.per i.sec*i.per    if year>=1996 , fe first cluster (dep) 
                          outreg2              immi_sh S_emplo_sh                                        `control'                                using `x'.xls, nocons bdec(3) append
                          
                          log close
                          
                          }



                          • #14
                            There isn't a whole lot of preparation of the data needed beyond the code already shown in #3 and #5. The code you show needs modifications to rely on the index rather than on the region and sector variables. Also, it can be improved in several other ways. Here's how I would modify it:

                            Code:
                            * Controls used in every specification
                            local control "TG_F foreign_affil D1984 D1990 D1995 D2000 D2005"
                            * One identifier per observed region-sector combination
                            egen index = group(region sector)
                            
                            * OLS with region-sector indicators (no period interaction)
                            reg mig_jump immi_sh `control' i.index if year>=1996, vce(cluster dep)
                            outreg2 immi_sh `control' using mig_jump.xls, nocons bdec(3) replace
                            
                            * Firm, region-sector and region-sector-by-period fixed effects
                            reghdfe mig_jump immi_sh `control' if year>=1996, absorb(id index index#per) cluster(dep)
                            outreg2 immi_sh `control' using mig_jump.xls, nocons bdec(3) append
                            
                            log using iv_mig_jump.log, replace
                            * Declare the panel identifier needed by -xtivreg2, fe-
                            xtset id
                            xtivreg2 mig_jump (immi_sh = impu_sh_origin) `control' i.index#i.per if year>=1996, fe first cluster(dep)
                            outreg2 immi_sh `control' using mig_jump.xls, nocons bdec(3) append
                            
                            xtivreg2 mig_jump (immi_sh S_tfp_op_sh = impu_sh_origin S_tfp_op_iv) `control' i.index#i.per if year>=1996, fe first cluster(dep)
                            outreg2 immi_sh S_tfp_op_sh `control' using mig_jump.xls, nocons bdec(3) append
                            
                            xtivreg2 mig_jump (immi_sh S_emplo_sh = impu_sh_origin S_emplo_iv) `control' i.index#i.per if year>=1996, fe first cluster(dep)
                            outreg2 immi_sh S_emplo_sh `control' using mig_jump.xls, nocons bdec(3) append
                            
                            log close
                            Notes: -reghdfe- is written by Sergio Correia and is available from SSC. It is the easiest way to do fixed-effects regressions with multiple fixed effects.

                            I have changed -tsset id year- to -xtset id-. For your purposes, they will do the same thing, and the latter is clearer and simpler.

                            Having a local macro containing only a single variable name and then looping over it is silly. Just refer directly to mig_jump in the regression commands. -xi:- is obsolete. It has been replaced by factor-variable notation. I have made those changes in your code already. However, -xtivreg2- is not an official Stata command. I don't know if it supports factor-variable notation or not. I recommend trying it without -xi:-. If you get an error message saying that it doesn't allow factor variables or that it does not know what i.index#i.per is, then put the -xi:- back, writing the interaction in -xi- style as i.index*i.per, since -xi- uses * rather than # for interactions.

                            I do not use -outreg2- and am unfamiliar with it. So I don't know if you will need to modify the -outreg2- commands to accommodate these changes.

                            Your -reg- command makes no mention of the period variable, whereas all the others do. I have left it that way, but wanted to call your attention to this in case it is a mistake.
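A hedged illustration on simulated data (all names and values are hypothetical) of the factor-variable replacement for -xi:- discussed above: the -xi- term i.region*i.per corresponds to i.region##i.per, and i.region#i.per on its own already spans one effect per region-period cell, so the two fits are identical.

```stata
* Hypothetical simulated panel: 50 firms x 4 observations, 3 regions, 2 periods
clear
set seed 12345
set obs 200
generate id     = ceil(_n/4)
generate per    = mod(_n, 2) + 1
generate region = mod(id, 3) + 1
generate y      = rnormal() + region + per

* Old syntax (superseded):  xi: regress y i.region*i.per
* Factor-variable equivalent: ## adds both main effects and the interaction
regress y i.region##i.per
scalar r2_full = e(r2)

* i.region#i.per alone: one indicator per region-period cell; together
* with the constant it spans the same 6 cells, so the fit is identical
regress y i.region#i.per
display reldif(r2_full, e(r2))
```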



                            • #15
                              Originally posted by Clyde Schechter
                              Notes: -reghdfe- is written by Sergio Correia and is available from SSC. It is the easiest way to do fixed-effects regressions with multiple fixed effects.
                              I am sorry, I am not familiar with -reghdfe-. Could you please explain why it is the easiest way to do fixed-effects regressions? Comparing these two pieces of code with each other, it does not appear to be a shortcut.

                              Originally posted by Clyde Schechter
                              Your -reg- command makes no mention of the period variable, whereas all the others do. I have left it that way but wanted to call your attention to this in case it is a mistake.
                              I deliberately left period out of the first regression, to provide a comparison between the specifications with and without period effects.

