collapse & index

Paris Rira

Join Date: Dec 2022
Posts: 385

26 Jan 2023, 15:01

Hi

I have 7 regions and 17 sectors. I created an index and I am going to partition sectors and regions according to it. As you can see in the -list- output, the values are 1.32, 2.99, ..., but sectors are 1,2,3,4,5,6,7. So where do these numbers come from? Could anyone help, please?

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float(region sector index)
4 3  7
4 7 11
4 7 11
4 7 11
4 7 11
4 7 11
4 7 11
4 7 11
4 7 11
4 7 11
4 7 11
4 7 11
4 7 11
4 7 11
4 7 11
4 7 11
4 7 11
4 7 11
4 7 11
4 7 11
4 7 11
5 3  8
5 3  8
5 3  8
5 3  8
5 3  8
5 3  8
5 3  8
5 3  8
5 3  8
5 3  8
5 3  8
5 3  8
5 3  8
5 3  8
5 3  8
5 3  8
5 3  8
5 3  8
5 3  8
5 3  8
5 3  8
5 3  8
5 3  8
5 3  8
5 3  8
5 3  8
5 3  8
5 3  8
5 3  8
end

generate index=(region+sector)
 collapse sector , by(index)

. list

     +------------------+
     | index     sector |
     |------------------|
  1. |     1          1 |
  2. |     2   1.315152 |
  3. |     3   2.990105 |
  4. |     4   3.018043 |
  5. |     5   3.930294 |
     |------------------|
  6. |     6   3.256762 |
  7. |     7   6.395171 |
  8. |     8   6.788081 |
  9. |     9   7.196252 |
 10. |    10   7.250002 |
     |------------------|
 11. |    11   7.106759 |
 12. |    12   8.343517 |
 13. |    13   11.18524 |
 14. |    14   9.862413 |
 15. |    15   13.13492 |
     |------------------|
 16. |    16   13.46302 |
 17. |    17   13.52856 |
 18. |    18   15.02726 |
 19. |    19   14.79059 |
 20. |    20   16.00764 |
     |------------------|
 21. |    21   16.91695 |
 22. |    23         16 |
 23. |    24         17 |
     +------------------+

Listed 50 out of 1996970 observations


Clyde Schechter

Join Date: Apr 2014

Posts: 30357
#2

26 Jan 2023, 15:19

The command -collapse sector, by(index)- tells Stata to calculate the average value of sector for each value of index.
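To see where the decimals in #1 come from, here is a small sketch on made-up rows (hypothetical, not taken from the posted data) in which two different region/sector pairs happen to share index = region + sector. With no statistic named, -collapse- takes the mean, so the value for that index is a sector code that neither row actually has.

```stata
* Hypothetical toy rows: index = region + sector, as in #1
clear
input float(region sector index)
4 3 7
2 5 7
5 3 8
end

* No statistic given, so -collapse- uses (mean) by default
collapse sector, by(index)
list
* index 7 -> sector 4 (mean of 3 and 5); index 8 -> sector 3
```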

Also, "but sectors are 1,2,3,4,5,6,7" is inconsistent with "I have 7 regions, 17 sectors." So the values of the sector variable range between 1 and 17, not 1 and 7. And the averaged values of sector are, accordingly, also in that range.

My question to you is what you are actually trying to do here. Regions 1 through 7 and sectors 1 through 17: these numbers are just arbitrary codes. They are the numeric values that we use for what are actually categorical variables. So I do not understand what you are trying to do when you calculate an index as region + sector: you are adding two numbers that are just arbitrary and producing arbitrary results. Addition is not meaningful for numbers that merely represent categories. Similarly, calculating the average value of the sector numbers makes no sense either.
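A minimal illustration of the point about addition, using hypothetical codes: different region/sector pairs can produce the same sum, whereas -egen, group()- gives each observed combination its own code.

```stata
* Hypothetical codes: the first three pairs all add up to 6
clear
input byte(region sector)
1 5
2 4
3 3
4 7
end

generate sum_index = region + sector    // the approach from #1
egen pair_id = group(region sector)     // one code per combination
list

quietly tabulate sum_index
display "distinct sums: " r(r)          // 2: rows 1-3 sum to 6, row 4 to 11
quietly tabulate pair_id
display "distinct pair_ids: " r(r)      // 4: one per pair
```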

So please explain what you actually want to do.
George Ford

Join Date: Aug 2014

Posts: 3337
#3

26 Jan 2023, 15:57

egen sid = group(region sector)

then collapse by sid
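A sketch of this suggestion on hypothetical rows, assuming the goal is one record per region-sector cell holding the mean of some outcome (y is a made-up placeholder; whether a mean, a sum, or something else is appropriate depends on what the variable means). Listing region and sector in by() alongside sid keeps them in the collapsed data.

```stata
* Hypothetical rows; y is a placeholder outcome
clear
input byte(region sector) float y
4 3 10
4 3 14
4 7  6
5 3  8
5 3 12
end

egen sid = group(region sector)

* Mean of y within each region-sector cell; region and sector are
* constant within sid, so listing them in by() just carries them along
collapse (mean) y, by(sid region sector)
list
```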

Paris Rira

Join Date: Dec 2022
Posts: 385

26 Jan 2023, 17:41

Originally posted by Clyde Schechter

The command -collapse sector, by(index)- tells Stata to calculate the average value of sector for each value of index.

Also, "but sectors are 1,2,3,4,5,6,7" is inconsistent with "I have 7 regions, 17 sectors." So the values of the sector variable range between 1 and 17, not 1 and 7. And the averaged values of sector are, accordingly, also in that range.

Code:

. tab sector

     sector |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |      9,105        0.46        0.46
          2 |      3,078        0.15        0.61
          3 |    723,880       36.25       36.86
          4 |      9,932        0.50       37.36
          5 |      7,512        0.38       37.73
          6 |     53,207        2.66       40.40
          7 |    993,336       49.74       90.14
          8 |     47,803        2.39       92.53
          9 |     22,743        1.14       93.67
         10 |     36,639        1.83       95.51
         11 |     10,124        0.51       96.01
         12 |     17,933        0.90       96.91
         13 |     39,926        2.00       98.91
         14 |      3,720        0.19       99.10
         15 |      4,684        0.23       99.33
         16 |      9,467        0.47       99.81
         17 |      3,881        0.19      100.00
------------+-----------------------------------
      Total |  1,996,970      100.00

. tab region

     region |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |    510,172       39.33       39.33
          2 |     19,183        1.48       40.81
          3 |    306,266       23.61       64.41
          4 |    405,586       31.27       95.68
          5 |     39,439        3.04       98.72
          6 |      5,040        0.39       99.11
          7 |     11,562        0.89      100.00
------------+-----------------------------------
      Total |  1,297,248      100.00

So please explain what you actually want to do.

Well, I have a model:

[Image: the model equation (image_29997.png)]

Y: firm output
i: firms (several thousand)
s: 17 industrial sectors
d: 7 regions
t: 2010-2021
f_i: firm fixed effects
f_sT: sector-by-period fixed effects (T: two periods)
f_rT: region-by-period fixed effects
X_it: control variables for firm i at time t
S_dt: explanatory variable, the immigrant share in region d at time t

Since the dimensions of Y, S_dt, and X_it differ from each other, I decided to make an index of sector & region and then partition Y, FI, S, and X according to that index. Running OLS is easy under that setup.


Clyde Schechter

Join Date: Apr 2014

Posts: 30357
#5

26 Jan 2023, 18:52

since dimensions of Y Sdt Xit differ from each other. I decided to make an index of sector& region and then partition Y, FI, S, and X according to the index.

I'm not really sure I understand what you're doing here, but perhaps what you want is a variable that indicates the combination of sector and region. You would get that with:

Code:

egen index = group(sector region)

That will give you a variable, index, with values from 1 up to at most 119 (= 7 * 17), each value corresponding to one combination of sector and region that actually occurs in the data. That will still not, however, make it sensible to -collapse sector, by(index)-, as it is meaningless to perform any arithmetic operations on the sector variable. I still don't know what you hoped to accomplish with that command.
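A quick check on hypothetical rows: -egen, group()- numbers only the combinations that actually occur, consecutively from 1, so the maximum is at most 119. The -label- option attaches value labels showing which sector/region pair each code stands for.

```stata
* Hypothetical toy rows: a few sector/region pairs, one repeated
clear
input byte(sector region)
 3 4
 7 4
 7 4
 3 5
17 7
end

egen index = group(sector region), label
tabulate index

* Each index value corresponds to exactly one sector/region pair
bysort index (sector region): assert sector[1] == sector[_N] & region[1] == region[_N]

* Only observed combinations get a code, so the maximum is <= 7*17
summarize index, meanonly
assert r(max) <= 119
display "highest index: " r(max)
```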
Paris Rira

Join Date: Dec 2022

Posts: 385
#6

27 Jan 2023, 03:14

Let me put it another way. Prof. Schechter, I have that econometric model and I am going to estimate it. First, I need to prepare the dataset. Given the information above, what do you suggest? Please help me. I have estimated models for Y_itd before, but never for Y_isdt.
Clyde Schechter

Join Date: Apr 2014

Posts: 30357
#7

27 Jan 2023, 10:46

I don't think I can answer that question. It seems that you are taking a model that is defined in terms of effects at the firm, sector, and region level and you want to marginalize it to just firm and region. But you do have the data disaggregated by firm, sector, and region. So there could be two general approaches. One would be to aggregate the data itself up to the firm and region level. For some variables that might mean adding up the sector-specific values within region, and for others it might mean averaging, or taking medians, or some other way of doing it. The choice would depend on the meaning of the actual variables. These choices require econometric understanding that I do not have.

The other approach is to estimate the original model with all of its original effects (including sector), and then get average marginal effects at the firm and region level. The calculations can be done with the -margins- command. But there is the issue of whether this overall approach is even valid; again, that is an econometric question that I have no ability to answer. And there is also the issue of how representative the distribution of sectors in your data is of the real-world distribution of sectors within regions and of firms within sectors, which might require some kind of weighting instead of simple average marginal effects. So again, this comes down to econometric knowledge.
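A sketch of the mechanics of the second approach only, on simulated data (all variable names and coefficients are made up, and it says nothing about whether the approach is econometrically valid here). By default, -margins- averages predictions over the estimation sample, so the region margins reflect the sample's own mix of sectors, which is the representativeness issue raised above.

```stata
* Hypothetical simulated data, purely to show the mechanics
clear
set seed 12345
set obs 2000
generate region = runiformint(1, 7)
generate sector = runiformint(1, 17)
generate x = rnormal()
generate y = 1 + 0.5*x + region/10 + sector/20 + rnormal()

* Estimate the full model, sector effects included
regress y x i.sector i.region

* Predictive margins for each region, averaged over the sample's sectors
margins region
```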

If somebody else following this thread feels comfortable grappling with those issues, I would welcome their joining the discussion and contributing. But I can't take this any farther as it is way beyond issues that are just statistical or Stata.
Paris Rira

Join Date: Dec 2022

Posts: 385
#8

27 Jan 2023, 15:11

Actually, one way, let's say the easy one, would be to put all variables in the same dimension. That's why I thought of making an index of sector and region. Why? Well, suppose I could make, e.g., sector 1 and region 1 = index 11, sector 1 and region 2 = index 12, ..., sector 17 and region 7 = index 177 (hypothetically). Then the rest would be easy, as with the customary Y_ijt. I mean that, with this index, when it comes to F_st I would take s from the index and t = 2010-2021. For F_rt, I could take r from the index (it does not matter that the index also contains s; I only pick up r at this point), and so on...
I guess if we could do this in Stata, the puzzle might be solved.
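The scheme described above can be written as code = 10*sector + region. It is unambiguous only because region never exceeds 9 (a single digit), and both parts can be recovered with mod() and floor(). A sketch on hypothetical rows; in practice the separate region and sector variables already carry this information.

```stata
* Hypothetical rows covering the examples in the post
clear
input byte(sector region)
 1 1
 1 2
17 7
11 1
end

* Sector digits followed by a single region digit: 11, 12, 177, 111
generate int code = 10*sector + region

* Recover both components from the code
generate byte region_back = mod(code, 10)
generate byte sector_back = floor(code/10)

list
```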

Last edited by Paris Rira; 27 Jan 2023, 15:13.
Clyde Schechter

Join Date: Apr 2014

Posts: 30357
#9

27 Jan 2023, 15:22

Well, if you are confident that approach will work, the code shown in #3 or #5 (it's the same code) will create the index. And you won't need to "extract" the region from the index, because you already have a separate region variable to work with.
Paris Rira

Join Date: Dec 2022

Posts: 385
#10

27 Jan 2023, 16:07

Originally posted by Clyde Schechter

And you won't need to "extract" the region from the index, because you already have a separate region variable to work with.

That is exactly the point: not to use region alone or sector alone. By making an index I want to put all variables at the same level.
I ran the code in #3, but I could not get any further. I understand it in theory, but how to implement it is still challenging for me.
Clyde Schechter

Join Date: Apr 2014

Posts: 30357
#11

27 Jan 2023, 16:25

So something like:

Code:

reghdfe Y S X, absorb(firm index#time)

would do that.

Note: -reghdfe- is written by Sergio Correia and is available from SSC.
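For comparison, a sketch on simulated data (every name, size, and coefficient below is made up) of the same kind of specification using only official Stata: -xtreg, fe- removes the firm effects, and the index-by-period effects enter as factor variables. It uses a two-category period variable like per in #14. One caveat worth checking, assuming time in the command above means the calendar year: index#year contains region#year, so a regressor that varies only by region and year (like S_dt) would be collinear with those effects and dropped. The -reghdfe- call is shown as a comment; installing it from SSC also requires -ftools-.

```stata
* Hypothetical simulated panel: 300 firms observed 2010-2021
clear
set seed 2023
set obs 300
generate firm   = _n
generate region = runiformint(1, 7)
generate sector = runiformint(1, 17)
generate u      = rnormal()                 // firm-specific effect
expand 12
bysort firm: generate year = 2009 + _n
generate per = cond(year <= 2015, 1, 2)     // two periods; cut-off assumed
egen index = group(region sector)

* A regressor that varies only by region and year (like S_dt)
bysort region year: generate S = runiform() if _n == 1
by region year: replace S = S[1]
generate X = rnormal()
generate Y = 0.5*S + 0.3*X + u + rnormal()

* Official Stata: firm effects via -xtreg, fe-, index#per as factor variables
xtset firm
xtreg Y S X i.index#i.per, fe

* Equivalent specification with -reghdfe- (SSC; needs -ftools- as well):
*   reghdfe Y S X, absorb(firm index#per)
```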
Paris Rira

Join Date: Dec 2022

Posts: 385
#12

27 Jan 2023, 16:33

Originally posted by Clyde Schechter

That will still not, however, make it sensible to -collapse sector, by(index)-, as it is meaningless to perform any arithmetic operations on the sector variable.

I am still thinking about your comment on collapsing by index. If that is not a valid thing to do, then I really cannot estimate the model, because first I would need to collapse sector and region by index, and then use sector#index#time.
I need to adjust sector (region) according to the index, and then time. My knowledge of econometrics is shallow; I picked up this idea in a discussion with an econometrics professor.

Paris Rira

Join Date: Dec 2022
Posts: 385

#13

27 Jan 2023, 16:39

And one more thing: I have access to the syntax, but I really have no idea how to prepare the data for it. Here is the code I use to run the model in #4.

Code:

tsset id year
local control "TG_F foreign_affil D1984 D1990 D1995 D2000 D2005 "
local variable mig_jump 

foreach x of local variable {
xi: reg            `x'  immi_sh                                                  `control'  i.region         i.sec            if year>=1996 , vce(cluster dep) 
outreg2              immi_sh                                                `control'                                   using `x'.xls, nocons bdec(3) replace

xi: xtreg        `x'  immi_sh                                                  `control'  i.region        i.sec            if year>=1996 , fe cluster (dep) 
outreg2              immi_sh                                                `control'                                   using `x'.xls, nocons bdec(3) append

log using iv_`x'.log, replace
xi: xtivreg2    `x' (immi_sh = impu_sh_origin )                             `control'  i.region*i.per i.sec*i.per    if year>=1996 , fe first cluster (dep) 
outreg2              immi_sh                                                `control'                                using `x'.xls, nocons bdec(3) append

xi: xtivreg2    `x' (immi_sh S_tfp_op_sh= impu_sh_origin S_tfp_op_iv )         `control'  i.region*i.per i.sec*i.per    if year>=1996 , fe first cluster (dep) 
outreg2              immi_sh S_tfp_op_sh                                    `control'                                using `x'.xls, nocons bdec(3) append

xi: xtivreg2    `x' (immi_sh S_emplo_sh= impu_sh_origin S_emplo_iv )         `control'  i.region*i.per i.sec*i.per    if year>=1996 , fe first cluster (dep) 
outreg2              immi_sh S_emplo_sh                                        `control'                                using `x'.xls, nocons bdec(3) append

log close

}


Clyde Schechter

Join Date: Apr 2014
Posts: 30357

#14

27 Jan 2023, 17:06

There isn't a whole lot of preparation of the data needed beyond the code already shown in #3 and #5. The code you show needs modifications to rely on the index rather than on the region and sector variables. Also, it can be improved in several other ways. Here's how I would modify it:

Code:

local control "TG_F foreign_affil D1984 D1990 D1995 D2000 D2005 "
egen index = group(region sector)

reg            mig_jump  immi_sh                                                  `control'  i.index            if year>=1996 , vce(cluster dep)
outreg2              immi_sh                                                `control'                                   using mig_jump.xls, nocons bdec(3) replace

reghdfe        mig_jump  immi_sh                                                  `control'          if year>=1996 , absorb(id index index#per) cluster (dep)
outreg2              immi_sh                                                `control'                                   using mig_jump.xls, nocons bdec(3) append

log using iv_mig_jump.log, replace
xtset id
xtivreg2       mig_jump (immi_sh = impu_sh_origin )                             `control'  i.index#i.per    if year>=1996 , fe first cluster (dep)
outreg2              immi_sh                                                `control'                                using mig_jump.xls, nocons bdec(3) append

xtivreg2       mig_jump (immi_sh S_tfp_op_sh= impu_sh_origin S_tfp_op_iv )         `control'  i.index#i.per    if year>=1996 , fe first cluster (dep)
outreg2              immi_sh S_tfp_op_sh                                    `control'                                using mig_jump.xls, nocons bdec(3) append

xtivreg2       mig_jump (immi_sh S_emplo_sh= impu_sh_origin S_emplo_iv )         `control'  i.index#i.per    if year>=1996 , fe first cluster (dep)
outreg2              immi_sh S_emplo_sh                                        `control'                                using mig_jump.xls, nocons bdec(3) append

log close

Notes: -reghdfe- is written by Sergio Correia and is available from SSC. It is the easiest way to do fixed-effects regressions with multiple fixed effects.

I have changed -tsset id year- to -xtset id-. For your purposes, they will do the same thing, and the latter is clearer and simpler.

Having a local macro containing only a single variable name and then looping over it is silly. Just refer directly to mig_jump in the regression commands. -xi:- is obsolete. It has been replaced by factor-variable notation. I have made those changes in your code already. However, -xtivreg2- is not an official Stata command. I don't know if it supports factor variable notation or not. I recommend trying it without -xi:-. If you get an error message saying that it doesn't allow factor variables or that it does not know what i.index#i.per is, then put the -xi:- back and write the interaction in -xi- syntax, i.index*i.per, because -xi- does not understand the # operator.
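A sketch on simulated data (all names and values hypothetical) of two ways to write the index-by-period effects in factor-variable notation: through the combined index, or by interacting region, sector, and period directly. Both span the same set of cell indicators, so the coefficient on x should agree up to rounding.

```stata
* Hypothetical simulated data
clear
set seed 101
set obs 1500
generate region = runiformint(1, 7)
generate sector = runiformint(1, 17)
generate per    = runiformint(1, 2)
generate x      = rnormal()
generate y      = 0.4*x + region/5 + sector/10 + per + rnormal()
egen index = group(region sector)

* Cell effects through the combined index ...
quietly regress y x i.index#i.per
scalar b_index = _b[x]

* ... or by interacting the original variables directly
quietly regress y x i.region#i.sector#i.per
scalar b_direct = _b[x]

display b_index "  " b_direct
```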

I do not use -outreg2- and am unfamiliar with it. So I don't know if you will need to modify the -outreg2- commands to accommodate these changes.

Your -reg- command makes no mention of the period variable, whereas all the others do. I have left it that way, but wanted to call your attention to this in case it is a mistake.


Paris Rira

Join Date: Dec 2022

Posts: 385
#15

28 Jan 2023, 03:43

Originally posted by Clyde Schechter

Notes: -reghdfe- is written by Sergio Correia and is available from SSC. It is the easiest way to do fixed-effects regressions with multiple fixed effects.

I am sorry, I am not familiar with -reghdfe-. Could you please explain why it is the easiest way to do fixed effects? Comparing the two commands, it does not look like a shortcut.

Originally posted by Clyde Schechter

Your -reg- command makes no mention of the period variable, whereas all the others do. I have left it that way but wanted to call your attention to this in case it is a mistake.

I left the period out of the first regression on purpose, to compare the results with and without period effects.
