
  • Aggregation

    Dear all

    I have individual longitudinal data. I want to estimate at the industry level (aggregating, i.e. -collapse- by industry) to get the proportion of people on ZHC and the proportion of people in lockdown. I'm not sure what the best way to do this is. Could you help with this?

    To be more specific, I want to estimate the share of workers on zero-hours contracts (ZHC) during the lockdown period (Lockdown) to answer this question: what happens to ZHC during the lockdown in each industry?

    This is my data, showing only the relevant variables:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input double PERSID long industry1 byte quarter long LGWT20 byte(FLOW SEX) double HOURPAY long ZHC float log_HOURPAY double Lockdown float wdate
     970592020102  7 4 10018 3 1     . 0         . 0 44
    7990493010102 12 4  7501 3 2     . 0         . 0 43
    4230794720101  7 4  1004 3 2 11.83 0  2.470639 0 46
    1290892010101 23 4 21860 3 2     . 0         . 0 47
    4230291540102 12 4   819 3 2     . 0         . 0 41
    1960892020102 13 4 13880 3 1     . 0         . 0 47
    3160393040101 12 4 13593 3 1     . 0         . 0 42
    9161293010101 10 4 19420 3 2     . 0         . 0 51
     600194020101 23 4 10257 3 2  8.51 0  2.141242 0 40
    9221092020102 12 4 14481 3 1     . 0         . 0 49
    9191092020103 23 4 16392 3 1     . 0         . 0 49
    1891393060101  7 4 15634 3 2     . 0         . 0 52
    2350994050101 12 4 14204 3 2     . 0         . 0 48
    4230794570103  5 4  1901 3 1     . 0         . 0 46
    1060593060101  2 4 10117 3 2     . 0         . 0 44
     550894030104 23 4 27289 3 2  8.72 0 2.1656191 0 47
     870594010101 10 4 13440 3 1 24.71 0  3.207208 0 44
     321192030102  7 4 16365 3 2     . 0         . 0 50
    7880594010101  4 4  9408 3 1  31.2 0  3.440418 0 44
    2990894030101  5 4  7730 3 1     . 0         . 0 47
    2381191010101  7 4 11444 3 2     . 0         . 0 50
    4230791120101 12 4  1060 3 2     . 0         . 0 46
    1690694010101  5 4  3185 3 1     . 0         . 0 45
    1170692030102  7 4 12177 3 2     . 0         . 0 45
    2591092030101 18 4 15475 3 1     . 0         . 0 49
    4231091860101 12 4  2341 3 2     . 0         . 0 49
    4230392620101 17 4  2185 3 2     . 0         . 0 42
     150294010102 12 4  7364 3 2  8.61 0 2.1529243 0 41
     281393030102  5 4  7755 3 2     . 0         . 0 52
    2750894040102 14 4 31985 3 1    11 0  2.397895 0 47
     350293010101 14 4  8557 3 2     . 0         . 0 41
    1610194020102  7 4  7782 3 2 13.47 0  2.600465 0 40
     330894030101 12 4  8420 3 2 22.05 0 3.0933125 0 47
    1030194010101 20 4  4852 5 2 24.43 0  3.195812 0 40
    1361291020102 14 4 40509 3 1     . 0         . 0 51
    2360391050101 13 4 25128 3 1     . 0         . 0 42
    1090191040101 20 4 17537 3 1     . 0         . 0 40
    7580191010101 12 4 20352 3 2     . 0         . 0 40
    2600791010102 23 4 13344 3 1     . 0         . 0 46
    1490594010102  1 4  8074 3 2 46.78 0  3.845456 0 44
    1250493030102  2 4  5960 3 1     . 0         . 0 43
      30594010102 15 4  6672 3 1     . 0         . 0 44
    3060392010103 19 4 49875 4 1     . 0         . 0 42
     870991030101 12 4 19276 3 1     . 0         . 0 48
     860792020102  2 4 22660 3 2     . 0         . 0 46
    2850593020101  7 4  5833 9 2     . 0         . 0 44
    9251093010103 14 4 32341 3 2     . 0         . 0 49
    2200992010104 17 4 17868 4 2     . 0         . 0 48
    1520993040101  5 4 12415 3 1     . 0         . 0 48
    2401192020101 17 4 22110 3 2     . 0         . 0 50
     871094010102 17 4 17267 3 2 17.77 0  2.877512 0 49
    1610392020101 23 4 13372 3 2     . 0         . 0 42
    9100691010101 12 4 10525 3 2     . 0         . 0 45
    1990594010102 12 4 34410 3 2  9.61 0 2.2628043 0 44
    4230993210102 23 4  2350 3 1     . 0         . 0 48
    4230392140103 23 4  1116 3 2     . 0         . 0 42
    4230194750101 18 4  7573 3 1  8.46 0 2.1353493 0 40
     871391040101  7 4 21563 3 2     . 0         . 0 52
    1661394020101  7 4 12820 3 2  9.24 0  2.223542 0 52
    4230992950102 23 4  1193 3 2     . 0         . 0 48
     110394010102 18 4 20873 3 1 10.13 0 2.3155012 0 42
    2390994010102 18 4 14471 3 1 21.22 0  3.054944 0 48
     550294010101  7 4  8237 3 1     . 0         . 0 41
    4230492870101 12 4  1111 3 2     . 0         . 0 43
     550193030101  7 4 14040 3 2     . 0         . 0 40
    1550392040101 17 4  6407 6 2     . 0         . 0 42
     421394010101 23 4 15955 3 1  6.98 0  1.943049 0 52
    2830792010102 10 4  7400 3 2     . 0         . 0 46
    1260692010101 18 4  6742 3 2     . 0         . 0 45
    4230194030101 19 4   904 3 2 17.94 0  2.887033 0 40
    2800294020102 18 4  9639 3 2 18.28 0 2.9058075 0 41
     880291010101 10 4 16435 3 1     . 0         . 0 41
     510792010102 23 4 10225 3 1     . 0         . 0 46
    1040691020101 18 4 20079 3 1     . 0         . 0 45
     470691020101 17 4 15353 3 2     . 0         . 0 45
    1690193010101 21 4  9184 3 1     . 0         . 0 40
     541194010101 12 4 10827 3 1 36.06 0  3.585184 0 50
    2360293010102 14 4  5820 3 1     . 0         . 0 41
    7820991020101 23 4  9782 3 1     . 0         . 0 48
    1581191010101  1 4 13957 3 1     . 0         . 0 50
    7910991010102  7 4 17608 3 2     . 0         . 0 48
    2950691030102 19 4 11475 3 2     . 0         . 0 45
      30291010102 23 4 11180 3 2     . 0         . 0 41
    4230594590102  1 4   434 3 2   7.4 0   2.00148 0 44
    2161294030101  7 4 14601 3 1 17.21 0  2.845491 0 51
    4230394850101 18 4  1100 3 1     . 0         . 0 42
    2230394040101 19 4  8736 3 1    13 0  2.564949 0 42
    4231092020103  1 4  2996 3 1     . 0         . 0 49
     500794020102 19 4  8031 3 2 27.54 0 3.3156395 0 46
    1210693020102  4 4  9243 3 2     . 0         . 0 45
      91393020101  7 4  7512 3 2     . 0         . 0 52
    1880791010101 12 4  8232 3 2     . 0         . 0 46
    1780292010104 23 4 55938 3 2     . 0         . 0 41
     701094070102  7 4 23625 3 2     . 0         . 0 49
     670492050102 10 4 20321 3 2     . 0         . 0 43
    1951091020103 20 4 31110 3 2     . 0         . 0 49
     621193030101 19 4 21545 3 2     . 0         . 0 50
    1210794010101  4 4 11375 3 2     . 0         . 0 46
    1680293030101 18 4  5889 3 1     . 0         . 0 41
    1070694060102  7 4  6183 3 2  8.88 0 2.1838017 0 45
    end
    label values industry1 industry
    label def industry 1 "Accommodation And Food Service Activities", modify
    label def industry 2 "Administrative And Support Service Activities", modify
    label def industry 4 "Arts, Entertainment And Recreation", modify
    label def industry 5 "Construction", modify
    label def industry 7 "Education", modify
    label def industry 10 "Financial and insurance activities", modify
    label def industry 12 "Human Health And Social Work Activities", modify
    label def industry 13 "Information And Communication", modify
    label def industry 14 "Manufacturing", modify
    label def industry 15 "Mining and quarrying", modify
    label def industry 17 "Other service activities", modify
    label def industry 18 "Professional, Scientific And Technical Activities", modify
    label def industry 19 "Public admin and defence", modify
    label def industry 20 "Real estate activities", modify
    label def industry 21 "Transportation And Storage", modify
    label def industry 23 "Wholesale And Retail Trade; Repair Of Motor Vehicles And Motorcycles", modify
    label values quarter quarter
    label def quarter 4 "Oct-Des 2019", modify
    label values LGWT20 LGWT22
    label values FLOW FLOW
    label def FLOW 3 "In employment at first quarter; in employment at final quarter (EE)", modify
    label def FLOW 4 "In employment at first quarter; unemployed at final quarter (EU)", modify
    label def FLOW 5 "In employment at first quarter; inactive at final quarter (EN)", modify
    label def FLOW 6 "Unemployed at first quarter; in employment at final quarter (UE)", modify
    label def FLOW 9 "Inactive at first quarter; in employment at final quarter (NE)", modify
    label values SEX SEX
    label def SEX 1 "Male", modify
    label def SEX 2 "Female", modify
    label values HOURPAY HOURPAY5
    label values wdate wdate
    label def wdate 40 "Oct-Des 2019 1", modify
    label def wdate 41 "Oct-Des 2019 2", modify
    label def wdate 42 "Oct-Des 2019 3", modify
    label def wdate 43 "Oct-Des 2019 4", modify
    label def wdate 44 "Oct-Des 2019 5", modify
    label def wdate 45 "Oct-Des 2019 6", modify
    label def wdate 46 "Oct-Des 2019 7", modify
    label def wdate 47 "Oct-Des 2019 8", modify
    label def wdate 48 "Oct-Des 2019 9", modify
    label def wdate 49 "Oct-Des 2019 10", modify
    label def wdate 50 "Oct-Des 2019 11", modify
    label def wdate 51 "Oct-Des 2019 12", modify
    label def wdate 52 "Oct-Des 2019 13", modify


    I tried this command to collapse at the industry level:
    Code:
      collapse industry1 Lockdown LGWT20 FLOW SEX HOURPAY ZHC log_HOURPAY wdate , by( quarter PERSID )
    Then I run the regression with:
    Code:
    xtreg ZHC Lockdown [fw=LGWT20], fe vce(cluster PERSID)

    Could you help with that? Is this the correct way to aggregate at the industry level?

  • #2
    Ali:
    two comments about your post:
    1) in your excerpt:
    Code:
    . tab ZHC
    
            ZHC |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |        100      100.00      100.00
    ------------+-----------------------------------
          Total |        100      100.00
    
    .
    that is, there's no variation (no regression is expected to work);
    2) if your -ZHC- is a categorical or a count variable, why go with -xtreg-?
    Kind regards,
    Carlo
    (Stata 19.0)



    • #3
      Originally posted by Carlo Lazzaro View Post
      Ali:
      two comments about your post:
      1) in your excerpt:
      Code:
      . tab ZHC
      
              ZHC |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                0 |        100      100.00      100.00
      ------------+-----------------------------------
            Total |        100      100.00
      
      .
      that is, there's no variation (no regression is expected to work);
      2) if your -ZHC- is a categorical or a count variable, why go with -xtreg-?
      Thank you Carlo for your reply

      1- The "ZHC" variable is a dummy variable which is reported if the person working in zero hours contracts and tasks value 0 or 1. My sample above did not show the variation because I have 71000 obs, and only 2000 reported they are working as zero-hours contracts ( the first 100 obs did not show that ). Please see this sample it illustrates the variation :

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input double PERSID long industry1 byte quarter long LGWT20 byte(FLOW SEX) double HOURPAY long ZHC float log_HOURPAY double Lockdown float wdate
      4231193680102 12 5  2068 3 2                  . 0         .    0 63
       791001070101 21 5 27104 3 1                  . 0         .    0 62
      9170491010102 19 5 20030 3 2              18.57 0 2.9215474    0 56
      2191291010102 23 5 46493 3 1 12.030000000000001 0 2.4874036    0 64
      4231093990101 14 5  1537 3 1                  . 0         .    0 62
      2291094020102  2 5 36102 3 1                  . 0         .    0 62
      1580493030101 13 5  5352 3 1                  . 0         .    0 56
       341392010101 19 5 13899 3 2                  . 0         .    0 65
      2020693010102  5 5  2227 7 1                  . 0         .    0 58
      2120992020101  2 5 12422 4 2                  . 0         .    0 61
      4230191520102 19 5  1621 3 1              13.09 0 2.5718486    0 53
      1380401010103  7 5 15235 3 2               12.5 0  2.525729    0 56
       361093030101 19 5 10399 3 2                  . 0         .    0 62
      4231091290102 12 5  2616 3 1  8.120000000000001 0   2.09433    0 62
       540691030102  3 5 26540 3 1                  . 1         .    0 58
      9430291020102  2 5 11540 3 2               8.08 0  2.089392    0 54
      1650192030102  5 5 10178 3 2                  . 0         .    0 53
      1021393030102  7 5  9044 3 2                  . 0         . 13.8 65
      1510693030101 10 5  6354 3 1                  . 0         .    0 58
      9420192020102  2 5 17429 3 2                  . 0         .    0 53
      4230794880103  5 5  1361 3 1                  . 0         .    0 59
      1010791050101 18 5  8627 3 2               41.2 0  3.718438    0 59
      4231191510101  5 5  2376 3 1                  . 0         .    0 63
       371394010101 11 5  8008 3 1                  . 0         .    0 65
      2740591040101  3 5 22981 3 1 12.780000000000001 0 2.5478814    0 57
      7591001020101 12 5 20184 3 1               7.98 0 2.0769384    0 62
      2960801020103  1 5 21769 5 2                  . 0         .    0 60
      4230794160102 14 5  1004 3 2                  . 0         .    0 59
      4231292170101  5 5  2632 3 1                  . 0         .    0 64
      9400392040102  1 5 16544 3 2                  . 0         .    0 55
      1280394070101 18 5  7896 3 1                  . 0         .    0 55
      1640993040102 12 5 15382 3 2                  . 0         .    0 61
      9110391010103 14 5 24604 6 2  8.120000000000001 0   2.09433    0 55
      2390701050102 12 5 48771 3 1               12.5 0  2.525729    0 59
       650391020102 14 5 21852 3 2                  . 0         .    0 55
      1640993040101  5 5 10459 3 1                  . 0         .    0 61
      2330994010101 19 5 15785 3 2                  . 0         .    0 61
      9360793020102 23 5 21556 5 1                  . 0         .    0 59
      2150901030101 14 5 24682 3 1              24.68 0  3.205993    0 61
       291394020102 23 5 17556 3 1                  . 0         .   27 65
      1150101010102 19 5 13992 3 1                  . 0         .    0 53
      2380493020102  2 5  9557 3 1                  . 0         .    0 56
      1970101030102 12 5  5531 3 2                  . 0         .    0 53
      1380201030103  7 5 32131 5 2               3.09 0 1.1281711    0 54
       751191020103 23 5 30489 6 2                  . 0         .    0 63
      9061392010102 21 5 15132 3 1                  . 0         .  7.7 65
       610392010102 12 5 15164 3 2                  . 0         .    0 55
       770393020101 13 5 18472 3 1                  . 0         .    0 55
      9210101020101  5 5 10304 3 1                  . 0         .    0 53
      2831393020101  7 5  7306 3 2                  . 0         . 13.8 65
      3160394020102 14 5 15215 3 1                  . 0         .    0 55
      1371394030101  7 5  3182 3 2                  . 0         . 13.8 65
      4231001620102 14 5  7388 3 2                 10 0 2.3025851    0 62
       980694050103 13 5 37855 3 1                  . 0         .    0 58
      4230394990101  5 5  1186 3 1                  . 0         .    0 55
       320894030102 14 5  7654 3 1                  . 0         .    0 60
      9120192020102 10 5 22281 3 1                  . 0         .    0 53
      3110994020102 10 5  8286 3 2                  . 0         .    0 61
      2670894040101  5 5 13710 5 1                  . 0         .    0 60
       730593060101 13 5 27917 3 1                  . 0         .    0 57
      2050401010102  5 5 32419 3 1                  . 0         .    0 56
      1731391020104 23 5 53687 3 2               3.97 0  1.378766   27 65
      1931392020102 12 5  7826 3 2                  . 0         .  3.5 65
      4230193950102 19 5  1620 3 1                  . 0         .    0 53
      4231394670102 12 5  2401 3 2                  . 0         .  3.5 65
      1390193040101 12 5  9406 3 2                  . 0         .    0 53
      2390793030101 21 5  8906 3 1                  . 0         .    0 59
      1161093040102 19 5  7173 5 1                  . 0         .    0 62
      1661394020102 14 5 11578 3 1                  . 0         . 22.7 65
      1010792040102 18 5 13115 3 1                  . 0         .    0 59
      9100691010101 12 5 10525 3 2              10.13 0 2.3155012    0 58
      2970693030101 18 5  3839 3 1                  . 0         .    0 58
      1110393040102 21 5 15004 3 1                  . 0         .    0 55
      2340293030102 20 5  5981 3 2                  . 0         .    0 54
      2680992040102  1 5 18377 3 1                  . 0         .    0 61
       160293010102  7 5  3971 6 2                  . 0         .    0 54
      7541201020103 23 5 45783 3 2               8.19 0 2.1029139    0 64
        51194020102  7 5  9500 3 2                  . 0         .    0 63
       300492010102  7 5 20189 3 2                  . 0         .    0 56
      9300994040102  1 5 13845 3 2                  . 0         .    0 61
      2610693030101 19 5  8695 3 2                  . 0         .    0 58
      2151393020101 14 5  5196 3 1                  . 0         . 22.7 65
      2160592020101  5 5  8466 3 1                  . 0         .    0 57
      2390601010101  7 5 16331 3 2              23.38 0  3.151881    0 58
       370593020102  7 5 13232 3 2                  . 0         .    0 57
      1601092030101  7 5 11248 3 2                  . 0         .    0 62
      2270792010101 12 5 32042 3 1                  . 1         .    0 59
       150593020104  1 5 26133 3 2                  . 0         .    0 57
      1970892020101 19 5 13623 3 2                  . 0         .    0 60
       461391050102 10 5 17098 3 1               20.6 0  3.025291    0 65
      2390701050101 19 5 11571 3 2              10.92 0  2.390596    0 59
      1370893010102 12 5 18760 3 2                  . 0         .    0 60
      9160491010102 23 5 15797 3 2                9.6 0  2.261763    0 56
      1740994020102  5 5 19355 3 1                  . 0         .    0 61
       510501030102 10 5  8946 3 2              15.18 0  2.719979    0 57
       300593030101 12 5  7923 3 2                  . 0         .    0 57
      1440301030101  1 5  8888 3 2              19.23 0 2.9564714    0 55
      2080693030102  1 5 14590 5 2                  . 1         .    0 58
      1900601010102  5 5  6282 3 1                  . 0         .    0 58
      1321192010101 18 5  7667 3 2                  . 0         .    0 63
      end
      label values industry1 industry
      label def industry 1 "Accommodation And Food Service Activities", modify
      label def industry 2 "Administrative And Support Service Activities", modify
      label def industry 3 "Agriculture, forestry and fishing", modify
      label def industry 5 "Construction", modify
      label def industry 7 "Education", modify
      label def industry 10 "Financial and insurance activities", modify
      label def industry 11 "Households as employers", modify
      label def industry 12 "Human Health And Social Work Activities", modify
      label def industry 13 "Information And Communication", modify
      label def industry 14 "Manufacturing", modify
      label def industry 18 "Professional, Scientific And Technical Activities", modify
      label def industry 19 "Public admin and defence", modify
      label def industry 20 "Real estate activities", modify
      label def industry 21 "Transportation And Storage", modify
      label def industry 23 "Wholesale And Retail Trade; Repair Of Motor Vehicles And Motorcycles", modify
      label values quarter quarter
      label def quarter 5 "Jan-Mar 2020", modify
      label values LGWT20 LGWT22
      label values FLOW FLOW
      label def FLOW 3 "In employment at first quarter; in employment at final quarter (EE)", modify
      label def FLOW 4 "In employment at first quarter; unemployed at final quarter (EU)", modify
      label def FLOW 5 "In employment at first quarter; inactive at final quarter (EN)", modify
      label def FLOW 6 "Unemployed at first quarter; in employment at final quarter (UE)", modify
      label def FLOW 7 "Unemployed at first quarter; unemployed at final quarter (UU)", modify
      label values SEX SEX
      label def SEX 1 "Male", modify
      label def SEX 2 "Female", modify
      label values HOURPAY HOURPAY5
      label values wdate wdate
      label def wdate 53 "Jan-Mar 2020 1", modify
      label def wdate 54 "Jan-Mar 2020 2", modify
      label def wdate 55 "Jan-Mar 2020 3", modify
      label def wdate 56 "Jan-Mar 2020 4", modify
      label def wdate 57 "Jan-Mar 2020 5", modify
      label def wdate 58 "Jan-Mar 2020 6", modify
      label def wdate 59 "Jan-Mar 2020 7", modify
      label def wdate 60 "Jan-Mar 2020 8", modify
      label def wdate 61 "Jan-Mar 2020 9", modify
      label def wdate 62 "Jan-Mar 2020 10", modify
      label def wdate 63 "Jan-Mar 2020 11", modify
      label def wdate 64 "Jan-Mar 2020 12", modify
      label def wdate 65 "Jan-Mar 2020 13", modify

      2- I used xtreg because I have panel data; I did not know whether there is any problem with using it when my variable is a dummy. I would appreciate any suggestions.



      • #4
        Ali:
        if your dependent variable is categorical, you should use -xtlogit-.
        I am not sure that you have to -collapse- your data first to get what you're after.
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Originally posted by Carlo Lazzaro View Post
          Ali:
          if your dependent variable is categorical, you should use -xtlogit-.
          I am not sure that you have to -collapse- your data first to get what you're after.
          I intend to use a linear probability model (LPM), but what matters most to me is collapsing to the group (industry) level, since I'm not interested in estimating the individual effect.

          I don't know if there is a way to aggregate to the industry level.



          • #6
            Ali:
            if you -collapse- on -industry1- you will not avoid the -fe- curse:
            Code:
            . xtset PERSID quarter
            
            Panel variable: PERSID (strongly balanced)
             Time variable: quarter, 5 to 5
                     Delta: 1 unit
            
            . xtreg  ZHC Lockdown, fe vce (cluster PERSID )
            note: Lockdown omitted because of collinearity.
            
            Fixed-effects (within) regression               Number of obs     =        100
            Group variable: PERSID                          Number of groups  =        100
            
            R-squared:                                      Obs per group:
                 Within  =      .                                         min =          1
                 Between =      .                                         avg =        1.0
                 Overall =      .                                         max =          1
            
                                                            F(0,99)           =          .
            corr(u_i, Xb) =      .                          Prob > F          =          .
            
                                           (Std. err. adjusted for 100 clusters in PERSID)
            ------------------------------------------------------------------------------
                         |               Robust
                     ZHC | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
            -------------+----------------------------------------------------------------
                Lockdown |          0  (omitted)
                   _cons |        .03          .        .       .            .           .
            -------------+----------------------------------------------------------------
                 sigma_u |  .17144661
                 sigma_e |          .
                     rho |          .   (fraction of variance due to u_i)
            ------------------------------------------------------------------------------
            Kind regards,
            Carlo
            (Stata 19.0)



            • #7
              I want to estimate the share of workers on zero-hours contracts (ZHC) during the lockdown period (Lockdown) to answer this question: what happens to ZHC during the lockdown in each industry?
              I don't understand your data in relation to this question. I would expect you to have an indicator ("dummy") variable indicating the lockdown period. But the only variable that looks like it is related to the Lockdown is a floating point variable taking on values like 3.5 and 22.7 ... What's that about? And how can you tell which observations are during the lockdown period in this data? Without that, you can't even begin to answer this research question.



              • #8
                Originally posted by Clyde Schechter View Post

                I don't understand your data in relation to this question. I would expect you to have an indicator ("dummy") variable indicating the lockdown period. But the only variable that looks like it is related to the Lockdown is a floating point variable taking on values like 3.5 and 22.7 ... What's that about? And how can you tell which observations are during the lockdown period in this data? Without that, you can't even begin to answer this research question.
                The variable "Lockdown" is a closure percentage for each Industry at a specific time "wdate", ranging from 0% to 88%.

                This variable, "Lockdwon" is external data which is merged to the original data. The information collected by the industry related to closing in the form of a percentage and the time variable, "wdate".

                The data in this variable starts from March 2010 to December 2020, so the period from January 2019 to February 2020 is missing, and the period from January 2021 to December 2021 is missing.

                Can I answer this question if the variable is not a dummy but a closing percentage?



                • #9
                  So, are you saying that you want to look for the association between the percentage of the industry closed down and the proportion of workers on ZHCs? That is, in principle, doable.

                  Perhaps I am still misunderstanding the Lockdown variable, but if it is the proportion of the industry that was shut down, it should be the same for all observations of the same industry in the same quarter. But in your data it isn't. So either the data is wrong, or there is still something that needs to be clarified here.
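
                  A quick way to check whether that holds in the full data set (an untested sketch, using the variable names from your -dataex- excerpts):
                  Code:
                  bysort industry1 quarter (Lockdown): gen byte mismatch = Lockdown[1] != Lockdown[_N]
                  list industry1 quarter Lockdown if mismatch, sepby(industry1)
                  If the -list- shows nothing, Lockdown is indeed constant within industry-quarter cells.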



                  • #10
                    Originally posted by Clyde Schechter View Post
                    So, are you saying that you want to look for the association between the percentage of the industry closed down and the proportion of workers on ZHCs? That is, in principle, doable.

                    Perhaps I am still misunderstanding the Lockdown variable, but if it is the proportion of the industry that was shutdown, it should be the same for all observations of the same industry in the same quarter. But in your data it isn't. So either the data is wrong, or there is still something that needs to be clarified here.
                    Let me clarify my data first:
                    I have quarterly longitudinal data, starting from January 2019 until December 2021, and there is a zero-hours contracts variable, "ZHC", which is a dummy variable that takes a value of 1 or 2.
                    I also have an industry variable, which is a categorical variable.

                    I am trying to answer this question: I have longitudinal data on people working on ZHC since 2019; I want to know what happened to these workers during the Covid-19 shutdown period and beyond, at the level of the industry in which they work. In other words, if people work on a ZHC in the education industry, what happens to them during the closure period? What is the size of the impact on the people who work on ZHC in the education industry?

                    On the other hand, I have data on the lockdown % in each industry on a two-week basis (the percentage is taken every two weeks), starting from the last week in March 2020 and ending in December 2020.

                    Therefore, I converted the longitudinal data to weekly and merged it with the lockdown data by industry and week.


                    I will work with the weekly dimension, not the quarter, because I have the shutdown data for every week. So yes, it is true that everyone who works in a specific industry during a specific week and was exposed to the shutdown takes the same percentage, and this is the reason why I want to estimate it by industry.

                    I hope this is clearer now.
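
                    For reference, the merge step I used looks roughly like this (a sketch; the file name is a placeholder for my lockdown file):
                    Code:
                    merge m:1 industry1 wdate using lockdown_weekly.dta, keep(master match)
                    tab _merge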



                    • #11
                      OK. One correction to your description: ZHC is coded 0/1, which is as it should be, not 1/2.

                      As a first step, I would look at
                      Code:
                      xtset industry1
                      xtreg ZHC c.Lockdown i.industry1#c.Lockdown, fe
                      margins industry1, dydx(Lockdown)
                      The -margins- output will show you the estimated marginal effect of Lockdown on the proportion of workers in ZHC in each industry.

                      In the example data, this runs, but not very well because the example data is very sparse. I imagine that will not be a problem in the real data set. You can consider adding embellishments to this analysis. The example data has only 15 distinct industries, which is not enough to justify clustering the standard errors on industry. But if the real data set has a large enough number, you should do that. You might also want to include some covariates in the analysis.

                      It is a limitation of your study design that your Lockdown variable is defined only at the industry-wide level, and not at the level of the firm that PERSID worked for. It doesn't entirely invalidate your approach, but it does limit the conclusions you can draw from any analysis of this data. So you might conclude that each percentage point increase in an industry's lockdown percentage is associated with X difference in the industry's proportion of workers on ZHC, but you cannot further conclude that it was the workers whose employers were locked down who were so affected.
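
                      For completeness, the clustered variant would just add the -vce()- option (a sketch; only worthwhile if the real data has enough industries):
                      Code:
                      xtset industry1
                      xtreg ZHC c.Lockdown i.industry1#c.Lockdown, fe vce(cluster industry1)
                      margins industry1, dydx(Lockdown)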



                      • #12
                        Originally posted by Clyde Schechter View Post
                        OK. One correction to your description: ZHC is coded 0/1, which is as it should be, not 1/2.

                        Thank you for your suggestion

                        According to the data documentation, I have to weight the data at estimation; otherwise, the analysis will be biased.

                        I ran this command:
                        Code:
                        xtreg ZHC c.Lockdown i.industry1#c.Lockdown [fw= LGWT20], fe

                        Stata gave me this error:
                        "weight must be constant within industry1"


                        Also, I am not very experienced with Stata, but I wonder: does this method account for the fact that I am estimating at the industry level? Is this the only way to group by industry, or is there another way, such as -collapse-?

                        I look forward to your suggestions. Thank you.

                        Comment


                        • #13
                          The error message, "weight must be constant within industry1," is correct. In the -xt- suite of commands, if you apply weights, the weights must be constant attributes of the panel, not of the individual observations.

                          Even though your data comes with instructions to weight it, the data you show does not look, to me, like it would be suitable for -fweight-s. You have individual PERSID observations. Applying -fweights- would be very unusual in that circumstance because it implies that the person is actually supposed to appear in the data LGWT20 times, with exactly the same values for all variables. In fact, -fweights- most commonly arise with data that has already been -collapse-d and you are trying to analyze the pre-collapsed version of the data without actually expanding the data set back. Most likely it is really a -pweight-. (The other legal kind of weight with -xtreg, fe- is -aweight-s. But, again, these are almost never applicable to data about individuals--they are used when the data consist of group mean results, and the aweight is then the size of the original group.) But rather than me speculating, you should carefully read the data documentation for an explanation of what the weight actually represents.

                          You have referred a few times to analyzing the data by first aggregating it up to the industry level with -collapse-. This might be a sensible approach in light of the weighting issue. The weighting would then be used in the -collapse- command, not in the subsequent regressions. So it would go something like this:
                          Code:
                          collapse (mean) ZHC HOURPAY Lockdown [pweight = LGWT20], by(FLOW SEX industry1 wdate)
                          xtset industry1
                          xtreg ZHC i.industry1#c.Lockdown, fe
                          Note a few things about the -collapse-. The resulting data set now has, for each combination of industry1, wdate, FLOW, and SEX, the weighted proportion of workers on ZHC in that stratum, the weighted average HOURPAY, and the weighted proportion locked down. Note that I did not include log_HOURPAY in the -collapse-. Averaging log_HOURPAY would not give you the log of the weighted mean of HOURPAY; it would give you the log of the weighted geometric mean of HOURPAY. (If that's what you want, then go ahead and include log_HOURPAY. Legal and valid, but an unusual model.) So you average HOURPAY, and if you want to log transform that average, you can do that as a separate command.
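                          For instance, the log transformation after the -collapse- would be its own line; a sketch (the new variable name here is just illustrative):
                          Code:
                          gen log_mean_HOURPAY = log(HOURPAY)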
                          Last edited by Clyde Schechter; 17 Jun 2023, 17:29.

                          Comment


                          • #14
                            Originally posted by Clyde Schechter View Post
                             The error message, "weight must be constant within industry1," is correct. In the -xt- suite of commands, if you apply weights, the weights must be constant attributes of the panel, not of the individual observations.

                             Even though your data comes with instructions to weight it, the data you show does not look, to me, like it would be suitable for -fweight-s. You have individual PERSID observations. Applying -fweights- would be very unusual in that circumstance because it implies that the person is actually supposed to appear in the data LGWT20 times, with exactly the same values for all variables. In fact, -fweights- most commonly arise with data that has already been -collapse-d and you are trying to analyze the pre-collapsed version of the data without actually expanding the data set back. Most likely it is really a -pweight-. (The other legal kind of weight with -xtreg, fe- is -aweight-s. But, again, these are almost never applicable to data about individuals--they are used when the data consist of group mean results, and the aweight is then the size of the original group.) But rather than me speculating, you should carefully read the data documentation for an explanation of what the weight actually represents.

                            You have referred a few times to analyzing the data by first aggregating it up to the industry level with -collapse-. This might be a sensible approach in light of the weighting issue. The weighting would then be used in the -collapse- command, not in the subsequent regressions. So it would go something like this:
                            Code:
                            collapse (mean) ZHC HOURPAY Lockdown [pweight = LGWT20], by(FLOW SEX industry1 wdate)
                            xtset industry1
                             xtreg ZHC i.industry1#c.Lockdown, fe
                             Note a few things about the -collapse-. The resulting data set now has, for each combination of industry1, wdate, FLOW, and SEX, the weighted proportion of workers on ZHC in that stratum, the weighted average HOURPAY, and the weighted proportion locked down. Note that I did not include log_HOURPAY in the -collapse-. Averaging log_HOURPAY would not give you the log of the weighted mean of HOURPAY; it would give you the log of the weighted geometric mean of HOURPAY. (If that's what you want, then go ahead and include log_HOURPAY. Legal and valid, but an unusual model.) So you average HOURPAY, and if you want to log transform that average, you can do that as a separate command.
                             Thank you, Clyde, for the helpful comments. I went back to the documents that came with the data; they show that the weighting variables serve two purposes: they compensate for non-response bias, and they produce estimates at the level of the population. In that case, I think we are meant to use the weights.

                             A few other questions:

                            I used this command :
                            Code:
                            collapse (mean) ZHC HOURPAY Lockdown temporary_job1 Full_time Part_time FLOW SEX  [pweight = LGWT20], by(industry1 wdate)
                            xtset industry1 wdate
                            xtreg ZHC Lockdown i.industry1#c.Lockdown , fe
                             This is different from your suggestion: I declared the time panel variable (wdate) before running the estimate, and I kept only by(industry1 wdate). My question here: is this correct? What is the difference if I put by(FLOW SEX industry1 wdate)?

                             Another question: how can we interpret the result from this model? Can we say that it was the workers on ZHC whose employers were in lockdown who were affected?

                             In terms of log_HOURPAY, I want to log transform the average. How can I do that as a separate command? Sorry, I did not get this point.


                            Thank you for your comments

                            Comment


                            • #15
                              what is the different if I put by(FLOW SEX industry1 wdate) ?
                              In the code that I showed, sex remained an attribute of the worker, and so results could be interpreted as being adjusted for the sex of the worker, or you could go farther and separately estimate the results for male workers and female workers.
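                               With SEX kept in the -by()- list of the -collapse-, the sex-specific estimates could be run along these lines. A sketch; it assumes SEX is coded 1 = male, 2 = female, which you should verify against your data documentation:
                               Code:
                               xtreg ZHC c.Lockdown i.industry1#c.Lockdown if SEX == 1, fe
                               xtreg ZHC c.Lockdown i.industry1#c.Lockdown if SEX == 2, fe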

                              In the code that you wrote, SEX has a different meaning: it is the proportion of female workers in each industry. Results are therefore not interpretable as applying specifically to workers of either sex: rather they reflect the effects of the sex ratio in the industry.

                               I don't understand the variable FLOW, but given that it has multiple levels that are, at most, ordered, and perhaps even just arbitrary categories without ordering, it is meaningless to calculate its mean. So the way you have handled FLOW makes it a meaningless variable. You shouldn't do that.
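                               If you do want FLOW represented at the industry level, one way is to turn it into indicator variables before the -collapse-, so that their means become category shares. A sketch (the flow_ prefix is just illustrative):
                               Code:
                               tab FLOW, generate(flow_)
                               collapse (mean) ZHC HOURPAY Lockdown flow_* [pweight = LGWT20], by(industry1 wdate)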

                               Can we say that it was the workers on ZHC whose employers were in lockdown who were affected?
                              No. By aggregating up to the industry level, you sacrifice the ability to say anything about individual workers or firms. You can only speak about industries and the properties of the industries as a whole. It is entirely possible, for example, that there are more layoffs in industries with more female workers (just making up an example here) but that it is the male workers who got laid off.

                               In terms of log_HOURPAY, I want to log transform the average. How can I do that as a separate command?
                              Code:
                               gen log_HOURPAY = log(HOURPAY)

                              Comment
