Statalist - General

Fractional response variable with many 0s and 1s

Robert Wagner — Sat, 27 Jul 2024 07:53:15 GMT

Hello everyone,

I would like to carry out some analyses to investigate the drivers for the proportion of training places filled at company level. The values of the dependent variable are distributed as follows:

[Image not preserved: distribution of the dependent variable]

I wonder what type of analysis is suitable for this kind of distribution structure of the dependent variable.

The following options have already been tested:

Fractional response logit: depvar specified by the share of filled positions

Code:

fracreg logit share_positions_filled x1 x2 x3

GLM binomial response: depvar specified by the number of successes (total_positions_filled) in a series of binomial trials (total_positions_offered)

Code:

glm total_positions_filled y x1 x2 x3, family(binomial total_positions_offered) link(logit) robust

GLM binomial response: depvar specified by the share of filled positions

Code:

glm share_positions_filled y x1 x2 x3, family(binomial) link(logit)

Logit: depvar specified as a dummy (0 = below-average share of positions filled; 1 = above-average share of positions filled)

Code:

logit above_avg_positions_filled y x1 x2 x3

As I am very unsure which of the estimation methods is suitable, I would be pleased if some of you could give your thoughts on this.

Thank you very much for your help, and have a good one.

synthetic control method panel data

JOHAN JOHNYS — Sat, 27 Jul 2024 07:17:03 GMT

Good evening.
I would like your help. I have panel data (firm-years and some variables). The variable of interest is the binary variable listed (0 before the IPO and 1 once the firm goes public), and the outcome is the ETR index. The companies entered the stock market in different years (variable start). I want to apply the synthetic control method to examine the effect of a company's entry into the stock market on the ETR index. I think this is difficult.

dataex econ_year firm_id listed ETR ROA SIZE_CONTROL start stop

----------------------- copy starting from the next line -----------------------
[CODE]
* Example generated by -dataex-. For more info, type help dataex
clear
input int econ_year float firm_id byte listed float(ETR ROA SIZE_CONTROL) str30(start stop)
2004 790 0 .34032205 .5467306 13.872245 "12-Δεκέμβριος-1994" ""
2005 790 0 .3007714 .676904 14.046932 "12-Δεκέμβριος-1994" ""
2006 790 0 .31999955 .22212203 13.965886 "12-Δεκέμβριος-1994" ""
2007 790 0 .28999993 .5441705 15.573822 "12-Δεκέμβριος-1994" ""
2008 790 0 .24977253 .6065632 15.22519 "12-Δεκέμβριος-1994" ""
2009 790 0 .25224 .27402616 15.515424 "12-Δεκέμβριος-1994" ""
2010 790 0 .25 .1799916 15.323702 "12-Δεκέμβριος-1994" ""
2011 790 0 0 -.08687542 15.288954 "12-Δεκέμβριος-1994" ""
2012 790 0 0 -.04736073 15.179316 "12-Δεκέμβριος-1994" ""
2013 790 0 -.08996403 -.04128758 15.017408 "12-Δεκέμβριος-1994" ""
2014 790 0 0 -.08345628 14.939968 "12-Δεκέμβριος-1994" ""
2015 790 0 0 .011874527 14.819736 "12/12/1994" ""
2016 790 0 0 -.07755072 14.74357 "12/12/1994" ""
2017 790 1 0 .03270987 14.832586 "12/12/1994" ""
2018 790 0 0 -.29000336 14.64794 "12/12/1994" ""
2019 790 0 0 .10894744 15.357953 "12/12/1994" ""
2018 971 1 .1795453 .015156684 23.41389 "2/8/1950" ""
2019 971 1 0 -.05952055 23.32465 "2/8/1950" ""
2018 973 1 .3649029 .05821802 19.71245 "15/5/1930" ""
2019 973 1 .3017654 .10713505 19.79473 "15/5/1930" ""
2000 1394 1 -3.0845354 -.02237191 19.64642 "01-Ιανουάριος-1950" ""
2001 1394 1 .5732579 .09029305 19.81431 "01-Ιανουάριος-1950" ""
2002 1394 1 .3335664 .13597265 19.86958 "01-Ιανουάριος-1950" ""
2003 1394 1 .33435085 .1652293 19.947145 "01-Ιανουάριος-1950" ""
2004 1394 1 .3377546 .2320663 20.03664 "01-Ιανουάριος-1950" ""
2005 1394 1 .3559843 .1945449 20.109186 "01-Ιανουάριος-1950" ""
2006 1394 1 .1991756 .12997176 20.671274 "01-Ιανουάριος-1950" ""
2007 1394 1 .269208 .10655569 20.66763 "01-Ιανουάριος-1950" ""
2008 1394 1 .23378377 .11009266 20.668783 "01-Ιανουάριος-1950" ""
2009 1394 1 .2185519 .10465521 20.6744 "01-Ιανουάριος-1950" ""
2010 1394 1 .2416431 .05687732 20.699545 "01-Ιανουάριος-1950" ""
2011 1394 1 0 -.022519477 20.60041 "01-Ιανουάριος-1950" ""
2012 1394 1 0 -.05756895 20.5199 "01-Ιανουάριος-1950" ""
2013 1394 1 0 -.13632222 20.29363 "01-Ιανουάριος-1950" ""
2014 1394 1 0 -.2751217 20.10758 "01-Ιανουάριος-1950"

...

dtable-frequencies only?

Madison Avila — Sat, 27 Jul 2024 07:01:02 GMT

I have a silly question. How do I get a table with only frequencies? I don't want the percentages in the parentheses.

code example:

webuse nlswork.dta, clear

codebook race union year

tab year

tab year union if union == 1

keep if year > 86

label define yesno 0 "no" 1 "yes"
label values union yesno

dtable i.race i.union , by(year, nototals)

/*

----------------------------------------------
                     Interview year
                   87               88
----------------------------------------------
N            2,164 (48.8%)    2,272 (51.2%)
Race
  White      1,572 (72.6%)    1,657 (72.9%)
  Black        567 (26.2%)      589 (25.9%)
  Other         25 (1.2%)        26 (1.1%)
1 if union
  no         1,679 (77.7%)    1,423 (75.4%)
  yes          482 (22.3%)      465 (24.6%)
----------------------------------------------
*/

Thank you!
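
One possible way to drop the percentages, as a minimal sketch: it assumes the Stata 18 dtable options factor(, statistics()) and sample(, statistics()) behave as described in help dtable, requesting only fvfrequency for the factor variables and only the frequency for the N row. The option spellings below should be checked against the installed version.

```stata
* Same data preparation as in the post above
webuse nlswork.dta, clear
keep if year > 86
label define yesno 0 "no" 1 "yes"
label values union yesno

* Counts only: fvfrequency for the factor variables, freq for the sample (N) row
dtable i.race i.union, by(year, nototals) ///
    factor(race union, statistics(fvfrequency)) ///
    sample(, statistics(freq))
```

If the N row should keep its percentage, the sample() option can simply be left out.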

Variable omitted in xtgls regression but not in reg regression

jessica cheng — Sat, 27 Jul 2024 04:04:05 GMT

I am trying to figure out why there is one variable omitted in xtgls regression but not in reg regression.

Any advice is highly appreciated!

Code:

. xtgls lnsale_w lnit_stock_w lncogs_w lnsga2_w i.fyear i.sic_2 , force i(gvkey) t(fyear) p(h) c(p)
(note: 83 observations dropped because only 1 obs in group)

Cross-sectional time-series FGLS regression

Coefficients:  generalized least squares
Panels:        heteroskedastic
Correlation:   panel-specific AR(1)

Estimated covariances      =       864          Number of obs     =      6,140
Estimated autocorrelations =       864          Number of groups  =        864
Estimated coefficients     =        28          Obs per group:
                                                              min =          2
                                                              avg =   7.106481
                                                              max =          9
                                                Wald chi2(28)     =   6.65e+15
                                                Prob > chi2       =     0.0000

------------------------------------------------------------------------------
    lnsale_w | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
lnit_stock_w |  -.0627259   .0006983   -89.83   0.000    -.0640946   -.0613573
    lncogs_w |    1.07105   .0002252  4755.05   0.000     1.070608    1.071491
    lnsga2_w |          0  (omitted)
             |
       fyear |
       2011  |  -.0087647   .0004499   -19.48   0.000    -.0096465   -.0078829
       2012  |   .0222538   .0009953    22.36   0.000     .0203031    .0242046
       2013  |   .0547922   .0012749    42.98   0.000     .0522934     .057291
       2014  |   .0570109   .0015397    37.03   0.000     .0539932    .0600286
       2015  |   .0754605   .0017639    42.78   0.000     .0720033    .0789178
       2016  |   .1378157   .0021297    64.71   0.000     .1336416    .1419899
       2017  |   .1647435    .002498    65.95   0.000     .1598475    .1696395
       2018  |   .1741515   .0027955    62.30   0.000     .1686724    .1796305
             |
       sic_2 |
         21  |   .3397569   .0382942     8.87   0.000     .2647017    .4148122
         22  |   .1858502   .0120713    15.40   0.000      .162191    .2095094
         23  |   .2686389   .0055983    47.99   0.000     .2576664    .2796113
         24  |   .2066785   .0100844    20.49   0.000     .1869134    .2264436
         25  |   .3223446   .0083365    38.67   0.000     .3060054    .3386838
         26  |  -.1115616   .0094189   -11.84   0.000    -.1300223    -.093101
         27  |   .0898889   .0143198     6.28   0.000     .0618227    .1179552
         28  |   1.941361   .0046328   419.04   0.000      1.93228    1.950441
         29  |  -.1352792   .0373032    -3.63   0.000    -.2083921   -.0621664
         30  |   .0343942   .0064241     5.35   0.000     .0218031    .0469853
         31  |   .1707927   .0312966     5.46   0.000     .1094525    .2321329
         32  |   .0448271   .0116934     3.83   0.000     .0219085    .0677457
         33  |  -.1197643    .008942   -13.39   0.000    -.1372904   -.1022383
         34  |   .1057662   .0071131    14.87   0.000     .0918247    .1197077
         35  |   .1533331   .0038878    39.44   0.000     .1457132     .160953
         36  |   .1390721   .0029855    46.58   0.000     .1332206    .1449236
         37  |  -.0769828   .0050326   -15.30   0.000    -.0868465    -.067119
         38  |   .3605061   .0049591    72.70   0.000     .3507864    .3702258
         39  |          0  (omitted)
             |
       _cons |          0  (omitted)
------------------------------------------------------------------------------

. reg lnsale_w lnit_stock_w lncogs_w lnsga2_w i.fyear i.sic_2

      Source |       SS           df       MS      Number of obs   =     6,223
-------------+----------------------------------   F(30, 6192)     =  12475.16
       Model |  19514.6432        30  650.488105   Prob > F        =    0.0000
    Residual |  322.867399     6,192  .052142668   R-squared       =    0.9837
-------------+----------------------------------   Adj R-squared   =    0.9836
       Total |  19837.5106     6,222  3.18828521   Root MSE        =    .22835

------------------------------------------------------------------------------
    lnsale_w | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
lnit_stock_w |   .0381838   .0038393     9.95   0.000     .0306574    .0457102
    lncogs_w |   .6876295   .0038201   180.00   0.000     .6801409    .6951182
    lnsga2_w |   .3107006   .0031863    97.51   0.000     .3044543    .3169468
             |
       fyear |
       2011  |  -.0054058   .0119152    -0.45   0.650    -.0287637     .017952
       2012  |  -.0047566   .0119508    -0.40   0.691    -.0281844    .0186711
       2013  |  -.0075589   .0119924    -0.63   0.529    -.0310681    .0159503
       2014  |  -.0067479   .0121654    -0.55   0.579    -.0305964    .0171005
       2015  |  -.0153265   .0125101    -1.23   0.221    -.0398506    .0091976
       2016  |   .0015709   .0139521     0.11   0.910      -.02578    .0289218
       2017  |   .0239516   .0150229     1.59   0.111    -.0054985    .0534017
       2018  |   .0184825   .0155345     1.19   0.234    -.0119706    .0489355
             |
       sic_2 |
         21  |   .1991205   .0513183     3.88   0.000     .0985188    .2997221
         22  |  -.0301966   .0301257    -1.00   0.316    -.0892534    .0288603
         23  |  -.1131618   .0229568    -4.93   0.000     -.158165   -.0681585
         24  |   .0122045   .0245732     0.50   0.619    -.0359675    .0603766
         25  |  -.0908516   .0227095    -4.00   0.000    -.1353701   -.0463331
         26  |   .0453891    .022114     2.05   0.040     .0020379    .0887403
         27  |   .0509572   .0260054     1.96   0.050    -.0000225    .1019369
         28  |   .1912057   .0147924    12.93   0.000     .1622075    .2202039
         29  |    .251608   .0290935     8.65   0.000     .1945746    .3086414
         30  |  -.0348874   .0229905    -1.52   0.129    -.0799566    .0101819
         31  |  -.0611215   .0314201    -1.95   0.052    -.1227158    .0004728
         32  |   .0364904   .0278588     1.31   0.190    -.0181224    .0911033
         33  |   .0872981    .020296     4.30   0.000     .0475109    .1270854
         34  |  -.0171009   .0189186    -0.90   0.366    -.0541879    .0199861
         35  |  -.0638843    .014574    -4.38   0.000    -.0924544   -.0353142
         36  |  -.0396236   .0145074    -2.73   0.006    -.0680632    -.011184
         37  |   .0314182   .0162489     1.93   0.053    -.0004353    .0632717
         38  |   .0056201   .0153525     0.37   0.714    -.0244763    .0357164
         39  |  -.0488434   .0248894    -1.96   0.050    -.0976352   -.0000515
             |
       _cons |   .7995171   .0223239    35.81   0.000     .7557544    .8432797
------------------------------------------------------------------------------

2-step weighting matrix vs 3-step weighting matrix from xtdpdgmm command

Abraham Risyad — Sat, 27 Jul 2024 02:51:51 GMT

Dear Statalist,

Currently, I am researching the impact of branding on the return on capital employed (ROCE). I suspect my unbalanced panel data are dynamic in nature and that the branding variable is endogenous. Therefore, I use a two-step system GMM for my analysis. I used the xtdpdgmm command to run my regression, and the post-estimation output showed two different Hansen test results, which confused me. The number of moment conditions in my regression is 98, and I suspect this number also indicates the number of instruments used. This number is still below the number of banks in my dataset, which is 114. The Hansen test for the 2-step moment functions with the 2-step weighting matrix is insignificant. However, with the 3-step weighting matrix, it is significant.

Can anyone please enlighten me on the difference between the 2-step and 3-step weighting matrix from the xtdpdgmm post-estimation result? Also, for my case here, should I rely on the result from the 2-step or 3-step weighting matrix, or should I rely simultaneously on both of them?

Code:

. xtdpdgmm ROCE L_ROCE $control, gmm(L_ROCE, lag(2 3)) gmm(DB_Branding, lag(1 2)) two vce(r) teffect
note: 254.Date omitted because of collinearity.

Generalized method of moments estimation

Fitting full model:
Step 1         f(b) =   77.04881
Step 2         f(b) =  .53330455

Group variable: Sandi                        Number of obs         =      2615
Time variable: Date                          Number of groups      =       114

Moment conditions:     linear =      98      Obs per group:    min =         4
                    nonlinear =       0                        avg =   22.9386
                        total =      98                        max =        24

                                (Std. err. adjusted for 114 clusters in Sandi)
------------------------------------------------------------------------------
             |              WC-Robust
        ROCE | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
      L_ROCE |   .4208216   .1976822     2.13   0.033     .0333715    .8082716
 DB_Branding |  -11.13101   3.905712    -2.85   0.004    -18.78607   -3.475956
        Size |  -.0309021    1.78705    -0.02   0.986    -3.533456    3.471651
         CAR |  -.0035476   .0093652    -0.38   0.705    -.0219029    .0148078
         NPL |   -2.08351   .8617056    -2.42   0.016    -3.772422   -.3945978
         LDR |  -.0022799   .0265356    -0.09   0.932    -.0542888    .0497289
         NIM |   1.385492    .510181     2.72   0.007      .385556    2.385429
Business_Mix |  -.1251238   .1944438    -0.64   0.520    -.5062266     .255979
   Inflation |   2.219998   .9313944     2.38   0.017     .3944988    4.045498
  GDP_Growth |  -.2106766   .1346259    -1.56   0.118    -.4745386    .0531854
             |
        Date |
        233  |  -1.350172   .9389638    -1.44   0.150    -3.190508    .4901628
        234  |  -.4392329   1.004881    -0.44   0.662    -2.408763    1.530297
        235  |  -2.500202   1.082535    -2.31   0.021    -4.621932   -.3784727
        236  |   .2346466   1.073901     0.22   0.827     -1.87016    2.339453
        237  |  -2.125878   1.044461    -2.04   0.042    -4.172983   -.0787731
        238  |  -1.862617   .8534568    -2.18   0.029    -3.535361   -.1898722
        239  |  -.9545664   .5586313    -1.71   0.087    -2.049464    .1403309
        240  |   .1372608   1.360941     0.10   0.920    -2.530135    2.804657
        241  |          0  (empty)
        242  |   1.213023   .6496135     1.87   0.062    -.0601956    2.486243
        243  |   .3086538   .7763399     0.40   0.691    -1.212944    1.830252
        244  |   2.849401   .9740243     2.93   0.003      .940348    4.758453
        245  |    3.97898   1.916145     2.08   0.038     .2234046    7.734556
        246  |    2.64422   .9373806     2.82   0.005     .8069881    4.481453
        247  |   1.275293   1.017346     1.25   0.210    -.7186695    3.269255
        248  |   1.940281   .9799561     1.98   0.048     .0196022    3.860959
        249  |  -3.595698   1.815245    -1.98   0.048    -7.153512   -.0378827
        250  |  -7.201618   3.259837    -2.21   0.027    -13.59078   -.8124556
        251  |  -7.300011   2.911381    -2.51   0.012    -13.00621   -1.593809
        252  |  -4.884409   2.438686    -2.00   0.045    -9.664147   -.1046724
        253  |  -2.552789   1.104793    -2.31   0.021    -4.718143   -.3874359
        254  |          0  (empty)
        255  |  -1.380236    .455906    -3.03   0.002    -2.273795   -.4866764
             |
       _cons |   .8589907   20.95298     0.04   0.967     -40.2081    41.92608
------------------------------------------------------------------------------
Instruments corresponding to the linear moment conditions:
 1, model(level):
   234:L2.L_ROCE 235:L2.L_ROCE 236:L2.L_ROCE 237:L2.L_ROCE 238:L2.L_ROCE
   239:L2.L_ROCE 240:L2.L_ROCE 241:L2.L_ROCE 242:L2.L_ROCE 243:L2.L_ROCE
   244:L2.L_ROCE 245:L2.L_ROCE 246:L2.L_ROCE 247:L2.L_ROCE 248:L2.L_ROCE
   249:L2.L_ROCE 250:L2.L_ROCE 251:L2.L_ROCE 252:L2.L_ROCE 253:L2.L_ROCE
   254:L2.L_ROCE 255:L2.L_ROCE 235:L3.L_ROCE 236:L3.L_ROCE 237:L3.L_ROCE
   238:L3.L_ROCE 239:L3.L_ROCE 240:L3.L_ROCE 241:L3.L_ROCE 242:L3.L_ROCE
   243:L3.L_ROCE 244:L3.L_ROCE 245:L3.L_ROCE 246:L3.L_ROCE 247:L3.L_ROCE
   248:L3.L_ROCE 249:L3.L_ROCE 250:L3.L_ROCE 251:L3.L_ROCE 252:L3.L_ROCE
   253:L3.L_ROCE 254:L3.L_ROCE 255:L3.L_ROCE
 2, model(level):
   233:L1.DB_Branding 234:L1.DB_Branding 235:L1.DB_Branding 236:L1.DB_Branding
   237:L1.DB_Branding 238:L1.DB_Branding 239:L1.DB_Branding 240:L1.DB_Branding
   241:L1.DB_Branding 242:L1.DB_Branding 243:L1.DB_Branding 244:L1.DB_Branding
   245:L1.DB_Branding 246:L1.DB_Branding 247:L1.DB_Branding 248:L1.DB_Branding
   249:L1.DB_Branding 250:L1.DB_Branding 251:L1.DB_Branding 252:L1.DB_Branding
   253:L1.DB_Branding 254:L1.DB_Branding 255:L1.DB_Branding 242:L2.DB_Branding
   243:L2.DB_Branding 245:L2.DB_Branding 246:L2.DB_Branding 247:L2.DB_Branding
   248:L2.DB_Branding 253:L2.DB_Branding 254:L2.DB_Branding
 3, model(level):
   233bn.Date 234.Date 235.Date 236.Date 237.Date 238.Date 239.Date 240.Date
   241.Date 242.Date 243.Date 244.Date 245.Date 246.Date 247.Date 248.Date
   249.Date 250.Date 251.Date 252.Date 253.Date 254.Date 255.Date
 4, model(level):
   _cons

. estat overid

Sargan-Hansen test of the overidentifying restrictions
H0: overidentifying restrictions are valid

2-step moment functions, 2-step weighting matrix       chi2(66)    =   60.7967
                                                       Prob > chi2 =    0.6580

2-step moment functions, 3-step weighting matrix       chi2(66)    =   94.3566
                                                       Prob > chi2 =    0.0126

. estat serial

Arellano-Bond test for autocorrelation of the first-differenced residuals
H0: no autocorrelation of order 1      z =   -3.2123   Prob > |z|  =    0.0013
H0: no autocorrelation of order 2      z =   -0.1084   Prob > |z|  =    0.9137

Sincerely,
Abraham

DID model for non-panel data

Brahim KHOUILED — Sat, 27 Jul 2024 00:01:21 GMT

Hi everyone,

I want to apply the DID model to non-panel data, given the following issue:
What is the contribution of agricultural policy to achieving food security in the country?

I have a dependent variable representing food security y_t,
and independent food variables x_ti,
and other non-food variables z_ti that are not affected by agricultural policies.

I encountered a problem in structuring the data, which do not include a cross-sectional dimension. On the other hand, I have the policy, a set of variables affected by the policy, and a second set that is not affected by it.

Please help, and thank you in advance.

Shift in sign

Serena Menny — Fri, 26 Jul 2024 19:18:21 GMT

Dear members,

This table presents the marginal effects derived from probit regressions, where the dependent variable, sustainable_investment, is a binary indicator set to 1 if the investor holds a sustainable fund, and 0 otherwise. The key explanatory variables are defined as follows:

Sustainable_concerns: An average score derived from three items, each measured on a 7-point Likert scale, reflecting the level of concern individuals have regarding sustainability issues.
Sustainable_behavior: An average score derived from two items, each measured on a 7-point Likert scale, indicating the extent to which individuals engage in recycling and other environmentally protective behaviors.
Sustainable_vote: A single item measured on a 7-point Likert scale, assessing whether individuals vote for political parties with sustainable programs.

What could possibly explain the shift in the sign for sustainable_concerns?
[Two attached images (regression tables) not preserved]

The three variables—sustainable_concerns, sustainable_behavior, and sustainable_vote—reflect general attitudes towards sustainability issues and are not directly related to sustainable funds. Additionally, three other variables specifically related to sustainable funds are included, each measured on a 7-point Likert scale (sustainable_attitude, confidence, and impact).

After including the new variables related to sustainable funds, sustainable_concerns becomes significant while sustainable_behavior loses its significance.

xtdidregress suppress group FEs from output?

Alex Weckenman — Fri, 26 Jul 2024 17:49:56 GMT

Hello,

I am using -xtdidregress- and noticed that even when I include the option -aequations-, I do not see the output of the group FEs. However, I do believe the model is run including those FEs because when I try to recreate the DID model using the regular -reg- command, I get the same results as -xtdidregress- when I include group FEs. Can someone confirm that this is indeed the case (that aequations simply does not display the group FEs) and is there a reason for this choice in behavior?

Thanks,
Alex

Mediation in panel FE Poisson model

Daniela Kaiser — Fri, 26 Jul 2024 17:27:01 GMT

Hello,

I'm working with a panel dataset, where each row corresponds to a county-week. I am modeling a fixed effects Poisson regression, with a categorical independent variable (4 types of treatment: the type of school instruction in each week) and a count outcome (count of reports by week):

xtset county week

xtpoisson num_rpts_county i.instruction_county, exposure(population) vce(robust) fe

I would like to incorporate a count mediator but I'm not sure if it's possible to adapt the mediate command (for a count outcome and a count mediator) to a panel dataset and a fixed effects model. I found this sem discussion on mediation in panel data with a continuous outcome but I'm not sure whether/how this can be adapted to a count outcome.

Any insight/advice on this issue would be greatly appreciated!

Sensitivity Analysis/Robustness checks

Padmavathi Bandaru — Fri, 26 Jul 2024 17:08:13 GMT

Hi,
Could you please help me figure out how to do a sensitivity analysis or robustness checks for this fixed effects regression?

Code:

reg tariff i.year##c.emissions i.year i.naics_4, vce(cluster naics_4)

I have data for 5 years on all variables, by NAICS code.

Thank you
(Stata 18)

Need to combine variables

Euslaner — Fri, 26 Jul 2024 16:58:45 GMT

Sorry to ask for help on something that should be simple, but I can't get it to work. I constructed the attached data set on roll call voting in the Senate. I need to combine the variables yesvotes and novotes to obtain a measure of total voting behavior. But when I try to do so, I wind up with a total of 2 votes and the rest of the values are missing. There must be a simple solution to the problem, but I can't find it. Any help would be appreciated. Thanks,

Ric Uslaner
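
One possible cause, offered as a guess since the attached data are not shown: if yesvotes or novotes is missing in a row, adding them with + returns missing for that row. The sketch below uses made-up toy values (only the variable names come from the post) to contrast + with egen's rowtotal(), which treats missing values as zero; the missing option keeps the total missing only when both inputs are missing.

```stata
* Toy data, invented for illustration; only the variable names come from the post
clear
input yesvotes novotes
10  .
 .  5
 3  4
 .  .
end

* Plain addition: missing whenever either variable is missing
generate total_plus = yesvotes + novotes

* Row total: missing values count as zero; with the missing option,
* the result is missing only if both variables are missing
egen total_row = rowtotal(yesvotes novotes), missing

list
```

In this toy example total_plus is nonmissing only in the third row (7), while total_row is 10, 5, 7, and missing.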

sdid and event-study plots

Lukas Lang — Fri, 26 Jul 2024 16:13:15 GMT

Hi,

First of all a big 'thank you' to the authors of the sdid command, it really is easy to use.

I have a query about the event-study plot that is described in the Stata paper introducing this new command by Clarke et al. (2023).

My problem is that in my application the estimated 'overall' ATT (the one in the printed output table) is 3 times higher than the values that I obtain when plotting the ATT by year using the event-study plot mentioned above.

What I have noticed is that while using controls makes a big difference to the estimated 'overall' ATT, it does not make much of a difference to the event-study plot.

To make this point I will use the same application used in the Stata paper.

Let's start with the scenario that does not control for covariates:

Code:

webuse set www.damianclarke.net/stata/
webuse quota_example.dta, clear
egen m=min(year) if quota==1, by(country) //indicator for the year of adoption
egen mm=mean(m), by(country)
keep if mm==2002 | mm==. //keep only one time of adoption
drop if lngdp==.

sdid womparl country year quota, vce(noinference) graph g2_opt(ylab(-5(5)20) ///
        ytitle("Women in Parliament") scheme(sj))
matrix lambda = e(lambda)[1..12,1] //save lambda weight
matrix yco = e(series)[1..12,2] //control baseline
matrix ytr = e(series)[1..12,3] //treated baseline
matrix aux = lambda'*(ytr - yco) //calculate the pre-treatment mean
scalar meanpre_o = aux[1,1]
matrix difference = e(difference)[1..26,1..2] // Store Ytr-Yco
svmat difference
ren (difference1 difference2) (time d)
replace d = d - meanpre_o // Calculate vector in (8)

gen y=time>=2002 & time!=.
bys y: egen d_mean=mean(d)
sort time d
preserve
keep if time==2002
di d_mean
restore

In this case the estimated 'overall' ATT (6.85377) is essentially identical to the average of the ATTs by year (6.8537698).

However, in the scenario with controls:

Code:

webuse set www.damianclarke.net/stata/
webuse quota_example.dta, clear
egen m=min(year) if quota==1, by(country) //indicator for the year of adoption
egen mm=mean(m), by(country)
keep if mm==2002 | mm==. //keep only one time of adoption
drop if lngdp==.

sdid womparl country year quota, vce(noinference) graph g2_opt(ylab(-5(5)20) ///
        ytitle("Women in Parliament") scheme(sj)) ///
        covariates(lngdp lnmmrt, projected)
matrix lambda = e(lambda)[1..12,1] //save lambda weight
matrix yco = e(series)[1..12,2] //control baseline
matrix ytr = e(series)[1..12,3] //treated baseline
matrix aux = lambda'*(ytr - yco) //calculate the pre-treatment mean
scalar meanpre_o = aux[1,1]
matrix difference = e(difference)[1..26,1..2] // Store Ytr-Yco
svmat difference
ren (difference1 difference2) (time d)
replace d = d - meanpre_o // Calculate vector in (8)

gen y=time>=2002 & time!=.
bys y: egen d_mean=mean(d)
sort time d
preserve
keep if time==2002
di d_mean
restore

The estimated 'overall' ATT (7.11653) is bigger than the average of the ATTs by year (6.8687816), which is almost at the same level as in the scenario without covariates (6.8537698).

In my application this issue is exacerbated, i.e. the average of the ATTs by year is a lot lower than the estimated 'overall' ATT.

Am I misunderstanding anything?

Should we actually expect the estimated 'overall' ATT to look similar to the average of the ATTs by year?

Perhaps Damian Clarke can help with this?

Many thanks,

Lukas

Heckman, Mills ratio, selection problem correction, selection bias

Chiara Tasselli — Fri, 26 Jul 2024 14:56:27 GMT

Good morning to everyone,
I have a sample of workers from several Italian companies, and I would like to make it representative of all national workers. I am currently standardizing some variables (such as job classification, gender, contract type, etc.) in order to append my database to the national one and run a probit model that would provide the Mills ratio.

I have seen this approach commonly used with the probability of employment. Do you think it would be feasible to use the probability of being included in my sample as the dichotomous outcome instead? However, this would mean that no observation in the national database would have that dummy variable equal to 1.

Alternatively, do you have any suggestions for constructing weights?

Thank you in advance for your help!

Moran's I on more than 800 observations: Mata

Alessandra Quintigliano — Fri, 26 Jul 2024 11:19:48 GMT

Hi all,

I was advised to perform the following tests on a spatial regression:

spwmatrix gecon latitude longitude, wname(weightsnames) wtype(inv) alpha(2) dband(0 50) eignvar(eignvar) row replace
spatgsa var, w(weightsnames) moran
reg var vars
spatdiag, weights(weightsnames)

Yet my dataset has over 250,000 observations, and the commands run into the limit of 800 rows or columns for matrices in Stata.

Specifically, when I try to create the matrix with all the observations, it says:
J(): 3900 unable to allocate real [257224,257224]
spwmatrix_CalcSPweightM(): - function returned error
: - function returned error

When I do it with fewer observations, for example 10,000, it does create the matrix, but then, when running spatgsa, it returns another error saying:

unable to allocate matrix;
You have attempted to create a matrix with too many rows or columns or attempted to fit a model with too many variables.

You are using Stata/BE which supports matrices with up to 800 rows or columns. See limits for how many more rows and columns Stata/SE and
Stata/MP can support.

If you are using factor variables and included an interaction that has lots of missing cells, try set emptycells drop to reduce the
required matrix size; see help set emptycells.

If you are using factor variables, you might have accidentally treated a continuous variable as a categorical, resulting in lots of
categories. Use the c. operator on such variables.

I thought I could try to create it manually in Mata and tried to set up the matrix:

mata:
X = st_data(., "X")
Y = st_data(., "Y")
n = rows(X)
W = J(n, n, 0)
end

But it gave the error:
J(): 3900 unable to allocate real [257224,257224]
: - function returned error

It doesn't give the same error when I reduce the number of observations, for example to 50,000.

Does anyone have ideas on how to perform these tests on my dataset?

Any help will be deeply appreciated!
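
For scale, a rough arithmetic sketch, assuming the weights matrix is stored densely as 8-byte reals (which is how Mata stores real matrices): an n x n matrix then needs n^2 * 8 bytes.

```stata
* Memory for a dense n x n matrix of 8-byte reals, in gigabytes (1e9 bytes)

* Full sample, n = 257,224: about 529.3 GB
display %9.1f 257224^2 * 8 / 1e9

* Subset, n = 50,000: 20.0 GB
display %9.1f 50000^2 * 8 / 1e9
```

So the full matrix would need roughly 529 GB of memory, against 20 GB for 50,000 observations, which is consistent with the J() error appearing only at the larger size. Because dband(0 50) leaves most pairs with zero weight, an approach that never stores the full dense matrix may be worth exploring.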

Splitting the dataset into groups of 799 observations based on a variable

Alessandra Quintigliano — Fri, 26 Jul 2024 11:05:10 GMT

Hi all,

I have a dataset of 250,000 observations and there is a variable called countrynum providing a numeric code for the country.

I need to split the observations into groups of fewer than 800, because I then need to apply a command that only runs on fewer than 800 observations at a time. Whenever a country has fewer than 800 observations, I'm fine using countrynum as the identifier: I temporarily keep if countrynum == i and run the command on that subset. However, certain countries have far more than 800 observations.

I would like to create an identifier that assigns a unique value to each subset of fewer than 800 observations (be it a whole country or a partition of it). That way, I can keep if identifier == i and run the command on each subset separately.

For the whole-country part, I simply copy countrynum into a variable called identifier: gen identifier = 0, bysort countrynum: egen freq = count(countrynum), and replace identifier = countrynum if freq < 800.

For the partition part, I would like to split the observations of each country into groups of fewer than 800, and assign a unique value in "identifier" to each subset.

Does anyone have ideas?

Thank you very much.
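
One possible approach, as a minimal sketch on toy data (the 2,000 and 500 row counts are invented; only countrynum and identifier come from the post): number the observations within each country, cut them into blocks of at most 799, and give each country-block pair its own value. Countries with fewer than 800 observations end up as a single block, so no separate whole-country step is needed.

```stata
* Toy data standing in for the real file: country 1 has 2,000 rows, country 2 has 500
clear
set obs 2500
generate countrynum = cond(_n <= 2000, 1, 2)

* Number the blocks of at most 799 rows within each country
bysort countrynum: generate block = ceil(_n / 799)

* One identifier value per (country, block) pair
egen identifier = group(countrynum block)

* Check: no subset holds more than 799 observations
bysort identifier: generate subset_size = _N
summarize subset_size
assert r(max) <= 799
```

Which observations land in which block is arbitrary here; if the split should follow some ordering, that variable can be added in parentheses to the bysort, as in bysort countrynum (somevar), where somevar is a placeholder.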