
  • #16
    Dear Professor Santos Silva, Thank you very much for your reply and for sharing the paper. I went through the paper. I do not have staggered treatment. I am sharing the code and output for boottest with poisson. Thank you once again.
    Code:
    poisson Total_workers did i.state_cd i.year i.NIC_2008 [pweight=mult], vce(cluster NIC_2008)
    boottest did, cluster(NIC_2008) reps(9999) nograph
    Code:
     
    . boottest did, cluster(NIC_2008) reps(9999) nograph
    
    Overriding estimator's cluster/robust settings with cluster(NIC_2008)
    
    Re-running regression with null imposed.
    
    
    Iteration 0:   log pseudolikelihood = -6778465.9  
    Iteration 1:   log pseudolikelihood = -6356891.3  
    Iteration 2:   log pseudolikelihood = -6350690.3  
    Iteration 3:   log pseudolikelihood = -6350685.8  
    Iteration 4:   log pseudolikelihood = -6350685.8  
    
    Poisson regression                                    Number of obs =   14,709
                                                          Wald chi2(23) = 2.21e+07
    Log pseudolikelihood = -6350685.8                     Prob > chi2   =   0.0000
    
     ( 1)  [Total_workers]did = 0
                                             (Std. err. adjusted for 24 clusters in NIC_2008)
    -----------------------------------------------------------------------------------------
                            |               Robust
              Total_workers | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
    ------------------------+----------------------------------------------------------------
                        did |          0  (omitted)
                            |
       .......
    -----------------------------------------------------------------------------------------
    
    Score bootstrap-t, null imposed, 9999 replications, Wald test, clustering by NIC_2008, bootstr
    > ap clustering by NIC_2008, Rademacher weights:
      did
    
                                   z =    -1.6532
                            Prob>|z| =     0.1397


    • #17
      Dear Chinmay Korgaonkar,

      Apologies for the late reply. I do not have experience with the boottest command, but from a quick look at the help file all looks fine. Note, however, that you have a very small number of clusters.

      Best wishes,

      Joao


      • #18
        Dear Professor Santos Silva, Thank you very much for your reply. I apologize for the delayed response. Yes, as you noted, I have a small number of clusters, which is why I use boottest. Thank you once again.


        • #19
          Hello everyone,
          I have a question regarding the use of Poisson PML for highly right-skewed data. My dependent variable, the total number of workers, is highly right-skewed.
          See:
          Code:
          . summarize Total_workers, detail
          
                             5 AvgNoofPersonsWorked
          -------------------------------------------------------------
                Percentiles      Smallest
           1%            2              0
           5%            5              0
          10%            9              0       Obs              14,709
          25%           30              0       Sum of wgt.      14,709
          
          50%          109                      Mean           315.5896
                                  Largest       Std. dev.      975.1504
          75%          283          17085
          90%          700          17228       Variance       950918.3
          95%         1221          40441       Skewness       28.85828
          99%         3187          65463       Kurtosis       1590.338
          I am using Poisson PML because, as I understand it, it can handle zeros and overdispersion and is suitable under a multiplicative common trends assumption. About 0.24% of the observations are valid zeros, which I do not want to drop by taking logarithms. See the Poisson PML results:

          Code:
          . poisson Total_workers did i.state_cd i.year i.NIC_2008 [pweight=mult], vce(cluster NIC_2008)
          
          Iteration 0:   log pseudolikelihood = -6735054.2  
          Iteration 1:   log pseudolikelihood = -6354565.2  
          Iteration 2:   log pseudolikelihood =   -6349369  
          Iteration 3:   log pseudolikelihood = -6349354.9  
          Iteration 4:   log pseudolikelihood = -6349354.9  
          
          Poisson regression                                      Number of obs = 14,709
                                                                  Wald chi2(22) =      .
          Log pseudolikelihood = -6349354.9                       Prob > chi2   =      .
          
                                                   (Std. err. adjusted for 24 clusters in NIC_2008)
          -----------------------------------------------------------------------------------------
                                  |               Robust
                    Total_workers | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
          ------------------------+----------------------------------------------------------------
                              did |  -.0911423   .0947337    -0.96   0.336     -.276817    .0945324
          While exploring the data, I also tried a logarithmic transformation. See:

          Code:
           generate ln_Total_workers= log(Total_workers)
          (83 missing values generated)
          
          . reg ln_Total_workers did i.state_cd i.year i.NIC_2008 [pweight=mult], vce(cluster NIC_2008)
          (sum of wgt is 38,677.76633269)
          
          Linear regression                               Number of obs     =     14,686
                                                          F(22, 23)         =          .
                                                          Prob > F          =          .
                                                          R-squared         =     0.2557
                                                          Root MSE          =     1.3351
          
                                                   (Std. err. adjusted for 24 clusters in NIC_2008)
          -----------------------------------------------------------------------------------------
                                  |               Robust
                 ln_Total_workers | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
          ------------------------+----------------------------------------------------------------
                              did |   .0528328    .065816     0.80   0.430    -.0833179    .1889835
          I am curious why the estimated coefficients flip sign between the two regressions. Thank you very much!
          Last edited by Chinmay Korgaonkar; 27 Mar 2026, 13:45.
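
A difference between Poisson PML and a log-linear regression need not signal a mistake, because the two target different quantities: Poisson PML models ln E[y|x], whereas OLS on ln(y) models E[ln y|x]. The two can diverge when the error variance differs across groups. The simulated sketch below illustrates this. All variable names and parameter values are made up for illustration and say nothing about the poster's data.

```stata
* Simulated data only: illustrative names and values, not the thread's dataset
clear
set seed 2026
set obs 20000
generate byte d = runiform() < 0.5

* Multiplicative error with mean one in both groups,
* but a larger log-variance when d == 1
generate double sigma = cond(d, 1.2, 0.6)
generate double eta   = exp(rnormal(-sigma^2/2, sigma))

* True conditional mean does not depend on d: E[y | d] = exp(5)
generate double y = exp(5) * eta

* Poisson PML targets ln E[y | d], so the coefficient on d should be near 0
poisson y d, vce(robust)
scalar b_pois = _b[d]

* OLS on ln(y) targets E[ln y | d], which shifts by -(1.2^2 - 0.6^2)/2 = -0.54
generate double ln_y = ln(y)
regress ln_y d, vce(robust)
scalar b_ols = _b[d]

display "Poisson PML: " %7.4f b_pois "   log-linear OLS: " %7.4f b_ols
```

In this setup the log-linear coefficient is pulled away from zero purely by heteroskedasticity, even though the mean of y is the same in both groups. Whether anything similar is at work in a real application is an empirical question.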


          • #20
            Dear Chinmay Korgaonkar,

            In both cases, the coefficient is essentially zero, so the sign flipping is not surprising.

            Best wishes,

            Joao
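
The point above can be checked by hand from the numbers printed in #19. The sketch below uses only the reported coefficients and standard errors. The scalar names are illustrative, and the 23 degrees of freedom for the linear regression are taken from the F(22, 23) line of that output.

```stata
* Values copied from the output in #19; scalar names are illustrative only
scalar b_pois  = -.0911423
scalar se_pois =  .0947337
scalar b_ols   =  .0528328
scalar se_ols  =  .065816

* Poisson PML: z statistic and normal-based 95% interval
scalar z_pois  = b_pois / se_pois
scalar lo_pois = b_pois - invnormal(0.975) * se_pois
scalar hi_pois = b_pois + invnormal(0.975) * se_pois
display "Poisson: z = " %6.2f z_pois "  CI = [" %9.6f lo_pois ", " %9.6f hi_pois "]"

* Log-linear OLS: t statistic and t-based 95% interval with 23 df
scalar t_ols  = b_ols / se_ols
scalar lo_ols = b_ols - invttail(23, 0.025) * se_ols
scalar hi_ols = b_ols + invttail(23, 0.025) * se_ols
display "OLS:     t = " %6.2f t_ols "  CI = [" %9.6f lo_ols ", " %9.6f hi_ols "]"

* Point estimates expressed as approximate percentage effects
scalar pct_pois = 100 * (exp(b_pois) - 1)
scalar pct_ols  = 100 * (exp(b_ols) - 1)
display "Percent effect, Poisson: " %5.1f pct_pois "   OLS: " %5.1f pct_ols
```

Both intervals contain zero, so neither estimate pins down the sign of the effect, and a sign difference between them carries little information.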


            • #21
              Dear Professor Joao Santos Silva, Thank you very much for your reply.

              In another dataset, I have another highly right-skewed dependent variable, firm investment, with 46% zeros.
              I use Poisson PML with raw levels and then with levels winsorized at the 99th and 95th percentiles. Statistical significance changes with winsorization, and the sign of the coefficient changes too (-0.57 in raw levels versus 0.45 and 0.42 after winsorizing), although the absolute magnitudes are comparable. I am not sure why. Also, as I understand it, winsorization choices can be arbitrary.

              Thank you very much.

              Code:
               summarize investment, detail
              
                                       investment
              -------------------------------------------------------------
                    Percentiles      Smallest
               1%            0              0
               5%            0              0
              10%            0              0       Obs              82,941
              25%            0              0       Sum of wgt.      82,941
              
              50%            0                      Mean           9.45e+07
                                      Largest       Std. dev.      2.41e+09
              75%      1829428       1.41e+11
              90%     3.32e+07       1.42e+11       Variance       5.83e+18
              95%     1.16e+08       3.18e+11       Skewness       89.91193
              99%     9.85e+08       3.72e+11       Kurtosis       11229.19
              * raw levels
              Code:
              . ppmlhdfe investment  did, absorb(firm_id fiscal_year) vce(cluster firm_id)
              (dropped 15085 observations that are either singletons or separated by a fixed effect)
              Iteration 1:   deviance = 2.6635e+13  eps = .         iters = 6    tol = 1.0e-04  min(eta) =  
              > -5.33  P   
              Iteration 2:   deviance = 1.4790e+13  eps = 8.01e-01  iters = 4    tol = 1.0e-04  min(eta) =  
              > -6.64      
              Iteration 3:   deviance = 1.2333e+13  eps = 1.99e-01  iters = 4    tol = 1.0e-04  min(eta) =  
              > -8.11      
              Iteration 4:   deviance = 1.1997e+13  eps = 2.80e-02  iters = 4    tol = 1.0e-04  min(eta) =  
              > -9.41      
              Iteration 5:   deviance = 1.1967e+13  eps = 2.53e-03  iters = 4    tol = 1.0e-04  min(eta) = -
              > 10.45      
              Iteration 6:   deviance = 1.1962e+13  eps = 3.84e-04  iters = 3    tol = 1.0e-04  min(eta) = -
              > 11.38      
              Iteration 7:   deviance = 1.1962e+13  eps = 7.33e-05  iters = 2    tol = 1.0e-04  min(eta) = -
              > 12.24      
              Iteration 8:   deviance = 1.1961e+13  eps = 1.13e-05  iters = 2    tol = 1.0e-05  min(eta) = -
              > 13.06      
              Iteration 9:   deviance = 1.1961e+13  eps = 1.12e-06  iters = 2    tol = 1.0e-05  min(eta) = -
              > 13.65   S  
              Iteration 10:  deviance = 1.1961e+13  eps = 4.34e-08  iters = 2    tol = 1.0e-06  min(eta) = -
              > 13.91   S  
              Iteration 11:  deviance = 1.1961e+13  eps = 2.37e-10  iters = 2    tol = 1.0e-07  min(eta) = -
              > 13.95   S  
              Iteration 12:  deviance = 1.1961e+13  eps = 3.12e-14  iters = 2    tol = 1.0e-09  min(eta) = -
              > 13.96   S O
              ----------------------------------------------------------------------------------------------
              > --------------
              (legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon below tolerance
              > )
              Converged in 12 iterations and 37 HDFE sub-iterations (tol = 1.0e-08)
              
              HDFE PPML regression                              No. of obs      =     67,856
              Absorbing 2 HDFE groups                           Residual df     =     10,824
              Statistics robust to heteroskedasticity           Wald chi2(1)    =       1.74
              Deviance             =  1.19614e+13               Prob > chi2     =     0.1877
              Log pseudolikelihood = -5.98068e+12               Pseudo R2       =     0.8304
              
              Number of clusters (firm_id)=     10,825
                                         (Std. err. adjusted for 10,825 clusters in firm_id)
              ------------------------------------------------------------------------------
                           |               Robust
                investment | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
              -------------+----------------------------------------------------------------
                       did |  -.5699234   .4325998    -1.32   0.188    -1.417803    .2779566
                     _cons |   22.58967   .1862943   121.26   0.000     22.22454     22.9548
              ------------------------------------------------------------------------------
              
              Absorbed degrees of freedom:
              -----------------------------------------------------+
               Absorbed FE | Categories  - Redundant  = Num. Coefs |
              -------------+---------------------------------------|
                   firm_id |     10825       10825           0    *|
               fiscal_year |        12           1          11     |
              -----------------------------------------------------+
              * = FE nested within cluster; treated as redundant for DoF computation

              * 99th percentile winsorization
              Code:
              .
              . generate investment_w99 = investment 
              (42,277 missing values generated)
              
              . winsor2 investment_w99, cuts(0 99) replace
              
              . ppmlhdfe investment_w99 did, absorb(firm_id fiscal_year) vce(cluster firm_id)
              (dropped 15085 observations that are either singletons or separated by a fixed effect)
              Iteration 1:   deviance = 5.3973e+12  eps = .         iters = 6    tol = 1.0e-04  min(eta) =  
              > -3.84  P   
              Iteration 2:   deviance = 3.9343e+12  eps = 3.72e-01  iters = 4    tol = 1.0e-04  min(eta) =  
              > -5.09      
              Iteration 3:   deviance = 3.7360e+12  eps = 5.31e-02  iters = 4    tol = 1.0e-04  min(eta) =  
              > -6.35      
              Iteration 4:   deviance = 3.7164e+12  eps = 5.27e-03  iters = 4    tol = 1.0e-04  min(eta) =  
              > -7.42      
              Iteration 5:   deviance = 3.7133e+12  eps = 8.59e-04  iters = 3    tol = 1.0e-04  min(eta) =  
              > -8.36      
              Iteration 6:   deviance = 3.7127e+12  eps = 1.58e-04  iters = 2    tol = 1.0e-04  min(eta) =  
              > -9.20      
              Iteration 7:   deviance = 3.7126e+12  eps = 2.26e-05  iters = 2    tol = 1.0e-04  min(eta) =  
              > -9.97      
              Iteration 8:   deviance = 3.7126e+12  eps = 1.92e-06  iters = 2    tol = 1.0e-05  min(eta) = -
              > 10.48   S  
              Iteration 9:   deviance = 3.7126e+12  eps = 5.30e-08  iters = 2    tol = 1.0e-06  min(eta) = -
              > 10.66   S  
              Iteration 10:  deviance = 3.7126e+12  eps = 1.47e-10  iters = 2    tol = 1.0e-07  min(eta) = -
              > 10.68   S  
              Iteration 11:  deviance = 3.7126e+12  eps = 5.33e-15  iters = 3    tol = 1.0e-09  min(eta) = -
              > 10.68   S O
              ----------------------------------------------------------------------------------------------
              > --------------
              (legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon below tolerance
              > )
              Converged in 11 iterations and 34 HDFE sub-iterations (tol = 1.0e-08)
              
              HDFE PPML regression                              No. of obs      =     67,856
              Absorbing 2 HDFE groups                           Residual df     =     10,824
              Statistics robust to heteroskedasticity           Wald chi2(1)    =      25.44
              Deviance             =  3.71258e+12               Prob > chi2     =     0.0000
              Log pseudolikelihood = -1.85629e+12               Pseudo R2       =     0.6482
              
              Number of clusters (firm_id)=     10,825
                                         (Std. err. adjusted for 10,825 clusters in firm_id)
              ------------------------------------------------------------------------------
                           |               Robust
              investmen~99 | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
              -------------+----------------------------------------------------------------
                       did |   .4544601   .0900964     5.04   0.000     .2778744    .6310458
                     _cons |   18.57466    .053766   345.47   0.000     18.46928    18.68004
              ------------------------------------------------------------------------------
              
              Absorbed degrees of freedom:
              -----------------------------------------------------+
               Absorbed FE | Categories  - Redundant  = Num. Coefs |
              -------------+---------------------------------------|
                   firm_id |     10825       10825           0    *|
               fiscal_year |        12           1          11     |
              -----------------------------------------------------+
              * = FE nested within cluster; treated as redundant for DoF computation

              * 95th percentile winsorization
              Code:
              . 
              . generate investment_w95 = investment
              (42,277 missing values generated)
              
              . winsor2 investment_w95, cuts(0 95) replace
              
              .  ppmlhdfe investment_w95 did, absorb(firm_id fiscal_year) vce(cluster firm_id)
              (dropped 15085 observations that are either singletons or separated by a fixed effect)
              Iteration 1:   deviance = 1.6810e+12  eps = .         iters = 6    tol = 1.0e-04  min(eta) =  
              > -3.42  P   
              Iteration 2:   deviance = 1.4293e+12  eps = 1.76e-01  iters = 4    tol = 1.0e-04  min(eta) =  
              > -4.65      
              Iteration 3:   deviance = 1.4031e+12  eps = 1.87e-02  iters = 4    tol = 1.0e-04  min(eta) =  
              > -5.78      
              Iteration 4:   deviance = 1.3995e+12  eps = 2.57e-03  iters = 3    tol = 1.0e-04  min(eta) =  
              > -6.75      
              Iteration 5:   deviance = 1.3988e+12  eps = 4.68e-04  iters = 3    tol = 1.0e-04  min(eta) =  
              > -7.61      
              Iteration 6:   deviance = 1.3987e+12  eps = 6.86e-05  iters = 2    tol = 1.0e-04  min(eta) =  
              > -8.39      
              Iteration 7:   deviance = 1.3987e+12  eps = 6.09e-06  iters = 2    tol = 1.0e-05  min(eta) =  
              > -8.92      
              Iteration 8:   deviance = 1.3987e+12  eps = 1.86e-07  iters = 3    tol = 1.0e-06  min(eta) =  
              > -9.12   S  
              Iteration 9:   deviance = 1.3987e+12  eps = 6.26e-10  iters = 3    tol = 1.0e-07  min(eta) =  
              > -9.15   S  
              Iteration 10:  deviance = 1.3987e+12  eps = 3.30e-14  iters = 2    tol = 1.0e-08  min(eta) =  
              > -9.15   S  
              Iteration 11:  deviance = 1.3987e+12  eps = 0.00e+00  iters = 1    tol = 1.0e-09  min(eta) =  
              > -9.15   S O
              ----------------------------------------------------------------------------------------------
              > --------------
              (legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon below tolerance
              > )
              Converged in 11 iterations and 33 HDFE sub-iterations (tol = 1.0e-08)
              
              HDFE PPML regression                              No. of obs      =     67,856
              Absorbing 2 HDFE groups                           Residual df     =     10,824
              Statistics robust to heteroskedasticity           Wald chi2(1)    =      54.47
              Deviance             =  1.39874e+12               Prob > chi2     =     0.0000
              Log pseudolikelihood = -6.99371e+11               Pseudo R2       =     0.5256
              
              Number of clusters (firm_id)=     10,825
                                         (Std. err. adjusted for 10,825 clusters in firm_id)
              ------------------------------------------------------------------------------
                           |               Robust
              investmen~95 | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
              -------------+----------------------------------------------------------------
                       did |   .4212733   .0570778     7.38   0.000     .3094028    .5331438
                     _cons |   16.99295   .0357747   475.00   0.000     16.92284    17.06307
              ------------------------------------------------------------------------------
              
              Absorbed degrees of freedom:
              -----------------------------------------------------+
               Absorbed FE | Categories  - Redundant  = Num. Coefs |
              -------------+---------------------------------------|
                   firm_id |     10825       10825           0    *|
               fiscal_year |        12           1          11     |
              -----------------------------------------------------+
              * = FE nested within cluster; treated as redundant for DoF computation


              • #22
                Dear Chinmay Korgaonkar,

                I don't think I have ever encountered a case where Winsorization is a sensible thing to do, so I would tend to ignore those results. However, you may try to understand the reason for the different results.

                Best wishes,

                Joao
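
One possible starting point for understanding why the results differ: Poisson PML fits the conditional mean in levels, so a handful of very large values can carry much of the estimate, and winsorizing changes exactly those values. Below is a minimal sketch on simulated data that checks how much of the total sits above the 99th percentile and how those observations split by treatment status. The variable names mirror the thread, but the data-generating values are invented.

```stata
* Simulated data only: nothing here comes from the thread's dataset
clear
set seed 12345
set obs 5000
generate byte did = runiform() < 0.5

* Many zeros plus a heavy right tail, loosely mimicking an investment variable
generate double investment = cond(runiform() < 0.46, 0, exp(rnormal(14, 3)))

* Cutoffs that winsorizing at the 95th and 99th percentiles would use
_pctile investment, percentiles(95 99)
scalar p95 = r(r1)
scalar p99 = r(r2)

* Flag the observations that a 99th-percentile winsorization would change
generate byte top1 = investment > scalar(p99)
tabulate top1 did

* Share of total investment carried by the flagged observations
summarize investment if top1, meanonly
scalar sum_top = r(sum)
summarize investment, meanonly
scalar share_top = sum_top / r(sum)
display "Share of total investment above p99: " %5.3f share_top
```

If a large share of the total comes from a few observations concentrated in one treatment group or period, that would be one candidate explanation for estimates that move when the tail is capped. The same tabulation can be run on the real data by skipping the simulation lines.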


                • #23
                  Dear Professor Joao Santos Silva, Thank you very much for your reply. I apologize for my delayed response. I will avoid winsorization and examine why the results change. Thank you.
