  • How to run a regression only for groups whose frequency is higher than a specific number?

    Let's say I have a dataset in which SIC3 is distributed as follows:

    Code:
           SIC3 |      Freq.     Percent        Cum.
    ------------+-----------------------------------
            100 |          5        0.00        0.00
            101 |        278        0.11        0.11
            102 |        515        0.20        0.30
            103 |        444        0.17        0.47
            104 |        938        0.36        0.83
            106 |          6        0.00        0.83
            108 |        189        0.07        0.90
            109 |        547        0.21        1.11
    I want to run an ordinary OLS regression:

    Code:
    reg y x
    What should I add so that the regression uses only observations whose SIC3 value has a frequency higher than 10? In the example above, observations with SIC3=100 and SIC3=106 would be excluded.

    Thanks in advance.

  • #2
    Phuc:
    the first line of the following code may be helpful (mutatis mutandis):
    Code:
    . use "https://www.stata-press.com/data/r16/nlswork.dta"
    (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
    
    . bysort idcode: egen wanted=count(year)
    
    . xtreg ln_wage i.msp if wanted>=10, fe vce(cluster idcode)
    
    Fixed-effects (within) regression               Number of obs     =     11,681
    Group variable: idcode                          Number of groups  =        982
    
    R-sq:                                           Obs per group:
         within  = 0.0027                                         min =         10
         between = 0.0002                                         avg =       11.9
         overall = 0.0011                                         max =         15
    
                                                    F(1,981)          =      11.16
    corr(u_i, Xb)  = -0.0217                        Prob > F          =     0.0009
    
                                   (Std. Err. adjusted for 982 clusters in idcode)
    ------------------------------------------------------------------------------
                 |               Robust
         ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           1.msp |   .0451795   .0135219     3.34   0.001     .0186443    .0717147
           _cons |   1.677765   .0083301   201.41   0.000     1.661419    1.694112
    -------------+----------------------------------------------------------------
         sigma_u |  .33266075
         sigma_e |  .31187398
             rho |   .5322173   (fraction of variance due to u_i)
    ------------------------------------------------------------------------------
    
    .
    Put differently:
    Code:
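    * number of nonmissing observations within each SIC3 group
    bysort SIC3: egen wanted=count(SIC3)
    * estimate only on groups with more than 10 observations
    regress y x if wanted>10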
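
The count-then-filter idea above can be tried end to end on a dataset that ships with Stata. The following is only a sketch under stated assumptions: it uses the bundled auto data, with rep78 standing in for SIC3 and price and mpg standing in for y and x, and the variable name n_group is made up for illustration. It also swaps egen's count() for the by-group size _N, which should give the same counts whenever the grouping variable has no missing values.

```stata
* Illustrative stand-ins (not from the original question):
*   rep78 plays the role of SIC3, price and mpg play the roles of y and x
sysuse auto, clear

* _N under a by-prefix is the number of observations in the current group
* (n_group is a hypothetical variable name)
bysort rep78: generate n_group = _N

* Check which groups survive the frequency threshold
tabulate rep78 if n_group > 10 & !missing(rep78)

* Estimate only on groups with more than 10 observations;
* missing values form their own by-group, so exclude them explicitly
regress price mpg if n_group > 10 & !missing(rep78)
```

One difference worth keeping in mind: egen's count() returns the number of nonmissing values, so observations with a missing grouping variable get a count of 0, whereas _N counts every observation in the by-group, including the group of missing values. The explicit !missing() condition guards against that case.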
    Kind regards,
    Carlo
    (Stata 19.0)
