  • Lasso variable selection

    Dear all,


    I have an unbalanced panel with a number of dummies, several categorical variables, many control variables, and quadratic terms in some indicators. In short, I end up with about 200 predictors on the right-hand side, so I need a shrinkage procedure such as the LASSO. However, I would prefer to keep the quadratic terms, the main macro variables, and the dummy/categorical variables in the model.
    How am I supposed to do that?

    The panel regression I am currently running is below; this is the model that needs to be shrunk.
    Code:
      
     reghdfe growthgdp l.gdp cpi u Output dummy1 dummy2 c.indicator1##c.indicator1, absorb(time panelid) vce(cluster  panelid, dkraay(1))
    A sample of the data and part of the code are provided; both may differ from the actual ones.


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(id ts) str1 group str97 country float(ggdp dummy1) byte(dummy2 length) float(indicator1 indicator2) long gdp float(cpi u) double Output byte count
    1 1990 "A" "Australia"           . 1 1 3  10.999176  10.999176   571564  7.333022  6.93 1.28877e+16 26
    1 1991 "A" "Australia"       -5720 0 1 2       12.2       12.2   565844  3.176675  9.58  1.3296e+15 27
    1 1992 "A" "Australia"       14434 0 1 2       12.2       12.2   580278 1.0122311 10.73 1.22544e+16 27
    1 1993 "A" "Australia"       22666 1 1 3  17.553352  17.553352   602944 1.7536534 10.87 1.18076e+16 28
    1 1994 "A" "Australia"       29533 0 1 3      19.11      19.11   632477 1.9696348  9.72 1.34421e+16 28
    1 2000 "A" "Australia"           . 0 1 3     28.634     28.634   798334  4.457435  6.28 1.39202e+16 30
    1 2001 "A" "Australia"       20810 1 1 3   26.41307   26.54496   819144 4.4071355  6.74 1.30234e+16 31
    1 2002 "A" "Australia"       33904 0 1 3   5.536341      6.908   853048 2.9815745  6.37 1.61436e+16 31
    1 2003 "A" "Australia"       23667 0 1 3   5.536341      6.908   876715  2.732596  5.93 2.02622e+16 31
    1 2004 "A" "Australia"       36212 1 1 3   7.350179   8.569935   912927 2.3432553  5.39 2.35621e+16 32
    1 2010 "A" "Australia"           . 1 1 3   6.241802   6.241802  1080050   2.91834  5.21 3.47466e+16 35
    1 2011 "A" "Australia"       30676 0 0 3      5.068      5.068  1110726   3.30385  5.08  4.0027e+15 35
    1 2012 "A" "Australia"       42237 0 0 3      5.068      5.068  1152963 1.7627802  5.22 4.13369e+16 35
    1 2013 "A" "Australia"       24591 1 1 3   7.929047   8.248285  1177554  2.449889  5.66 3.74135e+16 37
    1 2014 "A" "Australia"       30301 0 . .  15.081665     16.199  1207855  2.487923  6.08 3.47753e+16 37
    2 1990 "A" "France"              . 0 0 3        1.6        1.6  2081911 3.1942835  9.36 5.03813e+16 51
    2 1991 "A" "France"          21822 0 0 3        1.6        1.6  2103733  3.213407  9.13 4.93537e+16 52
    2 1992 "A" "France"          33646 0 1 1        1.6        1.6  2137379 2.3637605 10.21  5.2038e+15 53
    2 1993 "A" "France"         -13437 1 1 2   6.996447  -.8161401  2123942 2.1044629 11.32 5.80229e+16 54
    2 1994 "A" "France"          50090 0 0 2    8.69136     -1.575  2174032 1.6555153 12.59   5.071e+14 54
    2 2000 "A" "France"              . 0 0 5   2.364474      2.797  2564959   1.67596 10.22 7.85938e+16 57
    2 2001 "A" "France"          50881 0 0 5   2.364474      2.797  2615840 1.6347808  8.61 7.85391e+16 57
    2 2002 "A" "France"          29704 1 1 5  -.7668521  -.7921649  2645544 1.9234123   8.7 8.75676e+16 59
    2 2003 "A" "France"          21777 0 0 2 -2.9592714     -3.286  2667321  2.098472  8.31 1.03639e+17 59
    2 2004 "A" "France"          75479 0 0 2  -2.950783     -3.286  2742800 2.1420896  8.91  1.0654e+16 60
    2 2010 "A" "France"              . 0 0 3      3.891      3.891  2904699 1.5311227  8.87 9.63052e+16 64
    2 2011 "A" "France"          63691 0 0 1      3.891      3.891  2968390  2.111598  8.81 1.08307e+17 65
    2 2012 "A" "France"           9295 1 1 2 -2.5410414 -2.1623123  2977685 1.9541953   9.4 9.84001e+16 67
    2 2013 "A" "France"          17161 0 0 2  -6.333579     -5.608  2994846  .8637155  9.92 9.87864e+16 67
    2 2014 "A" "France"          28637 0 0 2  -6.108374     -5.608  3023483  .5077588 10.29 1.04153e+17 68
    3 1990 "B" "Germany"             . 0 1 3   4.255927      4.029  3090684 2.6964715  4.89           . 24
    3 1991 "B" "Germany"        154874 0 1 3   .5641079 -1.6926868  3245558 4.0470366  5.32 1.21946e+17 25
    3 1992 "B" "Germany"         62415 0 1 3   .3835026     -1.973  3307973  5.056979  6.32 1.29671e+17 25
    3 1993 "B" "Germany"        -32314 0 1 3   .3835026     -1.973  3275659  4.474575  7.68 1.14872e+17 25
    3 1994 "B" "Germany"         78350 1 1 4   .8169867 -1.2526814  3354009  2.693057  8.73 1.21188e+17 26
    3 2000 "B" "Germany"             . 0 1 4  1.9049623      2.806  3738235  1.440268  7.92 1.17777e+17 27
    3 2001 "B" "Germany"         62857 0 1 4  1.9049622      2.806  3801092  1.983857  7.77 1.17577e+17 27
    3 2002 "B" "Germany"         -7525 1 1 4   1.283737  2.0394616  3793567 1.4208056  8.48 1.22359e+17 28
    3 2003 "B" "Germany"        -26559 0 1 3 -1.3254085      -1.18  3767008 1.0342277  9.78 1.50381e+17 28
    3 2004 "B" "Germany"         44265 0 1 3 -1.3254085      -1.18  3811273 1.6657335 10.73 1.75373e+17 28
    3 2010 "B" "Germany"             . 0 1 4  1.1606808       -.05  4071113 1.1038091  6.97  2.0819e+16 30
    3 2011 "B" "Germany"        159799 0 1 4  1.1606808       -.05  4230912 2.0751746  5.82 2.44939e+17 30
    3 2012 "B" "Germany"         17706 0 1 4  1.1606808       -.05  4248618  2.008491  5.38 2.25506e+17 30
    3 2013 "B" "Germany"         18592 1 1 4  1.1165322      .0625  4267210  1.504721  5.23 2.32201e+17 31
    3 2014 "B" "Germany"         94286 0 . .  .01281774      2.875  4361496  .9067979  4.98 2.37452e+17 31
    4 1990 "B" "Italy"               . 0 0 2   3.829138       3.66  2199474  6.456609  9.79 4.78032e+16 49
    4 1991 "B" "Italy"           33838 0 1 2   3.777037       3.66  2233312      6.25  10.1 4.77836e+16 50
    4 1992 "B" "Italy"           18632 1 1 1   2.615249  2.9572604  2251944   5.27059  9.33 5.49646e+16 51
    4 1993 "B" "Italy"          -19205 0 1 1  1.4437795   .7736538  2232739 4.6267347 10.24 4.39227e+16 52
    4 1994 "B" "Italy"           48027 1 1 1  4.0116615  13.063806  2280766  4.051842 11.09 5.03757e+16 53
    4 2000 "B" "Italy"               . 0 1 1    5.97681   1.812674  2598506 2.5376854 10.84 7.22295e+16 58
    4 2001 "B" "Italy"           50706 1 1 4   7.402936  4.7593465  2649212  2.785165   9.6 7.13962e+16 59
    4 2002 "B" "Italy"            6728 0 0 4      8.534      8.534  2655940  2.465323  9.21 7.52609e+16 59
    4 2003 "B" "Italy"            3682 0 0 4      8.534      8.534  2659622 2.6725554  8.87 8.79973e+16 59
    4 2004 "B" "Italy"           37862 0 0 4      8.534      8.534  2697484 2.2067366  7.87 1.03212e+17 59
    4 2010 "B" "Italy"               . 0 0 3     -16.25          0  2680599  1.525516  8.36  1.1136e+16 62
    4 2011 "B" "Italy"           18960 0 0 3 -14.330358          0  2699559  2.780633  8.36 1.24328e+17 63
    4 2012 "B" "Italy"          -80471 0 1 1          0          0  2619088  3.041363 10.65 1.11628e+17 64
    4 2013 "B" "Italy"          -48219 1 1 1   .6806767 -1.9956785  2570869 1.2199935 12.15 1.13354e+17 67
    4 2014 "B" "Italy"            -117 0 0 1 -1.9673892     -2.941  2570752 .24104743 12.68 1.12803e+17 68
    5 1990 "C" "United Kingdom"      . 0 1 3     16.809     16.809  1846210  8.063461  6.97 5.76584e+16 18
    5 1991 "C" "United Kingdom" -20366 0 1 2     16.809     16.809  1825844  7.461783  8.55 5.51486e+16 18
    5 1992 "C" "United Kingdom"   7323 1 1 5  13.403038  13.403038  1833167 4.5915494  9.78 5.94441e+16 19
    5 1993 "C" "United Kingdom"  45655 0 1 5       12.1       12.1  1878822  2.558578 10.35 5.35544e+16 19
    5 1994 "C" "United Kingdom"  72260 0 1 5       12.1       12.1  1951082 2.2190125  9.65 5.90669e+16 19
    5 2000 "C" "United Kingdom"      . 0 1 4      1.806      1.806  2386524 1.1829562  5.56 6.64745e+16 20
    5 2001 "C" "United Kingdom"  65159 1 1 4   1.958802   1.958802  2451683 1.5323496   4.7 6.12017e+16 21
    5 2002 "C" "United Kingdom"  53417 0 1 4      2.076      2.076  2505100 1.5204024  5.04 6.22673e+16 21
    5 2003 "C" "United Kingdom"  83217 0 1 4      2.076      2.076  2588317 1.3765004  4.81 6.70562e+16 21
    5 2004 "C" "United Kingdom"  59176 0 1 4      2.076      2.076  2647493 1.3903975  4.59 7.82735e+16 21
    5 2010 "C" "United Kingdom"      . 1 1 3   3.564247   3.697813  2796536  2.492655  7.79 6.99631e+16 24
    5 2011 "C" "United Kingdom"  35676 0 . .   5.454876   5.670001  2832212 3.8561125  8.04 7.67414e+16 24
    5 2012 "C" "United Kingdom"  40510 0 . .   5.454876       5.67  2872722  2.573235  7.88 7.59697e+16 24
    5 2013 "C" "United Kingdom"  62803 0 . .   5.454876   5.670001  2935525 2.2916667  7.52 7.64401e+16 24
    5 2014 "C" "United Kingdom"  84036 0 . .   5.454876   5.670001  3019561   1.45112  6.11 7.94942e+16 24
    6 1990 "C" "United States"       . 1 1 2       14.9       14.9 10650444  5.397956   5.6           . 24
    6 1991 "C" "United States"  -11531 0 1 2       14.9       14.9 10638913  4.234964   6.8           . 25
    6 1992 "C" "United States"  374749 1 1 2       14.9       14.9 11013662 3.0288196   7.5           . 25
    6 1993 "C" "United States"  303072 0 1 2   .0375137   .0375137 11316734  2.951657   6.9           . 26
    6 1994 "C" "United States"  455928 1 1 2      -.781      -.781 11772662  2.607442  6.12           . 26
    6 2000 "C" "United States"       . 1 1 2      1.463      1.463 14931055  3.376857  3.99           . 29
    6 2001 "C" "United States"  142493 0 1 2   5.542341   5.542341 15073548  2.826171  4.73           . 30
    6 2002 "C" "United States"  255639 1 1 2      5.767      5.767 15329187 1.5860317  5.78           . 30
    6 2003 "C" "United States"  428635 0 1 2      5.767      5.767 15757822 2.2700949  5.99           . 31
    6 2004 "C" "United States"  607079 1 1 2      5.767      5.767 16364901  2.677237  5.53           . 31
    6 2010 "C" "United States"       . 1 1 2       -.36       -.36 17784695 1.6400435  9.63           . 34
    6 2011 "C" "United States"  275644 0 1 2       -.36       -.36 18060339 3.1568415  8.95           . 35
    6 2012 "C" "United States"  411899 1 1 2       -.36       -.36 18472238 2.0693374  8.07           . 35
    6 2013 "C" "United States"  340236 0 1 2   1.298654   1.298654 18812474 1.4648327  7.37           . 36
    6 2014 "C" "United States"  430387 0 1 .       1.39       1.39 19242861  1.622223  6.17           . 36
    end

  • #2
    h lasso linear

    Or look up the GitHub version of the lassopack package.



    • #3
      Originally posted by Jared Greathouse
      h lasso linear

      Or look up the github version of the package lassopack
      I just needed some help with the code since I have not used it before...



      • #4
        The help file explains this. Without knowing the broader context, if you just wanna use the LASSO StataCorp gives us, here is one way to do this given your example data.
        Code:
        * Load the -dataex- example data from #1 first
        cls
         lasso linear ggdp gdp cpi dummy1 dummy2 c.indicator1##c.indicator1
         
         di e(allvars_sel)
        We run the standard LASSO and then display the covariates it selects.
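        If you want the LASSO to keep certain variables no matter what (the quadratic, the main macro variables, the dummies), -lasso linear- lets you put them in parentheses right after the dependent variable. Those "always" variables are not penalized; only the remaining ones compete for selection. A minimal sketch, assuming the -dataex- example from #1 is in memory, Stata 16 or later, and that the commands are run from a do-file (for the /// continuations). Since -lasso- has no absorb() option, the country (id) and year (ts) fixed effects are entered here as factor-variable dummies among the always variables; which variables go inside the parentheses and the rseed() value are illustrative choices, not something from the thread.

```stata
* Sketch: assumes the -dataex- example from #1 is in memory (Stata 16+)
* Variables in parentheses are "always" variables: they are not penalized,
* so the lasso cannot drop them. Here the two dummies, the quadratic in
* indicator1, and country (id) and year (ts) dummies are forced in;
* only gdp, cpi, u and Output compete for selection.
lasso linear ggdp (dummy1 dummy2 c.indicator1##c.indicator1 i.id i.ts) ///
    gdp cpi u Output, rseed(12345)

* Every selected variable, always variables included
display "`e(allvars_sel)'"

* Table of the selected variables
lassocoef
```

Rows with a missing value in any listed variable are dropped before fitting, so with this example the United States (Output missing) does not enter the estimation sample.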



        • #5
          Originally posted by Jared Greathouse
          The help file explains this. Without knowing the broader context, if you just wanna use the LASSO StataCorp gives us, here is one way to do this given your example data.
          We run the standard LASSO and then display the covariates it selects.
          Is it valid for panel data?



          • #6
            Indeed!!! I use LASSO for my forthcoming synthetic controls command.

            Now, I don't know every kind of LASSO or every detail underlying all of the options. Different variants (e.g., the adaptive LASSO) may work better than others in different applications, but the LASSO in general works well with panel data, provided that you know how to use it. So my advice is: whatever you're using it for, use it carefully, because different options will yield different results.
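            To see how much the choice of method matters, one option is to fit the same model with cross-validation and with the adaptive LASSO, then compare the selected variables with -lassocoef-. A sketch, assuming the -dataex- example from #1 is in memory, Stata 16 or later, and a do-file; the specification and seed are illustrative only, and the panel structure (fixed effects) is left out here to keep the comparison simple.

```stata
* Sketch: assumes the -dataex- example from #1 is in memory (Stata 16+)
* Same model, two ways of choosing the penalty; rseed() fixes the CV folds
lasso linear ggdp (dummy1 dummy2 c.indicator1##c.indicator1) ///
    gdp cpi u Output, selection(cv) rseed(12345)
estimates store cv

lasso linear ggdp (dummy1 dummy2 c.indicator1##c.indicator1) ///
    gdp cpi u Output, selection(adaptive) rseed(12345)
estimates store adaptive

* Side-by-side table of which variables each method kept
lassocoef cv adaptive
```

If the two columns disagree, that is exactly the sensitivity to options described above, and it is worth reporting which method you used.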



            • #7
              Originally posted by Jared Greathouse
              Indeed!!! I use LASSO for my forthcoming synthetic controls command.

              Now, I don't know about all the kinds of LASSO and every detail of the specifics underlying all of the options- different kinds of LASSO may work better than others (e.g., adaptive LASSO) for different applications, but LASSO in general is great for panel data, providing that you know how to use it. So my advice is, whatever situation you're specifically using it for, use it carefully, because different options and so on will yield different results.

              Hi Jared,

              Sorry to bring this up again. I have been trying to implement a lasso for a panel using lassopack, and I came across the cross-validation approach. I used the following code.

              Code:
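              cvlasso growthgdp l.gdp cpi u Output dummy1 dummy2 c.indicator1##c.indicator1, plotcv  // K-fold CV; plots CV error against lambda

              cvlasso, lopt   // results at the lambda minimizing the CV prediction error
              cvlasso, lse    // results at the largest lambda within one SE of that minimum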
              I'm not sure whether this code, or even the approach, is correct. I also don't understand how to test the lags, if any, of the variables. Also, the lasso selects some terms of the quadratic relation, but I cannot select them individually as separate variables. How do I do that? I would appreciate any help.
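Here is a minimal sketch of one way to handle both issues with lassopack, assuming it is installed via ssc install lassopack. It runs on simulated data with the thread's variable names. Creating the lags and the square of indicator1 as ordinary variables lets the lasso select each term separately, or lets you force any of them; the choice of two lags of gdp is arbitrary. notpen() exempts the listed regressors from the penalty so they are always kept, and fe applies the within transformation for country fixed effects. The default K-fold cross-validation assigns observations to folds at random, ignoring the time ordering; the rolling option used later in the thread is one alternative.

```stata
* Requires lassopack: ssc install lassopack
* Simulated panel: 20 hypothetical countries x 30 years, thread variable names
clear
set seed 2022
set obs 600
generate id = ceil(_n/30)
bysort id: generate year = 1990 + _n
xtset id year
generate cpi        = rnormal(3, 1)
generate u          = rnormal(7, 2)
generate dummy1     = runiform() < 0.5
generate dummy2     = runiform() < 0.3
generate indicator1 = rnormal(10, 5)
generate gdp        = rnormal(100, 10)
generate growthgdp  = 0.5*indicator1 - 0.02*indicator1^2 - 0.3*cpi + rnormal()

* Lags and the square as separate variables, so each can be selected or forced
generate gdp_l1        = L.gdp
generate gdp_l2        = L2.gdp
generate indicator1_sq = indicator1^2

* fe: within transformation for country fixed effects (requires xtset)
* notpen(): these regressors are never penalized, so they always stay in
cvlasso growthgdp gdp_l1 gdp_l2 cpi u dummy1 dummy2 indicator1 indicator1_sq, ///
    fe notpen(dummy1 dummy2 indicator1 indicator1_sq) nfolds(5)

cvlasso, lopt    // which lags and controls survive at the CV-optimal lambda
```

If gdp_l1 or gdp_l2 drops out at lopt, the lasso found no predictive gain from that lag in this sample. That is a selection result, not a formal significance test.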



              • #8
                I'll post some sample code later



                • #9
                  Thank you! I appreciate your time and help in advance!



                  • #10
                    Hey, so... here's some code, as promised. This code is central to my synthetic control command. I'm likely using the LASSO differently than you, but the idea is the same.

                    Consider a setting with 17 units, one of which is treated in 1975, while the other 16 are untreated. We want to construct a synthetic Basque Country after 1975, that is, to estimate what its GDP per capita would have looked like had the wave of terrorism it experienced not happened.
                    Code:
                    qui {
                    *import delim "https://raw.githubusercontent.com/SucreRouge/synth_control/master/basque.csv", clear
                    u "http://econ.korea.ac.kr/~chirokhan/panelbook/data/basque-clean.dta", clear
                    
                    replace regionname = "Asturias" if regionname=="Principado De Asturias" 
                    
                    loc int_time = 1975
                    
                    //sysuse basque, clear
                    
                    g treated = cond(regionno==17 & year >= `int_time',1,0)
                    
                    tempvar _XnormVar _xXnormVar
                    
                    labvars year gdpcap "Year" "ln(GDP per 100,000)"
                    
                    replace regionname = trim(regexr(regionname,"\(.+\) *",""))
                    
                    egen id = group(regionname), label(regionname) // makes a unique ID
                    
                    order id, b(year)
                    
                    *keep if year >= 1960
                    drop if inlist(id,18) //12
                    
                    keep gdp id year
                    xtset id year, y
                    
                    cls
                    
                    
                    br
                    
                    
                    greshape wide gdp, j(id) i( year)
                    
                    tsset year, y
                    
                    order gdpcap5, a(year)
                    
                    qui cvlasso gdpcap5 gdpcap1-gdpcap17 if year < `int_time', h(3) roll
                    }
                    collect clear
                    
                    collect: predict cf, lopt
                    
                    keep year gdpcap5 cf
                    
                    greshape long gdpcap, i(year) j(id)
                    
                    line gdp cf year, ///
                        lcol(black red) ///
                        legend(pos(5) ring(0) size(medium) region(fcolor(none))) ///
                        xli(1975, lcol(black) lpat(dash) lwidth(thick)) yti("ln(GDP per Capita)", size(medium)) ///
                        ylab(, labsize(medium)) ///
                        xlabel(, noticks labsize(medium)) xti(,size(medium)) ///
                        xsize(4) ysize(4) xti(Year) ///
                        legend(order(1 "Real Basque" 2 "LASSO Basque"))
                    After a little data cleaning and manipulation, we use the LASSO to fit the Basque Country's GDP trend before 1975 and then, using the untreated donors, predict the counterfactual after 1975.

                    All this is background, however. What we're really interested in is predict, the postestimation command after cvlasso: it builds the counterfactual from the exact coefficients the LASSO estimates.


                    Note that here I use the collect prefix in an attempt to collect (obviously!) the coefficients, i.e., the donor units of interest. I've only just started using collect, so I don't yet know how to use it explicitly. But short of directly modifying the predict command, as FernandoRios advised me, the collect prefix may be the only way to do this in this setting. Giorgio Di Stefano
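The counterfactual step described above can be illustrated without the Basque data. Below is a minimal sketch on simulated series. A hypothetical treated unit y0 is built from two of eight random-walk donors. The LASSO is then fitted with rolling one-step-ahead cross-validation on the pre-treatment years only, and predict extends the fitted donor combination past the hypothetical treatment year 40. Summarizing the pre-treatment RMSPE and the average post-treatment gap is a common way to read such a fit. The names, the dates, and the effect of -2 are all assumptions.

```stata
* Requires lassopack: ssc install lassopack
clear
set seed 1975
set obs 60                                  // 60 hypothetical years
generate year = _n
tsset year
forvalues j = 1/8 {
    generate y`j' = sum(rnormal())          // donor series (random walks)
}
generate y0 = 0.6*y1 + 0.4*y2 + rnormal(0, 0.1)   // treated unit
replace  y0 = y0 - 2 if year >= 40                // hypothetical effect from year 40

* Fit on pre-treatment years only, then post results at lambda = lopt
cvlasso y0 y1-y8 if year < 40, rolling h(1)
cvlasso, lopt postresults
predict cf, lopt                            // counterfactual for all years

* Fit quality before treatment and estimated effect after it
generate gap   = y0 - cf
generate sqgap = gap^2
summarize sqgap if year < 40, meanonly
display "Pre-treatment RMSPE: " sqrt(r(mean))
summarize gap if year >= 40, meanonly
display "Average post-treatment gap: " r(mean)
```

A small pre-treatment RMSPE relative to the post-treatment gap is what makes the gap credible as an effect estimate.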



                    • #11
                      I'VE DONE IT!!!! JESUS CHRIST, I'VE DONE IT!!! Here is my code. I'm gonna have to re-tool it to suit my own purposes, but this is how it's done.
                      Code:
                      qui {
                      *import delim "https://raw.githubusercontent.com/SucreRouge/synth_control/master/basque.csv", clear
                      u "http://econ.korea.ac.kr/~chirokhan/panelbook/data/basque-clean.dta", clear
                      
                      replace regionname = "Asturias" if regionname=="Principado De Asturias"
                      
                      loc int_time = 1975
                      
                      //sysuse basque, clear
                      
                      g treated = cond(regionno==17 & year >= `int_time',1,0)
                      
                      labvars year gdpcap "Year" "ln(GDP per 100,000)"
                      
                      replace regionname = trim(regexr(regionname,"\(.+\) *",""))
                      
                      egen id = group(regionname), label(regionname) // makes a unique ID
                      
                      order id, b(year)
                      
                      *keep if year >= 1960
                      drop if inlist(id,18) //12
                      
                      keep gdp id year
                      xtset id year, y
                      
                      cls
                      
                      greshape wide gdp, j(id) i( year)
                      
                      tsset year, y
                      
                      order gdpcap5, a(year)
                      }
                      qui cvlasso gdpcap5 gdpcap1-gdpcap17 if year < `int_time', h(1) roll postres
                      
                      qui cvlasso, lopt postres
                      
                      mat l e(beta)
                      
                      predict cf, lopt
                      FernandoRios Giorgio Di Stefano William Lisowski

                      Apparently, the answer was hidden in the postres (postresults) option the entire time.
                      Last edited by Jared Greathouse; 17 Jun 2022, 07:48.
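After cvlasso, lopt postresults, the selected donors and their weights are in e(beta), as the mat l e(beta) line in the code above shows. Here is a minimal sketch, on simulated series rather than the Basque data, that prints only the nonzero entries, i.e., the donors the LASSO actually uses. We are not certain whether e(beta) is stored as a row or a column vector, so the sketch transposes it to a row when needed. The series names and the data-generating process are hypothetical.

```stata
* Requires lassopack: ssc install lassopack
clear
set seed 17
set obs 40                                  // 40 hypothetical pre-treatment years
generate year = _n
tsset year
forvalues j = 1/8 {
    generate y`j' = sum(rnormal())          // donor series (random walks)
}
generate y0 = 0.6*y1 + 0.4*y2 + rnormal(0, 0.1)   // treated unit

cvlasso y0 y1-y8, rolling h(1)
cvlasso, lopt postresults

* Copy the posted coefficients and make sure they form a row vector
matrix B = e(beta)
if rowsof(B) > 1 {
    matrix B = B'
}
local names : colnames B
local k = colsof(B)

* Print only donors (or the constant) with a nonzero coefficient
forvalues j = 1/`k' {
    if el(B, 1, `j') != 0 {
        local name : word `j' of `names'
        display "`name'" _col(20) %9.4f el(B, 1, `j')
    }
}
```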



                      • #12
                        Thank you so much for your code, Jared. I'm new to this, as it's the first time I'm running a lasso.


                        I have now adapted your code to my case. These are the steps I followed.

                        My data are from 1945-2020.

                        1. I already have a panel id, so I did not create a new one. I'm not sure whether I had to.


                        2.
                        Code:
                         sort id ts
                        xtset id year
                        I used xtset instead of tsset; I think they are the same. When I used tsset, I got a duplicates error.


                        3. I ran the following code, with one-step-ahead rolling cross-validation:

                        Code:
                         cvlasso   growthgdp l.gdp cpi u Output dummy1 dummy2 c.indicator1##c.indicator1, h(1) roll  postres plotcv
                        Here is the output I get:


                        Code:
                        Rolling forecasting cross-validation with 1-step ahead forecasts. Elastic net with alpha=1.
                        Training from-to (validation point): 1980-2003 (2004), 1980-2004 (2005), 1980-2005 (2006), 1980-2006 (2007), 1980-2007 (2008), 1980-2008 (2009), 1980-2009 (2010), 1980-2010 (2011), 1980-2011 ( 2012), 1980-2012 (2013), 1980-2013 (2014).

                        Right after that, I ran:

                        Code:
                        cvlasso, lopt postres
                         
                        mat l e(beta)
                        
                        predict cf, xb lopt noisily       // fitted values of growthgdp at lambda = lopt
                        predict cf_res, residuals lopt    // residuals at lambda = lopt
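On the tsset error in step 2: tsset with only a time variable fails on panel data, because each year then appears once per country. Giving it the panel variable first (tsset id year) is equivalent to xtset id year. Here is a small sketch with made-up values. A genuinely duplicated id-year pair would make both commands fail, and duplicates report finds such rows.

```stata
* Made-up two-country panel
clear
input id year x
1 2000 1.2
1 2001 1.5
2 2000 0.7
2 2001 0.9
end

capture noisily tsset year       // fails: each year appears once per country
tsset id year, yearly            // panel variable first: works
xtset id year, yearly            // equivalent declaration

* If a real id-year pair is duplicated, even xtset fails; find such rows with:
duplicates report id year
duplicates list id year
```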


                        My training sample seems too large: approximately 35 years. In fact, predict returns no values for years before 1980, although, as I said above, my data run from 1945 to 2020. I could accept losing some data, say a 10-12-year initial training sample, even if that takes longer, but not over 30 years. I get no predictions for the first 35 years!
                        My questions are:

                        i) Is the code as I wrote correct?

                        ii) I have a quadratic relation in the code. Does the large training sample have to do with the quadratic relation, and if so, how do I solve this in the code?

                        iii) Does the predict postestimation command produce predictions of the dependent variable (Y), growthgdp in my case? Am I correct on this?

                        I have also run the code twice, once with lopt and once with lse. I see that lse is stricter because it uses a larger lambda. I understand that it is based on the standard error of the cross-validated MSE, but it shrinks the set of selected variables even further, perhaps leading to a more parsimonious model. Which one is generally preferred in the literature?

                        Again, sorry for the naive questions, but this is my first exercise with this kind of lasso technique.



                        Thank you again for your time and work

                        Giorgio!
                        Last edited by Giorgio Di Stefano; 20 Jun 2022, 15:57. Reason: misspeling
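On lopt versus lse: cvlasso stores both lambdas. By construction, the one-standard-error lambda (lse) is at least as large as the MSPE-minimizing one (lopt), so the lse model typically keeps fewer variables. Here is a minimal sketch to compare them, on simulated data with 20 hypothetical candidate predictors, only three of which matter. It says nothing about which rule a given literature prefers.

```stata
* Requires lassopack: ssc install lassopack
clear
set seed 2022
set obs 300
forvalues j = 1/20 {
    generate x`j' = rnormal()            // hypothetical candidate predictors
}
generate y = x1 + 0.5*x2 + 0.25*x3 + rnormal()

cvlasso y x1-x20, nfolds(10)
display "lambda (lopt) = " e(lopt) "   lambda (lse) = " e(lse)

cvlasso, lopt      // fit at the lambda minimizing the CV prediction error
cvlasso, lse       // sparser fit at the larger one-SE lambda
```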



                        • #13
                          Could you show me the error tsset produced, please? I don't buy that your training sample is too large. Why do you think it is?


                          In simulations of my synthetic control command, which relies on cvlasso, I've had pre-intervention periods of 1,500 time periods, with 500 forecast horizons. The minimum-RMSE lambda (lopt) and the LSE lambda (lse) each have their own merits. Ideally, neither should change things very much. I imagine different literatures have different preferences.


                          Do you have missing data in your dataset?



                          • #14
                            Yes, I do have gaps in the data. I attach a screenshot of what Stata shows.

                            The predictions start in 1980 and end in 2014, which is more or less exactly where the data for the quadratic relation end. Could that be because of the missing data? Or should I somehow adjust the number of folds (the so-called K)?
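cvlasso, like most Stata estimation commands, uses only observations with no missing value in the dependent variable or any regressor. A lagged regressor additionally needs the previous year. A late-starting series therefore delays the start of the whole estimation sample. Changing the number of folds only re-partitions the complete observations, so it would not bring those years back. Here is a quick check of where complete observations begin, sketched on made-up values with the thread's variable names.

```stata
* Made-up values; indicator1 starts late for country 1
clear
input id year growthgdp gdp indicator1
1 1978 1.2 100 .
1 1979 0.8 101 .
1 1980 1.5 103 12.2
1 1981 1.1 104 12.5
2 1978 0.4 200 .
2 1979 0.9 202 17.5
2 1980 1.3 205 19.1
end
xtset id year, yearly

* Count missing values per observation among the model variables
generate gdp_l1 = L.gdp
egen nmiss = rowmiss(growthgdp gdp_l1 indicator1)

* First year in which each country has a complete observation
bysort id: egen first_complete = min(cond(nmiss == 0, year, .))
list id first_complete if year == first_complete, noobs
misstable summarize growthgdp gdp_l1 indicator1
```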







                            • #15
                              Try
                              Code:
                              tsset panelvar year,  y
