
  • Issues with Stata lasso prediction

    Hi all,
    I ran into a problem when using predict after Stata's lasso command. This is what I was running:

    (in a loop)
    ...
    predict `var'_p, xb postselection
    ....

    I checked the predicted outcomes and found that some of them are missing. For every model with missing predictions, the variables selected by lasso include an odd "__000001". I suspect this name was generated by mistake by the lasso command and causes the prediction to fail. Here is an example of a model selected by lasso that fails to predict:

    ------------------------------------------
                       | CV_min_genhealthFpr_1
    -------------------+----------------------
      _CLDlngtmhthyn_b |          x
          _dntenryn3_b |          x
          _dntenryn5_b |          x
        _CLDalonedaysb |          x
           _nooxcurrsb |          x
       _PRGpntlenc_bi3 |          x
     _CLDhealthrel_bi4 |          x
      _CLDfreqmilk_bi1 |          x
     _vigphysfreqF_bi3 |          x
       _genhealthF_bi2 |          x
       _genhealthF_bi4 |          x
    _COGdprssdchrF_bi2 |          x
         _COGbumpF_bi2 |          x
             _edu2_bi4 |          x
      _inschrsn11_bi14 |          x
       _inschrsn12_bi5 |          x
      _inschrsn14_bi13 |          x
      _prijobind4_bi18 |          x
      _prijobind4_bi23 |          x
      _prijobind4_bi26 |          x
      _prijobind5_bi17 |          x
        _wtrsrcfar_bi1 |          x
      _latrinetype_bi2 |          x
       _genhealthM_bi3 |          x
     _upchestpainM_bi1 |          x
     _COGtiredrsnM_bi5 |          x
          _notawasb_sq |          x
         _nojwlsvsb_sq |          x
      _soldlandrssb_sq |          x
      _hhmineassc_b_mi |          x
    _othhhsamelat_b_mi |          x
              __000001 |          x
                 _cons |          x
    ------------------------------------------

    My question is: is there any way to get rid of this "__000001"? If not, can I do the postselection prediction manually, using all the selected variables except "__000001"? I am trying to extract the list of selected variables from e(allvars_sel), but it does not seem to work:


    . local var e(allvars_sel)

    . display `var'
    _CLDlngtmhthyn_b _dntenryn3_b _dntenryn5_b _CLDalonedaysb _nooxcurrsb _PRGpntlenc_bi3 _CLDhealthrel_bi4 _CLDfreqmilk_bi1 _vigphysfreqF_bi3 _genhealthF_b
    > i2 _genhealthF_bi4 _COGdprssdchrF_bi2 _COGbumpF_bi2 _edu2_bi4 _inschrsn11_bi14 _inschrsn12_bi5 _inschrsn14_bi13 _prijobind4_bi18 _prijobind4_bi23 _pri
    > jobind4_bi26 _prijobind5_bi17 _wtrsrcfar_bi1 _latrinetype_bi2 _genhealthM_bi3 _upchestpainM_bi1 _COGtiredrsnM_bi5 _notawasb_sq _nojwlsvsb_sq _soldland
    > rssb_sq _hhmineassc_b_mi _othhhsamelat_b_mi __000001

    . local exclude __000001

    .
    . local post_sel_vars: list vars - exclude

    .
    . display `post_sel_vars'


    Any help/insight is very much appreciated!

  • #2
    Code:
    local vars "rssb_sq _hhmineassc_b_mi __000001 othhhsamelat_b_mi __000001"
    local vars= ustrregexra("`vars'", "__000001", "",.)
    di "`vars'"
    Res.:

    Code:
    . di "`vars'"
    rssb_sq _hhmineassc_b_mi  othhhsamelat_b_mi
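The result above still contains a double space (and a trailing space) where the removed names used to be. A minimal sketch of one way to tidy that up, assuming the same illustrative list as in the post: the macro list operator `retokenize` collapses runs of spaces into single spaces.

```stata
* Same illustrative list as in the post above.
local vars "rssb_sq _hhmineassc_b_mi __000001 othhhsamelat_b_mi __000001"

* Remove every occurrence of the unwanted name.
local vars = ustrregexra("`vars'", "__000001", "", .)

* Collapse the leftover multiple spaces and drop leading/trailing blanks.
local vars : list retokenize vars

display "`vars'"
```

Note that the regular expression removes the pattern wherever it occurs, including inside a longer name, so this approach is only safe when no other variable name contains "__000001".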



    • #3
      Hi Andrew,

      Thanks for replying! Unfortunately it does not work.

      Code:
      . local vars e(allvars_sel)
      
      . display `vars'
      _CLDlngtmhthyn_b _dntenryn3_b _dntenryn5_b _CLDalonedaysb _nooxcurrsb _PRGpntlenc_bi3 _CLDhealthrel_
      > bi4 _CLDfreqmilk_bi1 _vigphysfreqF_bi3 _genhealthF_bi2 _genhealthF_bi4 _COGdprssdchrF_bi2 _COGbump
      > F_bi2 _edu2_bi4 _inschrsn11_bi14 _inschrsn12_bi5 _inschrsn14_bi13 _prijobind4_bi18 _prijobind4_bi2
      > 3 _prijobind4_bi26 _prijobind5_bi17 _wtrsrcfar_bi1 _latrinetype_bi2 _genhealthM_bi3 _upchestpainM_
      > bi1 _COGtiredrsnM_bi5 _notawasb_sq _nojwlsvsb_sq _soldlandrssb_sq _hhmineassc_b_mi _othhhsamelat_b
      > _mi __000001
      
      .
      . local vars= ustrregexra("`vars'", "__000001", "",.)
      
      . di "`vars'"
      e(allvars_sel)
      It seems to me that Stata treats e() macros differently from ordinary macros.



      • #4
        I am not sure what you are doing here, because you are just storing the literal text "e(allvars_sel)" in the local:

        Code:
        local vars e(allvars_sel)
        display "`vars'"
        Res.:
        Code:
        . display "`vars'"
        e(allvars_sel)
        Instead, what I think you want is

        Code:
        local vars= e(allvars_sel)
        di "`vars'"
        Of course, nothing stops you from being direct; the extra local command is unnecessary:

        Code:
        local vars= ustrregexra("`e(allvars_sel)'", "__000001", "",.)
        Last edited by Andrew Musau; 05 Aug 2020, 08:31.
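The three ways of filling a local that come up in this exchange can be compared side by side. The following is a minimal sketch, assuming the auto dataset that ships with Stata and using e(depvar) after regress as a stand-in for e(allvars_sel), so that it runs without lasso.

```stata
* Illustrative setup: e(depvar) stands in for e(allvars_sel).
sysuse auto, clear
quietly regress price mpg weight

* No equals sign and no macro quotes: the local holds the literal text.
local a e(depvar)

* Equals sign: the right-hand side is evaluated as an expression first.
local b = e(depvar)

* Macro quotes: the stored result is expanded before the local is defined.
local c `e(depvar)'

display "`a'"
display "`b'"
display "`c'"
```

The first display prints e(depvar), the other two print price. An unquoted `display` of the first local expands to `display e(depvar)`, which evaluates the expression and prints price; that is why the unquoted display in #3 appeared to work even though the local only held the text.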



        • #5
          Hi Andrew,

          This doesn't work in my case.
          Code:
          . local vars= ustrregexra("`e(allvars_sel)'", "__000001", "",.)
          
          . dis `vars'
          -.57239944.79718405-.9133504-.05240109-.45805511-.20602261.57693458-.51704282-.06150315-.85234064-.3
          > 148044-.57254755-.24979313-.20626488-.04987777-.06656814-.18042189-.05920211-.04894314-.02712628-.
          > 01917765-1.2039685-.82530624-.50459737-.34778938-.07192837-.60311311-.03084872-.03102635.03837648.
          > 82467687
          With "=" in the local definition, the display shows values instead of the variable names I want. Nevertheless, I appreciate your reply.



          • #6
            Note that my code has:

            Code:
            di "`vars'"
            In yours, the double quotes somehow disappeared.
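Why the missing quotes matter can be shown with a small sketch, assuming the auto dataset that ships with Stata and two of its variables in place of the lasso list. Without quotes, the macro expands to bare variable names, which `display` evaluates as expressions; with quotes, it prints the text.

```stata
* Illustrative setup: two variables from the auto dataset.
sysuse auto, clear
local vars "price mpg"

* Unquoted: expands to -display price mpg-, so each name is evaluated
* and the values from the first observation are printed back to back.
display `vars'

* Quoted: the contents of the local are printed as text.
display "`vars'"
```

This matches the run of numbers in #5: they are the first-observation values of the selected variables, not the contents of the local.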



            • #7
              Thanks Andrew! This works perfectly!
              Code:
              . local vars= ustrregexra("`e(allvars_sel)'", "__000001", "",.)
              
              . dis "`vars'"
              _CLDlngtmhthyn_b _dntenryn3_b _dntenryn5_b _CLDalonedaysb _nooxcurrsb _PRGpntlenc_bi3 _CLDhealthrel_
              > bi4 _CLDfreqmilk_bi1 _vigphysfreqF_bi3 _genhealthF_bi2 _genhealthF_bi4 _COGdprssdchrF_bi2 _COGbump
              > F_bi2 _edu2_bi4 _inschrsn11_bi14 _inschrsn12_bi5 _inschrsn14_bi13 _prijobind4_bi18 _prijobind4_bi2
              > 3 _prijobind4_bi26 _prijobind5_bi17 _wtrsrcfar_bi1 _latrinetype_bi2 _genhealthM_bi3 _upchestpainM_
              > bi1 _COGtiredrsnM_bi5 _notawasb_sq _nojwlsvsb_sq _soldlandrssb_sq _hhmineassc_b_mi _othhhsamelat_b
              > _mi
              Meanwhile, I just figured out another way to extract the variables from the coefficient matrix:
              Code:
              . local vars : colnames e(b)
              
              .
              . macro list _vars
              _vars:          _CLDlngtmhthyn_b _dntenryn3_b _dntenryn5_b _CLDalonedaysb _nooxcurrsb
                              _PRGpntlenc_bi3 _CLDhealthrel_bi4 _CLDfreqmilk_bi1 _vigphysfreqF_bi3 _genhealthF_bi2
                              _genhealthF_bi4 _COGdprssdchrF_bi2 _COGbumpF_bi2 _edu2_bi4 _inschrsn11_bi14
                              _inschrsn12_bi5 _inschrsn14_bi13 _prijobind4_bi18 _prijobind4_bi23 _prijobind4_bi26
                              _prijobind5_bi17 _wtrsrcfar_bi1 _latrinetype_bi2 _genhealthM_bi3 _upchestpainM_bi1
                              _COGtiredrsnM_bi5 _notawasb_sq _nojwlsvsb_sq _soldlandrssb_sq _hhmineassc_b_mi
                              _othhhsamelat_b_mi __000001 _cons
              
              .
              .
              . local exclude _cons __000001
              
              .
              . local vars : list vars - exclude
              
              .
              . macro list _vars
              _vars:          _CLDlngtmhthyn_b _dntenryn3_b _dntenryn5_b _CLDalonedaysb _nooxcurrsb
                              _PRGpntlenc_bi3 _CLDhealthrel_bi4 _CLDfreqmilk_bi1 _vigphysfreqF_bi3 _genhealthF_bi2
                              _genhealthF_bi4 _COGdprssdchrF_bi2 _COGbumpF_bi2 _edu2_bi4 _inschrsn11_bi14
                              _inschrsn12_bi5 _inschrsn14_bi13 _prijobind4_bi18 _prijobind4_bi23 _prijobind4_bi26
                              _prijobind5_bi17 _wtrsrcfar_bi1 _latrinetype_bi2 _genhealthM_bi3 _upchestpainM_bi1
                              _COGtiredrsnM_bi5 _notawasb_sq _nojwlsvsb_sq _soldlandrssb_sq _hhmineassc_b_mi
                              _othhhsamelat_b_mi
              It might be cruder, but it works, so I am listing it here in case anyone finds it helpful.

              Thanks again! Very much appreciate your help!
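For the manual postselection prediction asked about in #1, a minimal sketch follows. It assumes Stata 16 or later, a linear lasso, and that dropping the unwanted name is appropriate; the auto dataset, the regressor list, the seed, and the name xb_manual are illustrative and not taken from the thread. The idea is to refit the selected variables by OLS on the lasso estimation sample and predict from that fit.

```stata
* Illustrative data and model; replace with your own outcome and regressors.
sysuse auto, clear
lasso linear price mpg weight length turn displacement headroom trunk, rseed(12345)

* Selected variables, minus the unwanted name (a no-op if it is absent).
local sel `e(allvars_sel)'
local exclude __000001
local sel : list sel - exclude

* Refit by OLS on the lasso estimation sample and predict the linear index.
quietly regress price `sel' if e(sample)
predict double xb_manual, xb
```

If a variable is excluded, the refit is no longer the model that lasso selected, so these predictions can differ from what `predict, xb postselection` would have returned. This sketch also does not explain where "__000001" came from.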
