  • characteristic contents too long

    I have two sets of categorical variables, each of which takes on a small number of integer values. I'm trying to form all interactions between members of the two sets using the -xi- command at line 4 of the loop below. Ultimately the interactions will be fed into a lasso regression, but at the moment, just generating them results in a "characteristic contents too long" error.

    I haven't found much on this error. This thread https://www.statalist.org/forums/for...tents-too-long discusses the error in the context of -reshape-, but it seems pretty specific to that command.

    From the sound of the error, and the discussion in the above thread, I thought I had inadvertently tried to construct interactions with an id variable or something else with lots of values. The table at the bottom shows that this is not the case, since all of my categorical variables take on values between 0 and 4.

    Finally, the prefix() option of -xi- allows for a 4-character prefix. The largest prefix that results from the loop is _378, so it looks like I should be ok there.


    Code:
    . *** loop test
    . 
    . qui d,s
    
    . disp r(k)
    100
    
    . 
    . local n=0
    
    . capture drop _*
    
    . unab qtest : $dashqs
    
    . unab chtest : $chcat1
    
    . foreach q of local qtest {
      2.         foreach v of local chtest {
      3.                 local n=`n'+1
      4.                 qui xi i.`q'*i.`v',prefix(_`n') noomit
      5.         }
      6. }
    characteristic contents too long
        The maximum value of the contents is 67,784.
    
    . qui d _*,full
    
    . disp r(k)
    5446
    Code:
    . sum $dashqs
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
       question1 |     16,203    .3972721    .7096666          0          2
       question2 |     16,203    .6209344    .7598174          0          2
       question3 |     16,203    .6661729    .7682009          0          2
       question4 |     16,203    .4974387    .7898037          0          2
       question5 |     16,203    .5583534    .7951362          0          2
    -------------+---------------------------------------------------------
       question6 |     16,203    .8128742     .717489          0          2
       question7 |     16,203    .4954638    .7745655          0          2
       question8 |     16,203    .5668703     .790337          0          2
       question9 |     16,203    .5152132    .7729136          0          2
      question10 |     16,203    .5037956    .7752067          0          2
    -------------+---------------------------------------------------------
      question11 |     16,203    .4039375    .7872436          0          2
      question12 |     16,203    .4059742    .7896028          0          2
      question13 |     16,203    .5898908    .7954562          0          2
      question14 |     16,203    .5867432    .7968132          0          2
      question15 |     16,203    .6465469    .7919159          0          2
    -------------+---------------------------------------------------------
      question16 |     16,203    .4920694    .8082302          0          2
      question17 |     16,203    .5070049    .8101456          0          2
      question18 |     16,203    .5339752    .8102253          0          2
      question19 |     16,203    .4866383    .8143213          0          2
      question20 |     16,203    .4383756    .8070643          0          2
    -------------+---------------------------------------------------------
      question21 |     16,203    .5215701    .8104981          0          2
      question22 |     16,203     .438499    .8071502          0          2
      question23 |     16,203    .5563167    .8025451          0          2
      question24 |     16,203     .760785    .7557867          0          2
      question25 |     16,203    .5861877    .8089423          0          2
    -------------+---------------------------------------------------------
      question26 |     16,203    .5074986    .8086924          0          2
      question27 |     16,203     .826637    .7211523          0          2
    
    . sum $chcat1
    
        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
    atvictimhome |     16,203    .7191261    .5818269          0          2
      expartners |     16,203    .6197001    .6068465          0          2
    familyperp~r |     16,203    .0002469    .0157106          0          1
    gendervictim |     16,203    .1627476    .3691467          0          1
    perpetrato~l |     16,203    .3179041    .4656764          0          1
    -------------+---------------------------------------------------------
    perpetrato~e |     16,203    1.435228    .9764152          0          4
      roleswitch |     16,203    .2380423    .4258983          0          1
     victimdrugs |     16,203    .0209838    .1433343          0          1
    victiminjury |     16,203      .10424     .305581          0          1
    victim_cat~e |     16,203    1.284762    .9585312          0          4
    -------------+---------------------------------------------------------
    victimalco~l |     16,203    .2095908    .4070292          0          1
    perpetrato~s |     16,203    .0767142    .2661456          0          1
    perpetrato~y |     16,203    .0271555    .1625413          0          1
    perpetrato~e |     16,203    1.435228    .9764152          0          4
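
    One way to check whether a dataset characteristic is what overflows is to list the dataset-level characteristics and their lengths once the loop stops. This is a minimal sketch under an assumption the thread does not confirm: that the repeated -xi- calls keep appending to something stored under _dta.

    Code:
    * collect the names of all dataset-level characteristics
    local chars : char _dta[]
    foreach c of local chars {
        * copy the contents into a macro and measure its length
        local contents : char _dta[`c']
        local len : length local contents
        display "`c': `len' characters"
    }

    If one entry sits close to the 67,784 limit quoted in the error message, that would point to the accumulated bookkeeping rather than to any single interaction.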

  • #2
    If you are using the current version of Stata, there are native commands for doing lasso, and, of course, they support factor variable notation. So there is no reason to drag out the archaic -xi- command here. Just

    Code:
    local interactions
    foreach q of local qtest {
        foreach v of local chtest {
            local interactions `interactions' ibn.`q'##ibn.`v'
        }
    }
    will generate the main and interaction effects of all of these combinations in the local macro `interactions' which you can then list in the variable list for the appropriate lasso command.
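
    A minimal sketch of that last step in Stata 16, assuming a continuous outcome stored in a variable called outcome (a hypothetical name; the thread never names the dependent variable):

    Code:
    * outcome is a placeholder for the actual dependent variable
    lasso linear outcome `interactions'

    The local macro has to be built in the same do-file run as the -lasso- call, because locals do not persist once the do-file finishes.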

    The use of -xi- should be reserved for those increasingly rare situations where a Stata command will not support factor variable notation. Such commands are, by now, few and far between, and most have been superseded by more modern commands that do accommodate factor-variable notation and accomplish the same purposes.

    If you are using an earlier version of Stata, you are asked to say so in your post and specify the version. Prior to version 16, lasso required a user-written command which, I believe, did not support factor-variable notation.

    • #3
      Thanks for your response. I am using Stata 16; sorry for omitting that. Your code runs immensely faster than mine, which is great. It doesn't quite solve my problem, though, because it doesn't leave any variables behind. I need the actual interaction variables because many of them are quite rare, and I need to drop the rare ones before I feed the remainder to the lasso. Is there a way to actually leave the interaction terms in the data area? If so, I've missed it in the documentation. Finally, when I tried to use the second loop below to filter out the rare interactions, it threw an error. I'd be grateful for any suggestions on that.

      Code:
      global interactions
      
      . foreach q of local qtest {
        2.     foreach v of local chtest {
        3.         global interactions $interactions ibn.`q'##ibn.`v'
        4.     }
        5. }
      
      . 
      . sum $interactions
      
          Variable |        Obs        Mean    Std. Dev.       Min        Max
      -------------+---------------------------------------------------------
         question1 |
                0  |     16,203    .7348022    .4414521          0          1
                1  |     16,203    .1331235    .3397186          0          1
      .
      .
      .
              2 0  |     16,203    .1841017    .3875791          0          1
              2 1  |     16,203    .0042585    .0651199          0          1
      
      . 
      . foreach v of global interactions {
        2.         qui sum `v'
        3.         if r(mean)<.05 | r(mean)>.95 {
        4.                 drop `v'
        5.         }
        6. }
      factor-variable and time-series operators not allowed
      r(101);
      
      end of do-file

      • #4
        Well, I see this is an unusual circumstance. So the following code is more or less a bare-bones emulation of -xi- that should do the trick:

        Code:
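        foreach q of local qtest {
            foreach v of local chtest {
                // expand the interaction into its individual level-by-level terms
                fvexpand ibn.`q'#ibn.`v'
                foreach term in `r(varlist)' {
                    // turn the term into a legal variable name
                    local w: subinstr local term "." "_", all
                    local w: subinstr local w "#" "X", all
                    gen `=strtoname(`"ix_`w'"')' = `term'
                }
            }
        }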
        It doesn't use characteristics, so you won't encounter that problem you had before.
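
        Once the interactions exist as ordinary variables, the filter attempted in #3 can be applied to them directly, since -drop- works on real variables but not on factor-variable terms. A sketch, assuming the ix_ prefix from the code above and the .05/.95 cutoffs from #3:

        Code:
        foreach x of varlist ix_* {
            * the mean of a 0/1 indicator is the share of observations equal to 1
            quietly summarize `x', meanonly
            if r(mean) < .05 | r(mean) > .95 {
                drop `x'
            }
        }

        The variables left under ix_* can then be listed in the lasso command.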

        • #5
          Yeah, you never know how far into the details you should go in the first post. Thanks a lot for your help.

          • #6
            FWIW, I had to add the "capture drop" below to make this work. Thanks again.

            Code:
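            foreach q of local qtest {
                foreach v of local chtest {
                    // expand the interaction into its individual level-by-level terms
                    fvexpand ibn.`q'#ibn.`v'
                    foreach term in `r(varlist)' {
                        // turn the term into a legal variable name
                        local w: subinstr local term "." "_", all
                        local w: subinstr local w "#" "X", all
                        // remove any existing variable with the same name first
                        capture drop `=strtoname(`"ix_`w'"')'
                        gen `=strtoname(`"ix_`w'"')' = `term'
                    }
                }
            }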
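
            A caution on the "capture drop": it also silently replaces a variable whenever two different terms map to the same name. That can happen because strtoname() truncates its result to 32 characters, or because the same variable appears twice in the list. Whether either applies here is an assumption, not something the thread establishes. A sketch that reports such clashes instead of overwriting:

            Code:
            foreach q of local qtest {
                foreach v of local chtest {
                    fvexpand ibn.`q'#ibn.`v'
                    foreach term in `r(varlist)' {
                        local w: subinstr local term "." "_", all
                        local w: subinstr local w "#" "X", all
                        local name = strtoname(`"ix_`w'"')
                        * confirm returns a nonzero code if the name is already taken
                        capture confirm new variable `name'
                        if _rc {
                            display as error "`name' already exists (term `term')"
                        }
                        else {
                            gen `name' = `term'
                        }
                    }
                }
            }

            Run on a dataset with no leftover ix_* variables, any message it prints marks a genuine name collision.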
