
  • -tabsplit- is creating empty string variables

    I am using -tabsplit- to try and create dummy variables from a multi-selection string variable, "features". The variable "features" lists up to 11 features separated by commas (e.g. "P2P, Bill payment, Merchant payment, Link to other banking products, Other bulk payment, Airtime top up"). I am using the following code to try and create the dummy variables:

    tabsplit features, sort p(,) gen(feature_)

    The output is the following:

    Parts Freq. Percent Cum.

    P2P 268 17.54 17.54
    Airtime top up 240 15.71 33.25
    Bill payment 219 14.33 47.58
    Merchant payment 166 10.86 58.44
    Cash out 157 10.27 68.72
    Cash in 153 10.01 78.73
    Other bulk payment 128 8.38 87.11
    International remittances 89 5.82 92.93
    Link to other banking products 53 3.47 96.40
    Loan disbursement or repayment 24 1.57 97.97
    G2P 20 1.31 99.28
    Mobile microinsurance 11 0.72 100.00

    Total 1,528 100.00

    11 variables are created: feature_1 - feature_11. Unfortunately, feature_1 - feature_11 are empty string variables. I would have expected feature_1 - feature_11 to be filled with the respective string 'splits'.

    Thanks for your help!
    -Kyle

  • #2
    Have you tried using destring on features before trying to generate the dummies?



    • #3
      tabsplit is from tab_chi (SSC), as you are asked to explain.

      http://www.statalist.org/forums/help#stata

      If you are using user-written commands, explain that and say where they came from: the Stata Journal, SSC, or other archives. This helps (often crucially) in explaining your precise problem, and it alerts readers to commands that may be interesting or useful to them.

      Here are some examples:
      I am using xtreg in Stata 13.1.
      I am using estout from SSC in Stata 13.1.
      You are using generate() as an option, presumably, because you saw that tabsplit supports options of tabulate. But there was a warning in the help:

      tabulate_options are options of tabulate with one variable. The most
      useful in practice is sort. Note that the table is based on a
      temporary dataset which does not remain in memory after tabsplit has
      finished.
      So, you can generate extra variables but there is no point in doing so because they do not remain in memory after tabsplit has finished. Further, there is no point in doing so because they don't correspond to your present data structure. At least, that is what I think should happen but I am only the author of the program. I don't know where you are seeing 11 empty variables.

      I can't relate fully to your working example because the data are not easily copied and pasted into Stata. See again FAQ #12 which does explain how examples should be given.

      But here is how I would approach this:

      Code:
      clear
      input str21 beasts
      "frog"
      "toad,frog"
      "frog,newt,toad"
      "frog,toad,newt,dragon"
      end
      
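      * split on commas into b_1, b_2, ...
      split beasts, p(,) generate(b_)
      
      * collect the distinct words found across all the split variables
      foreach v of var b_* {
          levelsof `v', local(b) clean
          local beasts : list beasts | b
      }
      
      local beasts : list sort beasts
      
      di "`beasts'"
      
      * one indicator per distinct word, labelled with that word
      local j = 1
      gettoken b beasts : beasts
      while "`b'" != "" {
          gen b`j' = strpos(beasts, "`b'") > 0
          label var b`j' "`b'"
          local ++j
          gettoken b beasts : beasts
      }
      
      list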
      
           +-------------------------------------------------------------------------+
           |                beasts    b_1    b_2    b_3      b_4   b1   b2   b3   b4 |
           |-------------------------------------------------------------------------|
        1. |                  frog   frog                           0    1    0    0 |
        2. |             toad,frog   toad   frog                    0    1    0    1 |
        3. |        frog,newt,toad   frog   newt   toad             0    1    1    1 |
        4. | frog,toad,newt,dragon   frog   toad   newt   dragon    1    1    1    1 |
           +-------------------------------------------------------------------------+

      Alberto's suggestion can't help here as it doesn't address the main problem with your approach.

      I haven't concocted an example in which the words have embedded spaces. Note that split is necessarily sensitive to differences in spelling and punctuation, other than leading and trailing spaces, which are trimmed by default.
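
      For parts with embedded spaces, as in the original question, one possible variation is sketched below. It is only a sketch under stated assumptions: the four data lines are made up for illustration, and the names f_ and feature_ are arbitrary. It differs from the code above in two ways. First, levelsof is used without the clean option, so each distinct part stays wrapped in compound double quotes and a part such as "Bill payment" remains a single list element. Second, the indicators are set by exact comparison with the split pieces rather than by strpos(), so a part that happens to be a substring of a longer part is not matched by accident.

      ```stata
      clear
      * hypothetical example data, loosely modelled on the question
      input str60 features
      "P2P, Bill payment"
      "Airtime top up"
      "Bill payment, Cash in, P2P"
      "Cash out, Airtime top up"
      end

      * split on commas; leading and trailing spaces are trimmed by default
      split features, parse(,) generate(f_)

      * collect the distinct parts, keeping embedded spaces intact
      local all
      foreach v of varlist f_* {
          quietly levelsof `v', local(lv)
          local all : list all | lv
      }
      local all : list sort all

      * one 0/1 indicator per distinct part, matched exactly
      local j = 0
      foreach part of local all {
          local ++j
          generate byte feature_`j' = 0
          foreach v of varlist f_* {
              quietly replace feature_`j' = 1 if `v' == `"`part'"'
          }
          label variable feature_`j' `"`part'"'
      }

      drop f_*
      describe feature_*
      list
      ```

      The variable labels shown by describe record which part each feature_ variable stands for. Exact matching still depends on consistent spelling and punctuation in the original strings.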
