  • #16
    OK. The variable you show is a numeric variable with a value label attached. So what you want to split is the value labels.

    Code:
    decode congdist, gen(str_congdist)
    gen state_code = substr(str_congdist, 1, 2)
    gen district_code = substr(str_congdist, 3, .)
    
    //  AND, OPTIONALLY, IF YOU PREFER DISTRICT_CODE TO BE NUMERIC:
    destring district_code, replace

    It is important to remember that in Stata, what you see is not always what you get. Just because something looks like a string in the browser, or in Stata output, doesn't necessarily mean it's a string variable. It can instead be a value-labeled numeric variable or, sometimes, a date or clock variable with a display format that makes it look alphanumeric. It is important to distinguish between these, and the browser is perhaps the simplest place to do it: true string variables display in a reddish brown color, value-labeled numeric variables show in blue, and date variables with a display format show in black. (These are the default color assignments. If you have changed them in your Stata preferences, they may differ from what I have said.)

    String functions only apply to string variables. To apply string functions to things that look like strings but are really value-labeled numeric variables, you first have to -decode- the variable to create a true string, and then apply the string function to the newly created string variable.
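
    Besides the browser, you can check from the command line. The sketch below rebuilds one observation of the example posted in #17 (variable names taken from that -dataex- output) and uses -describe-, -confirm string variable-, and -label list- to show what is really stored. It is one way to check, not the only one.

    ```stata
    * Rebuild one observation of the #17 example
    clear
    input str4 cdnew long congdist
    "AL03" 1
    end
    label define congdist 1 "AL03"
    label values congdist congdist

    * -describe- shows the storage type and any attached value label:
    * cdnew is a str4, congdist is a long carrying the value label congdist
    describe cdnew congdist

    * -confirm string variable- returns a nonzero _rc for a numeric variable
    capture confirm string variable congdist
    display "congdist is a string variable? " cond(_rc == 0, "yes", "no")

    * -label list- shows the code-to-label mapping behind what is displayed
    label list congdist

    * After -decode-, the new variable is a true string, so substr() applies
    decode congdist, generate(str_congdist)
    confirm string variable str_congdist
    list cdnew congdist str_congdist
    ```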



    • #17
      Sorry to trouble people, but I am having trouble splitting a variable into two parts. The variable is state and congressional district, listed in the data set as AL03 for the third district in Alabama, as in:

      . dataex cdnew congdist in 1/5

      ----------------------- copy starting from the next line -----------------------
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str4 cdnew long congdist
      "AL03" 1
      "AL03" 1
      "AL03" 1
      "AL03" 1
      "AL03" 1
      end
      label values congdist congdist
      label def congdist 1 "AL03", modify
      ------------------ copy up to and including the previous line ------------------

      Listed 5 out of 1980 observations

      I have made several attempts with split to get two variables, which in the first case would be "AL" and "03", but can't get it to work. Any help would be appreciated.

      Ric Uslaner



      • #18
        Thank you Clyde. Your suggestion worked wonderfully. I really appreciate your help.

        Ric



        • #19
          #18, addressed to Clyde Schechter, appears to be a reply to #16.

          split needs to be told the parsing characters if you are not splitting on spaces. But here there are no parsing characters to specify.
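
          For contrast, here is a sketch of the case split is designed for. The hyphenated values "AL-03" and "CA-12" and the variable name cd_hyphen are hypothetical, invented for illustration; the data in #17 have no separator, so there is nothing comparable to give -parse()- there.

          ```stata
          * Hypothetical data WITH a separator (not the poster's data)
          clear
          input str6 cd_hyphen
          "AL-03"
          "CA-12"
          end

          * Split on the hyphen; the new variables are part1 and part2,
          * kept as strings so the leading zero in "03" survives
          split cd_hyphen, parse("-") generate(part)
          list
          ```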

          Did you look at the documentation? Your circumstances are explained in the manual entry.

          If your problem is not defined by splitting on separators, you will probably want to use substr() directly. Suppose that you have a string variable, date, containing dates in the form "21011952" so that the last four characters define a year. This string contains no separators. To extract the year, you would use substr(date,-4,4). Again suppose that each woman’s obstetric history over the last 12 months was recorded by a str12 variable containing values such as "nppppppppbnn", where p, b, and n denote months of pregnancy, birth, and nonpregnancy. Once more, there are no separators, so you would use substr() to subdivide the string.
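
          The two examples in that passage can be run as below. The data are made up to match the text, and the names year and month1-month12 are my own choices for illustration; the loop simply applies substr() once per character position.

          ```stata
          * Made-up data matching the two examples in the manual text
          clear
          input str8 date str12 history
          "21011952" "nppppppppbnn"
          end

          * Year = last four characters of a ddmmyyyy string; real() makes it numeric
          gen year = real(substr(date, -4, 4))

          * One single-character variable per month: p, b, or n
          forvalues m = 1/12 {
              gen month`m' = substr(history, `m', 1)
          }
          list year month1-month12
          ```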

          I wrote split before it was absorbed as an official command, and wrote what became that text. More importantly, I did ponder extending split so that it covered that kind of use, but decided that it would complicate the syntax mightily. StataCorp haven't differed on that point.

          So, you need something like

          Code:
          gen state = substr(cdnew, 1, 2) 
          gen district = substr(cdnew, 3, .) 
          where the district code allows for the possibility of descriptors longer than 2 characters.
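
          If the letter part could ever vary in length (here state codes are always two letters, so this is a hypothetical concern), a regular-expression match is one alternative to fixed substr() positions. This is a sketch using Stata's regexm() and regexs(), assuming every value is uppercase letters followed by digits; it is not what is needed for the data in #17, where substr() suffices.

          ```stata
          * Test values: letters followed by digits
          clear
          input str6 cdnew
          "AL03"
          "CA12"
          end

          * regexm() tests the pattern and stores the subexpressions;
          * regexs(n) then returns the nth parenthesized match.
          * Values that do not match are left as empty strings.
          gen state    = regexs(1) if regexm(cdnew, "^([A-Z]+)([0-9]+)$")
          gen district = regexs(2) if regexm(cdnew, "^([A-Z]+)([0-9]+)$")
          list
          ```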



          • #20
            Euslaner, I often find your posts hard to follow. Looking again at Clyde's post #16, it already contains the answer to the question you then asked in #17. Is that what you were saying in #18?

            Statalist is working well if the answers precede the questions.

            Turn and turn about, I didn't read Clyde's answer carefully because I was presuming that you must be asking a new question.
