  • Separating string variable to multiple variables

    Hi,

    I have a string variable with a varying number of characters, in which each successive group of 4 characters contains two pieces of information:

    01T121T431L4

    In this example, the first four characters give a subject ("01") and a grade ("T1"), the next four give a second pair ("21", "T4"), and so on. There are 90 levels of subject and 10 levels of grade. I want to create 90 subject variables, each with 10 levels, one for each possible grade.

    The first step I've taken is to separate out the string variable to 10 string variables:

    gen subject1 = substr(subjectgrade, 1, 4)
    gen subject2 = substr(subjectgrade, 5, 4)
    gen subject3 = substr(subjectgrade, 9, 4)
    gen subject4 = substr(subjectgrade, 13, 4)
    gen subject5 = substr(subjectgrade, 17, 4)
    gen subject6 = substr(subjectgrade, 21, 4)
    gen subject7 = substr(subjectgrade, 25, 4)
    gen subject8 = substr(subjectgrade, 29, 4)
    gen subject9 = substr(subjectgrade, 33, 4)
    gen subject10 = substr(subjectgrade, 37, 4)

    I'm unsure of the best next step, but I think I need to create 90 variables, one for each possible subject 1-90, and then recode these strings into the level of each variable. For example, the string "01T1" would go into a new variable Subject01 with the value "T1". I've tried using strpos to capture the subject, but without success.

    Any advice or recommendations would be most welcome. I hope it's reasonably clear what I'm trying to do, but let me know if not.

    Thanks.





  • #2
    The best advice will depend on a real or at least realistic data example and some idea of what you want to do. For example, being familiar with your own data probably leads you not to spell out details that are utterly obvious to you, including (1) that these are data on people (?), so you have an identifier variable (?), (2) whether they were recorded at the same time or at different times, and whether that matters, and so on.

    In broad terms lots and lots of variables in so-called wide layout are often much more awkward to deal with than you hope but there can be exceptions.



    • #3
      Hi Nick,

      Thanks.

      1. Yes
      2. Yes
      3. Same time, cross-sectional

      Student grade data for a set of exams. There are 90 possible exam subjects and 10 possible grades. Students typically sit between 5 and 10 exams, so, with 4 characters representing each exam and its grade, the string will have between 20 and 40 characters.
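
      The description above implies that every string length should be a multiple of 4, and that the number of exams is the length divided by 4. A minimal sketch of that check follows, assuming the variable is called subjectgrade as in the first post; the 20-character value is made up for illustration and is not real data.

      ```stata
      * Sketch only: one made-up observation with 5 exams (20 characters)
      clear
      input str40 subjectgrade
      "01T121T431L445T152L4"
      end

      * length of each string in characters
      gen nchar = strlen(subjectgrade)

      * stop with an error if any string is not a whole number of 4-character blocks
      assert mod(nchar, 4) == 0

      * number of exams sat by each student
      gen nexams = nchar / 4
      summarize nexams
      ```

      If the assert fails on real data, some strings are presumably truncated or padded, and it would be worth listing those observations before splitting anything.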





      • #4
        On what you've told us so far I would go for a long layout. This is schematic code, as you've yet to give a full real or realistic example.

        Code:
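        clear
        input str4 id str12 whatever 
        "ABCD" "01T121T431L4"
        end 
        
        * longest string determines how many 4-character blocks to extract
        gen length = strlen(whatever)
        su length, meanonly 
        
        local J = r(max)/4 
        
        * block j starts at character 1 + 4*(j - 1): 2 characters of subject, then 2 of grade
        forval j = 1/`J' {
            gen subject`j' = substr(whatever, 1 + 4*(`j' - 1), 2)
            gen grade`j' = substr(whatever, 3 + 4*(`j' - 1), 2) 
        }
        
        * one row per student and exam
        reshape long subject grade, i(id) j(which)
        
        list, sepby(id)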
        
             +--------------------------------------------------------+
             |   id   which       whatever   length   subject   grade |
             |--------------------------------------------------------|
          1. | ABCD       1   01T121T431L4       12        01      T1 |
          2. | ABCD       2   01T121T431L4       12        21      T4 |
          3. | ABCD       3   01T121T431L4       12        31      L4 |
             +--------------------------------------------------------+
        This structure is compatible with comparisons between subjects and between individuals. As presumably no one studies all 90 subjects, a wide layout would consist mostly of missing values. Handling 180 different variables is a nightmare even to this fairly experienced Stata user (32 years, almost), but there are people here with several years more!
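
        As a hedged illustration of working with the long layout, the sketch below counts exams per student, cross-tabulates subjects against grades, and, if a wide layout is wanted after all, reshapes to one grade variable per subject. It assumes the variables id, which, subject and grade produced by the code above; the second student "EFGH" is made up for illustration.

        ```stata
        * Sketch only: long layout as produced above, plus one made-up student
        clear
        input str4 id byte which str2 subject str2 grade
        "ABCD" 1 "01" "T1"
        "ABCD" 2 "21" "T4"
        "ABCD" 3 "31" "L4"
        "EFGH" 1 "01" "L4"
        "EFGH" 2 "31" "T1"
        end

        * students with fewer exams than the maximum have empty slots after reshape long
        drop if missing(subject)

        * number of exams sat by each student
        bysort id: gen nexams = _N

        * comparison between subjects: grades by subject
        tabulate subject grade

        * optional: one grade variable per subject (grade01, grade21, ...)
        * which varies within id, so it must go before reshaping wide
        drop which nexams
        reshape wide grade, i(id) j(subject) string
        list
        ```

        The reshape wide step requires each subject to appear at most once per student; with the real data that is an assumption worth checking first, for example with isid id subject.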



        • #5
          Thank you Nick, that's a really helpful schematic, and I have got it working. Yes it's a bit of a mess to work with!
