
  • Help with sequencing a string variable

    clear
    input str100 status
    "2222222200000LLLLLLBPPPPPPPPP000000"
    "PPPPPPPPP000000000BPPPPPPPP00000000"
    end

    Hello,

    I have a string variable, status, which represents 36 months of a woman's reproductive status, and includes pregnancies, births, contraceptive use, and non-use.

    Contraceptive use is indicated by a single character identifying the specific method; possible values are: 1 2 3 4 5 6 7 8 9 W N L C E S.

    Pregnancies are indicated through "P" for the months pregnant and "B" for birth, "T" for termination.

    "0" indicates no pregnancy or contraceptive use.

    For example, here is a woman's value (read from right to left).

    2222222200000LLLLLLBPPPPPPPPP000000

    You can see she had 6 months of non-use, followed by 9 months of pregnancy and a birth. After that she used method "L" for 6 months and then did not use for 5 months (0s), and then used method "2" for 8 months.


    I'd like a way to create variable(s) that show the ordered sequences of the transitions a woman experiences; for example:

    for person/observation/row 1: 6 months of "0"; 9 months of "P"; 1 month of "B"; 6 months of "L"; 5 months of "0"; 8 months of "2".
    for person/observation/row 2: 8 months of "0"; 8 months of "P"; 1 month of "B"; 9 months of "0"; 9 months of "P".

    Ideally, I'd have three variables: one with the order number of the sequence (1, 2, 3, 4, etc.); another with the respective code for the sequence (e.g. 0, P, B, 1 2 3 4 5 6 7 8 9 W N L C E S, etc.); and one for the duration of the status in months.


    Thank you so much in advance for your help!

    Dana



  • #2
    Code:
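    * Reverse each string so the earliest month comes first
    replace status = reverse(trim(status))
    * Find the longest string, to know how many month variables to create
    gen length = strlen(status)
    summ length, meanonly
    local max_length = r(max)
    drop length
    
    gen `c(obs_t)' id = _n
    
    * One variable per month: code1 is the earliest month
    forvalues i = 1/`max_length' {
        gen code`i' = substr(status, `i', 1)
    }
    drop status
    
    * One row per person-month; drop empty months past the end of shorter strings
    reshape long code, i(id) j(_j)
    drop if missing(code)
    * seq numbers the spells: it increases whenever the code changes
    by id (_j code), sort: gen seq = sum(code != code[_n-1])
    by id seq, sort: gen duration = _N
    by id seq: keep if _n == 1
    drop _j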
    Note: Your full data set probably already contains an id variable. In that case, don't bother with -gen `c(obs_t)' id = _n-, and just replace all references to id in later commands with the name of your actual id variable.

    Added: if all values of the status variable contain exactly 36 characters, then everything from -gen length = strlen(status)- through -drop length- can be omitted. And in that case in the -forvalues i = 1/`max_length' {- command, replace `max_length' with 36.
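
A possible variation on the same idea, offered only as a sketch: instead of creating one variable per month and reshaping, -expand- can create one row per month directly. It is shown here on the two example values from the question. The variable names id, month, seq, code and duration are illustrative choices, and the sketch has not been run on the full data set.

```stata
clear
input str100 status
"2222222200000LLLLLLBPPPPPPPPP000000"
"PPPPPPPPP000000000BPPPPPPPP00000000"
end

* Illustrative id; use the real id variable if the data set has one
gen long id = _n

* Reverse so that the earliest month is the first character
replace status = reverse(trim(status))

* One row per month: each observation is copied strlen(status) times
expand strlen(status)
bysort id: gen month = _n
gen code = substr(status, month, 1)

* seq increases by 1 whenever the code differs from the previous month
bysort id (month): gen seq = sum(code != code[_n-1])

* Spell length, then keep one row per spell
by id seq, sort: gen duration = _N
by id seq: keep if _n == 1

keep id seq code duration
list, sepby(id)
```

For the two example rows this should reproduce the spells listed in the question: 0/P/B/L/0/2 with durations 6/9/1/6/5/8 for the first, and 0/P/B/0/P with durations 8/8/1/9/9 for the second. Because -expand- uses each string's own length, no maximum length is needed.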
    Last edited by Clyde Schechter; 01 Feb 2023, 16:26.



    • #3
      Thank you so much!
