  • Labelling group patterns in concatenated variable

    Good morning. I would like some help with labelling concatenated variables. I have almost 4000 observations and I am a little confused about how to group the observed patterns. I have put an example below.


    tab vistfrfam_time if deceased==1

    vistfrfam_time Freq. Percent Cum.

    -1-1-1-1-1-1-1-1-1-1 7 0.19 0.19
    -1-1-1-1-1-1-1-1-1. 11 0.29 0.48
    -1-1-1-1-1-1-1-1.. 10 0.27 0.74
    -1-1-1-1-1-1-1... 27 0.72 1.46
    -1-1-1-1-1-1.... 29 0.77 2.23
    -1-1-1-1-1..... 31 0.82 3.05
    -1-1-1-1...... 60 1.59 4.65
    -1-1-1....... 96 2.55 7.19
    -1-1........ 120 3.19 10.38
    -8-9-9-9-9-1.... 1 0.03 10.41
    -81-1....... 1 0.03 10.43
    -82-921-1.... 1 0.03 10.46
    -82121-1.... 1 0.03 10.49
    -9-1........ 27 0.72 11.20
    -9-9-1....... 23 0.61 11.81
    -9-9-9-1...... 23 0.61 12.42
    -9-9-9-9-1..... 17 0.45 12.87
    -9-9-9-9-9-1.... 5 0.13 13.01
    -9-9-9-9-9-9-1... 8 0.21 13.22
    -9-9-9-9-9-9-9-1.. 4 0.11 13.33
    -9-9-9-9-9-9-9-9-1. 5 0.13 13.46
    -9-9-9-9-9-9-9-9-9-1 5 0.13 13.59
    ....-1-1-1-1-1-1 22 0.58 14.18
    ....-1-1-1-1-1. 23 0.61 14.79
    ....-1-1-1-1.. 22 0.58 15.37
    ....-1-1-1... 30 0.80 16.17
    ....-1-1.... 44 1.17 17.33
    ....-8-1.... 1 0.03 17.36
    ....-812-81-1 1 0.03 17.39
    ....-9-1.... 6 0.16 17.55
    ....-9-9-1... 2 0.05 17.60
    ....-9-9-9-1.. 9 0.24 17.84
    ....-9-9-9-9-1. 4 0.11 17.95
    ....-9-9-9-9-9-1 4 0.11 18.05
    ....1-1.... 111 2.95 21.00
    ....1-82-1.. 1 0.03 21.02
    ....1-9-1... 3 0.08 21.10
    ....1-9-9-1.. 3 0.08 21.18
    ....1-9-9-9-1. 2 0.05 21.24
    ....1-9-9-9-9-1 1 0.03 21.26
    ....1-911-1. 1 0.03 21.29
    ....11-1... 94 2.50 23.79
    ....11-9-1.. 5 0.13 23.92
    ....11-9-9-1. 1 0.03 23.94
    ....11-9-91-1 1 0.03 23.97
    ....111-1.. 73 1.94 25.91
    ....111-81-1 1 0.03 25.94
    ....111-9-1. 1 0.03 25.96
    ....111-9-9-1 1 0.03 25.99
    ....1111-1. 64 1.70 27.69
    ....1111-9-1 3 0.08 27.77
    ....11111-1 41 1.09 28.86
    ....11112-1 8 0.21 29.07
    ....1112-1. 9 0.24 29.31
    ....11121-1 10 0.27 29.57
    ....11122-1 2 0.05 29.63
    ....112-1.. 11 0.29 29.92
    ....1121-1. 6 0.16 30.08

  • #2
    I think we need (much) more context. For example is - a state or a separator? More generally, what are the states? Would you regard aaaab aaab aab ab as all ab? And so on.


    • #3
      Oh, thank you. For context, I would like to understand how patterns of relationships with family and friends are associated with quality of life. In the first code here (below), I concatenated my independent variable (family and friends) over a period of ten years.
      egen vistfrfam_time = concat(pa1vistfrfam pa2vistfrfam pa3vistfrfam pa4vistfrfam ///
      pa5vistfrfam pa6vistfrfam pa7vistfrfam pa8vistfrfam ///
      pa9vistfrfam pa10vistfrfam)

      Next, I want to classify these into trajectory groups such as: stays the same (1-1-1-1-1), decreases then grows (1-2-2-2-1-1-1), increases (1-1-2-1-1-1-1-1-1-1-2-1) or varies (1-2-1-2-1-2-1). The groups would be along these lines: high initial levels of social engagement with slight decrease over time; high initial levels of social engagement, moderately decreasing; high initial levels of social engagement with slight increase; medium initial levels of social engagement with increase over time; low initial levels of social engagement; and decreasing levels of social engagement over time. I would like an efficient way to classify these.


      The - separates the time period.

      Thank you



      • #4
        Thanks for the detail, which I don't really follow yet. You have some cases in which there are 10 values of -1, for example, yet - doesn't appear uniformly or consistently. I am broadly familiar with the concat() function of egen (as its original author), but I don't understand what is going on there given the syntax you cite.

        I think we need a data example. Please show the results of

        Code:
        dataex pa*vistfrfam
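
        As an aside on why the hyphens are hard to read: by default concat() joins values with nothing between them, so the minus sign of a code such as -1 can look like a separator. A minimal sketch on made-up data (the variables v1-v3 and the names nosep and withsep are hypothetical, not from the poster's dataset) compares the default with the punct() option, which puts explicit punctuation between values:

        ```stata
        * Toy data, assumed purely for illustration
        clear
        input byte(v1 v2 v3)
         1 -1  .
        -1 -1  .
         1  1  2
        end

        * Default: values are joined with no separator,
        * so the minus signs of negative codes look like separators
        egen nosep = concat(v1 v2 v3)

        * punct() inserts the given punctuation (here a space) between values
        egen withsep = concat(v1 v2 v3), punct(" ")

        list, noobs
        ```

        With a space as punctuation, each wave's value stays visibly distinct, including negative codes and missing values.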


        • #5
          Here it is:

          tab1 pa*vistfrfam if deceased == 1
          mvdecode pa*vistfrfam if deceased == 1, mv(-9 = .a \ -8 = .b \ -7 = .c)
          dataex pa*vistfrfam

          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input double(pa1vistfrfam pa2vistfrfam pa3vistfrfam pa4vistfrfam pa5vistfrfam pa6vistfrfam pa7vistfrfam pa8vistfrfam pa9vistfrfam pa10vistfrfam)
          1  1  1  1  1  1  1  2  1 -1
          1  .  .  .  .  .  .  .  .  .
          1  .  .  .  .  .  .  .  .  .
          1  .  .  .  .  .  .  .  .  .
          1  1  1 -9  .  .  .  .  .  .
          1  1  1  1  1  1  1  1  1  1
          -1 -1  .  .  .  .  .  .  .  .
          1  1  1  1  1  1 -1  .  .  .
          -1 -1 -1  .  .  .  .  .  .  .
          1  1  1 -1  .  .  .  .  .  .
          -1 -1  .  .  .  .  .  .  .  .
          1  .  .  .  .  .  .  .  .  .
          1  1  2  1  1  1  1 -1  .  .
          1  1  1  1  1  1  1  1  1  1
          2  1  1  .  .  .  .  .  .  .
          1  1  1  1  1  1  1  1  1  2
          1  1  1  1  1  1  1  1  1  1
          1  1  1  1  1 -1  .  .  .  .
          1  1  1  .  .  .  .  .  .  .
          1  .  .  .  .  .  .  .  .  .
          2  2  1  1  1  1  2 -1  .  .
          1  1  1  .  .  .  .  .  .  .
          1  1  .  .  .  .  .  .  .  .
          1  1  1  1  2  2  2  2  2 -1
          1  1  1  1  1  1  2  1  1  1
          1  1  .  .  .  .  .  .  .  .
          2  1  1  1  1  1  2  2 -1  .
          1  1  1  1  1  1  .  .  .  .
          1  2  1  1  1  1  2  1  1  2
          1  1 -1  .  .  .  .  .  .  .
          1  1  2  1  1  1  1  1  1  1
          1  1  1  1  1 -1  .  .  .  .
          2  2 -1  .  .  .  .  .  .  .
          .a .a .a -1  .  .  .  .  .  .
          1  1  2  2  1  2  2  1  2  2
          2  2 .a -1  .  .  .  .  .  .
          1  2  1  2  1  2  2  1  2  1
          2  1  2  1  1  1  2  2  .  .
          1  1  1  1  1  1  1  1  1  1
          1  .  .  .  .  .  .  .  .  .
          1  1  1  1  1  1  1  1  1  2
          1  .  .  .  .  .  .  .  .  .
          2  1  2  1  2  1  2  2  2  2
          1  1  1  1  1  1  1  1  1  1
          1  1  1  1  1  1  1  1  2  1
          1  1  1  1  1  1  1  1  1  2
          1  1  1  2  1  2  2  1  2  1
          .a .a .a .a -1  .  .  .  .  .
          1  1  1  1  2  1  1  2 -1  .
          -1 -1 -1 -1 -1  .  .  .  .  .
          1  1  1  1  1 -1  .  .  .  .
          1  1  .  .  .  .  .  .  .  .
          1  1  2  .  .  .  .  .  .  .
          1 -1  .  .  .  .  .  .  .  .
          1  1  1  1  .  .  .  .  .  .
          1  .  .  .  .  .  .  .  .  .
          1  1 -8  .  .  .  .  .  .  .
          2  1  2  1  2  2  2  2  1  2
          1  1  1  1  1  1  1  1  1  1
          2  .  .  .  .  .  .  .  .  .
          1  .  .  .  .  .  .  .  .  .
          1  1  1  1  1  1  1  1  1  1
          1  .  .  .  .  .  .  .  .  .
          1  1  1  2  1  2 -1  .  .  .
          -1 -1 -1 -1 -1  .  .  .  .  .
          1  1  1  1  2 -1  .  .  .  .
          1  1  1  1  1  1  1  1  1  1
          1  1  1  1  2  2  1  2  1  1
          1  1  1 -1  .  .  .  .  .  .
          1  .  .  .  .  .  .  .  .  .
          2  .  .  .  .  .  .  .  .  .
          1  1  1  1  .  .  .  .  .  .
          1  1  1  1  1  1  1  1  1  1
          1  1  1  1  1  1  1  1  1  1
          1  1  1  1  .  .  .  .  .  .
          1  .  .  .  .  .  .  .  .  .
          1  1  1  1  1  1  1  1  1  2
          1  1  1  1  1  1  1  1  1 -1
          .a .a -1  .  .  .  .  .  .  .
          1  2  .  .  .  .  .  .  .  .
          1  1  .  .  .  .  .  .  .  .
          2  .  .  .  .  .  .  .  .  .
          2 -1  .  .  .  .  .  .  .  .
          1  1  1  1  1  1  1  .  .  .
          1  1  .  .  .  .  .  .  .  .
          1  1  1  2  1  1  1  1  1  2
          2  1  1  .  .  .  .  .  .  .
          1  1  1  1  1  1  1  1  1  1
          1  1  1  .  .  .  .  .  .  .
          -1 -1  .  .  .  .  .  .  .  .
          1  1  1  2  1  2  1  1  1  2
          1  1  1  1  1  1  1  .  .  .
          1  1  1  1  1  1 -1  .  .  .
          1  1  1  1  .  .  .  .  .  .
          1 -1  .  .  .  .  .  .  .  .
          1  .  .  .  .  .  .  .  .  .
          2 -1  .  .  .  .  .  .  .  .
          1  2  1  1  2  1  1  1  1  2
          -1 -1 -1 -1  .  .  .  .  .  .
          1  1  .  .  .  .  .  .  .  .
          end
          label values pa1vistfrfam pa1vistfrfam
          label def pa1vistfrfam -1 "-1 Inapplicable", modify
          label def pa1vistfrfam 1 " 1 YES", modify
          label def pa1vistfrfam 2 " 2 NO", modify
          label values pa2vistfrfam pa2vistfrfam
          label def pa2vistfrfam -1 "-1 Inapplicable", modify
          label def pa2vistfrfam 1 "1 YES", modify
          label def pa2vistfrfam 2 "2 NO", modify
          label values pa3vistfrfam pa3vistfrfam
          label def pa3vistfrfam -8 "-8 DK", modify
          label def pa3vistfrfam -1 "-1 Inapplicable", modify
          label def pa3vistfrfam 1 "1 YES", modify
          label def pa3vistfrfam 2 "2 NO", modify
          label values pa4vistfrfam pa4vistfrfam
          label def pa4vistfrfam -9 "-9 Missing", modify
          label def pa4vistfrfam -1 "-1 Inapplicable", modify
          label def pa4vistfrfam 1 "1 YES", modify
          label def pa4vistfrfam 2 "2 NO", modify
          label values pa5vistfrfam pa5vistfrfam
          label def pa5vistfrfam -1 "-1 Inapplicable", modify
          label def pa5vistfrfam 1 "1 YES", modify
          label def pa5vistfrfam 2 "2 NO", modify
          label values pa6vistfrfam pa6vistfrfam
          label def pa6vistfrfam -1 "-1 Inapplicable", modify
          label def pa6vistfrfam 1 "1 YES", modify
          label def pa6vistfrfam 2 "2 NO", modify
          label values pa7vistfrfam pa7vistfrfam
          label def pa7vistfrfam -1 "-1 Inapplicable", modify
          label def pa7vistfrfam 1 "1 YES", modify
          label def pa7vistfrfam 2 "2 NO", modify
          label values pa8vistfrfam pa8vistfrfam
          label def pa8vistfrfam -1 "-1 Inapplicable", modify
          label def pa8vistfrfam 1 "1 YES", modify
          label def pa8vistfrfam 2 "2 NO", modify
          label values pa9vistfrfam pa9vistfrfam
          label def pa9vistfrfam -1 "-1 Inapplicable", modify
          label def pa9vistfrfam 1 "1 YES", modify
          label def pa9vistfrfam 2 "2 NO", modify
          label values pa10vistfrfam pa10vistfrfam
          label def pa10vistfrfam -1 "-1 Inapplicable", modify
          label def pa10vistfrfam 1 "1 YES", modify
          label def pa10vistfrfam 2 "2 NO", modify


          • #6
            Thanks for your data example, which shows that the data can be messy, which no doubt is not your fault. Also, the negative signs are not time separators at all: they are part of codes -1 -8 -9.

            Two-character codes are hard to read alongside one-character codes, so for at least some purposes I would recommend conversion to single characters. That is my one concrete suggestion:

            Code:
            egen all = concat(pa*vistfrfam)
            replace all = subinstr(all, "-1", "N", .)
            replace all = subinstr(all, "-9", "M", .)
            replace all = subinstr(all, "-8", "K", .)
            replace all = subinstr(all, ".a", "A", .)
            compress
            On the last: holding those codes as double is not needed at all.
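
            Once each state is a single character, simple summaries of a sequence become one-liners. The following is only a sketch on made-up data: the variables w1-w4 and the new names n_yes, n_no and pattern are hypothetical, and it assumes that -1 is the only negative code present.

            ```stata
            * Toy data, assumed purely for illustration
            clear
            input byte(w1 w2 w3 w4)
             1  1  1  1
             1  2  2  1
             2  1 -1  .
            -1 -1  .  .
            end

            * Same idea as above: concatenate, then map -1 to a single character
            egen all = concat(w1-w4)
            replace all = subinstr(all, "-1", "N", .)

            * Count a character by comparing lengths before and after deleting it
            gen byte n_yes = length(all) - length(subinstr(all, "1", "", .))
            gen byte n_no  = length(all) - length(subinstr(all, "2", "", .))

            * One integer per distinct pattern, labelled with the pattern itself
            egen pattern = group(all), label

            list, noobs
            tabulate pattern
            ```

            The counting works only because "-1" has already been replaced, so any remaining "1" is a genuine YES.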

            I don't see in your posts any comment on how you intend to tackle the large volume of missing values. Your categories seem to overlap or be vaguely defined (e.g. "varies" as compared with some of the other categories).

            I think I have to stop there, because I don't see a precise problem for which to suggest code, and how to analyse data like this is completely beyond my expertise and experience.


            • #7
              Thank you for the code suggested above. I have a couple more questions. First, I just tried the code and I find that it converts all of the variables in my dataset from double to byte. Is there an advantage to this, given that it changes the original format of the data? Second, the missing values represent those who are deceased in the dataset, so they have no data to report for that variable at that time. Since this is the case, I cannot input the data. Given that this is the data I have, is there code for generating labels for observed patterns after concatenation, even if it is only 2 or more categories?
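
              One possible way to attach labels to patterns is to work from the original numeric variables rather than the concatenated string, counting how often the answer switches between consecutive valid responses. The following is a sketch only, on made-up data: the variables w1-w4, the helper variables, the category definitions and the labels are all assumptions for illustration, not an established classification.

              ```stata
              * Toy data, assumed purely for illustration (1 = YES, 2 = NO)
              clear
              input byte(w1 w2 w3 w4)
              1 1 1 1
              1 2 2 1
              1 1 2 2
              2 2 1 1
              2 1 2 1
              1 . . .
              end

              * Track valid responses, switches, and first/last valid values
              gen byte n_valid   = 0
              gen byte n_switch  = 0
              gen byte first_val = .
              gen byte last_val  = .

              foreach v of varlist w1-w4 {
                  * a switch is a valid response differing from the previous valid one
                  replace n_switch  = n_switch + 1 if inlist(`v', 1, 2) & !missing(last_val) & `v' != last_val
                  replace first_val = `v' if inlist(`v', 1, 2) & missing(first_val)
                  replace last_val  = `v' if inlist(`v', 1, 2)
                  replace n_valid   = n_valid + 1 if inlist(`v', 1, 2)
              }

              * Hypothetical rules; people with fewer than 2 valid waves stay unclassified
              gen byte traj = .
              replace traj = 1 if n_valid >= 2 & n_switch == 0
              replace traj = 2 if n_switch == 1 & first_val == 1 & last_val == 2
              replace traj = 3 if n_switch == 1 & first_val == 2 & last_val == 1
              replace traj = 4 if n_switch >= 2

              label define traj 1 "stable" 2 "YES then NO" 3 "NO then YES" 4 "varies"
              label values traj traj

              tabulate traj, missing
              ```

              Because the rules are mutually exclusive, each person falls into at most one group. Whether these particular cut-offs are substantively sensible is a separate question that the code cannot answer.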


              • #8
                For reference, -1, -7, -8 and -9 are all either missing or inapplicable.
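
                If all four codes are to be treated as missing, one option is to map each to its own extended missing value, which keeps the reasons distinguishable while excluding them from counts and calculations. This is a minimal sketch on made-up data: the variables w1-w3 and n_answered are hypothetical, and the pairing of -1 with .d is an assumption that extends the earlier mvdecode call.

                ```stata
                * Toy data, assumed purely for illustration
                clear
                input byte(w1 w2 w3)
                 1 -9  2
                -1 -1  .
                 2 -8 -7
                end

                * Map each negative code to a distinct extended missing value
                mvdecode w1-w3, mv(-9 = .a \ -8 = .b \ -7 = .c \ -1 = .d)

                * Extended missings count as missing, so this counts real answers only
                egen n_answered = rownonmiss(w1-w3)

                list, noobs
                ```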
