Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Generating variable to track individual's history over periods of time

    Dear all, hopefully the dataex example works correctly to illustrate the dummy data set. I have pasted both the dataex code and have also attached an image of the dummy dataset for a quick view for better understanding of the data.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str2 ID str5 Month float HasMembership
    "A" "APRIL" 0
    "B" "APRIL" 1
    "C" "APRIL" 0
    "D" "APRIL" 1
    "E" "APRIL" 0
    "F" "APRIL" 1
    "A" "MAY"   1
    "B" "MAY"   1
    "C" "MAY"   1
    "D" "MAY"   1
    "E" "MAY"   1
    "F" "MAY"   1
    "A" "JUNE"  1
    "B" "JUNE"  0
    "C" "JUNE"  1
    "D" "JUNE"  0
    "E" "JUNE"  1
    "F" "JUNE"  0
    end
    How do I generate a variable AprilToMay to indicate the change in an individual possessing membership between April to May? For example, individual A didn't have membership in April and now has membership in May, so the variable AprilToMay should be "Gained membership" for individual A. Similarly, individual B already had membership in April and also has membership in May, so AprilToMay should be "Retained membership" for individual B.

    Similarly I'd use the same code to generate a variable MayToJune to describe individuals who retained membership and who lost their membership between May and June.

    This is a dummy dataset, the actual dataset has a lot more time periods and a lot more individuals, and a lot more parameters than membership.

    Thanks in advance!
    Attached Files

  • #2
    A good answer to your question will depend on the following:

    1) Seeing the way your actual time variable is coded. Converting it to a Stata date variable or at least some variable that puts dates into time order will be necessary, but doing this for your example data will not likely be helpful in solving the problem in your actual data.

    2) Knowing more completely the kind of coding scheme you want for transitions in individuals' statuses. We don't, for example, know how you would handle someone with observations for April, May, and June. In your actual data, which I suppose might have 10 or more time periods, this issue would be even more complex. So, we need to know the possible values you want for your transition variable, with a precise definition for each.

    Thanks for providing a -dataex- example, but note that your screenshot doesn't provide any useful information beyond it, so per the StataList FAQ, you'll want to avoid graphical screenshots of data or code in any further postings.

    Comment


    • #3
      Here is one approach:

      Code:
      gen Month_Int = .
      forv x = 1/12 {
          replace Month_Int = `x' if Month == upper("`:word `x' of `c(Months)''")
      }
      
      local list Gained Retained Lost  "Retained Non"
      foreach var of varlist HasMembership{
          forv m = 2/12{
              bysort ID: egen `var'_`m' = max(cond(Month_Int==`m',`var',.))
              bysort ID: egen `var'_`=`m'-1' = max(cond(Month_Int==`=`m'-1',`var',.))
              local conditions `""`var'_`m'==1 & `var'_`=`m'-1'==0" "`var'_`m'==1 & `var'_`=`m'-1'==1" "`var'_`m'==0 & `var'_`=`m'-1'==1" "`var'_`m'==0 & `var'_`=`m'-1'==0""'
              gen `:word `=`m'-1' of `c(Mons)''_to_`:word `m' of `c(Mons)''_`var' = .
              forv x = 1/4{
                   bysort ID (Month_Int): replace  `:word `=`m'-1' of `c(Mons)''_to_`:word `m' of `c(Mons)''_`var' = `x' if `:word `x' of `conditions''
                   if `m'==2 label define `var'_lab `x' "`:word `x' of `list'' `var'", add
              }
              label values  `:word `=`m'-1' of `c(Mons)''_to_`:word `m' of `c(Mons)''_`var' `var'_lab `x'
              drop `var'_*
          }
      }
      This results in:

      Code:
           +-----------------------------------------------------------------------------+
           | ID   Month   HasMem~p     Apr_to_May_HasMembership   May_to_Jun_HasMember~p |
           |-----------------------------------------------------------------------------|
        1. |  A   APRIL          0   Retained Non HasMembership     Gained HasMembership |
        2. |  A     MAY          0   Retained Non HasMembership     Gained HasMembership |
        3. |  A    JUNE          1   Retained Non HasMembership     Gained HasMembership |
           |-----------------------------------------------------------------------------|
        4. |  B   APRIL          1       Retained HasMembership       Lost HasMembership |
        5. |  B     MAY          1       Retained HasMembership       Lost HasMembership |
        6. |  B    JUNE          0       Retained HasMembership       Lost HasMembership |
           |-----------------------------------------------------------------------------|
        7. |  C   APRIL          0         Gained HasMembership   Retained HasMembership |
        8. |  C     MAY          1         Gained HasMembership   Retained HasMembership |
        9. |  C    JUNE          1         Gained HasMembership   Retained HasMembership |
           |-----------------------------------------------------------------------------|
       10. |  D   APRIL          1       Retained HasMembership       Lost HasMembership |
       11. |  D     MAY          1       Retained HasMembership       Lost HasMembership |
       12. |  D    JUNE          0       Retained HasMembership       Lost HasMembership |
           |-----------------------------------------------------------------------------|
       13. |  E   APRIL          0         Gained HasMembership   Retained HasMembership |
       14. |  E     MAY          1         Gained HasMembership   Retained HasMembership |
       15. |  E    JUNE          1         Gained HasMembership   Retained HasMembership |
           |-----------------------------------------------------------------------------|
       16. |  F   APRIL          1       Retained HasMembership       Lost HasMembership |
       17. |  F     MAY          1       Retained HasMembership       Lost HasMembership |
       18. |  F    JUNE          0       Retained HasMembership       Lost HasMembership |
           +-----------------------------------------------------------------------------+
      I only show the April/May variables here, but variables are generated for all month-to-month transitions. Where there is no such transition in the data, the variables are missing. The transition variables are numeric with a label applied, which is coded as follows:

      Code:
                 1 Gained HasMembership
                 2 Retained HasMembership
                 3 Lost HasMembership
                 4 Retained Non HasMembership
      You mentioned that there are more variables than just HasMembership. To account for those, simply add them to the list defined in line 7 ("foreach var of varlist HasMembership{").
      Last edited by Ali Atia; 21 Jun 2022, 08:37.

      Comment


      • #4
        Originally posted by Mike Lacy View Post
        A good answer to your question will depend on the following:

        1) Seeing the way your actual time variable is coded. Converting it to a Stata date variable or at least some variable that puts dates into time order will be necessary, but doing this for your example data will not likely be helpful in solving the problem in your actual data.

        2) Knowing more completely the kind of coding scheme you want for transitions in individuals' statuses. We don't, for example, know how you would handle someone with observations for April, May, and June. In your actual data, which I suppose might have 10 or more time periods, this issue would be even more complex. So, we need to know the possible values you want for your transition variable, with a precise definition for each.

        Thanks for providing a -dataex- example, but note that your screenshot doesn't provide any useful information beyond it, so per the StataList FAQ, you'll want to avoid graphical screenshots of data or code in any further postings.
        Thank you Mike, I'll avoid any graphical attachments hereon.

        I'll post the updated dataex here, which converts the time variable from a string variable Month to a numeric variable just describing the time period (1, 2 or 3).
        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input str2 ID byte month float HasMembership
        "A" 1 0
        "B" 1 1
        "C" 1 0
        "D" 1 1
        "E" 1 0
        "F" 1 1
        "A" 2 1
        "B" 2 1
        "C" 2 1
        "D" 2 1
        "E" 2 1
        "F" 2 1
        "A" 3 1
        "B" 3 0
        "C" 3 1
        "D" 3 0
        "E" 3 1
        "F" 3 0
        end
        The main treatment is that individuals have different social identities (gender, religion, migration status) and the overall aim is to see what kind of individuals have retained membership, lost membership, temporarily had it etc etc.

        The final transition variable can be numeric and labeled (eg: 1 for gained, 2 for retained, 3 for lost), if that makes coding simpler instead of having a string variable.

        The overall aim is for example, for 10 periods, to see if someone was denied membership for 9 periods and finally granted membership in the last period - to get an indication of the social treatment. If I can generate the transition variables from period 1 to period 2, and from period 2 to period 3 and so on, I can generate the overall membership history.

        Thanks in advance, and I hope I was able to clear the confusion over how the transition variable will be treated

        Comment


        • #5
          Originally posted by Ali Atia View Post
          Here is one approach:

          Code:
          gen Month_Int = .
          forv x = 1/12 {
          replace Month_Int = `x' if Month == upper("`:word `x' of `c(Months)''")
          }
          
          local list Gained Retained Lost "Retained Non"
          foreach var of varlist HasMembership{
          forv m = 2/12{
          bysort ID: egen `var'_`m' = max(cond(Month_Int==`m',`var',.))
          bysort ID: egen `var'_`=`m'-1' = max(cond(Month_Int==`=`m'-1',`var',.))
          local conditions `""`var'_`m'==1 & `var'_`=`m'-1'==0" "`var'_`m'==1 & `var'_`=`m'-1'==1" "`var'_`m'==0 & `var'_`=`m'-1'==1" "`var'_`m'==0 & `var'_`=`m'-1'==0""'
          gen `:word `=`m'-1' of `c(Mons)''_to_`:word `m' of `c(Mons)''_`var' = .
          forv x = 1/4{
          bysort ID (Month_Int): replace `:word `=`m'-1' of `c(Mons)''_to_`:word `m' of `c(Mons)''_`var' = `x' if `:word `x' of `conditions''
          if `m'==2 label define `var'_lab `x' "`:word `x' of `list'' `var'", add
          }
          label values `:word `=`m'-1' of `c(Mons)''_to_`:word `m' of `c(Mons)''_`var' `var'_lab `x'
          drop `var'_*
          }
          }
          This results in:

          Code:
          +-----------------------------------------------------------------------------+
          | ID Month HasMem~p Apr_to_May_HasMembership May_to_Jun_HasMember~p |
          |-----------------------------------------------------------------------------|
          1. | A APRIL 0 Retained Non HasMembership Gained HasMembership |
          2. | A MAY 0 Retained Non HasMembership Gained HasMembership |
          3. | A JUNE 1 Retained Non HasMembership Gained HasMembership |
          |-----------------------------------------------------------------------------|
          4. | B APRIL 1 Retained HasMembership Lost HasMembership |
          5. | B MAY 1 Retained HasMembership Lost HasMembership |
          6. | B JUNE 0 Retained HasMembership Lost HasMembership |
          |-----------------------------------------------------------------------------|
          7. | C APRIL 0 Gained HasMembership Retained HasMembership |
          8. | C MAY 1 Gained HasMembership Retained HasMembership |
          9. | C JUNE 1 Gained HasMembership Retained HasMembership |
          |-----------------------------------------------------------------------------|
          10. | D APRIL 1 Retained HasMembership Lost HasMembership |
          11. | D MAY 1 Retained HasMembership Lost HasMembership |
          12. | D JUNE 0 Retained HasMembership Lost HasMembership |
          |-----------------------------------------------------------------------------|
          13. | E APRIL 0 Gained HasMembership Retained HasMembership |
          14. | E MAY 1 Gained HasMembership Retained HasMembership |
          15. | E JUNE 1 Gained HasMembership Retained HasMembership |
          |-----------------------------------------------------------------------------|
          16. | F APRIL 1 Retained HasMembership Lost HasMembership |
          17. | F MAY 1 Retained HasMembership Lost HasMembership |
          18. | F JUNE 0 Retained HasMembership Lost HasMembership |
          +-----------------------------------------------------------------------------+
          I only show the April/May variables here, but variables are generated for all month-to-month transitions. Where there is no such transition in the data, the variables are missing. The transition variables are numeric with a label applied, which is coded as follows:

          Code:
          1 Gained HasMembership
          2 Retained HasMembership
          3 Lost HasMembership
          4 Retained Non HasMembership
          You mentioned that there are more variables than just HasMembership. To account for those, simply add them to the list defined in line 7 ("foreach var of varlist HasMembership{").
          Thank you so much Ali, this was very helpful!
          Any idea how to modify the code to represent months over a couple of years, instead of months in just one year? Or to represent time periods such as time period 1, 2, 3 which are not necessarily in a month format?

          For example in "forv m = 2/12" I have to replace 12 with the exact number of periods right?

          Comment


          • #6
            For several years, would you want a variable generated for each month-to-month transition within each year (e.g., Apr_To_May_2021 as well as Apr_To_May_2022_Membership)? Also, your two requests are inconsistent -- do you have a month-year variable or a numeric non-monthly variable?

            More generally, it would be helpful if you shared a data example which shows the specific time variable you're working with to help myself and others get a clearer and more accurate understanding of the problem. That will enable more efficient modification of the code.
            Last edited by Ali Atia; 21 Jun 2022, 08:49.

            Comment


            • #7
              Originally posted by Ali Atia View Post
              For several years, would you want a variable generated for each month-to-month transition within each year (e.g., Apr_To_May_2021 as well as Apr_To_May_2022_Membership)? Also, your two requests are inconsistent -- do you have a month-year variable or a numeric non-monthly variable?

              More generally, it would be helpful if you shared a data example which shows the specific time variable you're working with to help myself and others get a clearer and more accurate understanding of the problem. That will enable more efficient modification of the code.
              Yes, actually I'm tracking changes in membership (and other parameters) over different phases post the onset of Covid. So the different phases of time would be numbered like 1, 2, 3 etc. Each time period represents when there was a lockdown, when there was a lift in lockdown, etc.

              I am posting an updated dataex, hopefully the coding of the time variable is more clear, I've changed it from a string variable Month to a numeric variable indicating phase or time period. There will overall be approximately 10 phases (since there have been multiple lockdowns and lifting of lockdowns since Covid started)
              Code:
               * Example generated by -dataex-. To install: ssc install dataex clear input str2 ID byte phase float HasMembership "A" 1 0 "B" 1 1 "C" 1 0 "D" 1 1 "E" 1 0 "F" 1 1 "A" 2 1 "B" 2 1 "C" 2 1 "D" 2 1 "E" 2 1 "F" 2 1 "A" 3 1 "B" 3 0 "C" 3 1 "D" 3 0 "E" 3 1 "F" 3 0 end
              Last edited by Sohini Mazumder; 21 Jun 2022, 08:58.

              Comment


              • #8
                You've signalled that this is a dummy dataset but it's all we have as a basis to suggest code.

                A string variable for Month is of limited use and to work with previous and following data time series operators are a good idea, which alone implies that you need a numeric identifier.

                Here is some technique after which you can use time series operators.

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input str2 ID str5 Month float HasMembership
                "A" "APRIL" 0
                "B" "APRIL" 1
                "C" "APRIL" 0
                "D" "APRIL" 1
                "E" "APRIL" 0
                "F" "APRIL" 1
                "A" "MAY"   1
                "B" "MAY"   1
                "C" "MAY"   1
                "D" "MAY"   1
                "E" "MAY"   1
                "F" "MAY"   1
                "A" "JUNE"  1
                "B" "JUNE"  0
                "C" "JUNE"  1
                "D" "JUNE"  0
                "E" "JUNE"  1
                "F" "JUNE"  0
                end
                
                gen month = .
                tokenize "`c(Months)'"
                
                forval j = 1/12 {
                    replace month = `j' if proper(Month) == "``j''"
                }
                
                encode ID, generate(id)
                
                order id
                
                list , sepby(Month)
                
                    +------------------------------------+
                     | id   ID   Month   HasMem~p   month |
                     |------------------------------------|
                  1. |  A    A   APRIL          0       4 |
                  2. |  B    B   APRIL          1       4 |
                  3. |  C    C   APRIL          0       4 |
                  4. |  D    D   APRIL          1       4 |
                  5. |  E    E   APRIL          0       4 |
                  6. |  F    F   APRIL          1       4 |
                     |------------------------------------|
                  7. |  A    A     MAY          1       5 |
                  8. |  B    B     MAY          1       5 |
                  9. |  C    C     MAY          1       5 |
                 10. |  D    D     MAY          1       5 |
                 11. |  E    E     MAY          1       5 |
                 12. |  F    F     MAY          1       5 |
                     |------------------------------------|
                 13. |  A    A    JUNE          1       6 |
                 14. |  B    B    JUNE          0       6 |
                 15. |  C    C    JUNE          1       6 |
                 16. |  D    D    JUNE          0       6 |
                 17. |  E    E    JUNE          1       6 |
                 18. |  F    F    JUNE          0       6 |
                     +------------------------------------+
                
                tsset id month
                EDIT: This was a long time in drafting as I was working on other things and other replies were not visible while I was doing that.
                Last edited by Nick Cox; 21 Jun 2022, 09:01.

                Comment


                • #9
                  Originally posted by Nick Cox View Post
                  You've signalled that this is a dummy dataset but it's all we have as a basis to suggest code.

                  A string variable for Month is of limited use and to work with previous and following data time series operators are a good idea, which alone implies that you need a numeric identifier.

                  Here is some technique after which you can use time series operators.

                  Code:
                  * Example generated by -dataex-. To install: ssc install dataex
                  clear
                  input str2 ID str5 Month float HasMembership
                  "A" "APRIL" 0
                  "B" "APRIL" 1
                  "C" "APRIL" 0
                  "D" "APRIL" 1
                  "E" "APRIL" 0
                  "F" "APRIL" 1
                  "A" "MAY" 1
                  "B" "MAY" 1
                  "C" "MAY" 1
                  "D" "MAY" 1
                  "E" "MAY" 1
                  "F" "MAY" 1
                  "A" "JUNE" 1
                  "B" "JUNE" 0
                  "C" "JUNE" 1
                  "D" "JUNE" 0
                  "E" "JUNE" 1
                  "F" "JUNE" 0
                  end
                  
                  gen month = .
                  tokenize "`c(Months)'"
                  
                  forval j = 1/12 {
                  replace month = `j' if proper(Month) == "``j''"
                  }
                  
                  encode ID, generate(id)
                  
                  order id
                  
                  list , sepby(Month)
                  
                  +------------------------------------+
                  | id ID Month HasMem~p month |
                  |------------------------------------|
                  1. | A A APRIL 0 4 |
                  2. | B B APRIL 1 4 |
                  3. | C C APRIL 0 4 |
                  4. | D D APRIL 1 4 |
                  5. | E E APRIL 0 4 |
                  6. | F F APRIL 1 4 |
                  |------------------------------------|
                  7. | A A MAY 1 5 |
                  8. | B B MAY 1 5 |
                  9. | C C MAY 1 5 |
                  10. | D D MAY 1 5 |
                  11. | E E MAY 1 5 |
                  12. | F F MAY 1 5 |
                  |------------------------------------|
                  13. | A A JUNE 1 6 |
                  14. | B B JUNE 0 6 |
                  15. | C C JUNE 1 6 |
                  16. | D D JUNE 0 6 |
                  17. | E E JUNE 1 6 |
                  18. | F F JUNE 0 6 |
                  +------------------------------------+
                  
                  tsset id month
                  EDIT: This was a long time in drafting as I was working on other things and other replies were not visible while I was doing that.
                  Thanks for the help Nick, I understand I made a mistake by not using a numeric time variable and making the coding more time consuming. Apologies for that, and thank you for the help!

                  I've asked above on how to modify the suggested code with time periods instead of specific months, since my actual data set spans several months over the Covid years. Below is the updated dataex where month has been replaced by a numeric variable phase.

                  Code:
                   input str2 ID byte phase float HasMembership "A" 1 0 "B" 1 1 "C" 1 0 "D" 1 1 "E" 1 0 "F" 1 1 "A" 2 1 "B" 2 1 "C" 2 1 "D" 2 1 "E" 2 1 "F" 2 1 "A" 3 1 "B" 3 0 "C" 3 1 "D" 3 0 "E" 3 1 "F" 3 0 end

                  Comment


                  • #10
                    Here is code which accounts for the phases:

                    Code:
                    local list Gained Retained Lost  "Retained Non"
                    foreach var of varlist HasMembership{
                        levelsof phase if `var'!=.,local(phases)
                        foreach p of local phases{
                            bysort ID: egen `var'_`p' = max(cond(phase==`p',`var',.))
                            bysort ID: egen `var'_`=`p'-1' = max(cond(phase==`=`p'-1',`var',.))
                            local conditions `""`var'_`p'==1 & `var'_`=`p'-1'==0" "`var'_`p'==1 & `var'_`=`p'-1'==1" "`var'_`p'==0 & `var'_`=`p'-1'==1" "`var'_`p'==0 & `var'_`=`p'-1'==0""'
                            gen p`=`p'-1'_p`p'_`var' = .
                            forv x = 1/4{
                                 bysort ID (phase): replace p`=`p'-1'_p`p'_`var' = `x' if `:word `x' of `conditions''
                                 if `p'==`:word 1 of `phases'' label define `var'_lab `x' "`:word `x' of `list'' `var'", add
                            }
                            label values p`=`p'-1'_p`p'_`var' `var'_lab `x'
                            drop `var'_*
                        }
                    }

                    Comment


                    • #11
                      Originally posted by Ali Atia View Post
                      Here is code which accounts for the phases:

                      Code:
                      local list Gained Retained Lost "Retained Non"
                      foreach var of varlist HasMembership{
                      levelsof phase if `var'!=.,local(phases)
                      foreach p of local phases{
                      bysort ID: egen `var'_`p' = max(cond(phase==`p',`var',.))
                      bysort ID: egen `var'_`=`p'-1' = max(cond(phase==`=`p'-1',`var',.))
                      local conditions `""`var'_`p'==1 & `var'_`=`p'-1'==0" "`var'_`p'==1 & `var'_`=`p'-1'==1" "`var'_`p'==0 & `var'_`=`p'-1'==1" "`var'_`p'==0 & `var'_`=`p'-1'==0""'
                      gen p`=`p'-1'_p`p'_`var' = .
                      forv x = 1/4{
                      bysort ID (phase): replace p`=`p'-1'_p`p'_`var' = `x' if `:word `x' of `conditions''
                      if `p'==`:word 1 of `phases'' label define `var'_lab `x' "`:word `x' of `list'' `var'", add
                      }
                      label values p`=`p'-1'_p`p'_`var' `var'_lab `x'
                      drop `var'_*
                      }
                      }
                      Thank you, this worked very well.

                      Comment

                      Working...
                      X