
  • Create a resetting membership tenure variable with different time spans between years

    Hello everyone,

    sorry for choosing the bulkiest title I could find.

    My data looks a bit like this (I chose one individual, ID 137, as an example):

    ID   Member  year  Desired tenure variable
    137  1       1986  1
    137  1       1990  5
    137  0       1994  0
    137  1       1999  1
    137  1       2002  4
    137  1       2004  6
    Member: 1 = Membership; 0 = No membership

    The idea is to build a tenure variable that counts consecutive years of being a member of an organization, switch back to 0 if a person cancelled the membership, and start counting when that person becomes a member again. There are also IDs that never were a member and IDs that always were a member.

    My (unsuccessful) tries to arrive at a "desired tenure variable" looked a bit like this:

    gen tenure_variable = .
    (1) bysort pid: replace tenure_variable= sum(syear - L_year) if member == 1 & L_member == 1
    (2) bysort pid: replace tenure_variable= 1 if member== 1 & L_member == .
    (3) bysort pid: replace tenure_variable= 1 if member== 1 & L_member == 0
    (4) bysort pid: replace tenure_variable= union if member== 0

    (1) is supposed to give me the sum of all consecutive years in which the person was a member. I have no idea why, but this worked reasonably well for individuals who were members from their first data point onwards. Problems arose when the membership status switched or the person became a member later.
    (2) and (3) just insert 1 for the first membership year.
    (4) puts a zero everywhere else.

    The lagged variables L_year and L_member are created beforehand and refer to the respective value in the last available year. As my data is non-continuous I had to create these variables myself instead of just using L.year and L.member.

    If you can help me create the desired tenure variable I would be incredibly thankful. I guess one should be able to do it without lagged values by using some kind of loop? In total, I have over ten years of data for some IDs, which would make creating lags (L, L2, L3....) manually extremely tedious.

    Thank you!!!

  • #2
    Would something like this be close?

    Code:
    by pid: replace tenure = (tenure(_n-1)+(year-year(_n-1)))*member if _n>1
    Not tested



    • #3
      Daniel Feenberg

      Thank you for your idea!

      Using your approach:

      gen testtenure = .
      by pid: replace testtenure = (testtenure(_n-1)+(year-year(_n-1)))*member if _n>1


      gives me the following error:

      unknown function testtenure()



      • #4
        This may be limited by your limited data example.

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input int id byte member int year byte desiredtenurevariable
        137 1 1986 1
        137 1 1990 5
        137 0 1994 0
        137 1 1999 1
        137 1 2002 4
        137 1 2004 6
        end
        
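        * Count spell starts within id: add 1 whenever a non-member year is followed by a member year
        bys id (year): g group= sum(!member[_n-1] & member)+1
        * Set the spell identifier to zero for non-member years
        replace group= group*member
        * Tenure = years elapsed since the first year of the spell, plus one; zero for non-members
        bys id group (year): g wanted= cond(!member, 0, year-year[1]+1)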
        Res.:

        Code:
        . sort id year
        
        . l, sepby(id)
        
             +-------------------------------------------------+
             |  id   member   year   desire~e   group   wanted |
             |-------------------------------------------------|
          1. | 137        1   1986          1       1        1 |
          2. | 137        1   1990          5       1        5 |
          3. | 137        0   1994          0       0        0 |
          4. | 137        1   1999          1       2        1 |
          5. | 137        1   2002          4       2        4 |
          6. | 137        1   2004          6       2        6 |
             +-------------------------------------------------+
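
The original question also mentions IDs that were never members and IDs that were always members. The following is a minimal sketch of how the same three lines behave in those two cases; the ids 201 and 202 and their years are made up for illustration and are not from the poster's data.

```stata
* Hypothetical data: id 201 is never a member, id 202 is always a member
clear
input int id byte member int year
201 0 1990
201 0 1993
201 0 1997
202 1 1988
202 1 1991
202 1 1995
end

* Same three steps as in the code above
bysort id (year): gen group = sum(!member[_n-1] & member) + 1
replace group = group * member
bysort id group (year): gen wanted = cond(!member, 0, year - year[1] + 1)

list, sepby(id) noobs
```

With this input, id 201 should get wanted = 0 in every year, and id 202 should get 1, 4, and 8, since its single spell starts in 1988.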



        • #5
          Sorry - I should have used brackets instead of parentheses for subscripts.
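
For reference, a sketch of the recursive idea from #2 written with square brackets. It is not the code as originally posted: the initial value (`gen tenure = member`) and the `cond()` that restarts the count at 1 after a non-member year are assumptions added here, because with brackets alone the count would resume at 0 plus the gap in years (5 instead of 1 for 1999 in the example data).

```stata
clear
input int id byte member int year
137 1 1986
137 1 1990
137 0 1994
137 1 1999
137 1 2002
137 1 2004
end

sort id year

* Starting value: 1 for member years, 0 for non-member years
gen tenure = member

* Explicit subscripts use square brackets. replace works through the
* observations in order, so tenure[_n-1] is the already-updated value.
* cond() restarts the count at 1 when the previous observation was a
* non-member year; multiplying by member sets non-member years to 0.
by id: replace tenure = cond(member[_n-1], tenure[_n-1] + year - year[_n-1], 1) * member if _n > 1

list, sepby(id) noobs
```

On the example individual this should reproduce the desired values 1, 5, 0, 1, 4, 6.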



          • #6
            Thank you so much Andrew Musau !

            It worked perfectly on that ID. Gonna dive deeper into the data to see if I stumble upon outliers where it doesn't work, but for now, I can just say thank you!!



            • #7
              Daniel Feenberg: Sorry, I hadn't seen your last comment when I answered back. Thank you for your response. Your second comment also helps me understand what I am supposed to use as input and what I am actually doing there.

              Andrew Musau: To the best of my knowledge your code worked for my entire sample. Very much appreciated! The next step will be actually understanding how the code works, not just seeing that it works. ^^

              Thank you both!



              • #8
                I would recommend reading the following Stata Journal column by Nick Cox: https://journals.sagepub.com/doi/pdf...867X0700700209. Once you do this, the technique in #4 should become clear. If not, post back and I will explain.



                • #9
                  Andrew Musau: By now I have read the Stata Journal column by Nick Cox. Thank you for the advice. It was really helpful.

                  Regarding your solution, I think it may even go slightly beyond the content of the column. At least, I still have one question concerning your code.

                  Your first line reads:
                  Code:
                  bys id (year): g group= sum(!member[_n-1] & member)+1
                  I think my question is born through a misunderstanding of the sum() function. I browsed through my data after inserting your code and understand that every time "member" becomes 1 (instead of 0) for a certain ID, the "group" variable increases by one, and remains at that value throughout the years until the ID becomes a member (=1) again (after switching to not being a member =0 in between).

                  The way I read your code it says:

                  !member[_n-1] = the opposite of the last membership observation. If it's a value outside of 0 !member[_n-1] becomes 1. If it's 1, !member[_n-1] becomes 0.
                  sum(!member[_n-1] & member) = Always take the value described by !member[_n-1] and add the value of the current membership-observation.
                  In the end, you add 1 to this calculation.

                  Your code works perfectly, but the way I understand it (incorrectly) the first "group" value for someone who is member = 1 in his/her observation should be 3, not 1, as your code correctly computes. As the missing value is interpreted as FALSE, the value of !member[_n-1] becomes 1. The current value for "member" is 1, too. Thus, shouldn't the value of sum(!member[_n-1] & member) be 2? Adding the 1, in the end, would even turn it into a 3 by my understanding. Just to make sure - your code does not do that and - correctly - shows 1.

                  Sorry for writing it down in this hard-to-understand way. I am trying to type down my exact understanding of the code, so that more experienced users can understand where my thought process is wrong. The way I understand it, this problem should arise throughout observations, not just for the very first value of a spell.

                  Maybe someone can help me understand the "How" behind this code.



                  • #10
                    Boolean expressions are evaluated as either true or false. In Stata, zero is evaluated as false and non-zero values are evaluated as true. Consider

                    Code:
                    assert 29
                    
                    assert 1
                    
                    assert 0
                    
                    assert -3
                    Res.:

                    Code:
                    . assert 29
                    
                    .
                    .
                    .
                    . assert 1
                    
                    .
                    .
                    .
                    . assert 0
                    assertion is false
                    r(9);
                    
                    .
                    .
                    .
                    . assert -3


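                    So

                    !member[_n-1]
                    is true when the previous observation of member is equal to zero, and

                    member
                    is true when the current value of member is non-zero (therefore equal to one, as "member" is a binary 0/1 variable). The & between them is the logical "and" operator, not addition, so the combined expression evaluates to 1 only when both conditions hold and to 0 otherwise.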


                    So literally, the code

                    bys id (year): g group= sum(!member[_n-1] & member)+1
                    reads

                    1. sorting within each id from the earliest year to the latest year
                    2. generate a variable group that is a running sum (+1 if the previous observation is zero and the current observation is 1).

                    So if the observation sequence is 0-0-0-1-0-1-1-0-0-1, the running sum is 0-0-0-1-1-2-2-2-2-3, i.e., it adds one if the previous observation is zero and the current observation is one.
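
The sequence above can be checked directly. This is a small sketch on a single made-up series (no id variable, so no by-group is needed); the variable names `start` and `runsum` are chosen here for illustration.

```stata
clear
input byte member
0
0
0
1
0
1
1
0
0
1
end

* 1 when the previous observation is zero and the current one is non-zero
gen byte start = !member[_n-1] & member

* Running sum of the spell-start indicator
gen runsum = sum(start)

list, noobs
```

The `runsum` column should read 0-0-0-1-1-2-2-2-2-3, matching the sequence described above.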

                    Now, I need the group variable to indicate spells. As long as the values are distinct across spells, I do not really care what the values are. In other words, the group variable is a categorical variable. The reason I add 1 is that I want the group identifier to start at one instead of zero. Without the +1, a spell that begins at an id's first observation would get the value zero: the lagged value for the first observation is missing, Stata evaluates missing as greater than zero and hence true, so !member[_n-1] is false and nothing is added to the running sum. That would cause confusion, because with your condition

                    The idea is to build a tenure variable that counts consecutive years of being a member of an organization, switch back to 0 if a person cancelled the membership, and start counting when that person becomes a member again.
                    having my spell identifier start at zero would interfere with how I distinguish observations that are not in a spell (assigned a value of zero) from those that are in a spell (assigned a non-zero value). Hope that this clarifies things.
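
A short sketch of the individual pieces, meant to be run from a do-file. It only uses `display` on constants, so nothing here depends on the poster's data.

```stata
* Missing counts as non-zero (true), so its negation is 0
display !(.)

* Zero is false, so its negation is 1
display !0

* First observation of a member: lag is missing, so no spell start is flagged
display (!(.)) & 1

* Non-member in the previous observation, member now: spell start
display (!0) & 1

* Member in both observations: no spell start
display (!1) & 1
```

The third line is the case asked about in #9: the expression is 0, not 2, which is why a spell beginning at the first observation has a running sum of 0 before the 1 is added.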
                    Last edited by Andrew Musau; 25 Nov 2022, 15:31.



                    • #11
                      Thank you so much! Your explanation helped me fully grasp what is happening in your code and will also help me in the future!

                      Also, thank you so much for the detail of your explanations and your patience!
