
  • Loops

    Hi
    I am doing some coding where I check whether an ID for an individual changes between two years (1 for a change, 0 if the values are the same). I have one ID variable for each year. Now I want to create new variables identifying the differences for each year, stretching over a period of 20 years. I know how to do this manually, but I want to do it more efficiently in a loop. Can any experts help me?
    My code for each individual year:

    Differences in the year 1975

    gen Different_ID1975 = 0
    replace Different_ID1975 = 1 if ID1975 != ID1974
    replace Different_ID1975 = . if ID1975== . | ID1974==.

    Year 1976
    ....

  • #2
    This does not require any loops. It requires a better data organization and a couple of simple commands.

    Code:
    gen `c(obs_t)' obs_no = _n
    reshape long ID, i(obs_no) j(year)
    xtset obs_no year
    gen byte wanted = (ID != L1.ID) if !missing(ID, L1.ID)
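
    If one indicator per year is later needed as separate variables, the long data can be reshaped back. This is only a sketch, assuming the four commands above have just been run and that wanted is the only variable added since the reshape:

    ```stata
    * Return to the wide layout: produces ID1974, wanted1974, ID1975, wanted1975, ...
    reshape wide ID wanted, i(obs_no) j(year)
    * wanted for the first year is missing throughout,
    * because there is no earlier year to compare it with
    ```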



    • #3
      Yes, it is true that I can put the data in a long format and use your suggested code. But what if I still want a wide format, so that I have one variable for each annual change?

      Another question: if I do as you suggest, could I identify changes across 3 periods? The change variable for 1975 would then use information from 1973, 1974 and 1975.



      • #4
        Your code can be boiled down to

        Code:
        gen Different_ID1975 = ID1975 != ID1974 if !missing(ID1975, ID1974)
        and that is one new variable -- but if I understand correctly you say you want two.

        I have to agree with Clyde Schechter. Put it this way: what are you going to do with these new variables, whether there are about 20 or about 40 of them?

        Further, you ask about comparing 3 years, but then there seem to be several possibilities:

        same id in 1973, 1974, 1975
        several possibilities for same id in two years out of three
        different id in 1973, 1974, 1975
        missing id in one, two or three years

        -- so there is undoubtedly some code for every set of rules. But what are the rules?

        Also, for each comparison you do in future, are you going to accumulate yet more new variables?

        Wanting a wide layout (the word "format" is better avoided, as it is so overloaded -- but that is a detail) may be a personal preference, but that won't help if you don't have a coding strategy to cope.



        • #5
          Thanks for the reply. That's correct, one variable for each pair of years.

          gen Different_ID1975 = ID1975 != ID1974 if !missing(ID1975, ID1974)
          gen Different_ID1976 = ID1976 != ID1975 if !missing(ID1976, ID1975)
          and so on, until 2005.
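
          The year-by-year pattern above could be written as a loop in the wide layout. This is a minimal sketch, assuming variables ID1974 through ID2005 all exist; the local macro names y and prev are illustrative only:

          ```stata
          * One indicator per pair of consecutive years, 1975 to 2005
          forvalues y = 1975/2005 {
              local prev = `y' - 1
              * 1 if the ID differs from the previous year, 0 if it is the same,
              * missing if either year's ID is missing
              gen byte Different_ID`y' = ID`y' != ID`prev' if !missing(ID`y', ID`prev')
          }
          ```

          The byte storage type is a choice made here to save memory; the indicator takes only the values 0, 1 and missing.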

          The reason for creating the Different_ID variables is that I want to use them in a logistic regression model. Since the change occurs for only 10% of my individuals each year, a long layout does not add much to my analysis and just inflates the number of observations. Thus, I want to see how my analysis looks in a wide layout. I would instead fit a model for each pair of years, using my regressors from the first year of each pair.

          The reason I want to include differences across three years is that for some individuals there might be a time lag before a change in the ID. Maybe it's possible to do so using the code Clyde suggested. Any ideas for that?

          There would be several rules. Individuals with a missing value for more than one year are excluded; it doesn't matter whether it is 1973, 1974 or 1975. I accept different values in any of the pairs 1973-1974, 1974-1975 or 1973-1975.

          If I decide to do even longer comparisons of the ID, say differences across the years 1973-1983 (ignoring the number of changes and when the change takes place), the interpretation would be whether an individual has changed its ID at any point during those years. How could I do this in the long layout that Clyde suggested? And what do you mean by accumulating more new variables?
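
          One possible reading of these rules in the wide layout is sketched below. It assumes variables ID1973 through ID2005 exist; the names Change3_ID and nmiss are hypothetical, and a pair of years counts as a change only when both IDs in that pair are non-missing:

          ```stata
          forvalues y = 1975/2005 {
              local y1 = `y' - 1
              local y2 = `y' - 2
              * number of missing IDs among the three years
              egen byte nmiss = rowmiss(ID`y2' ID`y1' ID`y')
              * 1 if any pair of non-missing IDs differs, 0 otherwise;
              * missing (excluded) when more than one of the three years is missing
              gen byte Change3_ID`y' = ///
                  (ID`y2' != ID`y1' & !missing(ID`y2', ID`y1')) | ///
                  (ID`y1' != ID`y'  & !missing(ID`y1', ID`y'))  | ///
                  (ID`y2' != ID`y'  & !missing(ID`y2', ID`y'))  ///
                  if nmiss <= 1
              drop nmiss
          }
          ```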
          Last edited by David Dahlgren; 23 Nov 2023, 03:37.



          • #6
            I'm not sure I see where you are going with this. I read #5 as proposing a rather broad array of different comparisons that might be made, some of which are based on a number of years passing, and some of which are based on specifically designated years. These actually require different approaches.

            Is the underlying question just "does the ID change at some point?" If so, that can be very easily done in the long layout:
            Code:
            by obs_no (ID), sort: egen byte wanted = max(ID != ID[1] & !missing(ID))
            This will identify any observation (from the original wide data) where there are two distinct, non-missing IDs somewhere. It does not tell you when the ID changed.
            Note: This code assumes the ID is a numeric variable. If it is a string variable, replace ID[1] with ID[_N] in the code.
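
            If the question is restricted to a fixed window such as 1973-1983, one way to sketch it in the long layout is to compare the smallest and largest non-missing ID within the window. This is a different technique from the command above, and it assumes a numeric ID and the variables obs_no and year from #2; the names id_lo, id_hi and changed_7383 are illustrative:

            ```stata
            * smallest and largest non-missing ID within 1973-1983, per original observation
            egen double id_lo = min(cond(inrange(year, 1973, 1983), ID, .)), by(obs_no)
            egen double id_hi = max(cond(inrange(year, 1973, 1983), ID, .)), by(obs_no)
            * 1 if at least two distinct IDs occur in the window, 0 if exactly one,
            * missing if no ID is observed in the window
            gen byte changed_7383 = id_lo != id_hi if !missing(id_lo)
            drop id_lo id_hi
            ```

            As with the command above, this does not say when the ID changed or how often.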
