
  • Code Needed for Person-Year Identification. After Consecutive Years Above a Certain Level, 5 Years later a Specific Decrease Occurs.

    Apologies, confidential dataset.

    I'm looking for a string of code that would solve the problem described below. I have no idea where to start since deleting data won't work, and since the decrease in data 5 years later can still be above 0.5. Here ya go!

    Assume there is a psychological study measuring a group of people's emotional intelligence (EI) over time, rated on a scale from 0 to 1. From 0 to 1, there are 11 ratings (0, 0.1, 0.2, ... 0.9, 1). There are 200 people, with their EI reevaluated each year from 1960 to 2010. My objective is to find the person-years where the preceding 10 years had an EI consistently greater than 0.5 (all 10 years had EI>0.5), and 5 years later the EI had decreased by 0.2 or more.


    Examples:

    1.

    1. EI of person A from 1960-1974 = 0.6, then 1975-1978 = 0.5, then 1979 = 0.4 would COUNT
    2. Same 1960-1977 as above, but 1978 = 0.4, 1979 = 0.5 would COUNT
    3. 1960-1969 = 0.6, 1970-1973 = 0.4, 1974 = 0.5 would NOT COUNT

    #1 above would need to identify 1975, person A as the person-year.
    #2 above would need to identify 1974 as the year, since the 5 year decrease occurred in 1978, but 1979 did not have the same result.
    #3 above had no 10 year run greater than 0.5 where 5 years later there was a 0.2 decrease. The only year that could fit this qualification would be 1974 since there was exactly a 10 year run above 0.5 that ended exactly 5 years before.



    2.

    EI of person B from 1960-1969 = 0.8, 1970-1973 = 0.7, 1974 = 0.6
    OR
    1960-1969 = 0.8, 1970-1974 = 0.6
    OR
    1960-1969 = 0.8, 1970-1974 = 0.3
    OR
    1960-1969 = 0.8, 1970-1973 = 0.3, 1974 = 0.6
    ALL WOULD COUNT. They should all return 1970, person B as the person-year.



    Any advice on the code for this would be greatly appreciated!! Life of a research assistant.

  • #2
    Welcome to Statalist, Austin.

    Please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post, looking especially at sections 9-12 on how to best pose your question. It would be particularly helpful to post your examples in a small hand-made dataset showing the data before the process and how you expect it to look after the process. In particular, please read FAQ #12 and use dataex and CODE delimiters when posting to Statalist.

    We understand your dataset is confidential but you could use the Stata Data Editor to enter your seven examples into a Stata dataset usable for testing. You will see that many Statalist answers are provided as sample code, because Stata is the universal language here, and many members prefer not to post untested code, especially if the solution is centered around logic rather than just syntax.

    With that said, I would encourage you to transform your EI from a 0-1 scale incremented in tenths to a 0-10 scale incremented in units. The reason is that with the exception of 0.5, none of the fractional values can be represented accurately in binary arithmetic, and comparisons need to be handled carefully to avoid the possibility that, to make up an example, you try to compare 0.299999999... to 0.3 and find it less rather than equal.
    Code:
    generate EI10 = round(10*EI)
    should do what you need.
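
    The precision issue described above can be seen in a minimal sketch. The one-observation dataset here is made up for illustration, and EI is stored explicitly as a float (Stata's default storage type), while the literal 0.3 is evaluated in double precision:

    ```stata
    clear
    set obs 1
    generate float EI = 0.3        // hypothetical value, stored in float precision
    count if EI == 0.3             // 0: the float 0.3 is not equal to the double 0.3
    count if EI == float(0.3)      // 1: both sides rounded to float precision
    generate EI10 = round(10*EI)   // integer scale, exactly representable
    count if EI10 == 3             // 1
    ```

    Comparing against float(0.3) also works, but the integer scale avoids having to remember that in every later comparison.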



    • #3
      As William advises, our FAQ anticipates your problem:

      If your dataset is confidential, then provide a fake example instead.
      I have another problem. I don't understand the difference between these two cases:

      2. Same 1960-1977 as above [namely 0.6], but 1978 = 0.4, 1979 = 0.5 would COUNT

      3. 1960-1969 = 0.6, 1970-1973 = 0.4, 1974 = 0.5 would NOT COUNT
      Otherwise put,

      5 years later the EI had decreased by 0.2 or more.
      doesn't spell out

      (a) whether that means ever, mostly, on average or something else.

      (b) 5 years later means within the next 5 years or after 5 years have elapsed.

      In #2 there is no evidence for 5 years after 1977 except for 1978 and 1979.

      There are more puzzles, but I can't begin to think of code myself without a much clearer story on your rules.
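
      To make the two readings in (b) concrete, here is a rough sketch on made-up data (one hypothetical person, an integer EI10 scale, consecutive years with no gaps). It assumes -rangestat- from SSC is installed; the variable names are only for illustration:

      ```stata
      clear
      set obs 10
      generate person = "A"
      generate year = 1969 + _n                 // 1970 to 1979
      generate EI10 = 6
      replace EI10 = 4 if year == 1973          // a temporary dip
      // Reading 1: the value exactly 5 years ahead (relies on no gaps in year)
      by person (year), sort: generate ei_plus5 = EI10[_n+5]
      generate byte drop_exact = EI10 - ei_plus5 >= 2 & !missing(ei_plus5)
      // Reading 2: the lowest value at any point within the next 5 years
      rangestat (min) min_next5 = EI10, by(person) interval(year 1 5)
      generate byte drop_within = EI10 - min_next5 >= 2 & !missing(min_next5)
      list year EI10 ei_plus5 min_next5 drop_exact drop_within, noobs
      ```

      For 1970 the two readings disagree: the value in 1975 is back at 6, so drop_exact is 0, while the dip in 1973 makes drop_within 1.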



      • #4
        I don't follow your examples and I can't reproduce your logic. Let's just look at the first example for person A. In 1975, person A has indeed had a run of (more than) 10 years with EI > 0.5. But person A's current EI is 0.5, and in 1979 it is down to 0.4. That is only a drop of 0.1, which does not meet the 0.2 threshold. So I don't get why this "counts."

        There is also some ambiguity in your description. Does "the preceding 10 years" mean the current year and the 9 years before that, or does it mean starting with the preceding year and going back to 10 years before the present? Also how do you want to treat years like 1964 where there isn't a total of 10 years of previous measurement to look at?
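
        The two window definitions, and the years with too little history, can be told apart along these lines. This is only a sketch on made-up data (one hypothetical person with a single low year in 1970), and it assumes -rangestat- from SSC is installed:

        ```stata
        clear
        set obs 16
        generate person = "A"
        generate year = 1959 + _n                     // 1960 to 1975
        generate EI10 = cond(year == 1970, 4, 6)      // hypothetical: one low year
        // Window = current year and the 9 years before it
        rangestat (min) min_incl = EI10 (count) n_incl = EI10, by(person) interval(year -9 0)
        // Window = the 10 years strictly before the current year
        rangestat (min) min_excl = EI10 (count) n_excl = EI10, by(person) interval(year -10 -1)
        list year EI10 min_incl n_incl min_excl n_excl if inrange(year, 1968, 1971), noobs
        ```

        In 1970 the first window already contains the low value (min_incl is 4) while the second does not (min_excl is 6). The counts show which years lack a full 10 years of history: in 1969 n_incl is 10 but n_excl is only 9.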

        Anyway, here is what I've come up with that matches one interpretation of what you described, though its results are not exactly what you've arrived at. Perhaps you can figure it out from here. (Note: if you don't already have it, you'll need to get Robert Picard's excellent -rangestat- command by running -ssc install rangestat-.)

        Code:
        // GENERATE DATA
        clear*
        set more off
        set obs 7
        gen person = "A1" in 1
        replace person = "A2" in 2
        replace person = "A3" in 3
        replace person = "B1" in 4
        replace person = "B2" in 5
        replace person = "B3" in 6
        replace person = "B4" in 7
        expand 1979-1960+1
        by person, sort: gen year = 1959 + _n
        gen EI10 = 6 if inlist(person, "A1", "A2") & inrange(year, 1960, 1974)
        replace EI10 = 5 if inlist(person, "A1", "A2") & inrange(year, 1975, 1979)
        replace EI10 = 4 if person == "A1" & year == 1979
        replace EI10 = 4 if person == "A2" & year == 1978
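        // A3 also starts at 6 for 1960-1969 (example 3); without this line those years stay missing
        replace EI10 = 6 if person == "A3" & inrange(year, 1960, 1969)
        replace EI10 = 4 if person == "A3" & inrange(year, 1970, 1973)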
        replace EI10 = 5 if person == "A3" & inrange(year, 1974, 1979)
        replace EI10 = 8 if person == "B1" & inrange(year, 1960, 1969)
        replace EI10 = 7 if person == "B1" & inrange(year, 1970, 1973)
        replace EI10 = 6 if person == "B1" & inrange(year, 1974, 1979)
        replace EI10 = 8 if person == "B2" & inrange(year, 1960, 1969)
        replace EI10 = 6 if person == "B2" & inrange(year, 1970, 1979)
        replace EI10 = 8 if person == "B3" & inrange(year, 1960, 1969)
        replace EI10 = 3 if person == "B3" & inrange(year, 1970, 1979)
        replace EI10 = 8 if person == "B4" & inrange(year, 1960, 1969)
        replace EI10 = 3 if person == "B4" & inrange(year, 1970, 1979)
        replace EI10 = 6 if person == "B4" & year == 1974
        
        rangestat (min) lowest_ei_last_10 = EI10, by(person) interval(year -9 0)
        replace lowest_ei_last_10 = . if year < 1969
        by person (year), sort:gen byte drop2 = (EI10 >= EI10[_n+4] + 2)
        gen byte hit = lowest_ei_last_10 > 5 & !missing(lowest_ei_last_10) & drop2



        • #5
          "" 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
          "A" .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .5 .5 .5 .5 .4
          "B" .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .5 .5 .5 .4 .5
          "C" .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .4 .4 .4 .4 .5 .5 .5 .5 .5 .6
          "D" .8 .8 .8 .8 .8 .8 .8 .8 .8 .8 .7 .7 .7 .7 .6 .7 .7 .7 .7 .7
          "E" .8 .8 .8 .8 .8 .8 .8 .8 .8 .8 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6
          "F" .8 .8 .8 .8 .8 .8 .8 .8 .8 .8 .8 .8 .8 .8 .3 .3 .3 .3 .3 .3
          "G" .8 .8 .8 .8 .8 .8 .8 .8 .8 .8 .3 .3 .3 .3 .3 .6 .6 .6 .6 .6
          "H" .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .5 .5 .5 .5 .5
          "I" .9 .9 .9 .9 .9 .9 .9 .9 .9 .9 .9 .8 .8 .8 .8 .8 .8 .8 .8 .8
          "J" .7 .8 .8 .9 .8 .7 .7 .7 .7 .7 .7 .7 .7 .7 .7 .7 .8 .8 .8 .8
          "K" .2 .2 .2 .2 .2 .2 .2 .2 .2 .2 .2 .2 .2 .2 .2 .2 .2 .2 .2 .2
          "L" .8 .8 .8 .8 .8 .8 .8 .7 .7 .7 .7 .7 .6 .6 .6 .6 .6 .6 .6 .6
          "M" .4 .4 .4 .4 .4 .4 .4 .4 .4 .4 .4 .4 .4 .4 .1 .1 .1 .1 .1 .1
          "N" .7 .7 .7 .7 .7 .7 .7 .7 .7 .7 .7 .7 .7 .9 .9 .9 .9 .9 .9 .9
          end


          Here is a dataex. The first 7 examples are A through G, and I added some just for comparison that should have very specific answers.

          William, I did go over the FAQ first but didn't think of recreating the dataset separately, thank you for the direction!

          Nick, sorry I wasn't clear enough. The value must have decreased by 0.2 or more, exactly 5 years after the last year of the 10 year run. Not an average. The difference from 2 to 3 is that with 3, there is a 10 year run above 0.5, then exactly five years from the last year of the run, there was not a 0.2 decrease. It went from 0.6 to 0.5, even though the years between were 0.4.

          Clyde, in the first example, from 1974 to 1979 there is a decrease of 0.2. Just because there is not a singular year with a 0.2 decrease doesn't mean there will not be a difference of 0.2 in a five year span. I'm going to work around with the rangestat code now to see if I can manipulate it to give the right outputs! Thank you so much!

          Again to everyone, sorry for the lame first-timer posting mistakes, I hope to be around here for a long time and continue to learn along the way!
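
          One possible reading of the rule as clarified in this post, sketched under stated assumptions: the flagged person-year is the year right after a 10-year run (the 10 years strictly before it all above 0.5), and the drop compares the last year of that run with the value exactly 5 years after it. The sketch uses only persons A, B and C from the data above, assumes consecutive years with no gaps, and needs -rangestat- from SSC. The names min_prev10, n_prev10, drop5 and hit5 are made up for illustration:

          ```stata
          clear
          input str8 person float(EI60 EI61 EI62 EI63 EI64 EI65 EI66 EI67 EI68 EI69 EI70 EI71 EI72 EI73 EI74 EI75 EI76 EI77 EI78 EI79)
          "A" .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .5 .5 .5 .5 .4
          "B" .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .5 .5 .5 .4 .5
          "C" .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .4 .4 .4 .4 .5 .5 .5 .5 .5 .6
          end
          reshape long EI, i(person) j(year)
          replace year = year + 1900
          generate EI10 = round(10*EI)                  // integer scale for exact comparisons
          // Lowest value and number of observations in the 10 years before the current year
          rangestat (min) min_prev10 = EI10 (count) n_prev10 = EI10, by(person) interval(year -10 -1)
          // Last year of the run (_n-1) versus exactly 5 years after it (_n+4)
          by person (year), sort: generate drop5 = EI10[_n-1] - EI10[_n+4]
          generate byte hit5 = n_prev10 == 10 & min_prev10 > 5 & drop5 >= 2 & !missing(drop5)
          list person year EI10 if hit5, noobs
          ```

          On these three rows this flags 1975 for A and 1974 for B, and nothing for C. The !missing(drop5) check matters because Stata treats missing as larger than any number, so a missing difference would otherwise pass the >= 2 test.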



          • #6
            Let me perhaps help out here by consolidating the data in #5 with Clyde's code from #4. That might be enough for you to then modify Clyde's code to meet your needs.

            Your data has a wide layout; Clyde's code assumes a long layout. While I could have used reshape a second time to return the data to a wide layout, it would probably be a mistake to do so. The experienced users here generally agree that, with few exceptions, Stata makes it much more straightforward to accomplish complex analyses using a long layout of your data rather than a wide layout of the same data. As it was in this case, for example.
            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input str8 person float(EI60 EI61 EI62 EI63 EI64 EI65 EI66 EI67 EI68 EI69 EI70 EI71 EI72 EI73 EI74 EI75 EI76 EI77 EI78 EI79)
            "A" .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .5 .5 .5 .5 .4
            "B" .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .5 .5 .5 .4 .5
            "C" .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .4 .4 .4 .4 .5 .5 .5 .5 .5 .6
            "D" .8 .8 .8 .8 .8 .8 .8 .8 .8 .8 .7 .7 .7 .7 .6 .7 .7 .7 .7 .7
            "E" .8 .8 .8 .8 .8 .8 .8 .8 .8 .8 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6
            "F" .8 .8 .8 .8 .8 .8 .8 .8 .8 .8 .8 .8 .8 .8 .3 .3 .3 .3 .3 .3
            "G" .8 .8 .8 .8 .8 .8 .8 .8 .8 .8 .3 .3 .3 .3 .3 .6 .6 .6 .6 .6
            "H" .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .6 .5 .5 .5 .5 .5
            "I" .9 .9 .9 .9 .9 .9 .9 .9 .9 .9 .9 .8 .8 .8 .8 .8 .8 .8 .8 .8
            "J" .7 .8 .8 .9 .8 .7 .7 .7 .7 .7 .7 .7 .7 .7 .7 .7 .8 .8 .8 .8
            "K" .2 .2 .2 .2 .2 .2 .2 .2 .2 .2 .2 .2 .2 .2 .2 .2 .2 .2 .2 .2
            "L" .8 .8 .8 .8 .8 .8 .8 .7 .7 .7 .7 .7 .6 .6 .6 .6 .6 .6 .6 .6
            "M" .4 .4 .4 .4 .4 .4 .4 .4 .4 .4 .4 .4 .4 .4 .1 .1 .1 .1 .1 .1
            "N" .7 .7 .7 .7 .7 .7 .7 .7 .7 .7 .7 .7 .7 .9 .9 .9 .9 .9 .9 .9
            end
            // Reshape data to match Clyde's code; create EI10 scale for precision
            reshape long EI, i(person) j(year)
            replace year = year+1900
            generate EI10 = round(10*EI)
            // Clyde's code
            rangestat (min) lowest_ei_last_10 = EI10, by(person) interval(year -9 0)
            replace lowest_ei_last_10 = . if year < 1969
            by person (year), sort:gen byte drop2 = (EI10 >= EI10[_n+4] + 2)
            gen byte hit = lowest_ei_last_10 > 5 & !missing(lowest_ei_last_10) & drop2
            Code:
            . list if hit==1, sepby(person)
            
                 +----------------------------------------------------+
                 | person   year   EI   EI10   lowes~10   drop2   hit |
                 |----------------------------------------------------|
             35. |      B   1974   .6      6          6       1     1 |
                 |----------------------------------------------------|
             50. |      C   1969   .6      6          6       1     1 |
                 |----------------------------------------------------|
             90. |      E   1969   .8      8          8       1     1 |
                 |----------------------------------------------------|
            111. |      F   1970   .8      8          8       1     1 |
            112. |      F   1971   .8      8          8       1     1 |
            113. |      F   1972   .8      8          8       1     1 |
            114. |      F   1973   .8      8          8       1     1 |
                 |----------------------------------------------------|
            130. |      G   1969   .8      8          8       1     1 |
                 +----------------------------------------------------+
            Last edited by William Lisowski; 15 Mar 2017, 12:24.
