  • Change in Observation within Group

    Dear all,

    I am new to Stata.
    I would like to obtain the variable "change" shown below.
    In essence, I need a variable that takes the value 1 if rank changes from 1 to 3 or from 3 to 1 within an id.
    The variable year may or may not have gaps.
    My sincere thanks in advance.



    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(id year rank change)
    1000 2000 1 .
    1000 2001 1 .
    1000 2002 2 .
    1000 2003 2 .
    1000 2004 1 .
    1100 2000 1 .
    1100 2001 2 .
    1100 2002 1 .
    1100 2005 3 1
    1100 2006 3 .
    1200 2000 3 .
    1200 2002 3 .
    1200 2004 2 .
    1200 2006 1 1
    1200 2007 1 .
    1300 2007 3 1
    1300 2008 1 1
    1310 2000 2 .
    1310 2001 2 .
    1320 2002 2 .
    1320 2003 2 .
    1320 2005 3 1
    1320 2006 1 1
    1320 2007 3 1
    end

  • #2
    Thanks for your example, but your rules say nothing about rank 2.

    Consider these:

    Code:
    1100 2001 2 .
    1100 2002 1 .
    Code:
    1200 2004 2 .
    1200 2006 1 1
    In the first pair a change from 2 to 1 is not flagged. In the second pair it is.

    • #3
      Hi Nick,

      Apologies for not being clear.
      The variable "rank" is generated from the following formula:

      Code:
      egen rank= xtile(var), nq(3)
      The change from 2 to 1 in the first pair is not flagged because I only want to flag extreme changes, i.e., only those from 1 (lowest) to 3 (highest) or from 3 to 1 within the id group, regardless of the year (but the years must be in ascending order, as in the example).
      Do let me know if I have not clarified this properly.
      Once again, thank you so much.
      Last edited by Ricky Liu; 02 Jul 2018, 05:20.
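
      A side note on the formula above: the xtile() function of egen is not part of official Stata and appears to come from the egenmore package on SSC. A minimal sketch of the same tercile ranking with the official xtile command follows. It uses the auto dataset shipped with Stata, and price merely stands in for "var", which is an assumption for illustration only.

      ```stata
      sysuse auto, clear
      * terciles of price: 1 = lowest third, 3 = highest third
      xtile rank = price, nquantiles(3)
      tabulate rank
      ```

      Either route should give a rank variable that takes only the values 1, 2 and 3.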

      • #4
        Sorry, I don't get it. Why is 2 to 1 for 1200 treated differently from the other identifier?

        • #5
          Because the value of the variable "rank" within id=1200 has changed, from 3 (second observation, in 2002) to 1 (in 2006).
          In other words, regardless of the year, if there is an extreme change within the group (1 to 3 or 3 to 1), I would like to create a variable with a value of 1.
          I have now made some changes and include the revised sample data here.

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input float(id year rank change)
          1000 2000 1 .
          1000 2001 1 .
          1000 2002 2 .
          1000 2003 2 .
          1000 2004 1 .
          1100 2000 1 .
          1100 2001 2 .
          1100 2002 1 .
          1100 2005 3 1
          1100 2006 3 .
          1200 2000 3 .
          1200 2002 3 .
          1200 2004 2 .
          1200 2006 1 1
          1200 2007 1 .
          1300 2007 3 .
          1300 2008 1 1
          1310 2000 2 .
          1310 2001 2 .
          1320 2002 2 .
          1320 2003 2 .
          1320 2005 3 .
          1320 2006 1 1
          1320 2007 3 1
          end

          Thank you.
          Last edited by Ricky Liu; 02 Jul 2018, 05:38.

          • #6
            No clearer to me. Sorry.

            Evidently your rules are more than 1 to 3 or 3 to 1 transitions, as shown by your treatment of 1200, but what the other rules are I don't understand.

            Also, what about instances of 4?

            (I won't reply further unless I understand. The questions are more for anybody else.)

            • #7
              The following explanation only refers to group id 1200.

              The variable "change" is flagged in 2006 because, within group id 1200, there is an occurrence of 3 (the highest) in 2002, followed by a transition to 1 (the lowest) in 2006. It is not due to the previous observation in 2004.

              The value of 3 in 2000 is not a concern, because what I need is the transition from highest (lowest) to lowest (highest) measured from the later year, i.e., 2002 (being later than 2000) to 2006. In other words, the difference between the transition years should be minimal, i.e., (2006-2002) < (2006-2000).
              However, the observation in 2000 would be of interest if the observation in 2002 were neither the highest nor the lowest while the others remained constant.

              In essence, what I need is an indicator variable with a value of 1 if:
              1. there is a transition from 3 (highest) to 1 (lowest) or from 1 (lowest) to 3 (highest) within the group.
              2. the transition for the nearest years within the group

              I hope the explanation more or less clarifies the situation. My apologies if it is still not clear.
              Thank you.

              • #8
                Hi,

                I think I understand the desired operation; however, it remains unclear to me if observations should be flagged only when the transition occurs
                1. within a fixed time period (e.g. "only when the transition occurs within 2 years") or
                2. within a fixed number of observations (e.g. "only when the transition occurs within no more than 2 observations").
                Could you clarify this? I do not fully understand what you meant by this part:

                Originally posted by Ricky Liu

                2. the transition for the nearest years within the group
                Regards
                Bela

                • #9
                  Hi again,

                  I produced some code that might get you started. In essence, we are looking for lagged values here; -help tsvarlist- and -help tsset- are the essential help files. My questions above were, put in other terms,
                  1. how long are the lags of interest and
                  2. are you interested in time lags or observation lags.
                  For the following code snippet, I assume (1) you're interested in lags that are no longer than 2 units and (2) the lag unit of interest is observations, not years. If this is true, with your example data, you can produce the flag variable like this:
                  Code:
                  clear
                  input float(id year rank change)
                  1000 2000 1 .
                  1000 2001 1 .
                  1000 2002 2 .
                  1000 2003 2 .
                  1000 2004 1 .
                  1100 2000 1 .
                  1100 2001 2 .
                  1100 2002 1 .
                  1100 2005 3 1
                  1100 2006 3 .
                  1200 2000 3 .
                  1200 2002 3 .
                  1200 2004 2 .
                  1200 2006 1 1
                  1200 2007 1 .
                  1300 2007 3 .
                  1300 2008 1 1
                  1310 2000 2 .
                  1310 2001 2 .
                  1320 2002 2 .
                  1320 2003 2 .
                  1320 2005 3 .
                  1320 2006 1 1
                  1320 2007 3 1
                  end
                  
                  tempvar spellnumber
                  
                  /* two-observation-lag-scenario */
                  * generate empty flag variable
                  generate flag=.
                  * generate a consecutive spell identifier (needed to use lag functions)
                  bysort id (year) : generate `spellnumber'=_n
                  * declare time series data
                  tsset id `spellnumber'
                  * fill flag variable
                  quietly : replace flag=1 if (abs(S1.rank)==2 | (abs(S2.rank)==2 & abs(S1.rank)!=0))
                  * drop temporary variable
                  drop `spellnumber'
                  * check if result matches
                  assert flag==change
                  
                  list , sepby(id)
                  exit 0
                  There may be two extensions to this code. First, if you're interested in yearly lags instead of lags in observations (which might be a good idea), you should replace the bysort-line with
                  Code:
                  generate `spellnumber'=year
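                  One caveat with this variant, sketched here on the rows for id 1100 from the example data: once year is the time variable, S1.rank and S2.rank compare each observation with the calendar years one and two steps back, so gaps in year produce missing differences. The jump from 1 (2002) to 3 (2005) is then invisible to both operators, and the check against "change" would fail for the example as posted. The variable names d1 and d2 are made up for illustration.

                  ```stata
                  clear
                  input float(id year rank)
                  1100 2000 1
                  1100 2001 2
                  1100 2002 1
                  1100 2005 3
                  1100 2006 3
                  end
                  * year itself is the time variable, so lags count calendar years
                  tsset id year
                  generate d1 = S1.rank
                  generate d2 = S2.rank
                  list, sepby(id)
                  * there are no observations for 2004 or 2003, so both differences are missing in 2005
                  assert missing(d1) & missing(d2) if year == 2005
                  ```

                  Whether that behaviour is wanted depends on whether the lags of interest are years or observations.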
                  Second, if there is a variable lag-length, you need to let Stata calculate the expression for value replacement on the fly. It might not be the most elegant way to solve this, but the following code works for this purpose (the display command is only there to keep you updated on what's happening inside the loop):
                  Code:
                  /* generic infinite-lag-scenario (may not be the most efficient way to solve this; "infinite"=="up to 254") */
                  generate flag=.
                  bysort id (year) : generate `spellnumber'=_n
                  tsset id `spellnumber'
                  summarize `spellnumber' , meanonly
                  forvalues num=`r(min)'/`r(max)' {
                      display as text in smcl `"{text}flag will be set to 1 if {result}abs(S1.rank)==2`ifaddon'"'
                      quietly : replace flag=1 if (abs(S1.rank)==2`ifaddon')
                      local consumed_lags `consumed_lags',abs(S`num'.rank)
                      local ifaddon " | (abs(S`num'.rank)==2 & !inlist(0`consumed_lags'))"
                  }
                  drop `spellnumber'
                  tsset , clear
                  assert flag==change
                  This solution still is not perfect. For instance, it will break as soon as you encounter a lag length longer than 254 observations, as the inlist() function will hit its limit there.

                  I hope this helps a little (and I'm curious if someone comes up with a more elegant solution than my forvalues-loop).

                  Regards
                  Bela
                  Last edited by Daniel Bela; 03 Jul 2018, 06:01. Reason: typo

                  • #10
                    The code below seems more direct.
                    Code:
                    gen a=rank
                    bys id (year): replace a=a[_n-1] if a==2
                    bys id (year): gen switch=1 if a != a[_n-1] & !missing(a[_n-1])
                    drop a
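
                    A reading of why this works, judged from the example only: replacing every 2 by the previous value of a carries the most recent extreme rank (1 or 3) forward, so any remaining change in a is a switch between 1 and 3. Below is a minimal check against three ids from the revised example data; the comments, the list and the final assert are additions for illustration. It assumes rank has no missing values, since a missing rank following a non-missing one would also be counted as a switch.

                    ```stata
                    clear
                    input float(id year rank change)
                    1100 2000 1 .
                    1100 2001 2 .
                    1100 2002 1 .
                    1100 2005 3 1
                    1100 2006 3 .
                    1200 2000 3 .
                    1200 2002 3 .
                    1200 2004 2 .
                    1200 2006 1 1
                    1200 2007 1 .
                    1320 2002 2 .
                    1320 2003 2 .
                    1320 2005 3 .
                    1320 2006 1 1
                    1320 2007 3 1
                    end
                    gen a = rank
                    * carry the last extreme rank (1 or 3) forward over any 2s
                    bys id (year): replace a = a[_n-1] if a == 2
                    * a switch is any change in the carried-forward value
                    bys id (year): gen switch = 1 if a != a[_n-1] & !missing(a[_n-1])
                    list, sepby(id)
                    * check that the result matches the hand-coded variable
                    assert switch == change
                    ```

                    Leading 2s within an id (as for 1320) become missing in a, and the !missing(a[_n-1]) condition keeps the first extreme value after them from being flagged.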

                    • #11
                      Thank you, Romalpa Akzo. It works.
