
  • Check if variable is constant within ID

    Hi everyone,

    I have a dataset that looks like this:

    state year legis_control
    NH 2009 Dem
    NH 2010 Dem
    NH 2011 Rep
    NH 2012 Rep
    NH 2013 Split
    NH 2014 Split
    NH 2015 Rep
    NH 2016 Rep
    NH 2017 Rep
    NH 2018 Rep
    NJ 2009 Dem
    NJ 2010 Dem
    NJ 2011 Dem
    NJ 2012 Dem
    NJ 2013 Dem
    NJ 2014 Dem
    NJ 2015 Dem
    NJ 2016 Dem
    NJ 2017 Dem
    NJ 2018 Dem
    NM 2009 Dem
    NM 2010 Dem
    NM 2011 Dem
    NM 2012 Dem
    NM 2013 Dem
    NM 2014 Dem
    NM 2015 Split
    NM 2016 Split
    NM 2017 Dem
    NM 2018 Dem
    I want to create a dummy whenever "legis_control" is constant within state. I have tried the following code, as suggested in the FAQ (https://www.stata.com/support/faqs/d...tions-in-group) and in other posts:

    by state (legis_control), sort: gen swing_state = legis_control[1] != legis_control[_N]

    This seems to work both for states that are always constant ("NJ") and for those in which legis_control in 2009 differs from that of 2018 ("NH"). However, in states like "NM", the first and last observation are the same, but differ in the middle. Is there any way to fix this?

    Thanks,

    Zoel


  • #2
    by state (legis_control), sort: gen swing_state = legis_control[1] != legis_control[_N]

    This seems to work both for states that are always constant ("NJ") and for those in which legis_control in 2009 differs from that of 2018 ("NH"). However, in states like "NM", the first and last observation are the same, but differ in the middle. Is there any way to fix this?

    Have you looked at the results and if so, how do they contradict what you want? You are not sorting by time, so the temporal differences are irrelevant.



    • #3
      Does this code solve your problem?

      Code:
      bys state (year): egen sdlegis_control = sd(legis_control)
      generate swing_state = sdlegis_control==0



      • #4
        The problem came from the code:
        Code:
        by state (legis_control), sort: gen swing_state = legis_control[1] != legis_control[_N]
        Just because some states have the same beginning and end does not imply that the variable is constant within the state, so the code has to be revised. If your legis_control variable is numeric with a coding scheme, then the suggestion in #3 would work, except that I'd change -generate swing_state = sdlegis_control == 0- to -generate swing_state = sdlegis_control != 0-; otherwise the variable name would be confusing.
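For a string legis_control, one way to get the numeric coding that the sd() approach needs is -encode-. The lines below are only a sketch on a cut-down copy of the example data; lc_num and sd_lc are illustrative names that do not appear elsewhere in the thread.

```stata
clear
input str2 state int year str5 legis_control
"NJ" 2009 "Dem"
"NJ" 2010 "Dem"
"NM" 2009 "Dem"
"NM" 2010 "Split"
"NM" 2011 "Dem"
end

* Map each distinct string to an integer code (lc_num is an illustrative name)
encode legis_control, generate(lc_num)

* Within-state standard deviation of the codes: 0 means no variation
bysort state: egen sd_lc = sd(lc_num)

* 1 = legis_control changes within the state, 0 = constant;
* left missing where sd() is undefined (a state with one observation)
generate byte swing_state = sd_lc != 0 if !missing(sd_lc)

tabulate state swing_state
```

The -if !missing(sd_lc)- qualifier matters because a missing standard deviation would otherwise evaluate as "not equal to 0" and flag single-observation states as swing states.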

        If it's coded as string, then this may work:
        Code:
        egen tmp1 = tag(state legis_control)
        bysort state: egen combo = total(tmp1)
        drop tmp1
        It creates a variable counting how many distinct values of legis_control each state has. States with 1 are not swing states; those with 2 or more are. If that's not what you need, then the definition of "constant" needs to be clarified.
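If a 0/1 indicator is wanted rather than the count, the count can be compared with 1. This is a sketch on a cut-down copy of the example data; swing_state is taken to mean "not constant within state", as in the original post.

```stata
clear
input str2 state int year str5 legis_control
"NJ" 2009 "Dem"
"NJ" 2010 "Dem"
"NM" 2009 "Dem"
"NM" 2010 "Split"
"NM" 2011 "Dem"
end

* Tag one observation per distinct (state, legis_control) pair
egen tmp1 = tag(state legis_control)

* Count the tagged observations, i.e. distinct values per state
bysort state: egen combo = total(tmp1)
drop tmp1

* 1 = more than one distinct value within the state
generate byte swing_state = combo > 1

tabulate state swing_state
```

Note that -egen, tag()- does not tag observations where legis_control is empty unless the -missing- option is added, so empty strings are not counted as a distinct value here.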



        • #5
          I agree with Andrew in post #2. I ran your code on your example data.
          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input str2 state int year str5 legis_control
          "NH" 2009 "Dem"  
          "NH" 2010 "Dem"  
          "NH" 2011 "Rep"  
          "NH" 2012 "Rep"  
          "NH" 2013 "Split"
          "NH" 2014 "Split"
          "NH" 2015 "Rep"  
          "NH" 2016 "Rep"  
          "NH" 2017 "Rep"  
          "NH" 2018 "Rep"  
          "NJ" 2009 "Dem"  
          "NJ" 2010 "Dem"  
          "NJ" 2011 "Dem"  
          "NJ" 2012 "Dem"  
          "NJ" 2013 "Dem"  
          "NJ" 2014 "Dem"  
          "NJ" 2015 "Dem"  
          "NJ" 2016 "Dem"  
          "NJ" 2017 "Dem"  
          "NJ" 2018 "Dem"  
          "NM" 2009 "Dem"  
          "NM" 2010 "Dem"  
          "NM" 2011 "Dem"  
          "NM" 2012 "Dem"  
          "NM" 2013 "Dem"  
          "NM" 2014 "Dem"  
          "NM" 2015 "Split"
          "NM" 2016 "Split"
          "NM" 2017 "Dem"  
          "NM" 2018 "Dem"  
          end
          by state (legis_control), sort: gen swing_state = legis_control[1] != legis_control[_N]
          tab state swing_state
          and received the following result
          Code:
          . tab state swing_state
          
                     |      swing_state
               state |         0          1 |     Total
          -----------+----------------------+----------
                  NH |         0         10 |        10 
                  NJ |        10          0 |        10 
                  NM |         0         10 |        10 
          -----------+----------------------+----------
               Total |        10         20 |        30
          which shows both NH and NM correctly identified as swing states. So you either (a) did not run the code you presented, or (b) misinterpreted the results of running it.

          The code works because, after the sorting, your NM data look like this
          Code:
          . list if state=="NM"
          
               +------------------------------------+
               | state   year   legis_~l   swing_~e |
               |------------------------------------|
           21. |    NM   2014        Dem          1 |
           22. |    NM   2012        Dem          1 |
           23. |    NM   2011        Dem          1 |
           24. |    NM   2009        Dem          1 |
           25. |    NM   2017        Dem          1 |
               |------------------------------------|
           26. |    NM   2018        Dem          1 |
           27. |    NM   2010        Dem          1 |
           28. |    NM   2013        Dem          1 |
           29. |    NM   2015      Split          1 |
           30. |    NM   2016      Split          1 |
               +------------------------------------+
          with each value of legis_control grouped, so if there are different values, one value will appear in the first observation and a different value in the last observation.

          You'll likely want to sort your data back into the original sequence after creating swing_state.
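A sketch of that last step, on a cut-down copy of the example data and assuming the original sequence was state and then year:

```stata
clear
input str2 state int year str5 legis_control
"NJ" 2009 "Dem"
"NJ" 2010 "Dem"
"NM" 2009 "Dem"
"NM" 2010 "Split"
"NM" 2011 "Dem"
end

* Sorting on legis_control within state groups equal values together,
* so the first and last observations differ whenever any two values differ
by state (legis_control), sort: gen swing_state = legis_control[1] != legis_control[_N]

* Put the panel back in chronological order within state
sort state year

list, sepby(state)
```

Because (state, year) identifies each observation in this example, -sort state year- restores a unique ordering.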




          • #6
            Thanks all for your replies. As suggested by Andrew and William, the code was indeed giving the correct results. I must have misread the results, my apologies for that.

