
  • sdid in case of unbalanced panel

    I need to run a synthetic difference-in-differences (SDID) regression, which requires balanced panel data. My sample runs from 2000 to 2021, so there are 22 years in total, and every county is missing the variable wanted for at least one year. When I run the following command, Stata tells me that county and year are missing:


    Code:
    tsset county year
    isid county year, sort
    variables county and year should never be missing
    r(459);
    When I run the following command, all observations are dropped, which indicates that not a single county has wanted for all 22 years.

    Code:
    by county (year): keep if _N == 22
    I'm attaching a part of my data
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    
    input float(wanted county year)
    
    2 1011 2002
    2 1011 2003
    1 1011 2004
    1 1011 2019
    1 1027 2000
    2 1027 2002
    1 1027 2008
    1 1027 2009
    1 1027 2013
    1 1027 2018
    4 1001 2000
    3 1001 2001
    1 1001 2002
    1 1001 2003
    3 1001 2004
    5 1001 2005
    2 1001 2006
    3 1001 2007
    2 1001 2008
    2 1001 2009
    3 1001 2010
    2 1001 2011
    7 1001 2012
    3 1001 2013
    3 1001 2014
    3 1001 2015
    2 1001 2016
    7 1001 2017
    11 1001 2018
    3 1001 2019
    3 1001 2020
    
    end
    After using the -tsfill, full- command, I was able to make the missing county-years show up in my data, so the panel is strongly balanced.

    Code:
    tsset county year
    
    tsfill, full
    Then I replaced wanted with 0 wherever it was missing (wanted == .). This is correct, because wanted is in fact 0 for any county-year that does not appear in my data.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(wanted county year policy)
        4 1001 2000 0
        3 1001 2001 0
        1 1001 2002 0
        1 1001 2003 0
        3 1001 2004 0
        5 1001 2005 0
        2 1001 2006 0
        3 1001 2007 0
        2 1001 2008 0
        2 1001 2009 0
        3 1001 2010 0
        2 1001 2011 0
        7 1001 2012 0
        3 1001 2013 0
        3 1001 2014 1
        3 1001 2015 1
        2 1001 2016 1
        7 1001 2017 1
       11 1001 2018 1
        3 1001 2019 1
        3 1001 2020 1
    0 1001 2021 0
    0 1001 2022 0
    0 1011 2000 0
    0 1011 2001 0
        2 1011 2002 0
        2 1011 2003 0
        1 1011 2004 0
    0 1011 2005 0
    0 1011 2006 0
    0 1011 2007 0
    0 1011 2008 0
    0 1011 2009 0
    0 1011 2010 0
    0 1011 2011 0
    0 1011 2012 0
    0 1011 2013 0
    0 1011 2014 0
    0 1011 2015 0
    0 1011 2016 0
    0 1011 2017 0
    0 1011 2018 0
        1 1011 2019 0
    0 1011 2020 0
    0 1011 2021 0
    0 1011 2022 0
        1 1027 2000 0
    0 1027 2001 0
        2 1027 2002 0
    0 1027 2003 0
    0 1027 2004 0
    0 1027 2005 0
    0 1027 2006 0
    0 1027 2007 0
        1 1027 2008 0
        1 1027 2009 0
    0 1027 2010 0
    0 1027 2011 0
    0 1027 2012 0
        1 1027 2013 0
        3 1027 2014 0
    0 1027 2015 0
        1 1027 2016 0
    0 1027 2017 0
        1 1027 2018 1
        2 1027 2019 1
    0 1027 2020 1
    0 1027 2021 1
    0 1027 2022 1
    end
    It still reports an unbalanced panel when I run the following Stata command for SDID (synthetic difference-in-differences):

    Code:
    sdid wanted county year policy, vce(bootstrap) seed(1213)
    
    Panel is unbalanced.
    r(451);
    Is there anything I can do to find the cause of the error?
    Last edited by Tariq Abdullah; 29 Oct 2022, 15:58.

  • #2
    When I run this, Stata tells me
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input float(wanted county year policy)
        4 1001 2000 0
        3 1001 2001 0
        1 1001 2002 0
        1 1001 2003 0
        3 1001 2004 0
        5 1001 2005 0
        2 1001 2006 0
        3 1001 2007 0
        2 1001 2008 0
        2 1001 2009 0
        3 1001 2010 0
        2 1001 2011 0
        7 1001 2012 0
        3 1001 2013 0
        3 1001 2014 1
        3 1001 2015 1
        2 1001 2016 1
        7 1001 2017 1
       11 1001 2018 1
        3 1001 2019 1
        3 1001 2020 1
    0 1001 2021 0
    0 1001 2022 0
    0 1011 2000 0
    0 1011 2001 0
        2 1011 2002 0
        2 1011 2003 0
        1 1011 2004 0
    0 1011 2005 0
    0 1011 2006 0
    0 1011 2007 0
    0 1011 2008 0
    0 1011 2009 0
    0 1011 2010 0
    0 1011 2011 0
    0 1011 2012 0
    0 1011 2013 0
    0 1011 2014 0
    0 1011 2015 0
    0 1011 2016 0
    0 1011 2017 0
    0 1011 2018 0
        1 1011 2019 0
    0 1011 2020 0
    0 1011 2021 0
    0 1011 2022 0
        1 1027 2000 0
    0 1027 2001 0
        2 1027 2002 0
    0 1027 2003 0
    0 1027 2004 0
    0 1027 2005 0
    0 1027 2006 0
    0 1027 2007 0
        1 1027 2008 0
        1 1027 2009 0
    0 1027 2010 0
    0 1027 2011 0
    0 1027 2012 0
        1 1027 2013 0
        3 1027 2014 0
    0 1027 2015 0
        1 1027 2016 0
    0 1027 2017 0
        1 1027 2018 1
        2 1027 2019 1
    0 1027 2020 1
    0 1027 2021 1
    0 1027 2022 1
    end
    
    bys county: g obs = _N
    
    sdid wanted county year policy, vce(bootstrap) seed(1213)
    
    Units are observed to change from treated (earlier) to untreated (later).
    A staggered adoption is assumed in which units are assumed to only change from untreated to treated, or remain untreated.
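
    The message points at counties whose policy indicator goes from 1 back to 0 in a later year. A minimal sketch for locating them, assuming the variable names from the example above (reverted and ever_reverted are hypothetical helper names, not part of sdid):

    ```stata
    * Flag county-years where policy is 0 although it was 1 in the previous row.
    * For the first row of each county, policy[_n-1] is missing, so the flag is 0.
    bysort county (year): gen byte reverted = (policy == 0 & policy[_n-1] == 1)

    * Mark every row of a county that ever reverts.
    egen byte ever_reverted = max(reverted), by(county)

    * Inspect the offending counties.
    list county year policy if ever_reverted == 1, sepby(county)
    ```

    In the example data this flags county 1001: its rows for 2021 and 2022 have policy = 0 although policy = 1 from 2014 to 2020.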



    • #3
      I have 70,000 observations and I don't know what's happening. I only listed the first few counties! I think I need to go back to my data and figure out whether anything specific is going on.

      Thanks for your time and patience, and for reassuring me that as long as I have balanced the panel correctly, sdid will give me a result.

      So that you can give further feedback: after filling in 0 for the wanted variable, I ran this command and got the following error.

      ```
      isid county year, sort
      variables county and year should never be missing
      r(459);

      end of do-file

      r(459);

      ```
      Does that say anything about my 70,000 observations of panel data? I could only post the first fragment of the data, so I wanted to show you this in case it means anything. Does the error message indicate that I still have missing observations for one or more counties?
      Last edited by Tariq Abdullah; 29 Oct 2022, 20:17.



      • #4
        The error code you give suggests that the county/time variable itself is missing someplace, something that should truly never happen.



        • #5
          Tariq Abdullah, tsfill will create new observations for the county-years that don't exist in the data. However, if there are observations that have missing values for those two variables, they still remain in the dataset. If anything, the problem is made worse, because tsfill will take the missing value as another value and create more observations to balance the panel with it. Those observations are the ones isid is complaining about.
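
          A minimal sketch of how this could be checked on the original data, before tsfill is run. It assumes the variable names county and year from the example, and it assumes that rows without a county or year carry no usable information and can be dropped; that assumption should be verified against the data first.

          ```stata
          * Count and inspect observations whose panel identifiers are missing.
          count if missing(county, year)
          list if missing(county, year)

          * Assumption: such rows are unusable, so drop them before declaring the panel.
          drop if missing(county, year)

          * Declare the panel and fill in the absent county-years.
          tsset county year
          tsfill, full
          ```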
          Last edited by Hemanshu Kumar; 30 Oct 2022, 08:41.



          • #6
            I need to fill in 0 for the wanted variable, which is missing for some counties in a number of years. Since filling in 0 is appropriate given the nature of wanted, how would you recommend doing it in the following dataset? This is the form of my initial dataset: only county 1001 has observations of wanted for every year from 2000 to 2020, while the other two counties, 1011 and 1027, have missing observations.

            I've decided to cut my sample down to 2000-2020, since 2021-2022 have a lot of missing observations.

            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input float(wanted county year)
            2 1011 2002
            2 1011 2003
            1 1011 2004
            1 1011 2019
            1 1027 2000
            2 1027 2002
            1 1027 2008
            1 1027 2009
            1 1027 2013
            1 1027 2018
            4 1001 2000
            3 1001 2001
            1 1001 2002
            1 1001 2003
            3 1001 2004
            5 1001 2005
            2 1001 2006
            3 1001 2007
            2 1001 2008
            2 1001 2009
            3 1001 2010
            2 1001 2011
            7 1001 2012
            3 1001 2013
            3 1001 2014
            3 1001 2015
            2 1001 2016
            7 1001 2017
            11 1001 2018
            3 1001 2019
            3 1001 2020
            
            end
            Last edited by Tariq Abdullah; 30 Oct 2022, 13:04.



            • #7
              If in your original dataset, county and year are not missing in any observations (I am not saying the entire observation is missing, I mean cases where the observation is present but the values of one or both of these variables are missing), then tsfill, full will not create a problem. If wanted is also never missing in the original dataset, then you can simply do

              Code:
              tsset county year
              tsfill , full
              replace wanted = 0 if missing(wanted)
              For your example data:

              Code:
              . tsset county year
              
              Panel variable: county (unbalanced)
               Time variable: year, 2000 to 2020, but with gaps
                       Delta: 1 unit
              
              . tsfill , full
              
              . replace wanted = 0 if missing(wanted)
              (32 real changes made)
              
              . 
              . isid county year
              Last edited by Hemanshu Kumar; 30 Oct 2022, 13:37.
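
              As a rough follow-up check, the sketch below confirms that the filled panel is balanced before it is passed to sdid. It assumes the sample has been restricted to 2000-2020 (21 years), as described in #6; n_years is a hypothetical helper variable.

              ```stata
              * Each county-year should now identify exactly one observation.
              isid county year

              * xtset should describe the panel as strongly balanced.
              xtset county year

              * Every county should have the same number of yearly rows.
              bysort county: gen n_years = _N
              tabulate n_years

              * Assumption: 2000-2020 inclusive, i.e. 21 rows per county.
              assert n_years == 21
              ```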



              • #8
                Thanks, Mr. Kumar. I understand why I was getting the error! It's because when wanted is missing, or when county or year is missing, the whole observation is missing in my data (as shown above for counties 1011 and 1027).

                Since tsfill is not going to work, or will worsen the already tricky situation, when the observation is entirely missing, I need to look for other alternatives.

                I appreciate the thought and time you put into this thread to help me address my concern. Thank you so much for your time and patience!
