Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Replacing values with previous Census Year

    Hello Statalist users,

    I am working in Stata version 13.1 and am trying to replace missing values with a previous CENSUS year’s value.
    My data is panel data from multiple censuses with counties as the groups. However, different censuses are produced at different years. The beale codes and natural amenities data is only completed every 10 years. I would like to copy the census data for Beale codes and amenities factors for years that do not have this data. For example, I would like to copy the data from 1974 to 1978, 1982 to 1987 and so forth. An example of the data I am working with is below:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int FIPS str9 COUNTY int YEAR byte(BEALE URBAN_INFL) float NATURAL_SCALE byte NATURAL_RANK
    17001 "ADAMS"     1974 5 7 -2.1 3
    17001 "ADAMS"     1978 . .    . .
    17001 "ADAMS"     1982 5 7 -2.1 3
    17001 "ADAMS"     1987 . .    . .
    17001 "ADAMS"     1992 5 7 -2.1 3
    17001 "ADAMS"     1997 . .    . .
    17001 "ADAMS"     2002 5 7 -2.1 3
    17001 "ADAMS"     2007 . .    . .
    17001 "ADAMS"     2012 5 7 -2.1 3
    17003 "ALEXANDER" 1974 7 8  .95 4
    17003 "ALEXANDER" 1978 . .    . .
    17003 "ALEXANDER" 1987 . .    . .
    17003 "ALEXANDER" 1992 7 8  .95 4
    17003 "ALEXANDER" 1997 . .    . .
    17003 "ALEXANDER" 2002 7 8  .95 4
    17003 "ALEXANDER" 2007 . .    . .
    17003 "ALEXANDER" 2012 3 8  .95 4
    end
    Using the Stata support documents at:
    HTML Code:
     http://www.stata.com/support/faqs/data-management/replacing-missing-values/
    I found a code in order to replace values with the previous observation after sorting by FIPS and YEAR:

    Code:
    bys FIPS: replace BEALE= BEALE[_n-1] if YEAR==1978
    Although this works for 1978 as 1974 would be the only year prior, other years may not work properly. For instance, if 1982 is missing (see the county ALEXANDER with FIPS 17003 in example) then the code below will replace 1987 with 1978 data which is ultimately from 13 years prior in 1974.

    Code:
    bys FIPS: replace BEALE= BEALE[_n-1] if YEAR==1987
    Therefore, I would need the code to replace if 1982 was available, but leave blank if 1982 was not available.
    Ultimately, due to the nature of this data and its limited change, it may not impact a considerable amount by using a 14 year difference (between 1987 and 1974 for example). However, to be safe, I am really hoping I can get some assistance in trying to figure out a method for the above problem. Thank you for any and all suggestions.

    Kind Regards,
    Amie

  • #2
    So, this is a little complicated because for county ALEXANDER, you don't have 1982 with a missing value for BEALE. You don't have any observation for year 1982. So first we have to create those. Then we can remove them when we're done. I think this will do what you want:

    Code:
    fillin COUNTY YEAR
    clonevar new_BEALE = BEALE
    by COUNTY (YEAR), sort: replace new_BEALE = BEALE[_n-1] if missing(new_BEALE)
    drop if _fillin
    By the way, thank you very much for using -dataex- to give your sample data. It makes troubleshooting so much easier!

    Comment


    • #3
      Hi Mr. Schechter,

      Yes! Friedrich Huebler introduced me to dataex and it has been wonderful to use here on Statalist. It truly makes data description easier for me and I am glad it is helpful for advisers like yourself.


      As for the code you provided, it almost works! I just tried it with the entire dataset and realized that the example dataset I gave does not entirely encompass the issues of a my larger dataset. I have found the problem is that there are sometimes multiple counties with the same name in a different state. Therefore, they have different FIPS codes but the same county name.

      For example, there are multiple ADAMS counties throughout the US. Below is a an example of this with two Adams counties and one missing 1982 observation:
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input int FIPS str9 COUNTY int YEAR byte(BEALE URBAN_INFL) float NATURAL_SCALE byte NATURAL_RANK
      17001 "ADAMS"     1974 5 7  -2.1 3
      17001 "ADAMS"     1978 . .     . .
      17001 "ADAMS"     1982 5 7  -2.1 3
      17001 "ADAMS"     1987 . .     . .
      17001 "ADAMS"     1992 5 7  -2.1 3
      17001 "ADAMS"     1997 . .     . .
      17001 "ADAMS"     2002 5 7  -2.1 3
      17001 "ADAMS"     2007 . .     . .
      17001 "ADAMS"     2012 5 7  -2.1 3
      17003 "ALEXANDER" 1974 7 8   .95 4
      17003 "ALEXANDER" 1978 . .     . .
      17003 "ALEXANDER" 1987 . .     . .
      17003 "ALEXANDER" 1992 7 8   .95 4
      17003 "ALEXANDER" 1997 . .     . .
      17003 "ALEXANDER" 2002 7 8   .95 4
      17003 "ALEXANDER" 2007 . .     . .
      17003 "ALEXANDER" 2012 3 8   .95 4
      18001 "ADAMS"     1974 2 2 -3.55 2
      18001 "ADAMS"     1978 . .     . .
      18001 "ADAMS"     1987 . .     . .
      18001 "ADAMS"     1992 2 2 -3.55 2
      18001 "ADAMS"     1997 . .     . .
      18001 "ADAMS"     2002 6 2 -3.55 2
      18001 "ADAMS"     2007 . .     . .
      18001 "ADAMS"     2012 6 2 -3.55 2
      end
      When using
      Code:
       fillin COUNTY YEAR
      it ignores the FIPS number and in this example, it does not see that Adams county in Indiana (FIPS==18001) is missing 1982. I will continue to try to work on this situation, but ultimately, I may have to move forward with the original code which will just use the last known value since the values change in relatively small proportions.

      Thank you again!

      Amie

      Comment


      • #4
        So, just a little change to the earlier solution is needed, creating a new variable that uniquely identifies counties by their FIPS and COUNTY values combined:

        Code:
        egen new_COUNTY = group(FIPS COUNTY)
        fillin new_COUNTY YEAR
        clonevar new_BEALE = BEALE
        by new_COUNTY (YEAR), sort: replace new_BEALE = BEALE[_n-1] if missing(new_BEALE)
        drop if _fillin
        drop new_COUNTY

        Comment


        • #5
          Hi Mr. Schechter,

          It worked perfect. Ironically enough, I only had one county that ends with a missing new Beale code. I guess it did not make too much of a difference in the end; however, it does make me feel better knowing that it is replacing with the appropriate values.

          Thank you very much for all your help.

          Amie

          Comment

          Working...
          X