  • Duplicating values to fill data

    Hi everyone!

    I have another question relating to the data below (see attachment). If a gvkey has RE_total == REatCost in year 1994, I want to fill YearBought with 1994 for every observation of that gvkey. I am currently using the following code:

    bys gvkey: replace YearBought = 1994 if year==1994 & RE_total==REatCost & RE_total!=0 & YearBought==.

    This code, however, gives me the data below: it fills in 1994 only in the one observation where the condition holds, not for the entire gvkey (company code).

    Do you have any suggestions?

    Many thanks
    Rick





  • #2
    Your question is a bit unclear. Is this something special for 1994 only? Or, if you have a gvkey where RE_total == REatCost in, say, 2005, would you then want to replace YearBought with 2005, assuming YearBought is currently missing and the values of RE_total and REatCost are non-zero?

    Also, please read the Forum FAQ about how to best show example data. Screenshots are pretty much the least helpful way to give example data. If somebody wants to test some code to make sure it works in your data, it is impossible to import the data from a screenshot into Stata, and it is too burdensome to key it in by hand. That is why we have the -dataex- command. Please always use it to show example data. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
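As a rough illustration of the -dataex- workflow described above, here is a minimal sketch. It assumes the auto dataset that ships with Stata, and the variables and observation range are arbitrary choices made only for this example.

```stata
* Illustration only: a dataset shipped with Stata, arbitrary variables
sysuse auto, clear

* ssc install dataex      (only needed if -dataex- is not already installed)

* Print a copy-and-paste-ready listing of three variables, first 5 observations
dataex make price mpg in 1/5
```

For the data in this thread, the call would presumably look something like -dataex gvkey year RE_total REatCost YearBought in 1/50-, with the range chosen so that it covers the firms in question.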

    Please post back with a response to the question raised, and with example data given by -dataex-.



    • #3
      I am sorry for the inconvenience. To answer your question: it is something special for 1994 only, but it happens regularly throughout the dataset. As you can see in the partial dataset below, it happens, for instance, at gvkey 1010 and 1019. The full dataset comprises more than 500K observations. I hope this clarifies things.

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      
      clear
      input long gvkey double year long cbsacode float(RE_total REatCost YearBought)
      1004 1992 .    36.998 37.565002 1977
      1004 1993 . 37.565002 37.565002 1977
      1004 1994 .    39.328 37.565002 1977
      1004 1995 .    39.843 37.565002 1977
      1004 1996 .    58.807 37.565002 1977
      1004 1997 .    63.569 37.565002 1977
      1004 1998 .    72.454 37.565002 1977
      1004 1999 .    74.718 37.565002 1977
      1004 2000 .    77.151 37.565002 1977
      1004 2001 .    76.822 37.565002 1977
      1004 2002 .    74.407 37.565002 1977
      1004 2003 .     64.41 37.565002 1977
      1004 2004 .    58.749 37.565002 1977
      1004 2005 .     62.67 37.565002 1977
      1004 2006 . 74.392006 37.565002 1977
      1004 2007 .   115.044 37.565002 1977
      1004 2008 .     87.91 37.565002 1977
      1004 2009 .    94.389 37.565002 1977
      1004 2010 .         . 37.565002 1977
      1004 2011 .         . 37.565002 1977
      1004 2012 .         . 37.565002 1977
      1004 2013 .         . 37.565002 1977
      1004 2014 .         . 37.565002 1977
      1004 2015 .         . 37.565002 1977
      1004 2016 .         . 37.565002 1977
      1004 2017 .         . 37.565002 1977
      1004 2018 .         . 37.565002 1977
      1004 2019 .         . 37.565002 1977
      1004 2020 .         . 37.565002 1977
      1010 1992 .         .       1.4    .
      1010 1993 .         .       1.4    .
      1010 1994 .       1.4       1.4 1994
      1010 1995 .       1.7       1.4    .
      1010 1996 .       1.8       1.4    .
      1010 1997 .       1.9       1.4    .
      1010 1998 .       2.7       1.4    .
      1010 1999 .       2.7       1.4    .
      1010 2000 .       2.7       1.4    .
      1010 2001 .       2.7       1.4    .
      1010 2002 .       2.7       1.4    .
      1010 2003 .     138.4       1.4    .
      1013 1992 .         0         0    .
      1013 1993 .         0         0    .
      1013 1994 .         0         0    .
      1013 1995 .         0         0    .
      1013 1996 .         0         0    .
      1013 1997 .         0         0    .
      1013 1998 .         0         0    .
      1013 1999 .         0         0    .
      1013 2000 .     142.7         0    .
      1013 2001 .     108.8         0    .
      1013 2002 .       7.6         0    .
      1013 2003 .         3         0    .
      1013 2004 .        11         0    .
      1013 2005 .        20         0    .
      1013 2006 .      10.1         0    .
      1013 2007 .         6         0    .
      1013 2008 .       6.6         0    .
      1013 2009 .       9.1         0    .
      1013 2010 .       6.3         0    .
      1019 1992 .         .      .344    .
      1019 1993 .         .      .344    .
      1019 1994 .      .344      .344 1994
      1019 1995 .      .526      .344    .
      1019 1996 .      .951      .344    .
      1019 1997 .     1.001      .344    .
      1019 1998 .      .307      .344    .
      1019 1999 .      .437      .344    .
      1019 2000 .      .464      .344    .
      1019 2001 .      .364      .344    .
      1025 1992 .         .         0    .
      1025 1993 .         .         0    .
      1025 1994 .         0         0    .
      1025 1995 .         0         0    .
      1034 1992 .    59.303    66.379 1983
      1034 1993 .    66.379    66.379 1983
      1034 1994 .   111.669    66.379 1983
      1034 1995 .   119.452    66.379 1983
      1034 1996 .   126.695    66.379 1983
      1034 1997 .   125.168    66.379 1983
      1034 1998 .   151.159    66.379 1983
      1034 1999 .   146.723    66.379 1983
      1034 2000 .   205.623    66.379 1983
      1034 2001 . 295.20102    66.379 1983
      1034 2002 .   317.386    66.379 1983
      1034 2003 . 287.51202    66.379 1983
      1034 2004 .   289.713    66.379 1983
      1034 2005 .    105.94    66.379 1983
      1034 2006 .   124.986    66.379 1983
      1034 2007 .   173.554    66.379 1983
      1036 1992 .   113.409    129.24 1981
      1036 1993 .    129.24    129.24 1981
      1036 1994 .         .    129.24 1981
      1036 1995 .         .    129.24 1981
      1036 1996 .         .    129.24 1981
      1036 1997 .         .    129.24 1981
      1036 1998 .         .    129.24 1981
      1036 1999 .         .    129.24 1981
      1036 2000 .         .    129.24 1981
      1037 1992 .         .         0    .
      end



      • #4
        Code:
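        * Flag every observation of a gvkey that meets the condition in 1994
        by gvkey, sort: egen Bought1994 = ///
            max(year == 1994 & RE_total == REatCost & !inlist(RE_total, 0, .))
        * Fill YearBought across the whole gvkey wherever it is still missing
        replace YearBought = 1994 if Bought1994 & missing(YearBought)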
        will probably do the trick.
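To see why a group-level flag is needed, here is a minimal sketch on made-up toy data (the values and the helper variables hit and anyhit are hypothetical, not taken from the real dataset). The row-wise condition is true in only one observation per firm, which is why a plain -replace ... if- touches only that row. Taking the within-gvkey maximum of the condition copies the 1 to every observation of the firm.

```stata
* Toy data (made up for illustration): firm 1 qualifies in 1994, firm 2 does not
clear
input long gvkey int year float(RE_total REatCost YearBought)
1 1993 . 2 .
1 1994 2 2 .
1 1995 3 2 .
2 1994 5 4 .
2 1995 6 4 .
end

* The row-wise test is true in only one observation (firm 1, 1994)
gen byte hit = year == 1994 & RE_total == REatCost & !inlist(RE_total, 0, .)

* The within-gvkey maximum spreads that 1 to every row of the same firm
by gvkey, sort: egen byte anyhit = max(hit)
replace YearBought = 1994 if anyhit & missing(YearBought)

list, sepby(gvkey)
```

In this toy example every observation of firm 1 should end up with YearBought equal to 1994, while firm 2 stays missing.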

        My worry is about how the variables RE_total and REatCost were created. I see that they are numbers with 2 or 3 decimal places. There may be precision issues. Most decimal fractions have no exact finite-precision binary representation (just like, for example, 1/3 has no exact finite-precision decimal representation). Consequently, when calculations are done with floating point numbers, there can be rounding and truncation errors. So even if the theoretically correct value for both numbers is .344, if they were calculated in different ways, the underlying representations that Stata is working with might differ in their last few binary digits as the rounding and truncation errors accumulate, and this code will then not recognize them as equal. Conditioning anything on exact equality of floating point numbers is hazardous (in any software that relies on binary floating point calculations, not just Stata). So I would check the results very carefully to see if anything is missed with this code. Alternatively, you might want to consider relaxing -(RE_total == REatCost)- to something like -(abs(RE_total - REatCost) < 0.0005)-, which would capture agreement to 3 decimal places, or something like that.
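The precision concern can be sketched on made-up data. As an assumption for illustration only, the example stores one variable as a float and the other as a double, which is a simple way to force two slightly different binary representations of .344. In the thread's data both variables are floats, so any mismatch there would come from how the values were calculated. The variable names exact and close are hypothetical helpers.

```stata
* Illustration only: the same decimal held as a float and as a double
clear
input long gvkey int year float RE_total double REatCost
1 1994 .344 .344
1 1995 .526 .344
end

* Exact test fails: the float and the double differ in their trailing bits
gen byte exact = RE_total == REatCost

* Tolerance test succeeds in 1994: the two agree to 3 decimal places
gen byte close = abs(RE_total - REatCost) < 0.0005

* Tolerant version of the group flag
by gvkey, sort: egen byte Bought1994 = ///
    max(year == 1994 & close & !inlist(RE_total, 0, .))

list
```

The tolerance of 0.0005 follows the suggestion above. Whether it suits the real data depends on how many decimal places the source values actually carry.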



        • #5
          Thanks a lot, it worked!
