Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • combining datasets where want to replace incorrect values of one variable

    In my master dataset I have incorrect values of l for the years 1997-2000. I want to replace these values with the correct values of l that are found in the second dataset, whilst keeping the values for 2001-2012 of the master set but replacing the values of l for 1997-2000.

    a snapshot of my master dataset that includes multiple other variables (of which the values for 1997-2000 are correct):

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input double naic float(year yq) double(ts l)
    311111 1997  971 148  7473.666717529297
    311111 1997  972 149  7517.999938964844
    311111 1997  973 150  7861.999938964844
    311111 1997  974 151  7810.999908447266
    311111 2001  101 164 19.290000915527344
    311111 2001  102 165 19.306665420532227
    311111 2001  103 166  19.30699920654297
    311111 2001  104 167 19.410999298095703
    311111 2002  201 168 18.031333923339844
    311111 2002  202 169  17.89900016784668
    311111 2002  203 170 17.991334915161133
    311111 2002  204 171 18.222000122070313
    311111 2003  301 172  17.95333480834961
    311111 2003  302 173 18.000333786010742
    311111 2003  303 174 17.992666244506836
    311111 2003  304 175 18.002334594726563
    311111 2004  401 176  17.69933319091797
    311111 2004  402 177  17.59000015258789
    311111 2004  403 178 17.657665252685547
    311111 2004  404 179  17.56999969482422
    311111 2005  501 180 17.893665313720703
    311111 2005  502 181  17.94499969482422
    311111 2005  503 182  17.93899917602539
    311111 2005  504 183 18.209665298461914
    311111 2006  601 184  18.33966636657715
    311111 2006  602 185 18.463666915893555
    311111 2006  603 186 18.743999481201172
    311111 2006  604 187   18.8613338470459
    311111 2007  701 188  18.56533432006836
    311111 2007  702 189 18.441665649414063
    311111 2007  703 190 18.736000061035156
    311111 2007  704 191 19.071334838867188
    311111 2008  801 192 18.977333068847656
    311111 2008  802 193 18.920000076293945
    311111 2008  803 194  18.89433479309082
    311111 2008  804 195  18.94499969482422
    311111 2009  901 196 19.645666122436523
    311111 2009  902 197 19.806333541870117
    311111 2009  903 198 19.895999908447266
    311111 2009  904 199 20.117000579833984
    311111 2010 1001 200 20.121999740600586
    311111 2010 1002 201 20.454666137695313
    311111 2010 1003 202 20.538665771484375
    311111 2010 1004 203 20.645334243774414
    311111 2011 1101 204 20.581666946411133
    311111 2011 1102 205 20.526334762573242
    311111 2011 1103 206 20.444334030151367
    311111 2011 1104 207  20.64266586303711
    311111 2012 1201 208 21.030000686645508
    311111 2012 1202 209 21.009666442871094
    311111 2012 1203 210  21.57699966430664
    311119 1997  971 148 10721.333526611328
    311119 1997  972 149 10781.666717529297
    311119 1997  973 150 10857.333312988281
    311119 1997  974 151 10681.666534423828
    311119 1998  981 152  10559.33334350586
    311119 1998  982 153  10758.99984741211
    311119 1998  983 154 11051.666702270508
    311119 1998  984 155 10735.666534423828
    311119 1999  991 156 10794.666721343994
    311119 1999  992 157 10904.000091552734
    311119 1999  993 158   10952.3332862854
    311119 1999  994 159 11109.333095550537
    311119 2000    1 160 10597.666721343994
    311119 2000    2 161 10694.333358764648
    311119 2000    3 162 10609.333263397217
    311119 2000    4 163 10474.666610717773
    311119 2001  101 164   33.9913330078125
    311119 2001  102 165  33.94900131225586
    311119 2001  103 166  33.83266830444336
    311119 2001  104 167  33.39033126831055
    311119 2002  201 168   32.8736686706543
    311119 2002  202 169   33.0629997253418
    311119 2002  203 170  32.67166519165039
    311119 2002  204 171  32.72200012207031
    311119 2003  301 172 31.927000045776367
    311119 2003  302 173 31.743000030517578
    311119 2003  303 174 31.352333068847656
    311119 2003  304 175 31.140666961669922
    311119 2004  401 176 31.832666397094727
    311119 2004  402 177 31.476999282836914
    311119 2004  403 178 31.314334869384766
    311119 2004  404 179 31.217666625976563
    311119 2005  501 180   30.5049991607666
    311119 2005  502 181 30.557334899902344
    311119 2005  503 182  30.54666519165039
    311119 2005  504 183 30.645334243774414
    311119 2006  601 184 30.569000244140625
    311119 2006  602 185 30.676334381103516
    311119 2006  603 186 30.516000747680664
    311119 2006  604 187 30.665666580200195
    311119 2007  701 188 31.040000915527344
    311119 2007  702 189  31.49566650390625
    311119 2007  703 190 31.513999938964844
    311119 2007  704 191 31.836666107177734
    311119 2008  801 192  32.09866714477539
    311119 2008  802 193  32.03266525268555
    311119 2008  803 194  31.94099998474121
    311119 2008  804 195 31.758333206176758
    311119 2009  901 196 31.666000366210938
    end

    The correct values for l in the years 1997-2000 are given by:


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long naic int year byte qtr double l float yq
    311111 1997 1  7.473666717529297 971
    311111 1997 2  7.517999938964843 972
    311111 1997 3  7.861999938964844 973
    311111 1997 4  7.810999908447266 974
    311111 1998 1 7.1313334045410155 981
    311111 1998 2  7.275999969482422 982
    311111 1998 3  7.220666595458985 983
    311111 1998 4  6.960333282470703 984
    311111 1999 1  7.442666687011719 991
    311111 1999 2 7.3666666259765625 992
    311111 1999 3  7.460999969482422 993
    311111 1999 4  7.327666809082031 994
    311111 2000 1  7.397333343505859   1
    311111 2000 2   7.18933334350586   2
    311111 2000 3  7.208333312988281   3
    311111 2000 4  7.241333282470703   4
    311119 1997 1 10.721333526611328 971
    311119 1997 2 10.781666717529298 972
    311119 1997 3 10.857333312988281 973
    311119 1997 4 10.681666534423828 974
    311119 1998 1 10.559333343505859 981
    311119 1998 2  10.75899984741211 982
    311119 1998 3 11.051666702270508 983
    311119 1998 4 10.735666534423828 984
    311119 1999 1 10.794666721343994 991
    311119 1999 2 10.904000091552735 992
    311119 1999 3   10.9523332862854 993
    311119 1999 4 11.109333095550538 994
    311119 2000 1 10.597666721343995   1
    311119 2000 2 10.694333358764649   2
    311119 2000 3 10.609333263397216   3
    311119 2000 4 10.474666610717774   4
    311211 1997 1 2.1346667022705077 971
    311211 1997 2 2.1440000457763673 972
    311211 1997 3  2.167000030517578 973
    311211 1997 4 2.2403333892822266 974
    311211 1998 1 2.4243333587646485 981
    311211 1998 2  2.235999984741211 982
    311211 1998 3              2.278 983
    311211 1998 4  2.295333343505859 984
    311211 1999 1              2.248 991
    311211 1999 2  2.182333343505859 992
    311211 1999 3 2.2743333587646486 993
    311211 1999 4 2.3406666107177734 994
    311211 2000 1 2.3419999847412107   1
    311211 2000 2 2.2726667022705076   2
    311211 2000 3  2.376333297729492   3
    311211 2000 4 2.3756666259765624   4
    311212 1997 1 2.1860000610351564 971
    311212 1997 2 2.1589999389648438 972
    311212 1997 3  2.246999938964844 973
    311212 1997 4  2.342333312988281 974
    311212 1998 1  2.149000061035156 981
    311212 1998 2              2.025 982
    311212 1998 3 1.9656666870117188 983
    311212 1998 4 2.2366666259765626 984
    311212 1999 1 2.3176666870117186 991
    311212 1999 2 2.0959999389648436 992
    311212 1999 3 2.0243333129882815 993
    311212 1999 4  2.221333312988281 994
    311212 2000 1              2.119   1
    311212 2000 2 2.1719999389648437   2
    311212 2000 3 2.0836666259765626   3
    311212 2000 4 2.2383333129882814   4
    end

  • #2
    You can -merge- the correct values onto your original file. (See -help merge-.) While it's possible to do this with the -update- option on -merge-, I'd rather bring in the correct values under a new variable name, check them out, and use them to replace the bad original values after eyeballing them. You did not tell us which variables link observations between your two files (the "key" variables, in common terminology), but I surmise that they are naic and yq, and that observations identified by those variables appear only once in each file.


    Code:
    use CorrectFile.dta
    rename l lcorrect
    save CorrectFile.dta
    //
    use "YourOriginalFile.dta"
    merge 1:1 naic yq using YourCorrectFile.dta", keepusing(lcorrect)
    // Your example data has observations identified by naic/yq that are present in "correct" file but
    // are not in your original.  I'll presume these are irrelevant and drop them.
    drop if _merge == 2 // 
    //
    // Presuming you have examined the results and they look ok
    replace l = lcorrect if inrange(year, 1997, 2000)
    drop lcorrect

    Comment

    Working...
    X