  • Trouble Exporting/Importing data and merging files

    Hi all, I am trying to add geocoded variables to a large dataset and have run into a strange problem. When I export my data from Stata to csv and then re-import it into Stata, the summary statistics for the variables of interest are the same, but the exact values are different. To be more specific, the file I exported had no duplicate observations, but when I re-import it into Stata, it reports duplicate observations. Moreover, when I merge the new file (imported from csv) into the old (Stata) file, many observations fail to match. Has anyone else had this problem? I have reduced the dataset to three variables: two of type "float" and one string. There are close to 800,000 observations, so it is difficult for me to work out manually what has been dropped and what has been duplicated. Any advice would be appreciated. Thanks!

  • #2
    This sounds like it may be a precision problem:
    Code:
    search precision
    I don't understand at all why you are exporting to csv in the first place.


    • #3
      Welcome to Statalist.

      Building on Rich's question, is it the case that you exported coordinates to a csv file, fed the csv file into a geocoding process that appended something useful (like city or ZIP code or some such), and are now attempting to match the geocoded file back to the original using the merge command in order to add the appended information to the original file?

      If so, I agree with Rich that the problem lies in the loss of precision when exporting to csv. I cannot see how to make that work reliably.
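
      As a rough illustration of how a round trip through text can change a stored value, here is a sketch. The coordinate and the variable names lon, lon_txt, and lon2 are made up, and the %9.0g format merely stands in for whatever number of digits actually ends up in your csv file.
      Code:
      clear
      set obs 1
      generate float lon = -118.2436849
      * write the value as text with about 7 significant digits
      generate lon_txt = string(lon, "%9.0g")
      * read the text back into a float, as a re-import would
      generate float lon2 = real(lon_txt)
      * show more digits than the default display format does
      format lon lon2 %12.7f
      list lon lon_txt lon2
      * counts the observations whose stored value changed in the round trip
      count if lon != lon2
      Wherever the two values differ, an exact match on the coordinate will fail for that observation, and distinct coordinates that round to the same text will come back as duplicates.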

      I suggest creating an identifier in your original data (perhaps as simple as generate obsID = _n) and including it in the csv file that goes to geocoding. Presumably the ID will then be carried through to the output of that process, and you can use it, rather than the coordinates, as the key when you merge the geocoding output with your original data.
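
      A minimal sketch of that workflow follows. All file names (original.dta, original_with_id.dta, to_geocode.csv, geocoded.csv) and the variables lat, lon, and zipcode are hypothetical, and it assumes the geocoding output keeps the obsID column unchanged.
      Code:
      use "original.dta", clear
      * an integer key survives a trip through text exactly
      generate long obsID = _n
      * confirm the key uniquely identifies observations
      isid obsID
      save "original_with_id.dta", replace
      * send the key along with the coordinates
      export delimited obsID lat lon using "to_geocode.csv", replace

      * ... geocoding happens outside Stata and returns geocoded.csv ...

      * case(preserve) keeps the variable name obsID as written in the file
      import delimited using "geocoded.csv", case(preserve) clear
      isid obsID
      * keep only the key and the appended information
      keep obsID zipcode
      merge 1:1 obsID using "original_with_id.dta"
      tabulate _merge
      Because the merge key is an integer rather than a float coordinate, the match does not depend on how many decimal places the csv file carries.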

      If this isn't a correct interpretation of your problem, then please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. The more you help others understand your problem, the more likely others are to be able to help you solve your problem.
