Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Managing missing values from Excel to Stata (good practice)

    Dear Statalisters,

    I am writing you because I would like to ask you something related with good practices. I have an Excel file that have missing values. I am going to merge this Excel with a Stata file I have. Now, when I merge two files, sometimes with the missing values I have duplicates which I track with a codebook command before and after the merge.
    Now, specifically, I am writing to ask you what is a good practice, what do you recommend me, managing the missing values in excel (and Stata) with periods or empty cells?

    Thank you very much for your help.

    Best regards !!

  • #2
    If you put "." in the Excel cells, Stata will see non-numeric content in that column, and make that column a string variable, which often is not what you want. If the cell is empty, Stata will treat it as missing, represent it with the default "." value, and treat that variable as numeric, presuming there is no other non-numeric content in the Excel column. The latter is more desirable, I should think. Another thing I have done is to have data entry done in Excel using some special value (e.g., -1) rather than a blank cell to represent missing data. That way, accidentally empty cells will stand out. It's easy to later in Stata convert such values to one of ., .a, .b, ... with -recode- or -mvdecode-.

    Duplicate missing values, as you mention, often arise for me because there are rows at the end of the Excel sheet that appear blank but are not actually so, with stray blank characters and such. The strategy I recommend is to either a) go to the end of the Excel sheet and delete many rows below it and save it; or b) manually specify the row/column range on the -import excel- command so as to avoid any such rows.

    I hope this is on target for what you were thinking.

    Comment


    • #3
      Dear Mike,
      Thank you for your great advice, those are very good practices to have in mind, thank you again !!

      Comment


      • #4
        Dear Mike, I hope you could see this message, if not, I really appreciate any help.
        I followed your advice and I used an special value (e.g., -999) rather a blank cell to represent missing data. I just did a merge and now I have 104 variables containing missing data.
        My question is, are there any fast way to change the "-999" on the 104 variables at the same time and tell stata that those special value actually is missing data?
        I am using Stata 15.


        Thank you very much again for your help,

        Alejandro

        Comment


        • #5
          Code:
          help mvdecode

          Comment


          • #6
            Thank you Nick !! Ckecked ...

            Comment

            Working...
            X