Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Error in importing Excel data into STATA

    Hi everyone,

    I am importing panel data from an Excel file into STATA, but for three dichotomous variables (original value is 1/0) I get missing values (i.e., a '.') for certain years. The data in the Excell file is arranged in multiple sheets which are named according to the year, e.g., 2005, 2006.....2019, and 2020. For 2020 and 2019, I get correct values of 1 or 0 in the STATA. But for all other years, I get a "." I am using the following loop to import my data:


    forvalues i= 2005/2020 {
    import excel using fin_aid_data_2023.xlsx, firstrow sheet(`i') clear

    if `i'==2005 {
    save fin_aid_data, replace
    }

    else {
    append using fin_aid_data, force
    save fin_aid_data, replace
    }

    }

    I will appreciate your advice.

    Ijaz

  • #2
    -force- is always a red flag, and is probably the source of your problem.

    Why did you use the -force- option in the first place? I imagine that it was because without it, the -append- command broke your code with an error message saying that there was some variable (perhaps more than one) whose type is inconsistent between the data in memory and the file being appended (i.e., one is a string and the other is numeric). So you decided to sweep that under the rug with -force-. Now you are paying the price for that. When there is inconsistency in types between variables, the only way that -append- can proceed is by replacing the inconsistent values with missing values, because, unlike a spreadsheet, Stata cannot have a variable with different types in different observations.

    If the scenario I have outlined above is what happened, then the solution lies in fixing up the incompatibility. Since what you are expecting is a 0/1 variable, which should be numeric, you need to go back to the spreadsheets for which you are getting missing values and identify why that variable is not numeric in those tabs. One possibility is that the first row of data is blank for the affected variables. Another possibility is that somewhere in the columns for those variables you will find actual non-numeric values such as "?" or "N/A" or something like that. Once you have determined what is causing the problem, you can modify your code so that in the affected files, you -destring- those variables, specifying the unwanted non-numeric material in the -ignore()- option. (See -help destring- if you are not familiar with how to use it. And, please, don't use the -force- option with -destring- either!) Then you will be able to smoothly append everything.

    If this scenario is not what happened, please go back and import each tab into a separate Stata data set, and then use the -dataex- command to show example data from a couple of them that imported correctly and from a couple of them that did not so that further investigation can be done. I cannot emphasize enough that for this purpose nothing but a -dataex- display of the example data will be helpful: do not waste your time with a screenshot, or -list-, or a hand-made table. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Comment


    • #3
      I think that Clyde Schechter guess is probably right - but I'm going to suggest an alternative - when importing from Excel, use the "allstring" option to that all variables are imported as strings - it is easy, once in Stata to use, e.g., -destring- to make appropriate variables numeric

      Comment


      • #4
        Thank you so much Clyde Schechter and Rich Goldstein. Clyde is right that without using -force- I was getting this message: "variable VAR1 is str4 in master but byte in using data. You could specify append's force option to ignore this string/numeric mismatch." I have just noticed that all three problematic dichotomous variables had "NULL" in a couple of observations. So, I used -allstring- option to import all the data as strings and converted it into numeric form. Thank you again both of you.

        Comment


        • #5
          General moral of the story: don't use -force- options. Or at least don't use them unless you are 100% certain that you know why you are getting those error messages and that ignoring them will not create problems. And even in those circumstances, it is usually better to fix the problem that yields the error messages anyway.

          Comment


          • #6
            Thank you, Clyde. My apologies; I didn't see your last message until today.

            Comment

            Working...
            X