Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Convert variables to numeric results in data loss when a regression is run

    Dear all,

    A dataset with 11 variables was imported to STATA using File -> Import Excel spreadsheet. Then, Data -> Create or change data -> Convert variables from string to numeric was used.

    Stata recognised two types of numeric variables: byte for dummy and double for other. Function "describe" identifies 184 observations, but a regression that I run provides me with only 142 observations.

    Dataex reveals that some values were replaced into dots instead of numbers, however in the original excel file all values seem to be fine. It contains numerical values and there should not be any issues.

    Could you please help me fix this problem?
    I'm looking forward to any ideas on how it can be fixed

  • #2
    Madlen Myers the "dots" represent missing values in Stata and a regression drops all missing values in the dependent and independent variables

    Comment


    • #3
      Most likely the following thing has happened:

      Code:
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input float var1 str3 var2
      1 "bla"
      2 "5"  
      end
      . destring, force replace var1 already numeric; no replace var2: contains nonnumeric characters; replaced as byte (1 missing value generated) . list +-------------+ | var1 var2 | |-------------| 1. | 1 . | 2. | 2 5 | +-------------+ .

      Comment


      • #4
        Thank you Tom Scott and Joro Kolev for your answers. Yes, indeed that is the issue, but I'm looking for solutions. Please, understand that in the original excel files I have no missing values. I want to fix this mistake in stata to have a complete dataset to run regressions.

        Comment


        • #5
          Find some of your missing values in the Stata numeric data.

          Then find the corresponding values in the Stata string data.

          Then find the corresponding values in the Excel data.

          Please understand that
          • in the original Excel files you have cells containing values that are not interpretable as numbers by Stata
          • the Excel columns they appear in are imported as Stata string variables
          • when the string variables are converted to numeric variables, Stata can do no better than convert them to missing
          So your problem is to find these cells in Excel and try to understand in what way they are different from other values that were successfully imported.

          We can perhaps help, but we cannot see your data, as you have shown us nothing. Since you know dataex, use it to show the string variable and the numeric variable containing missing values.

          Comment


          • #6
            Thank you William Lisowski! According to FAQ, I'm not allowed to share screenshots or files from Stata. I will describe two other attempts to fix it after having read your comment.

            1) I tried to format my original data in Excel by multiplying every value by a thousand to have integers (I'm working with cumulative abnormal returns and they all have 4 decimal points after comma). However, the amount of missing values remained the same

            2) I resaved my original file as a csv, but unfortunately it didn't help either as it was read incorrectly.

            Is there anything else that can help?

            Comment


            • #7
              Correct, a screen shot is rarely usefui.

              Instead, follow the advice in post #5 and use dataex to display some of the "before-and-after" string and (missing) numeric values. The output of dataex will look something like the following.
              Code:
              ----------------------- copy starting from the next line -----------------------
              [CODE]
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input int(x1 x2 x3) float x4 int x5 byte x6
               4195 24 1   2 10 0
              10371 16 3 3.5 17 0
               4647 28 3   2 11 0
              ...
               5079 24 4 2.5  8 1
               8129 21 4 2.5  8 1
               4296 21 3 2.5 16 1
              end
              label values x6 yesno
              label def yesno 0 "No", modify
              label def yesno 1 "Yes", modify
              [/CODE]
              ------------------ copy up to and including the previous line ------------------
              Copy all the lines starting with the line conatining [CODE] and finishing with the line containing [/CODE] and paste them into your post.

              Comment


              • #8
                You wrote in post #6
                I'm working with cumulative abnormal returns and they all have 4 decimal points after comma)
                What you mean, I think, is that your string numbers have a comma followed by four digits
                Code:
                3,1415
                rather than a decimal point followed by four digits
                Code:
                3.1415
                In that case you neglected to to tell Data -> Create or change data -> Convert variables from string to numeric to treat commas as decimal points.

                Comment


                • #9
                  In short, if we do not see how your data looks like before, and after conversion, and you need to provide these samples using -dataex-, it is all guessing what has gone wrong.

                  Provide one data sample using -dataex- before you did the conversion, and one data sample with -dataex- after you did the conversion.

                  Comment

                  Working...
                  X