  • Stata not recognising binary variables as integers

    Hello,

    I have a binary variable (X1) which is already coded as ones and zeroes in an Excel spreadsheet, and I'm trying to interact it with a continuous variable (X2). Every time I do, I get the error message 'factor variables may not contain noninteger values' for the binary variable. I've tried declaring it as a factor variable with xtreg y i.x1#c.x2, fe, but that's when I keep getting the error message. There are maybe one or two missing values, but other than that there's nothing out of place in the data. I wondered if it was something to do with the missing values.

    Does anyone have any ideas on why this is happening or what I can do?

    Thanks in advance.
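The error in #1 can be reproduced on simulated data, which helps separate a data problem from a syntax problem. This is a minimal sketch under stated assumptions: the data are invented, the names y, x1 and x2 just follow the question, and -regress- stands in for -xtreg ..., fe- so that no panel has to be declared with -xtset- (the assumption being that the factor-variable check behaves the same for both commands). The interaction runs while x1 holds only 0 and 1, and fails once a single fractional value is present.

```stata
* Simulated data (hypothetical): x1 should be 0/1, x2 is continuous
clear
set seed 2022
set obs 100
generate double x1 = runiform() < 0.5
generate double x2 = rnormal()
generate double y  = 1 + 0.5*x1*x2 + rnormal()

* regress stands in for xtreg ..., fe (assumed: same factor-variable check)
* This runs, because x1 holds only 0 and 1
regress y i.x1#c.x2

* Slip in one fractional value, as can happen after an import
replace x1 = 0.165697 in 1

* Factor-variable notation now refuses x1; capture keeps a do-file running
capture noisily regress y i.x1#c.x2
display "return code: " _rc
```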

  • #2
    One way to find out:

    Code:
    l X1 if X1!= int(X1), sep(0)
    Last edited by Andrew Musau; 01 Aug 2022, 08:02.
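The check in #2 works because -int()- truncates to a whole number, so any value with a fractional part differs from its truncation. Missing values are not flagged: int(.) is itself missing and . != . is false, which fits the point made in #3 that missing values are not the cause. A sketch on hypothetical toy data, adding -count- to get a total before listing:

```stata
* Hypothetical toy data: a "binary" X1 with one stray fraction and one missing
clear
input double X1
0
1
.165697
1
.
end

* How many values are not whole numbers? (the missing value is not counted)
count if X1 != int(X1)

* Show them, as in #2; sep(0) suppresses the separator lines
list X1 if X1 != int(X1), sep(0)
```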

    • #3
      May:
      welcome to this forum.
      I would investigate the -format- of your variable(s) (and then share what you find on this forum), as in the following toy example:
      Code:
      . use "C:\Program Files\Stata17\ado\base\a\auto.dta"
      (1978 automobile data)
      
      . format*
      
        Variable name  Display format
        -----------------------------
        make           %-18s
        price          %8.0gc
        mpg            %8.0g
        rep78          %8.0g
        -----------------------------
        headroom       %6.1f
        trunk          %8.0g
        weight         %8.0gc
        length         %8.0g
        turn           %8.0g
        -----------------------------
        displacement   %8.0g
        gear_ratio     %6.2f
        foreign        %8.0g
        -----------------------------
      
      .
      In addition, what you're experiencing has nothing to do with missing values, as Stata simply omits all observations with a missing value in any of the variables used (listwise deletion).
      Kind regards,
      Carlo
      (Stata 19.0)
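Both points in #3 can be checked on the auto data shipped with Stata. In this sketch -sysuse auto- replaces the hard-coded path, -format- with a variable list and no format displays the current display formats, and the regression illustrates listwise deletion: rep78 has missing values, yet i.rep78 raises no factor-variable error, and the estimation sample simply leaves those observations out.

```stata
sysuse auto, clear

* Display formats (as in #3) for two variables
format foreign headroom

* rep78 has missing values ...
count if missing(rep78)

* ... yet factor-variable notation accepts it; those rows are just dropped
regress price i.rep78
display "observations used: " e(N) " of " _N
```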

      • #4
        Hello,

        Thank you for the responses. I couldn't get your command to work, Andrew.

        I used factor* and it says %10.0g, which is what it says for all the other variables except the id variable.
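-describe- reports the storage type alongside the display format, which makes a %10.0g like the one in #4 easier to read: %10.0g is the default display format Stata gives a double, and by itself it says nothing about whether the stored values are whole numbers. A sketch with a hypothetical variable x:

```stata
clear
set obs 3
* A hypothetical double holding 0, .5 and 1
generate double x = (_n - 1) / 2

* Storage type and display format side by side
describe x

* Extended macro functions return the same information inline
display "`: type x'" "  " "`: format x'"
```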

        • #5
          Originally posted by May Kingston
          I couldn't get your command to work Andrew
          What does that mean? It did not display anything? Can you show exactly what you typed and what you got back?

          • #6
            Please show the results of

            Code:
            tab X1
            or

            Code:
            tab x1
            depending on which is the real name. Indeed, could that be the problem? You use both names in your post #1, and it’s entirely legal to have one of each.
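Stata variable names are case-sensitive, so X1 and x1 can coexist in one dataset and hold entirely different data, which is why it matters which name is real. A small illustration; both variables are hypothetical:

```stata
clear
set obs 4
* Two distinct variables whose names differ only in case
generate byte   X1 = mod(_n, 2)     // 1, 0, 1, 0
generate double x1 = _n / 4         // .25, .5, .75, 1

tab X1
tab x1
```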

            • #7
              So that indicates that your variable is stored in double precision. That is fine: it could still hold only integers as far as factor-variable notation is concerned. However, we know that that is not the case. Next question: are all values nonnegative? By all values I mean all, without a single exception, so no missing values coded as -99.
              ---------------------------------
              Maarten L. Buis
              University of Konstanz
              Department of history and sociology
              box 40
              78457 Konstanz
              Germany
              http://www.maartenbuis.nl
              ---------------------------------
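The question in #7 matters because factor-variable notation requires nonnegative integers, so a missing-value code such as -99 would break it even in a variable that is otherwise 0/1. A sketch on hypothetical data showing the check, plus one way to turn such a code into system missing with -mvdecode- (assuming -99 is the only missing-value code in use):

```stata
* Hypothetical data: 0/1 with -99 used as a missing-value code
clear
input double X1
0
1
-99
1
end

* Any negative values? Factor variables may not contain them
count if X1 < 0

* Recode -99 to system missing (.)
mvdecode X1, mv(-99)

* What remains should be nonnegative and whole
assert X1 >= 0 & X1 == int(X1) if !missing(X1)
```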

              • #8
                x1/X1 is actually called ci2; I just called it x1 for simplicity. This is what happens when I use tab, which is bizarre: it's almost as if it's pulling the data from another variable, because ci2 is purely '0' and '1'.

                tab ci2

                        ci2 |      Freq.     Percent        Cum.
                ------------+-----------------------------------
                          0 |        130       10.23       10.23
                   .0604027 |          3        0.24       10.46
                   .1208054 |          1        0.08       10.54
                    .165697 |        758       59.64       70.18
                   .1812082 |          2        0.16       70.34
                   .2260998 |         16        1.26       71.60
                   .2416109 |          3        0.24       71.83
                   .2488163 |         12        0.94       72.78
                   .2834731 |          5        0.39       73.17
                   .2865025 |          3        0.24       73.41
                   .3020136 |         15        1.18       74.59
                    .309219 |          1        0.08       74.67
                   .3469052 |          3        0.24       74.90
                   .3696218 |          1        0.08       74.98
                   .4073079 |          4        0.31       75.30
                   .4145133 |        101        7.95       83.24
                   .4491701 |          3        0.24       83.48
                    .474916 |          5        0.39       83.87
                   .5095729 |          1        0.08       83.95
                   .5353188 |          2        0.16       84.11
                   .5926921 |          2        0.16       84.26
                   .5957215 |          2        0.16       84.42
                   .6303783 |          1        0.08       84.50
                   .6530948 |          3        0.24       84.74
                   .6561242 |          5        0.39       85.13
                    .690781 |          1        0.08       85.21
                   .6979864 |         26        2.05       87.25
                   .7134976 |          3        0.24       87.49
                   .7165269 |         10        0.79       88.28
                   .7511837 |          2        0.16       88.43
                   .7583891 |          3        0.24       88.67
                   .7739003 |          2        0.16       88.83
                   .8187919 |          5        0.39       89.22
                    .834303 |          5        0.39       89.61
                   .8791946 |          5        0.39       90.01
                   .9395973 |          9        0.71       90.72
                          1 |        118        9.28      100.00
                ------------+-----------------------------------
                      Total |      1,271      100.00

                • #9
                  I think what has happened is that Excel is rounding the values up and down and displaying them as ones and zeroes, while Stata is reading the original underlying values. But the data are supposed to be binary at the source, so this is really weird.
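The situation described in #9 has a direct Stata analogue: a display format with no decimal places shows fractional values as 0 and 1, while the stored values stay unchanged. Whether the spreadsheet's cell format did the same here is an assumption the thread does not confirm. A sketch on two values taken from the table in #8:

```stata
clear
input double ci2
.165697
.6979864
end

* Zero decimal places: these values are displayed rounded, as 0 and 1
format ci2 %9.0f
list ci2

* The stored values are untouched, so factor-variable checks still see fractions
format ci2 %10.0g
list ci2
count if ci2 != int(ci2)
```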

                  • #10
                    Clearly Stata was right in reporting what it did. Perhaps the 0s and 1s occur in the first 248 or so observations, and the other values are in later observations and originated elsewhere. Such messes can arise in various ways, for example through copy and paste.
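The idea in #10 can be checked by looking at where the fractional values sit in the current sort order. A sketch on hypothetical data; obsno is an illustrative name, and the check only says something about the original file if the data have not been re-sorted since import:

```stata
* Hypothetical data: clean 0/1 values first, fractions later
clear
input double ci2
0
1
0
1
.165697
.4145133
.165697
end

* Record the current observation number
generate long obsno = _n

* Range of observation numbers holding non-integer values
summarize obsno if ci2 != int(ci2)
```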

                    • #11
                      #10 crossed with #9. If Stata is so stupid that it can’t infer the intent you have from the content you supply, then you may need to give it a push with the round() function.
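If, after tracing where the fractional values came from, rounding really is the right repair, -round()- can be applied to a new variable so that the raw values stay available for comparison. A sketch; ci2_r is a hypothetical name, and whether rounding is appropriate depends on what those values mean, which the thread leaves open:

```stata
clear
input double ci2
0
.165697
.6979864
1
.
end

* round(x) rounds to the nearest integer; missing stays missing
generate byte ci2_r = round(ci2)

* Cross-tabulate raw against rounded values before trusting the recode
tab ci2 ci2_r, missing

* The new variable should be binary wherever it is not missing
assert inlist(ci2_r, 0, 1) if !missing(ci2_r)
```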
