
  • Treat numeric variables as factor variables

    I have a dataset with 3500 companies. Each company has a unique company ID (variable name id) that consists only of digits. (To be precise, the storage type is long with display format %12.0g when I use insheet to load the data.)
    I can run reg y x i.id

    I also tried
    tostring id, gen(id2)
    encode id2, gen(id2_c)
    then
    reg y x i.id2_c

    I thought the two regressions should give identical results for x, but they do not. I am puzzled as to why. Your help would be greatly appreciated.

  • #2
    Mozi:
    welcome to this forum.
    Please find below a frequently quoted warning message to be read before using -encode- and trusting its outcomes (from -encode- help file):
    encode creates a new variable named newvar based on the string variable varname, creating, adding to, or just using (as necessary) the value label newvar or, if specified, name. Do not use encode if
    varname contains numbers that merely happen to be stored as strings; instead, use generate newvar = real(varname) or destring; see real() or [D] destring.
    Kind regards,
    Carlo
    (Stata 19.0)
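
The two alternatives named in the help-file excerpt can be sketched as follows. This is only a minimal illustration on made-up data: the three ID values and the variable names id_num and id_num2 are hypothetical, not taken from the original poster's dataset.

```stata
* Hypothetical data: a string variable that merely holds numbers
clear
input str12 id2
"1001"
"20"
"300045"
end

* Option 1: destring creates a numeric variable holding the same numbers
destring id2, generate(id_num)

* Option 2: real() does the same conversion inside generate
generate long id_num2 = real(id2)

* Both variables should hold the original numbers, not 1, 2, 3 codes
assert id_num == id_num2
list
```

Either numeric variable can then be used with factor-variable notation, for example i.id_num.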



    • #3
      Greatly appreciated, Carlo.
      I indeed saw the quoted part before, but completely forgot that when working on the data.
      Just to confirm, I can directly do reg y x i.id in my case, right?



      • #4
        Mozi:
        yes, you can go that way, provided that your -id- is actually in numeric format.
        As an aside, I doubt that you can obtain useful results with two predictors only (if this is the case).
        Kind regards,
        Carlo
        (Stata 19.0)



        • #5
          Thanks greatly, Carlo.
          The model with only two predictors was just for illustration purposes ;-)



          • #6
            Mozi:
            actually, I thought it was...but just in case...
            Kind regards,
            Carlo
            (Stata 19.0)



            • #7
              You might want to provide a data sample where the problem occurs, because I think that they should be giving you the same results.
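
Before posting a sample, one possible check is whether the numeric id and the encoded variable define the same groups, and how far apart the two coefficients on x actually are. The sketch below assumes that y, x, id and id2_c already exist in memory exactly as described in post #1; the scalar names b_id and b_enc are made up for illustration.

```stata
* Assumes id (numeric) and id2_c (created by encode) exist, as in post #1

* 1. Each id should map to exactly one id2_c code, and vice versa;
*    assert stops with an error if that is not the case
bysort id (id2_c): assert id2_c[1] == id2_c[_N]
bysort id2_c (id): assert id[1] == id[_N]

* 2. Compare the coefficient on x from the two specifications
regress y x i.id
scalar b_id = _b[x]
regress y x i.id2_c
scalar b_enc = _b[x]
display "difference in b[x]: " b_id - b_enc
```

If both assert commands pass, the two variables partition the observations identically, and any remaining difference in the coefficient on x would have to come from somewhere else.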



              • #8
                Thanks, Joro. I also thought they should have given the same result.
                Let me prepare and post a data sample. Meanwhile, maybe Carlo or other experts could further illustrate why encoding a string of digits can cause problems and is therefore not recommended.
                By the way, I've also tried R and got the same issue, so this may not be a Stata-specific question.



                • #9
                  Mozi:
                  as you can see in the following toy-example, -encode- and -destring- treat the same numbers differently:
                  Code:
                  . set obs 5
                  number of observations (_N) was 0, now 5
                  
                  . g num="1" in 1
                  (4 missing values generated)
                  
                  . replace num="14" if num==""
                  variable num was str1 now str2
                  (4 real changes made)
                  
                  . encode num, g(nostring)
                  
                  . egen check=sum( nostring )
                  
                  . destring num, g(nostring_2)
                  num: all characters numeric; nostring_2 generated as byte
                  
                  . egen check2=sum( nostring_2 )
                  
                  . list
                  
                       +--------------------------------------------+
                       | num   nostri~2   nostring   check   check2 |
                       |--------------------------------------------|
                    1. |   1          1          1       9       57 |
                    2. |  14         14         14       9       57 |
                    3. |  14         14         14       9       57 |
                    4. |  14         14         14       9       57 |
                    5. |  14         14         14       9       57 |
                       +--------------------------------------------+
                  
                  . label list
                  nostring:
                             1 1
                             2 14
                  
                  .
                  Kind regards,
                  Carlo
                  (Stata 19.0)
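
A further small sketch of the same point: encode assigns the integer codes 1, 2, 3, ... in alphabetical order of the strings, so the stored codes follow neither the numeric values nor their numeric order. The data and the variable names s, s_enc and s_num below are made up for illustration.

```stata
* Hypothetical data: three numbers stored as strings
clear
input str3 s
"2"
"10"
"100"
end

encode s, generate(s_enc)
destring s, generate(s_num)

* nolabel shows the integer codes that encode actually stored
list s s_num s_enc, nolabel

* The value label maps each code back to the original text
label list s_enc
```

Here "10" sorts before "100", which sorts before "2", so s_enc should hold 3, 1, 2 while s_num holds 2, 10, 100. With factor-variable notation the codes serve only as category identifiers, so a different numbering changes which level is the base but should not, by itself, change how the observations are grouped.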

