  • Why doesn't Stata allow xtset with string variables?

    As you all probably know, -xtset- (and -tsset-) take an id and a time argument in order to declare panel data. Stata requires that both id and time be numeric variables. I want to understand the reasoning behind this requirement, since in principle there is no reason why id must be numeric. To be clear, I'm not asking for help, as I know how to work around this problem. For my own edification, I'm just wondering whether anyone can explain the design choice not to allow -xtset- to work with a string variable as the panel identifier. Obviously time must be numeric so that it can be sorted, but I don't see any reason why id has to be numeric; indeed, most other statistical software does not require this.

    Thanks!

  • #2
    The reason time must be numeric is obvious. For identifiers, however, I can't speculate as to *why*; that is a question for StataCorp. What I can say is that, for consistency, many commands accept only numeric identifiers to denote things like panels, clusters, and groups. Numeric values are immensely convenient for matrix operations, because Stata matrices can hold only numbers and a Mata matrix must be either all string or all numeric (never a mix). Keeping everything numeric makes the data much easier for the underlying code to manipulate.

    One way to make unique numeric ids for your panel is some variation of this theme (panelid_str and timevar are placeholder names):

    Code:
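    * flag the first observation within each string id
    by panelid_str (timevar), sort : gen byte first = _n==1
    * the running sum of the flags numbers the panels 1, 2, 3, ...
    gen long panelid = sum(first)
    drop first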
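
    A shorter variation on the same theme, offered only as a sketch, is to let -egen- build the numeric id with its group() function. The toy data and the name panelid_num are assumptions made up for illustration; panelid_str and timevar follow the placeholder names used above.

    ```stata
    * Toy data for illustration only (not from the original post)
    clear
    input str1 panelid_str timevar
    "b" 1
    "b" 2
    "a" 1
    "a" 2
    end

    * group() assigns 1, 2, ... to the distinct values of panelid_str
    * panelid_num is a hypothetical name for the new numeric id
    egen long panelid_num = group(panelid_str)

    * the numeric id can now be declared as the panel variable
    xtset panelid_num timevar
    ```

    Either approach needs to be run only once, after the string identifier is final.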


    • #3
      Thanks, Leonardo. Requiring numeric values for consistency or computational convenience makes sense; however, it seems that this should be handled internally (as I assume other statistical software does) rather than by inconveniencing the user.


      • #4
        I don't disagree with you, but there is no right or wrong way here. Like I said, it's ultimately a design decision from StataCorp. A few additional lines of code, however, are hardly an inconvenience, especially as this would need to be done once only after your panel id variable is defined. It would also be a mistake to expect that Stata (or any other software) behave the same way or offer the same functionality as some other software. Like languages, every program is different, and the way to make yourself understood is to express yourself in the way that is natural to that language (or software).


        • #5
          I agree with OP that not allowing string variables in -xtset- is a silly design choice by StataCorp that cannot be justified on any rational grounds. As OP points out, if I can do something manually, there is no reason why the software should not handle it automatically for me.

          Such design choices also have a long-lasting impact on habits. I, for example, try to bypass the xt commands altogether when I can, partly because of this design choice. E.g.:

          Code:
          . sysuse auto
          (1978 automobile data)
          
          . xtset make
          string variables not allowed in varlist;
          make is a string variable
          r(109);
          
          . areg price mpg, absorb(make)
          note: mpg omitted because of collinearity.
          
          Linear regression, absorbing indicators             Number of obs     =     74
          Absorbed variable: make                             No. of categories =     74
                                                              F(0, 0)           =      .
                                                              Prob > F          =      .
                                                              R-squared         = 1.0000
                                                              Root MSE          = 0.0000
          
          ------------------------------------------------------------------------------
                 price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
          -------------+----------------------------------------------------------------
                   mpg |          0  (omitted)
                 _cons |   6165.257          .        .       .            .           .
          ------------------------------------------------------------------------------
          F test of absorbed indicators: F(73, 0) =    .                Prob > F =    .
          
          .
          So -areg- can absorb a string variable, but -xtset- cannot work on a string variable... Go figure... To me, if you just want to run the fixed effects regression, -areg- is superior to and more convenient than -xtreg, fe-.
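
          For comparison, here is a minimal sketch of the extra step that -xtset- needs on the same data. It assumes -encode- is an acceptable way to get a numeric copy of make; the name make_id is hypothetical and does not appear in the session above.

          ```stata
          sysuse auto, clear

          * encode maps each distinct string to an integer and keeps the
          * original text as value labels; make_id is a hypothetical name
          encode make, generate(make_id)

          * declare the panel identifier only; auto has no time variable
          xtset make_id
          ```

          That is one additional line compared with passing make straight to absorb() in -areg-.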
