  • Questions about the interpretation of the xtdescribe documentation

    [Attached image: _20250412120623.png]

    Around page 100 of the official manual xt.pdf (for convenience: xtxtdescribe.pdf), I'm unclear about how to reproduce the output of the second xtdescribe example. What is particularly puzzling is that, given the note "idcode*year does not uniquely identify observations", the command xtset idcode year should not have worked. Yet as I understand it, the example ran xtset successfully and then ran xtdescribe.



    I believe it's very important to clarify the origin of the sample data used in this example and how exactly the output was generated, because the manual appears to be the only known source that explains that the digits in the pattern output can be not only 1 but also 2, 3, and other numbers.
    Last edited by Wang Xiaobu; 11 Apr 2025, 22:23.

  • #2
    That is really strange. A successful xtset panelid timeid requires at most one observation for each combination of the identifiers. Also, the xtdescribe output in the manual is slightly different from the "normal" output, which gives Delta and Span on separate lines. The following supplementary explanation in xtxtdescribe.pdf does not clarify matters; it adds to the confusion.
    In fact, this is a dataset that was itself extracted from the NLSY, in which t is not time but job number. To simplify exposition, we made a simpler dataset by selecting the last job in each year.


    • #3
      Dear Professor Richard Williams, President Alan Riley (StataCorp), and Enrique Pinzon (StataCorp), sorry to bother you all. Did you notice this question? I wonder whether Wang and I missed something or whether there is really something wrong. Thank you very much!


      • #4
        Hi Wang and Chen, thanks for bringing this up. You are correct that the command cannot produce this output, because repeated time values are not allowed. This example was meant to be a mock-up to illustrate the structure of the original data, rather than to show how -xtdescribe- works. We will modify the documentation in a future update.
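
        The restriction is easy to see on made-up data. The following is only a minimal sketch: the variable names id and t and the four rows are hypothetical and are not the manual's dataset. Panel 1 has two rows with t = 1, so xtset should stop with a "repeated time values within panel" error.

        ```stata
        * Hypothetical toy data: panel 1 appears twice at t = 1
        clear
        input id t
        1 1
        1 1
        1 2
        2 1
        end

        * Report how many id*t combinations occur more than once
        duplicates report id t

        * capture noisily shows the error message without halting a do-file
        capture noisily xtset id t
        display "xtset return code: " _rc
        ```

        A nonzero return code here means the data were never xtset, so a subsequent xtdescribe would have nothing to describe.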


        • #5
          Dear Thomas Stringham (StataCorp), thank you very much!


          • #6
            Dear Thomas Stringham (StataCorp) and Chen Samulsion, thank you!

            My additional question is: will another example be added to explain that the digits in the pattern output can be not only 1 but also 2, 3, and other numbers?


            • #7
              Dear Wang Xiaobu, I think the questions in #1 and #6 are quite different. -xtset- requires that panelvar*timevar uniquely identify each observation, so Example 2 in the manual is questionable. But when that condition is met, the pattern output produced by -xtdescribe- can indeed contain 2, 3, and other digits; this is governed by the width() option. This point is explained clearly in the manual:
              width(#) specifies the desired width of the participation patterns to be displayed; width(100) is the default. If the number of times is greater than width(), then each column in the participation pattern represents multiple periods as indicated in a footnote at the bottom of the table.
              Code:
              . webuse nlswork
              (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
              
              . xtdescribe
              
                idcode:  1, 2, ..., 5159                                   n =       4711
                  year:  68, 69, ..., 88                                   T =         15
                         Delta(year) = 1 unit
                         Span(year)  = 21 periods
                         (idcode*year uniquely identifies each observation)
              
              Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                                       1       1       3         5         9      13      15
              
                   Freq.  Percent    Cum. |  Pattern
               ---------------------------+-----------------------
                    136      2.89    2.89 |  1....................
                    114      2.42    5.31 |  ....................1
                     89      1.89    7.20 |  .................1.11
                     87      1.85    9.04 |  ...................11
                     86      1.83   10.87 |  111111.1.11.1.11.1.11
                     61      1.29   12.16 |  ..............11.1.11
                     56      1.19   13.35 |  11...................
                     54      1.15   14.50 |  ...............1.1.11
                     54      1.15   15.64 |  .......1.11.1.11.1.11
                   3974     84.36  100.00 | (other patterns)
               ---------------------------+-----------------------
                   4711    100.00         |  XXXXXX.X.XX.X.XX.X.XX
              
              . xtdescribe, width(10)
              
                idcode:  1, 2, ..., 5159                                   n =       4711
                  year:  68, 69, ..., 88                                   T =         15
                         Delta(year) = 1 unit
                         Span(year)  = 21 periods
                         (idcode*year uniquely identifies each observation)
              
              Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                                       1       1       3         5         9      13      15
              
                   Freq.  Percent    Cum. |  Pattern*
               ---------------------------+-------------
                    159      3.38    3.38 |  1..........
                    114      2.42    5.79 |  ..........1
                     89      1.89    7.68 |  ........111
                     87      1.85    9.53 |  .........11
                     86      1.83   11.36 |  22211112111
                     67      1.42   12.78 |  .......1111
                     65      1.38   14.16 |  .1.........
                     61      1.29   15.45 |  .......2111
                     56      1.19   16.64 |  2..........
                   3927     83.36  100.00 | (other patterns)
               ---------------------------+-------------
                   4711    100.00         |  XXXXXXXXXXX
               -----------------------------------------
              *Each column represents 2 periods.
              
              
              . xtdescribe, width(5)
              
                idcode:  1, 2, ..., 5159                                   n =       4711
                  year:  68, 69, ..., 88                                   T =         15
                         Delta(year) = 1 unit
                         Span(year)  = 21 periods
                         (idcode*year uniquely identifies each observation)
              
              Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                                       1       1       3         5         9      13      15
              
                   Freq.  Percent    Cum. |  Pattern*
               ---------------------------+----------
                    249      5.29    5.29 |  1....
                    140      2.97    8.26 |  2....
                    114      2.42   10.68 |  ....1
                    114      2.42   13.10 |  ...21
                    105      2.23   15.33 |  ...11
                     96      2.04   17.36 |  3....
                     86      1.83   19.19 |  53331
                     75      1.59   20.78 |  .1...
                     73      1.55   22.33 |  ...1.
                   3659     77.67  100.00 | (other patterns)
               ---------------------------+----------
                   4711    100.00         |  XXXXX
               --------------------------------------
              *Each column represents 5 periods.
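
              The collapsed digits can be checked by hand against the fifth row (86 panels). The following is a rough sketch based on my reading of the output, not on StataCorp's code. It assumes that each column covers a fixed number of consecutive periods and shows how many of them were observed. The periods-per-column values 2 and 5 are copied from the footnotes above, and the local macro names are made up.

              ```stata
              * Pattern of the 86 panels as shown with the default width
              local pattern "111111.1.11.1.11.1.11"
              local span = strlen("`pattern'")

              * Periods per column, taken from the two footnotes
              foreach per in 2 5 {
                  local out ""
                  forvalues start = 1(`per')`span' {
                      local chunk = substr("`pattern'", `start', `per')
                      * Observed periods = characters left after dropping the dots
                      local n = strlen(subinstr("`chunk'", ".", "", .))
                      if `n' == 0 {
                          local out "`out'."
                      }
                      else {
                          local out "`out'`n'"
                      }
                  }
                  display "`per' periods per column: `out'"
              }
              ```

              Worked through by hand, this gives 22211112111 and 53331, the same strings shown for that row under width(10) and width(5).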


              • #8
                Thank you very much Chen Samulsion. Now I understand it (numbers greater than 1 are due to the width option).
