  • displaying 14-digit numeric variable (double) in tab and codebook commands

    Hello,

    I have a 14-digit numeric variable named 'year_id' (type double), which I would like Stata to display in full (i.e. not in scientific notation), at least in the codebook. It works with the table command, but not with the codebook or tab commands (the latter of which I also use frequently). Importantly, I need this variable to remain numeric because I will be adding it to other id variables to create unique ids for each of my 3 million+ observations. Please see my code below:

    [Screenshot of code: "Screen Shot 2019-03-14 at 14.46.16.png"]

    Best of thanks for your consideration,
    Rosa

  • #2
    This is an example of the x-y problem. Unique identifiers for 3 million observations could just be

    Code:
    gen long id = _n 
    Informative identifiers could easily be based on two or more variables using something like

    Code:
    egen long id = group(year whatever whoever somethingelse), label
    Last edited by Nick Cox; 14 Mar 2019, 08:36.
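
A minimal sketch of both suggestions on made-up data. The variables year and region and the names obs_id and grp_id are hypothetical stand-ins, not taken from the original dataset.

```stata
* Toy data: 6 observations, 2 years by 2 regions (hypothetical variables)
clear
set obs 6
gen year = cond(_n <= 3, 2017, 2018)
gen str1 region = cond(mod(_n, 2), "A", "B")

* One identifier per observation: simply the observation number
gen long obs_id = _n
isid obs_id                 // exits with an error unless obs_id is unique

* One identifier per distinct combination of year and region
egen long grp_id = group(year region), label
tabulate grp_id             // value labels carry the underlying year and region
```

With the label option the integer codes stay small, while the value labels keep the identifier readable in tabulations.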


    • #3
      Thank you! The egen code looks promising, but I started running it 30 minutes ago and it's still going.

      I have Stata/IC with a 1.3 GHz Intel Core i5 processor, but still this seems a bit slow in a dataset with 'only' 3,156,487 observations and (including my new id variable) 24 variables. I'm starting to assume that something has gone wrong.

      Is there no other way to have codebook display a 14-digit numeric variable, generated with a 'gen' command, in non-scientific notation?

      Last edited by Rosa Blau; 15 Mar 2019, 03:43.


      • #4
        Originally posted by Rosa Blau
        Importantly, I need this variable to remain numeric because I will be adding it to other id variables to create unique ids for each of my 3 million+ observations.
        You can concatenate ("add") strings to create a composite id variable. Since you are working with such large numbers and such large datasets, that is probably safer. Moreover, it will solve your problem with codebook. Here is an artificial example for a composite id for a panel dataset with 3 waves of three families with 4 persons per family.

        Code:
        // create example dataset
        drop _all
        set obs 3
        gen wave = _n
        expand 3
        bys wave : gen fam_id = _n
        expand 4
        bys wave fam_id : gen pers_id = _n
        
        // create the composite id
        gen id = strofreal(wave) + "_" + strofreal(fam_id) + "_" + strofreal(pers_id)
        
        // admire the result
        list, sepby(wave fam_id)
        You don't have to store the parts that make up the final id (important in large datasets), as you can recover them easily with the split command. In this case, split id, gen(id) parse(_) would do the trick.
        Last edited by Maarten Buis; 15 Mar 2019, 04:46.
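
As a sketch of that recovery step, the example below rebuilds a tiny composite id and splits it again. The stub name part is an arbitrary choice for illustration, and the destring option is added here so the recovered pieces come back as numeric variables.

```stata
* Rebuild a small composite string id (same pattern as in the example above)
clear
set obs 2
gen wave = _n
gen fam_id = 3
gen pers_id = 4
gen id = strofreal(wave) + "_" + strofreal(fam_id) + "_" + strofreal(pers_id)

* Recover the parts; destring converts the all-digit pieces back to numeric
split id, generate(part) parse(_) destring

* part1, part2 and part3 should reproduce the original components
assert part1 == wave & part2 == fam_id & part3 == pers_id
```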
        ---------------------------------
        Maarten L. Buis
        University of Konstanz
        Department of history and sociology
        box 40
        78457 Konstanz
        Germany
        http://www.maartenbuis.nl
        ---------------------------------


        • #5
          I see! So I could convert the numeric variables to strings and then concatenate the strings. Thank you, I will try that.


          • #6
            With a machine that I guess is fairly humdrum, I was getting results in seconds:


            Code:
            . clear
            
            . set obs 3000000
            number of observations (_N) was 0, now 3,000,000
            
            . gen year = runiformint(1991, 2018)
            
            . gen whatever = cond(_n < 1.5e6, "A", "B")
            
            . set rmsg on
            r; t=0.00 10:40:56
            
            . egen id = group(whatever year), label
            r; t=6.37 10:41:18
            Most of the time there was me thinking and wanting coffee. Your calculation may well be more complicated, but it shouldn't take that long.

            In my case there were 56 distinct classes, all 28 possible years by 2 categories. You may well have many, many more, but I am still surprised at 30 minutes +.


            • #7
              I too am surprised. Out of impatience I'm using my unique id without the help of codebook for the moment, but I will try the egen command again later.


              • #8
                Dear Statalisters,

                I am once again having a similar problem, although it is simpler this time. I am using a 10-digit numeric (double) variable. I am not permitted to give a data example, but the values are all integers. I have tried the formats %10.0g and %10.0f, but the variable continues to display in scientific notation. I have studied the manual entry on format, but clearly I am missing something. I greatly appreciate your time and advice on this issue.

                Best wishes,
                Rosa


                • #9
                  You are going to have to make up an example that doesn't work. Can you reproduce this?

                  Code:
                  . clear
                  
                  . set obs 1
                  number of observations (_N) was 0, now 1
                  
                  . gen double y = 9876543210
                  
                  . l
                  
                       +-----------+
                       |         y |
                       |-----------|
                    1. | 9.877e+09 |
                       +-----------+
                  
                  . format y %12.0f
                  
                  . l
                  
                       +------------+
                       |          y |
                       |------------|
                    1. | 9876543210 |
                       +------------+
                  
                  . format y %11.0f
                  
                  . l
                  
                       +------------+
                       |          y |
                       |------------|
                    1. | 9876543210 |
                       +------------+
                  
                  . format y %10.0f
                  
                  . l
                  
                       +------------+
                       |          y |
                       |------------|
                    1. | 9876543210 |
                       +------------+
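
A small sketch, using a made-up 14-digit value, of the point that format changes only how a number is displayed, not what is stored. The variable name year_id is borrowed from the first post, but the value is hypothetical. Whether tabulate or codebook honor the format is a separate matter and is not shown here.

```stata
clear
set obs 1
gen double year_id = 20190312345678   // hypothetical 14-digit identifier
format year_id %14.0f                 // affects the display only
list

* The stored value is unchanged and exact
assert year_id == 20190312345678

* A float could not hold the same value exactly, hence the double
assert float(year_id) != year_id

* Integers are held exactly in a double up to 2^53
display %17.0f 2^53
```

Since 2^53 is a 16-digit number, 14-digit integer identifiers fit in a double without rounding.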


                  • #10
                    In the display format one character is reserved for the decimal point and one for the sign, so if you want to display 10 digits you need to use the format %12.0g or %12.0f:

                    Code:
                    . di %10.0g 1234567890
                     1.235e+09
                    
                    . di %12.0g 1234567890
                      1234567890


                    • #11
                      Dear Maarten, dear Nick,

                      With all formats listed in your posts I can get the full 10-digit integer display with the command 'table' (but not with tab or codebook). I think I should be able to work with this for now, so best of thanks for your advice.

                      Best wishes, Rosa


                      • #12
                        OK, but there was no mention of tabulate or codebook in #8. I don't know a general work-around for codebook, but I have got to suggest that not seeing sample identifiers exactly wouldn't worry me for what I use codebook for.

                        For tabulate there could be many alternatives or work-arounds, but all depends on what you are tabulating, and we need to see details.

                        That said, with 10-digit identifiers I experience no difficulty if I use a string version:

                        Code:
                        . clear
                        
                        . set obs 1
                        number of observations (_N) was 0, now 1
                        
                        . gen double nid = 9876543210
                        
                        . gen sid = string(nid, "%10.0f")
                        
                        . codebook
                        
                        --------------------------------------------------------------------------------------------------------------------------------------------------
                        nid                                                                                                                                    (unlabeled)
                        --------------------------------------------------------------------------------------------------------------------------------------------------
                        
                                          type:  numeric (double)
                        
                                         range:  [9.877e+09,9.877e+09]        units:  1000
                                 unique values:  1                        missing .:  0/1
                        
                                    tabulation:  Freq.  Value
                                                     1  9.877e+09
                        
                        --------------------------------------------------------------------------------------------------------------------------------------------------
                        sid                                                                                                                                    (unlabeled)
                        --------------------------------------------------------------------------------------------------------------------------------------------------
                        
                                          type:  string (str10)
                        
                                 unique values:  1                        missing "":  0/1
                        
                                    tabulation:  Freq.  Value
                                                     1  "9876543210"
                        
                        . gen y = 42
                        
                        . tab sid, su(y)
                        
                                    |            Summary of y
                                sid |        Mean   Std. Dev.       Freq.
                        ------------+------------------------------------
                         9876543210 |          42           0           1
                        ------------+------------------------------------
                              Total |          42           0           1
                        
                        .
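
The same idea can be sketched for 14-digit identifiers with tostring, which refuses to convert if the chosen format would lose information. The values and the names nid and sid below are made up for illustration.

```stata
clear
set obs 2
gen double nid = 98765432101234 + _n    // hypothetical 14-digit identifiers

* Create a string copy, writing every digit out in full
tostring nid, generate(sid) format(%14.0f)

* Round trip: the string copy converts back to exactly the same number
assert real(sid) == nid
assert strlen(sid) == 14

tabulate sid
```

Keeping the numeric original alongside the string copy leaves the arithmetic untouched while giving tabulate and codebook a string to display.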
