
  • Creating a dataset where each obs. = 1 variable category

    Hello.

    I wish to create a dataset in which each observation is one specific category of a categorical variable. I'm aware that Stata makes no distinction between categorical and continuous variables. For my purposes, however, I define a continuous variable as any variable with more than 10 distinct values, and each continuous variable should get only one observation. Here is a toy example of the original dataset:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int(candidat inc) float frac double(pfrac pop)
    3 1 23               3.22  49878
    4 1 18 2.5200000000000005  39035
    2 1 59  8.260000000000002 127947
    2 2 45 10.799999999999999 167292
    3 2 35                8.4 130116
    4 2 20                4.8  74352
    4 3 21                6.3  97587
    2 3 41 12.299999999999999 190527
    3 3 38               11.4 176586
    2 4 40                  8 123920
    3 4 42                8.4 130116
    4 4 18                3.6  55764
    4 5 16               2.08  32219
    2 5 36               4.68  72493
    3 5 48               6.24  96658
    end
    label values candidat candidat
    label def candidat 2 "Clinton", modify
    label def candidat 3 "Bush", modify
    label def candidat 4 "Perot", modify
    label values inc inc2
    label def inc2 1 "<$15k", modify
    label def inc2 2 "$15-30k", modify
    label def inc2 3 "$30-50k", modify
    label def inc2 4 "$50-75k", modify
    label def inc2 5 "$75k+", modify
    And this is what I wish to achieve:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str10 variable
    "candidat_2"
    "candidat_3"
    "candidat_4"
    "inc_1"     
    "inc_2"     
    "inc_3"     
    "inc_4"     
    "inc_5"     
    "frac"      
    "pfrac"     
    "pop"       
    end
    Any help would be appreciated!



  • #2
    It doesn't purport to do what you want, but designplot from the Stata Journal may give you some ideas.

    Code:
    SJ-19-3 gr0061_3  . . . . . . . . . . . . . . . Software update for designplot
            (help designplot if installed)  . . . . . . . . . . . . . .  N. J. Cox
            Q3/19   SJ 19(3):748--751
            any attempt to use the missing option of graph dot,
            graph hbar, or graph bar is now ignored and advice on
            what to do instead is shown
    
    SJ-17-3 gr0061_2  . . . . . . . . . . . . . . . Software update for designplot
            (help designplot if installed)  . . . . . . . . . . . . . .  N. J. Cox
            Q3/17   SJ 17(3):779
            help file updated
    
    SJ-15-2 gr0061_1  . . . . . . . . . . . . . . . Software update for designplot
            (help designplot if installed)  . . . . . . . . . . . . . .  N. J. Cox
            Q2/15   SJ 15(2):605--606
            bug fixed for Stata 14
    
    SJ-14-4 gr0061  Design plots for graphical summary of a response given factors
            (help designplot if installed)  . . . . . . . . . . . . . .  N. J. Cox
            Q4/14   SJ 14(4):975--990
            produces a graphical summary of a numeric response variable
            given one or more factors



    • #3
      Nick: Thank you for pointing me to designplot! I did not know about it, and it is actually useful for my work in general.

      To give more context, I want to build a mapping table for each category that may have been altered across datasets over time. What I'm asking is only the first step of it, i.e. listing every category of every categorical variable!



      • #4
        What I'm asking is only the first step of it, i.e. listing every category of every categorical variable!
        But this is precisely what designplot will do for you. Here's some technique.

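        Code:
        sysuse auto, clear
        
        * start with empty lists of categorical and continuous variables
        local cat
        local cont
        
        * ignore string variables
        ds, has(type numeric)
        
        * at most 10 distinct values = categorical, more than 10 = continuous
        foreach v in `r(varlist)' {
            quietly tab `v'
            if r(r) <= 10 local cat `cat' `v'
            else local cont `cont' `v'
        }
        
        gen ONE = 1
        designplot ONE `cat', saveresults(demo, replace) min(1) max(1)
        
        use demo, clear
        
        * label each row with the name of its nonmissing factor variable
        gen category = ""
        foreach v of local cat {
            replace category = "`v'" if `v' < .
        }
        
        * the category value is the nonmissing entry among the factor variables
        egen value = rowmax(`cat')
        
        * add one row per continuous variable, with value left missing
        local ncont : word count `cont'
        insobs `ncont'
        tokenize `cont'
        
        quietly forval j = 1/`ncont' {
            replace category = "``j''" in -`j'
        }
        
        list category value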
        
        
             +----------------------+
             |     category   value |
             |----------------------|
          1. |        rep78       1 |
          2. |        rep78       2 |
          3. |        rep78       3 |
          4. |        rep78       4 |
          5. |        rep78       5 |
             |----------------------|
          6. |     headroom     1.5 |
          7. |     headroom       2 |
          8. |     headroom     2.5 |
          9. |     headroom       3 |
         10. |     headroom     3.5 |
             |----------------------|
         11. |     headroom       4 |
         12. |     headroom     4.5 |
         13. |     headroom       5 |
         14. |      foreign       0 |
         15. |      foreign       1 |
             |----------------------|
         16. |   gear_ratio       . |
         17. | displacement       . |
         18. |         turn       . |
         19. |       length       . |
         20. |       weight       . |
             |----------------------|
         21. |        trunk       . |
         22. |          mpg       . |
         23. |        price       . |
             +----------------------+
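A minimal sketch of an alternative that avoids designplot and uses only official commands (levelsof and postfile) follows; it writes the list directly in the question's varname_value form. It assumes the question's rule (more than 10 distinct nonmissing values means continuous), runs on the auto data for comparison with the listing above, and uses an arbitrary str80 width for the new string variable. Run on the question's toy data instead of auto, the same loop should give the eleven strings from candidat_2 to pop.

```stata
* Sketch using only official commands; assumes the question's rule that a
* numeric variable with more than 10 distinct nonmissing values is continuous
* and gets a single row.
sysuse auto, clear

tempname post
tempfile results
* str80 is an arbitrary width chosen to hold "varname_value" strings
postfile `post' str80 variable using `results', replace

* ignore string variables
ds, has(type numeric)
foreach v in `r(varlist)' {
    * distinct nonmissing values (levelsof omits missing by default)
    quietly levelsof `v', local(levels)
    local nlev : word count `levels'
    if `nlev' <= 10 {
        * categorical: one row per distinct value, named varname_value
        foreach l of local levels {
            post `post' ("`v'_`l'")
        }
    }
    else post `post' ("`v'")
}
postclose `post'

use `results', clear
list, noobs sep(0)
```

Non-integer categories come out as, for example, headroom_1.5, because the value is pasted in as text. Since levelsof stores every level in a local macro, counting distinct values this way may be slow for variables with very many values.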
