Creating a dataset where each obs. = 1 variable category

Adam Sadi

Join Date: Jul 2022
Posts: 68

04 Sep 2023, 01:26

Hello.

I wish to create a dataset in which each observation is one specific category of a categorical variable. I'm aware that Stata does not distinguish between categorical and continuous variables. In my case, however, I define a continuous variable as any variable that has more than 10 categories (distinct values). If a variable is continuous, I wish to have only one observation for it. With a toy example, this is the original dataset:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input int(candidat inc) float frac double(pfrac pop)
3 1 23               3.22  49878
4 1 18 2.5200000000000005  39035
2 1 59  8.260000000000002 127947
2 2 45 10.799999999999999 167292
3 2 35                8.4 130116
4 2 20                4.8  74352
4 3 21                6.3  97587
2 3 41 12.299999999999999 190527
3 3 38               11.4 176586
2 4 40                  8 123920
3 4 42                8.4 130116
4 4 18                3.6  55764
4 5 16               2.08  32219
2 5 36               4.68  72493
3 5 48               6.24  96658
end
label values candidat candidat
label def candidat 2 "Clinton", modify
label def candidat 3 "Bush", modify
label def candidat 4 "Perot", modify
label values inc inc2
label def inc2 1 "<$15k", modify
label def inc2 2 "$15-30k", modify
label def inc2 3 "$30-50k", modify
label def inc2 4 "$50-75k", modify
label def inc2 5 "$75k+", modify

And this is what I wish to achieve:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str10 variable
"candidat_2"
"candidat_3"
"candidat_4"
"inc_1"     
"inc_2"     
"inc_3"     
"inc_4"     
"inc_5"     
"frac"      
"pfrac"     
"pop"       
end

Any help would be appreciated!

Nick Cox

Join Date: Mar 2014
Posts: 35778

04 Sep 2023, 02:38

It doesn't purport to do what you want but designplot from the Stata Journal may give you some ideas.

Code:

SJ-19-3 gr0061_3  . . . . . . . . . . . . . . . Software update for designplot
        (help designplot if installed)  . . . . . . . . . . . . . .  N. J. Cox
        Q3/19   SJ 19(3):748--751
        any attempt to use the missing option of graph dot,
        graph hbar, or graph bar is now ignored and advice on
        what to do instead is shown

SJ-17-3 gr0061_2  . . . . . . . . . . . . . . . Software update for designplot
        (help designplot if installed)  . . . . . . . . . . . . . .  N. J. Cox
        Q3/17   SJ 17(3):779
        help file updated

SJ-15-2 gr0061_1  . . . . . . . . . . . . . . . Software update for designplot
        (help designplot if installed)  . . . . . . . . . . . . . .  N. J. Cox
        Q2/15   SJ 15(2):605--606
        bug fixed for Stata 14

SJ-14-4 gr0061  Design plots for graphical summary of a response given factors
        (help designplot if installed)  . . . . . . . . . . . . . .  N. J. Cox
        Q4/14   SJ 14(4):975--990
        produces a graphical summary of a numeric response variable
        given one or more factors

Adam Sadi

Join Date: Jul 2022

Posts: 68
#3

04 Sep 2023, 03:08

Nick: Thank you for pointing me to designplot! I did not know about it, and it is useful for the kind of work I generally do.

To give more context: I want to build a mapping table for every category that may have been altered across datasets over time. What I'm asking is only the first step of it, i.e. listing every category of every categorical variable!

Nick Cox

Join Date: Mar 2014
Posts: 35778

04 Sep 2023, 07:18

"What I'm asking is only the first step of it, i.e. listing every category of every categorical variable!"

But this is precisely what designplot will do for you. Here's some technique.

Code:

sysuse auto, clear

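* ignore string variables
ds, has(type numeric)

* classify: fewer than 10 distinct values -> categorical, otherwise continuous
* (use <= 10 instead to treat exactly 10 categories as categorical)
foreach v in `r(varlist)' {
    quietly tab `v'
    if r(r) < 10 local cat `cat' `v'
    else local cont `cont' `v'
}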

gen ONE = 1 
designplot ONE `cat', saveresults(demo, replace) min(1) max(1)

u demo, clear

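gen category = ""

* with min(1) max(1), each saved row should refer to a single factor
foreach v of local cat {
    replace category = "`v'" if `v' < .
}

* the one nonmissing factor value in each row is the category level
egen value = rowmax(`cat')

* append one empty observation per continuous variable
local ncont : word count `cont'
insobs `ncont'
tokenize `cont'

* names are filled from the last observation upward, hence the reversed order
quietly forval j = 1/`ncont' {
    replace category = "``j''" in -`j'
}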

list category value 


     +----------------------+
     |     category   value |
     |----------------------|
  1. |        rep78       1 |
  2. |        rep78       2 |
  3. |        rep78       3 |
  4. |        rep78       4 |
  5. |        rep78       5 |
     |----------------------|
  6. |     headroom     1.5 |
  7. |     headroom       2 |
  8. |     headroom     2.5 |
  9. |     headroom       3 |
 10. |     headroom     3.5 |
     |----------------------|
 11. |     headroom       4 |
 12. |     headroom     4.5 |
 13. |     headroom       5 |
 14. |      foreign       0 |
 15. |      foreign       1 |
     |----------------------|
 16. |   gear_ratio       . |
 17. | displacement       . |
 18. |         turn       . |
 19. |       length       . |
 20. |       weight       . |
     |----------------------|
 21. |        trunk       . |
 22. |          mpg       . |
 23. |        price       . |
     +----------------------+
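
For comparison, here is a minimal sketch of the same idea without designplot, run on the toy data from the original question. It is not taken from the posts above: the postfile handle, the tempfile, and the str80 width are illustrative choices, and the threshold of "at most 10 distinct values" is an assumption based on the rule stated in the question. It assumes the category values are integers; non-integer levels would produce names such as headroom_1.5.

```stata
clear
input int(candidat inc) float frac double(pfrac pop)
3 1 23               3.22  49878
4 1 18 2.5200000000000005  39035
2 1 59  8.260000000000002 127947
2 2 45 10.799999999999999 167292
3 2 35                8.4 130116
4 2 20                4.8  74352
4 3 21                6.3  97587
2 3 41 12.299999999999999 190527
3 3 38               11.4 176586
2 4 40                  8 123920
3 4 42                8.4 130116
4 4 18                3.6  55764
4 5 16               2.08  32219
2 5 36               4.68  72493
3 5 48               6.24  96658
end

* results are collected in a temporary dataset (names here are illustrative)
tempname handle
tempfile results
postfile `handle' str80 variable using `results'

* ignore string variables, as in the designplot example
ds, has(type numeric)
local numvars `r(varlist)'

foreach v of local numvars {
    * distinct nonmissing values of this variable
    quietly levelsof `v', local(levels)
    local nlevels : word count `levels'

    if `nlevels' <= 10 {
        * categorical (assumed rule): one observation per category
        foreach l of local levels {
            post `handle' ("`v'_`l'")
        }
    }
    else {
        * continuous: a single observation for the whole variable
        post `handle' ("`v'")
    }
}

postclose `handle'
use `results', clear
compress
list, noobs
```

On the toy data this should give 11 observations, candidat_2 through candidat_4, inc_1 through inc_5, then frac, pfrac and pop, in the order of the variables in the dataset.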
