Wrong number of variables created from tabulate command

Mary Hemler

Join Date: Aug 2022

Posts: 8
#1

19 Oct 2023, 05:35

Hi everyone,

I have just run the code

Code:

tabulate kode_melder, generate(melder_)

where kode_melder has values 1-23. An example from the dataset is

Code:

input str10 id_lnr byte(kode_melder melder_1 melder_2 melder_3 melder_4 melder_5 melder_6 melder_7 melder_8 melder_9 melder_10 melder_11)
"idlnr1" 6 0 0 1 0 0
"idlnr2" 4 0 0 0 0 0
"idlnr3" 11 0 0 0 0 0
"idlnr4" 23 0 0 0 0 0

The problem that I'm having is that Stata (version 16.1) is only creating indicator variables up to number 22, even though kode_melder has values up to 23. So, I end up with melder_1 - melder_22, but don't understand why melder_23 is not created. Does anyone know how to fix this?

Thank you!

Nick Cox

Join Date: Mar 2014
Posts: 35769

19 Oct 2023, 05:48

Stata is willing to create 23 indicator variables this way:

Code:

. clear

. set obs 23
Number of observations (_N) was 0, now 23.

. gen category = _n

. tab category, gen(indcat)

   category |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          1        4.35        4.35
          2 |          1        4.35        8.70
          3 |          1        4.35       13.04
          4 |          1        4.35       17.39
          5 |          1        4.35       21.74
          6 |          1        4.35       26.09
          7 |          1        4.35       30.43
          8 |          1        4.35       34.78
          9 |          1        4.35       39.13
         10 |          1        4.35       43.48
         11 |          1        4.35       47.83
         12 |          1        4.35       52.17
         13 |          1        4.35       56.52
         14 |          1        4.35       60.87
         15 |          1        4.35       65.22
         16 |          1        4.35       69.57
         17 |          1        4.35       73.91
         18 |          1        4.35       78.26
         19 |          1        4.35       82.61
         20 |          1        4.35       86.96
         21 |          1        4.35       91.30
         22 |          1        4.35       95.65
         23 |          1        4.35      100.00
------------+-----------------------------------
      Total |         23      100.00

.
. ds indcat*
indcat1   indcat6   indcat11  indcat16  indcat21
indcat2   indcat7   indcat12  indcat17  indcat22
indcat3   indcat8   indcat13  indcat18  indcat23
indcat4   indcat9   indcat14  indcat19
indcat5   indcat10  indcat15  indcat20

help limits implies to me that if some upper limit applies here, it arises from something other than tabulate.

I ran this in Stata 18 and also in Stata 16, with equivalent results.

I have to guess that the problem lies in your data. You've shown us that 23 is a value in the data, so I don't have a quick explanation.

Please show us the results of

Code:

contract kode_melder
dataex

(save your dataset first if changed)

Mary Hemler

Join Date: Aug 2022

Posts: 8
#3

19 Oct 2023, 06:23

Here are the results of

Code:

contract kode_melder
dataex

Code:

input byte kode_melder int _freq
1 84
2 264
3 132
4 122
5 532
6 85
7 102
8 509
9 17
10 258
11 570
12 16
13 75
14 63
15 186
16 33
17 26
18 11
20 3
21 4
22 88
23 201
end

Looking at this output, the problem probably lies in the fact that kode_melder == 19 has a frequency of 0, so Stata skips over that value. I am planning to merge this dataset with other datasets that likely have observations with kode_melder == 19. What would be the best way to go about this so that the melder* variables keep the correct numbers?
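The numbering behaviour described above can be reproduced on a toy dataset. This is a minimal sketch, assuming made-up values 1, 2 and 4 for a hypothetical variable x; the names x and ind are not from the thread. It illustrates that tabulate, generate() numbers the indicators by the order of the distinct values present, not by the values themselves.

```stata
clear
* Toy data (invented for illustration): the value 3 never occurs
input byte x
1
2
4
end

* Creates ind1, ind2, ind3: one indicator per distinct value present
tabulate x, generate(ind)

* The third indicator flags x == 4, because 4 is the third distinct value
assert ind3 == (x == 4)

* The variable labels show which value each indicator corresponds to
describe ind*
```

In the thread's data the same thing would happen from value 20 onwards, which is consistent with ending up with melder_1 through melder_22.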
Rich Goldstein

Join Date: Mar 2014

Posts: 4489
#4

19 Oct 2023, 06:55

Do the -merge- first and then run your -tab, gen()- command. Note that you don't say why you want these indicator variables, but if you want to include them in some kind of model, you may be better off using factor variable notation rather than generating all these variables.
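As a rough illustration of factor variable notation, here is a sketch using the auto dataset shipped with Stata rather than the poster's data. The model itself (price on rep78) is only an example chosen for convenience; with the thread's data one would presumably write i.kode_melder in place of i.rep78.

```stata
* Example dataset shipped with Stata, used here only for illustration
sysuse auto, clear

* i.rep78 enters one indicator per observed category, omitting a base level;
* no indicator variables need to be created beforehand
regress price i.rep78

* ib5.rep78 makes category 5 the base level instead of the default
regress price ib5.rep78
```

Categories with no observations simply do not appear in the model, so there is no renumbering to keep track of.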
Mary Hemler

Join Date: Aug 2022

Posts: 8
#5

19 Oct 2023, 07:18

Sorry, I did not specify. I am creating these indicator variables because there are currently duplicates in one of the variables that I need to use while merging, "id_lnr". This "id_lnr" has duplicates because each id_lnr can have several codes for the kode_melder variable, so all information in the duplicate observations is the same except for kode_melder. I am not able to merge with the other datasets until each id_lnr appears in only one observation, so I was trying to find a way to keep the information from all of the codes by creating indicator variables. My current process is to go through each dataset, clean it so that I end up with unique id_lnr values, save it, then merge. I was previously doing that in a very inefficient way, with tons of copy-paste code, so I was trying to streamline the process a bit by learning how to reduce the amount of code.
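One way to get from several observations per id_lnr to a single observation while keeping every code might look like the sketch below. It rests on assumptions: the toy observations are invented, the code range 1 to 23 is taken from the first post, and the other variables that are constant within id_lnr are left out for brevity.

```stata
clear
* Toy data (invented): idlnr1 appears twice with different codes
input str10 id_lnr byte kode_melder
"idlnr1" 6
"idlnr1" 11
"idlnr2" 4
"idlnr3" 23
end

* One 0/1 indicator per code over a fixed range, so the variable names
* match the code values even for codes that never occur in this file
forvalues i = 1/23 {
    generate byte melder_`i' = kode_melder == `i'
}

* Reduce to one observation per id_lnr; the maximum of a 0/1 indicator
* is 1 if that id ever had the code and 0 otherwise
collapse (max) melder_*, by(id_lnr)

* Confirm that id_lnr now uniquely identifies observations
isid id_lnr
list id_lnr melder_6 melder_11, noobs
```

Because the range is fixed, every dataset processed this way ends up with the same 23 indicator names, which should keep later merges consistent. Any other variables that need to survive would have to be added to the collapse call, for example with (first).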
Andrew Musau

Join Date: Oct 2014

Posts: 10265
#6

19 Oct 2023, 07:47

I cannot follow what the purpose is for these indicators, but as long as the minimum value and maximum value are always observed, here is another way to create the indicators that includes the empty categories.

Code:

qui sum kode_melder
forval i = `r(min)'/`r(max)' {
    gen melder`i' = `i'.kode_melder
}
Mary Hemler

Join Date: Aug 2022

Posts: 8
#7

19 Oct 2023, 08:15

That was perfect Andrew, thank you!
Nick Cox

Join Date: Mar 2014

Posts: 35769
#8

19 Oct 2023, 08:15

I agree with Rich Goldstein that the first preference is to use factor variable notation. Mary will know, but Rich won't, that this was a point made in comments at https://stackoverflow.com/questions/...tata-correctly

See also dummieslab from SSC. This is a rather old command (most of the work done 2003/2004) but I suspect much of the original motivation was to have names for the indicators that made sense.