Deducing the mode of a categorical variable

Nitin Singh

Join Date: Dec 2022

Posts: 6
#1

13 Jan 2023, 08:54

Hello,

I am using panel data from Waves 1 to 3 of the UK Millennium Cohort Study and I have been struggling to deduce the mode of a categorical variable. I have searched online, but Stata does not appear to have a dedicated mode command. I have made some progress with a workaround, but I am getting the error message ‘type mismatch’. Does anyone have any suggestions?

The categorical variable I am interested in is ‘frequency of alcohol consumption’, for which I have three observations per individual, i.e. drinking frequency for person X in 2001 (APALDR00), 2004 (BPALDR00), and 2006 (CPALDR00).

My end goal is to generate a new variable equal to the mode of each individual’s frequency of drinking. For example, if individual X drank '1-2 times a month' in 2001, '2-3 times a week' in 2004, and '1-2 times a month' in 2006, the generated variable should read ‘1-2 times a month’. Alternatively, if they do not drink, the generated variable should read 'Never'.

Moreover, if there is a missing value, or no mode is available, then I would like the variable to read the most recent non-missing entry. For instance, if person Y drank '1-2 times a month' in 2001, '2-3 times a week' in 2004, and has a missing value in 2006, I would like the variable to read ‘2-3 times a week’. Or if person J drank '1-2 times a month' in 2001, 'every day' in 2004, and 'less than once a month' in 2006, then the generated variable should read 'less than once a month'.

Please find attached the relevant data set, log file, and my Do-File.

Thanks in advance!

Attached Files

DATA.dta (359.9 KB, 1 view)

STATA list Do-File .do (2.3 KB, 1 view)

Log File.smcl (4.9 KB, 1 view)
Mike Lacy

Join Date: Apr 2014

Posts: 2413
#2

13 Jan 2023, 09:52

Take a look at -help egen-, where you will find a mode(varname) function described. You might also need to use some other -egen- functions (e.g., max()) to handle the special situations you describe.
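As a rough sketch of how that could look for three wave variables held in wide layout: egen's mode() works down observations rather than across variables, so one option is to reshape to long first and then take the mode within person. The toy data, the identifier name id, and the numeric codes below are assumptions for illustration, not the actual MCS coding; the fallback to the most recent non-missing wave follows the examples in #1.

```stata
* Toy data standing in for the MCS extract (id and the codes are hypothetical)
clear
input id APALDR00 BPALDR00 CPALDR00
1 3 5 3
2 3 5 .
3 3 1 6
end

* Give the three waves a common stub so -reshape- can stack them
rename (APALDR00 BPALDR00 CPALDR00) (drink1 drink2 drink3)
reshape long drink, i(id) j(wave)

* Within-person mode; by default ties yield missing and missings are ignored
egen drinkmode = mode(drink), by(id)

* Fallback: carry the most recent non-missing value to every row of a person
bysort id (wave): gen lastdrink = drink
by id: replace lastdrink = lastdrink[_n-1] if missing(lastdrink)
by id: replace lastdrink = lastdrink[_N]
replace drinkmode = lastdrink if missing(drinkmode)

* Back to one row per person
drop lastdrink
reshape wide drink, i(id) j(wave)
list
```

With these toy values, person 1 has a clear mode, while persons 2 and 3 have none and take their latest non-missing wave instead. If a different tie rule is wanted, the minmode and maxmode options of mode() pick the lowest or highest tied value. Attaching the original value label to drinkmode with -label values- would make it display the category text.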

For the future, I'd recommend against posting attachments here, per item 12.5 of the Statalist FAQ for new members.

Nick Cox

Join Date: Mar 2014

Posts: 35664
#3

13 Jan 2023, 10:02

A search in Stata for mode brings up many false positives, but the most helpful code for you is the mode() function of egen.

Note also the community-contributed commands modes (Stata Journal) and hsmode (SSC).

Code:

. clear

. set obs 10
number of observations (_N) was 0, now 10

. gen whatever = "A"

. replace whatever = "B" in 6/8
(3 real changes made)

. replace whatever = "C" in 9/10
(2 real changes made)

. tab whatever

   whatever |      Freq.     Percent        Cum.
------------+-----------------------------------
          A |          5       50.00       50.00
          B |          3       30.00       80.00
          C |          2       20.00      100.00
------------+-----------------------------------
      Total |         10      100.00

. modes whatever

----------------------
 whatever |      Freq.
----------+-----------
        A |          5
----------------------

. hsmode whatever
string variables not allowed in varlist;
whatever is a string variable
r(109);

. egen mode = mode(whatever)

. tab mode

       mode |      Freq.     Percent        Cum.
------------+-----------------------------------
          A |         10      100.00      100.00
------------+-----------------------------------
      Total |         10      100.00

.
Please see FAQ Advice #12 on attachments (essentially only .png is encouraged).

I am encountering the error message ‘type mismatch’.

Perhaps the command you used is in your log file, but typically people won't read attachments.
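Without the command itself this is only a guess, but one common route to ‘type mismatch’ is comparing a numeric variable that carries value labels with a string literal. The sketch below uses made-up data; the variable drink, the label name freq, and the codes are all hypothetical.

```stata
* Made-up labelled numeric variable (names and codes are hypothetical)
clear
input id drink
1 1
2 2
end
label define freq 1 "Never" 2 "1-2 times a month"
label values drink freq

* drink is numeric, so comparing it with a string stops with r(109), type mismatch
capture noisily gen never = (drink == "Never")

* Option 1: look up the number behind the label text
gen never = (drink == "Never":freq)

* Option 2: make a string copy with -decode- and compare strings
decode drink, gen(drink_str)
gen never2 = (drink_str == "Never")
```

Running -describe- on the variable shows its storage type and whether a value label is attached, which is usually enough to tell which situation applies.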

Last edited by Nick Cox; 13 Jan 2023, 10:15.


Nitin Singh

Join Date: Dec 2022

Posts: 6
#4

16 Jan 2023, 15:16

Will do, thanks both