Grouping variables to create a master variable - possible? *Bangs Head*

Joshua Edrich

Join Date: May 2019

Posts: 4
#1

Grouping variables to create a master variable - possible? *Bangs Head*

23 May 2019, 18:51

Hello all! My colleague and I have been banging our heads over this.

I have a dataset that I am working with that was built off of a survey about overdoses on certain drugs. I'm trying to combine variables where the data is either 1 (yes) or 0 (no) to create a master variable - imagine the variable will only return the drug mentioned if it was a 1. The variables are about 20 different types of drugs and I want to group them into drug types and have the variable return the data only when it is indicated yes. For reference I've attached an example. I want to list heroin_num-alcohol_num into a new group.

I also created a listeddrugs variable using the grouplabs command and it's come in handy but not doing what I asked. I'd like to use this new variable to run specific statistics and visuals.

Hope this makes sense!

Thanks
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30064
#2

23 May 2019, 19:06

Unfortunately, you posted a screenshot of your data. Apart from the fact that it is barely readable (as so often happens here), we are left to guess what the data actually looks like because there is important metadata, that is critical to solving your problem, that simply doesn't show up in a screenshot. All Forum members are asked to read the Forum FAQ before posting. Had you done that, in #12 you would have learned that the useful way to show example data is with the -dataex- command. If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

That said, I'm going to speculate that your drug variables are not the yes/no strings they appear to be but, rather, because they are showing up in blue, they are numeric variables with value labels. In particular, I'm going to assume that the actual values of the variables are 1 for yes and 0 for no. (The code I show below will not work if this is not the case.)

Code:

egen heroin_alcohol_group = rowmax(heroin_num-alcohol_num)

This variable will be 1 if any of the variables from heroin_num through alcohol_num is 1 (yes), and 0 if all of them are 0.

If I have misunderstood what you want, or if your variables are not coded 0/1 as I have supposed, then do post back showing example data with -dataex-.
Comment
Joshua Edrich

Join Date: May 2019

Posts: 4
#3

23 May 2019, 19:18

Sorry Clyde,

You're right I should done that - essentially what you've provided is a way for me to summarize the data of 1 and 0 and I don't need that necessarily...here is the dataex example.

[CODE]
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(heroin_num rxopioid_num fentanyl_num methamphetamines_num cocaine_num otherrxdrugs_num hallucinogens_num marijuana_num alcohol_num caffeine_num otherlisteddrugs_num)
1 0 0 0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 0 1
0 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1
0 1 0 0 0 0 0 0 0 0 0

I have 37 variables and would like to have their frequency represented in one single table. I have tried using tabulate and tabm as well but I can't figure out the command to put this many variables into one single table. Ideally, I would like to be able to see the percent that each drug up out of the 180 observations - especially when each observation either has at least one drug, but possibly more identified.
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30064

23 May 2019, 22:54

There are many community-contributed programs to do tables of various kinds. So many that I can never remember which does what. So when I need a table that I can't get with -tabulate-, -tabstat-, or -table- I generally just code my own routine for the particular problem at hand. Here you could do this:

Code:

local counts
local percents
foreach v of varlist heroin_num-otherlisteddrugs_num {
    local counts `counts' count_`v' = `v'
    local percents `percents' pct_`v' = `v'
}

collapse (sum) `counts' (mean) `percents'
gen long obs_no = _n
reshape long count_ pct_, i(obs_no) j(drug) string
replace drug = subinstr(drug, "_num", "", .)
replace pct_ = 100*pct_
rename *_ *
drop obs_no
format pct %3.2f
list, noobs clean

Comment

Joshua Edrich

Join Date: May 2019

Posts: 4
#5

24 May 2019, 17:18

Clyde, thanks so much this is great!

Is there a way to create this as a new variable so that I do not lose the data/other variables in my current dta. file? It erases all of the other variables. Otherwise, works like a charm and I will try to recreate it for the subsequent 3 groups that I need to categorize.
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30064
#6

24 May 2019, 18:14

Probably the simplest way to do this is to put -preserve- before the -collapse- command and then -restore- after the -list- command. That will restore your original data, but you already have your table listed at that point. If you need to save that table as a data set, then insert a -save name_of_new_file_for_table- command before the -restore-.
Comment

Announcement

Grouping variables to create a master variable - possible? Bangs Head

Comment

Comment

Comment

Comment

Comment