Creating variables

Asia Be

Join Date: Jul 2021

Posts: 8
#1

26 Jul 2021, 08:35

Hi, I am a complete beginner to Stata and need help with creating a new variable.

I am trying to turn long-term conditions into new variables. For example, one would be a variable for Atrial Fibrillation, so that I can then filter my data to see who has it and who doesn't. I tried g Atrial Fibrillation = 0, but then I got a message saying "too many variables specified", so I'm not sure what to do now.

Any help would be useful, thank you!
Ken Chui

Join Date: Aug 2014

Posts: 1058
#2

26 Jul 2021, 09:04

Welcome to Statalist.

The reason for that error is that the variable name has two words (Atrial Fibrillation). Stata considers Fibrillation another argument, and since -generate- does not accept a second variable, it says "too many variables specified." A workaround is to name it "Atrial_Fibrillation".

However, that does not achieve your goal; it would only create a column of 0s and nothing else. Some other variable must be carrying the condition information, and knowing what that is would help your question get answered.
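
To make both points concrete, here is a minimal sketch on toy data. The variable "condition" and its values are assumptions made up for illustration; the thread has not yet said which variable actually holds the condition names.

```stata
* Toy data; "condition" is a hypothetical variable, not from the original post
clear
input str20 condition
"Atrial Fibrillation"
"Asthma"
"Atrial Fibrillation"
"Diabetes"
end

* A name with a space fails with "too many variables specified";
* -capture noisily- shows the error but lets the rest of the code run
capture noisily generate Atrial Fibrillation = 0

* A one-word name runs, but every observation is simply 0
generate byte Atrial_Fibrillation = 0

* Flag the observations that actually carry the condition
replace Atrial_Fibrillation = 1 if condition == "Atrial Fibrillation"

tabulate Atrial_Fibrillation
```

The -generate- and -replace- steps could also be collapsed into a single line, generate byte Atrial_Fibrillation = (condition == "Atrial Fibrillation"), because a true comparison evaluates to 1 and a false one to 0.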

Also, please take a moment to read the FAQ (http://www.statalist.org/forums/help) on how to ask an effective Stata question, including how to use the command -dataex- to post some sample data so that users can test their suggested code, and you can walk away with tested code that is more likely to work.
2 likes
Asia Be

Join Date: Jul 2021

Posts: 8
#3

26 Jul 2021, 09:37

Hi Ken Chui

Thank you for your response! The variable that carries the condition names is called "ltc1" (ltc stands for long term condition). There are 15 such variables, ltc1 to ltc15. I want to make variables with the specific condition names so that I can then make a table showing who has each condition and who does not.

Also, I will now have a read of the FAQ, thank you for sending the link.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#4

26 Jul 2021, 09:55

Hi Asia Be

Welcome to Statalist, and to the Stata user community.

I'm sympathetic to you as a new user of Stata - there is quite a lot to absorb. And even worse if perhaps you are under pressure to produce some output quickly. Nevertheless, I'd like to encourage you to take a step back from your immediate tasks.

When I began using Stata in a serious way, I started, as have others here, by reading my way through the Getting Started with Stata manual relevant to my setup. Chapter 18 then gives suggested further reading, much of which is in the Stata User's Guide, and I worked my way through much of that reading as well. All of these manuals are included as PDFs in the Stata installation and are accessible from within Stata - for example, through the PDF Documentation section of Stata's Help menu.

The objective in doing the reading was not so much to master Stata - I'm still far from that goal - as to be sure I'd become familiar with a wide variety of important basic techniques, so that when the time came that I needed them, I might recall their existence, if not the full syntax, and know how to find out more about them in the help files and PDF manuals.

Stata supplies exceptionally good documentation that amply repays the time spent studying it - there's just a lot of it. The path I followed surfaces the things you need to know to get started in a hurry and to work effectively.

Stata also supplies YouTube videos, if that's your thing.
Asia Be

Join Date: Jul 2021

Posts: 8
#5

26 Jul 2021, 10:06

Hi William Lisowski

I appreciate your response. I think I will have a read through the User's Guide, as that might be quicker than me just sitting around feeling stuck haha. Thanks for all the recommendations!
Asia Be

Join Date: Jul 2021

Posts: 8
#6

27 Jul 2021, 09:35

Hi Ken Chui William Lisowski

I am still struggling with what I previously asked about, so I have copied in this example dataset.

How would I make a variable for each number within the variable (drug), so that I then have a variable for 1 (drug 1), 2 (drug 2) and 3 (drug 3)? If I made each individual drug into a variable, would I then be able to make a table of the total number of observations of that drug and also compare it with other variables?

I hope that makes sense, thanks.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte drug
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
2
2
2
2
2
2
2
2
2
2
2
2
2
2
3
3
3
3
3
3
3
3
3
3
3
3
3
3
end
Nick Cox

Join Date: Mar 2014

Posts: 35698
#7

27 Jul 2021, 10:05

With your example data (thanks) and the tabulate command, I get this:

Code:

. tab drug

       drug |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         20       41.67       41.67
          2 |         14       29.17       70.83
          3 |         14       29.17      100.00
------------+-----------------------------------
      Total |         48      100.00

As the example shows, you can abbreviate tabulate.

You don't need any new variables for that. To "compare with other variables" could mean many other things, such as a cross-tabulation of this variable and others. Most of the tasks that spring to mind don't need yet other variables either.
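
As a rough sketch of what a cross-tabulation might look like, the code below rebuilds the 48 drug values from the example and adds a variable named "sex". That variable is invented purely for illustration and is not part of the posted data.

```stata
* Rebuild the example: 20 observations of drug 1, 14 of drug 2, 14 of drug 3
clear
set obs 48
generate byte drug = cond(_n <= 20, 1, cond(_n <= 34, 2, 3))

* Hypothetical second variable, alternating 1 and 0, for illustration only
generate byte sex = mod(_n, 2)

* Two-way table of drug by sex; no new variables are needed
tabulate drug sex

* The same table with row percentages added
tabulate drug sex, row

* Restrict a one-way table to people on drug 1
tabulate sex if drug == 1
```

The if qualifier in the last line is often enough to look at one drug on its own without creating anything new.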
Asia Be

Join Date: Jul 2021

Posts: 8
#8

27 Jul 2021, 10:14

Hi Nick Cox

Thank you so much for your response, I've tried this and it works well for my data!

May I ask how I could get drug 1 in the column by itself, so that in the rows I can have sex, age and other drugs (with drugs, I can see who is taking both drug 1 and drug 2)?
Nick Cox

Join Date: Mar 2014

Posts: 35698
#9

27 Jul 2021, 10:18

Code:

help tabulate
1 like
Ken Chui

Join Date: Aug 2014

Posts: 1058
#10

27 Jul 2021, 10:19

Do you mean something like three indicator variables, one for each drug? You can try:

Code:

tab drug, gen(drug_)
1 like
Asia Be

Join Date: Jul 2021

Posts: 8
#11

27 Jul 2021, 10:35

It works but I'm confused by what the 0 and 1 mean.

    drug== |     drug== 2.0000
    1.0000 |         0          1 |     Total
-----------+----------------------+----------
         0 |        14         14 |        28
         1 |        20          0 |        20
-----------+----------------------+----------
     Total |        34         14 |        48

Ken Chui

Join Date: Aug 2014

Posts: 1058
#12

27 Jul 2021, 11:01

See this:

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input byte drug
1
1
1
1
1
1
2
2
2
2
2
2
2
2
3
3
3
3
3
end

tab drug, gen(drug_)
list, sepby(drug)

Results:

Code:

     +---------------------------------+
     | drug   drug_1   drug_2   drug_3 |
     |---------------------------------|
  1. |    1        1        0        0 |
  2. |    1        1        0        0 |
  3. |    1        1        0        0 |
  4. |    1        1        0        0 |
  5. |    1        1        0        0 |
  6. |    1        1        0        0 |
     |---------------------------------|
  7. |    2        0        1        0 |
  8. |    2        0        1        0 |
  9. |    2        0        1        0 |
 10. |    2        0        1        0 |
 11. |    2        0        1        0 |
 12. |    2        0        1        0 |
 13. |    2        0        1        0 |
 14. |    2        0        1        0 |
     |---------------------------------|
 15. |    3        0        0        1 |
 16. |    3        0        0        1 |
 17. |    3        0        0        1 |
 18. |    3        0        0        1 |
 19. |    3        0        0        1 |
     +---------------------------------+

The -gen- option creates a set of binary indicators, one for each level of "drug". Look at "drug_2": if the value of drug is 2, it gets a 1, otherwise 0. So the number of 1s in drug_2 is the number of people who had 2 in drug.
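
A short way to check that reading is sketched below. It rebuilds the same 19 observations with -set obs- instead of -input-, which is only a convenience for keeping the example compact.

```stata
* Same 19 observations as above: 6 on drug 1, 8 on drug 2, 5 on drug 3
clear
set obs 19
generate byte drug = cond(_n <= 6, 1, cond(_n <= 14, 2, 3))

tabulate drug, generate(drug_)

* Count the 1s in drug_2; this matches the frequency of drug == 2 (8 here)
count if drug_2 == 1

* The mean of a 0/1 indicator is the proportion of 1s (8 out of 19 here)
summarize drug_2
```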

Asia Be

Join Date: Jul 2021

Posts: 8
#13

27 Jul 2021, 11:37

Hi, I tried this on my actual data, but because I changed my data to long format, the listing with sepby is going to go on until about 2 million. Is there a more summarised way to look at it? This would have been really useful if there wasn't so much to look at.
Ken Chui

Join Date: Aug 2014

Posts: 1058
#14

27 Jul 2021, 11:49

Just like what Nick said in #9:

Code:

help tabulate

The "list" command is only there to show the data; you don't have to run it. I was just showing you what 0 and 1 mean, because you asked in #11.
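
For a large dataset in long format, something along these lines may be more manageable. It is only a sketch: the person identifier "id" and the toy values are assumptions, since the real layout has not been posted.

```stata
* Toy long-format data: one row per person-drug pair; "id" is hypothetical
clear
input byte(id drug)
1 1
1 2
2 1
3 2
3 3
4 3
end

tabulate drug, generate(drug_)

* Summarise the indicators with one-way tables instead of listing every row
tab1 drug_1 drug_2 drug_3

* If a listing is still wanted, restrict it to the first few observations
list in 1/3

* Per-person flags: 1 on every row of a person who has the drug on any row
bysort id: egen byte takes1 = max(drug == 1)
bysort id: egen byte takes2 = max(drug == 2)

* Keep one row per person when tabulating, so people are not counted twice
egen byte oneperid = tag(id)
tabulate takes1 takes2 if oneperid
```

In long format a single row holds only one drug, so a row-level table such as tabulate drug_1 drug_2 can never show anyone on both. The per-person flags are one possible way around that; in this toy data only person 1 is on both drug 1 and drug 2.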

I think it'd be beneficial to learn this software for a bit before tackling the analysis. The getting started guide is wonderful. Use this command to get there:

Code:

help gs

Last edited by Ken Chui; 27 Jul 2021, 12:00.
2 likes
Asia Be

Join Date: Jul 2021

Posts: 8
#15

28 Jul 2021, 03:30

Hi Ken Chui

Honestly, thank you so much for the help. I've managed to figure out a lot of things and I will also take some time to properly learn the software. I would usually have done that first, but I am in a bit of a rush.

Thanks again!