  • Creating variables

    Hi, I am a complete beginner to Stata and need help with creating a new variable.

    I am trying to turn long-term conditions into new variables. For example, one would be a variable for atrial fibrillation, so that I can then filter my data to see who has it and who doesn't. I tried g Atrial Fibrillation = 0, but I got the message "too many variables specified", so I'm not sure what to do now.

    Any help would be useful, thank you!

  • #2
    Welcome to Statalist.

    The reason for that error is that the variable name has two words (Atrial Fibrillation). Stata treats Fibrillation as another argument, and since -generate- does not accept a second variable name, it reports "too many variables specified." A workaround is to name it "Atrial_Fibrillation".

    However, that does not achieve your goal: it would only create a column of 0s and nothing else. Some other variable must be carrying the condition information, and knowing what that is would help your question get answered.

    Also, please take a moment to read the FAQ (http://www.statalist.org/forums/help) on how to ask an effective Stata question, including how to use the command -dataex- to post some sample data so that users can test their suggested code, and you can walk away with tested code that is more likely to work.
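
    As a minimal sketch of the single-word name (the five empty observations are an assumption, made only so the lines run on their own), this creates nothing more than the column of 0s described above:

    Code:
    * Sketch only: an empty 5-observation dataset stands in for the real data
    clear
    set obs 5
    * A one-word name is accepted; spaces are not allowed in variable names
    generate Atrial_Fibrillation = 0
    list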


    • #3
      Hi Ken Chui

      Thank you for your response! The variable that is carrying the condition names is called "ltc1" (ltc stands for long term condition). There are 15 such variables, ltc1 to ltc15. I want to make variables with the specific condition names so that I can then make a table showing who has each condition and who does not.

      Also, I will now have a read of the FAQ, thank you for sending the link.


      • #4
        Hi Asia Be

        Welcome to Statalist, and to the Stata user community.

        I'm sympathetic to you as a new user of Stata - there is quite a lot to absorb. And even worse if perhaps you are under pressure to produce some output quickly. Nevertheless, I'd like to encourage you to take a step back from your immediate tasks.

        When I began using Stata in a serious way, I started, as have others here, by reading my way through the Getting Started with Stata manual relevant to my setup. Chapter 18 then gives suggested further reading, much of which is in the Stata User's Guide, and I worked my way through much of that reading as well. All of these manuals are included as PDFs in the Stata installation and are accessible from within Stata - for example, through the PDF Documentation section of Stata's Help menu.

        The objective in doing the reading was not so much to master Stata - I'm still far from that goal - as to be sure I'd become familiar with a wide variety of important basic techniques, so that when the time came that I needed them, I might recall their existence, if not the full syntax, and know how to find out more about them in the help files and PDF manuals.

        Stata supplies exceptionally good documentation that amply repays the time spent studying it - there's just a lot of it. The path I followed surfaces the things you need to know to get started in a hurry and to work effectively.

        Stata also supplies YouTube videos, if that's your thing.


        • #5
          Hi William Lisowski

          I appreciate your response. I think I will have a read through the User's Guide, as that might be quicker than me just sitting around feeling stuck haha. Thanks for all the recommendations!


          • #6
            Hi Ken Chui William Lisowski

            I am still struggling with what I previously asked about, so I have copied in this example dataset.

            How would I make a variable for each number within the variable (drug), so that I then have a variable for 1 (drug 1), 2 (drug 2) and 3 (drug 3)? If I made each individual drug into a variable, would I then be able to make a table of the total number of observations of that drug and also compare it to other variables?

            I hope that makes sense, thanks.

            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input byte drug
            1
            1
            1
            1
            1
            1
            1
            1
            1
            1
            1
            1
            1
            1
            1
            1
            1
            1
            1
            1
            2
            2
            2
            2
            2
            2
            2
            2
            2
            2
            2
            2
            2
            2
            3
            3
            3
            3
            3
            3
            3
            3
            3
            3
            3
            3
            3
            3
            end


            • #7
              With your example data (thanks) and the tabulate command, I get this:


              Code:
              . tab drug 
              
                     drug |      Freq.     Percent        Cum.
              ------------+-----------------------------------
                        1 |         20       41.67       41.67
                        2 |         14       29.17       70.83
                        3 |         14       29.17      100.00
              ------------+-----------------------------------
                    Total |         48      100.00
              As the example shows, you can abbreviate tabulate.

              You don't need any new variables for that. To "compare with other variables" could mean many other things, such as a cross-tabulation of this variable and others. Most of the tasks that spring to mind don't need yet other variables either.
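
              As a rough sketch of such a cross-tabulation (sex is a hypothetical variable here, and the six observations are made up purely for illustration):

              Code:
              * Sketch only: sex is hypothetical and not part of the posted example data
              clear
              input byte(drug sex)
              1 0
              1 1
              2 0
              2 1
              3 0
              3 1
              end
              * Two-way table of counts: drug in the rows, sex in the columns
              tabulate drug sex
              * Same table with column percentages added
              tabulate drug sex, column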


              • #8
                Hi Nick Cox

                Thank you so much for your response, I've tried this and it works well for my data!

                May I ask how I could go about getting drug 1 in a column by itself, so that in the rows I can have sex, age and other drugs (with drugs, I could then see who is taking both drug 1 and drug 2)?


                • #9
                  Code:
                  help tabulate


                  • #10
                    Do you mean something like three indicator variables, one for each drug? You can try:

                    Code:
                    tab drug, gen(drug_)


                    • #11
                      It works but I'm confused by what the 0 and 1 mean.


                          drug== |    drug==    2.0000
                          1.0000 |         0          1 |     Total
                      -----------+----------------------+----------
                               0 |        14         14 |        28
                               1 |        20          0 |        20
                      -----------+----------------------+----------
                           Total |        34         14 |        48


                      • #12
                        See this:

                        Code:
                        * Example generated by -dataex-. For more info, type help dataex
                        clear
                        input byte drug
                        1
                        1
                        1
                        1
                        1
                        1
                        2
                        2
                        2
                        2
                        2
                        2
                        2
                        2
                        3
                        3
                        3
                        3
                        3
                        end
                        
                        tab drug, gen(drug_)
                        list, sepby(drug)
                        Results:
                        Code:
                             +---------------------------------+
                             | drug   drug_1   drug_2   drug_3 |
                             |---------------------------------|
                          1. |    1        1        0        0 |
                          2. |    1        1        0        0 |
                          3. |    1        1        0        0 |
                          4. |    1        1        0        0 |
                          5. |    1        1        0        0 |
                          6. |    1        1        0        0 |
                             |---------------------------------|
                          7. |    2        0        1        0 |
                          8. |    2        0        1        0 |
                          9. |    2        0        1        0 |
                         10. |    2        0        1        0 |
                         11. |    2        0        1        0 |
                         12. |    2        0        1        0 |
                         13. |    2        0        1        0 |
                         14. |    2        0        1        0 |
                             |---------------------------------|
                         15. |    3        0        0        1 |
                         16. |    3        0        0        1 |
                         17. |    3        0        0        1 |
                         18. |    3        0        0        1 |
                         19. |    3        0        0        1 |
                             +---------------------------------+
                        The -gen()- option creates a set of binary indicators, one for each level of "drug". Take "drug_2": if drug is 2, it gets a 1, otherwise 0. So the number of 1s in drug_2 is the number of people who had 2 in drug.
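
                        As a minimal sketch on an even smaller made-up dataset (the four observations are an assumption for illustration only), the indicators can be counted or cross-tabulated directly:

                        Code:
                        * Sketch only: four made-up observations
                        clear
                        input byte drug
                        1
                        1
                        2
                        3
                        end
                        * One 0/1 indicator per level of drug: drug_1, drug_2, drug_3
                        tabulate drug, generate(drug_)
                        * Number of observations with drug equal to 2
                        count if drug_2 == 1
                        * Cross-tabulate two of the indicators
                        tabulate drug_1 drug_2
                        In the cross-tabulation, 1 means the observation is in that drug group and 0 means it is not. Because each observation holds a single value of drug here, the cell where both indicators are 1 is necessarily empty.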


                        • #13
                          Hi, I tried this on my actual data, but because I changed my data to long format, the list goes on for about 2 million rows. Is there a more summarised way to look at it? This would have been really useful if there wasn't so much to look at.


                          • #14
                            Just like what Nick said in #9:
                            Code:
                            help tabulate
                            The "list" command is only there to show the data. You don't have to run it. I was just showing you what 0 and 1 mean, because you asked about that in #11.
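
                            For a summarised view without listing every row, one possible sketch (the small made-up dataset is an assumption so the lines run on their own) is:

                            Code:
                            * Sketch only: six made-up observations
                            clear
                            input byte drug
                            1
                            1
                            1
                            2
                            2
                            3
                            end
                            tabulate drug, generate(drug_)
                            * One-way frequency table for each indicator
                            tab1 drug_1 drug_2 drug_3
                            * The mean of a 0/1 indicator is the proportion of observations with a 1
                            summarize drug_1 drug_2 drug_3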

                            I think it'd be beneficial to learn this software for a bit before tackling the analysis. The getting started guide is wonderful. Use this command to get there:
                            Code:
                            help gs
                            Last edited by Ken Chui; 27 Jul 2021, 12:00.


                            • #15
                              Hi Ken Chui

                              Honestly, thank you so much for the help. I've managed to figure out a lot of things, and I will also take some time to properly learn the software. I would usually have done that first, but I am in a bit of a rush.

                              Thanks again!
