  • Creating composite index/Variable

    Hi. Could you please assist?

    I am trying to do two things:

    1. To create a composite index/variable from 4 individual variables of a dataset (i.e. create an outcome variable from 4 individual variables). This will help with my multinomial regression model.

    2. To have Stata code that will measure levels based on the number of correct responses (i.e. None=0, Low=1-2, Average=3, High=4-5), so that I can say x% of respondents are at a low level because they only answered 1 or 2 questions correctly.

    I would really appreciate it. Thanks in advance.

  • #2
    Can you present a data example, e.g., copying and pasting the output of

    Code:
    dataex
    ?

    • #3
      . dataex
      input statement exceeds linesize limit. Try specifying fewer variables
      r(1000);

      When I specified the variables I need to use to create a composite variable:

      label values q3_1b

      label def q3_1b 1 "Agree", modify
      label def q3_1b 2 "Disagree", modify
      label def q3_1b 3 "Do not know", modify
      label values q3_1c
      label def q3_1c 1 "Agree", modify
      label def q3_1c 2 "Disagree", modify
      label def q3_1c 3 "Do not know", modify
      label values q3_2f
      label def q3_2f 1 "Agree", modify
      label def q3_2f 2 "Disagree", modify
      label def q3_2f 3 "Do not know", modify
      label values q3_3e
      label def q3_3e 1 "Yes", modify
      label def q3_3e 2 "No", modify
      label def q3_3e 3 "Do not know", modify
      label values q3_4
      label def q3_4 1 "Yes", modify
      label def q3_4 2 "No", modify
      label def q3_4 3 "Do not know", modify

      • #4
        I am very new to Stata.

        • #5
          Try

          Code:
          qui ds
          dataex `=word("`r(varlist)'", 1)' - `=word("`r(varlist)'", 5)'
          If that fails, change the final "5" to "4", then run the code again. If that fails, change it to "3", and so on.
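
          Another option, if only the question variables matter, is to name them directly and limit the number of observations listed. This is a minimal sketch, assuming the variable names shown in #3; the count of 20 is arbitrary.

          ```stata
          * List only the five knowledge items for the first 20 observations
          dataex q3_1b q3_1c q3_2f q3_3e q3_4, count(20)
          ```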

          • #6

            . qui ds

            .
            . dataex `=word("`r(varlist)'", 1)' - `=word("`r(varlist)'", 5)'

            ----------------------- copy starting from the next line -----------------------
            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input double idnum str7 barcode long sal byte(persno province)
                  130000 "SV02520" 1993605 . 1
                 5008000 "SV34432" 5920215 8 5
                 7984253 "SV36954" 7984253 . 7
                 7984253 "SV37000" 7984253 . 7
                 7984999 "SV37943" 7984999 . 7
                 7985338 "SV36983" 7985338 . 7
                24004000 "SV06387" 5992646 4 5
               101003000 "SV08474" 7984253 3 7
               125006000 "SV04767" 1991704 6 1
               182001000 "SV08501" 7984999 1 7
               251002000 "SV08514" 7984999 2 7
               251004000 "SV08516" 7984999 4 7
               278001000 "SV08401" 7984735 1 7
               290070000 "SV31926" 7984253 . 7
               311070000 "SV36921" 7600909 . 7
               355210000 "SV36942" 7984253 . 7
             19936430000 "SV34347" 1993643 . 1
             36001470000 "SV04989" 3600147 . 3
             4.72004e+10 "SV30649" 4720040 . 4
             47400430000 "SV90336" 4740043 . 4
             56900610000 "SV22414" 5690061 . 5
             56901230000 "SV35358" 5690123 . 5
             58302140000 "SV52999" 5830214 . 5
             59947490000 "SV03782" 5994749 . 5
             86100930000 "SV51059" 8610093 . 8
             86101170000 "SV22008" 8610117 . 8
             86300150000 "SV51052" 8630015 . 8
             86400990000 "SV65770" 8640099 . 8
             86401610000 "SV51058" 8640161 . 8
             86602440000 "SV11891" 8660244 . 8
             86604440000 "SV11934" 8660444 . 8
             86804080000 "SV13872" 8680408 . 8
            160004000101 "SV11221" 1600040 1 1
            160004000102 ""        1600040 2 1
            160004000103 "SV19655" 1600040 3 1
            160004000104 "SV14452" 1600040 4 1
            160004000105 "SV10511" 1600040 5 1
            160004000301 ""        1600040 1 1
            160004000503 ""        1600040 3 1
            160004000603 ""        1600040 3 1
            160004000803 ""        1600040 3 1
            160004000903 ""        1600040 3 5
            160004001101 "SV12234" 1600040 1 1
            160004001101 "SV65327" 1600040 1 1
            160004001102 "SV65313" 1600040 2 1
            160004001103 ""        1600040 3 1
            160004001501 ""        1600040 1 1
            160004001601 "SV10465" 1600040 1 1
            160004001602 "SV26120" 1600040 2 1
            160004002101 "SV10481" 1600040 1 1
            160004002301 ""        1600040 1 1
            160004002701 "SV65239" 1600040 1 1
            160004002702 ""        1600040 2 1
            160004002703 "SV65296" 1600040 3 1
            160004003201 "SV65301" 1600040 1 1
            160004003202 "SV65306" 1600040 2 1
            160004003203 "SV31053" 1600040 3 1
            160004003204 "SV32985" 1600040 4 1
            160004003405 ""        1600040 5 1
            160004003405 ""        1600040 5 1
            160004003701 "SV65258" 1600040 1 1
            160004003702 "SV65333" 1600040 2 1
            160004003703 "SV65340" 1600040 3 1
            160004003704 "SV52297" 1600040 4 1
            160004004201 "SV04715" 1600040 1 1
            160004004202 "SV65315" 1600040 2 1
            160004004203 "SV65278" 1600040 3 1
            160004004204 "SV65336" 1600040 4 1
            160004004205 "SV52352" 1600040 5 1
            160004004206 "SV67028" 1600040 6 1
            160004004701 "SV65202" 1600040 1 1
            160004004702 "SV65324" 1600040 2 2
            160004004702 "SV65324" 1600040 2 1
            160004004703 "SV52200" 1600040 3 1
            160004004703 ""        1600040 3 2
            160004004704 ""        1600040 4 1
            160004004705 ""        1600040 5 1
            160004005301 "SV10455" 1600040 1 1
            160004005801 "SV08272" 1600040 1 1
            160004005802 "SV65240" 1600040 2 1
            160004005803 ""        1600040 3 1
            160004006301 "SV65325" 1600040 1 1
            160004006801 "SV65330" 1600040 1 1
            160004006802 "SV65339" 1600040 2 1
            160004006803 "SV52301" 1600040 3 1
            160004008002 "SV13183" 1600040 2 1
            160004012301 ""        1600040 1 1
            160004013403 ""        1600040 3 1
            160004014502 ""        1600040 2 1
            160004016503 "SV19991" 1600040 3 2
            160004023404 ""        1600040 4 1
            160004032103 ""        1600040 3 1
            162001800501 "SV19417" 1620018 1 1
            162001800502 "SV22858" 1620018 2 1
            162001800503 "SV11236" 1620018 3 1
            162001800504 "SV10582" 1620018 4 1
            162001800505 ""        1620018 5 1
            162001800701 "SV65671" 1620018 1 1
            162001800702 ""        1620018 2 1
            162001801001 "SV65633" 1620018 1 1
            end
            label values province province
            label def province 1 "Western Cape", modify
            label def province 2 "Eastern Cape", modify
            label def province 3 "Northern Cape", modify
            label def province 4 "Free State", modify
            label def province 5 "KwaZulu-Natal", modify
            label def province 7 "Gauteng", modify
            label def province 8 "Mpumalanga", modify
            ------------------ copy up to and including the previous line ------------------

            Listed 100 out of 66615 observations
            Use the count() option to list more

            • #7
              On my second question: No Knowledge = those who answered all 5 questions incorrectly, combined with those who said they did not know for all 5 questions = 246 + 1,232. Similarly, those who answered all questions correctly are 3,235 participants. How do I count those who answered 1, 2, 3, or 4 questions correctly?

              . count if q3_1b ==1 & q3_1c ==1 & q3_2f ==1 & q3_3e ==1 & q3_4 ==1
              3,235

              . count if q3_1b ==2 & q3_1c ==2 & q3_2f ==2 & q3_3e ==2 & q3_4 ==2
              246

              . count if q3_1b ==3 & q3_1c ==3 & q3_2f ==3 & q3_3e ==3 & q3_4 ==3
              1,232

              Truth values: 1=True, 2=false; 3=dont know
              Last edited by Sonwabile Mbuma; 02 May 2024, 03:49.

              • #8
                Of your questions in #1

                1. To create a composite index/variable from 4 individual variables of a dataset (i.e. create an outcome variable from 4 individual variables).
                Sorry, but I have no idea what that means precisely. A composite variable could be (a) a mean (b) a median (c) some other summary (d) a score from factor analysis, principal component analysis or correspondence analysis (e) something else.

                2. To have a Stata code that will measure levels based on the number of correct responses (i.e. None=0, Low=1-2, Average=3, High=4-5)...such that, I can be able to say x% of respondents are at a low level because they only answered 1 or 2 questions correctly.
                Your post #3 indicates that you have variables such as

                q3_1b q3_1c q3_2f q3_3e q3_4 some of which have values 1 "Agree" 2 "Disagree" 3 "Do not know"

                others of which have values 1 "Yes" 2 "No" 3 "Do not know"

                Your post #7 complicates matters further by mentioning
                1=True, 2=false; 3=dont know

                So, what defines correct? It seems that you mean answering 1.

                Here is some technique that may help. If not, I suspect you need to be much clearer on what you have and what you want.



                Code:
                * Example generated by -dataex-. For more info, type help dataex
                clear
                input float(id q1 q2 q3)
                1 1 2 1
                2 1 2 1
                3 1 2 2
                4 1 3 3
                5 2 3 3
                end
                
                egen count1 = anycount(q*), values(1)
                
                egen count2 = anycount(q*), values(2)
                
                egen count3 = anycount(q*), values(3)
                
                list
                
                     +----------------------------------------------+
                     | id   q1   q2   q3   count1   count2   count3 |
                     |----------------------------------------------|
                  1. |  1    1    2    1        2        1        0 |
                  2. |  2    1    2    1        2        1        0 |
                  3. |  3    1    2    2        1        2        0 |
                  4. |  4    1    3    3        1        0        2 |
                  5. |  5    2    3    3        0        1        2 |
                     +----------------------------------------------+
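
                The same technique can be carried over to the levels asked about in #1. This is only a sketch: it assumes that 1 is the correct answer to each of q3_1b q3_1c q3_2f q3_3e q3_4, and the new variable names ncorrect and level are made up for illustration.

                ```stata
                * Number of items answered with 1 (assumed to mean correct), from 0 to 5
                egen ncorrect = anycount(q3_1b q3_1c q3_2f q3_3e q3_4), values(1)

                * Group the count into None=0, Low=1-2, Average=3, High=4-5
                recode ncorrect (0 = 0 "None") (1/2 = 1 "Low") (3 = 2 "Average") (4/5 = 3 "High"), generate(level)

                * Frequencies and percentages of respondents at each level
                tabulate level
                ```

                Note that anycount() treats a missing value as a non-match, so respondents with missing answers would be counted as having fewer correct answers unless they are handled separately.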

                • #9
                  Thank you very much Nick!! The code works so well. It is exactly what I will need for the second part of my problem, which is determining how many people answered how many questions correctly.


                  Just to clarify the first part of my problem: I need to create one variable from q3_1b q3_1c q3_2f q3_3e q3_4 (which will be my dependent variable). This will be a way of compiling one score from the questions or statements that represent knowledge (i.e. q3_1b q3_1c q3_2f q3_3e & q3_4). The variable will then be used in a multinomial regression analysis (looking at factors affecting knowledge).

                  I hope this clarifies what I need to do. I am not quite sure whether this is doable in ways that do not reduce the level of contribution of each question towards the index (the variable that I need to create).

                    • #11
                      Good to hear of progress on your second question. Otherwise your explanation does not take me beyond your initial statement that you want to calculate a composite score. As already detailed in #8, there are many ways to do that.

                      • #12
                        I am not absolutely sure which one is best suited. But I strongly believe that a composite that gives the mean of all five variables is what I need.

                        • #13
                          Use egen then. It has a rowmean() function.
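
                          A minimal sketch, assuming the five variables from #3; the new variable names are made up for illustration. A mean of the raw codes treats 3 ("Do not know") as a number, so a proportion of correct answers may be closer to what is wanted. That second part assumes 1 means correct.

                          ```stata
                          * Mean of the raw codes 1, 2, 3 across the five items
                          * (rowmean() ignores missing values within each observation)
                          egen meanraw = rowmean(q3_1b q3_1c q3_2f q3_3e q3_4)

                          * Alternative: proportion of the five items answered with 1 (assumed correct)
                          egen nright = anycount(q3_1b q3_1c q3_2f q3_3e q3_4), values(1)
                          generate propcorrect = nright / 5
                          ```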

                          • #14
                            Thank you.
