  • Reshaping dataset with repeat option

    Dear all,
    I have a dataset that was imported with Survey CTO. The dataset is at household level, but some questions were asked of every household member (for example: have you been ill in the last 30 days?). For these questions, the variable name ends with an underscore followed by the number of that person within the household (a number from 1 to 15). E.g., e1a_3 is the reported answer for the 3rd person in the household.
    Moreover, some of these individual-level questions are subdivided further: each person in the household could report more than one illness, and for each illness the type is recorded (malaria, flu, etc.). E.g., e1b_3_2 is the type of illness that individual 3 of the household reported as the second illness in the last 30 days.

    The problem is: I need to create a dummy variable for each type of illness, because I need to know, across all households, how many people got each illness. I need this information because I subsequently have to regress the type of illness on the presence of medical support in the neighborhood. What can I do? I tried tab, gen as well as xpose and reshape, but none of them seems to work properly. Do you have any suggestions?

    Thank you!
    Best,
    Elisabetta

  • #2
    Welcome to Statalist.
    Your question, like most questions on here, is much easier to understand and answer if you show an example of your data. Please do so using dataex, by doing:
    Code:
    ssc install dataex
    dataex in 1/40
    See also the explanation of how and why to use dataex in the FAQ (http://www.statalist.org/forums/help#stata).
    Note there's no need to install dataex if you are on Stata 15, and that if you are concerned about privacy, you can anonymize your data.

    Please also explain how the diseases may be identified, if it's not clear from the data example. Right now I am guessing it is one of the numbers in, e.g., e1b_3_2?

    • #3
      Dear Jorrit,
      thank you for replying.

      The numbers in e1b_3_2 only refer to the household member (3, meaning the third household member) and to the order of the reported illness (2, meaning the second illness mentioned by the member during the survey, up to a maximum of 4). Hence, the number 2 only refers to the order in which the person mentioned that illness during the survey. For example, let's say the 3rd person in the household has been sick 3 times in the last 30 days and he/she reports illnessX, illnessY and illnessZ in this order. Then the variables will be: e1b_3_1 for illnessX, e1b_3_2 for illnessY and e1b_3_3 for illnessZ, but only because illnessX was mentioned first by that member, illnessY second, and illnessZ third.

      The name of the variable says nothing about the type of illness: you can tell what type of illness illnessX, Y and Z were by looking at the values that e1b_3_1, e1b_3_2 and e1b_3_3 take. Each type of illness is associated with a number from 1 to 34 in alphabetical order, starting from Asthma (number 1) and ending with Vomiting (number 33) and Other (number 34). Let's say that e1b_3_1 takes value 12, e1b_3_2 takes value 4 and e1b_3_3 takes value 33. Then the 3rd individual in the household has suffered, as a first illness, from difficulties in breathing (number 12 in the alphabetical list of the illnesses); as a second illness, from blood in urine (number 4); and lastly from vomiting and nausea (number 33).

      I hope this explanation is clear enough. Unfortunately, the dataset is a bit messy and counterintuitive. I'll give you an example of it using dataex, as you suggested (I am sorry I didn't do it before; I'm new to the forum and didn't know how to). The variable id refers to the household id; the e1a_* and e1b_* variables are the ones I mentioned in my messages.

      Thank you!

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input long id byte(e1a_1 e1a_2 e1a_3 e1a_4 e1a_5 e1a_6 e1a_7 e1a_8 e1a_9 e1a_10 e1a_11 e1a_12 e1a_13 e1a_14 e1a_15 e1b_1_1 e1b_2_1 e1b_3_1 e1b_4_1 e1b_5_1 e1b_6_1 e1b_7_1 e1b_8_1 e1b_9_1 e1b_10_1)
        3 . . . . . . . . . . . . . . .  .  .  .  .  . .  . . . .
       21 1 . . . . . . . . . . . . . . 27  .  .  .  . .  . . . .
       22 . . . . . . . . . . . . . . .  .  .  .  .  . .  . . . .
       23 1 . . . . . . . . . . . . . .  8  .  .  .  . .  . . . .
       24 . . . . . . . . . . . . . . .  .  .  .  .  . .  . . . .
       25 2 . . . . . . . . . . . . . .  8  .  .  .  . .  . . . .
       26 . . . . . . . . . . . . . . .  .  .  .  .  . .  . . . .
       27 1 . . . . . . . . . . . . . . 88  .  .  .  . .  . . . .
       28 . . . . . . . . . . . . . . .  .  .  .  .  . .  . . . .
       29 . . . . . . . . . . . . . . .  .  .  .  .  . .  . . . .
       30 2 . . . . . . . . . . . . . . 20  .  .  .  . .  . . . .
       31 1 . . . . . . . . . . . . . . 27  .  .  .  . .  . . . .
       33 . . . . . . . . . . . . . . .  .  .  .  .  . .  . . . .
       34 . . . . . . . . . . . . . . .  .  .  .  .  . .  . . . .
       35 . . . . . . . . . . . . . . .  .  .  .  .  . .  . . . .
       36 1 . . . . . . . . . . . . . . 16  .  .  .  . .  . . . .
       37 . . . . . . . . . . . . . . .  .  .  .  .  . .  . . . .
       38 1 . . . . . . . . . . . . . . 31  .  .  .  . .  . . . .
       39 1 . . . . . . . . . . . . . . 27  .  .  .  . .  . . . .
       40 1 . . . . . . . . . . . . . . 27  .  .  .  . .  . . . .
       41 . . . . . . . . . . . . . . .  .  .  .  .  . .  . . . .
       42 . . . . . . . . . . . . . . .  .  .  .  .  . .  . . . .
       43 . . . . . . . . . . . . . . .  .  .  .  .  . .  . . . .
       44 . . . . . . . . . . . . . . .  .  .  .  .  . .  . . . .
       49 1 . . . . . . . . . . . . . . 27  .  .  .  . .  . . . .
       50 . . . . . . . . . . . . . . .  .  .  .  .  . .  . . . .
       81 . . . . . . . . . . . . . . .  .  .  .  .  . .  . . . .
      105 1 . . . . . . . . . . . . . . 18  .  .  .  . .  . . . .
      132 2 . . . . . . . . . . . . . . 27  .  .  .  . .  . . . .
      135 1 . . . . . . . . . . . . . . 27  .  .  .  . .  . . . .
      136 . . . . . . . . . . . . . . .  .  .  .  .  . .  . . . .
      137 . . . 1 . . . . . . . . . . .  .  .  . 31  . .  . . . .
      147 2 . . . . . . . . . . . . . . 31  .  .  .  . .  . . . .
      148 . . 3 . . . . . . . . . . . .  .  . 16  .  . .  . . . .
      158 1 . . . . . . . . . . . . . . 13  .  .  .  . .  . . . .
      203 . . . . . . . . . . . . . . .  .  .  .  .  . .  . . . .
      238 . . . . . . . . . . . . . . .  .  .  .  .  . .  . . . .
      254 . . . . . . . . . . . . . . .  .  .  .  .  . .  . . . .
      300 1 . . . . . . . . . . . . . . 27  .  .  .  . .  . . . .
      310 1 . . . . . . . . . . . . . . 88  .  .  .  . .  . . . .
      311 1 . . . . . . . . . . . . . .  8  .  .  .  . .  . . . .
      312 1 . 2 . . . . . . . . . . . .  8  . 20  .  . .  . . . .
      313 1 . . . . . . . . . . . . . . 16  .  .  .  . .  . . . .
      314 2 . . . . . . . . . . . . . .  8  .  .  .  . .  . . . .
      351 1 . . . . . . . . . . . . . . 27  .  .  .  . .  . . . .
      352 1 . . . . . . . . . . . . . . 21  .  .  .  . .  . . . .
      355 . . . . . . 2 . . . . . . . .  .  .  .  .  . . 28 . . .
      356 1 . . . . . . . . . . . . . . 13  .  .  .  . .  . . . .
      357 . . . . 2 . . . . . . . . . .  .  .  .  . 27 .  . . . .
      358 1 . . . . . . . . . . . . . . 88  .  .  .  . .  . . . .
      359 1 . . . . . . . . . . . . . . 88  .  .  .  . .  . . . .
      360 1 . . . . . 1 . . . . . . . . 88  .  .  .  . . 27 . . .
      361 1 . . . . . . . . . . . . . . 27  .  .  .  . .  . . . .
      362 1 . . . . . . . . . . . . . . 27  .  .  .  . .  . . . .
      363 1 . . . . . . . . . . . . . . 27  .  .  .  . .  . . . .
      364 . . . . . . . . . . . . . . .  .  .  .  .  . .  . . . .
      365 1 . . . . . . . . . . . . . . 27  .  .  .  . .  . . . .
      366 . . . . . . . . . . . . . . .  .  .  .  .  . .  . . . .
      367 . . . . . . . . . . . . . . .  .  .  .  .  . .  . . . .
      369 1 . . . . . . . . . . . . . . 27  .  .  .  . .  . . . .
      372 . . . . . . . . . . . . . . .  .  .  .  .  . .  . . . .
      373 . . . . . . . . . . . . . . .  .  .  .  .  . .  . . . .
      374 . . . . . . . . . . . . . . .  .  .  .  .  . .  . . . .
      390 1 . . . . . . . . . . . . . . 27  .  .  .  . .  . . . .
      391 1 . . . . . . . . . . . . . .  2  .  .  .  . .  . . . .
      392 1 . . . . . . . . . . . . . . 31  .  .  .  . .  . . . .
      393 . . . . . . . . . . . . . . .  .  .  .  .  . .  . . . .
      394 1 . . . . . . . . . . . . . . 27  .  .  .  . .  . . . .
      395 1 . . . . . . . . . . . . . . 27  .  .  .  . .  . . . .
      396 1 . . . . . . . . . . . . . . 20  .  .  .  . .  . . . .
      397 . . . . . . . . . . . . . . .  .  .  .  .  . .  . . . .
      398 1 . . . . . . . . . . . . . . 27  .  .  .  . .  . . . .
      399 1 . . . . . . . . . . . . . . 27  .  .  .  . .  . . . .
      400 1 . . . . . . . . . . . . . . 27  .  .  .  . .  . . . .
      401 1 . . . . . . . . . . . . . . 27  .  .  .  . .  . . . .
      402 1 . . . . . . . . . . . . . . 27  .  .  .  . .  . . . .
      403 1 . . . . . . . . . . . . . . 18  .  .  .  . .  . . . .
      404 1 . . . . . . . . . . . . . . 18  .  .  .  . .  . . . .
      405 2 . . . . . . . . . . . . . .  8  .  .  .  . .  . . . .
      461 . . . . . . . . . . . . . . .  .  .  .  .  . .  . . . .
      462 1 . . . . . . . . . . . . . . 27  .  .  .  . .  . . . .
      463 1 . 1 . . . . . . . . . . . . 31  .  8  .  . .  . . . .
      465 . . . . . . . . . . . . . . .  .  .  .  .  . .  . . . .
      466 . . . . . . . . . . . . . . .  .  .  .  .  . .  . . . .
      467 . . . . . . . . . . . . . . .  .  .  .  .  . .  . . . .
      468 1 . 1 . . . . . . . . . . . . 27  . 27  .  . .  . . . .
      469 1 . . . . . . . . . . . . . . 27  .  .  .  . .  . . . .
      470 1 1 . . . . . . . . . . . . . 20 27  .  .  . .  . . . .
      491 2 2 1 1 . . . . . . . . . . .  8  8 10 12  . .  . . . .
      492 1 . . . . . . . . . . . . . . 27  .  .  .  . .  . . . .
      493 1 . . . . . . . . . . . . . . 18  .  .  .  . .  . . . .
      494 1 . . . . . . . . . . . . . . 27  .  .  .  . .  . . . .
      495 . . . . . . . . . . . . . . .  .  .  .  .  . .  . . . .
      496 . . . . . . . . . . . . . . .  .  .  .  .  . .  . . . .
      497 1 . . . . . . . . . . . . . .  8  .  .  .  . .  . . . .
      531 1 . . . . . . . . . . . . . . 27  .  .  .  . .  . . . .
      532 1 . . . . . . . . . . . . . . 27  .  .  .  . .  . . . .
      534 1 . . . . . . . . . . . . . . 27  .  .  .  . .  . . . .
      535 1 . . . . . . . . . . . . . . 20  .  .  .  . .  . . . .
      536 1 . . . . . . . . . . . . . . 27  .  .  .  . .  . . . .
      end
      label values e1b_1_1 e1b_11
      label def e1b_11 2 "BLEEDING/WOUND", modify
      label def e1b_11 8 "COUGH, RUNNY NOSE, SORE THROAT", modify
      label def e1b_11 13 "DIFFICULTY BREATHING, FAST BREATHING", modify
      label def e1b_11 16 "FEVER", modify
      label def e1b_11 18 "FLU", modify
      label def e1b_11 20 "HEADACHE", modify
      label def e1b_11 21 "HEART DISEASE", modify
      label def e1b_11 27 "MALARIA", modify
      label def e1b_11 31 "PAIN/ACHES", modify
      label def e1b_11 88 "OTHER [SPECIFY]", modify
      label values e1b_2_1 e1b_21
      label def e1b_21 8 "COUGH, RUNNY NOSE, SORE THROAT", modify
      label def e1b_21 27 "MALARIA", modify
      label values e1b_3_1 e1b_31
      label def e1b_31 8 "COUGH, RUNNY NOSE, SORE THROAT", modify
      label def e1b_31 10 "DENTAL PROBLEM", modify
      label def e1b_31 16 "FEVER", modify
      label def e1b_31 20 "HEADACHE", modify
      label def e1b_31 27 "MALARIA", modify
      label values e1b_4_1 e1b_41
      label def e1b_41 12 "DIARRHEA", modify
      label def e1b_41 31 "PAIN/ACHES", modify
      label values e1b_5_1 e1b_51
      label def e1b_51 27 "MALARIA", modify
      label values e1b_6_1 e1b_61
      label values e1b_7_1 e1b_71
      label def e1b_71 27 "MALARIA", modify
      label def e1b_71 28 "MENTAL DISORDER", modify
      label values e1b_8_1 e1b_81
      label values e1b_9_1 e1b_91
      label values e1b_10_1 e1b_101

      • #4
        Alright. I have a better idea of the pattern now. It is much easier to understand with the data example.

        So it is: e1b_X_Y, where X is the person number in the household and Y is the sequence number of the disease mentioned.

        Questions remaining:
        In e1b_X_Y, the e1b is simply the question number?
        id is household id?
        the variables e1a_1 through e1a_15: are they relevant to your question here? In the reshape, should they just be repeated for every new observation of id created?

        Lastly, my approach would be to reshape long, to create the variables "household id", "person id", "disease name" and "order in which disease was mentioned".
        Would that be what you had in mind?

        • #5
          1) Yes, e1b is just the question number
          2) Yes, id is the household id
          3) Questions e1a_1 to e1a_15 do not necessarily need to be reshaped, but it would be better if I could reshape them as well.
          4) Yes, that would work

          Thanks!

          • #6
            Code:
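            * go long: one row per household and per e1b_ suffix (person_sequence)
            reshape long e1b_, i(id) j(q) string
            ren id hhid
            * q looks like "3_2": person number before the underscore, illness sequence after it
            gen pid=substr(q,1,strpos(q,"_")-1)
            gen dis_seq=substr(q,strpos(q,"_")+1,.)
            destring pid dis_seq, replace
            drop q
            ren e1b_ disease_name
            * attach one of the existing (incomplete) value labels
            label values disease_name e1b_11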
            Note that I apply one of the existing labels to the new variable "disease_name". This is not a complete list, because the different e1b_X_Y variables in the original each had subsets of the complete list of labels in your dataex output. I'm guessing you know best what the complete list would be.
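            For the codes visible in the data example, the gaps in e1b_11 might be filled along these lines. This is only a sketch: the three entries are copied from the other label sets in the dataex output above, and the full questionnaire list may well contain more.
            Code:
            * add the codes that appear in e1b_31, e1b_41 and e1b_71 but not in e1b_11
            label define e1b_11 10 "DENTAL PROBLEM" 12 "DIARRHEA" 28 "MENTAL DISORDER", add
            * check the result
            label list e1b_11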
            Some of the renaming etc. is optional, of course; customize to your preferences.
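            From the long layout, the dummies asked about in #1 might then be built roughly as follows. This is a sketch under assumptions, not a tested solution for the full dataset: the names e1a, ill_ and first_mention are made up for illustration, 15 is the maximum household size stated in #1, and dropping rows without a reported illness suits counting but not necessarily the later regression.
            Code:
            * optional: bring e1a to person level too, picking the column that matches pid
            gen e1a = .
            forvalues p = 1/15 {
                replace e1a = e1a_`p' if pid == `p'
            }
            drop e1a_*
            * keep only rows where an illness was actually reported
            drop if missing(disease_name)
            * one 0/1 dummy per illness code present in the data (ill_1, ill_2, ...)
            tabulate disease_name, generate(ill_)
            * tag one row per person and illness so repeat mentions are not counted twice
            egen byte first_mention = tag(hhid pid disease_name)
            tabulate disease_name if first_mention
            The last tabulate counts people rather than mentions, since a person who reports the same illness under two sequence numbers is tagged only once.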

            • #7
              Thank you very very much Jorrit!! Have a lovely day
