  • How to separate an observation entered within an observation and use some of the content

    Hello,

    I have a dataset collected at the village (group) level, with randomly selected households that have more than one child (observations). Many of the responses on demographic variables are the same. However, the information on child-related variables is not the same for each child.

    How can I create new observation for additional child within a same household at a same village in stata?
    The dataset has village number, village tract number, township code, geographic zone number, region/state number, year, and household number.

    Please help.

    Thank you.

  • #2
    Describing data sets in words is seldom helpful, and it certainly is far less useful than posting an example of what you have. Please use the -dataex- command for that.

    In addition, I have to say that it is not clear to me what you want the resulting data set to look like. So please hand work an example of that and show that, too, explaining how the values and variables in the resulting data set relate to the original.

    If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.



    • #3
      Hello,
      Thanks so much for your reply and suggestion.

      I am pasting the data in two batches because -dataex- only fits a certain number of variables across a row. The second batch continues after the household variable.
      I am imagining a unique identifier for the children that also indicates which children are from the same household, village, geographic zone, etc.
      E.g.
      009 9 2015 140204 14 1 79 01
      009 9 2015 140204 14 1 79 02

      Am I on the right track in terms of assigning a unique identifier? This data set will be merged with the prior year's dataset, which has the same data structure. The 2015 data were collected in the same villages, but not from the same children. Therefore, I really want to make sure that the identifier I assign does not cause problems with the merge down the line.

      Very much looking forward to your reply. Sincerely, Aye Aye Khaine

      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input int questionnaire_no str3 villagecode int(villagetractcode year) long township byte(stateregion geographiczone) int houshold
      53 "009" 9 2015 140204 14 1 79
      187 "013" 12 2015 140204 14 1 13
      165 "015" 15 2015 140204 14 1 62
      1612 "145" 145 2015 80402 8 3 121
      1576 "146" 146 2015 80402 8 3 76
      end

      Listed 5 out of 21 observations

      input long q1_8 byte q1_9_years str14 householdincome str12 q20_6 str6 c1_12Child1 float(c1_65weight c1_66height c2_65weight c2_66height)
      13012016 13 "Ks 200,001 -" "Flush to pit" "Male" 9.5 82.2 16.5 104.2
      31012016 12 "Ks 50,001 -" "Hanging Latr" "Female" 12 86.9 11.9 91.1
      28012016 15 "Ks 75,001 -" "Flush to pit" "Female" 5.9 61.5 7.9 67.8
      22012016 11 "Ks 150,001 -" "Flush to pit" "Female" 13.2 99.6 . .
      20012016 14 "Ks 150,001 -" "Flush to pit" "Female" 7.4 70.3 13.8 98.9
      end



      • #4
        Hello,
        As a follow-up to the previous post: how can I copy the generic household information (such as income) together with the child-specific information into a new row in Stata, so that I do not need to cut and paste manually?



        • #5
          Thank you for showing some example data. I still don't understand, however, what you want to do. Your original question asks how to create a "new observation for additional child within a same household at a same village in stata." But I don't understand how you would even recognize an additional child within the same household, as there seem to be no individual identifiers. In fact, it isn't even clear what you mean by "more than one child (observations)." In your example data, every household has only one observation and the only variable that is evidently related in some way to children is c1_12Child1. There are some c1_* and c2_* variables relating to weight and height. Perhaps those represent measurements of two different children and you want to make separate observations out of them? If that's it:

          Code:
          gen long obs_no = _n
          reshape long c@_65weight c@_66height c@_12Child1, i(obs_no) j(child_number)
          will do that. If that's not what you have in mind, please post back and show an example of what the end results should look like.
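
          A minimal sketch of that -reshape- on two households taken from the example in #3. It assumes that the rows of the two -dataex- batches line up one to one (so the first row of the second batch belongs to household 79 in village 009), which the thread does not confirm. The final -drop- is an optional extra and not part of the code above.

          ```stata
          * Two households from #3; pairing of the two batches is an assumption
          clear
          input str3 villagecode int houshold str6 c1_12Child1 float(c1_65weight c1_66height c2_65weight c2_66height)
          "009"  79 "Male"    9.5 82.2 16.5 104.2
          "145" 121 "Female" 13.2 99.6    .     .
          end

          gen long obs_no = _n
          * c2_12Child1 does not exist, so c_12Child1 is left empty for child 2
          reshape long c@_65weight c@_66height c@_12Child1, i(obs_no) j(child_number)

          * One row per household per child slot; villagecode and houshold repeat on each row
          list, sepby(obs_no) noobs

          * Optional: drop child slots that have no measurements at all
          drop if missing(c_65weight) & missing(c_66height)
          ```

          After the -reshape-, the stubs become the variables c_65weight, c_66height and c_12Child1, and child_number takes the values 1 and 2.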



          • #6
            Hello again,
            Thanks so much. You are right that there are no individual identifiers, which is why I am hoping to assign one using Stata. I do not quite know how to generate an individual identifier from

            villagecode, villagetractcode, year, township, stateregion, geographiczone, houshold, and a child number, as imagined in the previous post:

            009 9 2015 140204 14 1 79 01 (for the first child)
            009 9 2015 140204 14 1 79 02 (for the second child from the same household)

            How do I generate an identifier like the example above?

            Yes, I would like to make separate observations out of the two different children. But I would need a child-level identifier too, as above. How would I do it correctly in Stata?

            c1_65weight c1_66height is for child 1, with his or her weight and height
            c2_65weight c2_66height is for child 2 (the 2nd child in the same household), with his or her weight and height


            villagecode  villagetractcode  year  township  stateregion  geographiczone  household#  householdincome  toilet  child#
            009          09                2015  140204    14           1               79          1000             yes     01      (for the first child)
            009          09                2015  140204    14           1               79          1000             yes     02      (for the second child from the same household)

            010          09                2015  140204    14           1               79          1000             yes     01      (for the first child) (here household number is the same, but not village)
            010          09                2015  140204    14           1               79          1000             yes     02      (for the second child from the same household)


            The above is what I am hoping to get.
            If I am not explaining it well, I shall try again.

            Thanks so much for your time and help. Very much appreciated and looking forward to reading more.

            Humbly,
            Aye Aye Khaine



            • #7
              I meant:
              villagecode  villagetractcode  year  township  stateregion  geographiczone  household#  householdincome  toilet  child#  childwt  childht
              009          09                2015  140204    14           1               79          1000             yes     01      12kg     85.5cm    (for the first child)
              009          09                2015  140204    14           1               79          1000             yes     02      15kg     90.5cm    (for the second child from the same household)

              010          09                2015  140204    14           1               79          1000             yes     01      15kg     100.00cm  (for the first child) (here household number is the same, but not village)
              010          09                2015  140204    14           1               79          1000             yes     02      12kg     85.5cm    (for the second child from the same household)



              • #8
                I believe the code I gave you in #5 creates everything you want except the identifier. Did you run it? If it doesn't do what you want, please post back and explain.

                As for creating an identifier, see -help egen- and look at the -concat()- function. That will create an identifier along the lines you are looking for. Following the code in #5:

                Code:
                tostring child_number, replace format(%02.0f)
                * note: the example data in #3 spell the household variable "houshold"
                egen child_identifier = concat(villagecode villagetractcode year township stateregion geographiczone houshold child_number), punct(" ")
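
                A minimal sketch of the identifier on data that are already in long form, with a check that it is unique. The three rows reuse the household values from #3; the child numbers are the ones -reshape- would produce, and the variable name houshold follows the spelling in the example data.

                ```stata
                * Long-form toy data: two children in household 79, one in household 13
                clear
                input str3 villagecode int(villagetractcode year) long township byte(stateregion geographiczone) int houshold byte child_number
                "009"  9 2015 140204 14 1 79 1
                "009"  9 2015 140204 14 1 79 2
                "013" 12 2015 140204 14 1 13 1
                end

                * Zero-pad the child number so that 1 becomes "01"
                tostring child_number, replace format(%02.0f)
                egen child_identifier = concat(villagecode villagetractcode year township stateregion geographiczone houshold child_number), punct(" ")

                * -isid- stops with an error if the identifier is not unique
                isid child_identifier
                list child_identifier, noobs
                ```

                Because year is part of the identifier, values from a different survey year cannot collide with these.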



                • #9
                  Hello,
                  This is what we did. I hope it makes sense without all the identifiers (villagecode, villagetractcode, year, etc.). The code below just gives child id 1 (C1), child id 2 (C2), etc. The assumption is that all the other identifier columns get copied down to each row for the respective child.

                  gen id = _n

                  *** Reshape data
                  ** Rename child code
                  ** 1st step: rename the child question variables from C1_* to C1_C* so that
                  ** the converted names start with a letter rather than a digit
                  rename C1_* C1_C*
                  rename C2_* C2_C*
                  rename C3_* C3_C*
                  rename C4_* C4_C*
                  rename C5_* C5_C*

                  ** 2nd step: move the child number code from the start of the variable name to the end,
                  ** so that the question number is easy to identify.
                  rename C1_* *C1
                  rename C2_* *C2
                  rename C3_* *C3
                  rename C4_* *C4
                  rename C5_* *C5

                  ** 3rd step: reshape the data. The structure of the command is
                  ** reshape long question_number, i(id) j(child_number) string
                  ** reshape : command used to reshape the data structure
                  ** long : converts the data from wide to long
                  ** question_number: the list of every question number in the survey
                  ** i(id) : the id of the household
                  ** j(child_number) : the number / order of the children in the same household, for
                  ** households that have more than 1 child
                  ** string : option indicating that the j variable (child_number) holds string values
                  reshape long C11 C12 C13 C13_OT_SP C14 C14_OT_SP C15 C16 C16_OT_SP C17 C17_OT_SP C21 C22 ///
                  C23 C24 C24_1 C24_2 C24_3 C31 C310 C311 C312 C313 C314 C315 C316 C317 C318 C319 C32 ///
                  C320 C321 C322 C323 C324 C325 C326 C327 C328 C329 C33 C330 C331 C332 C333 C334 C335 ///
                  C336 C337 C338 C34 C35 C36 C37 C38 C39 C41 C410 C411 C412 C413 C414 C415 C416 C417 ///
                  C418 C42 C43 C44 C45 C46 C47 C48 C49 C51 C60 C61 C62_DAYS C62_MONTHS C62_YEARS C63 ///
                  C63_OT_SP C64 C65 C66 C67 C68 C68_OT_SP, i(id) j(child_number) string



                  • #10
                    -reshape- will verify that any variable not mentioned in the list of variables to be -reshape-d -long- takes on the same value in all observations with the same value of the -i()-, i.e. that they are constant within id. If that verification is successful, then, yes, those values will be copied into all of the new observations that -reshape- creates.

                    If it does not appear to be working that way in your data, please post a data example where you are getting unexpected results.
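
                    As a quick sanity check along these lines, the sketch below reshapes two households from #3 and then asserts that a household-level variable takes a single value within each id. The pairing of rows across the two -dataex- batches is an assumption, and only one child-level stub is used to keep the example short.

                    ```stata
                    * Two households from #3 with one household-level variable (q20_6)
                    clear
                    input int houshold str12 q20_6 float(c1_65weight c2_65weight)
                     79 "Flush to pit"  9.5 16.5
                    121 "Flush to pit" 13.2    .
                    end

                    gen long id = _n
                    reshape long c@_65weight, i(id) j(child_number)

                    * Sort q20_6 within id and compare the first and last values;
                    * -assert- stops with an error if any household has more than one value
                    bysort id (q20_6): assert q20_6[1] == q20_6[_N]
                    list, sepby(id) noobs
                    ```

                    The same -bysort ... assert- line can be repeated for any other variable that should be constant within a household.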

