  • add incremental values to existing variable

    Hello,

    I have a household roster that contains household number, household member name, and member ID. This roster served as a baseline roster, and was used to collect end line data. All NEW members that were not in the baseline but were in the end line (new additions to the household either through marriage or birth) are now in the roster, but do not have a member ID. Is there a way to automatically generate a member ID for these new members that is n+1 from the highest value of baseline household member IDs?

    Example:
    Household Number   Member Name   Member ID
    1                  Alex          1
    1                  Jane          2
    1                  Sarah         .
    2                  Ali           1
    2                  Omar          .

    Is there a command that would replace the missing ID for Sarah with "3" and Omar with "2"? Sometimes I have 2 new members, so it would need to be n+1, then n+2.

    Many thanks,
    Jowel
    Last edited by Jowel Choufani; 31 Jan 2018, 09:06.

  • #2
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte householdnumber str5 membername byte memberid
    1 "Alex"  1
    1 "Jane"  2
    1 "Sarah" .
    2 "Ali"   1
    2 "Omar"  .
    end
    
    by householdnumber (memberid), sort: ///
        replace memberid = memberid[_n-1]+1 if missing(memberid)
    The logic here is that when the data are sorted on memberid (within householdnumber), the missing values sort last. Then it is just a matter of adding 1 to each subsequent memberid in the household group.
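
    The original question also mentions households with two new members. A minimal sketch of that case follows, extending the example data with a hypothetical second new member, "Tom", in household 1 (not part of the original data). Because -replace- works through the observations in sort order, the second new member picks up the value just assigned to the first. Adding membername to the sort key is an assumption on my part; it only makes the order among several missing IDs reproducible, since ties in a sort are otherwise broken arbitrarily.

    Code:
    clear
    input byte householdnumber str5 membername byte memberid
    1 "Alex"  1
    1 "Jane"  2
    1 "Sarah" .
    1 "Tom"   .
    2 "Ali"   1
    2 "Omar"  .
    end

    * Missing IDs sort last within each household; membername breaks ties among them
    by householdnumber (memberid membername), sort: ///
        replace memberid = memberid[_n-1] + 1 if missing(memberid)

    list, sepby(householdnumber)
    With this data, Sarah should receive 3 and Tom 4 in household 1, and Omar 2 in household 2. A household made up entirely of new members would stay missing, because there is no earlier ID to build on.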

    In the future, please use the -dataex- command to show your example data, as I have done in this response. The table you show has the following drawbacks:

    1. It took you longer to create than it would have taken you to use -dataex-.
    2. It isn't actually Stata data, because the column headers in the table are not legal Stata variable names, due to embedded blanks.
    3. It leaves anyone responding to make assumptions about the data which, if wrong, will lead to incorrect solutions to your problem. For example, I'm assuming that memberid is actually a numeric variable, not a string that happens to look like numbers. If I have that wrong, the code above will fail abysmally. Writing code for imaginary data is always speculative. If you provide a real data example with -dataex-, then the code can be tested out on the kind of data you need it to run on, and you have a much better chance of getting the right answer the first time.
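
    A minimal sketch of checking the assumption in point 3 before running the code, assuming the variable is named memberid as in the example above. If the IDs turn out to be stored as strings, -destring- can convert them, provided they hold only numeric content.

    Code:
    * Under -capture-, -confirm- sets _rc to a nonzero code if memberid is not numeric
    capture confirm numeric variable memberid
    if _rc {
        * Convert string IDs to numeric; -destring- leaves the variable unchanged
        * (with a message) if it finds nonnumeric characters
        destring memberid, replace
    }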

    If you are running Stata version 15.1, -dataex- is part of your official installation. If running an earlier version, run -ssc install dataex- to get the command. Either way, read -help dataex- for the simple instructions for using it. Going forward, whenever you want help with code, show a representative example of your Stata data. And whenever showing Stata data, use -dataex- to do it.


    • #3
      This is documented in Section 7 of the FAQ at https://www.stata.com/support/faqs/d...issing-values/

      You don't give a legal Stata data example (please do read and act on FAQ Advice #12), but something like

      Code:
      bysort Household (MemberID) : replace MemberID = MemberID[_n-1] + 1 if missing(MemberID)
      would work for your example.


      • #4
        Dear Clyde and Nick,

        Thank you for your response. The code Clyde provided worked.

        I will make sure to use the -dataex- command in the future.

        Many thanks,
        Jowel


        • #5
          Just to point out that it's the same answer from both of us!


          • #6
            Apologies for missing that, Nick!! Thanks a lot.


            • #7
              Dear Nick Cox,
              I have household roster data in a form like
              respondent 1
              name 1 status_in_HH var2data var3data
              name 2 status_in_HH var2data var3data
              .
              .
              .

              respondent 2
              name 1 status_in_HH var2data var3data
              name 2 status_in_HH var2data var3data
              .
              .
              .


              and so on.
              For the roster variables, data isn't captured for the respondent.

              The common identifier is an ID that is unique to each household.

              Now, I want data such as status in the household to be populated against each respondent. How can this be done?

              I am really new to Stata and struggling here.


              • #8
                I can't see that #7 bears any relation to the thread title. That being so, please start a new thread with a better title. It would be a really good idea to read https://www.statalist.org/forums/help before you post as a schematic description of your data is less helpful than a concrete example.
