  • Create a counting variable

    Hello everyone, I'm new to Stata programming and need some help.

    I'm currently cleaning and adjusting some variables that I'll need to use in my models, and one of them is the number of members living in the interviewed household.

    I created an ID variable (idom) to give each household a specific number based on serials, controls, etc.

    This count variable already exists for some years, but not for the entire dataset. I'm basically trying to recreate it.

    Here is a sample of what I have:

    - idom is the ID that I created, shared by every member of a given household, and v4741 is the variable with how many members that household has. I want to recreate v4741 by counting how many times each ID appears.


    Obs. idom v4741
    ----------------
    7593667. 400713 6
    7593668. 400713 6
    7593669. 400713 6
    7593670. 400713 6
    7593671. 400713 6
    7593672. 400713 6
    ----------------
    7593673. 400714 4
    7593674. 400714 4
    7593675. 400714 4
    7593676. 400714 4
    ----------------
    7593677. 400716 4
    7593678. 400716 4
    7593679. 400716 4
    7593680. 400716 4
    ----------------
    7593681. 400717 5
    7593682. 400717 5
    7593683. 400717 5
    7593684. 400717 5
    7593685. 400717 5
    ----------------
    7593686. 400718 3
    7593687. 400718 3
    7593688. 400718 3
    ----------------
    7593689. 400719 2
    7593690. 400719 2
    ----------------
    7593691. 400720 5
    7593692. 400720 5
    7593693. 400720 5
    7593694. 400720 5
    7593695. 400720 5

    Since my dataset has more than 7.5 million observations, I think it's better to handle this in Stata, but it's something I could do in Excel with a COUNTIFS formula.

    Any tips to share?

    Thank you!

  • #2
    Code:
    bysort idom: gen wanted = _N
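
Because v4741 already exists for some years, the recreated count can be checked against it. The following is only a sketch on a small made-up dataset: the values are invented for illustration, and `wanted` is the variable name from the line above.

```stata
clear
* Toy data (invented values): the last household has no recorded v4741
input long idom byte v4741
400713 3
400713 3
400713 3
400714 2
400714 2
400719 .
end

* Under by:, _N is the number of observations in the current group
bysort idom: gen long wanted = _N

* Count disagreements with the existing variable where it is available
count if wanted != v4741 & !missing(v4741)
```

A count of zero means the new variable agrees with v4741 wherever v4741 is recorded. A nonzero count would be worth inspecting before relying on either variable.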



    • #3
      Hi Nick, thanks for the answer.

      I tried your code and it yielded the results as follows (under the variable name 'membros'):

      Code:
               +--------------------------+
               |   idom   v4741   membros |
               |--------------------------|
      7593669. | 401203       2         1 |
      7593670. | 401203       2         2 |
               |--------------------------|
      7593671. | 401204       3         1 |
      7593672. | 401204       3         2 |
      7593673. | 401204       3         3 |
               |--------------------------|
      7593674. | 401205       1         1 |
               |--------------------------|
      7593675. | 401206       4         1 |
      7593676. | 401206       4         2 |
      7593677. | 401206       4         3 |
      7593678. | 401206       4         4 |
               |--------------------------|
      7593679. | 401207       3         1 |
      7593680. | 401207       3         2 |
      7593681. | 401207       3         3 |
               |--------------------------|
      7593682. | 401208       2         1 |
      7593683. | 401208       2         2 |
               |--------------------------|
      7593684. | 401209       1         1 |
               |--------------------------|
      7593685. | 401210       2         1 |
      7593686. | 401210       2         2 |
               |--------------------------|
      7593687. | 401211       3         1 |
      7593688. | 401211       3         2 |
      7593689. | 401211       3         3 |
               |--------------------------|
      7593690. | 401212       3         1 |
      7593691. | 401212       3         2 |
      7593692. | 401212       3         3 |
               |--------------------------|
      7593693. | 401213       3         1 |
      7593694. | 401213       3         2 |
      7593695. | 401213       3         3 |
               +--------------------------+
      As you can see, the variable 'membros' numbers the members of each household from 1 to N. I would like 'membros' to hold only N, so that it matches v4741. For example, the last 3 observations should all contain '3' instead of '1, 2 and 3'.

      Thanks.



      • #4
        Um, Nick's code in #2 will give you exactly what you are looking for. I suspect that you did not implement it correctly. If you typed _n where Nick has _N you will get 1, 2, 3. You must use the upper case _N to get 3, 3, 3.

        All Stata code is case-sensitive!
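
The difference between the two can be seen side by side in a minimal sketch. The data are a few rows taken from the listing in #3, and the variable names `seq` and `size` are made up for this illustration.

```stata
clear
input long idom
401211
401211
401211
401212
401212
401212
end

* Lower-case _n: position of the observation within its household (1, 2, 3)
bysort idom: gen seq = _n

* Upper-case _N: total observations in the household, repeated on every row (3, 3, 3)
bysort idom: gen size = _N

list, sepby(idom)
```

Within each household, `seq` reaches `size` only on the last row, which is why the output in #3 looked like a running count.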



        • #5
          Hi Clyde. Thanks for the answer.

          Yeah, you were right. I forgot that Stata is case-sensitive. Everything is good to go now.

          Thank you both for the help.

          Regards,

          Fernando.



          • #6
            Hello again.

            I'm reusing this thread for a similar question.

            The last variable that I need to create is household earnings, computed from the individual earnings of the household members (v4720).

            Code:
                     +----------------+
                     |   idom   v4720 |
                     |----------------|
            7593669. | 400713       0 |
            7593670. | 400713       0 |
            7593671. | 400713       0 |
            7593672. | 400713     812 |
                     |----------------|
            7593673. | 400714    3100 |
            7593674. | 400714       0 |
            7593675. | 400714       . |
            7593676. | 400714       . |
                     |----------------|
            7593677. | 400716    1600 |
            7593678. | 400716    1100 |
            7593679. | 400716       . |
            7593680. | 400716       . |
                     |----------------|
            7593681. | 400717     500 |
            7593682. | 400717       . |
            7593683. | 400717       . |
            7593684. | 400717       . |
            7593685. | 400717       . |
                     |----------------|
            7593686. | 400718     800 |
            7593687. | 400718     800 |
            7593688. | 400718     788 |
                     |----------------|
            7593689. | 400719    1500 |
            7593690. | 400719       0 |
                     |----------------|
            7593691. | 400720     900 |
            7593692. | 400720      96 |
            7593693. | 400720     300 |
            7593694. | 400720       0 |
            7593695. | 400720       . |
                     +----------------+
            It's basically the same problem as above, but instead of counting I need to sum the values.

            Any hints?

            Thanks.



            • #7
              And it's basically the same answer.

              Code:
              by idom, sort: egen hh_income = total(v4720)
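
One point that may matter here, since v4720 contains missing values: `egen`'s `total()` function treats missing as zero, so a household whose members are all missing gets a total of 0. The sketch below uses invented data (household 400799 is hypothetical, and `hh_income_m` is a name made up for the comparison) to contrast the default with the `missing` option.

```stata
clear
* Toy data: 400714 mixes values and missings; 400799 (hypothetical) is all missing
input long idom double v4720
400714 3100
400714 0
400714 .
400799 .
400799 .
end

* Default behaviour: missings are ignored, an all-missing household totals 0
bysort idom: egen hh_income = total(v4720)

* With the missing option, an all-missing household gets a missing total
bysort idom: egen hh_income_m = total(v4720), missing

list, sepby(idom)
```

Which version is appropriate depends on whether missing earnings should be read as "no income" or as "unknown"; that is a judgment about the survey, not something the code can decide.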



              • #8
                Everything done.

                Thank you all!
