  • inter-religious marriage

    Dear sir

    I am working with Census data. To make it simple, consider a database as follows:
    id family_id position (in family) religion
    1 1 head (1) A
    2 1 partner (2) A
    3 1 son (3) A
    4 2 head (1) A
    5 2 partner (2) B
    6 3 head (1) A
    7 4 head (1) B
    8 4 partner (2) A
    9 4 son (3) A
    I need to count the number of same-religion and different-religion marriages.

    In this very simple database above, the result should be: AA = 1; AB = 1; BA = 1.

    So far I have managed to create a new variable, position_religion (1-A; 2-A; 3-A; 1-A; 2-B; ...).

    I guess I have to create another new variable, assigning the position_religion of the head of the family to all the other ids with the same family_id. If I manage to do that, a simple frequency table will provide the result.

    Could you please help me in creating this new variable?

    Thanks in advance
    Sergio Goldbaum
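A minimal sketch of the approach described in this post (copy the head's religion to every member of the family, then tabulate). It assumes position is stored as a number (1 = head, 2 = partner, 3 = son) and religion as a one-letter string, as in the table above; head_religion and marriage are illustrative names, not variables from the original data.

```stata
* Toy data from the table above (assumed coding: 1 = head, 2 = partner, 3 = son)
clear
input byte(id family_id position) str1 religion
1 1 1 "A"
2 1 2 "A"
3 1 3 "A"
4 2 1 "A"
5 2 2 "B"
6 3 1 "A"
7 4 1 "B"
8 4 2 "A"
9 4 3 "A"
end

* Within each family, sort by position so the head (1) comes first,
* then copy the head's religion to every member; families without a head stay missing
bysort family_id (position): gen head_religion = religion[1] if position[1] == 1

* Marriage type = head's religion followed by partner's religion, on partner rows only
gen marriage = head_religion + religion if position == 2 & !missing(head_religion)

tab marriage
```

Because marriage is defined only on partner rows, each couple is counted once; a partner in a family without a recorded head is left out of the table.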

  • #2
    Actually I guess I made it.

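    sort family_id position

    * position is numeric, so convert it with string() before joining it to a string
    gen position_religion = string(position) + "-" + religion

    gen head_religion = position_religion if position == 1

    * by family_id: stops a head's value spilling into a following family that has no head
    by family_id: replace head_religion = head_religion[_n-1] if missing(head_religion)

    gen partner_religion = position_religion + "-" + head_religion if position == 2

    tab partner_religion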

    ChatGPT helped me. If someone has a more elegant way to solve it, I would appreciate it.

    Thanks
    Sergio



    • #3
      Your data example doesn't provide enough variation or "realism" to offer a rigorous test of the solution that follows, but give this a try:

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input byte(id family_id position) str1 religion
      1 1 1 "A"
      2 1 2 "A"
      3 1 3 "A"
      4 2 1 "A"
      5 2 2 "B"
      6 3 1 "A"
      7 4 1 "B"
      8 4 2 "A"
      9 4 3 "A"
      end
      //
      // A numeric religion variable allows use of egen's min() function.
      encode religion, gen(numrelig)
      egen byte headrelig = min(numrelig/(position == 1)) , by(family_id)
      egen byte partnerrelig = min(numrelig/(position == 2)), by(family_id)
      gen byte samerelig = (headrelig == partnerrelig)
      tab samerelig if !missing(partnerrelig)

      For the future, I'd encourage you to check out the Statalist FAQ about using the -dataex- command to post a data example. Another suggestion would be to provide example data that more fully (if not perfectly) illustrates the variations in data patterns that might occur. Both of these will increase your chances of getting a quick and helpful answer.

      Last edited by Mike Lacy; 21 May 2023, 23:43.



      • #4
        Mike's trickery of dividing by expressions such as (position == 1) or (position==2) can be spelled out in this way.

        True or false expressions like those will evaluate to 1 if true and 0 if false.

        Dividing by 1 makes no change to the numerator, naturally -- consider that 42 / 1 = 42, 666 / 1 = 666, and so on -- but dividing by 0 produces missing values.

        Often in Stata, as in mathematics, dividing by 0 is a sign or a source of a problem! But in the context of egen functions such as min() that does not matter because missing values are just ignored to the extent possible. What is the minimum of 1, 2, 3 and missing? 1 is Stata's answer while missing could be a philosopher's answer.
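The arithmetic behind the trick can be checked directly. This sketch uses a made-up three-person family (the values and the names ratio and headrelig are illustrative only) to show that the division leaves only the head's value for min() to pick up.

```stata
* Logical expressions evaluate to 1 (true) or 0 (false)
display (2 == 2)          // shows 1
display (2 == 3)          // shows 0
* Dividing by 1 changes nothing; dividing by 0 gives missing
display 42 / (2 == 2)     // shows 42
display 42 / (2 == 3)     // shows .
* min() ignores missing arguments
display min(1, 2, 3, .)   // shows 1

* The same idea inside egen, on an illustrative family of three
clear
input byte(family_id position religion)
1 1 3
1 2 1
1 3 2
end
* Non-heads divide by 0, so only the head's value (3) is non-missing
gen double ratio = religion / (position == 1)
egen byte headrelig = min(religion / (position == 1)), by(family_id)
list, noobs
```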

        After I publicised this device in a paper linked below, there was some direct or indirect flak with the flavour: "We see what that does, but it is a little tricky to figure out, and the calculation is best done more transparently."

        I tend to agree with the flak and now favour writing (say)

        Code:
         
         min(cond(position == 1, numrelig, .))
        rather than

        Code:
         
         min(numrelig/(position == 1))
        Naturally, some people understand cond() but don't like it, and some people won't have met it. The bigger point is just that you have choices, depending on your coding taste, including writing slower and more long-winded code.
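To confirm that the two spellings agree, both can be run side by side on the data example from #3. This is a sketch; head_div and head_cond are illustrative names.

```stata
clear
input byte(id family_id position) str1 religion
1 1 1 "A"
2 1 2 "A"
3 1 3 "A"
4 2 1 "A"
5 2 2 "B"
6 3 1 "A"
7 4 1 "B"
8 4 2 "A"
9 4 3 "A"
end
* encode assigns codes in sorted order: A = 1, B = 2
encode religion, gen(numrelig)
* Division trick: numrelig/(position == 1) is missing except for the head
egen byte head_div = min(numrelig / (position == 1)), by(family_id)
* cond() spelling: keep numrelig for the head, missing otherwise
egen byte head_cond = min(cond(position == 1, numrelig, .)), by(family_id)
* Both spellings should agree on every observation
assert head_div == head_cond
list family_id position religion head_cond, sepby(family_id) noobs
```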

        For more discussion, if you want it, see Sections 9 and 10 in https://journals.sagepub.com/doi/pdf...867X1101100210



        • #5
          I had learned the denominator trick from one of Nick's postings, but using cond() is more transparent and therefore preferable.



          • #6
            Dear Mike and Nick.

            First, thank you very much for your suggestions, and sorry for the long delay in answering.

            Actually, I need to count all the marriage types, like AA, AB, CA, etc.

            Following Mike's suggestion, I ran dataex. A sample of my database is below, where:

            v0300 is the family_id,

            v0502 is the position in the family (1 is for the head of the family, 2 and 3 are for the different- and same-sex spouse respectively, and 4 to 20 are for children, grandparents, etc.),

            v6121 is religion of the id and

            v0010m is the frequency weight (I had to multiply it by 10^8 to get rid of the decimals).

            As you can notice, religion is already a numeric variable.

            Code:
            * Example generated by -dataex-. For more info, type help dataex
            clear
            input long v0300 byte v0502 int v6121 double v0010m
             1  1 110 18.4285827461399
             1  2 110 18.4285827461399
             1  4 110 18.4285827461399
             2  1 490 17.6242765703013
             2  2 110 17.6242765703013
             3  1 110 11.4125858336706
             3  2 110 11.4125858336706
             3  4 110 11.4125858336706
             3  4 110 11.4125858336706
             3  4 110 11.4125858336706
             3  4 110 11.4125858336706
             4  1 110  1.5962037653904
             4  2 110  1.5962037653904
             4  4 110  1.5962037653904
             5  1 110     8.0002675586
             5  2 110     8.0002675586
             5  4 110     8.0002675586
             6  1 110  6.7419148125844
             6  2 110  6.7419148125844
             6  4 110  6.7419148125844
             7  1 310 11.1835095058248
             7  2 310 11.1835095058248
             7  4 310 11.1835095058248
             8  1 310 20.0647494193818
             8  2 750 20.0647494193818
             8 10 750 20.0647494193818
             9  1 110 10.1054061583665
             9  2 110 10.1054061583665
            10  1 110 12.4278135165927
            10  2 110 12.4278135165927
            10  4 110 12.4278135165927
            10  4 110 12.4278135165927
            11  1 110  4.9728329577781
            12  1 110 12.8236357379708
            12  2 110 12.8236357379708
            12  4 110 12.8236357379708
            13  1 110  10.082938665489
            13  2 110  10.082938665489
            13  4 110  10.082938665489
            13  4 110  10.082938665489
            14  1 240 10.2709547423656
            14  2 240 10.2709547423656
            14  4 110 10.2709547423656
            15 20 310  3.0310872962394
            16  1 110  3.0987903288935
            16  2 110  3.0987903288935
            16  4 110  3.0987903288935
            16  4 110  3.0987903288935
            16 10 110  3.0987903288935
            16 10 110  3.0987903288935
            end

            As I mentioned, ChatGPT helped me: it suggested code that worked pretty well after a few adaptations.

            I am leaving the code here in case someone finds it useful.

            Code:
            gen pos_fam = v0502
            
            tostring pos_fam, replace
            
            gen relig = v6121
            
            tostring relig, replace
            
            gen pos_fam_relig = pos_fam + "-" + relig
            
            sort v0300 v0502
            
            by v0300: gen head_relig = pos_fam_relig[1] if v0502 == 1
            
            by v0300: replace head_relig = head_relig[_n-1] if missing(head_relig)
            
            gen pos_fam_relig2 = pos_fam_relig + "-" + head_relig if (v0502==2 |v0502==3)
            
            tab2xl  pos_fam_relig2 [fw=v0010m] using testfile, col(1) row(1)
            Finally, I would appreciate it if someone could suggest more elegant code, and I would like to thank you again for the kind support.

            All the best,

            Sergio Goldbaum
            Last edited by Sergio Goldbaum; 11 Jun 2023, 13:10.



            • #7
              I think something like this would suffice (this is very similar to Mike's code):

              Code:
              egen head_religion = min(cond(v0502 == 1, v6121, .)), by(v0300)
              egen partner_religion = min(cond(inlist(v0502, 2, 3), v6121, .)), by(v0300)
              egen byte tag = tag(v0300)
              
              tab head_religion partner_religion [iw = v0010m] if tag
              Frequency weights can only be integers, so I have used the more generic "importance" weights here. You will be in the best position to decide how to weight the tabulation.
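If the aim is a table of marriage types with summed weights that can be exported (as tab2xl was used for in #6), one alternative is to collapse to one row per head-spouse combination. This sketch reuses the egen lines above on a subset of the rows posted in #6; because collapse (sum) simply adds the weights, they need not be integers. The names weight and couples are illustrative.

```stata
* Subset of the rows posted in #6
clear
input long v0300 byte v0502 int v6121 double v0010m
 1  1 110 18.4285827461399
 1  2 110 18.4285827461399
 1  4 110 18.4285827461399
 2  1 490 17.6242765703013
 2  2 110 17.6242765703013
 8  1 310 20.0647494193818
 8  2 750 20.0647494193818
 8 10 750 20.0647494193818
11  1 110  4.9728329577781
14  1 240 10.2709547423656
14  2 240 10.2709547423656
14  4 110 10.2709547423656
end
egen head_religion = min(cond(v0502 == 1, v6121, .)), by(v0300)
egen partner_religion = min(cond(inlist(v0502, 2, 3), v6121, .)), by(v0300)
egen byte tag = tag(v0300)
* One observation per family that has both a head and a spouse
keep if tag & !missing(head_religion, partner_religion)
* Sum the weights and count couples for each head-spouse combination
collapse (sum) weight = v0010m (count) couples = v0300, by(head_religion partner_religion)
list, noobs
```

collapse replaces the data in memory, so a saved copy or preserve/restore is advisable. The result could then be written out with something like export excel using "marriage_types.xlsx", firstrow(variables) replace, where the file name is a placeholder.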



              • #8
                Your first 5 lines

                Code:
                 
                 gen pos_fam = v0502
                 tostring pos_fam, replace
                 gen relig = v6121
                 tostring relig, replace
                 gen pos_fam_relig = pos_fam + "-" + relig
                boil down to

                Code:
                egen pos_fam_relig = concat(v0502 v6121), p("-")
                and some other simplifications are possible. But a direct attack would be preferable in my view. I revert to @Mike Lacy's data example from #3, rather than deal with your variable names.


                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input byte(id family_id position) str1 religion
                1 1 1 "A"
                2 1 2 "A"
                3 1 3 "A"
                4 2 1 "A"
                5 2 2 "B"
                6 3 1 "A"
                7 4 1 "B"
                8 4 2 "A"
                9 4 3 "A"
                end
                
                gen wanted = religion if position == 1 
                bysort family_id (wanted) : replace wanted = wanted[_N] 
                replace wanted = wanted + "-" + religion if position == 2 
                bysort family_id (wanted) : replace wanted = wanted[_N]
                
                split wanted, parse("-")
                gen different = wanted2 != wanted1 if !missing(wanted1, wanted2)
                
                list, sepby(family_id)
                
                    +-----------------------------------------------------------------------------+
                     | id   family~d   position   religion   wanted   wanted1   wanted2   differ~t |
                     |-----------------------------------------------------------------------------|
                  1. |  3          1          3          A      A-A         A         A          0 |
                  2. |  1          1          1          A      A-A         A         A          0 |
                  3. |  2          1          2          A      A-A         A         A          0 |
                     |-----------------------------------------------------------------------------|
                  4. |  4          2          1          A      A-B         A         B          1 |
                  5. |  5          2          2          B      A-B         A         B          1 |
                     |-----------------------------------------------------------------------------|
                  6. |  6          3          1          A        A         A                    . |
                     |-----------------------------------------------------------------------------|
                  7. |  9          4          3          A      B-A         B         A          1 |
                  8. |  7          4          1          B      B-A         B         A          1 |
                  9. |  8          4          2          A      B-A         B         A          1 |
                     +-----------------------------------------------------------------------------+
                I guess you need more checks than are shown here.

                Code:
                egen count1 = total(position == 1), by(family_id)
                egen count2 = total(position == 2), by(family_id) 
                egen which = concat(count1 count2), p(" ")
                so that values of the variable which, such as "0 0", "1 0", "0 1", "1 1", "2 1" and "1 2", should be of interest or concern.
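One way to act on those counts is to tabulate which once per family and list the families that are not "1 1". A sketch on the data example from #3, where position 2 is the partner; tag is an illustrative name.

```stata
clear
input byte(id family_id position) str1 religion
1 1 1 "A"
2 1 2 "A"
3 1 3 "A"
4 2 1 "A"
5 2 2 "B"
6 3 1 "A"
7 4 1 "B"
8 4 2 "A"
9 4 3 "A"
end
egen count1 = total(position == 1), by(family_id)
egen count2 = total(position == 2), by(family_id)
egen which = concat(count1 count2), p(" ")
* Look at each family once
egen byte tag = tag(family_id)
tab which if tag
* Anything other than "1 1" (one head, one partner) deserves a closer look
list family_id which if tag & which != "1 1", noobs
```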

                I am not clear why
                Code:
                 
                 (v0502==2 |v0502==3)
                appears in #6. What have sons got to do with it?



                • #9
                  Nick Cox

                  2 and 3 are for the different- and same-sex spouse respectively



                  • #10
                    Hemanshu Kumar: as said in #8, I am using the example from #3, which drew on the example and explanation in #1.

                    So thanks for flagging that Sergio Goldbaum changed the rules (in #6). I didn't spot that.
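With the coding from #6, where spouses are 2 (different sex) or 3 (same sex), the checks from #8 might be adapted by counting both codes with inlist(). A sketch on a few of the rows posted in #6 (weights omitted); n_heads and n_spouses are illustrative names.

```stata
* A few rows from #6: a complete couple, a head alone, and a family without a head
clear
input long v0300 byte v0502 int v6121
 1  1 110
 1  2 110
 1  4 110
11  1 110
15 20 310
end
* Heads are coded 1; spouses are coded 2 or 3
egen n_heads = total(v0502 == 1), by(v0300)
egen n_spouses = total(inlist(v0502, 2, 3)), by(v0300)
egen which = concat(n_heads n_spouses), p(" ")
egen byte tag = tag(v0300)
tab which if tag
list v0300 which if tag & which != "1 1", noobs
```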
