inter-religious marriage

Sergio Goldbaum

Join Date: Sep 2017

Posts: 10
#1


21 May 2023, 21:18

Dear sir

I am working with Census data. To make it simple, consider a database as follows:

id   family_id   position (in family)   religion
1    1           head (1)               A
2    1           partner (2)            A
3    1           son (3)                A
4    2           head (1)               A
5    2           partner (2)            B
6    3           head (1)               A
7    4           head (1)               B
8    4           partner (2)            A
9    4           son (3)                A

I need to count the number of same-religion and different-religion marriages.

In this very simple database above, the result should be: AA = 1; AB = 1; BA = 1.

So far I have managed to create a new variable "position-religion" (1-A; 2-A; 3-A; 1-A; 2-B; ...).

I guess I have to create another new variable that assigns the position_religion of the head of the family to all the other ids with the same family_id. If I manage to do that, a simple frequency table will provide the result.

Could you please help me in creating this new variable?

Thanks in advance
Sergio Goldbaum
Sergio Goldbaum

Join Date: Sep 2017

Posts: 10
#2

21 May 2023, 22:56

Actually, I think I have managed it.

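sort family_id position

* position is numeric, so convert it with string() before joining it to religion
gen position_religion = string(position) + "-" + religion

by family_id: gen head_religion = position_religion if position == 1

* carry the head's value down within each family only
by family_id: replace head_religion = head_religion[_n-1] if missing(head_religion)

gen partner_religion = position_religion + "-" + head_religion if position == 2

tab partner_religion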

ChatGPT helped me. If someone has a more elegant way to solve it, I would appreciate it.

Thanks
Sergio
Mike Lacy

Join Date: Apr 2014

Posts: 2421
#3

21 May 2023, 22:58

Your data example doesn't provide enough variation or "realism" to offer a rigorous test of the solution that follows, but give this a try:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(id family_id position) str1 religion
1 1 1 "A"
2 1 2 "A"
3 1 3 "A"
4 2 1 "A"
5 2 2 "B"
6 3 1 "A"
7 4 1 "B"
8 4 2 "A"
9 4 3 "A"
end
//
// A numeric religion variable allows use of egen's min() function.
encode religion, gen(numrelig)
egen byte headrelig = min(numrelig/(position == 1)) , by(family_id)
egen byte partnerrelig = min(numrelig/(position == 2)), by(family_id)
gen byte samerelig = (headrelig == partnerrelig)
tab samerelig if !missing(partnerrelig)

For the future, I'd encourage you to check out the Statalist FAQ about using the -dataex- command to post a data example. Another suggestion would be to provide example data that more fully (if not perfectly) illustrates the variations in data patterns that might occur. Both of these will increase your chances of getting a quick and helpful answer.

Last edited by Mike Lacy; 21 May 2023, 23:43.
Nick Cox

Join Date: Mar 2014

Posts: 35780
#4

22 May 2023, 03:18

Mike's trickery of dividing by expressions such as (position == 1) or (position==2) can be spelled out in this way.

True or false expressions like those will evaluate to 1 if true and 0 if false.

Dividing by 1 makes no change to the numerator, naturally -- consider that 42 / 1 = 42, 666 / 1 = 666, and so on -- but dividing by 0 produces missing values.

Often in Stata, as in mathematics, dividing by 0 is a sign or a source of a problem! But in the context of egen functions such as min() that does not matter because missing values are just ignored to the extent possible. What is the minimum of 1, 2, 3 and missing? 1 is Stata's answer while missing could be a philosopher's answer.
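Each of these facts can be checked directly at the Stata prompt. A short illustration (42 is an arbitrary number); the min() function used here treats missing values the same way as egen's min() does across observations:

```stata
* a true-or-false expression evaluates to 1 if true and 0 if false
display (2 == 2)
display (2 == 3)

* dividing by 1 leaves the numerator unchanged; dividing by 0 gives missing (.)
display 42 / (2 == 2)
display 42 / (2 == 3)

* min() ignores missing values unless all of its arguments are missing
display min(1, 2, 3, .)
```

These lines display 1, 0, 42, . and 1 in turn.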

After I publicised this device in a paper linked below, there was some direct or indirect flak with the flavour "We see what that does, but it is a little tricky to figure out, and the calculation is best done more transparently."

I tend to agree with the flak and now favour writing (say)

Code:

min(cond(position == 1, numrelig, .))

rather than

Code:

min(numrelig/(position == 1))

Naturally, some people understand cond() but don't like it, and some people won't have met it. The bigger point is just that you have choices here, given your coding taste, including writing slower and more long-winded code.
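One way to convince yourself that the two spellings are interchangeable is to compute both on the data example from #3 and let assert compare them. A sketch; headrelig1 and headrelig2 are illustrative names only:

```stata
clear
input byte(id family_id position) str1 religion
1 1 1 "A"
2 1 2 "A"
3 1 3 "A"
4 2 1 "A"
5 2 2 "B"
6 3 1 "A"
7 4 1 "B"
8 4 2 "A"
9 4 3 "A"
end

* numeric codes: encode assigns 1 to "A" and 2 to "B" (alphabetical order)
encode religion, gen(numrelig)

* the division trick from #3
egen headrelig1 = min(numrelig / (position == 1)), by(family_id)

* the same calculation spelled out with cond()
egen headrelig2 = min(cond(position == 1, numrelig, .)), by(family_id)

* the two spellings should agree in every observation
assert headrelig1 == headrelig2

list family_id position religion headrelig1 headrelig2, sepby(family_id)
```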

For more discussion if you want it see Sections 9 and 10 in https://journals.sagepub.com/doi/pdf...867X1101100210
Mike Lacy

Join Date: Apr 2014

Posts: 2421
#5

22 May 2023, 08:57

I had learned the denominator trick from one of Nick's postings, but using cond() is more transparent and therefore preferable.

Sergio Goldbaum

Join Date: Sep 2017
Posts: 10

11 Jun 2023, 13:02

Dear Mike and Nick,

First, thank you very much for your suggestions, and sorry for the long delay in answering you.

Actually, I need to count all the marriage types, like AA, AB, CA, etc.

Following Mike's suggestion, I ran dataex. A sample of my database is below, where:

v0300 is the family_id,

v0502 is the position in the family (1 is for the head of family, 2 and 3 are for the different- and same-sex spouse respectively, 4 to 20 are for children, grandparents etc),

v6121 is religion of the id and

v0010m is the frequency weight (I had to multiply it by 10^8 to overcome the decimal).

As you can notice, religion is already a numeric variable.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input long v0300 byte v0502 int v6121 double v0010m
 1  1 110 18.4285827461399
 1  2 110 18.4285827461399
 1  4 110 18.4285827461399
 2  1 490 17.6242765703013
 2  2 110 17.6242765703013
 3  1 110 11.4125858336706
 3  2 110 11.4125858336706
 3  4 110 11.4125858336706
 3  4 110 11.4125858336706
 3  4 110 11.4125858336706
 3  4 110 11.4125858336706
 4  1 110  1.5962037653904
 4  2 110  1.5962037653904
 4  4 110  1.5962037653904
 5  1 110     8.0002675586
 5  2 110     8.0002675586
 5  4 110     8.0002675586
 6  1 110  6.7419148125844
 6  2 110  6.7419148125844
 6  4 110  6.7419148125844
 7  1 310 11.1835095058248
 7  2 310 11.1835095058248
 7  4 310 11.1835095058248
 8  1 310 20.0647494193818
 8  2 750 20.0647494193818
 8 10 750 20.0647494193818
 9  1 110 10.1054061583665
 9  2 110 10.1054061583665
10  1 110 12.4278135165927
10  2 110 12.4278135165927
10  4 110 12.4278135165927
10  4 110 12.4278135165927
11  1 110  4.9728329577781
12  1 110 12.8236357379708
12  2 110 12.8236357379708
12  4 110 12.8236357379708
13  1 110  10.082938665489
13  2 110  10.082938665489
13  4 110  10.082938665489
13  4 110  10.082938665489
14  1 240 10.2709547423656
14  2 240 10.2709547423656
14  4 110 10.2709547423656
15 20 310  3.0310872962394
16  1 110  3.0987903288935
16  2 110  3.0987903288935
16  4 110  3.0987903288935
16  4 110  3.0987903288935
16 10 110  3.0987903288935
16 10 110  3.0987903288935
end

As I mentioned, ChatGPT helped me: it suggested code that worked pretty well after a few adaptations.

I am leaving the code here in case someone finds it useful.

Code:

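* string versions of position and religion, joined as e.g. "2-110"
gen pos_fam = v0502
tostring pos_fam, replace
gen relig = v6121
tostring relig, replace
gen pos_fam_relig = pos_fam + "-" + relig

sort v0300 v0502

* head's position-religion, carried down within each family only
by v0300: gen head_relig = pos_fam_relig[1] if v0502 == 1
by v0300: replace head_relig = head_relig[_n-1] if missing(head_relig)

* spouse (2 = different-sex, 3 = same-sex) followed by the head, e.g. "2-110-1-490"
gen pos_fam_relig2 = pos_fam_relig + "-" + head_relig if (v0502==2 | v0502==3)

tab2xl pos_fam_relig2 [fw=v0010m] using testfile, col(1) row(1)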

Finally, I would appreciate it if someone could suggest more elegant code, and I would like to thank you all again for the kind support.

All the best,

Sergio Goldbaum

Last edited by Sergio Goldbaum; 11 Jun 2023, 13:10.


Hemanshu Kumar

Join Date: Mar 2015

Posts: 1478
#7

12 Jun 2023, 00:16

I think something like this would suffice (this is very similar to Mike's code):

Code:

egen head_religion = min(cond(v0502 == 1, v6121, .)), by(v0300)
egen partner_religion = min(cond(inlist(v0502, 2, 3), v6121, .)), by(v0300)
egen byte tag = tag(v0300)
tab head_religion partner_religion [iw = v0010m] if tag

Frequency weights can only be integers, so I have used the more generic "importance" weights here. You will be in the best position to decide how to weight the tabulation.
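If a small dataset of weighted totals is more convenient than a table (for example, to export), one option is to follow the egen lines above with collapse. This is a sketch on a few families from the data example in #6, assuming the weighted sum of v0010m per head-partner combination is what is wanted; couples is an illustrative name:

```stata
clear
input long v0300 byte v0502 int v6121 double v0010m
 1  1 110 18.4285827461399
 1  2 110 18.4285827461399
 1  4 110 18.4285827461399
 2  1 490 17.6242765703013
 2  2 110 17.6242765703013
 8  1 310 20.0647494193818
 8  2 750 20.0647494193818
11  1 110  4.9728329577781
end

* head's and spouse's (code 2 or 3) religion, spread to every family member
egen head_religion = min(cond(v0502 == 1, v6121, .)), by(v0300)
egen partner_religion = min(cond(inlist(v0502, 2, 3), v6121, .)), by(v0300)

* one observation per family
egen byte tag = tag(v0300)

* weighted total for each head-partner combination; families without a spouse are dropped
collapse (sum) couples = v0010m if tag & !missing(partner_religion), by(head_religion partner_religion)
list
```

Family 11 has no spouse, so only three combinations remain here.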

Nick Cox

Join Date: Mar 2014
Posts: 35780

12 Jun 2023, 00:40

Your first 5 lines

Code:

 
gen pos_fam = v0502
tostring pos_fam, replace
gen relig = v6121
tostring relig, replace
gen pos_fam_relig = pos_fam + "-" + relig

boil down to

Code:

egen pos_fam_relig = concat(v0502 v6121), p("-")

and some other simplifications are possible. But a direct attack would be preferable in my view. I revert to @Mike Lacy's data example from #3, rather than deal with your variable names.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(id family_id position) str1 religion
1 1 1 "A"
2 1 2 "A"
3 1 3 "A"
4 2 1 "A"
5 2 2 "B"
6 3 1 "A"
7 4 1 "B"
8 4 2 "A"
9 4 3 "A"
end

gen wanted = religion if position == 1 
bysort family_id (wanted) : replace wanted = wanted[_N] 
replace wanted = wanted + "-" + religion if position == 2 
bysort family_id (wanted) : replace wanted = wanted[_N]

split wanted, parse("-")
gen different = wanted2 != wanted1 if !missing(wanted1, wanted2)

list, sepby(family_id)

    +-----------------------------------------------------------------------------+
     | id   family~d   position   religion   wanted   wanted1   wanted2   differ~t |
     |-----------------------------------------------------------------------------|
  1. |  3          1          3          A      A-A         A         A          0 |
  2. |  1          1          1          A      A-A         A         A          0 |
  3. |  2          1          2          A      A-A         A         A          0 |
     |-----------------------------------------------------------------------------|
  4. |  4          2          1          A      A-B         A         B          1 |
  5. |  5          2          2          B      A-B         A         B          1 |
     |-----------------------------------------------------------------------------|
  6. |  6          3          1          A        A         A                    . |
     |-----------------------------------------------------------------------------|
  7. |  9          4          3          A      B-A         B         A          1 |
  8. |  7          4          1          B      B-A         B         A          1 |
  9. |  8          4          2          A      B-A         B         A          1 |
     +-----------------------------------------------------------------------------+

I guess you need more checks than are shown here.

Code:

egen count1 = total(position == 1), by(family_id)
egen count2 = total(position == 2), by(family_id) 
egen which = concat(count1 count2), p(" ")

so that values of which such as "0 0", "1 0", "0 1", "1 1", "2 1" and "1 2" should be of interest or concern.
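A sketch of how those counts might be inspected, on a made-up dataset that deliberately includes a family with no head and a family with two heads. Treating anything other than "1 0" or "1 1" as suspicious is an assumption for illustration:

```stata
clear
input byte(family_id position)
1 1
1 2
2 1
3 2
4 1
4 1
4 2
end

* number of heads (1) and partners (2) in each family, as in the code above
egen count1 = total(position == 1), by(family_id)
egen count2 = total(position == 2), by(family_id)
egen which = concat(count1 count2), punct(" ")

* one observation per family, then look at the patterns
egen byte tag = tag(family_id)
tab which if tag

* assumed rule: anything other than one head with zero or one partner needs a closer look
list family_id which if tag & !inlist(which, "1 0", "1 1")
```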

I am not clear why

Code:

 
 (v0502==2 |v0502==3)

appears in #6. What have sons got to do with it?


Hemanshu Kumar

Join Date: Mar 2015

Posts: 1478
#9

12 Jun 2023, 00:46

Nick Cox

2 and 3 are for the different- and same-sex spouse respectively
Nick Cox

Join Date: Mar 2014

Posts: 35780
#10

12 Jun 2023, 00:53

Hemanshu Kumar As said in #8, I am using the example from #3 which drew on the example and explanation in #1.

So thanks for flagging that Sergio Goldbaum changed the rules (in #6). I didn't spot that.
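Under the revised coding in #6, where 2 and 3 both mean spouse, the string approach from #8 needs only the spouse condition changed. A sketch on a few families from the data example in #6, weighted with iweights as suggested in #7; relig and wanted are illustrative names:

```stata
clear
input long v0300 byte v0502 int v6121 double v0010m
 1  1 110 18.4285827461399
 1  2 110 18.4285827461399
 1  4 110 18.4285827461399
 2  1 490 17.6242765703013
 2  2 110 17.6242765703013
 8  1 310 20.0647494193818
 8  2 750 20.0647494193818
 8 10 750 20.0647494193818
11  1 110  4.9728329577781
end

* religion codes as strings, so head and spouse can be joined as "head-spouse"
gen relig = string(v6121)

* copy the head's religion to every member of the family
gen wanted = relig if v0502 == 1
bysort v0300 (wanted) : replace wanted = wanted[_N]

* spouse codes 2 (different-sex) and 3 (same-sex)
replace wanted = wanted + "-" + relig if inlist(v0502, 2, 3)
bysort v0300 (wanted) : replace wanted = wanted[_N]

* one record per family, keeping only families with a spouse
egen byte tag = tag(v0300)
tab wanted [iw = v0010m] if tag & strpos(wanted, "-")
```

A family with a spouse but no head would appear as a value starting with "-", such as "-110", which is one reason to run the checks from #8 first.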