  • #16
    Hi Nick Cox. Thanks I can see there are problems. I think I have fixed line 7 based on your comments in #13, improved line 2 based on your comments in https://stackoverflow.com/questions/...range-function and applied this to lines 4 & 6.

    Is it OK to condition on inequality (in 3 & 4) as I did on equality (in 1 & 2) and then condition on values from inrange()? If not, can you please provide clues as to how to go about this? Also, is there a way to use inrange(`v', ...) to call different conditions as in 5 & 6, or do I need to condition separately? Some notes on where I am going wrong would be appreciated.
    Code:
    foreach v in religb p_religb { 
        gen `v'2 = cond(religb == p_religb & inrange(`v', 2000, 2900), 1, ///
                   cond(religb == p_religb & inrange(`v', 1000, 6000) & !inrange(`v', 2000, 2900), 2, /// 
                   cond(religb != p_religb & inrange(`v', 2000, 2900), 3, ///
                   cond(religb != p_religb & inrange(`v', 1000, 6000) & !inrange(`v', 2000, 2900), 4, /// 
                   cond(p_religb == 7000 & inrange(religb, 2000, 2900), 5, ///
                   cond(p_religb == 7000 & inrange(religb, 1000, 6000) & !inrange(religb, 2000, 2900), 6, /// 
                   cond(`v' == 7000, 7, .))))))) 
    }



    • #17
      You're ignoring my suggestion to work through a 4 x 4 toy dataset and say which answers are what you want and which are not. From experience with large and very large datasets and code that is problematic I can attest that simplifying input data like that is a much needed debugging trick.

      You also seem to be assuming that I can remember and understand the entire thread every time I pick it up, but I really can't. The gist of #16 is that the code still doesn't do quite what you want, so is it all right if you change it? I can't comment usefully on that level.

      Perhaps if you try a word definition of your intended 7 category variable it will be easier for someone to suggest code, although I can't be confident that many people are still watching.

      Assuming that this is for a thesis there is also a worry on your behalf that your categorisation is too complicated to be understood by your supervisor(s)/committee/examiners if it takes so much work to write the code.

      I will try once more.

      You seem to want to classify couples on their beliefs compared, but if so why two new variables and not one?

      At a minimum there seem to be two over-arching issues, each partner being in a particular category and whether partners are the same or different in their beliefs (or lack of them).

      Also, you are working with a coarsened version of the original categories, which looks like this:


      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input str24 code str32 explanation
      "2000"                     "* Christian"                  
      "1000/3000/4000/5000/6000" "* Other religions"            
      "[7000]"                   "No religion"                  
      "[7200]"                   "Secular Beliefs"              
      "[7300]"                   "Other Spiritual Beliefs"      
      "[8000]"                   "Non-Christian, nfi"           
      "[9000]"                   "Multiple religions, Christian"
      end
      Nothing in your code that I can see suggests that you care about [7200]-[9000], which is up to you. If so there are only 9 possibilities for cross-classification.
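
      A minimal sketch of that coarsening, assuming the variable names religb and p_religb used elsewhere in this thread and assuming that non-Christian codes are 1000 or fall in 3000 to 6000. The grp_ variables and the toy values are hypothetical, for illustration only.

      ```stata
      * Toy data: the codes are illustrative, not taken from the real dataset
      clear
      input religb p_religb
      2070 2230
      2070 3000
      3000 7000
      7000 7000
      1000 1000
      end

      * Collapse detailed codes into 1 = Christian, 2 = other religion, 3 = no religion
      foreach v in religb p_religb {
          gen byte grp_`v' = 1 if inrange(`v', 2000, 2900)
          replace grp_`v' = 2 if `v' == 1000 | inrange(`v', 3000, 6000)
          replace grp_`v' = 3 if `v' == 7000
      }

      * At most 3 x 3 = 9 cells can appear in this cross-classification
      tab grp_religb grp_p_religb
      ```

      Any code outside those three groups is left missing in the grp_ variables, so it drops out of the table unless the missing option is added.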

      Using those 7 categories, however, a toy dataset has all the possible pairings.

      Code:
       
      * start from 7 observations, one per category, then form all 7 x 7 pairings
      clear
      set obs 7
      gen person = _n
      clonevar partner = person
      fillin person partner
      egen combine = concat(person partner), p(" ")
      
      gen same = person == partner
      
      . tab same
      
             same |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                0 |         42       85.71       85.71
                1 |          7       14.29      100.00
      ------------+-----------------------------------
            Total |         49      100.00
      
      . list combine
      
           +---------+
           | combine |
           |---------|
        1. |     1 1 |
        2. |     1 2 |
        3. |     1 3 |
        4. |     1 4 |
        5. |     1 5 |
           |---------|
        6. |     1 6 |
        7. |     1 7 |
        8. |     2 1 |
        9. |     2 2 |
       10. |     2 3 |
           |---------|
       11. |     2 4 |
       12. |     2 5 |
       13. |     2 6 |
       14. |     2 7 |
       15. |     3 1 |
           |---------|
       16. |     3 2 |
       17. |     3 3 |
       18. |     3 4 |
       19. |     3 5 |
       20. |     3 6 |
           |---------|
       21. |     3 7 |
       22. |     4 1 |
       23. |     4 2 |
       24. |     4 3 |
       25. |     4 4 |
           |---------|
       26. |     4 5 |
       27. |     4 6 |
       28. |     4 7 |
       29. |     5 1 |
       30. |     5 2 |
           |---------|
       31. |     5 3 |
       32. |     5 4 |
       33. |     5 5 |
       34. |     5 6 |
       35. |     5 7 |
           |---------|
       36. |     6 1 |
       37. |     6 2 |
       38. |     6 3 |
       39. |     6 4 |
       40. |     6 5 |
           |---------|
       41. |     6 6 |
       42. |     6 7 |
       43. |     7 1 |
       44. |     7 2 |
       45. |     7 3 |
           |---------|
       46. |     7 4 |
       47. |     7 5 |
       48. |     7 6 |
       49. |     7 7 |
           +---------+
      I am trying to suggest technique, not what you should do to get what you want, because I don't understand that. But the entire theme of this thread seems to be that you are trying to code up something you can't explain simply, which has no obvious benefit. Keep it simple!




      • #18
        Hi Wouter Wakker. My apologies, I did not respond to all of your comments. In particular, you stated:
        Partners have a different religion, but here is where it goes wrong. You say that occurrences of a partner with 3 and the other with 4 cannot happen, but this does happen when one is Christian and the other is not.
        I understand that such pairings can occur, but I am trying to find specific pairings. (1) is a pair of the same Christian religion - e.g. both == [2070]. (2) is a pair of the same non-Christian religion - e.g. both == [3000]. (3) is a pair of two different Christian religions - e.g. one [2070], the other [2230]. (4) is a pair of two different non-Christian religions - e.g. one [3000], the other [1000]. (5) is one no religion [7000], the other Christian. (6) is one no religion [7000], the other non-Christian. (7) is both no religion [7000].

        My guess is that you do have variables coded as 5 or 6 but that they are missing for the other variable. You can check with tab, missing.
        Yes you are correct. There are missings for 5 & 6.

        Hopefully this clears up what I'm trying to obtain from the various pairings.



        • #19
          Hi Nick Cox. I'm hoping my explanation in #18 helps clarify that I want to code specific pairings of couples, based on mixes of religions (Christian [19 denominations, 2000 - 2900], non-Christian [1000, 3000, 4000, 5000, 6000] or no religion [7000]). As you stated, I exclude [7200-9000].
          You're ignoring my suggestion to work through a 4 x 4 toy dataset
          I certainly wasn't ignoring your suggestion, I just didn't get what you wanted. I think I do now.

          I would expect pairings of (1)(1), (2)(2), ... (7)(7). However, after looking at your output and my data, I notice I get these for 1, 3 & 7. It seems that the code in 5 & 6 appears to be working, but it's not displaying as (5)(5) pairings as I expected, instead it presents as (5)(3) - where (5) is [7000] (the first condition) & (3) is [2000-2900] (the second condition). The same applies to line 6, where instead of (6)(6) pairings, it presents as (5)(4) where (5) is [7000] (the first condition) & (4) is [1000, 3000-6000] (the second condition).

          Thank you and Wouter Wakker for all your help. I'll get there.



          • #20
            As I commented earlier you are classifying couples, which seems like one wanted variable to me. From the helpful rules in #18

            (1) is a pair of the same Christian religion - e.g. both == [2070]. (2) is a pair of the same non-Christian religion - e.g. both == [3000]. (3) is a pair of two different Christian religions - e.g. one [2070], the other [2230]. (4) is a pair of two different non-Christian religions - e.g. one [3000], the other [1000]. (5) is one no religion [7000], the other Christian. (6) is one no religion [7000], the other non-Christian. (7) is both no religion [7000].
            I suggest the following. We can use the rule that if (1) A = B and (2) B is C then (3) A is automatically C too so the last of those three need not be spelled out.


            Code:
            * (1) same Christian religion
            gen wanted = 1 if religb == p_religb & inrange(religb, 2000, 2900)

            * (2) same non-Christian religion
            replace wanted = 2 if religb == p_religb & (religb == 1000 | inrange(religb, 3000, 6000))

            * (3) two different Christian religions
            replace wanted = 3 if religb != p_religb & inrange(religb, 2000, 2900) & inrange(p_religb, 2000, 2900)

            * (4) two different non-Christian religions
            replace wanted = 4 if religb != p_religb & (religb == 1000 | inrange(religb, 3000, 6000)) & (p_religb == 1000 | inrange(p_religb, 3000, 6000))

            * (5) one no religion, the other Christian
            replace wanted = 5 if max(religb, p_religb) == 7000 & inrange(min(religb, p_religb), 2000, 2900)

            * (6) one no religion, the other non-Christian
            replace wanted = 6 if max(religb, p_religb) == 7000 & (min(religb, p_religb) == 1000 | inrange(min(religb, p_religb), 3000, 6000))

            * (7) both no religion
            replace wanted = 7 if religb == p_religb & religb == 7000
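
            One way to check rules like these, sketched here under assumptions, is a toy dataset with one couple per rule. The example codes (2070, 2230, 3000, 1000, 7000) come from the word rules in #18; the variable expected is hypothetical and holds the category each toy couple should receive.

            ```stata
            * One toy couple per rule; expected is the category intended by the word rules
            clear
            input religb p_religb expected
            2070 2070 1
            3000 3000 2
            2070 2230 3
            3000 1000 4
            7000 2070 5
            1000 7000 6
            7000 7000 7
            end

            gen wanted = 1 if religb == p_religb & inrange(religb, 2000, 2900)
            replace wanted = 2 if religb == p_religb & (religb == 1000 | inrange(religb, 3000, 6000))
            replace wanted = 3 if religb != p_religb & inrange(religb, 2000, 2900) & inrange(p_religb, 2000, 2900)
            replace wanted = 4 if religb != p_religb & (religb == 1000 | inrange(religb, 3000, 6000)) & (p_religb == 1000 | inrange(p_religb, 3000, 6000))
            replace wanted = 5 if max(religb, p_religb) == 7000 & inrange(min(religb, p_religb), 2000, 2900)
            replace wanted = 6 if max(religb, p_religb) == 7000 & (min(religb, p_religb) == 1000 | inrange(min(religb, p_religb), 3000, 6000))
            replace wanted = 7 if religb == p_religb & religb == 7000

            * assert stops with an error if any toy couple is misclassified
            assert wanted == expected
            list
            ```

            Adding further rows, with the partners swapped or with codes outside the seven rules, extends the check without touching the real data.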



            • #21
              Hi Nick Cox. Thanks for your help. I understand the code (except for the use of max()/min() but get the gist and will read up on this). I don't follow your rule - does it relate to the new code or the code using the cond function?
              that if (1) A = B and (2) B is C then (3) A is automatically C too so the last of those three need not be spelled out.
              I think I could apply this to other problems now. Though there's one code rule I'm unsure about. What if religb != p_religb, where one is in the range 2000-2900 and the other is in 1000-6000 but not in 2000-2900? I tried:
              Code:
              replace fam = 5 if inrange(max(religb, p_religb), 2000, 2900) & inrange(min(religb, p_religb) == 1000 | inrange(3000, 6000))
              Stata output 'invalid syntax'.

              I actually started off using -gen, replace- but wanted to figure out how to achieve it using the cond function. I really appreciate your guidance and help with the code.

              [Edit] Just wanted to note that the tabulated frequencies using -gen, replace- looks identical to the cond fn code, which suggests it was mostly there.
              Last edited by Chris Boulis; 23 Mar 2020, 18:29.



              • #22
                If I understand your code in #20, I think this may be a solution to my question in #21.
                Code:
                replace fam = 5 if inrange(max(religb, p_religb), 2000, 2900) & (min(religb, p_religb) == 1000 | inrange(min(religb, p_religb), 3000, 6000))



                • #23
                  I am not using cond() or alluding to any code using cond(). I believe that #20 is a translation of your word rules in #18. I can't follow whether it's my code or yours that is "mostly there". If that refers to my code then I don't see any clue on what is wrong.

                  A, B, C: If two people have the same religion category and one is Christian then the other is too. If two people come from the same country and one is Australian then the other is too. So, code can be simplified: test (1) same religion and test (2) one of the couple is Christian makes test (3) the other of the couple is Christian redundant.

                  max() returns the maximum of its arguments and min() returns the minimum of its arguments. All you need to do is read the help for a function just as you would for a command. Or experiment:

                  Code:
                  . help max() 
                  
                  . help min() 
                  
                  . display max(2000, 7000)
                  7000
                  
                  . display min(2000, 7000)
                  2000

                  Code:
                   
                   replace wanted = 5 if max(religb, p_religb) == 7000  & inrange(min(religb, p_religb), 2000, 2900)
                  is code for one partner no religion and the other Christian. For that to be true the maximum must be 7000 and the minimum must be between 2000 and 2900. It does not matter which way round it is, that religb is one and p_religb the other, or vice versa.
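
                  A small experiment, using literal values in place of religb and p_religb, shows that the order of the partners makes no difference:

                  ```stata
                  * no religion listed first, then Christian listed first
                  display max(7000, 2070) == 7000 & inrange(min(7000, 2070), 2000, 2900)
                  display max(2070, 7000) == 7000 & inrange(min(2070, 7000), 2000, 2900)

                  * both no religion: the minimum is 7000, so the condition fails
                  display max(7000, 7000) == 7000 & inrange(min(7000, 7000), 2000, 2900)
                  ```

                  The first two lines display 1 and the third displays 0.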

                  However, this is quite wrong:

                  Code:
                   
                   inrange(max(religb, p_religb), 2000, 2900) & (min(religb, p_religb) == 1000 | inrange(min(religb, p_religb), 3000, 6000))
                  Translate into words: the maximum is between 2000 and 2900 and the minimum is 1000 --- that is possible.

                  OR the maximum is between 2000 and 2900 and the minimum is between 3000 and 6000 --- that is impossible.
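
                  The two cases can be tried with literal values; the codes 2070, 3000 and 1000 are just the examples used earlier in the thread:

                  ```stata
                  * Christian 2070 with non-Christian 3000: the maximum is 3000
                  display max(2070, 3000)
                  display inrange(max(2070, 3000), 2000, 2900)

                  * Christian 2070 with non-Christian 1000: the maximum is 2070, the minimum 1000
                  display inrange(max(2070, 1000), 2000, 2900) & (min(2070, 1000) == 1000)
                  ```

                  These display 3000, 0 and 1 respectively, so the expression would pick up only those mixed couples whose non-Christian partner is coded 1000.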

                  Why is this

                  Code:
                  inrange(max(religb, p_religb), 2000, 2900) & inrange(min(religb, p_religb) == 1000 | inrange(3000, 6000))
                  wrong? The first part is right but the second part is wrong. inrange() needs three arguments, but you are giving the second inrange() only one, as at best

                  min(religb, p_religb) == 1000 | inrange(3000, 6000)

                  is one argument -- one logical expression. It is broken also by the third inrange() embedded within it, inrange(3000, 6000), being wrong too: it has only two arguments.

                  The category "different religions, one Christian and the other any religion" overlaps with your existing categories, because "different religions and both Christian", an existing category, is a subset of it. So, how to code it seems of no use to you.

                  I think I've done what I can here. Fluency with functions only comes with a lot of practice, reading the help repeatedly, trying out simple inputs.

                  Hope that helps!



                  • #24
                    Thank you for your reply Nick Cox. Yes it helps a lot thank you. Your code in #20 is very helpful and works well.
                    I can't follow whether it's my code or yours that is "mostly there". If that refers to my code then I don't see any clue on what is wrong.
                    To clarify, I was hoping to learn how your code in #20 could be applied to the code in #16, which uses the cond() function. My remark about the code being "mostly there" refers to the code in #16.

                    So, code can be simplified: test (1) same religion and test (2) one of the couple is Christian makes test (3) the other of the couple is Christian redundant.
                    Thank you, I understand that now.

                    OR the maximum is between 2000 and 2900 and the minimum is between 3000 and 6000 --- that is impossible.
                    Thanks for the explanation on the rules for max() min().

                    Sorry for the confusion regarding #22 (which I labelled fam==5) , but I need to include one more rule where one is Christian (2000-2900) and the other, non-Christian (1000 | 3000-6000), could you kindly suggest an alternative way please?



                    • #25
                      Code:
                      (inrange(religb, 2000, 2900) | inrange(p_religb, 2000, 2900) ) & (inlist(religb, 1000, 3000, 4000, 5000, 6000) | inlist(p_religb, 1000, 3000, 4000, 5000, 6000))
                      If inlist() is new to you, see help inlist()
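
                      A hedged check of that condition on toy couples, assuming the non-Christian codes are exactly those listed in inlist(). The variables mixed and expected are hypothetical names for illustration.

                      ```stata
                      * expected = 1 for one Christian and one non-Christian partner, else 0
                      clear
                      input religb p_religb expected
                      2070 3000 1
                      3000 2070 1
                      1000 2230 1
                      2070 2230 0
                      3000 1000 0
                      2070 7000 0
                      7000 7000 0
                      end

                      gen byte mixed = (inrange(religb, 2000, 2900) | inrange(p_religb, 2000, 2900)) & (inlist(religb, 1000, 3000, 4000, 5000, 6000) | inlist(p_religb, 1000, 3000, 4000, 5000, 6000))

                      * assert stops with an error if any toy couple is misclassified
                      assert mixed == expected
                      ```

                      No test of religb != p_religb is needed here, because no code is both in 2000 to 2900 and in the inlist() values.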



                      • #26
                        Thank you Nick Cox. Once I saw what you did, it hit me, yes inlist() - 'couldn't see the wood for the trees' but 'see clearly now'. It worked nicely (n>400; surprisingly more than I expected). I really appreciate your help and advice. I'll keep plodding away ... Take care.



                        • #27
                          Good. Thanks for the update!



                          • #28
                            Hi Nick Cox. I have a question about missing data and the code in #5 (and similar code based on generate, replace that I have used elsewhere in my research). After reading the very helpful article "Speaking Stata: Fun and fluency with functions", section 3.4 raises the issue of dealing with missings in data. Should I therefore include the following at the end of each line of code in #5 to ensure missing data for these variables are excluded?
                            Code:
                            if religb < . & p_religb < .
                            Last edited by Chris Boulis; 01 Sep 2020, 18:50.



                            • #29
                              No. You are selecting on being equal to one of a series of values or in a range of such values. The condition of being not missing is just redundant.
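
                              A quick experiment with system missing (.) illustrates this for the kinds of conditions used in #20; it is a sketch with literal values, not a test on the real data.

                              ```stata
                              * a missing value is never in a finite range, in a list, or equal to a code
                              display inrange(., 2000, 2900)
                              display inlist(., 1000, 3000)
                              display (. == 7000)

                              * max() and min() ignore missings unless all arguments are missing
                              display max(7000, .)
                              display max(7000, .) == 7000 & inrange(min(7000, .), 2000, 2900)
                              ```

                              These display 0, 0, 0, 7000 and 0. Note that religb != p_religb is true when just one of them is missing, so the redundancy relies on each rule also testing both partners with ==, inrange() or inlist().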



                              • #30
                                Hi Nick Cox. Ok, that makes sense. Thank you very much.
