  • Creating an indicator variable using the cond() function

    I would like to categorise a variable into seven categories based on specific definitions. The categories I want to create have multiple conditions, but when I -tabulate- 'fam', Stata returns error r(111), "variable fam not found". I've attached the output of -tabulate- for 'religb', the variable of interest. As the code shows, the categories use both the respondent and the partner variable.
    Code:
    label define fam 1 ...
    foreach v in religb p_religb { 
        gen `v'2 = cond(religb==p_religb & inrange(`v', 2000, 2999), 1,  ///
                   cond(religb==p_religb & !inrange(`v', 2000, 2999) & !(inrange(`v', 7000, 9000)), 2,  ///
                   cond(religb!=p_religb & inrange(`v', 2000, 2999), 3,  ///
                   cond(religb!=p_religb & !inrange(`v', 2000, 2999) & !(inrange(`v', 7000, 9000)), 4,  ///
                   cond(p_religb==7000 & inrange(`v', 2000, 2999), 5,  ///
                   cond(p_religb==7000 & !inrange(`v', 2000, 2999) & !(inrange(`v', 7000, 9000)), 6,  ///
                   cond(religb==7000 & p_religb==7000 , 7, .)))))))
        label val `v'2 fam 
    }
    Can this work using cond(), or should I use -generate- and -replace-? Help very much appreciated.
    [Attached image: tab_religb.png]

  • #2
    The code looks alright to me, but tab expects a variable, not a label, so fam cannot be tabulated. You can tab religb2 and p_religb2 instead.
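
    A minimal sketch of the distinction between a value label and a variable, using made-up data. The label name demo_lbl and its text are hypothetical and are not the fam definitions from #1:

    ```stata
    clear
    input religb2 p_religb2
    1 1
    3 4
    7 7
    end

    * A value label is a mapping from numbers to text, not a variable
    label define demo_lbl 1 "category one" 3 "category three"
    label values religb2 demo_lbl

    * Inspect the label itself with -label list-
    label list demo_lbl

    * Tabulate the variables that carry the values
    tabulate religb2 p_religb2, missing

    * Tabulating the label name fails because no such variable exists
    capture noisily tabulate demo_lbl
    display _rc
    ```

    The last two lines should reproduce the kind of "variable ... not found" error reported in #1, since demo_lbl exists only as a label.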



    • #3
      Thank you Wouter Wakker. I should have clarified the intent of my code - my apologies.

      Lines 1-4 seek to match pairs. For example, in line 1 I want religb2 and p_religb2 to have the same value from within the specified range. While some of the values are as expected, such as the [1] [1] pair, others are not: there should not be values for the pairs p_religb2[3] & religb2[4] or p_religb2[4] & religb2[3]. Also, the count for the [2] [2] pair should be below 20,000, but it is showing above 130,000. I must be coding this incorrectly, or could there be missing values?

      Lines 5-6: religb2 is equal to p_religb2 and within the given ranges (in 5) or not within them (in 6), but Stata provides no output for lines 5 or 6. Line 7 wants religb2 and p_religb2 both to equal 7000. I appreciate any help offered.
      [Attached image: tab_religb2.png]



      • #4
        Ok, let's take this line for line:
        Code:
        gen `v'2 = cond(religb==p_religb & inrange(`v', 2000, 2999), 1,  ///
                   cond(religb==p_religb & !inrange(`v', 2000, 2999) & !(inrange(`v', 7000, 9000)), 2,  ///
        No problem here: partners have the same religion and are coded 1 (Christian) or 2 (non-Christian). However, you appear to have one closing parenthesis too many. I'm surprised this even executes and doesn't give you an unbalanced parenthesis error.
        Code:
        cond(religb!=p_religb & inrange(`v', 2000, 2999), 3,  ///
        cond(religb!=p_religb & !inrange(`v', 2000, 2999) & !(inrange(`v', 7000, 9000)), 4,  ///
        Partners have a different religion, but here is where it goes wrong. You say that occurrences of one partner with 3 and the other with 4 cannot happen, but this does happen when one is Christian and the other is not.
        Code:
        cond(p_religb==7000 & inrange(`v', 2000, 2999), 5,  ///
        cond(p_religb==7000 & !inrange(`v', 2000, 2999) & !(inrange(`v', 7000, 9000)), 6,  ///
        Here you have hardcoded p_religb and not religb, which means that your two variables are defined differently. For p_religb2 the conditions do not make much sense (p_religb==7000 & !inrange(p_religb, 7000, 9000), for example). My guess is that you do have values coded as 5 or 6 but that they are missing for the other variable. You can check with tab, missing.
        Code:
        cond(religb==7000 & p_religb==7000 , 7, .)))))))
        This one is alright. My advice would be to start defining one wanted variable first with the right conditions, to make sure that all conditions are okay, because it is a bit messy at the moment. Also, please note that the recommended way to share output is to directly copy it from Stata and put it in between code delimiters. Screenshots are often hard to read (see also the FAQ).



        • #5
          Thanks for your reply Wouter Wakker. I appreciate help with the coding issues noted in #3. I attached the output as a .png file as I understood this was the graph/table requirement in FAQ. Due to data security, I provided an amended version.

          I am adapting code learnt from help provided in a thread by Nick Cox (thanks Nick), as well as Stata-related help files, though I have not yet been able to adapt it to creating this indicator variable. I tried using min(), which I thought deals with missing values, but the error message was "invalid syntax".
          Code:
          foreach v in religb p_religb {
              gen `v'2 = cond(religb==p_religb & inrange(`v', min(2000, 2999)), 1,  ///
                         cond(religb==p_religb & !inrange(`v', min(2000, 2999)) & !inrange(`v', min(7000, 9900)), 2,  ///
                         cond(religb!=p_religb & inrange(`v', min(2000, 2999)), 3,  ///
                         cond(religb!=p_religb & !inrange(`v', min(2000, 2999)) & !inrange(`v', min(7000, 9900)), 4,  ///
                         cond(religb!=p_religb & p_religb==7000 & inlist(religb, 2000, 2999), 5,  ///
                         cond(religb!=p_religb & p_religb==7000 & !inlist(religb, 2000, 2999) & !inlist(religb, 7000, 9000), 6,  ///
                         cond(religb==p_religb & inrange(`v', min(7000, 7050)), 7, .)))))))
           }
          [edit] If I remove min(), the code runs and includes output for 5 & 6, though the errors noted in #3 remain.

          As noted in #3, line 2 should provide a result of under 20,000 (given the total is just above 40,000 observations). Also, each pair is designed to provide output for same or different pairings. While missing from #3, lines 5 & 6 should provide pairs of a Christian/non-Christian & 'no religion' output. Line 7 asks for 'no religion' pairings. All helpful guidance appreciated, regards, Chris
          Last edited by Chris Boulis; 17 Mar 2020, 00:07.



          • #6
            Code:
            inrange(`v', min(2000, 2999))
            gives the function two arguments, but it needs three. It is clear that min(2000, 2999) reduces to 2000 in any case, regardless of any data.

            I've not been following the thread but

            Code:
            inrange(`v', 2000, 2999)
            is legal and may be what you want. An integer range always excludes missings. Otherwise I can't follow why you are using min() here at all.
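
            These properties can be checked interactively. The sketch below uses only constants, so it needs no data; the remark about negation is an added observation, not something stated in the thread:

            ```stata
            * min() of two constants is just the smaller constant
            display min(2000, 2999)

            * inrange() takes three arguments and returns 0 when the first is missing
            display inrange(., 2000, 2999)

            * ... so its negation returns 1 for a missing value
            display !inrange(., 2000, 2999)

            * Two system missing values compare as equal
            display (. == .)
            ```

            These should display 2000, 0, 1 and 1. One possible consequence, offered only as something to check: a condition such as religb==p_religb & !inrange(`v', 2000, 2999) is also true when both variables are missing. Something like count if missing(religb, p_religb) would show how many observations that could affect; whether it explains the counts in #3 cannot be told from the thread.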



            • #7
              Hi Nick Cox. Thank you for clarifying. I've removed min() from my code. If you have a moment, can you kindly shed some light on why I'm having the issues noted in #3 and #5?
              Code:
              foreach v in religb p_religb { 
                  gen `v'2 = cond(religb==p_religb & inrange(`v', 2000, 2999), 1,  ///
                             cond(religb==p_religb & !inrange(`v', 2000, 2999) & !inrange(`v', 7000, 9900), 2,  ///
                             cond(religb!=p_religb & inrange(`v', 2000, 2999), 3,  ///
                             cond(religb!=p_religb & !inrange(`v', 2000, 2999) & !inrange(`v', 7000, 9900), 4,  ///
                             cond(religb!=p_religb & p_religb==7000 & inlist(religb, 2000, 2999), 5,  ///
                             cond(religb!=p_religb & p_religb==7000 & !inlist(religb, 2000, 2999) & !inlist(religb, 7000, 9000), 6,  ///
                             cond(religb==p_religb & inrange(`v', 7000, 7050), 7, .)))))))
                  label val `v'2 couples 
              }
              I copied the tabulate of the variable of interest (religb) from which I'm creating this indicator variable below (also in #1 as .png):
              Code:
              tab religb
              DV: [SCQ] Religion - broad            
              [1000] Buddhism   
              [2000] Christian,
              [2010] Anglican 
              [2030] Baptist    
              [2050] Brethren  
              [2070] Catholic   
              [2110] Churches of Christ    
              [2130] Jehovahs Witnesses 
              [2150] Latter Day Saints (Mormons)    
              [2170] Lutheran 
              [2210] Oriental Christian  
              [2230] Orthodox   
              [2233] Greek Orthodox
              [2250] Presbyterian/Reformed   
              [2270] Salvation Army    
              [2310] Seventh-day Adventist  
              [2330] Uniting Church   
              [2400] Pentecostal 
              [2800] Other Protestant   
              [2900] Other Christian   
              [3000] Hinduism    
              [4000] Islam  
              [5000] Judaism 
              [6000] Other religion  
              [7000] No religion   
              [7200] Secular Beliefs 
              [7300] Other Spiritual Beliefs    
              [8000] Non-Christian, nfi   
              [9000] Multiple religions, Christian
              Total    41,031



              • #8
                I don't know except that it seems puzzling that you use inlist() for some selections and inrange() for others. They certainly are not the same function. There is no 2999 listed so


                Code:
                 
                  inlist(religb, 2000, 2999)
                is just an equivalent of religb == 2000.
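
                The difference can be seen with a code that sits between the two endpoints, such as 2070 (Catholic in the tabulation in #7). A minimal sketch using constants only:

                ```stata
                * inlist() asks: is the first argument equal to any of the listed values?
                display inlist(2070, 2000, 2999)

                * inrange() asks: does the first argument lie between the two bounds, inclusive?
                display inrange(2070, 2000, 2999)

                * Only an exact match satisfies inlist()
                display inlist(2000, 2000, 2999)
                ```

                These should display 0, 1 and 1 respectively.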



                • #9
                  Hi Nick Cox. Thanks for your reply. I've obviously confused the features of inlist() with those of inrange() in lines 5 & 6. I thought that inrange(2000,2999) meant all values between these two; yes, 2999 should be 2900, but as I thought it was a range it wouldn't matter. Reading over Stata tip 39: In a list or out? In a range or out? again, I now see that inlist() tests a list, not a range as inrange() does. So inrange(2000, 2900) means all values between and including these two. If correct, then lines 1 & 3 of my code should give me the results I'm after. But I'm not sure about lines 2, 4 onward, so your help/guidance here is very much appreciated.



                  • #10
                    Hi all. I have fixed the issue I was encountering in line 2 and in lines 5 & 6 - and now have a more realistic value for [2] and output for [5] & [6], which is good news. That said, while I'm closer, there are still a few bugs in my code, so help weeding these out is very much appreciated.
                    Code:
                    foreach v in religb p_religb { 
                        gen `v'2 = cond(religb==p_religb & inrange(`v', 2000, 2900), 1,  ///
                                   cond(religb==p_religb & inrange(`v', 1000, 1050) | inrange(`v', 3000, 6000), 2, ///
                                   cond(religb!=p_religb & inrange(`v', 2000, 2900), 3, ///
                                   cond(religb!=p_religb & inrange(`v', 1000, 1050) | inrange(`v', 3000, 6000), 4, /// 
                                   cond(religb!=p_religb & p_religb==7000 & inrange(religb, 2000, 2900), 5, ///
                                   cond(religb!=p_religb & p_religb==7000 & inrange(`v', 1000, 1050) | inrange(`v', 3000, 6000), 6, /// 
                                   cond(religb==p_religb & inrange(`v', 7000, 7050), 7, .)))))))
                    }



                    • #11
                      Code:
                       
                        cond(religb!=p_religb & p_religb==7000 & inrange(religb, 2000, 2900), 5, ///
                      is going to put the same values in both of your new variables. Why doesn't that chunk refer to `v' just like the others?



                      • #12
                        Thanks for your reply Nick Cox. I did this because `v' relates to both religb and p_religb and in 5 & 6, I want each person in a couple to be from a different range. e.g. in 5, I want one person == 7000 and the other inrange(`v', 2000, 2900). Line 6 is similar but has a different range. I now see I can probably remove the first condition in 5 & 6 (& 7) but am unsure how to code 5 & 6 so they fit in the loop.

                        In 7, I want both == 7000 - can we include a single value in an inrange? If not, can we add another value (that doesn't exist) to enable the inrange to work and return 7000? I did this in 7 (there is no 7050) and in 2 & 4 as there is no 1050, but needed 1000 in the range. As always, appreciate your help/guidance.




                        • #13
                          inrange() can work testing for a single value, which just has to be repeated. This is something you can check for yourself:


                          Code:
                          . display inrange(2000, 7000, 7000)
                          0
                          
                          . display inrange(7000, 7000, 7000)
                          1
                          It would seem a lot simpler just to check for equality directly, just as you do elsewhere in the code.


                          Code:
                           
                           if `v' == 7000



                          • #14
                            Hi Nick Cox. Thanks for the help re repeating single values in inrange() and being able to use -if- (although Stata noted error 111 in line 7). While I got this result, I don't get your point; do you mind clarifying, please? I'm still not clear how to specify separate values for religb & p_religb in 5 & 6. This is where I'm at:
                            Code:
                            foreach v in religb p_religb {
                                gen `v'2 = cond(religb==p_religb & inrange(`v', 2000, 2900), 1, ///
                                           cond(religb==p_religb & inrange(`v', 1000, 1000) | inrange(`v', 3000, 6000), 2, ///
                                           cond(religb!=p_religb & inrange(`v', 2000, 2900), 3, ///
                                           cond(religb!=p_religb & inrange(`v', 1000, 1000) | inrange(`v', 3000, 6000), 4, ///  
                                           cond(p_religb==7000 & inrange(religb, 2000, 2900), 5, ///
                                           cond(p_religb==7000 & inrange(religb, 1000, 1000) | inrange(religb, 3000, 6000), 6, ///
                                           cond(religb==p_religb if `v' == 7000, 7, .)))))))
                            }
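
                            On the last cond() in the code above: an if qualifier belongs to a command, not inside a function call, so within cond() the test is written as a plain logical expression. A minimal sketch with made-up data; cat7_a and cat7_b are hypothetical names used only for illustration:

                            ```stata
                            clear
                            input religb p_religb
                            7000 7000
                            7000 2000
                            2070 7000
                            end

                            * Inside cond(): a logical expression, with no "if"
                            gen byte cat7_a = cond(religb == 7000 & p_religb == 7000, 7, .)

                            * Alternative: the if qualifier goes on the command itself
                            gen byte cat7_b = 7 if religb == 7000 & p_religb == 7000

                            * Both approaches give the same result, missing values included
                            assert cat7_a == cat7_b
                            list
                            ```

                            Only the first observation, where both partners are coded 7000, should receive the value 7; the other two stay missing under either form.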



                            • #15
                              My point is only that

                              Code:
                              `v' == 1000
                              is a simpler alternative to

                              Code:
                              inrange(`v', 1000, 1000)
                              My reference was to another use of testing for equality in your code; the if qualifier does indeed make no sense and is illegal inside a function call.

                              As the code is equivalent, that can't account for the difficulties which you are still perceiving. Here is a fake data example with what your code from #10 produces. That may help you remove what you think are bugs.

                              Code:
                              clear
                              input int religb
                              1000
                              2000
                              3000
                              7000
                              end
                              
                              clonevar p_religb = religb
                              fillin religb p_religb 
                              drop _fillin
                              
                              sort religb p_religb 
                              
                              foreach v in religb p_religb { 
                                  gen `v'2 = cond(religb==p_religb & inrange(`v', 2000, 2900), 1,  ///
                                             cond(religb==p_religb & inrange(`v', 1000, 1050) | inrange(`v', 3000, 6000), 2, ///
                                             cond(religb!=p_religb & inrange(`v', 2000, 2900), 3, ///
                                             cond(religb!=p_religb & inrange(`v', 1000, 1050) | inrange(`v', 3000, 6000), 4, /// 
                                             cond(religb!=p_religb & p_religb==7000 & inrange(religb, 2000, 2900), 5, ///
                                             cond(religb!=p_religb & p_religb==7000 & inrange(`v', 1000, 1050) | inrange(`v', 3000, 6000), 6, /// 
                                             cond(religb==p_religb & inrange(`v', 7000, 7050), 7, .)))))))
                              }
                              
                              list, sepby(religb) 
                              
                              
                              
                              
                                   +----------------------------------------+
                                   | religb   p_religb   religb2   p_reli~2 |
                                   |----------------------------------------|
                                1. |   1000       1000         2          2 |
                                2. |   1000       2000         4          3 |
                                3. |   1000       3000         4          2 |
                                4. |   1000       7000         4          . |
                                   |----------------------------------------|
                                5. |   2000       1000         3          4 |
                                6. |   2000       2000         1          1 |
                                7. |   2000       3000         3          2 |
                                8. |   2000       7000         3          5 |
                                   |----------------------------------------|
                                9. |   3000       1000         2          4 |
                               10. |   3000       2000         2          3 |
                               11. |   3000       3000         2          2 |
                               12. |   3000       7000         2          . |
                                   |----------------------------------------|
                               13. |   7000       1000         .          4 |
                               14. |   7000       2000         .          3 |
                               15. |   7000       3000         .          2 |
                               16. |   7000       7000         7          7 |
                                   +----------------------------------------+
                              
                              .
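
                              One thing the listing above illustrates is operator precedence: in Stata expressions & is evaluated before |, so a & b | c is read as (a & b) | c. In observation 3, for example, p_religb2 is 2 although the partners differ, because inrange(p_religb, 3000, 6000) alone satisfies the second cond(). A minimal sketch on the same fake data follows; the variable names ungrouped and grouped are made up for illustration, and whether the grouped form matches the intended categories is an assumption:

                              ```stata
                              clear
                              input int religb
                              1000
                              2000
                              3000
                              7000
                              end
                              clonevar p_religb = religb
                              fillin religb p_religb
                              drop _fillin

                              * As in #10: read as (same & in 1000-1050) | (religb in 3000-6000)
                              gen byte ungrouped = religb == p_religb & inrange(religb, 1000, 1050) | inrange(religb, 3000, 6000)

                              * With the "or" grouped explicitly: same & (in 1000-1050 | in 3000-6000)
                              gen byte grouped = religb == p_religb & (inrange(religb, 1000, 1050) | inrange(religb, 3000, 6000))

                              list religb p_religb ungrouped grouped, sepby(religb)
                              count if ungrouped
                              count if grouped
                              ```

                              On these 16 observations ungrouped flags five pairs (the 1000/1000 pair plus every pair with religb == 3000), while grouped flags only the two same-religion pairs, 1000/1000 and 3000/3000.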
