  • #31
    -rangejoin- only allows one interval. If you wanted to add another variable for exact matching, you could do that by adding it to the list in the -by()- option. But matching on a second interval, as with IQ here, needs additional code: before you do the random selection of one age-matched control, weed out any of the age matches where the IQ falls outside the matching interval.

    Code:
    clear
    
    // CREATE TOY DATA
    set seed 1234
    set obs 100
    
    gen id = _n
    
    gen arm = mod(_n, 2)
    label define arm 0 "Control" 1 "Treatment"
    label values arm arm
    
    gen sex = runiform() < 0.5
    label define sex 0 "Female" 1 "Male"
    label values sex sex
    
    gen age = rgamma(8, 5)
    
    gen iq = rnormal(100, 15)
    
    //    SAVE CONTROLS IN A TEMPORARY FILE
    preserve
    keep if arm == "Control":arm
    tempfile controls
    save `controls'
    
    //    LOAD IN THE TREATMENT GROUP
    restore
    keep if arm == "Treatment":arm
    rangejoin age -1 1 using `controls', by(sex)
    keep if inrange(iq_U - iq, -10, 10)
    
    gen double shuffle = runiform()
    by id (shuffle), sort: keep if _n == 1
    drop shuffle
    Note: I matched to within 1 year on age because the way my toy data worked out there were very few matches within 0.5 years. Modify your code accordingly.



    • #32
      When I take a look at the Data Editor, I seem to be matching my controls with other controls. My code is below. diagnosis=0 is control, and diagnosis=3 is case.

      preserve
      keep if diagnosis=="0":diagnosis
      tempfile controls
      save `controls'

      restore
      keep if diagnosis=="3":diagnosis
      rangejoin age -0.5 0.5 using `controls', by(sex)
      keep if inrange(iq_U - iq, -10, 10)

      gen double shuffle = runiform()
      by id_num (shuffle), sort: keep if _n == 1
      drop shuffle



      • #33
        When run with the toy data example shown in #31, it is clear that arm is always treatment and arm_U is always control, so these are matches of treatments with controls. (So I would expect in your case that diagnosis will always be 3 and diagnosis_U will always be 0.)

        I do notice one strange thing about your code. You have -keep if diagnosis == "0":diagnosis- and -keep if diagnosis == "3":diagnosis-. While this might be correct, it would be unusual. It would only be correct if diagnosis is a numeric variable with some strange values, and you have applied to it a value label whose labels include 0 and 3, but 0 and 3 are probably not themselves values of the variable diagnosis. So, is this the case? Does your variable diagnosis have a value label named diagnosis applied to it? -des diagnosis- will tell you. If so, does that value label include 0 and 3 as labels (not as values)? -label list diagnosis- will tell you that.

        That said, even if that is wrong, I don't see how it would end up producing controls matched with other controls. Rather, you should end up with an empty data set altogether since the conditions -diagnosis == "0":diagnosis- and -diagnosis == "3":diagnosis- will never be met.


        I think you need to post an example of the data that is producing this problem. Please be sure to use the -dataex- command to do that: -ssc install dataex- to install it, -help dataex- for instructions on using it.



        • #34
          diagnosis is being stored as a "byte" - in this case should I remove the quotations? Or would it be better to recode 0 as "control" and 3 as "case"?



          • #35
            Shannon, "byte" is the storage type. The issue is whether there is a value label. Look at this:
            Code:
            . sysuse auto
            (1978 Automobile Data)
            
            . des foreign
            
                          storage   display    value
            variable name   type    format     label      variable label
            ----------------------------------------------------------------------------------------------------------------------------------------
            foreign         byte    %8.0g      origin     Car type
            Notice that under the value label header, it says origin. That means that the variable foreign has a value label named origin attached to it. The effects of this are seen here:
            Code:
            . label list origin
            origin:
                       0 Domestic
                       1 Foreign
            
            . tab foreign
            
               Car type |      Freq.     Percent        Cum.
            ------------+-----------------------------------
               Domestic |         52       70.27       70.27
                Foreign |         22       29.73      100.00
            ------------+-----------------------------------
                  Total |         74      100.00
            
            . tab foreign, nolabel
            
               Car type |      Freq.     Percent        Cum.
            ------------+-----------------------------------
                      0 |         52       70.27       70.27
                      1 |         22       29.73      100.00
            ------------+-----------------------------------
                  Total |         74      100.00
            So you see that the actual content of the variable foreign is numbers, 0 and 1. But the label tells Stata that (in most situations) when displaying a value of the variable foreign, it should map 0 to "Domestic" and 1 to "Foreign" and display the strings instead of the numbers.

            The point is that -keep if diagnosis == "3":diagnosis- is not the same as -keep if diagnosis == 3-, and the question in my mind is which of these is what you need. If diagnosis does not have a value label named diagnosis, or if "3" is not one of the labels in that value label, then you are keeping no observations when you write -keep if diagnosis == "3":diagnosis- because "3":diagnosis doesn't exist! So if diagnosis does not have a value label, then you want -keep if diagnosis == 3- (and analogously 0 for controls). If diagnosis does have a value label, but it isn't called diagnosis, or it is called diagnosis but it does not contain "3" as a label, then something is seriously mixed up and I would not speculate at this juncture what to do about it. If the last is the situation, you really need to post an example of the data to get further help.
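
            A minimal sketch of that difference, using the auto data shown above rather than your data (the commented line about diagnosis is only the analogue for a variable with no value label):

            ```stata
            sysuse auto, clear

            // "Foreign":origin looks up the text "Foreign" in the value label
            // named origin and returns the number that text is attached to
            display "Foreign":origin

            // so these two conditions select the same observations
            quietly count if foreign == "Foreign":origin
            local n_label = r(N)
            quietly count if foreign == 1
            local n_number = r(N)
            display "via label: `n_label'   via number: `n_number'"

            // with no value label on diagnosis, the analogue is simply:
            //     keep if diagnosis == 3
            ```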




            • #36
              Thank you Clyde for bearing with me! As I mentioned, I am quite the novice when it comes to Stata and Stata language.

              There is no value label for diagnosis. So I removed the quotations:

              preserve
              keep if diagnosis==0
              tempfile controls
              save `controls'

              restore
              keep if diagnosis==3
              rangejoin age -0.5 0.5 using `controls', by(sex)
              keep if inrange(iq_U - iq, -10, 10)

              gen double shuffle = runiform()
              by id_num (shuffle), sort: keep if _n == 1
              drop shuffle

              This code worked, but not all of my cases remain. Is this because there was no match for them? Does the -keep if inrange(iq_U - iq, -10, 10)- line delete those participants whose matches do not have an IQ within that range?



              • #37
                Yes, the missing cases are those for which no match meeting your criteria was available.

                And yes, that -keep if inrange...- line deletes those participants who have matches on sex and age, but for whom none of those matches also match on IQ.
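
                If you want to see where the cases are being lost, here is a minimal sketch on the toy data from #31 (not your data). It assumes -rangejoin- (from SSC) is installed and that it carries a case with no age-sex match forward with missing values in the _U variables, which is my understanding of its behavior. The names age_ok, iq_ok, n_age, n_iq, and first are made up for the illustration.

                ```stata
                // TOY DATA (same recipe as #31)
                clear
                set seed 1234
                set obs 100
                gen id = _n
                gen arm = mod(_n, 2)
                gen sex = runiform() < 0.5
                gen age = rgamma(8, 5)
                gen iq = rnormal(100, 15)

                preserve
                keep if arm == 0
                tempfile controls
                save `controls'
                restore
                keep if arm == 1
                rangejoin age -1 1 using `controls', by(sex)

                // FLAG EACH CANDIDATE PAIR, THEN COUNT CANDIDATES PER CASE
                gen byte age_ok = !missing(id_U)              // an age-sex match exists
                gen byte iq_ok = inrange(iq_U - iq, -10, 10)  // evaluates to 0 when iq_U is missing
                by id, sort: egen n_age = total(age_ok)
                by id: egen n_iq = total(iq_ok)
                egen byte first = tag(id)                     // one row per case

                // cases with no age-sex match at all
                count if first & n_age == 0
                // cases with age-sex matches, none of which is within 10 IQ points
                count if first & n_age > 0 & n_iq == 0
                ```

                Run this before the -keep if inrange...- line; the two counts together are the cases that will disappear.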

                We were all new to Stata once. Learning to use Stata by working on a substantial project that involves complex things like matching is probably not the best initiation: a bit like drinking water from a fire hose. If you are not under pressure to rapidly complete this project you are working on, I recommend you take a break from it and invest time learning the basics. Stata has PDF manuals that come with your Stata installation. Click on the Help menu and select PDF Documentation. Go to the Getting Started [GS] volume and read that in its entirety.* Then go to User's Guide [U] and read that in its entirety. This will be a long read, and you will not remember all the details. But you will see the general approach to data management and analysis that Stata uses and it will acquaint you with the fundamental commands that are so often needed in the majority of projects. There are worked examples to help. Once you have completed this tour of Stata, you will probably grasp the general approach and in most situations will know which commands are probably going to be useful. You can then consult the help files or manual sections for those particular commands for the details. If you are under too much pressure to finish your current project quickly, then I suggest you do this immediately afterward. I promise you the time you invest in this will be repaid many fold in the long run.

                It is particularly important for you, as an experienced SAS user, to do this. You have probably already noticed that Stata and SAS work rather differently. Your prior skills in SAS will not be all that helpful in learning Stata; to the contrary, your SAS-honed instincts will sometimes prove to be completely wrong for Stata. (But your statistics knowledge will, of course, be helpful, and you are probably less "brain damaged" by your SAS experience than those who were weaned on Excel and are now facing Stata as their first real statistics package.)

                *Actually, GS has separate sections for the different operating systems Stata runs on. So only read those sections that apply to the operating system(s) you actually will be running Stata on. Stata works pretty much the same across operating systems, but there are some differences in the user interface. And occasionally there are bugs that only affect some platforms and not others.
                Last edited by Clyde Schechter; 07 Sep 2017, 15:04.



                • #38
                  Thank you Clyde!

                  I will certainly read the manuals, as suggested. I truly appreciate it. I have noticed SAS and Stata are quite different - at least I no longer have to worry about spending an hour trying to figure out why my code is not working just to find out I am simply missing a semi-colon.



                  • #39
                    Is there any way to write into the code to select the next possible control if the one selected does not meet the criterion of an IQ within a range of 10? My issue is that I am essentially losing "cases" based on this code, and I know that there are more than enough controls to choose from.



                    • #40
                      Well, one approach would be to just flat-out widen the caliper for the IQ match. What happens if you make it 15 or 20? Perhaps that leaves you with a more ample supply of matches.
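
                      To answer that question before committing to a caliper, something like the following may help. It is a minimal sketch on the toy data from #31, assuming -rangejoin- (from SSC) is installed; the calipers 10, 15, and 20 and the names ok, any_ok, and first are just for illustration.

                      ```stata
                      // TOY DATA (same recipe as #31)
                      clear
                      set seed 1234
                      set obs 100
                      gen id = _n
                      gen arm = mod(_n, 2)
                      gen sex = runiform() < 0.5
                      gen age = rgamma(8, 5)
                      gen iq = rnormal(100, 15)

                      preserve
                      keep if arm == 0
                      tempfile controls
                      save `controls'
                      restore
                      keep if arm == 1
                      rangejoin age -1 1 using `controls', by(sex)

                      // HOW MANY CASES KEEP AT LEAST ONE MATCH AT EACH IQ CALIPER?
                      egen byte first = tag(id)                 // one row per case
                      foreach c of numlist 10 15 20 {
                          gen byte ok = inrange(iq_U - iq, -`c', `c')
                          by id, sort: egen byte any_ok = max(ok)
                          quietly count if first & any_ok
                          display "IQ caliper `c': " r(N) " cases with at least one match"
                          drop ok any_ok
                      }
                      drop first
                      ```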

                      Another approach is to abandon having a specific range of acceptable IQs but to just say: among those who are age-sex matched as previously specified, I want to keep the best possible IQ match (and if there are ties for best match, pick one at random). That would look like this:

                      Code:
                      clear
                      
                      // CREATE TOY DATA
                      set seed 1234
                      set obs 100
                      
                      gen id = _n
                      
                      gen arm = mod(_n, 2)
                      label define arm 0 "Control" 1 "Treatment"
                      label values arm arm
                      
                      gen sex = runiform() < 0.5
                      label define sex 0 "Female" 1 "Male"
                      label values sex sex
                      
                      gen age = rgamma(8, 5)
                      
                      gen iq = rnormal(100, 15)
                      
                      //    SAVE CONTROLS IN A TEMPORARY FILE
                      preserve
                      keep if arm == "Control":arm
                      tempfile controls
                      save `controls'
                      
                      //    LOAD IN THE TREATMENT GROUP
                      restore
                      keep if arm == "Treatment":arm
                      rangejoin age -1 1 using `controls', by(sex)
                      
                      gen delta = abs(iq - iq_U)
                      gen double shuffle = runiform()
                      by id (delta shuffle), sort: keep if _n == 1
                      drop shuffle



                      • #41
                        Great, thanks again! Your assistance with this has been extremely helpful.



                        • #42
                          In order to match cases and controls 1:1 for age and sex, I used the code below. However, it resulted in the same controls being matched to more than one case. I therefore removed the duplicate control matches (bysort case_id_U: keep if _n==1). I was wondering whether the duplicates happened because there was no other control match that fitted my criteria, or whether, for each case, the rangejoin command simply selects a match from the whole pool of controls without taking into account previous matches? I am just thinking in terms of how to keep the matched sample numbers higher. Any help will be greatly appreciated. Thank you!

                          use "case_control.dta", clear
                          preserve
                          keep if condition == 0
                          tempfile controls
                          save `controls'

                          restore
                          keep if condition == 1

                          rangejoin age -2 2 using `controls', by(sex)

                          set seed 58
                          gen double shuffle = runiform()
                          by case_id (shuffle), sort: keep if _n == 1
                          drop shuffle
                          Last edited by Diane Reckziegel; 20 Mar 2018, 16:34.



                          • #43
                            -rangejoin- is like join: it creates all possible matches satisfying the criteria specified. You followed that by randomly selecting one of the possible matches for each case. Neither -rangejoin- nor the random selection takes into account what is going on with any other case when applied to a given case. This will in general result in some controls ending up matched to more than one case.

                            Sometimes it also happens that only one control satisfies those conditions, but that is a separate matter.
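
                            If you need each control used at most once, one possibility is a greedy selection over the candidate pairs. What follows is only a sketch on toy data built like the example in #31, assuming -rangejoin- (from SSC) is installed and numeric IDs; the names chosen, avail, and row are made up for the illustration. It walks through the pairs in random order, accepts a pair, and then retires every other pair that involves the same case or the same control. A greedy pass like this is not guaranteed to produce the largest possible number of matched pairs.

                            ```stata
                            // TOY DATA (same recipe as #31)
                            clear
                            set seed 1234
                            set obs 100
                            gen id = _n
                            gen arm = mod(_n, 2)
                            gen sex = runiform() < 0.5
                            gen age = rgamma(8, 5)

                            preserve
                            keep if arm == 0
                            tempfile controls
                            save `controls'
                            restore
                            keep if arm == 1
                            rangejoin age -2 2 using `controls', by(sex)
                            drop if missing(id_U)        // cases with no candidate control at all

                            // GREEDY 1:1 SELECTION WITHOUT REPLACEMENT
                            set seed 58
                            gen double shuffle = runiform()
                            sort shuffle                 // random order of candidate pairs
                            gen byte chosen = 0
                            gen byte avail = 1           // pair is still eligible
                            gen long row = _n

                            quietly count if avail
                            local left = r(N)
                            while `left' > 0 {
                                // first still-eligible pair in the random order
                                summarize row if avail, meanonly
                                local i = r(min)
                                local c = id[`i']
                                local u = id_U[`i']
                                quietly replace chosen = 1 in `i'
                                // retire every pair that uses this case or this control
                                quietly replace avail = 0 if id == `c' | id_U == `u'
                                quietly count if avail
                                local left = r(N)
                            }

                            keep if chosen
                            drop shuffle chosen avail row
                            isid id                      // each case appears once
                            isid id_U                    // each control appears once
                            ```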



                            • #44
                              Hello,
                              I am a new member and I need your help! I have 49 cases and I want to select 49 controls, and I used the following code:

                              preserve
                              keep if group==1
                              rename * *_control
                              rename age_control age
                              rename sex_control sex
                              tempfile controls
                              save `controls'
                              *
                              restore
                              keep if group==0
                              rename * *_case
                              rename age_case age
                              rename sex_case sex
                              rangejoin age -20 20 using `controls', by(sex)
                              set seed 1234
                              gen double shuffle = runiform()
                              by ID_case (shuffle), sort: keep if _n == 1
                              drop shuffle


                              The final obs are those in group==0. I think that the command of:
                              preserve
                              keep if group==1
                              rename * *_control
                              rename age_control age
                              rename sex_control sex
                              tempfile controls
                              save `controls'

                              does not work.

                              Thank you in advance,
                              MK



                              • #45
                                "does not work" is not helpful. If you want help, please post example data. Also explain in what sense the code "does not work?" What results are you getting? Show them, don't explain them, and, if it isn't obvious, say why they aren't what you want.

                                For showing data examples, please use the -dataex- command. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

                                When asking for help with code, always show example data. When showing example data, always use -dataex-.
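
                                For instance, a minimal illustration on the auto data that ships with Stata (the choice of variables and of the first five observations is arbitrary):

                                ```stata
                                sysuse auto, clear
                                // list the first five observations of three variables
                                // in a form that can be pasted straight into a post
                                dataex make price foreign in 1/5
                                ```

                                Copy everything -dataex- writes to the Results window into your post, including the lines that open and close the code block.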
