  • #46
    Originally posted by Clyde Schechter
    So, something like this:

    Code:
    use my_data, clear
    
    // SEPARATE CASES FROM CONTROLS
    // AND DISTINGUISH VARIABLE NAMES
    preserve
    keep if group == 2
    rename * *_control
    rename age_control age
    rename sex_control sex
    tempfile controls
    save `controls'
    
    restore
    keep if group == 1
    rename * *_case
    rename age_case age
    rename sex_case sex
    
    // NOW JOIN ON AGE AND SEX
    joinby age sex using `controls'
    
    // RANDOMLY SELECT ONE MATCH IF THERE ARE MORE
    set seed 1234 // OR WHATEVER RANDOM NUMBER SEED YOU LIKE
    gen double shuffle = runiform()
    by case_id (shuffle), sort: keep if _n == 1
    drop shuffle
    The above will provide exact matches on age and sex. Now, in most real-world situations, you won't be able to get enough matches with exact age. So typically people set some window, maybe 5 years, and require that the match be at least that close, if not exact. The code would be largely the same:

    Code:
    use my_data, clear
    
    // SEPARATE CASES FROM CONTROLS
    // AND DISTINGUISH VARIABLE NAMES
    preserve
    keep if group == 2
    tempfile controls
    save `controls'
    
    restore
    keep if group == 1
    
    // NOW JOIN ON AGE AND SEX
    // ALLOW WINDOW FROM 5 YEARS BELOW TO 5 YEARS ABOVE
    rangejoin age -5 5 using `controls', by(sex)
    
    // RANDOMLY SELECT ONE MATCH IF THERE ARE MORE
    set seed 1234 // OR WHATEVER RANDOM NUMBER SEED YOU LIKE
    gen double shuffle = runiform()
    by case_id (shuffle), sort: keep if _n == 1
    drop shuffle
    Evidently, if you want a narrower or wider window, you can just change the -5 and 5 in the -rangejoin- command to whatever you like.

    Note that when using -rangejoin-, it is unnecessary to rename variables as -rangejoin- will do it for you automatically.

    To run the second version, you need to have the -rangejoin- command installed. It was written by Robert Picard and is available from SSC: -ssc install rangejoin-. It also relies on -rangestat-, which is likewise available from SSC: -ssc install rangestat-.
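If you cannot install packages from SSC, a similar windowed match can be sketched in base Stata. Use -joinby- on sex alone, then discard any pair whose ages are more than 5 years apart. This is an illustration on made-up data, not part of the original post. The variable names (id, group, age, sex) are assumptions. Be aware that joining on sex alone forms every same-sex case-control pair before filtering, so on a large dataset it can need far more memory than -rangejoin-.

```stata
* HYPOTHETICAL TOY DATA: 3 CASES (group 1) AND 5 CONTROLS (group 2)
clear
input long id byte(group sex) int age
1 1 1 40
2 1 2 52
3 1 1 67
4 2 1 43
5 2 1 38
6 2 2 55
7 2 1 80
8 2 2 49
end

* SEPARATE CONTROLS FROM CASES AND RENAME THE CONTROLS' VARIABLES
preserve
keep if group == 2
rename (id group age) (id_control group_control age_control)
tempfile controls
save `controls'
restore
keep if group == 1

* PAIR EACH CASE WITH EVERY CONTROL OF THE SAME SEX
joinby sex using `controls'

* KEEP ONLY PAIRS WHOSE AGES DIFFER BY 5 YEARS OR LESS
keep if abs(age - age_control) <= 5

* RANDOMLY SELECT ONE MATCH PER CASE
set seed 1234
gen double shuffle = runiform()
by id (shuffle), sort: keep if _n == 1
drop shuffle
list id age id_control age_control, noobs
```

In this toy example, case 3 (age 67) has no same-sex control within 5 years, so it drops out entirely. Cases 1 and 2 each keep one randomly chosen control.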
    Hello Clyde,

    I find this code very helpful, and I wondered if you could advise a little further, please. I have used this code to select one control per case; however, I get duplicate controls (i.e., one control matches to multiple cases). Is there any way to stop this?

    Also, I would like to select 4 controls per case. How could I expand this code to do that please? Or would I need to use different code?

    Thanks!

    Kate



    • #47
      I have used this code to select one control per case, however I get duplicate controls (ie. one control matches to multiple cases) - is there any way to stop this?
      Yes, there is a way to stop this. But why do you want to? There is no statistical reason to avoid re-using controls for different cases. In fact, by forbidding re-use, you increase the probability that some cases will find no match at all. So think about it. If you have a really compelling reason to do this, post back and I will show you the substantially more complicated code that is needed. If you choose to do this, please be sure also to post example data that I can customize the code for. Use -dataex- for that.
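For readers who do have a compelling reason, here is one possible approach. It is a hypothetical sketch, not the code offered above. The procedure is greedy and random-ordered:

- In each round, every remaining case proposes its first candidate control in random order.
- Each control accepts only one proposal.
- Matched cases and controls are then removed from the pool, and the rounds repeat until no eligible pairs remain.

The sketch assumes the data in memory hold one row per eligible case-control pair, identified by case_id and control_id (assumed names; the pairs below are invented). A greedy procedure like this does not guarantee the largest possible number of matched cases. It also illustrates the point above: case 2 ends up unmatched whenever control 101 goes to another case.

```stata
* HYPOTHETICAL PAIR DATA: ONE ROW PER ELIGIBLE CASE-CONTROL PAIR
clear
input long(case_id control_id)
1 101
1 102
2 101
3 101
3 103
end

set seed 1234
gen double shuffle = runiform()
tempfile pairs round matched
save `pairs'

* START WITH AN EMPTY FILE TO COLLECT THE MATCHES
clear
save `matched', emptyok

local remaining = 1
while `remaining' > 0 {
    use `pairs', clear
    * EACH CASE PROPOSES ITS FIRST CANDIDATE IN RANDOM ORDER
    by case_id (shuffle), sort: keep if _n == 1
    * EACH CONTROL ACCEPTS ONLY THE PROPOSAL WITH THE LOWEST shuffle
    by control_id (shuffle), sort: keep if _n == 1
    keep case_id control_id
    save `round', replace
    append using `matched'
    save `matched', replace

    * REMOVE EVERY PAIR INVOLVING A CASE OR CONTROL MATCHED THIS ROUND
    use `pairs', clear
    merge m:1 case_id using `round', keep(master) nogenerate
    if _N > 0 {
        merge m:1 control_id using `round', keep(master) nogenerate
    }
    local remaining = _N
    save `pairs', replace emptyok
}

use `matched', clear
sort case_id
list, noobs
```

Each round finalizes at least one match, so the loop always terminates.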

      Also, I would like to select 4 controls per case. How could I expand this code to do that please? Or would I need to use different code?
      Just change the penultimate line from
      Code:
      by case_id (shuffle), sort: keep if _n == 1
      to

      Code:
      by case_id (shuffle), sort: keep if _n <= 4
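With -keep if _n <= 4-, a case that has fewer than 4 eligible controls simply keeps all of them. It is therefore worth checking how many controls each case actually received. Below is a small sketch on made-up pair data. The names case_id and control_id are assumptions.

```stata
* HYPOTHETICAL PAIR DATA: CASE 1 HAS 6 ELIGIBLE CONTROLS, CASE 2 ONLY 3
clear
input long(case_id control_id)
1 101
1 102
1 103
1 104
1 105
1 106
2 201
2 202
2 203
end

* RANDOMLY SELECT UP TO 4 CONTROLS PER CASE
set seed 1234
gen double shuffle = runiform()
by case_id (shuffle), sort: keep if _n <= 4
drop shuffle

* COUNT THE CONTROLS EACH CASE ACTUALLY RECEIVED
bysort case_id: gen byte n_controls = _N
egen byte first = tag(case_id)
tabulate n_controls if first
```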



      • #48
        Dear Stata experts, I am new to Stata. I am getting large beta coefficients (such as 93.6 and 66.3 for wealth quintiles) in my multiple linear regression analysis. My outcome variable is birth weight (continuous, 400 g to 6500 g). Is that normal, or do I need to use adjusted mean birth weights to avoid this problem?

        Many Thanks



        • #49
          This post is completely unrelated to the topic of the thread. Please repost as a New Topic. Also, before doing that, please read the Forum FAQ for excellent advice about how to maximize your chance of getting a timely and helpful response. In particular, your post provides far too little information for anybody to give you a sensible answer. At a minimum you need to show the actual code you ran and the actual output you got from Stata. In addition, for your particular question, an example of your data would be helpful. Finally, for those who do not normally work in this domain, provide some explanation why you think that the coefficients you are getting are unreasonably large. (This is a multi-disciplinary forum; when posting you should never assume that others here are familiar with the subject matter of your research. The only common knowledge here is statistics, Stata, and whatever any college-educated person around the world could be assumed to know.)



          • #50
            Very helpful thread. The code above works for 1:1 matching. How can I tweak that code to work for 4:1 matching?
            Thanks.



            • #51
              See #47, where that very question was answered.



              • #52
                I have a similar but slightly different question. I want to match cases to controls. We have two IDs: the first is the participant ID, and the second is the case ID, which is used to match controls to a case. Each case has 3 controls. How do I create the matched set ID in order to run a conditional logistic regression?



                • #53
                  You say you have a case ID that is used to match controls to a case. This sounds like a matched set ID to me, so it's hard to understand what the problem is, as this variable would simply be specified in the -group()- option of -clogit-. I presume I'm misunderstanding, so in order to clarify the problem, I'd suggest you post some sample data using -dataex-, as is described and recommended in the StataList FAQ.



                  • #54
                    I am doing a matched case-control study as well, and thank you for your code. However, my problem right now is that the data are not in long format: all of my cases and controls are in the same row. How can I use the clogit command? I want to use clogit to see which variables are the main predictors of being a case. Do I need to restructure my data? Thank you in advance.



                    • #55
                      You need to use the -reshape- command to convert your data from wide to long. Read -help reshape-. It is a somewhat difficult command for people to grasp at first. If you do not see how to apply it to your data, post back, using -dataex- to show example data.

                      If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.



                      • #56
                        I matched by age (±1 year) and sex, so each of my cases (_ca) has 2 controls (_c and _c_U), I think. But actually I only need 1 case. Moreover, I need the data in long format, with a match_ID, for the clogit command to work. What is the best way to generate that? Thank you in advance.

                        Code:
                        * Example generated by -dataex-. To install: ssc install dataex
                        clear
                        input double(clinic_ca bmi_v1_ca calciumfromdairyservings_v1_ca clinic_c bmi_v1_c calciumfromdairyservings_v1_c clinic_c_U bmi_v1_c_U calcium_v1_c_U)
                         770179 19.8       . 1786684 21.1 . 1786684 21.1 2622.44246575342
                         935100 27.5       . 2640255 32.3 . 2894943 26.3 1040.66395547945
                        1227138 26.8       . 2873105 33.5 . 1839203 26.2 870.183133561644
                        1503561 23.5       . 2989521 33.6 . 3138082 29.3 994.717465753425
                        1598525 36.7  1.3589 2873105 33.5 . 1839203 26.2 870.183133561644
                        1620273 25.8 3.53973 2766647   35 . 2411038 28.4 1025.26746575342
                        1666901 32.3       . 7075806 28.8 . 3357238 21.7  1863.0676369863
                        1789203   27       . 1591829 40.8 . 3839462   24 1689.32808219178
                        1795922 38.4  .56986 3138082 29.3 . 1459900 41.1 762.363955479452
                        1802766 29.7       . 1591829 40.8 . 7075806 28.8         461.1875
                        end



                        • #57
                          This will set you up to use -clogit-:
                          Code:
                          gen long match_id = _n
                          reshape long clinic  bmi_v1  calciumfromdairyservings_v1 , i(match_id) j(cc) string
                          gen byte case_status = cc == "_ca"
                          The variable case_status will be the outcome variable, and match_id will be the -group()- variable.
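To make the last step concrete: once the data are long, -clogit- takes case_status as the outcome and match_id in -group()-. The sketch below runs that call on simulated data so it can be tried without the poster's file. The matched-set layout of 1 case and 2 controls mirrors the example above. The bmi predictor and its effect size are invented purely for illustration.

```stata
* SIMULATED TOY DATA, FOR ILLUSTRATION ONLY:
* 200 MATCHED SETS, EACH WITH 1 CASE AND 2 CONTROLS
clear
set seed 2024
set obs 200
gen long match_id = _n
expand 3
bysort match_id: gen byte case_status = (_n == 1)

* A HYPOTHETICAL PREDICTOR THAT IS SHIFTED UPWARD AMONG CASES
gen double bmi = 27 + 4*rnormal() + 1.5*case_status

* CONDITIONAL LOGISTIC REGRESSION WITHIN MATCHED SETS
clogit case_status bmi, group(match_id)
```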



                          • #58
                            Hello!

                            I found this post while searching for tips on age/sex matching and I found it extremely useful!

                            However, I was trying to match without replacement (to randomly select one match from both the cases and the controls). I used the code suggested in post #2 to randomly select one match from the cases, and then I tried to replicate the same code to select one match from the controls, as follows:


                            Code:
                            set seed 1234
                            gen double shuffle = runiform()
                            by id_case (shuffle), sort: keep if _n == 1
                            drop shuffle
                            
                            set seed 1234
                            gen double shuffle = runiform()
                            by id_control (shuffle), sort: keep if _n == 1
                            drop shuffle
                            Could this code be correct?

                            Thanks in advance

                            Nicoletta
                            Last edited by Nicoletta Riva; 11 Jun 2020, 19:01.
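A two-step selection like the one above does leave each case and each control used at most once, and -isid- can confirm that. It is greedy, though. If two cases both draw the same control in the first step, the case that loses it is dropped, even when it had another eligible control available. The made-up data below show the check. The names id_case and id_control follow the post; the values are invented. A single -set seed- with one shuffle variable is sufficient for both steps.

```stata
* HYPOTHETICAL PAIR DATA
clear
input long(id_case id_control)
1 101
1 102
2 101
3 103
end

* ONE SHUFFLE VARIABLE SERVES BOTH STEPS
set seed 1234
gen double shuffle = runiform()
by id_case (shuffle), sort: keep if _n == 1
by id_control (shuffle), sort: keep if _n == 1
drop shuffle

* CONFIRM THAT NO CASE AND NO CONTROL APPEARS MORE THAN ONCE
isid id_case
isid id_control
list, noobs
```

Depending on the draw, this keeps 2 or 3 pairs. It keeps 2 when case 1 first picks control 101 and then loses it to case 2, even though control 102 was free.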



                            • #59
                              Hello all,
                              I have 24,000 observations, of which about 1,000 are cases and about 23,000 are controls. I want to match 1 case with 1 control based on industry (Ind) and similar size (within 20% up or down in market capitalization). I have variables such as Ind, Clean (1 = case, 0 = control), MarketCap (in millions of dollars), and a host of other variables that I want to compare between the case and control groups. I have Stata 15. What would be the best way to do this? Any assistance is welcome!
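The -rangejoin- approach from #46 can also express a relative window. -rangejoin- accepts variables in the master data as the interval bounds, so a ±20% market-cap window can be built from two helper variables. This is a sketch on made-up data. It assumes a firm identifier named firm_id (hypothetical) alongside Ind, Clean, and MarketCap, and it needs -rangejoin- and -rangestat- installed from SSC.

```stata
* HYPOTHETICAL TOY DATA; firm_id IS AN ASSUMED IDENTIFIER NAME
clear
input long firm_id byte(Ind Clean) double MarketCap
1 1 1 100
2 1 1 500
3 1 0  90
4 1 0 115
5 1 0 700
6 2 0 100
end

* SEPARATE CONTROLS (Clean == 0) FROM CASES (Clean == 1)
preserve
keep if Clean == 0
tempfile controls
save `controls'
restore
keep if Clean == 1

* WINDOW: 20% BELOW TO 20% ABOVE EACH CASE'S MARKET CAP
gen double cap_low  = 0.8 * MarketCap
gen double cap_high = 1.2 * MarketCap

* REQUIRES -rangejoin- AND -rangestat- FROM SSC;
* USING-DATA VARIABLES THAT CLASH WITH MASTER NAMES GET THE SUFFIX _U
rangejoin MarketCap cap_low cap_high using `controls', by(Ind)

* DROP ANY CASE LEFT WITHOUT AN ELIGIBLE CONTROL
drop if missing(firm_id_U)

* RANDOMLY SELECT ONE CONTROL PER CASE
set seed 1234
gen double shuffle = runiform()
by firm_id (shuffle), sort: keep if _n == 1
drop shuffle
```

Here case 2 (market cap 500) has no same-industry control between 400 and 600, so only case 1 remains, matched to control 3 or 4.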



                              • #60
                                Raj:
                                I'm not an expert with this kind of stuff, so take what follows as a tentative reply.
                                My gut feeling is that you have to match one case with more than one control (see Example 3 under the -teffects psmatch- entry in the Stata .pdf manual).
                                Kind regards,
                                Carlo
                                (Stata 18.0 SE)
