Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Return values of varlist that corresponds to other varlist if it meets condition

    Hi there,
    Hope you all are well.
    I am trying to generate a new variable that takes the age value of correspondent variable. For instance, if the disease code was found in variable A_2001_0_0 then the new variable will take the age value in n_2001_0_0. And if the code was in A_2001_0_4 then it takes the age value in n_2001_0_4 etc. The age values could be -1, >1, or missing(.). This will help as there are hundreds of variables like this. I am particularly interested in code 1044. So, wherever the code 1044 was first found, I want the age value in the same correspondent variable.

    The data looks like:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float(A_2001_0_0 A_2001_0_1 A_2001_0_2 A_2001_0_3 A_2001_0_4 n_2007_0_0 n_2007_0_1 n_2007_0_2 n_2007_0_3 n_2007_0_4)
    1044    .    . .    . 60.23  .  . .  .
    1044    . 1044 .    . 66.02  . 69 .  .
    1044 1003    . .    .  56.5 54  . .  .
       .    . 1044 .    .     .  . -1 .  .
    1044 1044    . . 1044    55 58  . . 60
    1015    .    . . 1044    45  .  . . 47
    end
    I have searched the forum but didn't find something quite similar. If there is problem like this, please refer me to it and accept my apology.

    Any suggestions?
    Last edited by Mohammad Aladwani; 08 Nov 2019, 16:50.

  • #2
    I am confused as to what you want.
    if the disease code was found in variable A_2001_0_0 then the new variable will take the age value in n_2001_0_0.
    But there is no variable n_2001_0_0. Al lthe n_* variables are n_2007_*.

    Also, what would you do in observations like 2, 3, 5, and 6 where the disease code is found in more than one A_* variable: which n_* variable would you then take the age from?

    Comment


    • #3
      Hi @Clyde Schechter, thanks for your response. Sorry let me clarify it.
      I meant n_2007_0_0 .
      I want to take the age wherever the code 1044 is FIRST found. For example, in obs 2 I want age value of n_2007_0_0 because 1044 is first found in A_2001_0_0 though it is found also in A_2001_0_2. So in this case age value of 66.02 NOT 69 that goes into the new variable in Obs 3, it should be 56.5 and in Obs 5 it should be 55 and Obs 6 is 47. Hope it is clear now. Please let me know if you need more explanation.

      Thanks a lot for your help!

      Comment


      • #4
        To give you a clear idea, I have done this:

        Code:
        gen age1= n_2007_0_0 if A_2001_0_0==1044
        gen age2 = n_2007_0_1 if A_2001_0_1==1044
        gen age3 = n_2007_0_2 if A_2001_0_2==1044
        gen age3 = n_2007_0_3 if A_2001_0_3==1044
        gen age4 = n_2007_0_3 if A_2001_0_3==1044
        gen age5 = n_2007_0_4 if A_2001_0_4==1044
        egen age0= rowmin (age1-age5)
        The values in variable age0 in above example is what I need actually, but it requires a lot of repetition as you can see!

        Thanks

        Comment


        • #5
          As Stata version 16 has been out for a while now, I'm going to act on the Forum's FAQ that says that you are presumed to be using the current version of Stata unless you say otherwise. The following code relies on frames, so it will not work in earlier versions of Stata.

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear*
          input float(A_2001_0_0 A_2001_0_1 A_2001_0_2 A_2001_0_3 A_2001_0_4 n_2007_0_0 n_2007_0_1 n_2007_0_2 n_2007_0_3 n_2007_0_4)
          1044    .    . .    . 60.23  .  . .  .
          1044    . 1044 .    . 66.02  . 69 .  .
          1044 1003    . .    .  56.5 54  . .  .
             .    . 1044 .    .     .  . -1 .  .
          1044 1044    . . 1044    55 58  . . 60
          1015    .    . . 1044    45  .  . . 47
          end
          
          gen long obs_no = _n
          frame copy default original
          reshape long A_2001_ n_2007_, i(obs_no) j(_j) string
          drop if missing(A_2001_)
          by obs_no (_j), sort: keep if _n == 1
          
          browse
          rename n_2007_ age0
          keep obs_no age0
          
          frame change original
          frlink 1:1 obs_no, frame(default)
          frget age, from(default)
          The data set in memory in frame original has what you want.

          If you are running an earlier version of Stata the same overall logic can be achieved using a tempfile instead of frames. The key idea is to capitalize on the parallelism in the suffixes between the A_2001_ and n_2007_ variables. Once the data is in long layout, it is easy to drop the observations where A_2001 is missing and then just retain the first of those that remain.

          Comment


          • #6
            Thanks @Clyde Schechter! your code is very close to get what I want, however, in obs 6 it has 45 where it should be 47. Also, I am using STATA 15.1 so when replaced frame with tempfile it worked but when I reached to
            Code:
              
             frame change original frlink 1:1 obs_no, frame(default) frget age, from(default)
            It says command frlink is unrecognized and I think is the same for "frget".

            Thanks again for you effort!

            Comment


            • #7
              OK, I didn't notice in #1 that you said you were only interested in code 1044. The code I wrote in #5 finds the first value of age that corresponds to any code (code 1015 in the case of observation 6. It's easy to change that:

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear*
              input float(A_2001_0_0 A_2001_0_1 A_2001_0_2 A_2001_0_3 A_2001_0_4 n_2007_0_0 n_2007_0_1 n_2007_0_2 n_2007_0_3 n_2007_0_4)
              1044    .    . .    . 60.23  .  . .  .
              1044    . 1044 .    . 66.02  . 69 .  .
              1044 1003    . .    .  56.5 54  . .  .
                 .    . 1044 .    .     .  . -1 .  .
              1044 1044    . . 1044    55 58  . . 60
              1015    .    . . 1044    45  .  . . 47
              end
              
              gen long obs_no = _n
              frame copy default original
              reshape long A_2001_ n_2007_, i(obs_no) j(_j) string
              drop if A_2001_ != 1044
              by obs_no (_j), sort: keep if _n == 1
              
              browse
              rename n_2007_ age0
              keep obs_no age0
              
              frame change original
              frlink 1:1 obs_no, frame(default)
              frget age, from(default)
              The one changed line of code is shown in italics.

              Now, here's how you can do it in version 15.1 with a tempfile:

              Code:
              gen long obs_no = _n
              tempfile copy
              save `copy'
              
              reshape long A_2001_ n_2007_, i(obs_no) j(_j) string
              drop if A_2001_ != 1044
              by obs_no (_j), sort: keep if _n == 1
              
              browse
              rename n_2007_ age0
              keep obs_no age0
              
              merge 1:1 obs_no using `copy'

              Comment


              • #8
                Thanks @Clyde Schechter. It works in the sample data but does it work in large data consisting of 502,245 obs?
                Because when I entered reshape the STATA is a bit not responding and it takes ages now ?!

                Comment


                • #9
                  I guess a loop could help.
                  Code:
                  gen Age = .
                  
                  forval i = 4(-1)0 {
                  replace Age = n_2007_0_`i' if A_2001_0_`i' == 1044
                  }

                  Comment


                  • #10
                    Thanks Romalpa for your support. I was happy when I saw your code but it missed some values therefore it doesn't cover everything. See example below.

                    Code:
                    * Example generated by -dataex-. To install: ssc install dataex
                    clear
                    input long(n_20001_0_0 n_20001_2_0 n_20001_2_1) double(n_20007_0_0 n_20007_2_0 n_20007_2_1) byte has1044 float Age
                    1044 1044    .           60.2338    61.5       . 1 60.2338
                    1044    .    .           66.0245       .       . 1 66.0245
                    1044    .    .           58.1941       .       . 1 58.1941
                    1044    .    .           52.7867       .       . 1 52.7867
                    1044    .    .           61.3701       .       . 1 61.3701
                    1044    .    .           60.8307       .       . 1 60.8307
                    1044    .    .           57.2933       .       . 1 57.2933
                       . 1044    .                 . 70.2383       . 1       .
                    1044    .    .           51.9352       .       . 1 51.9352
                       . 1044    .                 .   65.57       . 1       .
                    1044    .    .           62.5337       .       . 1 62.5337
                    1044    .    .           64.4064       .       . 1 64.4064
                    1044    .    .           55.7355       .       . 1 55.7355
                    1044    .    .           59.0319       .       . 1 59.0319
                    1044    .    .           60.4748       .       . 1 60.4748
                    1044    .    .           53.8846       .       . 1 53.8846
                    1044    .    .            62.238       .       . 1  62.238
                    1044    .    .           65.4276       .       . 1 65.4276
                    1044    .    .           61.0114       .       . 1 61.0114
                    1044    .    . 64.93210000000002       .       . 1 64.9321
                    1044    .    .           52.1378       .       . 1 52.1378
                    1044    .    .           55.7628       .       . 1 55.7628
                    1044    .    .           61.6329       .       . 1 61.6329
                    1044    .    .              64.5       .       . 1    64.5
                       . 1073 1044                 . 69.4551 67.4537 1       .
                       . 1044    .                 .  61.559       . 1       .
                    1044    .    .           60.3516       .       . 1 60.3516
                    1044    .    .           60.7513       .       . 1 60.7513
                    1044 1044    .           65.6084 65.6084       . 1 65.6084
                    end
                    See obs 8 it return missing value where it should be 70.23, similar to obs 10 and obs 26. Also, in obs 25, it should return 67.45 not missing or 69.45.


                    Thanks again!

                    I am still waiting

                    Comment


                    • #11
                      Clearly, the output of the code in #9, which serves for example in #1, could not be applied for your (other) example in #10.

                      The logic of the code should be working, but you need to customize it with your actual variable names/patterns.

                      Then, what are your actual variable names?

                      Comment


                      • #12
                        I used the code for the actual data set, of course with changing the variable names and it worked except in above example in #10 which is by the the way the result from a sample of the real data.

                        Comment


                        • #13
                          Try this:
                          Code:
                          ds n_20001*
                          local n20001s `r(varlist)'
                          
                          gen Age = .
                          
                          foreach v of varlist `n20001s' {
                              local pairing: subinstr local v "20001" "20007"
                              confirm numeric var `pairing'
                              replace Age = `pairing' if `v' == 1044 & missing(Age)
                          }

                          Comment


                          • #14
                            Also try below code:
                            Code:
                            gen Age = .
                            
                            forval i = 2(-1)0 {
                                forval j = 4(-1)0 {
                                capture replace Age = n_20007_`i'_`j' if n_20001_`i'_`j'  == 1044
                                }
                            }
                            Notice:

                            1. You should modify the numbers 2, 4 (in red) with the actual numbers.

                            2. With your actual data, the output of this code and Clyde's in #13 may be different. While Clyde's code concentrates on the appearance order of variables n_20001*, mine follows the sequential order of those variable names. Thus, you should be clear about what you do need in case those orders differing from each other.

                            Comment


                            • #15
                              Originally posted by Clyde Schechter View Post
                              Try this:
                              Code:
                              ds n_20001*
                              local n20001s `r(varlist)'
                              
                              gen Age = .
                              
                              foreach v of varlist `n20001s' {
                              local pairing: subinstr local v "20001" "20007"
                              confirm numeric var `pairing'
                              replace Age = `pairing' if `v' == 1044 & missing(Age)
                              }
                              Excellent. This is amazing. Thank you so much Clyde. Can I add one condition to that please. After identifying age values in that specific order, can I take the lowest age variable in case there are more than one?


                              I really appreciate your support.

                              Comment

                              Working...
                              X