Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • comparing two sets of columns in a data set to generate a new variable

    I am new to STATA and I need your help !! "experts only" lol


    So I have a set of columns DX1 to DX20 and another set of columns that is DXPOA 1 to 20

    I want to write a code that would first search for a variablex A, B ,C in columns DX1 to DX20 and if found would then check in the DXPOA1 to 20 in the cell that carries the same number in DX1 to 20 (eg, variable was found in DX5, then I want STATA to see if Cell in DXPOA 5 (notice same number) has "Y" , if so then give the new column that I generate 0 if it was anything but Y then give the new column the value of 1)

    Does that make any sense? LOL

  • #2
    Are these "columns" DX1 .. DX20 string variables or numeric variables?

    What should happen if the same value -- X in your example -- is found in DX5 and DX12 but Y is found in DXPOA5 but not in DXPOA12?

    The spelling is "Stata", please (FAQ Advice Section 18).

    Comment


    • #3
      Sorry about the spelling, Stata.
      Values in the DX1 to DX20 are codes for Disease diagnosis so it will be a 4 to 5 digit number that codes for a particular disease condition and it is reported only once in each column from DX1 to DX20 so having repeated numbers of the same values are not possible. I hope this could answer your question, cause I'm new to Stata and I can't really differentiate between string and numeric values.

      Comment


      • #4
        one other thing is that some of the cells in DX1 to DX20 have Character values. Like E1231 or V2342 ... something like that !! so does that make them a string value?

        Comment


        • #5
          Stata will tell you which variables are numeric and which string. Look at the results of

          Code:
          describe DX1-DX20
          If the storage type is something beginning with str then the variables are string; if not, they are numeric. You need to know which variables are which to do anything much in Stata.

          Comment


          • #6
            type of the variables DX1 to DX20 is "str15" and the same is for DXPOA1 to DXPOA20 !!

            Comment


            • #7
              so I want to look for a disease condition between DX1 to DX20,

              the thing is that each disease has multiple codes, which I'll have to search for in those columns and confirm that the cells In DXPOA with the same number doesnt contain "Y"

              Code:

              gen COM_ARRHYTHMIA=0
              foreach var of varlist DX1-DX20 {
              replace COM_ARRHYTHMIA= 1 if (`var'=="4260" |`var'=="42610" | `var'=="42611" | `var'=="42612" | `var'=="42613" | `var'=="4267" | `var'=="4269" | `var'=="4270" | `var'=="4271" | `var'=="4272" | `var'=="4273" | `var'=="4279" | `var'=="V450" | `var'=="V533")
              }

              but now I want to compare between
              1) the cell where Stata found one of those codes present
              2) and the equivalent cell in DXPOA1 to 20 tooking if it had Y or not ,

              outcome is if it had one of the codes and the DXPOA didn't have Y then a value of 1 should be assigned to my new variable column

              Comment


              • #8
                Suppose you are looking to see if "frog" occurs in one of DX1-DX20 and also "toad" in the corresponding DXPOA variable.

                Code:
                gen where = .
                gen count = 0          
                
                quietly forval j = 1/20 { 
                      replace where = `j' if DX`j' == "frog" & DXPOA`j' == "toad"
                      replace count = count + (DX`j' == "frog" & DXPOA`j' == "toad")
                }
                
                gen found = where < .
                The count variable is a check on your assumption that there is at most one match.

                Comment


                • #9
                  what if we are looking for more than one frog? lol

                  do I have to do this for each of the codes
                  (ie, (`var'=="4260" |`var'=="42610" | `var'=="42611" | `var'=="42612" | `var'=="42613" | `var'=="4267" | `var'=="4269" | `var'=="4270" | `var'=="4271" | `var'=="4272" | `var'=="4273" | `var'=="4279" | `var'=="V450" | `var'=="V533")"

                  or can I put
                  replace where = `j' if DX`j' == "4260" | "42610" | "42611" | "42612" | "42613" | "4267" | "4269" | "4270" | "4271" | "4272" | "4273" | "4279" | "V450" | "V533" & DXPOA`j' != "Y"

                  Comment


                  • #10
                    quietly forval j = 1/20 {
                    type mismatch
                    r(109);

                    Comment


                    • #11
                      DXPOA is str1 type ?? would that matter?

                      Comment


                      • #12
                        Posts are crossing. That's unfortunate.

                        The question keeps changing. That should be avoidable.

                        Which question are you asking? The only diagnosis from your last is what Stata is telling you, "type mismatch".

                        P.S., assuming that you are open to advice, I'd suggest that "LOL" doesn't fit the tone on Statalist. We have a sense of humour, but that doesn't seem included.

                        Comment


                        • #13
                          one other thing, not all of the cells in each raw are filled in both sets of columns !!

                          Comment


                          • #14
                            OK sorry about the LOL thing.

                            I have tried the code that you have given me. and this is what I got from Stata, type mismatch error.

                            Comment


                            • #15
                              gen CardArrythm1=0
                              replace CardArrythm1=1 if DX1=="427" & DXPOA1=="Y" | DX2=="427" & DXPOA2=="Y" | DX3=="427" & DXPOA3=="Y" | DX4=="427" & DXPOA4=="Y" | DX5=="427" & DXPOA5=="Y" | DX6=="427" & DXPOA6=="Y" | DX7=="427" & DXPOA7=="Y" | DX8=="427" & DXPOA8=="Y" | DX9=="427" & DXPOA9=="Y" | DX10=="427" & DXPOA10=="Y" | DX11=="427" & DXPOA11=="Y" | DX12=="427" & DXPOA12=="Y" | DX13=="427" & DXPOA13=="Y" | DX14=="427" & DXPOA14=="Y" | DX15=="427" & DXPOA15=="Y" | DX16=="427" & DXPOA16=="Y" | DX17=="427" & DXPOA17=="Y" | DX18=="427" & DXPOA18=="Y" | DX19=="427" & DXPOA19=="Y" | DX20=="427" & DXPOA20=="Y"


                              That's is another way to do it, to look for each disease code and then combining the variables ! This is a reliable method but it is going to take forever.

                              Comment

                              Working...
                              X