
  • wide to long I think?

    I am attempting to create a data set that collapses a large data set that has multiple entries for each person (identified by FellowID), where the unique observations are identified by ProcID. My ultimate goal is to analyze this data set looking at the volume of entries for each Fellow and assessing predictors that changed the volume (ie year AdminState etc), so I need to create a single entry for each FellowID that includes the year when the volumes happened but an individual variable that summarizes the total volume per ProcDate. I tried rangestat (count) ProcID, by(FellowID) but I don't really have an interval and am not sure whether this will create the volume indicator for me.


    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int ProgramID str32 ProgramType str2 AdminState int(FellowID YEARBEGIN YEAREND ProcDate) str7 Role str13 Type str43 category str80 subcategory str101 subcat2 long ProcID float(region COVID)
    1020 "Bariatrics/Foregut" "PA" 1951 2016 2017 21001 "Primary" "Laparoscopic" "HPB - Biliary" "Gallbladder" "Cholecystectomy without cholangiogram" 864902 0 0
    1230 "Bariatrics" "TX" 2064 2016 2017 21001 "Primary" "Laparoscopic" "Bariatric" "Sleeve gastrectomy" "n/a" 864913 2 0
    1230 "Bariatrics" "TX" 2064 2016 2017 21001 "Primary" "Laparoscopic" "Bariatric" "Sleeve gastrectomy" "n/a" 864914 2 0
    1230 "Bariatrics" "TX" 2064 2016 2017 21001 "Primary" "Laparoscopic" "Bariatric" "Bariatric procedure, revisional" "Lap-Band Removal w/o Reconstruction" 864915 2 0
    1230 "Bariatrics" "TX" 2064 2016 2017 21001 "Primary" "Endoscopic" "GI Endoscopy" "Endoscopy, upper, diagnostic" "Intraoperative" 864916 2 0
    1115 "Advanced GI MIS" "OH" 2012 2016 2017 21003 "Primary" "Laparoscopic" "Abdominal Wall" "Hernia, inguinal" "Repair with mesh" 865334 1 0
    1115 "Advanced GI MIS" "OH" 2012 2016 2017 21003 "Teach" "Laparoscopic" "Jejunum/Ileum" "Small bowel feeding tube" "n/a" 865335 1 0
    1115 "Advanced GI MIS" "OH" 2012 2016 2017 21003 "Teach" "Laparoscopic" "Jejunum/Ileum" "Small bowel feeding tube" "n/a" 865338 1 0
    1115 "Advanced GI MIS" "OH" 2012 2016 2017 21003 "Teach" "Laparoscopic" "GI Endoscopy" "Endoscopy, upper, therapeutic" "With dilatation of stricture"
    end


  • #2
    Welcome to the forum. It sounds like you are trying to reshape long to wide, since you want a single row for each FellowID. Actually, you almost certainly want to keep your data in long format, since many operations on this kind of data are easier with long data. That said, it's not entirely clear what you're trying to do. I'm having trouble parsing this (run-on) sentence:

    My ultimate goal is to analyze this data set looking at the volume of entries for each Fellow and assessing predictors that changed the volume (ie year AdminState etc), so I need to create a single entry for each FellowID that includes the year when the volumes happened but an individual variable that summarizes the total volume per ProcDate.
    Could you please say a little bit more about how your data is organized? For example, there are multiple entries of FellowID, right? Usually this happens when some other variable is clustered within FellowID, like years.

    Second, could you please say a little more about what you are trying to do? It sounds like you are trying to make a new column that provides some extra information about different FellowIDs, but I have no idea what "total volume" means in this context.



    • #3
      Thanks for the post. I think you are correct that I want to keep the data long. Let me clarify my goals, as I still have some outstanding questions. My goal was to create a volume variable (how many procedures each FellowID had performed), which I was ultimately successful in doing using the code below:

      // gen number of cases variable for each Fellow
      levelsof FellowID          // display each individual fellow
      gen casevolume = .         // create case volume counter
      quietly {
          foreach lev in `r(levels)' {
              di `lev'
              di _n _n as txt "FellowID = " as result `lev'
              count if FellowID == `lev'
              replace casevolume = `r(N)' if FellowID == `lev'
          }
      }

      Each FellowID has multiple ProcID to account for each different procedure that same fellow performed.

      Now in my long data set each entry has a volume variable (it is the same value for every row of a given FellowID). My ultimate goals are as follows:
      1. evaluate predictors of increased or decreased case volume looking at region, time, COVID
      2. Did the types of cases (var category) change following COVID?

      Is it better to keep the data set long in this situation and simply use code like this to evaluate goal 1? (Here aboveavg is a binary variable indicating fellows with above- or below-average case counts, relative to the mean of the data set.)

      Code:
      regress casevolume COVID
      regress casevolume i.region
      regress casevolume i.YEARBEGIN i.region
      logistic aboveavg COVID i.region
      regress casevolume COVID i.region

      I believe that for goal 2 above I need to keep the data long, with one row per procedure within each FellowID. Thoughts?
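      For goal 1, one possible sketch (an editorial suggestion, not from the thread, so treat the specifics as assumptions): since casevolume repeats within FellowID in the long layout, the fellow-level regressions could be run on a collapsed copy so that each fellow contributes exactly one observation. This assumes region, COVID, and YEARBEGIN are constant within each fellow.

      Code:
      * Sketch: fellow-level analysis for goal 1 (one row per FellowID).
      preserve
      collapse (first) casevolume region COVID YEARBEGIN, by(FellowID)
      summarize casevolume, meanonly
      gen aboveavg = casevolume > r(mean)   // above-average case volume
      regress casevolume i.COVID i.region
      logistic aboveavg i.COVID i.region
      restore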



      • #4
        Originally posted by Yewande Alimi

        // gen number of cases variable for each Fellow
        levelsof FellowID          // display each individual fellow
        gen casevolume = .         // create case volume counter
        quietly {
            foreach lev in `r(levels)' {
                di `lev'
                di _n _n as txt "FellowID = " as result `lev'
                count if FellowID == `lev'
                replace casevolume = `r(N)' if FellowID == `lev'
            }
        }
        This is not a standard way to deal with these kinds of problems. You want to use the -bysort- prefix, and it is a one-liner.

        Code:
        * Example generated by -dataex-. To install: ssc install dataex
        clear
        input int ProgramID str32 ProgramType str2 AdminState int(FellowID YEARBEGIN YEAREND ProcDate) str7 Role str13 Type str43 category str80 subcategory str101 subcat2 long ProcID float(region COVID)
        1020 "Bariatrics/Foregut" "PA" 1951 2016 2017 21001 "Primary" "Laparoscopic" "HPB - Biliary"  "Gallbladder"                     "Cholecystectomy without cholangiogram" 864902 0 0
        1230 "Bariatrics"         "TX" 2064 2016 2017 21001 "Primary" "Laparoscopic" "Bariatric"      "Sleeve gastrectomy"              "n/a"                                   864913 2 0
        1230 "Bariatrics"         "TX" 2064 2016 2017 21001 "Primary" "Laparoscopic" "Bariatric"      "Sleeve gastrectomy"              "n/a"                                   864914 2 0
        1230 "Bariatrics"         "TX" 2064 2016 2017 21001 "Primary" "Laparoscopic" "Bariatric"      "Bariatric procedure, revisional" "Lap-Band Removal w/o Reconstruction"   864915 2 0
        1230 "Bariatrics"         "TX" 2064 2016 2017 21001 "Primary" "Endoscopic"   "GI Endoscopy"   "Endoscopy, upper, diagnostic"    "Intraoperative"                        864916 2 0
        1115 "Advanced GI MIS"    "OH" 2012 2016 2017 21003 "Primary" "Laparoscopic" "Abdominal Wall" "Hernia, inguinal"                "Repair with mesh"                      865334 1 0
        1115 "Advanced GI MIS"    "OH" 2012 2016 2017 21003 "Teach"   "Laparoscopic" "Jejunum/Ileum"  "Small bowel feeding tube"        "n/a"                                   865335 1 0
        1115 "Advanced GI MIS"    "OH" 2012 2016 2017 21003 "Teach"   "Laparoscopic" "Jejunum/Ileum"  "Small bowel feeding tube"        "n/a"                                   865338 1 0
        end
        
        bysort FellowID: gen wanted=_N
        See

        Code:
        help bysort
        and

        Code:
        help _variables
        You have multilevel data (repeated observations of the same individuals). If you are using linear regression, at least cluster the errors at the individual level as these observations are not independent. But yes, you will want to keep a long data layout for analysis.
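        As a sketch of that suggestion (the model specification here is an assumption, not from the thread), clustering on FellowID with the per-fellow count from above might look like:

        Code:
        * Rows are procedures, so observations within a fellow are not
        * independent: cluster the standard errors on FellowID.
        bysort FellowID: gen wanted = _N
        regress wanted i.COVID i.region, vce(cluster FellowID)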



        • #5
          Thank you that was very helpful!

