Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Importing excel sheet dropdown choices and assign value labels

    Hello all:

    I have an excel sheet for a collection form that has dropdowns which accesses a separate sheet with all the dropdown choices for the variable using these dropdowns. I am trying to find the easiest way to write the dropdown choices within excel so I could assign numeric values to my value labels in Stata and define a label to use. See below for two variables each with their dropdown choices. I have a specific numeric assignment which I want for each of the value labels in line1ind for instance.

    Is there a better prespecified format I could use in excel to list values next to each label so when I import this sheet into Stata, I can define the label more easily for all variables that need to be encoded without doing it label by label (0 "7+3-based" for instance). I also don't want to make it unreadable for the end user filling the data in excel using the dropdown choices.

    See below the imported excel data and my final Stata label define that is quite manual.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str20 line1ind str19 line1resp
    "7+3-based"            "CR/CRi"             
    "HMA"                  "Partial Response"   
    "HMA-BCL2i"            "SD/Non-response"    
    "HiDAC-based"          "Progressive Disease"
    "CPX-351/Vyxeos"       "No data"            
    "Other, Intensive"     ""                   
    "Other, Low-intensive" ""                   
    "Supportive care only" ""                   
    "DLI"                  ""                   
    "AlloSCT"              ""                   
    "Untreated"            ""                   
    "Unknown"              ""                   
    end

    Code:
    lab define linelbl                 ///
        0    "7+3-based"                ///
        1    "Other, Intensive"         ///
        2    "HiDAC-based"              ///
        3    "HMA"                    ///
        4    "HMA-BCL2i"             ///
        5    "CPX-351/Vyxeos"         ///
        6    "Other, Low-intensive"     ///
        7    "AlloSCT"                 ///
        8    "DLI"                      ///
        9    "Supportive care only"     ///
        10    "Untreated"             ///
        11    "Unknown"

  • #2
    Hi Girish,

    in Stata there are many ways NOT to have to do something by hand ;-)
    If I got you right you want to automatically define value labels from you inputted excel file. If I may assume that the numerical values shall be in ascending order, then we can write this as a loop that imports all lines for each variable and defines values labels.
    Code:
    forvalues i = 1/`c(N)' {
        foreach var of varlist _all {
            local `var'_number `=`i'-1'
            local `var'_label = `var'[`i']
            if !missing("``var'_label'") label define vl_`var' ``var'_number' "``var'_label'", modify
        }
    }
    label dir
    If you want to start the numbers by 1 instead, just use
    Code:
            local `var'_number `i'
    instead.

    Best,
    Benno

    Comment


    • #3
      Originally posted by Benno Schoenberger View Post
      Hi Girish,

      in Stata there are many ways NOT to have to do something by hand ;-)
      If I got you right you want to automatically define value labels from you inputted excel file. If I may assume that the numerical values shall be in ascending order, then we can write this as a loop that imports all lines for each variable and defines values labels.
      Code:
      forvalues i = 1/`c(N)' {
      foreach var of varlist _all {
      local `var'_number `=`i'-1'
      local `var'_label = `var'[`i']
      if !missing("``var'_label'") label define vl_`var' ``var'_number' "``var'_label'", modify
      }
      }
      label dir
      If you want to start the numbers by 1 instead, just use
      Code:
      local `var'_number `i'
      instead.

      Best,
      Benno
      Thanks much Benno Schoenberger. That worked well. I just did a little finagling to account for the varying `c(N)' for each column with a little preserve/restore. I will reorder the rows for each label in excel now so I can match up my referent level for each label appropriately for factor variables. Five labels and a modified version of your code below in case someone wants to use it.


      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str20 line1ind str25 deathtype str5 vital str20 priormdstx str15 ecog
      "7+3-based"            "Dead-Transplant-related"   "Alive" "Yes, HMA-based"       "0: Super fit"   
      "HMA"                  "Dead-Leukemia-related"     "Dead"  "Yes, Other"           "1: Fit/Walks"   
      "HMA-BCL2i"            "Dead-Non TRM/Non-Leukemia" ""      "Yes, Supportive only" "2: Not fit"     
      "HiDAC-based"          "Dead-Unknown cause"        ""      "No therapy"           "3: Unfit"       
      "CPX-351/Vyxeos"       "Alive-Censored"            ""      "Not applicable"       "4: Really Unfit"
      "Other, Intensive"     ""                          ""      ""                     ""               
      "Other, Low-intensive" ""                          ""      ""                     ""               
      "Supportive care only" ""                          ""      ""                     ""               
      "DLI"                  ""                          ""      ""                     ""               
      "AlloSCT"              ""                          ""      ""                     ""               
      "Untreated"            ""                          ""      ""                     ""               
      "Unknown"              ""                          ""      ""                     ""               
      end


      Code:
      ds line1ind deathtype vital priormdstx ecog
      local lablist `r(varlist)'
          foreach var of varlist `lablist'{
              preserve
                  drop if missing(`var')
                  local `var'_tot = `c(N)'
              restore
                  forvalues i = 1/``var'_tot' {
                      local `var'_number `=`i'-1' 
                      local `var'_vallab `=`var'[`i']'
                      label define vl_`var' ``var'_number' "``var'_vallab'", modify
          }
      
      }
      label dir

      Comment


      • #4
        If your code works, then everything is fine. However, I'm not sure whether it actually makes that much of a difference whether you drop the empty observations beforehand or let the loops run over all the empty cells anyway.
        Last edited by Benno Schoenberger; 24 Apr 2024, 06:50.

        Comment


        • #5
          Originally posted by Benno Schoenberger View Post
          If your code works, then everything is fine. However, I'm not sure whether it actually makes that much of a difference whether you drop the empty observations beforehand or let the loops run over all the empty cells anyway.
          The `c(N)' from the first variable with 12 levels ended up getting used on the second variable with 7 unmapped values, hence I had to make sure the `c(N)' was appropriate for the number of values for each label. Thanks again.

          Comment


          • #6
            Here is a (looong) one-liner in Mata to do what you want:

            Code:
            mata : for (k=st_nvar(); k; k--) st_vlmodify(st_varname(k),(0::st_nobs()-1),st_sdata(.,st_varname(k)))

            Comment


            • #7
              Originally posted by Girish Venkataraman View Post

              The `c(N)' from the first variable with 12 levels ended up getting used on the second variable with 7 unmapped values, hence I had to make sure the `c(N)' was appropriate for the number of values for each label. Thanks again.
              Now that I've reapplied my original code to your new dataset, I've come to the conclusion that it works exactly as I wrote it. As long as all variable labels from your dropdowns are in a single Excel file and are also imported together, the missinges will be correctly ignored (corresponding code marked in red)!
              Code:
              forvalues i = 1/`c(N)' {
                  foreach var of varlist _all {
                      local `var'_number `=`i'-1'
                      local `var'_label = `var'[`i']
                      if !missing("``var'_label'") label define vl_`var' ``var'_number' "``var'_label'", modify
                  }
              }
              label list
              Maybe yours is faster in case you are inputting some dropdowns with a huge amount of labels, but mine is working as well ;-)
              But neither mine nor yours is as elegant as Daniels...

              Comment


              • #8
                Originally posted by daniel klein View Post
                Here is a (looong) one-liner in Mata to do what you want:

                Code:
                mata : for (k=st_nvar(); k; k--) st_vlmodify(st_varname(k),(0::st_nobs()-1),st_sdata(.,st_varname(k)))
                This will be an inspiration for me start learning and finding use cases for implementing it. With reluctance, I embarked on programs in the past year and love that I invested the time. Perhaps it will be Mata this year...

                Comment


                • #9
                  Originally posted by Benno Schoenberger View Post

                  Now that I've reapplied my original code to your new dataset, I've come to the conclusion that it works exactly as I wrote it. As long as all variable labels from your dropdowns are in a single Excel file and are also imported together, the missinges will be correctly ignored (corresponding code marked in red)!
                  Code:
                  forvalues i = 1/`c(N)' {
                  foreach var of varlist _all {
                  local `var'_number `=`i'-1'
                  local `var'_label = `var'[`i']
                   if !missing("``var'_label'") label define vl_`var' ``var'_number' "``var'_label'", modify
                  }
                  }
                  label list
                  Maybe yours is faster in case you are inputting some dropdowns with a huge amount of labels, but mine is working as well ;-)
                  But neither mine nor yours is as elegant as Daniels...
                  Good see where you put the !missing clause. I was putting it somewhere else and the code would not run. Hence I started modifying things.

                  Comment

                  Working...
                  X