Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Splitting variables and filling options

    I have exported data from a survey software which has resulted in data being stored in a strange manner.particularly for multiple choice questions, instead of creating one variable for each option, it has stored all options in one variables separated by "space". Here's an example

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str4 c26_1_1 str1(c26_1_1_1 c26_1_2_1 c26_1_3_1 c26_1_4_1 c26_1_5_1 c26_1_6_1 c26_1_99_1)
    ""     ""  ""  ""  ""  ""  ""  "" 
    "4"    ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    "2"    ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    "1"    ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    "1 99" ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    "4"    "0" "0" "0" "1" "0" "0" "0"
    ""     ""  ""  ""  ""  ""  ""  "" 
    "4"    "0" "0" "0" "1" "0" "0" "0"
    ""     ""  ""  ""  ""  ""  ""  "" 
    ""     ""  ""  ""  ""  ""  ""  "" 
    end
    Here c26_1_1 is, for example, "main sources of irrigation", and the subsequent columns correspond to each of the options.- they should be 1, if the option was selected in c26_1_1 and 0 otherwise. For a subset of the sample, the columns have been filled as they should-- (see last few rows of example), for the others the c26_1_1 option is stored as, for example, "1 4 99" when option 1,4 and 99 were selected, and subsequently c26_1_1_1, c26_1_1_4 and c26_1_1_99 are blank when they should be 1. similarly, if only option 4 was selected, c26_1_1 should be 4 and c26_1_1_4 should be 1.

    Right now I am splitting the variables c26_1_1 and storing it into a temp variable and then using those temp variables to fill in the options (1/0)- but I want to to find a more efficient way to do this, since I have many such multiple choice variables in my dataset that have been stored in a weird manner.

    Alternatively, if there is a way to split the variables and create new columns for each of the options that is binary if the option is selected, that would be great as well. That is, instead of pre-specifying c_26_1_x where x=1,2,3,4,99, if c26_1_1 could be split and automatically create a new column for each of the options..

    any suggestions?






  • #2
    Before you attempt to fix a botched import, maybe better to figure out what went wrong with importing the data. Good chance that will be less laborious.
    Please tell us what file format you imported form and what command you used.

    Comment


    • #3
      I imported from excel. Unfortunately this is how the data appears in excel as well- the survey instrument was changed mid-way so the first few weeks of data has come in this manner. Its an issue with how the survey was coded fundamentally and not an issue with importing.

      Comment


      • #4
        Hello Fatima,

        I think Jorrit was trying to find out if the file was .csv, xls, or xlsx. Please specify the file format and do provide the code that you used to import as this could give us insight on what may have gone wrong.

        Best wishes,
        Patrick

        Comment


        • #5
          I am importing from a CSV file. Here's the code

          Code:
          clear all
          set more off
          set mem 100m
          
          * initialize form-specific parameters
          local csvfile "C:/MAIN_WIDE.csv"
          local dtafile "C:/MAIN.dta"
          local text_fields2 "c4_1_dec_cvt_* c4_1_hect_cvt_* c4_1_acres_* c10_1_other_* c12_1_other_* c26_1_* c26_1_other_* c41_* c41_other_* c38_1_* c38_1_other_* rb_1_* rb_repeat_count_* rb_name_* total_area inputs inputs_count"
          local date_fields1 "today"
          local datetime_fields1 "submissiondate starttime endtime"
          
          disp
          disp "Starting import of: `csvfile'"
          disp
          
          * import data from primary .csv file
          insheet using "`csvfile'", names clear
          
          * drop extra table-list columns
          cap drop reserved_name_for_field_*
          cap drop generated_table_list_lab*
          
          * continue only if there's at least one row of data to import
          if _N>0 {
              * drop note fields (since they don't contain any real data)
              forvalues i = 1/100 {
                  if "`note_fields`i''" ~= "" {
                      drop `note_fields`i''
                  }
              }
              
              * format date and date/time fields
              forvalues i = 1/100 {
                  if "`datetime_fields`i''" ~= "" {
                      foreach dtvarlist in `datetime_fields`i'' {
                          cap unab dtvarlist : `dtvarlist'
                          if _rc==0 {
                              foreach dtvar in `dtvarlist' {
                                  tempvar tempdtvar
                                  rename `dtvar' `tempdtvar'
                                  gen double `dtvar'=.
                                  cap replace `dtvar'=clock(`tempdtvar',"DMYhms",2025)
                                  * automatically try without seconds, just in case
                                  cap replace `dtvar'=clock(`tempdtvar',"DMYhm",2025) if `dtvar'==. & `tempdtvar'~=""
                                  format %tc `dtvar'
                                  drop `tempdtvar'
                              }
                          }
                      }
                  }
                  if "`date_fields`i''" ~= "" {
                      foreach dtvarlist in `date_fields`i'' {
                          cap unab dtvarlist : `dtvarlist'
                          if _rc==0 {
                              foreach dtvar in `dtvarlist' {
                                  tempvar tempdtvar
                                  rename `dtvar' `tempdtvar'
                                  gen double `dtvar'=.
                                  cap replace `dtvar'=date(`tempdtvar',"DMY",2025)
                                  format %td `dtvar'
                                  drop `tempdtvar'
                              }
                          }
                      }
                  }
              }
          
              * ensure that text fields are always imported as strings (with "" for missing values)
              * (note that we treat "calculate" fields as text; you can destring later if you wish)
              tempvar ismissingvar
              quietly: gen `ismissingvar'=.
              forvalues i = 1/100 {
                  if "`text_fields`i''" ~= "" {
                      foreach svarlist in `text_fields`i'' {
                          cap unab svarlist : `svarlist'
                          if _rc==0 {
                              foreach stringvar in `svarlist' {
                                  quietly: replace `ismissingvar'=.
                                  quietly: cap replace `ismissingvar'=1 if `stringvar'==.
                                  cap tostring `stringvar', format(%100.0g) replace
                                  cap replace `stringvar'="" if `ismissingvar'==1
                              }
                          }
                      }
                  }
              }
              quietly: drop `ismissingvar'

          Comment

          Working...
          X