  • sequence of variable

    I have a few cases where respondents refused consent midway through the survey. The "refused" dataset contains key, unique id, and a variable name (the point at which the respondent refused to continue the survey). In my survey dataset, for each such unique id, I want to set to missing every variable that comes after the variable named in the "refused" dataset.

    Is there a way to work with the sequence of variables or any other way to do this?

  • #2
    You can -merge- the refused data set with your survey data set, so that the survey data set now contains the variable name (an odd choice for the name of a variable which is actually a date, but, whatever). I presume your survey data set contains a date for each observation. Then you can loop over all of the variables in the survey data set (except, presumably, id) and -replace- the value of the variable with a missing value if the survey date > name.

    If you need more specific advice than this, you need to show example data from both the refused dataset and the survey data set. Be sure to use the -dataex- command if you do that. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
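As a quick illustration of what that looks like in practice, here is a minimal sketch using the auto dataset that ships with Stata. The dataset, the three variables, and the five-observation range are arbitrary choices for the example, not anything taken from this thread.

```stata
* Load a dataset that ships with Stata (a stand-in for your own data)
sysuse auto, clear

* Print a copy-and-paste-ready example: three variables, first five observations
dataex make price mpg in 1/5
```

Copy everything that -dataex- prints in the Results window and paste it into your post as is.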



    • #3
      Refused dataset:

      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str24 unique_id str41 key str19 misvariable
      "TS-20-047-019-013/010343" "uuid:919072ce-0c56-4911-96bd-8ba9a0da31a5" "vk_2016"            
      "TS-43-006-023-001/010285" "uuid:daa50fe7-fadd-4f08-acd7-4ea620a49ebb" "nrega_5"            
      "TS-43-006-023-001/010092" "uuid:458890e4-e1bd-45cc-bd77-ce59b5f35018" "nrega_5"            
      "TS-43-006-010-014/010136" "uuid:96495ac3-0d9b-4eea-a550-5791e447a48f" "wg_who_1"           
      "TS-43-002-003-003/010880" "uuid:1e07b367-8763-49c2-bbde-4aacbc862fd3" "prim_dist_2"        
      "TS-14-008-027-001/030227" "uuid:4aba6842-23cc-47bc-a0dd-402b0a166cb5" "wg_final_1"         
      "TS-20-037-018-015/010859" "uuid:cea0dd84-d751-4800-a8dd-1c6a3668530e" "ref_start_date_list"
      "TS-20-047-017-011/010177" "uuid:5bd5bfa1-5643-489e-9722-a7219cccf994" "prim_3"             
      "TS-20-053-014-009/010088" "uuid:72ac46e6-22ef-46c1-9058-179238b2274b" "prim_5"             
      "TS-20-054-014-015/010652" "uuid:b5b538c5-8d50-4849-8f23-f68be4960e8f" "wg_who_1"           
      "TS-26-012-005-004/010305" "uuid:27809803-429d-418e-ace0-d9aeca4c070b" ""                   
      "TS-35-006-014-018/010126" "uuid:f9491e68-839b-4b42-9749-339ec3a08c17" "sec_code_1"         
      "TS-35-006-014-018/010210" "uuid:9a0d3957-7091-4e71-a5b9-1ff7bf1015a3" "sec_ref_1"          
      "TS-35-007-012-016/010451" "uuid:fc599bf4-cd0a-4660-8df9-600a2f787550" "prim_total_3"       
      "TS-35-015-013-014/010711" "uuid:32d47d1f-5657-4f31-b451-bd4a62085456" "prim_enter_1"       
      "TS-35-017-033-001/040006" "uuid:2f67016f-e8e3-4258-9203-0d661283f8d5" "prim_loc_1"         
      end
      I do not think using a date variable will help. The variable "misvariable" holds the name of a survey variable, and that name differs across unique ids. In my survey dataset, I want to set to missing all variables that come after the named variable.



      • #4
        I'm sorry, but I don't understand this data in isolation. I have trouble imagining what the "key" variable is. I'm guessing that misvariable represents the variable name in the survey data set of some item marking the point where the respondent withdrew consent to the survey? Is that right? I would still need to see an example of the survey data set to help you further, as I have no understanding of how that survey data is organized, and, in particular, I don't know how to operationalize "post this var." Please post back with example survey data.



        • #5
          1. The combination of unique_id and key forms a unique identifier in both the refused data and the survey data.
          2. Yes, misvariable represents the variable name in the survey data set of some item marking the point where the respondent withdrew consent to the survey.
          3. Subset of the survey dataset (the actual dataset has about 500 variables):

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input str24 unique_id byte(b1_speaking_with prim_total_1_2 prim_loc_2 prim_dist_2 prim_dist_km_2) float prim_dist_time_2 byte husb_opinion_2 str41 key
          "TS-14-066-051-001/020215" 1 . . . . . . "uuid:005b5339-fb80-4d93-b9fe-7313afbe00c0"
          "TS-43-006-023-001/010092" 1 . . . . . . "uuid:008b5a61-6d17-485e-b530-8d90acbb456f"
          "TS-35-014-009-021/010373" 1 . . . . . . "uuid:00b55b7d-d3e9-4ec3-995b-c8524d15a2b7"
          end
          label values b1_speaking_with b1_speaking_with
          label def b1_speaking_with 1 "Yes", modify
          label values prim_total_1_2 prim_total_1_2
          label def prim_total_1_2 1 "Yes", modify
          label def prim_total_1_2 2 "No", modify
          label values prim_loc_2 prim_loc_2
          label def prim_loc_2 1 "Currently living GP", modify
          label def prim_loc_2 2 "Other GP", modify
          label values prim_dist_2 prim_dist_2
          label def prim_dist_2 1 "Distance in km", modify
          label def prim_dist_2 2 "Distance in time", modify
          label values prim_dist_km_2 prim_dist_km_2
          label def prim_dist_km_2 1 "Distance in km", modify
          label def prim_dist_km_2 2 "Distance in time", modify
          label values prim_dist_time_2 prim_dist_time_2
          label def prim_dist_time_2 2 "Distance in time", modify
          label values husb_opinion_2 husb_opinion_2

          After merging the refused and survey datasets on key and unique_id, if misvariable has the value "prim_dist_2", then I want the variables that come after it (prim_dist_km_2, prim_dist_time_2, husb_opinion_2) set to missing, for that row only, not for the entire dataset.

          Hope this clarifies.



          • #6
            OK, now I understand it. The example data for the survey set does not have any id's that match the example for the refused data. And most of the values of misvariable in the refused example name variables that do not occur in the survey example. Nevertheless, I tampered with your examples to create some matches so I could test this out.
            Code:
            use survey_data, clear
            ds unique_id key, not
            local survey_items `r(varlist)'
            local order = 1
            foreach s of local survey_items {
                local `s'_order `order'
                local ++order
            }
            
            merge 1:1 unique_id using refused_data, keep(master match) nogenerate keepusing(misvariable)
            gen misvariable_order = .
            levelsof misvariable, local(mvs) clean
            foreach m of local mvs {
                local m_order :list posof "`m'" in survey_items
                replace misvariable_order = `m_order' if misvariable == "`m'"
                replace misvariable_order = . if misvariable_order == 0
            }
            
            foreach v of varlist `survey_items' {
                replace `v' = . if ``v'_order' > misvariable_order & !missing(misvariable_order)
            }
            Note: As you can see, this code relies heavily on local macros. Consequently, do not attempt to run this code in pieces or line-by-line. It will only work correctly if run without interruptions in "one fell swoop."

            Also, because all of the survey variables in your example data are numeric, I have written the code to deal only with numeric variables. This code will not handle string variables in the survey data. If there are any, post back with a new survey example that contains some of those and I'll make modifications accordingly.
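For anyone who wants to try the idea without the real files, here is a minimal self-contained sketch on toy data. It is a reduced variant of the approach above, not the same code: it uses a running counter instead of one local macro per variable, and it handles numeric items only. The names id, x1-x3, stop_at, and stop_pos are hypothetical and exist only for this illustration.

```stata
* Toy data: three numeric items and the name of the item where each respondent stopped
clear
input str4 id byte(x1 x2 x3) str4 stop_at
"a" 1 2 3 "x1"
"b" 1 2 3 "x2"
"c" 1 2 3 ""
end

* Item names in data-set order (everything except the id and the refusal marker)
ds id stop_at, not
local items `r(varlist)'

* Position of the refusal item within the item list; stays missing if not found
gen stop_pos = .
levelsof stop_at, local(stops) clean
foreach s of local stops {
    local pos : list posof "`s'" in items
    if `pos' > 0 {
        replace stop_pos = `pos' if stop_at == "`s'"
    }
}

* Row by row, set to missing every item that comes after the refusal point
local j = 0
foreach v of local items {
    local ++j
    replace `v' = . if `j' > stop_pos & !missing(stop_pos)
}

list, noobs
```

Respondent a keeps only x1, respondent b keeps x1 and x2, and respondent c, who has no refusal point, is left unchanged. As with the code above, run it in one go, because the local macros do not survive between separately executed pieces.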



            • #7
              The data does have string variables as well.

              I am posting a subset of the data with more variables. The string variables come in various widths: str20, str45, str109, str166, ...

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input str24 unique_id int prim_inc_amt_2 str20 prim_wage_calc_2 byte(prim_total_1_2 prim_loc_2 prim_dist_2 prim_dist_km_2) float prim_dist_time_2 byte(husb_opinion_2 wg_who_2) str20 wg_who_oth_2 byte details_2 str41 key
              "TS-14-066-051-001/020215"     . ""                     . . .  .  . . . "" . "uuid:005b5339-fb80-4d93-b9fe-7313afbe00c0"
              "TS-43-006-023-001/010092"     . ""                     . . .  .  . . . "" . "uuid:008b5a61-6d17-485e-b530-8d90acbb456f"
              "TS-35-014-009-021/010373"     . ""                     . . .  .  . . . "" . "uuid:00b55b7d-d3e9-4ec3-995b-c8524d15a2b7"
              "TS-14-023-044-001/010226"   500 "1"                    1 1 .  .  . . . "" . "uuid:00c154ee-1202-4ede-8e8d-5fcaf17773ba"
              
              end
              label values prim_total_1_2 prim_total_1_2
              label def prim_total_1_2 1 "Yes", modify
              label def prim_total_1_2 2 "No", modify
              label values prim_loc_2 prim_loc_2
              label def prim_loc_2 1 "Currently living GP", modify
              label def prim_loc_2 2 "Other GP", modify
              label values prim_dist_2 prim_dist_2
              label def prim_dist_2 1 "Distance in km", modify
              label def prim_dist_2 2 "Distance in time", modify
              label values prim_dist_km_2 prim_dist_km_2
              label def prim_dist_km_2 1 "Distance in km", modify
              label def prim_dist_km_2 2 "Distance in time", modify
              label values prim_dist_time_2 prim_dist_time_2
              label def prim_dist_time_2 2 "Distance in time", modify
              label values husb_opinion_2 husb_opinion_2
              label values details_2 details_2
              label def details_2 4 "No response", modify
              Can you also please add comments explaining the code?

              TIA



              • #8
                OK, the modification required is pretty simple, just inserting a branch in the final loop to treat string and numeric variables differently.

                I've added comments to explain the overall purpose of each section of code. But if you are unfamiliar with some of the commands themselves, you should consult the help file or the PDF documentation that is part of your Stata installation for information about them.

                Code:
                use survey_data, clear
                ds unique_id key, not
                local survey_items `r(varlist)' // LIST OF SURVEY VARIABLE NAMES IN DATA-SET ORDER
                local order = 1
                foreach s of local survey_items { // SAVE NUMERICAL ORDER OF EACH VARIABLE IN A LOCAL MACRO
                    local `s'_order `order'
                    local ++order
                }
                
                merge 1:1 unique_id using refused_data, keep(master match) /// BRING IN THE misvariable VARIABLE
                    nogenerate keepusing(misvariable)
                    
                //    IN EACH OBSERVATION, CALCULATE THE ORDER OF THE misvariable VALUE IN THE SURVEY DATA
                gen misvariable_order = .
                levelsof misvariable, local(mvs) clean
                foreach m of local mvs {
                    local m_order :list posof "`m'" in survey_items // FIND MISVARIABLE IN LIST OF SURVEY VARIABLES
                    replace misvariable_order = `m_order' if misvariable == "`m'"
                    replace misvariable_order = . if misvariable_order == 0 //     IF MISVARIABLE VALUE NOT FOUND, MARK AS MISSING
                }
                
                foreach v of varlist `survey_items' {    //    LOOP OVER ALL THE SURVEY ITEMS
                    capture confirm numeric var `v', exact    // FIND OUT IF THE VARIABLE IS NUMERIC OR STRING
                    if c(rc) == 0 {    // NUMERIC
                        replace `v' = . if ``v'_order' > misvariable_order ///
                            & !missing(misvariable_order)    //    REPLACE AS MISSING IF VARIABLE COMES AFTER MISVARIABLE
                    }
                    else {    //    STRING
                        replace `v' = "" if ``v'_order' > misvariable_order ///
                            & !missing(misvariable_order)  //    REPLACE AS MISSING IF VARIABLE COMES AFTER MISVARIABLE
                    }
                }
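One thing to watch: when a misvariable value does not name any variable in the survey data, the code above sets misvariable_order to missing and silently leaves that observation untouched. A minimal sketch of a check for such cases follows, on toy data so it can be run on its own. The variables q1 and q2, the value "q9", and the flag unmatched are hypothetical names for this illustration only.

```stata
* Toy data: respondent b's refusal point names a variable that does not exist
clear
input str4 unique_id byte(q1 q2) str8 misvariable
"a" 1 2 "q1"
"b" 1 2 "q9"
"c" 1 2 ""
end

* Survey item names in data-set order
ds unique_id misvariable, not
local survey_items `r(varlist)'

* Flag refusal points that do not match any survey item
gen byte unmatched = 0
levelsof misvariable, local(mvs) clean
foreach m of local mvs {
    local pos : list posof "`m'" in survey_items
    if `pos' == 0 {
        replace unmatched = 1 if misvariable == "`m'"
    }
}

* Show the affected observations and count them
list unique_id misvariable if unmatched, noobs
count if unmatched
```

In the real data, the same check can be done after the merge by listing observations where misvariable is non-empty but misvariable_order is missing.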



                • #9
                  Thank you, Clyde!

