  • Help with a loop command for string variables.

    Hi

    I am coding survey responses for analysis. I have one group of columns that I need to encode and then merge into one column. Respondents could indicate a country as a destination, and the survey stores each country as its own column, Q37_13 (Algeria) through Q37_207 (Zimbabwe). I need a single destination variable. Right now, I have 194 columns of string variables, one per country.
    I uploaded a screenshot for reference.

    I tried writing a loop command to either destring or encode, but I get the r(198) error each time.

    Here are various attempts:

    foreach var of Q37_* {
    encode 'var', gen(N_var)
    }
    invalid syntax
    r(198);


    foreach var of varlist Q37_* {
    destring 'var', force
    }
    ' invalid name
    r(198);

    . foreach var of Q37_13 - Q37_207 {
    2. encode 'var', gen('var'N)
    3. }
    invalid syntax
    r(198);


    Once I decode, I will have to create one variable out of the 194 columns, so if there is a different approach (like combining them first?), I am open to that too.

    Thank you for your help!

    Natalie

    [Screenshot attached: Stata Example.PNG]

  • #2
    Produce a minimal example (a couple of observations, a couple of countries), post it here using -dataex-, and explain what you want the result to look like for this example.

    • #3
      Joro's request is right on target, as a) screenshots work badly on StataList; and b) an example is helpful for many reasons. Having said that, I have some questions. First, from your just barely readable screenshot, I see that some of your Q37* are strings and some are bytes. That's a bit unusual for variables that are supposed to code the same kinds of responses (?), so can you explain: If all the Q37* are supposed to be country codes, why are some strings and some numeric? I'd say that knowing *why* the data is strange in this respect might be necessary to solving the problem.

      • #4
        P. S. : Note, by the way, that -encode- only applies to string variables, and -decode- only to numeric, so looping over the whole list with either of these commands will fail.
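
        The point about -encode- applying only to string variables can be sketched as a loop that skips everything else. This is a minimal illustration on made-up toy data (the columns Q37_14 and Q37_15 and their contents are hypothetical, not taken from the poster's file). It also uses the two pieces of syntax the attempts in #1 were missing: -foreach- needs "of varlist" before a variable list, and a local macro is referenced as `v', with a left backtick and a right apostrophe, not two apostrophes.

```stata
* Toy data (hypothetical): two string columns and one all-missing byte column
clear
input str12 Q37_13 byte Q37_14 str12 Q37_15
"Algeria" . ""
""        . "Angola"
end

* Loop over the variables, encoding only those that are strings
foreach v of varlist Q37_* {
    * -confirm string variable- returns a nonzero code for numeric variables
    capture confirm string variable `v'
    if _rc == 0 {
        encode `v', generate(N_`v')
    }
}

describe
```

        A range of adjacent variables can be written the same way, for example "foreach v of varlist Q37_13-Q37_207".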

        • #5
          Thank you Mike and Joro,
          I created a separate dataset to better explain, which has just the variables in question. I have country responses per column. Most are string. Some are byte, I think because no person chose that country (there are no observations), so Stata already stored them as byte. Those I can delete for the purpose of this analysis. I can do that manually if need be, unless there is a shortcut for that?

          The data set attached hopefully explains my problem. I also created a table below to quickly visualize. Rows are responses. So any time a respondent chose Afghanistan, that is coded as a string in a column for Afghanistan. Same for every country. A respondent could pick one country.

          In the end, I would like to have one column where each country has a numerical value attached to it, so that I can also code for regions (EU vs. Africa, etc.). I included this as Destination in the attached data set and try to visualize it below.

          Is the problem the fact that some countries do not have observations/ are byte?

          Your help is much appreciated
          Coded as string | Coded as string | Byte right now / no observations | Coded long
          Afghanistan     |                 |                                  | Afghanistan
                          | Belgium         |                                  | Belgium
                          |                 | .                                |
          The byte variable can be dropped (can do manually if need be).


          • #6
            Natalie, most of us here won't open files *attached* to postings on the list for reasons of computer safety. That's why the strong norm here is to use the convenient -dataex- command to show example data, as described in the StataList FAQ. So, unfortunately, I'm not going to use your example data to test my solution. However, now that you have explained the string/byte stuff, I can offer something for you to try, which depends on all the "byte" variables being of no interest, and on there only being one nonmissing destination variable on each observation.

            Code:
            // get a list of the string variables
            ds Q37*, has(type string)
            local vlist `r(varlist)'
            gen dstring = ""
            // keep the last non-empty string variable found
            foreach v of varlist `vlist' {
               replace dstring = `v' if (strtrim(`v') != "")
            }
            // make this a numeric variable with string contents as labels
            encode dstring, gen(destination)
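
            The suggestion above depends on there being only one nonmissing destination variable on each observation. A minimal sketch of how that assumption could be checked before combining the columns is given here. The toy data and the helper variable nfilled are made up for illustration and are not from the poster's file.

```stata
* Toy data (hypothetical): two string destination columns
clear
input str12 Q37_13 str12 Q37_15
"Algeria" ""
""        "Angola"
""        ""
end

* Count, per observation, how many string Q37 variables are non-empty
ds Q37*, has(type string)
local vlist `r(varlist)'
generate byte nfilled = 0
foreach v of varlist `vlist' {
    replace nfilled = nfilled + (strtrim(`v') != "")
}

* Stops with an error if any respondent has more than one destination filled in
assert nfilled <= 1
tabulate nfilled
```

            If the -assert- fails, "list if nfilled > 1" would show the observations that need a closer look.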

            • #7
              My proposal is

              Code:
              ds Q37*, has(type byte)
              
              drop `r(varlist)'
              
              egen str country = rowfirst(Q37*)
              Then, if you want to encode it, encode it as usual:

              Code:
              encode country, gen(numcountry)
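
              For the region coding mentioned in #5 (EU vs. Africa, etc.), one possible follow-up is sketched here. It assumes a combined string variable named country as created above. The toy data, the variable region, the label name regionlbl, and the numeric codes are all hypothetical choices made for illustration.

```stata
* Toy data (hypothetical) standing in for the combined string variable
clear
input str12 country
"Algeria"
"Belgium"
"Zimbabwe"
"Afghanistan"
end

* -encode- numbers the distinct strings in alphabetical order
encode country, generate(numcountry)

* Illustrative region coding: list the countries belonging to each region
generate byte region = .
replace region = 1 if inlist(country, "Belgium")
replace region = 2 if inlist(country, "Algeria", "Zimbabwe")
label define regionlbl 1 "EU" 2 "Africa"
label values region regionlbl

* Countries not yet assigned to a region show up as missing
tabulate region, missing
```

              Note that inlist() accepts only a limited number of string arguments, so a long list of countries has to be split over several -replace- lines.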

              • #8
                Thank you so much, everyone. Joro's code worked brilliantly.
