  • split a string variable

    Hello, I've got a multiple-response variable from a survey:
    1 "Miradas lascivas (degeneradas)"
    2 "Silbidos y otros sonidos (besos, jadeos, bocinazos)"
    3 "Acoso verbal (aluciones al cuerpo y de tipo sexual)"
    4 "Arcamiento intimidante (tocar cintura, hablar al oido,etc)"
    5 "Agarrones (de senos, vulva, trasero, pene, besos a la fuerza)"
    6 "Sentimiento de presion"
    7 "Persecución (a pie o en medio de transporte)"
    8 "Exhibicionismo"
    9 "Violación"
    10 "Nunca he sido acosada/o"
    11 "Otro"
    Respondents can choose more than one option, for example
    1, 2 and 3; or just 1; or 1, 2, 5, 6, 7; or 1 and 11; and so on.
    My data are in Excel, and I want to split every response so that I can create a frequency chart showing how many respondents answered 1, how many answered 2, and so on.
    I hope you can guide me with this.
    Kind regards

  • #2
    Your data is in Excel. You will, at some point, need to bring it into Stata, and there are various ways you might do that. I can think of at least 6 different ways your Stata data might be, all consistent with your explanation. Each would require a different solution. So first, get your data imported to Stata. Then show an example, using the -dataex- command. If you are running version 15.1 or a fully updated version 14.2, it is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.
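
    As a rough sketch of that workflow (not tested against the poster's file): the workbook name below is hypothetical, and the -firstrow- option assumes the first row of the sheet holds the variable names.

    ```stata
    * Hypothetical file name; substitute the actual workbook
    import excel using "encuesta.xlsx", firstrow clear

    * Post the resulting output for the first few observations
    dataex in 1/5
    ```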


    • #3
      Thank you very much for your answer, Clyde.
      I have already brought my data into Stata. My variable is named Eformas_acoso and, for example, the first observation reads as follows (it is a string variable):
      Miradas lascivas (degeneradas), Silvidos y otros sonidos (besos, jadeos, bocinazos), Acoso suave ("halagos"), Acoso agresivo (alusiones al cuerpo y acto sexual), Acercamiento intimidante (tocar cintura, hablar al oido,etc), "Agarrones" (de senos, vulva, trasero, pene, besos a la fuerza), "Sentimientos de presion"( Presión de genitales sobre tu cuerpo), Persecución (a pie o en medio de transporte)
      So I need to split each observation to get one variable for each possible response, and then find the frequency of each response.
      Regards


      • #4
        Also, using -dataex- I got this:
        input str406 Eformas_acoso
        data width (410 chars) exceeds max linesize. Try specifying fewer variables


        • #5
          OK, I can work with the description in this case. The major obstacle is splitting up this variable into the individual responses. As the responses are separated by commas (,), this would be straightforward, except that the responses also contain internal commas. So first we have to remove the internal commas--which is possible because we know the words that precede them. After that, it's a matter of -reshape long- to get all the responses into a single "vertical" variable and tabulate the response frequencies.

          Code:
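          * words that are followed by a comma inside a response
          local precomma besos jadeos cintura senos vulva trasero pene

          * drop those internal commas so that only the separating commas remain
          foreach p of local precomma {
              replace Eformas_acoso = subinstr(Eformas_acoso, "`p',", "`p'", .)
          }
          * one variable per response, then one row per response
          split Eformas_acoso, gen(resp) parse(",")
          gen long obs_no = _n
          reshape long resp, i(obs_no) j(_j)
          tab resp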
          Note: This code substantially changes the original data in ways that may prove cumbersome for other things you need to do. So you might want to precede this with -preserve- and then -restore- at the end.

          Code is not tested, so there may be typos.
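
          A minimal sketch of the -preserve-/-restore- wrapper mentioned in the note, run from a do-file on a made-up two-row dataset (the rows and the name resp_no are illustrative assumptions, not the poster's data). It only shows that the reshaped layout is temporary:

          ```stata
          clear
          input str40 Eformas_acoso
          "Exhibicionismo, Otro"
          "Otro"
          end

          preserve                                   // snapshot the one-row-per-respondent data
              split Eformas_acoso, gen(resp) parse(",")
              gen long obs_no = _n
              reshape long resp, i(obs_no) j(resp_no)
              count if !missing(resp)                // number of responses in the long layout
          restore                                    // back to the original layout

          describe                                   // the helper variables are gone again
          ```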


          • #6
            Thank you very much Clyde, it worked perfectly.
            However, my tabulation gives me the following:

            Code:
            . tab resp

                                               resp |      Freq.     Percent        Cum.
            ----------------------------------------+-----------------------------------
             "Agarrones" (de senos vulva trasero .. |        289       10.89       10.89
               "Punteos"Presión de genitales sobr.. |        225        8.48       19.37
             Acercamiento intimidante (tocar cint.. |        314       11.83       31.20
             Acoso agresivo (alusiones al cuerpo .. |          7        0.26       31.46
                            Acoso suave ("halagos") |          8        0.30       31.76
               Acoso verbal aluciones al cuerpo y.. |        368       13.87       45.63
                      Exhibicionismo o masturbación |        156        5.88       51.51
             Persecución (a pie o en medio de tra.. |        276       10.40       61.91
             Silvidos y otros sonidos (besos jade.. |        420       15.83       77.73
                                          Violación |         26        0.98       78.71
                                               otro |         27        1.02       79.73
            "Agarrones" (de senos vulva trasero p.. |         10        0.38       80.11
              "Punteos"Presión de genitales sobre.. |          1        0.04       80.14
            Acercamiento intimidante (tocar cintu.. |          7        0.26       80.41
              Acoso verbal aluciones al cuerpo y .. |         11        0.41       80.82
                      Exhibicionismo o masturbación |          4        0.15       80.97
                     Miradas lascivas (degeneradas) |        432       16.28       97.25
            Persecución (a pie o en medio de tran.. |         10        0.38       97.63
            Silvidos y otros sonidos (besos jadeo.. |         46        1.73       99.36
                                          Violación |          2        0.08       99.43
                             nunca he sido acosada/ |         14        0.53       99.96
                                               otro |          1        0.04      100.00
            ----------------------------------------+-----------------------------------
                                              Total |      2,654      100.00

            .
            end of do-file
            In summary, it looks as if the answers have been split into two groups. As you can see, some of them repeat after the first "otro".
            I cannot see what is happening here, and I would really appreciate any comments.
            Regards
            Last edited by Nicolas Rodriguez; 19 Nov 2018, 06:41.


            • #7
              Well, it is hard to tell from the -tab- output because it does not show the full strings. But what I think is happening is that in some instances, there are different versions of the response that look the same to our eyes but are in fact different character strings. Looking carefully at the output, I notice that the results shown near the top of the table all begin with a blank space, whereas those at the bottom do not. So I think my code failed to consider that after splitting on the commas, responses that were listed first would not have a blank, but those that were listed later in the list would have a blank following the comma.

              If I have the right diagnosis, the fix is to insert this command between the -reshape- and -tab- commands:

              Code:
              replace resp = trim(itrim(resp))
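
              To see the effect in isolation, here is a small sketch on made-up values (not the poster's data), run from a do-file. Two of the strings differ from the others only by a leading blank, as would happen for responses that followed a comma:

              ```stata
              clear
              input str20 resp
              "Otro"
              " Otro"
              "Exhibicionismo"
              " Exhibicionismo"
              end

              tab resp                           // the leading blank makes each pair two distinct strings
              replace resp = trim(itrim(resp))   // strip leading/trailing blanks, collapse internal runs
              tab resp                           // each pair now falls into a single category
              ```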


              • #8
                Thank you very much, Clyde. It worked perfectly.
                Kind Regards
