  • collapse or Roger Newman's xcollapse with string variables

    Tried help (and the help for xcollapse) but can't get an answer that makes sense to me.
    I have three variables for country identifiers: country, countrynum, and contab. The first is str24, the second is an integer, and the third is str3. I have the first two in another data set but want to include the third by a merge.

    But when I type

    xcollapse country contab,by(countrynum) saving (gallupcontab,replace) I get:

    type mismatch

    and I get the same error message for:

    collapse country contab,by(countrynum)

    Any help would be appreciated.

  • #2
    With either of these commands, the syntax -collapse country contab, by(countrynum)- means that you want to calculate the means of the variables country and contab for each distinct value of countrynum. As country and contab are string variables, you cannot do that. Stata tells you this by pointing out the type mismatch.
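
    As a sketch of what the default amounts to: -collapse- uses (mean) when no statistic is named, so the command below should give the same result with or without the explicit (mean). The auto dataset shipped with Stata stands in for real data here; it is an assumption for illustration, not the poster's file.

    Code:
    sysuse auto, clear
    * the default statistic is the mean, so this matches: collapse price mpg, by(foreign)
    collapse (mean) price mpg, by(foreign)
    list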

    It is clear that you do not really want to do what your command(s) ask Stata to do. But it isn't clear what you actually want to do. I suggest you repost. When you do, use the -dataex- command to show a brief example of your data, and then show a hand-worked example of what you want the end result to look like.

    If you are running version 15.1 or a fully updated version 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.


    • #3
      collapse by default calculates, or tries to calculate, means of the variables named before the options. Asking for the mean of a string variable triggers a type mismatch error. Here is a minimal demonstration:

      Code:
      . clear
      
      . input str1 id whatever
      
                  id   whatever
        1. "A" 1
        2. "B" 2
        3. end
      
      . collapse id whatever
      type mismatch
      r(109);

      String identifiers typically belong in the by() option.

      Roger Newson [not Newman]'s command will work similarly.


      • #4
        I realize that I don't want the mean. But I have a very large number of cases (1.6 million) across 166 countries. I want an aggregate data set by country with contab (the country abbreviation) as one of the variables in the data set. What I want to do is to organize the data by contab, which is indeed a string variable. (No need to worry about the string variable country; instead I have a numerical variable countrynum where each value is associated with a value label.) Is it not possible to do this?


        • #5
          Same answer from me. Whatever defines your groups belongs in the by() option. Whatever you want to summarize statistically belongs in the varlist. There is no problem in giving several identifiers to by(). If they match 1:1, they define the same groups. If they don't, you get one group for each combination that occurs in the data.

          Code:
          clear 
          input str1 country countrynum whatever 
          "A" 1 42 
          "A" 1 666
          "B" 2 3.14159
          "B" 2 2.71828 
          end 
          
          label def countrynum 1 "A" 2 "B" 
          label val countrynum countrynum 
          
          collapse whatever, by(country countrynum) 
          
          list
          
               +-------------------------------+
               | country   countr~m   whatever |
               |-------------------------------|
            1. |       A          A        354 |
            2. |       B          B   2.929935 |
               +-------------------------------+
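
          Applied to the variable names in #1, a minimal sketch might look like the following. It assumes the large dataset is in memory, that contab matches countrynum 1:1, and that whatever is a placeholder for a numeric variable to be averaged; the filename gallupcontab is taken from #1.

          Code:
          * one row per country, carrying the abbreviation along as a second identifier
          collapse (mean) whatever, by(countrynum contab)
          * check the 1:1 assumption: this fails if any countrynum appears more than once
          isid countrynum
          save gallupcontab, replace

          In the other dataset, -merge m:1 countrynum using gallupcontab- would then bring contab across.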



