  • type mismatch and partial tabulation help

    I am analyzing a large-ish data set (around 500,000 observations).
    I want to cross-tabulate two variables, Zip Code and Assigned Yard, with Zip Code as the independent variable.
    I want to know the frequency of each assigned yard per zip code (each zip code goes to only 1-3 yards). As expected, tabulate Zip Yard fails with "too many values". When I made a much smaller version of the data set, tabulate Zip Yard worked fine. So with the entire data set I'm trying to do a few zips at a time, for example tabulate Zip Yard if Zip= 1001 or tabulate Zip Yard if Zip <= 1001. Both give "type mismatch".

    I know what "type mismatch" means, but I don't see how my data meets that criterion. Furthermore, if the data were the wrong type, how would it have tabulated in the smaller data set?

    Thanks in advance for any help.

  • #2
    You should start with
    Code:
    describe
    and look at the characteristics of your Zip variable. If it is a string rather than numeric variable, you will want to compare it to a string constant, not a number, with syntax like
    Code:
    tabulate Zip Yard if Zip=="01001"
    or
    Code:
    tabulate Zip Yard if Zip <= "01001"
    To be explicit, I'm assuming your Zip variable contains a 5-digit United States Postal Service ZIP Code. Also, note the use of == to compare for equality, rather than = as used in your example.
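    A small self-contained illustration of the point above, using made-up data (the Yard values North and South are hypothetical). Zip is stored as str5, so comparing it with the number 1001 reproduces the type mismatch, r(109), while comparing it with the string "01001" works. Without an if condition, tabulate handles string variables just as it handles numeric ones, which is presumably why the smaller data set tabulated fine. Note that <= on strings compares alphabetically; because every ZIP Code here is exactly five zero-padded digits, alphabetical order agrees with numeric order.

    ```stata
    * Hypothetical toy data: Zip stored as a 5-character string, as in the thread
    clear
    set obs 4
    generate str5 Zip  = cond(_n <= 3, "01001", "01002")
    generate str5 Yard = cond(_n == 3, "South", "North")

    describe Zip Yard                  // storage type str5: both are strings
    confirm string variable Zip        // stops with an error if Zip were numeric

    * Comparing a string variable with a number fails with r(109), type mismatch
    capture noisily tabulate Zip Yard if Zip == 1001
    scalar mismatch_rc = _rc

    * Comparing with string constants works
    tabulate Zip Yard if Zip == "01001"
    tabulate Zip Yard if Zip <= "01001"
    ```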

    Having said that, you might find that the collapse command lets you summarize your dataset into the information you are looking for without working in batches. As a side benefit, the results are a Stata dataset in memory, so you can investigate them with Stata commands rather than reading 35,000 lines of tabulations or searching them with a text editor.
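    A minimal sketch of the collapse approach, again on hypothetical toy data (the helper variable one and the count variable freq are names chosen for illustration). collapse replaces the data in memory with one observation per Zip-Yard pair, so on the real data you would run preserve first (and restore afterwards) or save the result under a new file name.

    ```stata
    * Hypothetical toy data: two ZIP Codes, each sent to one or two yards
    clear
    set obs 5
    generate str5 Zip  = cond(_n <= 3, "01001", "01002")
    generate str5 Yard = cond(_n == 3, "South", cond(_n == 5, "West", "North"))

    * collapse replaces the data in memory; on real data, preserve or save first
    generate byte one = 1
    collapse (count) freq = one, by(Zip Yard)   // one row per Zip-Yard pair

    list, sepby(Zip) noobs
    ```

    The contract command (contract Zip Yard) is a shortcut to the same table, storing the counts in a variable named _freq.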

    • #3
      Originally posted by William Lisowski

      Thank you so much, William. This worked. Zip was stored as a string after all.