  • Keep Observation Trouble

    Hi there!

    I would first like to preface this post with the fact that I am very new to Stata, so I apologize if this is a simple issue, but I am at a complete loss. I created a .do file to pre-process the ACS 5-year PUMS (Public Use Microdata Sample) files available as part of the census. Specifically, I am trying to keep only those observations relevant to my project (observations in the St. Louis Metro area with zero-car households or no hazard insurance). If I perform this process manually using the Data Editor, it successfully drops the irrelevant observations. However, the code I have written returns an Excel file containing only the heading labels and no observations.

    I have attached my .do file for review. Any help would be greatly appreciated!

    Thanks so much!
    Amy



  • #2
    I would suggest you check each part of the -keep- command:

    Code:
     
     list if met2013==41180
     list if met2013==41180 & vehicles == 9
     list if met2013==41180 & propinsr == 0001
    Does Stata show you data for each condition? If not, then there is some mistake in your coding.
    Stata/MP 14.1 (64-bit x86-64)
    Revision 19 May 2016
    Win 8.1



    • #3
      Welcome to Statalist, Amy.

      Your code raises a few questions for me.

      At the bottom of your code, following the export excel command, you save the same data as a Stata dataset. How many observations does Stata report are in that dataset?
      Code:
      describe using "C:\Users\Tom and Amy\Documents\Amy SLU PhD\Potential Research Ideas\Levees\PUMS\STL_edited.dta", short
      If it has more than 0 observations, your problem is with your export excel command. Seems unlikely, but still. Myself, I always use describe, short before saving data, and after using it in a subsequent program, so my logs document exactly what version of the file was written or read.

      At the top of your code, you have an infix command
      Code:
      quietly infix                   ///
        int     year         1-4      ///
        byte    datanum      5-6      ///
        ...
        int     perwt        69-78    ///
        using `"C:\Users\Tom and Amy\Documents\Amy SLU PhD\Potential Research Ideas\Levees\PUMS\IPUMS_downloads\USA_Household\usa_00001.dta"'
      Why the "quietly"? My sense was not that infix is so verbose about what it is doing that its output needs to be suppressed.

      But the real question is, "USA_Household\usa_00001.dta"? I would expect a file suitable for input to infix to have a .txt extension, or something very similar. Conversely, I would expect a file having a .dta extension to be a Stata dataset, not a fixed-format text file. I know you said your keep commands worked when you ran them by hand, but I'm wondering if in creating the do-file you haven't accidentally specified the wrong input file to infix, and by running it with the quietly prefix, suppressed output that might have shown that in fact no data was read.



      • #4
        Hi everyone!

        Thanks so much for such timely responses! I truly appreciate the help!

        First a little clarification: the .dta file is actually the downloaded Stata format from the IPUMS data website (https://usa.ipums.org/usa-action/variables/group). It is a rather large data set, with over 15 million observations. I tried using just the infix command as you suggested, and it actually froze up the program (I guess due to the size of the file and the substantial number of exceptions it results in??).

        And, I have attached some screen shots to demonstrate the discrepancies in results between manually entering the code in the command line (I get the same results by using the data editor) vs. the .do file. Hopefully, this will help!

        Thanks again!






        • #5
          I am going to start with advice I should have offered in post #3.

          Please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your two posts. Note especially sections 9-12 on how to best pose your question.

          The question you ask - why does keep work differently from the command line and from the do-file - assumes you are applying the keep commands to the same data in each case. This is probably not what you have done. Try running describe, short from the command line before the keep commands, and add it to the do-file before the keep commands.

          In particular, your do-file begins with a clear command that removes all the data from memory, and then uses the infix command to read a dataset which you tell us in post #4 is in Stata format. The first sentence in the Description section of the output of help infix tells us

          infix reads into memory from a disk a dataset that is not in Stata format.
          Or else the do-file you attached in post #1 is not the do-file you are actually using.
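
          As a rough illustration of the point above, a do-file that starts from a Stata-format file would load it with use rather than infix. This is only a sketch: it assumes the IPUMS extract usa_00001.dta sits in the current working directory, and it takes the variable codings from post #2 on trust rather than from the codebook.

          ```stata
          * Sketch only: assumes usa_00001.dta (the Stata-format IPUMS extract)
          * is in the current working directory; adjust the path as needed.
          use "usa_00001.dta", clear      // -use-, not -infix-, reads a Stata dataset

          describe, short                 // log how many observations were read

          * Assumed codings, taken from post #2 and not checked against the codebook:
          * met2013 41180 = St. Louis metro, vehicles 9 = no vehicle available,
          * propinsr 1 = no hazard insurance.
          keep if met2013 == 41180 & (vehicles == 9 | propinsr == 1)

          describe, short                 // log how many observations remain
          ```

          Comparing the two describe, short outputs between the do-file run and the command-line session should show whether both start from the same data.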

          Added in edit: In post #1 you told us that your Excel file had no rows of data. Why did you not mention that Stata had given you an error message that told you it was because there were too many observations to fit into an Excel spreadsheet? You left the reader to believe that all your observations were lost, rather than that too many were kept. Do read the FAQ. We are not mind readers here on Statalist.
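
          One way to catch this kind of problem before it reaches Excel is to count the observations and compare against the worksheet limit before exporting. The sketch below uses Stata's bundled auto dataset so it can be run anywhere; the keep condition and the output file name auto_subset.xlsx are hypothetical stand-ins, not part of the original do-file.

          ```stata
          * Sketch using the bundled auto data; auto_subset.xlsx is a
          * hypothetical output file name.
          sysuse auto, clear
          keep if foreign == 1

          * An .xlsx worksheet holds 1,048,576 rows; one of them is taken by
          * the header row that firstrow(variables) writes.
          local maxrows = 1048576 - 1

          count
          if r(N) == 0 {
              display as error "no observations left: check the keep conditions"
          }
          else if r(N) > `maxrows' {
              display as error "too many observations for one Excel worksheet"
          }
          else {
              export excel using "auto_subset.xlsx", firstrow(variables) replace
          }
          ```

          With a check like this, an empty result and an oversized result produce different messages, so the two cases cannot be confused.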

          In post #4 you tell us you obtained your dataset from IPUMS. The do-file you posted originated at Social Explorer for reading in downloaded data they have produced.
          Last edited by William Lisowski; 31 Aug 2018, 06:30.



          • #6
            I continue to try different options. So, I simplified my code to the bare basics. By removing all of the original label define commands and simply running the keep if expressions directly on the file, I finally get the correct subset of observations. I'm not sure why those were so problematic.

