  • Having trouble with drop if command

    Have a data set that is too large to export to Excel (1,656,203 observations) and need to drop several years of observations in order to make it doable.

    When I try the following code, it does not work:

    drop if academicyearandperiod == 200020015

    [This variable is a numeric/float variable and basically captures the academic year ("2000-2001") and reporting period ("5" = fall to spring, full academic year) for student-level files in the data set I am using.]

    Is there something about my variable "academicyearandperiod" or the drop if command that would complicate this step?

    When I run the code, I do not get an error message, but instead get a "(0 observations deleted)" message indicating that it is in some way not recognizing the observations that exist under this year. There are, in fact, observations under the year and period 200020015, but for some reason when I run this code, it is not recognizing them or counting them. Any advice?

  • #2
    You do not present a data example, but if the variable is a float, it could be a precision problem. See

    Code:
    help precision
    This may work:

    Code:
    drop if academicyearandperiod == float(200020015)
    Otherwise, present a data example.
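
    Before changing anything, it may help to confirm how the variable is actually stored and what values it really holds. A minimal sketch, assuming the data set from #1 is already in memory and the variable name is as posted:

    ```stata
    * Show the storage type (float, long, double, ...)
    describe academicyearandperiod

    * Display the stored values in full rather than in scientific notation
    format %12.0f academicyearandperiod

    * List the distinct values that are really in the data
    tabulate academicyearandperiod
    ```

    If the tabulation shows no value ending in ...15, the digits were lost when the variable was created, and no comparison can bring them back.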


    • #3
      The code suggested in #2 will not work, also for precision reasons. The value 200020015 is indeed too long to be represented in a float with all 9 digits preserved. With the rounding that results from the -float()- function, we see:
      Code:
      . display %12.0f float(200020015)
         200020016
      So the wrong observations would be deleted. (Or perhaps none will be deleted if that variable never takes the value 200020016.)
      So, O.P. is going to have to retrace his steps back to the creation of the variable academicyearandperiod, and re-create it as a long or a double. Once that is done, his original command from #1 will work as expected.
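
      To illustrate the difference, here is a small self-contained sketch. The string variable str_ayperiod and the names ayp_float and ayp_long are hypothetical; they stand in for whatever source the real variable was built from, which the thread does not show.

      ```stata
      * Toy data: str_ayperiod is a hypothetical string source variable
      clear
      input str9 str_ayperiod
      "200020015"
      "200120025"
      end

      * With no type given, -generate- uses the default type (float unless
      * changed with -set type-), which cannot hold all 9 digits
      generate ayp_float = real(str_ayperiod)

      * Asking for long (or double) keeps every digit
      generate long ayp_long = real(str_ayperiod)

      format %12.0f ayp_float ayp_long
      list

      * Compare how many observations each version matches
      count if ayp_float == 200020015
      count if ayp_long == 200020015
      ```

      Note that -recast- on the existing float would not help, because the lost digits are no longer in the data; the variable has to be rebuilt from its source.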


      • #4
        Checking in with a different dataset here. I have made sure that the variable is re-created as a type long. Still, when I go to run commands referencing that variable (academicyearandperiod), which is nine digits long, I get error messages saying that it shows no observations.

        I have tried it with another variable that is just the academic year (for example 20202021, for AY 2020-2021), which is 8 digits instead of 9 and I get the same result. What am I doing wrong?

        . codebook academicyearfullplusperiodnum2

        ------------------------------------------------------------------------------
        academicyearfullplusperiodnum2                              AY+Period (String)
        ------------------------------------------------------------------------------

                          Type: Numeric (long)

                         Range: [1.999e+08,2.020e+08]         Units: 1
                 Unique values: 74                        Missing .: 0/11,059,400

                          Mean: 2.0e+08
                     Std. dev.: 628919

                   Percentiles:     10%       25%       50%       75%       90%
                                2.0e+08   2.0e+08   2.0e+08   2.0e+08   2.0e+08



        . codebook academicyearfullnum

        ------------------------------------------------------------------------------
        academicyearfullnum                      Numeric Version of Full Academic Year
        ------------------------------------------------------------------------------

                          Type: Numeric (int)
                         Label: academicyearfullnum

                         Range: [1,23]                        Units: 1
                 Unique values: 22                        Missing .: 0/11,059,400

                      Examples: 5   20042005
                                10  20092010
                                14  20132014
                                18  20172018


        Two separate examples (below) of me running code with the other variable, academicyearfullnum:

        (1) Trying to generate a new variable that is dependent on academic year

        gen yeariwant = 0

        . replace yeariwant = 1 if academicyearfullnum == 20202021
        (0 real changes made)


        (2) Trying to drop all observations EXCEPT those in a given academic year

        . drop if academicyearfullnum != 20202021
        (11,059,400 observations deleted)


        As you can see, both attempts are unsuccessful. Is there anything I am doing wrong here?


        • #5
          ------------------------------------------------------------------------------
          academicyearfullnum Numeric Version of Full Academic Year
          ------------------------------------------------------------------------------

          Type: Numeric (int)
          An int isn't even remotely long enough to hold 8 digits. Its range is limited to between -32767 and +32740. You aren't even in the ballpark here. This needs to be created as a long. A long can hold 9 digits, and you can squeeze a tenth digit out of it if you're only in the low 10-digits (as you would be with years).
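
          Possibly also relevant: the codebook output in #4 reports Range: [1,23] together with a value label that is also named academicyearfullnum. That pattern suggests the variable stores codes 1 to 23 (as -encode- would produce) and only displays text such as 20202021 as the label. If that is the case, a sketch along these lines may help; ay_str and ay_long are hypothetical names, and the labels are assumed to be exactly the 8-digit strings shown in the codebook examples.

          ```stata
          * Show the mapping from stored codes (1-23) to label text
          label list academicyearfullnum

          * Option 1: refer to the code through its label text
          count if academicyearfullnum == "20202021":academicyearfullnum

          * Option 2 (alternative): rebuild a true numeric year as a long
          decode academicyearfullnum, generate(ay_str)
          generate long ay_long = real(ay_str)
          count if ay_long == 20202021
          ```

          If both counts come back nonzero and equal, either form can replace the comparison in the -replace- and -drop- commands from #4.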




          • #6
            I'm not at my computer at the moment, but why not just have the year as the time variable (2010, not 20102011) and have the delta be equal to 2 (I think this is the solution). Or if not, just have a generic time variable that's meant to represent the period in question? That way, we wouldn't have to fool with the precision part... right?


            • #7
              I think it is better to create a binary flag that marks the observations to include in the sample.
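
              A minimal sketch of that idea, assuming academicyearandperiod has already been re-created as a long or double so that the comparison is exact. The flag name insample and the output file name are hypothetical.

              ```stata
              * 1 = keep in the sample, 0 = the year/period to exclude
              generate byte insample = (academicyearandperiod != 200020015)
              tabulate insample

              * Restrict later commands with the flag instead of dropping data
              summarize academicyearandperiod if insample

              * Or write out only the flagged rows (file name is hypothetical)
              export excel using "ay_subset.xlsx" if insample, firstrow(variables) replace
              ```

              Nothing is deleted this way, so a mistake in the condition can be corrected by regenerating the flag.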


              • #8
                On a broader note, it's perhaps more useful to store the academic year either as a string (e.g. "2010/11") or else as two numeric values that represent a range (e.g. a start and end year). These will be easier to use for filtering and will sidestep some of these less obvious data storage issues.
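
                For the two-variable version, a self-contained sketch on toy data; ay_long, ay_start and ay_end are hypothetical names, and ay_long is assumed to hold the 8-digit year exactly (stored as long).

                ```stata
                * Toy data: 8-digit academic years stored as long
                clear
                input long ay_long
                20042005
                20202021
                end

                * First four digits are the start year, last four the end year
                generate int ay_start = floor(ay_long / 10000)
                generate int ay_end = mod(ay_long, 10000)
                list

                * Four-digit years fit comfortably in an int, so filters are exact
                count if ay_start >= 2015
                ```

                Each piece is small enough that neither the int limit nor float rounding comes into play.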
