  • "Use in" error within "while" loop

    Hi all,

    I'm experiencing a curious problem while trying to loop through a particularly large dataset. I'm trying to compress and clean the data a million observations at a time, to ensure I don't go above my computer's memory capacity.

    My first iteration (when i = 1, and interval_start and interval_end are 1 and 1000000, respectively) works fine, but when the loop starts again I get an error stating "using required". Why does it work the first time but not the second time? I know it successfully completes the second compress of the first iteration and saves the first dataset, because this is the output I get:
    " variable concepts10 was str72 now str69
    variable concepts11 was str73 now str72
    variable concepts15 was str108 now str72
    variable concepts17 was str108 now str72
    variable concepts20 was str74 now str72
    variable concepts23 was str85 now str72
    variable concepts24 was str85 now str75
    variable concepts25 was str73 now str72
    variable concepts26 was str43 now str41
    variable concepts27 was str36 now str34
    variable concepts28 was str19 now str1
    (82,895,562 bytes saved)
    file OpenAlex_pull_p1.dta saved


    1000001
    2000000
    2

    using required "


    The code is included below, as well as a visual example of my data. (I'm sorry it's not in a good format - dataex was giving me a "data width (579 chars) exceeds max linesize. Try specifying fewer variables" error. The exact nature of the data is also less material than the nature of the code.)

    I'd appreciate any help or advice you could give!


    cd $pull_data
    describe using OpenAlex_pull
    local num_obs = `r(N)'
    display `num_obs'
    local interval_start = 1
    local interval_end = 1000000
    local done = 0
    local i = 1
    while `done' != 1 {
        display `interval_start'
        display `interval_end'
        clear all
        display `i'
        use in `interval_start'/`interval_end' using OpenAlex_pull, clear
        capture drop multiple_concepts
        local interval_start `interval_end' + 1
        display `interval_start'
        local interval_end `interval_end' + 1000000
        display `interval_end'
        compress
        split concepts, p(",")
        des, short
        local n_vars `r(k)'
        local n_concept_vars = `n_vars' - 12
        gen keep = 0
        forvalues j = 1/`n_concept_vars' {
            display `j'
            replace keep = 1 if substr(concepts`j', 1, 7) == "Physics" & (substr(concepts`j', -4, 1) == "9" | substr(concepts`j', -5, 1) == "1") // Identify those obs which have a Physics rating of 90-99% or 100%, respectively
        }
        drop if keep != 1
        cd $pull_data
        compress
        save OpenAlex_pull_p`i', replace
        local i `i' + 1
        if `interval_end' > `num_obs' {
            local done = 1
        }
    }

    search_name concepts
    A JORISSEN Physics/0/94.4,Astronomy/1/87.9,Astrophysics/1/87.5,Computer science/0/85.2,Computer vision/1/77.9,Stars/2/77.7,Quantum mechanics/1/65.6,Mathematics/0/38.5,Spectral line/2/31.7,Galaxy/2/24.4,Biology/0/22.5,Binary number/2/21.9,Arithmetic/1/21.9,Chemistry/0/20.8,Geography/0/20.2,

    A JORISSEN Art/0/29.8,Philosophy/0/26.3,History/0/21.1,Physics/0/21.1,

    A JORISSEN Physics/0/100.0,Astronomy/1/50.0,Geometry/1/50.0,Combinatorial chemistry/1/50.0,Theology/1/50.0,Mathematics/0/50.0,Biochemistry/1/50.0,Quantum mechanics/1/50.0,Stereochemistry/1/50.0,Biology/0/50.0,History/0/50.0,Thermodynamics/1/50.0,Mathematical analysis/1/50.0,Philosophy/0/50.0,Art/0/50.0,Medicinal chemistry/1/50.0,Multiplicity (mathematics)/2/50.0,Catalysis/2/50.0,Archaeology/1/50.0,Component (thermodynamics)/2/50.0,Dipole/2/50.0,Organic chemistry/1/50.0,Chemistry/0/50.0,Geography/0/50.0,Treasure/2/50.0,

    A JORISSEN Astronomy/1/100.0,Computer vision/1/100.0,Computer science/0/100.0,Astrophysics/1/100.0,Quantum mechanics/1/100.0,Physics/0/100.0,Supernova/2/100.0,Stars/2/100.0,Asymptotic giant branch/3/100.0,Nucleosynthesis/3/50.0,Spectral line/2/50.0,Orbital period/3/50.0,Mathematics/0/50.0,Binary number/2/50.0,Giant star/3/50.0,Arithmetic/1/50.0,Galaxy/2/50.0,Metallicity/3/50.0,Stellar evolution/3/50.0,s-process/4/50.0,Binary system/3/50.0,Atomic physics/1/50.0,Nuclear physics/1/50.0,Neutron star/2/50.0,White dwarf/3/50.0,

    A JORISSEN Physics/0/91.7,Mechanics/1/83.3,Engineering/0/83.3,Mathematics/0/75.0,Mechanical engineering/1/66.7,Geometry/1/58.3,Geology/0/58.3,Materials science/0/58.3,Flow (mathematics)/2/50.0,Thermodynamics/1/50.0,Geomorphology/1/50.0,Aerospace engineering/1/50.0,Venturi effect/3/41.7,Computer science/0/41.7,Nozzle/2/41.7,Oceanography/1/41.7,Discharge coefficient/3/41.7,Inlet/2/41.7,Biology/0/33.3,Meteorology/1/33.3,Composite material/1/33.3,Economics/0/33.3,Reynolds number/3/33.3,Turbulence/2/33.3,Geography/0/33.3,

    A JORISSEN Computer science/0/100.0,Astronomy/1/50.0,Information retrieval/1/50.0,Computer vision/1/50.0,Mathematics/0/50.0,Environmental science/0/50.0,Astrophysics/1/50.0,Quantum mechanics/1/50.0,Galaxy/2/50.0,Statistics/1/50.0,Physics/0/50.0,Mathematical analysis/1/50.0,Stars/2/50.0,Survey data collection/2/50.0,Milky Way/3/50.0,Content (measure theory)/2/50.0,

    A JORISSEN Computer science/0/100.0,Quantum mechanics/1/100.0,Physics/0/100.0,Astronomy/1/50.0,Remote sensing/1/50.0,Thermodynamics/1/50.0,Optics/1/50.0,Geology/0/50.0,Interferometry/2/50.0,Component (thermodynamics)/2/50.0,Geography/0/50.0,

  • #2
    The problem arises from the same error made in three places:
    Code:
    local interval_start `interval_end' + 1
    although legal syntax, does not do what you need it to do. In particular, it does not put the number one greater than interval_end into local macro interval_start. What it does is set interval_start to a string consisting of the value of interval_end followed by the text "+ 1". The same problem arises in your re-initialization of local macros interval_end and i. The particular error message you are getting arises because, with these local macros being wrong, your -use- command parses as:
    Code:
    use in 1000000 + 1/1000000 + 1000000 using OpenAlex_pull, clear
    Because neither 1000000 + 1 nor 1000000 + 1000000 is a literal integer (each is an expression that, if evaluated, would yield an integer, but expressions are not legal with -in-), Stata fails in its attempt to parse the -in- clause of the -use- command. Unfortunately, the error message is misleading, because Stata is completely confused by this syntax and can't really figure out what is going on.
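The same mistake also explains why the loop's own -display- lines printed 1000001 and 2000000 at the start of the second pass: -display- evaluates whatever expression it is given, so it hides the fact that the macro holds text rather than a number. A minimal sketch (the macro names start_text and start_num are illustrative, not from the original code):

```stata
* Minimal sketch: what a local macro holds with and without "=".
* start_text and start_num are illustrative names, not from the original post.
local interval_end = 1000000

* Without "=", the macro stores the text "1000000 + 1"
local start_text `interval_end' + 1
display "`start_text'"

* -display- evaluates any expression it is given, so this prints 1000001
display `start_text'

* With "=", Stata evaluates the sum first and stores the result
local start_num = `interval_end' + 1
display "`start_num'"
```

Printing the macro inside double quotes, as in the first and last -display- lines, is a quick way to see what a macro actually contains.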

    The correct syntax for setting interval_start is
    Code:
    local interval_start = `interval_end' + 1
    By including the = character, you tell Stata to actually evaluate that sum and then put the result in local macro interval_start. That will give you 1000001 in interval_start when you go into the second iteration, rather than a text string that Stata cannot decipher in the context of -in-. Make the same change when you re-initialize interval_end and i.
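Putting the fixes together, the chunking logic might be sketched as follows. This is a toy version under stated assumptions: demo_pull.dta is a hypothetical 2,500-observation stand-in for OpenAlex_pull.dta, it is read 1,000 observations at a time, and the split/keep steps from the original post are left as a comment. It makes one change beyond adding the = signs: the original loop stops as soon as the next interval_end would exceed the number of observations, which skips the final partial chunk whenever the total is not a multiple of the chunk size, so here the loop runs until interval_start passes the last observation and the last interval is capped with min().

```stata
* Minimal sketch of the chunked read with every macro update evaluated via "=".
* demo_pull.dta is a hypothetical stand-in for OpenAlex_pull.dta.
clear
set obs 2500
generate long id = _n
save demo_pull, replace

describe using demo_pull
local num_obs = r(N)
local chunk = 1000
local interval_start = 1
local i = 1
while `interval_start' <= `num_obs' {
    * cap the final chunk at the last observation in the file
    local interval_end = min(`interval_start' + `chunk' - 1, `num_obs')
    use in `interval_start'/`interval_end' using demo_pull, clear
    * ... compress, split, and keep/drop steps from the original post ...
    save demo_pull_p`i', replace
    * "=" makes Stata store the evaluated numbers, not the text of the sums
    local interval_start = `interval_end' + 1
    local i = `i' + 1
}
```

With these toy numbers the loop saves demo_pull_p1 (observations 1 to 1000), demo_pull_p2 (1001 to 2000), and demo_pull_p3 (2001 to 2500).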



    • #3
      Thanks Clyde. That fixed my problem.

      In the past I thought I'd used them interchangeably, but will be careful to note the difference you've outlined here in the future.



      • #4
        In the past I thought I'd used them interchangeably, but will be careful to note the difference
        Well, in some contexts, it makes no difference. For example, if I were to write:
        Code:
        local i = 1
        ...
        local i `i'+1
        gen x = `i'
        that will create a variable x with value 2. That's because -gen x = `i'- expands to -gen x = 1+1-, and the -gen- command naturally evaluates whatever expression it finds to the right of =. But the -in #/#- clause does not allow expressions: you must have literal integer values there. -forvalues j = #/#- is another place where expressions are not permitted. So it's a matter of knowing what the syntax restrictions are in a particular command.

        (Of course, you can always "play it safe" by using -local macroname = expression-. I can't think of any situation where that will lead to a syntax error, because any place that will accept the expression 1+1 will also accept 2. So unless you are specifically trying to build up the text version of an expression without evaluating it along the way, you won't go wrong by including the = when using a macro to hold the value of an arithmetic or logical expression.)
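The contrast described above can be checked side by side on a small toy dataset (5 observations, created only for this demonstration). This is a sketch: the exact error message Stata prints for the rejected -in- range is not shown, only that the command fails.

```stata
* Minimal sketch: an unevaluated macro works after "=" in -generate- but not in -in-.
clear
set obs 5
local i = 1
local i `i'+1

* -generate- evaluates the expression 1+1, so x equals 2 in every observation
generate x = `i'

* -in #/#- requires literal integers, so "in 1/1+1" is rejected;
* -capture- keeps the do-file running and stores the return code in _rc
capture list in 1/`i'
display _rc

* Evaluating the macro with "=" turns "1+1" into 2, which -in- accepts
local i = `i'
list in 1/`i'
```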

