
  • Stata doesn't find variables

    Hi everyone. I'm new to Stata and can't figure out where the problem is. The error says that the variable race can't be found. If I put the weights back into the command, I get: weights not found. The variables are in the dataset I'm using.

    use $data/psid_10.dta

    keep if region==1

    preserve
    * Collapse
    collapse (p90) wealth_w_equity
    *[aweights]

    **[aw=weights]
    collapse (sum) wealth_total=wealth_w_equity
    *[aweights],
    *by( race)

    * Scale to million USD
    replace wealth_total=wealth_w_equity/1000000

    * Graph
    *graph bar home_equity wealth_w_equity stack, over(region race)
    graph bar (sum) wealth_total, over(race)


    restore
    *log close


  • #2
    Once you run collapse, you are left with a reduced dataset containing only the variables named in the command, plus any by() variables. From your code it seems that you commented out by(race), so the reduced dataset will be just one observation with the total for all races.

    If I understand correctly, the collapses are unnecessary, because with your original dataset


    Code:
    graph bar (sum) wealth_w_equity if region == 1, over(race)
    will give a bar chart of sums for each race in region 1.
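
    As a rough sketch of both routes, assuming the built-in nlsw88 dataset as a stand-in for psid_10.dta (wage plays the role of wealth_w_equity, and wage_total is a made-up name): either graph straight from the full data, or collapse once with by(race) so that race survives the collapse.

    ```stata
    * Stand-in data (assumption): nlsw88 ships with Stata and has a race variable
    sysuse nlsw88, clear

    * Option 1: no collapse needed, graph bar computes the sums itself
    graph bar (sum) wage, over(race)

    * Option 2: collapse once, keeping race through by()
    preserve
    collapse (sum) wage_total = wage, by(race)
    describe                      // only race and wage_total are left
    graph bar (asis) wage_total, over(race)
    restore                       // the full dataset is back
    ```

    If weights are needed, they would go in the same collapse, before the comma, while the weight variable still exists in the data.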



    • #3
      Just to add that often it's a good idea to list or edit the data after each data management step to see where you've got to.
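
      For instance, a quick sketch with the built-in nlsw88 data (an assumption, standing in for the real file) shows how a look at the data after each step would have revealed the problem in #1:

      ```stata
      sysuse nlsw88, clear          // stand-in dataset (assumption)
      keep if race == 1
      count                         // how many observations survived the keep?
      list race wage in 1/5         // eyeball a few rows

      collapse (p90) wage
      describe                      // race is gone: only wage remains
      list                          // a single observation
      capture confirm variable race
      display _rc                   // nonzero, because race no longer exists
      ```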



      • #4
        I'm not a computer scientist by trade, but I often adapt computer science principles to my coding. For example, I'm obsessed with writing simple, effective tests for my code.

        Code:
        u "http://econ.korea.ac.kr/~chirokhan/panelbook/data/basque-clean.dta", clear
        
        loc int_time = 1975
        
        g treated = cond(regionno==17 & year >= `int_time',1,0)
        
        labvars year gdpcap "Year" "ln(GDP per 100,000)"
        
        replace regionname = trim(regexr(regionname,"\(.+\) *",""))
        
        egen id = group(regionname), label(regionname) // makes a unique ID
        
        order id, b(year)
        
        *keep if year >= 1960
        drop if inlist(id,18) //12
        
        assert id != 18 // stops with an error if the national-average unit is still present
        
        drop regionno
        xtset id year, y
        In this code, I want to test the effect of terrorism on GDP per capita. But there's a problem: one of the units is the national average across all 17 autonomous communities in Spain. Comparing a unit within a country to the entire country might give misleading conclusions, so I drop the unit whose id equals 18.

        To verify that my data are correct, I write what we'd call a unit test: a simple, short test to make sure the dataset looks right. Here I do this with the assert command.
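
        A bare-bones sketch of how assert behaves, using the built-in auto data rather than the Basque file (an assumption made only so the snippet runs anywhere):

        ```stata
        sysuse auto, clear            // stand-in dataset (assumption)
        drop if foreign == 1

        * Passes silently: no foreign cars are left after the drop
        assert foreign != 1

        * A false assertion stops a do-file with an error; capture lets us
        * inspect the return code instead of stopping
        capture assert price > 10000
        display _rc                   // 9, the return code for a false assertion
        ```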


        I also do this when I want to make sure the values of a variable make sense. Here is some more code, from another paper of mine:
        Code:
        import delimited "https://www2.census.gov/programs-surveys/popest/datasets/2010-2020/counties/asrh/CC-EST2020-ALLDATA.csv", clear
        keep if agegrp==0 & year == 13
        
        
        destring tot_pop-hnac_female, replace
        
        
        g white_pop = wa_male + wa_female
        
        g black_pop = ba_male + ba_female
        
        g hisp_pop = h_male + h_female
        
        
        cls
        
        keep state-tot_pop white_pop black_pop hisp_pop
        
        drop year age
        
        rename (state county tot_pop) (sid cid pop)
        
        gcollapse (sum) pop white_pop black_pop hisp_pop, by(sid cid)
        
        
        // I use gcollapse, but the normal one works too.
        
        assert pop > 0
        Here I grab population data from the internet for all 3,000-odd American counties. I calculate racial populations, keep the variables I'm interested in, and sum them at the state-county level.


        Just to make sure my data are plausible, though, I test the population variable to confirm that it contains only positive values, since negative values are impossible. You don't need to do this with everything, but especially when a calculation depends on many variables, having your code fail unless the values make sense and are present in the dataset is a great idea, in my opinion.
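
        One caveat worth sketching: Stata treats missing values as larger than any number, so a test like pop > 0 on its own will not catch a missing population. A stricter version of the test might look like this (the three rows below are invented for illustration, not Census figures):

        ```stata
        clear
        * Toy data (assumption): two valid counties and one missing population
        input sid cid pop
        1 1 1000
        1 3 250
        2 1 .
        end

        capture assert pop > 0
        display _rc                   // 0: the missing value slips through

        capture assert pop > 0 & !missing(pop)
        display _rc                   // 9: now the missing value is caught

        isid sid cid                  // errors out if a state-county pair repeats
        ```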



        • #5
          Thanks, everyone, for your help.
