Stata doesn't find variables

Martin Semmelberg

Join Date: May 2022

Posts: 22
#1

25 May 2022, 02:24

Hi everyone. I'm new to Stata and can't figure out where the problem is. The error says it can't find variable race. If I put the weights back into the command, the error says weights not found. The variables are in the dataset I'm using.

use $data/psid_10.dta

keep if region==1

preserve
* Collapse
collapse (p90) wealth_w_equity
*[aweights]

**[aw=weights]
collapse (sum) wealth_total=wealth_w_equity
*[aweights],
*by( race)

* Scale to million USD
replace wealth_total=wealth_w_equity/1000000

* Graph
*graph bar home_equity wealth_w_equity stack, over(region race)
graph bar (sum) wealth_total, over(race)

restore
*log close
Nick Cox

Join Date: Mar 2014

Posts: 35653
#2

25 May 2022, 02:39

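Once you run your collapse, you get a reduced dataset containing only the variables mentioned in the command, plus any by() variables. From your code it seems that you commented out by(race), so the reduced dataset will be just one observation, with the total for all races. Variables such as race and weights are no longer there, hence the errors.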
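A minimal sketch of the fix, on made-up toy data (the values are hypothetical; only race and wealth_w_equity come from the thread, and weights are left out). Keeping by(race) means race survives the collapse, and the scaling step has to use the collapsed variable wealth_total, because wealth_w_equity no longer exists once it has been summed into a new name:

```stata
* Hypothetical toy data standing in for psid_10.dta (values made up)
clear
input byte race double wealth_w_equity
1 250000
1 400000
2 150000
2 900000
3 300000
end

preserve
* Sum within each race; by() keeps race in the collapsed dataset
collapse (sum) wealth_total=wealth_w_equity, by(race)

* wealth_w_equity is gone after collapse, so scale the collapsed total
replace wealth_total = wealth_total / 1000000

list, noobs
* One observation per race now, so (asis) plots the stored totals
graph bar (asis) wealth_total, over(race)
restore
```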

If I understand correctly, the collapses are unnecessary, as with your original dataset

Code:

graph bar (sum) wealth_w_equity if region == 1, over(race)

will give a bar chart of sums for each race in region 1.
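For comparison, a sketch of the no-collapse route on hypothetical toy data (region, race and wealth_w_equity mirror the thread; wealth_m is a made-up name). The scaling to million USD is done with generate before graphing, so no variables are lost along the way:

```stata
* Hypothetical toy data; variable names mirror the thread, values are made up
clear
input byte region byte race double wealth_w_equity
1 1 250000
1 1 400000
1 2 150000
2 2 900000
1 3 300000
end

* Work in million USD without collapsing anything
generate double wealth_m = wealth_w_equity / 1000000

* One bar per race: the sum of wealth_m over region-1 observations
graph bar (sum) wealth_m if region == 1, over(race)
```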
Nick Cox

Join Date: Mar 2014

Posts: 35653
#3

25 May 2022, 07:55

Just to add that often it's a good idea to list or edit the data after each data management step to see where you've got to.
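A minimal sketch of that habit, using the auto dataset shipped with Stata (nothing here comes from the original poster's data): count after keep, describe and list after collapse, and a capture confirm that reproduces the kind of "variable not found" error reported in #1:

```stata
* Example data shipped with Stata
sysuse auto, clear
keep if foreign == 0
count                            // how many observations survived keep?

preserve
collapse (mean) price, by(rep78)
describe                         // only rep78 and price remain
list, noobs
capture confirm variable mpg
display "rc = " _rc              // 111: variable not found, like the race error
restore

describe, short                  // the full dataset is back after restore
```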
Jared Greathouse

Join Date: Sep 2021

Posts: 2170
#4

25 May 2022, 08:30

I'm not a computer scientist by trade, but I often adapt computer science principles to my coding. For example, I'm obsessed with writing simple, effective tests for my code.

Code:

u "http://econ.korea.ac.kr/~chirokhan/panelbook/data/basque-clean.dta", clear
loc int_time = 1975
g treated = cond(regionno==17 & year >= `int_time',1,0)
labvars year gdpcap "Year" "ln(GDP per 100,000)"
replace regionname = trim(regexr(regionname,"\(.+\) *","")) // drop any text in parentheses
egen id = group(regionname), label(regionname) // makes a unique ID
order id, b(year)
*keep if year >= 1960
drop if inlist(id,18) //12
assert id != 18 // unit test: stop if unit 18 is still present
drop regionno
xtset id year, y

In this code, I want to test the effect of terrorism on GDP per capita. But there's a problem: one of the units is the average of all 17 autonomous communities in Spain, i.e. the national average. Comparing a unit within a country to the entire country might give misleading conclusions, so I drop the unit whose id equals 18.

To verify that my data are correct, I write what we'd call a unit test: a simple, short check that the dataset looks right. Here I do this with the assert command.
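A small, self-contained sketch of such a test on a made-up panel (ids, years and values are hypothetical; id 3 simply plays the role of the national aggregate). If an assert fails, Stata stops with "assertion is false", r(9); capture is used at the end only to display that return code without stopping the script:

```stata
* Hypothetical toy panel: id 3 plays the role of the national aggregate
clear
input id year gdpcap
1 1970 5.1
1 1971 5.3
2 1970 4.8
2 1971 4.9
3 1970 5.0
3 1971 5.1
end

drop if id == 3

* Unit tests: each line stops the do-file with an error if it fails
assert id != 3                   // the aggregate unit is gone
bysort id: assert _N == 2        // every remaining unit still has both years

* capture shows what a failure looks like without stopping the script
capture assert id == 1
display "rc = " _rc              // 9: assertion is false
```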

I also do this when I want to make sure the values of a variable make sense. Here is more code from another paper of mine:

Code:

import delimited "https://www2.census.gov/programs-surveys/popest/datasets/2010-2020/counties/asrh/CC-EST2020-ALLDATA.csv", clear
keep if agegrp==0 & year == 13
destring tot_pop-hnac_female, replace
g white_pop = wa_male + wa_female
g black_pop = ba_male + ba_female
g hisp_pop = h_male + h_female
cls
keep state-tot_pop white_pop black_pop hisp_pop
drop year age
rename (state county tot_pop) (sid cid pop)
gcollapse (sum) pop white_pop black_pop hisp_pop, by(sid cid) // I use gcollapse, but the normal one works too.
assert pop > 0 // stop if any population is zero or negative

Here I download population data for all 3,000-odd US counties. I compute the racial populations, keep the variables I'm interested in, and sum them at the state-county level.

Just to make sure my data are plausible, though, I test that the population variable contains only positive values, since negative populations are impossible. You don't need to do this for everything, but especially when a calculation needs many variables, making your code fail unless the values are present and make sense is a great idea, in my opinion.
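Extending that idea, a hedged sketch of a few extra checks on a toy county file (values are made up; only the variable names come from the code above). One caveat worth knowing: Stata treats missing values as larger than any number, so assert pop > 0 on its own also passes when pop is missing. Hispanic origin overlaps with race in the Census files, so the sketch checks each subgroup against the total rather than checking their sum:

```stata
* Hypothetical toy county data using the variable names from the thread
clear
input sid cid pop white_pop black_pop hisp_pop
1 1 1000  700 200 100
1 3 2500 1500 600 300
2 1  800  500  50  40
end

* Missing (.) counts as larger than any number, so test for it explicitly
assert pop > 0 & !missing(pop)

* Each subgroup count must be present and cannot exceed the county total
foreach v of varlist white_pop black_pop hisp_pop {
    assert !missing(`v') & `v' <= pop
}

* After the collapse, each state-county pair should appear exactly once
isid sid cid
```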
Martin Semmelberg

Join Date: May 2022

Posts: 22
#5

30 May 2022, 07:32

Thanks, everyone, for your help.