Hello,
I am currently working on an econometrics project at uni. I am using data from the British Election Study, specifically the BES2017_W13 dataset that is available online. We are trying to see which variables influence the probability of voting for the Conservative Party. However, there is a problem with the data set.
The variable profile_gross_personal, which is an int according to Stata, contains income intervals as its values. To work around this strange problem I used the following commands:
tostring(profile_gross_personal), generate(gross)
tabulate gross
tabulate profile_gross_personal
gen income=0
replace income=2500 if gross=="1"
replace income=7500 if gross=="2"
replace income=12500 if gross=="3"
replace income=17500 if gross=="4"
replace income=22500 if gross=="5"
replace income=27500 if gross=="6"
replace income=32500 if gross=="7"
replace income=37500 if gross=="8"
replace income=42500 if gross=="9"
replace income=47500 if gross=="10"
replace income=55000 if gross=="11"
replace income=65000 if gross=="12"
replace income=85000 if gross=="13"
replace income=100000 if gross=="14"
drop if income==0 (this gets rid of the observations still at 0, the placeholder value set by the generate command earlier)
I recoded every category this way so that each interval is represented by its midpoint. As a sanity check, I then tried to run a basic regression: I used reg income england (england is a dummy variable I created) to see whether the recoding had worked. This ran and gave me regression output. However, when I run logit income england I get error r(2000): outcome does not vary.
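For reference, the recoding above can probably be written more compactly. This is only a sketch, assuming that the intervals shown by tabulate are value labels attached to the integer codes 1 to 14 (which would explain why Stata reports an int), and that the midpoints are the ones listed above. The name income_mid is hypothetical.

```stata
* Show the underlying numeric codes instead of the interval labels
tabulate profile_gross_personal, nolabel

* Midpoints for codes 1-14; any other code stays missing (.) instead of 0
generate long income_mid = .
replace income_mid = 2500 + 5000 * (profile_gross_personal - 1) if inrange(profile_gross_personal, 1, 10)
replace income_mid = 55000  if profile_gross_personal == 11
replace income_mid = 65000  if profile_gross_personal == 12
replace income_mid = 85000  if profile_gross_personal == 13
replace income_mid = 100000 if profile_gross_personal == 14

* Cross-check the mapping against the original codes
tabulate profile_gross_personal income_mid, missing
```

Working on the numeric codes directly would skip the tostring step, and starting from missing rather than 0 means no observations have to be dropped just to remove the placeholder.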
Does someone know a way around this problem?
My first guess was that the error occurred because income is a float with unusual values, since I thought floats are normally only used for dummies. Hence, I used the following command:
recast int income, force
However, this cuts off all values of income above 32500. So this workaround did not work, and it did not solve the r(2000) problem.
So my question is: does anyone know a way to overcome the r(2000) error, or a solution to the initial problem that the variable appears to be stored as a string but is classified as an int? I also got some strange classifications in other variables, so this r(2000) error appears more often.
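A possible explanation for the cut-off, as far as I understand Stata's storage types: an int can only hold integers from -32,767 to 32,740, so a forced recast sets anything larger to missing. A minimal sketch of a check that could be run in place of the forced recast, storing the midpoints in a long instead (income_long is a hypothetical name):

```stata
* Run instead of "recast int income, force"
describe income                      // shows the current storage type of income
generate long income_long = income   // long holds integers up to roughly 2.1 billion
summarize income income_long         // min and max should be identical for both
```

This only addresses the lost values; it is not meant as a fix for the r(2000) error itself.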
Thank you in advance!
Kind regards