Dear Stata users,

I am trying to create two dummy variables that satisfy a few conditions, but for some reason it does not always work. For some years it works perfectly, but for others the dummy should equal 1 and it does not.

Here are the steps: first, I compute the 25th and 75th percentiles, which become my thresholds. Then, I create the two dummy variables from conditions on those thresholds.

Here is the code:

```stata
* 75th and 25th percentiles of diffKL_jw_mean within each prod-year, among PTA_j == 1
bysort prod year: egen p75KL = pctile(diffKL_jw_mean) if PTA_j == 1, p(75)
bysort prod year: egen p25KL = pctile(diffKL_jw_mean) if PTA_j == 1, p(25)

* extKLyears = 1 if the observation is at or above the 75th percentile
gen extKLyears = (p75KL <= diffKL_jw_mean) if PTA_j == 1 & KL_w != . & KL_j != .
* intKLyears = 1 if the observation is at or below the 25th percentile
gen intKLyears = (p25KL >= diffKL_jw_mean) if PTA_j == 1 & KL_w != . & KL_j != .
```

In a small dataset using the same figures I get from previous calculations, the code above works perfectly:

```stata
clear
input prod str3 j year diffKL_jw_mean PTA_j KL_w KL_j
1 "KEN" 2000 2.9516403 1 28022 1654
1 "UGA" 2000 3.2742886 1 28022 1029
1 "TZA" 2000 3.7610051 1 28022 668
end
```

However, in my big dataset (in which the previous calculations were made), I should find, as in this small dataset, that extKLyears = 1 for the observation j = "TZA" and year = 2000, but that is not the case. I am also confused by the fact that this happens for some years but not all of them.

After some checking, I think the problem comes from the number formats: in my big sample, where I do all the calculations, the format of the variable "diffKL_jw_mean" is %10.0g while that of "p75KL" is %9.0g (in the small dataset above, both variables have the same format).
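One detail that may be relevant, offered as an assumption to check rather than a confirmed diagnosis: %9.0g and %10.0g are Stata's default display formats for the `float` and `double` storage types, so the differing formats likely mean that diffKL_jw_mean is stored as `double` while p75KL, created by `egen` without a type, is stored as `float`. The display format alone does not change comparisons, but the storage type does. In the small example, `input` creates numeric variables as `float` by default, which would explain why the formats match there. The sketch below uses a hypothetical variable `x` holding the TZA value to show that a double and its float copy need not compare equal; if the float rounding goes upward, a test such as `p75KL <= diffKL_jw_mean` can fail for the very observation that defines the percentile.

```stata
* Hypothetical illustration: compare a double with its float copy
clear
set obs 1
generate double x = 3.7610051   // stored in double precision (default format %10.0g)
generate float px = x           // rounded to float precision (default format %9.0g)
describe x px                   // shows storage types and display formats

display x == px                 // typically 0: the values differ in the trailing digits
display float(x) == px          // 1: both sides compared at float precision
display %21.0g x                // show the full stored value of each variable
display %21.0g px

* A possible fix to try (assumption, not tested on the real data):
* create the thresholds in double precision so they match diffKL_jw_mean
* bysort prod year: egen double p75KL = pctile(diffKL_jw_mean) if PTA_j == 1, p(75)
* bysort prod year: egen double p25KL = pctile(diffKL_jw_mean) if PTA_j == 1, p(25)
```

Running `describe diffKL_jw_mean p75KL p25KL` on the big dataset would confirm or rule out the storage-type difference before changing anything.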

Could this be the problem, and if so, how can I fix it?

Thank you in advance!

## Comment