  • marie nour
    started a topic Why dummy variable is not correct?

    Why dummy variable is not correct?

    Dear Stata users,

    I am trying to create two dummy variables subject to a few conditions, but it does not work consistently. For some years it works perfectly, but for others the dummy should equal 1 and it does not.

    Here are the steps: first, I compute the 25th and 75th percentiles, which serve as my thresholds. Then I create two dummy variables from conditions on those thresholds.
    Here is the code:

    bysort prod year : egen p75KL=pctile(diffKL_jw_mean) if PTA_j==1, p(75)
    bysort prod year : egen p25KL=pctile(diffKL_jw_mean) if PTA_j==1, p(25)
    gen extKLyears = (p75KL <= diffKL_jw_mean) if PTA_j ==1 & KL_w !=. & KL_j !=.
    gen intKLyears = (p25KL >= diffKL_jw_mean) if PTA_j ==1 & KL_w !=. & KL_j !=.
    In a small dataset using the same figures I get from previous calculations, the above code works perfectly:
    clear
    input prod str3 j year diffKL_jw_mean PTA_j KL_w KL_j
    1 "KEN" 2000 2.9516403 1 28022 1654
    1 "UGA" 2000 3.2742886 1 28022 1029
    1 "TZA" 2000 3.7610051 1 28022 668
    end
    However, using my big dataset (in which previous calculations were made), I should find, like in this small dataset, that extKLyears =1 for the observation j="TZA" and year =2000, but it is not the case. I am also confused by the fact that it happens for some years but not all of them.

    After some checking, I suspect that the problem comes from the number formats: in my big dataset, where I do all the calculations, the display format of "diffKL_jw_mean" is %10.0g while that of "p75KL" is %9.0g (in the small dataset I created above, both variables have the same format).
    Could this be the problem, and how can I fix it?

    Thank you in advance!

  • marie nour
    Thank you for your reply.
    It was in fact a precision problem in Stata. It was fixed by using egen double to create the percentile variables.
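
    A minimal sketch of that fix, reusing the three sample rows from the original post. It assumes that diffKL_jw_mean is stored as a double in the big dataset (the %10.0g default format suggests this, but it is an assumption), so the sample is read in as double here to mimic that. The !missing() conditions are just a restatement of the original KL_w !=. & KL_j !=. tests.

    ```stata
    clear
    * Read diffKL_jw_mean as double to mimic the (assumed) storage type in the big dataset
    input prod str3 j year double diffKL_jw_mean PTA_j KL_w KL_j
    1 "KEN" 2000 2.9516403 1 28022 1654
    1 "UGA" 2000 3.2742886 1 28022 1029
    1 "TZA" 2000 3.7610051 1 28022 668
    end

    * Store the percentiles as double so they are not rounded to float precision
    bysort prod year: egen double p75KL = pctile(diffKL_jw_mean) if PTA_j == 1, p(75)
    bysort prod year: egen double p25KL = pctile(diffKL_jw_mean) if PTA_j == 1, p(25)

    * Same dummies as in the original post, now comparing double with double
    gen byte extKLyears = (p75KL <= diffKL_jw_mean) if PTA_j == 1 & !missing(KL_w, KL_j)
    gen byte intKLyears = (p25KL >= diffKL_jw_mean) if PTA_j == 1 & !missing(KL_w, KL_j)

    list j diffKL_jw_mean p75KL p25KL extKLyears intKLyears
    ```

    Without double, egen creates a float by default. A percentile that coincides with an observed value is then rounded to float precision and may end up slightly above or slightly below the double it came from, which would explain why the comparison fails for some years and not others.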

  • Phil Bromiley
    You need to follow the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output, and sample data using dataex. We need to be able to replicate the problem to really help you.

    You're calculating things by product year. Could the large dataset include different observations for the product-year you are looking at?

  • Nick Cox
    That's a difference in display format only. It's not responsible for apparent differences in values.

    But I don't think we've got anywhere near enough information to be positive about whether this is even a real problem.
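
    To illustrate the distinction between display format and storage type, here is a small sketch. The variables x and y are hypothetical stand-ins for diffKL_jw_mean and p75KL, and it assumes the default storage type for new variables is float (that is, set type has not been changed).

    ```stata
    clear
    set obs 1
    gen double x = 3.7610051   // hypothetical stand-in for diffKL_jw_mean
    gen y = x                  // float by default, so the value is rounded

    describe x y               // reports both storage type and display format

    format y %10.0g            // changes how y is displayed, not what is stored
    count if y <= x            // expected to count 0: the float copy is rounded up
    ```

    Changing the format leaves the comparison unchanged; only the storage type matters. Because %9.0g and %10.0g are the default formats for float and double respectively, differing formats can still be a hint that the storage types differ, which describe confirms.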