  • Adding a minimum-observations requirement

    I have several questions; I hope someone can help me with some or all of them:
    1. I would like to add a requirement to my dataset that every company has at least 5 observations (5 years of data). I want Stata to drop the data for the companies that do not meet this requirement. How can I add such a requirement?
    2. How can I require in Stata that all control variables be non-missing? The command drop if missing (variables) is not giving me the desired outcome; is there another way?
    3. I would like a dummy variable that equals 1 if variable X or/and variable Y is greater than 0, and equals 0 if variable X AND variable Y are both not greater than 0. How can I include two variables in the dummy condition, or how can I combine two dummies into one? I did gen newvariable = (variable>0) for each variable separately, but I would like to have only 1 dummy.
    4. Also about the dummies: when there is a missing value, I would like the dummy to be missing too. However, if I use the command gen newvariable = (variable>0), all missing values result in dummy = 1. I already found the command gen newvariable = (variable>0) if variable< but that does not work; Stata gives me "invalid syntax".
    I hope somebody can help me with this. Thanks in advance!

  • #2
    I'm not sure that the way you want to handle missing data and so forth is a good idea, but it can be done. You didn't tell us anything about the structure of your data, which is generally necessary if you want a good answer. I'd encourage you to re-read the Statalist FAQ for new users, paying particular attention to the discussion of using the -dataex- command to provide example data. With those qualifications, here are some suggestions:

    Question 1:
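    Code:
    * _N is the number of observations within each company under -bysort-
    bysort Your_CompanyID_Variable: gen byte EnoughData = (_N >= 5)
    * Either drop the companies with fewer than 5 observations ...
    drop if !EnoughData
    * ... or keep them and restrict each analysis: Run_Your_Command if EnoughData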
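    The -bysort- approach counts every observation a company has, including observations with missing values. If the 5 years must also be complete (Question 2), a minimal sketch is to drop incomplete observations first, so that _N counts only usable years. It assumes one observation per company-year and uses hypothetical variable names (company, year, y, x1) and made-up values; -isid- stops with an error if company and year do not uniquely identify the observations.

```stata
* Hypothetical toy panel: company, year, outcome y, control x1
clear
input company year y x1
1 2001 1.0 2
1 2002 1.5 .
1 2003 2.0 3
1 2004 2.5 4
1 2005 3.0 5
1 2006 3.5 6
2 2001 0.5 1
2 2002 0.7 2
2 2003 . 3
2 2004 0.9 4
2 2005 1.1 5
end

* Check the assumption of one observation per company-year (errors out otherwise)
isid company year

* Drop incomplete observations first, so only usable years are counted
drop if missing(y, x1)

* _N is now the number of complete years for each company
bysort company: gen byte EnoughData = (_N >= 5)
drop if !EnoughData
```

    The order matters: if the 5-observation rule is applied first and missing values are dropped afterwards, some companies can end up with fewer than 5 complete years.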
    Question 2:
    You didn't show us what your "... if missing(variables)" command actually was, so we can't know what you might have done wrong. And, we don't know what your "desired outcome" was, so that makes the difficulty worse.
    Code:
    drop if missing(var1, var2, var3, ..... )
    should have dropped every observation in which any of the listed variables is missing.
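    A sketch of two equivalent ways to express this rule, using hypothetical numeric controls x1, x2 and x3: -missing()- is a function and needs commas between its arguments, whereas -markout- takes an ordinary space-separated variable list. Both treat extended missing values such as .a as missing.

```stata
* Hypothetical data: three numeric controls, some missing
clear
input x1 x2 x3
1 2 3
. 2 3
1 . 3
1 2 .a
4 5 6
end

* missing() needs commas between its arguments
generate byte any_missing = missing(x1, x2, x3)

* mark sets complete = 1 in every observation;
* markout resets it to 0 where any listed variable is missing
mark complete
markout complete x1 x2 x3

* The two approaches agree
assert any_missing == !complete

drop if !complete
```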

    Question 3:
    Your description of the desired logical condition is ambiguous, as "or/and" can be read in more than one way; I have taken it as an inclusive "or". The following will assign 0 if both variables are <= 0, 1 if either of them is > 0, and missing if X or Y is missing.
    Code:
    gen byte XY_dummy = (X > 0) | (Y > 0) if !missing(X, Y)
    It sounds like you might think that logical conditions in Stata can only refer to one variable, and it also sounds like you might be confused about "&" and "|" in Stata. See -help operator- and section 13.2.3 of the PDF documentation for some clarification.
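    A sketch with made-up values that also shows how to combine two dummies you have already created (X_pos and Y_pos are illustrative names). Building each component dummy with -if !missing()- keeps missing values missing when the two are combined with "|".

```stata
* Hypothetical data: X and Y, including missing values
clear
input X Y
 1  0
-1  2
 0 -3
 .  5
 2  .
end

* Directly from X and Y: 1 if either is positive, 0 if neither, missing if either is missing
generate byte XY_dummy = (X > 0) | (Y > 0) if !missing(X, Y)

* Combining two dummies that already exist
generate byte X_pos = (X > 0) if !missing(X)
generate byte Y_pos = (Y > 0) if !missing(Y)
generate byte XY_dummy2 = (X_pos | Y_pos) if !missing(X_pos, Y_pos)

* Both routes agree, including on the missing observations
assert XY_dummy == XY_dummy2
list
```

    This treats an observation as missing whenever X or Y is missing, even if the other one is positive; whether that is the right rule depends on the research question.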

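    Question 4:
    Because Stata treats a numeric missing value as larger than any nonmissing number, (variable > 0) evaluates to 1 for missing observations. Those observations must therefore be excluded explicitly, e.g. by adding -if !missing(variable)- to the -generate- command. See -help missing-.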
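    A small demonstration with made-up values. The command in the question failed because "if variable<" ends without a right-hand side; "if variable < ." completes it and, since all missing codes (., .a, ..., .z) are >= ., it is equivalent to "if !missing(variable)" for a numeric variable.

```stata
* Hypothetical variable with one missing value
clear
input variable
 3
-2
 .
end

* Missing sorts above every number, so this gives 1 for the missing row
generate byte naive = (variable > 0)

* Restrict generate to non-missing observations; the rest stay missing
generate byte dummy = (variable > 0) if !missing(variable)

* Equivalent qualifier with the comparison completed
generate byte dummy2 = (variable > 0) if variable < .

list
```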
