  • when to delete missing observations in controls when calculating industry measures

    Dear Statalist,

    I’m running a panel regression on a data set with financial data using industry measures as control variables (e.g. industry growth rate measured by sales, industry ROA). Since I have missing data in variables other than those which I use for calculating the industry measures, I’m wondering if I should drop them BEFORE or AFTER calculating the industry variables.
    If I calculated the industry variables after deleting all cases with missing data, I would lose some information because some of the dropped cases might have influenced the level of the industry ROA or growth rate, right?
    On the other hand, if I calculated them before deleting, I would use cases for calculating the industry measures even though they won’t be used in the actual regression analysis. This seems a bit strange to me, too.

    Can you help me with what would be the appropriate procedure? Is there a guideline?

    Thank you!
    Best, Kathie

  • #2
    Kathie:
    welcome to the list.
    In general, deleting observations with missing values is not the way to go, as you are likely to end up with a biased sample (missingness might be informative: some industries might be more reluctant than others to provide balance sheets that give a true and fair view of the way the business is managed).
    On top of that, Stata's estimation commands automatically apply listwise deletion, omitting every observation with a missing value in any variable used in the model.
    The fix might be to impute those missing values (please, see -help mi-).
    Another option might be -help ipolate-.
    Kind regards,
    Carlo
    (Stata 18.0 SE)
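
    As a rough sketch of the commands mentioned above, assuming hypothetical variable names (firm_id, year, roa, sales, leverage) that do not come from the original post, one might first inspect the missingness and then try linear interpolation within each firm:

    ```stata
    * Hypothetical variable names; adjust to the actual data set.

    * How many values are missing in each variable
    misstable summarize roa sales leverage

    * Which combinations of variables tend to be missing together
    misstable patterns roa sales leverage

    * Linear interpolation of roa over time, separately for each firm.
    * Only gaps between observed years are filled; the epolate option
    * would additionally extrapolate beyond the observed range.
    bysort firm_id: ipolate roa year, generate(roa_ip)
    ```

    Interpolation of this kind only makes sense for variables that evolve smoothly over time, so it is an assumption to be checked rather than a general fix.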



    • #3
      Hi Carlo!
      Thanks for your reply.
      I know that imputing the missing values would be the "cleanest" way to fix it. However, my supervisor told me not to use imputation and to either drop them or let Stata exclude them listwise from the regression (as you said).
      So if I can't use imputation, would it be better to not delete missing observations and calculate the variables on as many cases as possible, even if not all of them go into the regression analysis?

      Kind regards,
      Kathie



      • #4
        Kathie:
        if you cannot impute the missing values, the cleanest fix is to let Stata use listwise deletion.
        This approach does not require any strategic choice to be made, but it comes at the cost of reducing (sometimes even halving) your original sample size, with possible consequences for your results.
        Kind regards,
        Carlo
        (Stata 18.0 SE)
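
        A minimal sketch of that workflow, assuming hypothetical variable names (firm_id, year, industry, roa, y, leverage, size) that are not from this thread: the industry measure is built on all firms with non-missing roa, the regression then drops incomplete cases on its own, and e(sample) marks which observations were actually used. Whether the industry measure should instead be restricted to the estimation sample remains a judgment call; the last lines only make the two versions comparable.

        ```stata
        * Hypothetical variable names; adjust to the actual data set.
        xtset firm_id year

        * Industry-year mean ROA, computed on every observation with non-missing roa
        * (egen's mean() ignores missing values of roa)
        bysort industry year: egen ind_roa = mean(roa)

        * No manual drop: observations with a missing value in any
        * model variable are excluded from estimation automatically
        xtreg y ind_roa leverage size i.year, fe

        * Flag the observations that entered the regression
        generate byte in_reg = e(sample)

        * For comparison only: the same industry measure restricted
        * to the estimation sample
        bysort industry year: egen ind_roa_es = mean(roa) if in_reg
        summarize ind_roa ind_roa_es if in_reg
        ```

        If the two versions of the industry measure differ noticeably, that is a sign that the excluded firms matter for the industry averages.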
