  • How to cut data and then run in each category?

    Hi

    I have a dataset that gathers information about firms (number of employees, turnover, sales, long-term debt, etc.).
    For each year, I want to divide sales into categories (let's say every 100 000 euros) and look at the mean debt in each of these categories. So the firms have to be divided into subsets depending on their sales so that I can compute the mean debt and compare it with the mean of the other groups.
    Here is what I did, but the results are not what I should get when I look at a subset:

    forvalues i=2004/2009 {
    egen sales_cat_`i'=cut(sales) if sales<100000, at(-31000, -30900 to 100000)
    foreach x in sales_cat_`i' {
    egen mean_LT_debt_`i'=mean(long_term_debt)
    gen mean_lt_debt_to_sales_cat_`i'=(mean_LT_debt_`i')/sales_cat_`i'
    clear
    }
    }

    Where did I go wrong?

    Manon

  • #2
    Welcome to Statalist.

    This may be closer to what you want.
    Code:
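    forvalues i=2004/2009 {
    * Bin sales below 100000 into intervals of width 100, starting at -31000
    egen sales_cat_`i'=cut(sales) if sales<100000, at(-31000, -30900 to 100000)
    * Mean long-term debt within each sales category
    bysort sales_cat_`i': egen mean_LT_debt_`i'=mean(long_term_debt)
    * Ratio of the category mean debt to the category's lower sales bound
    gen mean_lt_debt_to_sales_cat_`i'=(mean_LT_debt_`i')/sales_cat_`i'
    }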
    But without sample data I have no way of knowing whether the results will make sense, or whether the missing values in the sales_cat variable for sales>=100000 will cause problems.
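
    As a minimal sketch of an alternative to building one category variable per year, the grouping can be done once with both the year and the sales bin in the by-list. This assumes that a variable named year exists in the data and that the intended bin width is 100 000 euros; neither is confirmed in the original post, and sales_bin, cell_mean_debt, mean_debt and n_firms are illustrative names only.

    ```stata
    * Assumption: a variable named year exists; it is not shown in the original post.
    * Assumption: bins are 100,000 euros wide; change the local to use another width.
    local width 100000

    * Label each bin by its lower bound; observations with missing sales stay missing.
    gen double sales_bin = `width' * floor(sales / `width') if !missing(sales)

    * Mean long-term debt within each year-by-bin cell, attached to every firm.
    bysort year sales_bin: egen cell_mean_debt = mean(long_term_debt)

    * Compact comparison table: one row per year and bin, with the firm count.
    preserve
    collapse (mean) mean_debt = long_term_debt (count) n_firms = long_term_debt ///
        if !missing(sales_bin), by(year sales_bin)
    list, sepby(year)
    restore
    ```

    Listing the firm count next to each mean makes it easier to spot cells where the mean rests on only a handful of firms.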

    Please review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. The more you help others understand your problem, the more likely others are to be able to help you solve your problem. It would be particularly helpful to post a small hand-made example showing the data before the process and how you expect it to look after the process. In particular, please read FAQ #12 and use dataex and CODE delimiters when posting to Statalist.

    Also, Section 12.1 is particularly pertinent:

    Never say just that something "doesn't work" or "didn't work", but explain precisely in what sense you didn't get what you wanted.



    • #3
      I'll just jump in to say that this entire approach may well be misguided. Taking a continuous variable and creating categories from it by imposing cutpoints just throws away information. Also, the proposed categorization says that two firms with sales of -30899 and -30801 are essentially the same but two firms with sales of -30899 and -30901 are radically different. There are occasional circumstances where something abrupt really does happen at a cutpoint, but they are pretty rare in nature.

      Whatever you're planning to do with this, I suspect you would get more useful and meaningful results by just keeping sales as a continuous variable and using it directly.
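
      As a rough illustration of what using sales directly could look like, here is a sketch under the assumption that a variable named year exists (it is not shown in the original post). The choice of a linear regression and a lowess smoother is mine, for illustration; other models may suit the data better.

      ```stata
      * Assumption: a variable named year exists; it is not shown in the original post.
      * Linear association between long-term debt and sales, with year indicators.
      regress long_term_debt c.sales i.year

      * Smoothed relationship, one panel per year, with no cutpoints imposed.
      twoway (scatter long_term_debt sales) (lowess long_term_debt sales), by(year)
      ```

      The graph shows how mean debt moves with sales across the whole range, which is the comparison the binned means were meant to approximate.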
