Creating dummy variable (median) in panel

Sven Elst

Join Date: Jul 2016

Posts: 18
#1

11 Jul 2016, 10:47

Hi everyone,

My dataset contains observations for countries and industries over 10 years. For every industry I have a measure of its performance, value added growth (per year and per country). What I want to do is create a (dummy) variable that divides "Value added growth" into 7 industries below and 7 industries above the median for the whole panel (not per country).

The end result should be something like: Value added growth (j,k,t) = A*(below median) + B*(above median). I found a similar problem on the forum (link below), but the answer does not really solve my problem.

http://www.stata.com/statalist/archi.../msg00097.html

Thanks in advance!
Jesse Wursten

Join Date: Jan 2016

Posts: 915
#2

11 Jul 2016, 11:19

Code:

egen median = median(va_growth)
gen below_median = va_growth < median
gen above_median = va_growth > median

That should work? (Replace va_growth with the name of your value added growth variable.) Note that observations exactly at the median are not allocated to either group.
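One caveat worth illustrating: Stata treats a missing numeric value as larger than any number, so an unqualified va_growth > median evaluates to 1 for observations with missing growth. The sketch below is only an illustration on made-up data (the five values are invented; va_growth is the placeholder name from the code above) and adds an if !missing() qualifier so such rows stay missing in both dummies.

```stata
* Toy data, invented for illustration: four growth values and one missing
clear
input va_growth
0.01
0.02
0.03
0.04
.
end

* Median over all non-missing observations (0.025 here)
egen median = median(va_growth)

* Without the qualifier, the missing row would be flagged as above the median
gen byte below_median = va_growth < median if !missing(va_growth)
gen byte above_median = va_growth > median if !missing(va_growth)

list
```

With the qualifier, the row with missing growth gets missing values in both dummies instead of being counted as above the median.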
William Lisowski

Join Date: Dec 2014

Posts: 10150
#3

11 Jul 2016, 11:37

As suggested before, your questions would benefit from attention to the Statalist FAQ material in sections 9-12. It would be particularly helpful to post a small hand-made example, with just a few countries, industries, and years, showing the data and what you expect the results to be. In particular, please read FAQ #12 and use dataex and CODE delimiters when posting to Statalist.

Having said that, in the absence of a clearer statement of the problem, I'll make one up myself.

We will assume that i indexes your industries (apparently you have 14 of them), c indexes your countries, and t indexes your time (years). Let's now assume that G(i,c,t) is the value added growth for industry i in country c in year t.

Now, I assume your data is in long format, as advised in a previous thread, with one observation of the variable G for each combination of the variables i, c and t.

Now, I'm lost. I simply cannot figure out what steps you want to take next. You apparently need to create some sort of summary measure within each industry i and year t, call it Git, summarizing G for that industry and year across all countries. Then you apparently need to take the median of Git within each year; call it Gim. Then you want to compare Git to Gim. But without knowing what you think Git should be, I cannot proceed.

It wasn't intended this way when I started writing, but this is a demonstration of why your questions have not gotten a strong response. Too much is left to the reader's imagination.

Added in edit: this crossed with post #2. Perhaps it is that simple, although when you write that you want the median for the whole panel, I assumed you meant for each year separately.
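For readers who want to see the per-year reading spelled out, here is a minimal sketch under one loud assumption: that the summary measure Git is the mean of G across countries, which the thread does not confirm. The toy data are generated, not real, and the names G_it, G_m and above are hypothetical.

```stata
* Toy balanced panel, invented: 4 industries (i), 2 countries (c), 2 years (t)
clear
set obs 16
gen i = mod(_n - 1, 4) + 1
gen c = mod(floor((_n - 1) / 4), 2) + 1
gen t = floor((_n - 1) / 8) + 1
gen G = i / 100 + c / 1000

* Work on a collapsed copy: one row per industry-year
preserve
* ASSUMPTION: G_it is the mean of G across countries
collapse (mean) G_it = G, by(i t)
* Median of the industry-level values within each year
bysort t: egen G_m = median(G_it)
gen byte above = G_it > G_m if !missing(G_it)
tempfile flags
save `flags'
restore

* Bring the industry-year flags back to the full panel
merge m:1 i t using `flags', nogenerate
tabulate i above
```

Collapsing first means every industry counts once per year when the median is taken, whether or not each industry has the same number of country observations.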

Last edited by William Lisowski; 11 Jul 2016, 11:40.
Sven Elst

Join Date: Jul 2016

Posts: 18
#4

12 Jul 2016, 04:05

Hey guys, thank you for the replies.

Thanks to William's posts, I finally have my model the way I want it! However, the problem of dividing the data remains. I tried Jesse's suggestion but ended up with 2 dummy variables that only divide "value added growth". I included a sample this time so you guys can see what I mean.

The sample includes:

9 countries (C_id),
9 years (t),
10 industries (3***),
and the variable I want to divide, value added growth (gr_ind). id was created with egen id = group(country industry).

Now, maybe my question was a bit ambiguous last time, so I will try to rephrase. What I want (if possible) is to create 2 new variables: 1 that flags the 5 industries above median growth, and 1 that flags the 5 industries below median growth. The division should be the same for all countries and across time. The end result should be something like:

T  industry  country   gr_ind  above  below
1  ind1      Bulgaria  value   1      0
1  ind2      Bulgaria  value   1      0
1  ind3      Bulgaria  value   1      0
1  ind4      Bulgaria  value   1      0
1  ind5      Bulgaria  value   1      0
1  ind6      Bulgaria  value   0      1
1  ind7      Bulgaria  value   0      1
1  ind8      Bulgaria  value   0      1
1  ind9      Bulgaria  value   0      1
1  ind10     Bulgaria  value   0      1

In this case I have the 10 industries divided per industry, which should be the same for every other country in the sample. If this is not possible, I would also like to know.

With kind regards,

Sven

Last edited by Sven Elst; 12 Jul 2016, 04:08.
Sven Elst

Join Date: Jul 2016

Posts: 18
#5

12 Jul 2016, 04:07

Forgot to include the sample
Attached Files

sample.dta (22.0 KB, 2 views)
Jesse Wursten

Join Date: Jan 2016

Posts: 915
#6

12 Jul 2016, 04:14

Which median exactly do you mean? If we take the sample data, we have 9 countries x 10 industries x 9 years = 810 growth observations. How exactly do you want to rank these to find the median? Given that you want to divide your sample by industry, I assume you want to create a single growth value per industry first? Presumably the mean across years and countries.

This would give you 10 values, one per industry. Is the median you're looking for the median value of this group of 10?

By the way, please refer to point 12 in the FAQ about using the dataex command to include a sample. Dataex generates a chunk of text that can be copied directly into Stata, such that we can work with a sample of the data directly, making coding a bit easier.

Sven Elst

Join Date: Jul 2016

Posts: 18
#7

12 Jul 2016, 04:36

Dear Jesse,

Thank you for the quick reply. "This would give you 10 values, one per industry. Is the median you're looking for the median value of this group of 10?" I think what you suggested might actually be the best option. In that case I would have growth per industry and could then divide at the median to create the dummies I need. Could you perhaps post the commands needed to achieve this? It would be much appreciated!
Jesse Wursten

Join Date: Jan 2016

Posts: 915
#8

12 Jul 2016, 05:50

If your panel is balanced, I think you can get away with using

bysort industry: egen mean_growth = mean(gr_ind)
egen median_growth = median(mean_growth)
gen large = mean_growth > median_growth

           |         large
  industry |         0          1 |     Total
-----------+----------------------+----------
      3110 |        81          0 |        81
      3210 |        81          0 |        81
      3220 |        81          0 |        81
      3310 |        81          0 |        81
      3410 |         0         81 |        81
      3420 |         0         81 |        81
      3710 |        81          0 |        81
      3810 |         0         81 |        81
      3820 |         0         81 |        81
      3825 |         0         81 |        81
-----------+----------------------+----------
     Total |       405        405 |       810
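The balance condition matters because egen's median() runs over every row, so each industry is weighted by how many observations it has. If the panel were unbalanced, one possible variant (a sketch only, not the code used in this thread) is to tag one row per industry and take the median over those rows. The toy data below are invented; the variable names follow the thread, while one_per_ind and the scalar med are hypothetical.

```stata
* Toy unbalanced panel, invented: industries have 3, 2, 2 and 1 rows
clear
input industry country t gr_ind
3110 1 1 0.01
3110 1 2 0.03
3110 2 1 0.02
3210 1 1 0.05
3210 1 2 0.07
3410 1 1 0.10
3410 2 1 0.12
3420 1 1 0.20
end

* One mean growth value per industry, repeated on every row of that industry
bysort industry: egen mean_growth = mean(gr_ind)

* Tag exactly one row per industry so each industry counts once
egen byte one_per_ind = tag(industry)

* Median of the 4 industry means rather than of all 8 rows
summarize mean_growth if one_per_ind, detail
scalar med = r(p50)

* Dummy for industries above the median of industry means
gen byte large = mean_growth > scalar(med) if !missing(mean_growth)

tabulate industry large
```

In a balanced panel both approaches give the same split, since every industry contributes the same number of rows.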
Sven Elst

Join Date: Jul 2016

Posts: 18
#9

12 Jul 2016, 06:10

The panel is strongly balanced, so thanks! You have been such a great help. This was the last piece of the puzzle; finally on to reporting some results. Thanks to everyone who replied to my post. I know you guys have been patient with me!

kind regards,

Sven
Nick Cox

Join Date: Mar 2014

Posts: 35725
#10

12 Jul 2016, 06:37

Good you solved the problem, but next time people will expect more attention to the points raised by William in #3.