Showing distribution of groups

Eliza Brown

Join Date: Mar 2015

Posts: 4
#1

21 May 2015, 03:38

I am dealing with a huge amount of data, and I want to show that the distribution of students per class is not "normal". To give you an example:

If a school has 90 students who need to be distributed over multiple classes, for example 3, you would expect each class to have 30 students. So class 1 has 30 students, class 2 has 30 students and class 3 has 30 students. But what I see in my data is that class 1 has 45 students, class 2 has 15 students and class 3 has 30 students. So this does not follow a normal distribution.

Because I have 992 schools for which I want to show this lack of normal distribution, I would like to ask how I can show this in a graph/diagram/table so that it is still easy to read. Not all schools have 3 classes; some have only 1 and others have 6, because enrollment differs between schools.

I am completely new to Stata and do not really have an idea how to show this. Any help is welcome. I know I am at the beginner's level, so if you can help me out, it's much appreciated!
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

21 May 2015, 04:27

Hello Eliza,

I don't know if I understood your query correctly. It depends on your study design, but you may have a "lack of balance" between clusters rather than a departure from the normality assumption.

That said, to get a view of the distribution of the number of students by school, maybe you could type:

Code:

. by school, sort: summarize students, detail

You'll get the mean, the median and other percentiles (including the quartiles, from which the interquartile range follows), skewness and kurtosis.

Best,

Marcos

Eliza Brown

Join Date: Mar 2015

Posts: 4
#3

21 May 2015, 05:47

Hi Marcos,

Thank you very much for your reply. I tried the suggested code, and it looks like something that could be useful. However, it shows the information about class sizes for each school separately. I am hoping for a table/graph/regression that can show it for all the schools together. I therefore tried "by classid, sort: summarize classize, detail", where classid is 1 2 3 for the class and classize is the size of the class.

It is interesting information, but it does not show that the enrollment of students in schools is not equally divided over classes. Thanks for your help again!
Andrew Musau

Join Date: Oct 2014

Posts: 10173
#4

21 May 2015, 06:15

Assuming that your data is as follows

Code:

     +--------------------------+
     | class1   class2   class3 |
     |--------------------------|
  1. |      1        2        5 |
  2. |      2        3        5 |
  3. |      3        4        5 |
  4. |      1        4        5 |
     +--------------------------+

where each observation represents the number of students in a given class (class 1 - class 3), what you want to show is that the number of students is not uniform across classes. You can use a bar graph to show this.

Code:

graph bar (mean) class1 (mean) class2 (mean) class3

Otherwise, show us how your data is structured if this does not work.
Andrew Musau

Join Date: Oct 2014

Posts: 10173
#5

21 May 2015, 06:23

Sorry, I neglected to read the entire description. My suggestion is to pick a sample of the 992 schools (since it is not feasible to draw 992 graphs). You can group classes by region, for example, and plot mean class size by region. Alternatively, pick the average class for each region and plot this. The main point is to select a sample out of your sample, which should still get your point across.

You should use average class size per school, so that you have one observation for each school. Generally, the fact that class sizes are not uniform across schools would come as no surprise to anyone. There are a lot of factors that affect the size of a class: e.g., population density (densely populated areas would on average have bigger classes), level of investment in schooling (higher investments result in recruitment of more teachers and smaller class sizes), etc. So you need a regression framework to control for all these factors before you can draw any conclusions.
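As a minimal sketch of getting one observation per school, assuming the data hold one observation per class with the variable names c_size (class size) and schlcode (school code) that appear later in this thread; the new names mean_size, sd_size and n_classes are made up for illustration:

```stata
* keep the original class-level data; restore brings it back afterwards
preserve

* reduce to one observation per school: mean and SD of class size,
* plus the number of classes with a nonmissing size
collapse (mean) mean_size = c_size (sd) sd_size = c_size ///
    (count) n_classes = c_size, by(schlcode)

summarize mean_size sd_size n_classes

restore
```

Schools with a single class get a missing sd_size, since a standard deviation needs at least two values.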

Eliza Brown

Join Date: Mar 2015

Posts: 4
#6

21 May 2015, 11:18

Thanks for your advice, Andrew! Here's a copy of my data. As you can see, the schools are marked with a code, and I have 992 different schlcodes in my dataset.

Indeed, I can show the different class sizes of each school separately with a histogram or bar chart. But when I do that with 992 schools, it is hard to draw any conclusion from it. I am therefore looking for a way to easily show that schools do not distribute students equitably between classes.

1 Photo
Andrew Musau

Join Date: Oct 2014

Posts: 10173
#7

21 May 2015, 11:47

One way to narrow down is by towns. Can you check how many distinct towns you have in your dataset?

Code:

bys townid: gen n_towns = _n == 1
count if n_towns
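A similar count can be sketched with egen's tag() function, using the same townid variable (the name town_tag is hypothetical). Note one difference: tag() does not tag observations with missing townid unless the missing option is given, whereas the by: approach treats missing as a group of its own.

```stata
* flag exactly one observation per distinct town, then count the flags
egen town_tag = tag(townid)
count if town_tag
```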
Nick Cox

Join Date: Mar 2014

Posts: 35589
#8

21 May 2015, 12:03

You can do analyses like this:

Code:

egen mean = mean(c_size), by(schlcode)
egen sd = sd(c_size), by(schlcode)
egen tag = tag(schlcode)
histogram sd if tag
histogram mean if tag

The idea is to get interesting summaries for each school and then include each school just once.
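One more summary in the same spirit, offered only as a sketch: compare each class with the size it would have if the school split its students equally, which is just the school's mean class size. This assumes one observation per class with the c_size and schlcode variables used above; the names n_classes, expected, pct_dev, max_absdev and one_per_school are made up for illustration.

```stata
* class size under an equal split = mean class size within the school
bysort schlcode: gen n_classes = _N
egen expected = mean(c_size), by(schlcode)

* percentage deviation of each class from the equal-split size
gen pct_dev = 100 * (c_size - expected) / expected

* largest absolute deviation within each school
egen max_absdev = max(abs(pct_dev)), by(schlcode)

* one observation per school; skip schools with a single class
egen one_per_school = tag(schlcode)
histogram max_absdev if one_per_school & n_classes > 1
```

For the school in the first post (classes of 45, 15 and 30), the equal-split size is 30, the deviations are 50%, -50% and 0%, and max_absdev is 50. A histogram piled up near zero would suggest that most schools split students roughly equally.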

Nothing stops you starting with a synoptic view:

Code:

histogram c_size, discrete

Note: Please read FAQ Advice about using CODE delimiters as here, not photo attachments.