Trying to find the % of a variable from the sample of each school

April Lia

Join Date: Nov 2023

Posts: 5
#1

27 Nov 2023, 00:35

Hi, I want to find the % of students in the treated and control groups, where the % is taken over the sample size of each school. Is there a command that would give me the % of T and the % of C from the student sample in each school? Here is my code so far:

************************
** 1. Non-compliance **
************************
* Keep the variables needed for the non-compliance table
keep school_id treatment_el ece_fr_cycles

duplicates drop

* Store the number of schools that completed 0-5 cycles of FR

replace ece_fr_cycles = 0 if missing(ece_fr_cycles) //replace the ECE cycle variable with 0 if data is missing

forval c = 0/5{
count if ece_fr_cycles == `c' & treatment_el == 1
local treat`c' `r(N)'
count if ece_fr_cycles == `c' & treatment_el == 0
local control`c' `r(N)'
}

* % students
???

* Import the endline dataset
use `el_data', clear

replace ece_fr_cycles = 0 if missing(ece_fr_cycles)

* Set up table variables
gen FRcycles = .
gen Treat_assigned_schools = .
gen Treat_assigned_students = .
gen Control_assigned_schools = .
gen Control_assigned_students = .
local i 1

* Group totals do not depend on the cycle, so count them once
count if treatment_el == 1
local total_t `r(N)'
count if treatment_el == 0
local total_c `r(N)'

* Input values in the variables
forval c = 0/5 {
    replace FRcycles = `c' in `i'
    replace Treat_assigned_schools = `treat`c'' in `i'
    * Share of treated students in schools with `c' cycles
    count if ece_fr_cycles == `c' & treatment_el == 1
    replace Treat_assigned_students = `r(N)'/`total_t' in `i'
    replace Control_assigned_schools = `control`c'' in `i'
    * Share of control students in schools with `c' cycles
    count if ece_fr_cycles == `c' & treatment_el == 0
    replace Control_assigned_students = `r(N)'/`total_c' in `i'
    local ++i
}

* Export table
export excel FRcycles-Control_assigned_students using "${output}/Graphs and Tables/Non-compliance.xlsx", replace firstrow(variable)
Maarten Buis

Join Date: Mar 2014

Posts: 3465
#2

27 Nov 2023, 00:50

The answer depends on the structure of your data. Can you give us an example of your data, like is discussed in the FAQ (black bar near the top of this page)?

---------------------------------
Maarten L. Buis
University of Konstanz
Department of history and sociology
box 40
78457 Konstanz
Germany
http://www.maartenbuis.nl
---------------------------------
April Lia

Join Date: Nov 2023

Posts: 5
#3

27 Nov 2023, 01:00

Since the data are not confidential, here is the actual dataset that I am using.
Attached Files

rf_clean_student_data_endline.dta (752.8 KB, 1 view)

Nick Cox

Join Date: Mar 2014

Posts: 35757
#4

27 Nov 2023, 03:22

Maarten Buis was referring to https://www.statalist.org/forums/help#stata, which explains why .dta attachments are not a good idea.

Given an indicator variable with values 0 and 1, its mean is the fraction of values coded 1. If you want a percent, multiply by 100. Its complement is 1 minus the fraction, or 100 minus the percent as %.

Here are 10 values 0 0 0 1 1 1 1 1 1 1. The sum is 7, the total count is 10 and the mean is 0.7 or 70% and its complement is 0.3 or 30%.

So, you shouldn't need any loops to do the counting yourself. You just ask for the mean in whatever groups you want.
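As a minimal sketch of the arithmetic above, the ten values can be typed in directly and summarized. The variable name x is made up for illustration.

```stata
* Toy data: the ten values 0 0 0 1 1 1 1 1 1 1 (x is a hypothetical name)
clear
input byte x
0
0
0
1
1
1
1
1
1
1
end

* The mean of a 0/1 indicator is the fraction coded 1
summarize x, meanonly
display "fraction: " %4.2f r(mean) "   percent: " %4.1f (100 * r(mean))
display "complement: " %4.2f (1 - r(mean))
```

No loop or manual count is involved: summarize leaves the mean in r(mean), and the percent and complement follow from it.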

Here is an example you can play with: in the nlswork dataset, nev_mar is (0, 1). The proportion of never married people is 0.23.

Code:

. webuse nlswork, clear
(National Longitudinal Survey of Young Women, 14-24 years old in 1968)

. tab nev_mar, nola

 1 if never |
    married |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     21,968       77.03       77.03
          1 |      6,550       22.97      100.00
------------+-----------------------------------
      Total |     28,518      100.00

Wanting this for groups is not more difficult. Here are two ways to do it.

Code:

. egen fraction_nev_mar = mean(nev_mar), by(race)

. tabdisp race, c(fraction_nev_mar)

----------------------------
     Race | fraction_nev_mar
----------+-----------------
    White |         .1956974
    Black |         .3126477
    Other |         .2904291
----------------------------

. tab race, su(nev_mar)

            |    Summary of 1 if never married
       Race |        Mean   Std. dev.       Freq.
------------+------------------------------------
      White |   .19569743   .39674646      20,174
      Black |   .31264768   .46360095       8,041
      Other |   .29042904   .45471133         303
------------+------------------------------------
      Total |    .2296795   .42063408      28,518

.

The first method can easily be extended: just put more variables into the by() option. The documented syntax shows a by: prefix, but as above the by() option continues to work fine.
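A sketch of that extension, again with nlswork: here collgrad is added as a second grouping variable purely for illustration, and the new variable names are made up. The by: prefix form is shown alongside for comparison.

```stata
webuse nlswork, clear

* Fraction never married within each race x college-graduate cell
egen fraction_two = mean(nev_mar), by(race collgrad)

* Same calculation with the documented by: prefix
bysort race collgrad: egen fraction_two_b = mean(nev_mar)

* One cell per combination of the two grouping variables
tabdisp race collgrad, c(fraction_two) format(%5.3f)
```

Both forms should give the same values; which one to use is a matter of taste.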

Comment

April Lia

Join Date: Nov 2023

Posts: 5
#5

28 Nov 2023, 04:19

Thank you, and sorry about earlier. But how would I get the % of the T student sample and the % of the C student sample in this table? Would I still use the same command?
Nick Cox

Join Date: Mar 2014

Posts: 35757
#6

28 Nov 2023, 05:15

Sorry, but as said I won't usually try to understand a complicated .dta file and I don't understand the details of your data otherwise. If #4 didn't help, we're back to #2 and the request for a simple data example.
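In the meantime, here is a rough sketch under a loud assumption: that the data hold one row per student, with school_id, treatment_el (0/1) and ece_fr_cycles as in the code in #1. The toy values below are invented, and arm_total and pct_students are hypothetical names. If the real layout differs, this will not apply as written.

```stata
* Toy data, one row per student (values invented for illustration)
clear
input school_id treatment_el ece_fr_cycles
1 1 5
1 1 5
1 1 5
2 1 3
3 0 .
3 0 .
4 0 1
4 0 1
end

replace ece_fr_cycles = 0 if missing(ece_fr_cycles)

* Column percentages: within each arm, the share of students by cycle count
tabulate ece_fr_cycles treatment_el, column nofreq

* The same shares stored as a variable, without a counting loop
bysort treatment_el: generate arm_total = _N
bysort treatment_el ece_fr_cycles: generate pct_students = 100 * _N / arm_total
tabdisp ece_fr_cycles treatment_el, c(pct_students)
```

The denominators here are the students in each arm, matching total_t and total_c in #1. For percentages within each school instead, school_id would go into the bysort varlist.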