
  • Trying to find the % of a variable within the sample of each school

    Hi, I want to find the % of students in the treated and control groups based on the results, where the % is taken from the sample size of each school. Is there a command that would give me the % of T and the % of C from the student sample in each school?

    ************************
    ** 1. Non-compliance **
    ************************
    * Keep the variables needed for the non-compliance table
    keep school_id treatment_el ece_fr_cycles

    duplicates drop

    * Store the number of schools that completed 0-5 cycles of FR

    replace ece_fr_cycles = 0 if missing(ece_fr_cycles) //replace the ECE cycle variable with 0 if data is missing

    forval c = 0/5 {
        count if ece_fr_cycles == `c' & treatment_el == 1
        local treat`c' `r(N)'
        count if ece_fr_cycles == `c' & treatment_el == 0
        local control`c' `r(N)'
    }

    * % students
    ???

    * Import the endline dataset
    use `el_data', clear

    replace ece_fr_cycles = 0 if missing(ece_fr_cycles)

    * Set up table variables
    gen FRcycles = .
    gen Treat_assigned_schools = .
    gen Treat_assigned_students = .
    gen Control_assigned_schools = .
    gen Control_assigned_students = .
    local i 1

    * Total students per arm do not depend on c, so count them once
    count if treatment_el == 1
    local total_t `r(N)'
    count if treatment_el == 0
    local total_c `r(N)'

    * Input values in the variables
    forval c = 0/5 {
        replace FRcycles = `c' in `i'
        replace Treat_assigned_schools = `treat`c'' in `i'
        count if ece_fr_cycles == `c' & treatment_el == 1
        replace Treat_assigned_students = `r(N)'/`total_t' in `i'
        replace Control_assigned_schools = `control`c'' in `i'
        count if ece_fr_cycles == `c' & treatment_el == 0
        replace Control_assigned_students = `r(N)'/`total_c' in `i'
        local ++i
    }

    * Export table
    export excel FRcycles-Control_assigned_students using "${output}/Graphs and Tables/Non-compliance.xlsx", replace firstrow(variable)

  • #2
    The answer depends on the structure of your data. Can you give us an example of your data, as discussed in the FAQ (black bar near the top of this page)?
    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------

    • #3
      Since the data is not confidential, here is the actual data that I use.
      Attached Files

      • #4
        Maarten Buis was referring to https://www.statalist.org/forums/help#stata where it is explained that and why .dta attachments are not a good idea.

        Given an indicator variable with values 0 and 1, its mean is the fraction of values coded 1. If you want a percent, multiply by 100. Its complement is 1 minus the fraction, or 100 minus the percent as %.

        Here are 10 values 0 0 0 1 1 1 1 1 1 1. The sum is 7, the total count is 10 and the mean is 0.7 or 70% and its complement is 0.3 or 30%.
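
        As a minimal sketch of that arithmetic, the ten values can be typed in and summarized directly. The variable name x is only an illustrative choice, not something from the question.

        ```stata
        * Enter the ten example values as a 0/1 indicator (x is a made-up name)
        clear
        input byte x
        0
        0
        0
        1
        1
        1
        1
        1
        1
        1
        end

        * The mean of a 0/1 indicator is the fraction coded 1
        summarize x

        * Show the fraction, the percent, and the complement as a percent
        display "fraction = " r(mean) "  percent = " 100 * r(mean) "  complement % = " 100 * (1 - r(mean))
        ```

        The saved result r(mean) is left behind by summarize, so no counting loop is needed.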

        So, you shouldn't need any loops to do the counting yourself. You just ask for the mean in whatever groups you want.

        Here is an example you can play with: in the nlswork dataset, nev_mar is a (0, 1) indicator. The proportion of never married people is 0.23.

        Code:
        . webuse nlswork, clear
        (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
        
        . tab nev_mar, nola
        
         1 if never |
            married |      Freq.     Percent        Cum.
        ------------+-----------------------------------
                  0 |     21,968       77.03       77.03
                  1 |      6,550       22.97      100.00
        ------------+-----------------------------------
              Total |     28,518      100.00
        Wanting this for groups is not more difficult. Here are two ways to do it.

        Code:
        . egen fraction_nev_mar = mean(nev_mar), by(race)
        
        . tabdisp race, c(fraction_nev_mar)
        
        ----------------------------
             Race | fraction_nev_mar
        ----------+-----------------
            White |         .1956974
            Black |         .3126477
            Other |         .2904291
        ----------------------------
        
        . tab race, su(nev_mar)
        
                    |    Summary of 1 if never married
               Race |        Mean   Std. dev.       Freq.
        ------------+------------------------------------
              White |   .19569743   .39674646      20,174
              Black |   .31264768   .46360095       8,041
              Other |   .29042904   .45471133         303
        ------------+------------------------------------
              Total |    .2296795   .42063408      28,518
        
        .
        The first method can easily be extended: just put more variables into the by() option. The documented syntax shows a by: prefix, but as above the by() option continues to work fine.
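
        To illustrate putting more than one variable into by(), here is a sketch that continues with nlswork and uses collgrad as a second grouping variable. The new variable names frac_nev_mar and pct_nev_mar are made up for this example, and the choice of collgrad is only for illustration.

        ```stata
        webuse nlswork, clear

        * Fraction never married within each race x college-graduate cell
        egen frac_nev_mar = mean(nev_mar), by(race collgrad)

        * Express the same quantity as a percent
        gen pct_nev_mar = 100 * frac_nev_mar

        * One row per race, one column per collgrad value
        tabdisp race collgrad, c(pct_nev_mar) format(%5.1f)
        ```

        With the variables named in #1, the analogous call would presumably group by school_id, but whether that gives the intended percentages depends on the data structure, which has not been shown in this thread.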

        • #5
          Thank you, and sorry about earlier. But how about the % of T student sample and the % of C student sample in this table? Does it still use the same command?
          [Attached image: Screenshot 2023-11-28 at 18.17.43.png]

          • #6
            Sorry, but as said I won't usually try to understand a complicated .dta file and I don't understand the details of your data otherwise. If #4 didn't help, we're back to #2 and the request for a simple data example.
