Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Panel Data - Weighted Average per state per year - Weighted for in-state and out-of-state

    I am using Stata version 15.1 and I have 47 variables and 1,010 observations in my dataset. I am doing educational research with 112 colleges included (This is setup as a panel data set). However, I have expanded it so that I can capture the average prices for all colleges in set categories. I now have 69,999 observations in my data set.

    My goal:
    I am trying to create a variable that will represents the weighted average price of colleges per state within a year range on my panel data set. For example the average price that would fill for this field would account for 40% of the average for the state the college is in plus 60% of the average of states that the college is not in.
    Articles and forums I have reviewed
    https://www.statalist.org/forums/for...weight-my-data
    https://www.stata.com/manuals13/svyp...tification.pdf
    These were very helpful, but it left me unsure how to account for set weights when I want it to account for the current state for the year
    What I have been able to do so far that has been effective:
    I have been able to generate a average per year for the following categories- 1—Public, four-year or above 2—Private not-for-profit, four-year or above 3—Private for-profit, four-year or above 4—Public, two-year. I have all the individual colleges in the data set and I know their prices. The variable for the price I want to average is "avgprice_per_year."

    What I am trying with code modification narrative:
    I now need to take this one step further to account for a weighted average.

    My first thought was to modify the code that is working to provide an average per year per type (see below):


    sort year
    forval sec = 1/6 {
    by year : egen _sector_`sec' = mean(avg_tuition_fees_ft) if sector==`sec'
    by year : egen sector_`sec' = max(_sector_`sec')
    local lab: label sectordf `sec'
    label var sector_`sec' `"`lab'"'
    }
    drop _sector*

    *****************Rename******************

    rename sector_1 Public_4yr_or_above
    rename sector_2 Private_NFP_4yr_or_above
    rename sector_3 Private_FP_4yr_or_above
    rename sector_4 Public_2yr
    rename sector_5 Private_NFP_2yr
    rename sector_6 Private_FP_2yr
    As I consider how to account for the averages on a per-state basis my best assessment is to simply add state like the following and then this would put the value per state per year with each college throughout my data set.

    For example, Anywhere University Arizona 2008 would have the same value as Unknown University Arizona 2008. However, Questionable University Arkansas 2008 would have a different average since it is a different state even though it is the same year.

    This is the code I was thinking would work:

    sort year state
    forval sec = 1/6 {
    by year state : egen _sector_`sec' = mean(avg_tuition_fees_ft) if sector==`sec'
    by year state : egen sector_`sec' = max(_sector_`sec')
    local lab: label sectordf `sec'
    label var sector_`sec' `"`lab'"'
    }
    drop _sector*
    *****************Rename******************
    rename sector_1 Public_4yr_or_above
    rename sector_2 Private_NFP_4yr_or_above
    rename sector_3 Private_FP_4yr_or_above
    rename sector_4 Public_2yr
    rename sector_5 Private_NFP_2yr
    rename sector_6 Private_FP_2yr

    Still Problematic
    This unfortunately is not working and I am not sure how to account for the weighted average I want for each college in each state per year. I would be most grateful for some guidance in this regard.

    Data Ex information (I populated the field I am struggling with, “weighted_average_60_40” as 1)

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long unitid int year str20 state double avg_tuition_fees_ft float(Public_4yr_or_above Private_NFP_4yr_or_above Private_FP_4yr_or_above Public_2yr Private_NFP_2yr Private_FP_2yr weighted_average_60_40)
    101189 2008 "Alabama"    12700 5922.394 19520.313 15352.144  2820.107  9714.963  12881.91 1
    101189 2009 "Alabama"    13690 6265.079  20407.69 15560.724   2956.39   9989.72 14087.479 1
    101189 2010 "Alabama"    14820 6608.007  21402.32 15243.452   3165.12 10624.734  13521.01 1
    101189 2011 "Alabama"    15880 7072.479  22200.95 15185.118  3381.933 11853.928  13948.85 1
    101189 2012 "Alabama"    17380 7360.464 23126.385 15275.388 3519.0115 12151.585 14208.567 1
    101189 2013 "Alabama"    18230 7573.859  23792.57 15208.097   3661.56 12504.496 14351.418 1
    101189 2014 "Alabama"    18750 7750.858 24454.877 15483.184  3777.615 12589.093   14354.9 1
    101189 2015 "Alabama"    19280 7991.114  25185.45 15604.883  3924.161 12476.803 14654.785 1
    101189 2016 "Alabama"    20130 8037.394  26062.97  15333.85 4075.6096 13207.118  14619.66 1
    105899 2008 "Arizona"    14440 5922.394 19520.313 15352.144  2820.107  9714.963  12881.91 1
    105899 2009 "Arizona"    15177 6265.079  20407.69 15560.724   2956.39   9989.72 14087.479 1
    105899 2010 "Arizona"    16174 6608.007  21402.32 15243.452   3165.12 10624.734  13521.01 1
    105899 2011 "Arizona"    18968 7072.479  22200.95 15185.118  3381.933 11853.928  13948.85 1
    105899 2012 "Arizona"    21040 7360.464 23126.385 15275.388 3519.0115 12151.585 14208.567 1
    105899 2013 "Arizona"    21832 7573.859  23792.57 15208.097   3661.56 12504.496 14351.418 1
    105899 2014 "Arizona"    22466 7750.858 24454.877 15483.184  3777.615 12589.093   14354.9 1
    105899 2015 "Arizona"    23110 7991.114  25185.45 15604.883  3924.161 12476.803 14654.785 1
    105899 2016 "Arizona"    23896 8037.394  26062.97  15333.85 4075.6096 13207.118  14619.66 1
    107044 2008 "Arkansas"   13130 5922.394 19520.313 15352.144  2820.107  9714.963  12881.91 1
    107044 2009 "Arkansas"   13580 6265.079  20407.69 15560.724   2956.39   9989.72 14087.479 1
    107044 2010 "Arkansas"   14040 6608.007  21402.32 15243.452   3165.12 10624.734  13521.01 1
    107044 2011 "Arkansas"   14610 7072.479  22200.95 15185.118  3381.933 11853.928  13948.85 1
    107044 2012 "Arkansas"   15240 7360.464 23126.385 15275.388 3519.0115 12151.585 14208.567 1
    107044 2013 "Arkansas"   16170 7573.859  23792.57 15208.097   3661.56 12504.496 14351.418 1
    107044 2014 "Arkansas"   17040 7750.858 24454.877 15483.184  3777.615 12589.093   14354.9 1
    107044 2015 "Arkansas"   17805 7991.114  25185.45 15604.883  3924.161 12476.803 14654.785 1
    107044 2016 "Arkansas"   18440 8037.394  26062.97  15333.85 4075.6096 13207.118  14619.66 1
    107141 2008 "Arkansas"   18066 5922.394 19520.313 15352.144  2820.107  9714.963  12881.91 1
    107141 2009 "Arkansas"   18880 6265.079  20407.69 15560.724   2956.39   9989.72 14087.479 1
    107141 2010 "Arkansas"   19730 6608.007  21402.32 15243.452   3165.12 10624.734  13521.01 1
    107141 2011 "Arkansas"   20766 7072.479  22200.95 15185.118  3381.933 11853.928  13948.85 1
    107141 2012 "Arkansas"   21774 7360.464 23126.385 15275.388 3519.0115 12151.585 14208.567 1
    107141 2013 "Arkansas"   22734 7573.859  23792.57 15208.097   3661.56 12504.496 14351.418 1
    107141 2014 "Arkansas"   23588 7750.858 24454.877 15483.184  3777.615 12589.093   14354.9 1
    107141 2015 "Arkansas"   24468 7991.114  25185.45 15604.883  3924.161 12476.803 14654.785 1
    107141 2016 "Arkansas"   25324 8037.394  26062.97  15333.85 4075.6096 13207.118  14619.66 1
    109785 2008 "California" 26640 5922.394 19520.313 15352.144  2820.107  9714.963  12881.91 1
    109785 2009 "California" 27750 6265.079  20407.69 15560.724   2956.39   9989.72 14087.479 1
    109785 2010 "California" 28800 6608.007  21402.32 15243.452   3165.12 10624.734  13521.01 1
    109785 2011 "California" 29940 7072.479  22200.95 15185.118  3381.933 11853.928  13948.85 1
    109785 2012 "California" 31076 7360.464 23126.385 15275.388 3519.0115 12151.585 14208.567 1
    109785 2013 "California" 32306 7573.859  23792.57 15208.097   3661.56 12504.496 14351.418 1
    109785 2014 "California" 33096 7750.858 24454.877 15483.184  3777.615 12589.093   14354.9 1
    109785 2015 "California" 34754 7991.114  25185.45 15604.883  3924.161 12476.803 14654.785 1
    109785 2016 "California" 36120 8037.394  26062.97  15333.85 4075.6096 13207.118  14619.66 1
    110097 2008 "California" 26424 5922.394 19520.313 15352.144  2820.107  9714.963  12881.91 1
    110097 2009 "California" 27744 6265.079  20407.69 15560.724   2956.39   9989.72 14087.479 1
    110097 2010 "California" 28852 6608.007  21402.32 15243.452   3165.12 10624.734  13521.01 1
    110097 2011 "California" 29908 7072.479  22200.95 15185.118  3381.933 11853.928  13948.85 1
    110097 2012 "California" 31004 7360.464 23126.385 15275.388 3519.0115 12151.585 14208.567 1
    110097 2013 "California" 32142 7573.859  23792.57 15208.097   3661.56 12504.496 14351.418 1
    110097 2014 "California" 33322 7750.858 24454.877 15483.184  3777.615 12589.093   14354.9 1
    110097 2015 "California" 34498 7991.114  25185.45 15604.883  3924.161 12476.803 14654.785 1
    110097 2016 "California" 36696 8037.394  26062.97  15333.85 4075.6096 13207.118  14619.66 1
    110361 2008 "California" 22330 5922.394 19520.313 15352.144  2820.107  9714.963  12881.91 1
    110361 2009 "California" 23266 6265.079  20407.69 15560.724   2956.39   9989.72 14087.479 1
    110361 2010 "California" 24654 6608.007  21402.32 15243.452   3165.12 10624.734  13521.01 1
    110361 2011 "California" 25808 7072.479  22200.95 15185.118  3381.933 11853.928  13948.85 1
    110361 2012 "California" 26900 7360.464 23126.385 15275.388 3519.0115 12151.585 14208.567 1
    110361 2013 "California" 28122 7573.859  23792.57 15208.097   3661.56 12504.496 14351.418 1
    110361 2014 "California" 29422 7750.858 24454.877 15483.184  3777.615 12589.093   14354.9 1
    110361 2015 "California" 30384 7991.114  25185.45 15604.883  3924.161 12476.803 14654.785 1
    110361 2016 "California" 31372 8037.394  26062.97  15333.85 4075.6096 13207.118  14619.66 1
    112075 2008 "California" 23930 5922.394 19520.313 15352.144  2820.107  9714.963  12881.91 1
    112075 2009 "California" 24950 6265.079  20407.69 15560.724   2956.39   9989.72 14087.479 1
    112075 2010 "California" 26000 6608.007  21402.32 15243.452   3165.12 10624.734  13521.01 1
    112075 2011 "California" 27300 7072.479  22200.95 15185.118  3381.933 11853.928  13948.85 1
    112075 2012 "California" 28500 7360.464 23126.385 15275.388 3519.0115 12151.585 14208.567 1
    112075 2013 "California" 29630 7573.859  23792.57 15208.097   3661.56 12504.496 14351.418 1
    112075 2014 "California" 30640 7750.858 24454.877 15483.184  3777.615 12589.093   14354.9 1
    112075 2015 "California" 31690 7991.114  25185.45 15604.883  3924.161 12476.803 14654.785 1
    112075 2016 "California" 32780 8037.394  26062.97  15333.85 4075.6096 13207.118  14619.66 1
    112084 2008 "California" 20980 5922.394 19520.313 15352.144  2820.107  9714.963  12881.91 1
    112084 2009 "California" 21610 6265.079  20407.69 15560.724   2956.39   9989.72 14087.479 1
    112084 2010 "California" 22692 6608.007  21402.32 15243.452   3165.12 10624.734  13521.01 1
    112084 2011 "California" 23814 7072.479  22200.95 15185.118  3381.933 11853.928  13948.85 1
    112084 2012 "California" 25108 7360.464 23126.385 15275.388 3519.0115 12151.585 14208.567 1
    112084 2013 "California" 25888 7573.859  23792.57 15208.097   3661.56 12504.496 14351.418 1
    112084 2014 "California" 27090 7750.858 24454.877 15483.184  3777.615 12589.093   14354.9 1
    112084 2015 "California" 19791 7991.114  25185.45 15604.883  3924.161 12476.803 14654.785 1
    112084 2016 "California" 21650 8037.394  26062.97  15333.85 4075.6096 13207.118  14619.66 1
    114813 2008 "California" 23202 5922.394 19520.313 15352.144  2820.107  9714.963  12881.91 1
    114813 2009 "California" 23898 6265.079  20407.69 15560.724   2956.39   9989.72 14087.479 1
    114813 2010 "California" 23904 6608.007  21402.32 15243.452   3165.12 10624.734  13521.01 1
    114813 2011 "California" 24790 7072.479  22200.95 15185.118  3381.933 11853.928  13948.85 1
    114813 2012 "California" 25336 7360.464 23126.385 15275.388 3519.0115 12151.585 14208.567 1
    114813 2013 "California" 25716 7573.859  23792.57 15208.097   3661.56 12504.496 14351.418 1
    114813 2014 "California" 26638 7750.858 24454.877 15483.184  3777.615 12589.093   14354.9 1
    114813 2015 "California" 27954 7991.114  25185.45 15604.883  3924.161 12476.803 14654.785 1
    114813 2016 "California" 29370 8037.394  26062.97  15333.85 4075.6096 13207.118  14619.66 1
    117104 2008 "California" 11650 5922.394 19520.313 15352.144  2820.107  9714.963  12881.91 1
    117104 2009 "California" 12500 6265.079  20407.69 15560.724   2956.39   9989.72 14087.479 1
    117104 2010 "California" 12560 6608.007  21402.32 15243.452   3165.12 10624.734  13521.01 1
    117104 2011 "California" 12500 7072.479  22200.95 15185.118  3381.933 11853.928  13948.85 1
    117104 2012 "California" 13100 7360.464 23126.385 15275.388 3519.0115 12151.585 14208.567 1
    117104 2013 "California" 12500 7573.859  23792.57 15208.097   3661.56 12504.496 14351.418 1
    117104 2014 "California" 12860 7750.858 24454.877 15483.184  3777.615 12589.093   14354.9 1
    117104 2015 "California" 13320 7991.114  25185.45 15604.883  3924.161 12476.803 14654.785 1
    117104 2016 "California" 14138 8037.394  26062.97  15333.85 4075.6096 13207.118  14619.66 1
    120537 2008 "California" 21890 5922.394 19520.313 15352.144  2820.107  9714.963  12881.91 1
    end
    label values unitid unitiddf
    label values year yeardf
    label values avg_tuition_fees_ft tuition_fees_ftdf

  • #2
    I am completely confused by your explanation. Your code attempts all involve a variable sector that does not seem to exist in your data set. So I don't even understand how you got those to run at all. Also, I cannot understand what it is you want to take the weighted average of. You have 6 different variables, Public_4yr_of_above - Private_FP_2yr. How are those to be put together to come up with this avg_tuition_fees_ft variable? And where do the 40% and 60% figures come from? Do they apply to all states in all years? And I don't understand how you want to get an average "per state within a year range" because, at least in the example data, and it would make sense for this to also be true of the full data set, there is only one observation for any state in any year. So, such an average, based on one observation, is just the value in the observation itself.

    Please clarify. It might help if you set out how you would go about calculating this by hand if you had to do it that way. That would bring clarity to what the ingredients are and how they get put together. Translating that into code is probably not too hard.

    Comment


    • #3
      Clyde Schechter I really appreciate you taking the time to look at this! You are certainly right that I need to narrow in on this and be clearer about what I am searching for. I apologize for not making this clear in the first place. Let me remove the parts about institution type and work toward a weighted average per state per year for now. I think that will make it less confusing and help me to wrap my mind around this portion of it.

      Towards More Clarity

      Using a selection of the data ex below is a sample of how it would work for the year 2009 using California as an example for the weighted average:

      For California = 23349
      =AVERAGE(D3:D30)*0.4+AVERAGE(D39:D93)*0.6

      For Arkansas
      =AVERAGE(D3:D12,D39:D93)*0.4+AVERAGE(D21:D30)*0.6
      unitid year state avg_tuition_fees_ft float(Public_4yr_or_above Private_NFP_4yr_or_above Private_FP_4yr_or_above Public_2yr Private_NFP_2yr Private_FP_2yr weighted_average_60_40)
      101189 2009 Alabama 13690 6265.079 20407.69 15560.72 2956.39 9989.72 14087.48 1
      105899 2009 Arizona 15177 6265.079 20407.69 15560.72 2956.39 9989.72 14087.48 1
      107044 2009 Arkansas 13580 6265.079 20407.69 15560.72 2956.39 9989.72 14087.48 20069
      107141 2009 Arkansas 18880 6265.079 20407.69 15560.72 2956.39 9989.72 14087.48 20069
      109785 2009 California 27750 6265.079 20407.69 15560.72 2956.39 9989.72 14087.48 23349
      110097 2009 California 27744 6265.079 20407.69 15560.72 2956.39 9989.72 14087.48 23349
      110361 2009 California 23266 6265.079 20407.69 15560.72 2956.39 9989.72 14087.48 23349
      112075 2009 California 24950 6265.079 20407.69 15560.72 2956.39 9989.72 14087.48 23349
      112084 2009 California 21610 6265.079 20407.69 15560.72 2956.39 9989.72 14087.48 23349
      114813 2009 California 23898 6265.079 20407.69 15560.72 2956.39 9989.72 14087.48 23349
      117104 2009 California 12500 6265.079 20407.69 15560.72 2956.39 9989.72 14087.48 23349
      I hope that adds clarity. I want it to put the weighted average for each institution in a particular state in a particular year across my entire data set.

      Comment


      • #4
        Well, we still need more clarity.
        =AVERAGE(D3:D30)*0.4+AVERAGE(D39:D93)*0.6
        You are using spreadsheet notation here. But Stata is not a spreadsheet, and I don't know what D3, D30, D39 and D93 refer to in your Stata data set. Also, on the assumption that in the spreadsheet all the observations for a given state and year would be in consecutive rows, why are rows 31 through 38 omitted from the calculation--and how in your Stata data set would we know which observations to similarly exclude from the calculation.

        Comment


        • #5
          Clyde Schechter ,

          I certainly understand and will do my best to further clarify. Each column is one after the other starting with A. This means that A is the unit ID and that D is avg_tuition_fees_ft. The rows are not one after the other since as you will see when comparing this selection to the data ex export these do not match up. This is because I narrowed the dataex list down to the current selection. I will put a screenshot below so this is clearer. Again, I apologize that I didn't think this through as I was posting.


          Click image for larger version

Name:	2023-01-12 08_34_58-Window.png
Views:	1
Size:	34.5 KB
ID:	1696918



          My thought is that Stata looks at the state listed for each year and performs the weighted average calculation when the current state is present to compare to all none state that are present. If 2009 and current state then average of current state across 2009 * weighted amount + average of all other states for 2009 * weighted average.




          Attached Files

          Comment


          • #6
            I'm still not sure I get this, but I think I have it. If I do, the following code will give what you want:

            Code:
            by year, sort: egen num = total(avg_tuition_fees_ft)
            by year: gen den = _N
            
            by state year, sort: egen st_num = total(avg_tuition_fees_ft)
            by state year: gen st_den = _N
            
            gen wanted = 0.4*(st_num/st_den) + 0.6*((num-st_num)/(den-st_den))

            Comment


            • #7
              Thank you very much, Clyde Schechter . I will give this a try and see how this works.

              Comment

              Working...
              X