Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Use cell size as weight in an OLS regression.

    As part of my project, I use cross-sectional data on 3 cells: education, year, and census metropolitan area. Since these are cells of different sizes, I would like to use by weight in my regression the size of the cells (aw = count) where count is the number of observations in the cell.

    How to create this variable which will take the place of the size of the cells?

  • #2
    Your description is not clear. But it sounds like this is not a case for -aweights-, but rather for -fweights-. -aweights- are used when the dependent variable represents the average of some number of measurements taken for that observation.* If, instead, each observation in your data set represents a certain number of people, who all had the same values for the variables in the regression, then you need -fweights-.

    As for how to create a variable for this purpose, you need to show an example of your data so we know what your starting point is. Can't give directions to somewhere without knowing where you're starting from. Please post back, and use the -dataex- command to show your example data. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.


    *There are a few other situations where -aweights- are appropriate, but they are fairly exotic and it is unlikely yours is one of them.

    Comment


    • #3
      Hello, thank you very much for your answer. I will try to be clearer. Indeed, I am seeking to regress the wage gap between natives and recent immigrants according to the census year (t), the census metropolitan area (cmap) and the level of education (dveduc3); on a packet of predictors. My dependent variable is therefore the ratio of the average wage of natives to that of recent immigrants. To do this, from the observations of each census I created cells using the command collapse. For example, for the average salary of natives, I did: collapse w if n == 1, by (t cmap dveduc3). Where n is a dummy variable equals 1 if respondents are natives and 0 otherwise. The same logic applies to the average salary of recent immigrants as well as the explanatory variables of the model.

      Since these are cells of different sizes because of the number of observation that varies in cmap for example. Sometimes cmap receive more migrants than others. So, I would like to use in weight in my regression the size of the cells (aw = count) where count is the number of observations in the cell. The idea is that if the residue has a variance sigma ^ 2 / n, the use of cells of different size implies that some cells havea variance of the error term lower than others because the averages have been estimated on more observations. Recognizing this and exploiting it allows efficiency gains. This is a generalized least squares estimator that is simply done by OLS using cell size as the weight. It should help the accuracy of my estimates.

      Last edited by Jerry Roc; 10 Dec 2019, 22:24.

      Comment


      • #4
        input float(t cmap dveduc3 y ly sect_bien sect_serv sect_public divorcé marrié séparé)
        1981 205 1 .7912293 -.23416746 .17982456 .4802631 .3399123 .04605263 .7719298 .04276316
        1991 205 1 . . .16035183 .541272 .2983762 .05886333 .7320704 .02503383
        1996 205 1 . . .14285715 .53492063 .3222222 .06349207 .7222222 .0484127
        2001 205 1 . . .14008942 .6266766 .23323397 .06952663 .7078403 .03402367
        2006 205 1 . . .15712075 .6199691 .2229102 .07715814 .6768526 .03208556
        2011 205 1 .7393227 -.30202085 .13359292 .640996 .225411 .08973812 .6669508 .04879954
        2016 205 1 .4808717 -.7321547 .1462307 .630336 .22343324 .06488889 .6382222 .04711111
        1981 205 2 .58939755 -.52865434 .14194578 .37161085 .4864434 .06539075 .7400319 .02073365
        1991 205 2 . . .1529664 .3602573 .4867763 .05503931 .7533953 .02859185
        1996 205 2 . . .1449381 .4333576 .4217043 .06263656 .7392571 .025491625
        2001 205 2 . . .13212273 .5115842 .35629305 .07375 .72125 .039375
        2006 205 2 .4108408 -.8895494 .14871195 .5029274 .3483607 .07492796 .6997118 .04207493
        2011 205 2 .6433704 -.4410346 .15650956 .4837107 .35977975 .065683894 .713389 .04582067
        2016 205 2 .8034392 -.21885377 .1632653 .483965 .3527697 .06219081 .6784452 .04734982
        1981 205 3 .3694641 -.9957018 .05376344 .26523298 .6810036 .028673835 .7275985 .021505376
        1991 205 3 . . .06842105 .3328947 .5986842 .02368421 .7157895 .01973684
        1996 205 3 . . .05963303 .3635321 .57683486 .03899083 .6857798 .02178899
        2001 205 3 . . .06013746 .4450172 .4948454 .0437018 .7155098 .02485004
        2006 205 3 .4550262 -.7874003 .0601626 .4723577 .4674797 .0446571 .7280702 .019936204
        2011 205 3 .5482132 -.601091 .05929359 .4368204 .503886 .04968261 .704375 .023479767
        2016 205 3 .52283686 -.6484858 .04619565 .43410325 .51970106 .03838518 .7187293 .02382528
        1981 421 1 .488448 -.7165223 .20664024 .4534192 .33994055 .03369673 .728444 .016352825
        1991 421 1 .5843189 -.5373084 .18154034 .502445 .3160147 .06143032 .7359413 .021393644
        1996 421 1 .433368 -.8361681 .18300654 .5335397 .28345373 .07946336 .7179223 .03027176
        2001 421 1 .8815145 -.12611377 .17623103 .5534987 .27027026 .09258576 .6801918 .023976393
        2006 421 1 .908971 -.09544208 .16682287 .5707591 .262418 .09468284 .6646455 .023787314
        2011 421 1 .12975143 -2.0421348 .1555779 .6041164 .2403057 .08622435 .6477885 .017346123
        2016 421 1 .4776449 -.7388877 .16607773 .5918728 .24204947 .072009295 .6445993 .011033682
        1981 421 2 .7913193 -.2340538 .1653474 .3271768 .5074758 .04221636 .7405453 .02022867
        1991 421 2 .4334904 -.8358857 .16720955 .39486066 .4379298 .04994571 .7394137 .02171553
        1996 421 2 .3308537 -1.1060789 .1596929 .4629559 .37735125 .06449136 .7309021 .017274473
        2001 421 2 .51055175 -.6722633 .1621172 .4853529 .35253 .067729086 .7018592 .021580346
        2006 421 2 .4919151 -.7094492 .17958365 .4853273 .335089 .06742971 .686738 .02214481
        2011 421 2 .702799 -.3526844 .16049147 .5043517 .3351569 .0618869 .6998185 .013737185
        2016 421 2 .4574843 -.7820126 .1803199 .467523 .3521571 .05420964 .6881746 .010314224
        1981 421 3 .7555176 -.2803522 .04647436 .21314102 .7403846 .036858976 .7051282 .02403846
        1991 421 3 .5064279 -.6803734 .0617207 .29551122 .6427681 .05236908 .7051122 .02618454
        1996 421 3 .446368 -.8066115 .05386819 .3283668 .6177651 .06074499 .6939828 .024641834
        2001 421 3 .3282664 -1.1139299 .062799364 .3714742 .56572646 .06733828 .6919406 .02014846
        2006 421 3 .5391802 -.6177054 .05866667 .3968889 .54444444 .05130473 .717824 .015922159
        2011 421 3 .46027175 -.7759382 .05550402 .3790221 .5654739 .0465623 .7282725 .012646934
        2016 421 3 .5263488 -.6417911 .0566108 .4130354 .53035384 .04008827 .7351968 .010665686
        1981 462 1 .532642 -.6299058 .3142147 .4786131 .20717217 .05260316 .7303953 .03845323
        1986 462 1 .4173817 -.8737542 .2917009 .5069844 .2013147 .06419474 .7182621 .034716517
        1991 462 1 .5083237 -.6766368 .27635613 .5156604 .2079835 .06937225 .7213115 .028322004
        1996 462 1 .4524606 -.7930546 .26734158 .535808 .1968504 .08458943 .6998875 .028046494
        2001 462 1 .4608352 -.7747148 .26541936 .5782464 .15633424 .08614677 .6855286 .025281344
        2006 462 1 .4245684 -.8566822 .24193706 .6028452 .15521777 .08885645 .6602433 .02374696
        2011 462 1 .5243362 -.6456221 .22281304 .6176488 .15953815 .08341462 .6429719 .02058302
        2016 462 1 .51758397 -.6585835 .2114422 .6471792 .14137852 .07991385 .6144865 .017229654
        1981 462 2 .6403642 -.4457182 .25551602 .3850534 .3594306 .0455516 .7269276 .033451956
        1986 462 2 .5293301 -.6361431 .22616315 .4147715 .3590653 .05570624 .710411 .024619237
        1991 462 2 .5325829 -.6300167 .2231277 .4391376 .3377347 .06354446 .7118837 .024035485
        1996 462 2 .5089632 -.6753795 .21036585 .4671748 .3224593 .07743903 .6810976 .024898374
        2001 462 2 .4989706 -.6952081 .21553503 .5296439 .25482106 .07542601 .6736323 .022600897
        2006 462 2 .4670363 -.7613482 .2135956 .53374577 .25265864 .07796445 .6571091 .02436823
        2011 462 2 .4788253 -.7364194 .19587426 .53200984 .27211592 .07782095 .6553815 .016992344
        2016 462 2 .4595092 -.7775964 .2047081 .5265833 .26870862 .06208322 .6523073 .016070953
        1981 462 3 .6961233 -.3622285 .11101322 .3105727 .5784141 .03568282 .6797357 .02599119
        1986 462 3 .4801833 -.7335873 .11612251 .3625042 .5213733 .05318075 .6570178 .027600134
        1991 462 3 .495815 -.7015524 .1182315 .4038748 .4778937 .05414804 .6658387 .0205332
        1996 462 3 .4814139 -.731028 .12606649 .4143866 .4595469 .05501618 .6624007 .01985878
        2001 462 3 .53442794 -.6265584 .11657374 .4799797 .4034465 .05435884 .6746072 .02090725
        2006 462 3 .4206707 -.865905 .09820614 .4999493 .4018445 .05842446 .6736533 .0191373
        2011 462 3 .4569258 -.7832342 .08827938 .4985503 .4131704 .05099962 .6720929 .015085313
        2016 462 3 .52652735 -.641452 .0875493 .5226376 .3898131 .04309534 .6786661 .012569474
        1991 499 1 2.0723553 .7286858 .3212074 .4837461 .19504644 .08900928 .7337461 .021671826
        1996 499 1 . . .299421 .48883376 .21174525 .0942928 .7361456 .019851116
        2001 499 1 .43869755 -.823945 .3106796 .4880847 .20123565 .1051709 .7212971 .01928133
        2006 499 1 . . .3178054 .51552796 .16666667 .1 .6989796 .02244898
        2011 499 1 .3265473 -1.1191806 .27508876 .5522411 .17267017 .08669812 .6785629 .027864054
        2016 499 1 .7352927 -.3074866 .3080745 .55031055 .14161491 .06333739 .6565164 .018270401
        1991 499 2 .602606 -.5064917 .28151986 .3385147 .37996545 .0656304 .7573403 .021588946
        1996 499 2 1.9682672 .6771536 .24642856 .4017857 .3517857 .08303571 .7553571 .02232143
        2001 499 2 .3608366 -1.01933 .29539952 .3769169 .3276836 .09032774 .7442046 .012789768
        2006 499 2 .4239582 -.8581203 .27509293 .4324659 .29244113 .09529627 .705559 .015882712
        2011 499 2 .9311587 -.07132553 .2317775 .4187021 .3495204 .07628935 .7141134 .016016636
        2016 499 2 .5422592 -.6120111 .27031803 .40459365 .32508835 .05172414 .6908046 .011494253
        1991 499 3 .7765098 -.25294608 .12997903 .20125785 .6687631 .07368421 .6905263 .029473685
        1996 499 3 .6206982 -.4769103 .0976059 .25414366 .6482505 .05893186 .7274401 .014732965
        2001 499 3 .1710549 -1.7657707 .11896551 .25 .6310345 .05479452 .7260274 .017123288
        2006 499 3 1.1474274 .13752243 .0910308 .3012048 .6077644 .06044678 .7293035 .017082786
        2011 499 3 .4064872 -.9002029 .0984964 .28733307 .6141706 .06629132 .7434617 .006780482
        2016 499 3 .655824 -.4218628 .11200923 .28406465 .6039261 .04396843 .7316798 .00789177
        1981 505 1 .3658735 -1.0054675 .13722126 .4193825 .4433962 .04502573 .7375643 .05403088
        1986 505 1 .4617907 -.7726434 .1654811 .40921 .4253089 .05466118 .7330588 .0490453
        1991 505 1 .5010676 -.6910143 .15337974 .4286998 .4179205 .05479452 .732989 .0354817
        1996 505 1 .4315348 -.8404072 .14182112 .4877787 .3704002 .07279076 .7085683 .033843674
        2001 505 1 .5421342 -.6122417 .1690821 .5390736 .29184428 .0765753 .7047188 .04040689
        2006 505 1 .4330326 -.8369422 .14743783 .53880733 .3137549 .0769914 .6763399 .03968019
        2011 505 1 .5488566 -.5999181 .14722154 .5331838 .3195947 .08229697 .6542303 .04663203
        2016 505 1 .4587387 -.7792745 .14698625 .5696158 .28339794 .071674705 .6350793 .031702273
        1981 505 2 .6901226 -.370886 .1299915 .29566696 .57434154 .03738318 .7527612 .05522515
        1986 505 2 .4714794 -.7518798 .13176778 .3307241 .53750813 .04631441 .7221135 .05088063
        1991 505 2 .438704 -.8239304 .13211314 .3484193 .51946753 .05858855 .7426764 .03195739
        1996 505 2 .4902787 -.7127813 .13296309 .40869 .4583469 .0663182 .7245998 .035282586
        2001 505 2 .54143035 -.6135408 .1393233 .4611885 .3994882 .07003119 .6991777 .03799263
        2006 505 2 .5986734 -.51303905 .1352409 .4460748 .4186843 .08046254 .6947724 .0373404
        2011 505 2 .6998247 -.3569254 .13381566 .4130806 .4531037 .08397255 .686661 .034158003
        2016 505 2 .4340231 -.8346574 .13633952 .4302387 .4334218 .05437597 .709218 .02822372
        end
        label values cmap cmapl
        label def cmapl 205 "Halifax", modify
        label def cmapl 421 "Québec", modify
        label def cmapl 462 "MTL-Montréal", modify
        label def cmapl 499 "Sherbrooke-Trois-Rivières", modify
        label def cmapl 505 "Ottawa-Gatineau", modify
        label values dveduc3 dveduc3l
        label def dveduc3l 1 "HS_&_Less", modify
        label def dveduc3l 2 "College", modify
        label def dveduc3l 3 "University", modify
        [/CODE]

        Comment


        • #5
          OK, that's much clearer. So your variable really is an average of different numbers of observations in each cell.

          I don't get why you are doing your -collapse- -if n == 1- and -if n == 0- separately. That leaves you with two separate data sets that you then have to append together. It is easier to just do the whole thing as -collapse … by(n)- and then you will have the results you want all in one data set at the end.

          Now, to turn to your original question, the way to get the count of the number of people that contributed to each average is to include it in the collapse command. If there is a variable that you know with certainty never has missing values, that can be the key ingredient. If there is no such variable you can create one:

          Code:
          gen never_missing = 1
          Then your collapse command becomes
          Code:
          collapse (count) never_missing  (mean) variables_to_be_averaged, by(n t cmap dveduc3)
          (If you have a variable that you are sure is never missing, then substitute that for never_missing in the -collapse- command, no need to create it.) The resulting collapsed variable will be a count of the number of observations in each n-t-cmap-dveduc3 cell and can be used as an -aweight-.

          Comment


          • #6
            Thank you so much for your help. Very appreciated Mr Clyde Schechter.

            Comment

            Working...
            X