Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating a categorical variable from multiple variables

    I am working with county-level panel data. I am currently trying to create covariates for my regression model for education, age, and gender. Each variable reflects either the number of persons or % of persons in the county for the respective variable. What is challenging is that for example the education categories are three separate variables; each variable represents the number of residents in the county that had education at high school, some college, or college plus level. I would like to create one variable for education that represent each category of education (high school, high school plus, college).

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte year float(n_county edu_hsplus edu_hs edu_college) long(age_65y age_4064y age_19y age_1864y popmale popfemale)
    12  1  31629  4528  8721  47816 18352 15066  33559  27053  28461
    13  1  31629  4528  8721  47432 18392 14638  33539  26796  28450
    14  1  31629  4528  8721  47430 18487 14525  33627  26943  28452
    15  1  31629  4528  8721  47200 18593 14295  33642  26857  28490
    16  1  31629  4528  8721  47077 18543 14175  33628  26994  28422
    12  2 125758 13956 41289 154386 65150 44579 111922  93035  97755
    13  2 125758 13956 41289 157864 66573 45414 114630  95487 100053
    14  2 125758 13956 41289 160491 67839 45897 116765  97597 102514
    15  2 125758 13956 41289 162707 68690 46526 118459  99286 104423
    16  2 125758 13956 41289 165740 69880 47251 120771 101241 107322
    12  3  13563  4824  2366  19785  8099  6005  14129  14602  12599
    13  3  13563  4824  2366  19569  7928  5987  13927  14482  12594
    14  3  13563  4824  2366  19176  7739  5871  13586  14354  12533
    15  3  13563  4824  2366  18647  7454  5751  13154  14148  12341
    16  3  13563  4824  2366  17997  7179  5549  12725  13779  12186
    12  4  12705  3040  1885  17392  7055  5017  12648  12195  10402
    13  4  12705  3040  1885  17276  7097  4942  12591  12158  10354
    14  4  12705  3040  1885  17126  7106  4830  12520  12168  10338
    15  4  12705  3040  1885  17097  7087  4807  12541  12170  10413
    16  4  12705  3040  1885  17004  7000  4772  12458  12145  10498
    12  5  31581  7882  5151  48350 19450 14463  34641  28603  29223
    13  5  31581  7882  5151  48205 19467 14355  34565  28616  29256
    14  5  31581  7882  5151  47714 19424 14133  34274  28553  29166
    15  5  31581  7882  5151  47330 19283 13899  34126  28495  29178
    16  5  31581  7882  5151  47086 19158 13877  33891  28498  29206
    12  6   4881  2452   752   7418  2890  2309   5240   5701   4773
    13  6   4881  2452   752   7512  2908  2335   5304   5833   4806
    14  6   4881  2452   752   7620  2874  2385   5363   5893   4871
    15  6   4881  2452   752   7446  2882  2272   5300   5857   4839
    16  6   4881  2452   752   7095  2684  2223   4991   5600   4762
    12  7  11289  2642  2243  16691  6794  4818  12074   9498  10809
    13  7  11289  2642  2243  16658  6659  4930  11979   9437  10828
    14  7  11289  2642  2243  16587  6597  4948  11881   9421  10875
    15  7  11289  2642  2243  16320  6469  4872  11696   9371  10783
    16  7  11289  2642  2243  15965  6293  4808  11414   9322  10676
    12  8  64555 13894 13758  97209 38913 27526  71080  56483  60813
    13  8  64555 13894 13758  96405 38774 26851  70880  56400  60336
    14  8  64555 13894 13758  95267 38265 26435  70047  55880  60036
    15  8  64555 13894 13758  94447 37957 26350  69434  55755  59865
    16  8  64555 13894 13758  92747 37427 25732  68386  55138  59473
    12  9  19122  4708  2978  27907 11855  7776  20552  16256  17808
    13  9  19122  4708  2978  27993 11908  7693  20749  16384  17778
    14  9  19122  4708  2978  27678 11773  7567  20516  16248  17828
    15  9  19122  4708  2978  27504 11658  7493  20414  16340  17783
    16  9  19122  4708  2978  26987 11421  7323  20056  16243  17600
    12 10  15195  3499  2610  20877  9630  5623  15539  12923  13098
    13 10  15195  3499  2610  20855  9537  5658  15494  13006  13197
    14 10  15195  3499  2610  20495  9369  5516  15261  12959  13078
    15 10  15195  3499  2610  20159  9287  5398  15029  12839  13020
    16 10  15195  3499  2610  19890  9175  5299  14887  12769  12956
    12 11  23802  5918  4417  37157 14440 11153  26536  21634  22185
    13 11  23802  5918  4417  37156 14609 11020  26670  21611  22340
    14 11  23802  5918  4417  36989 14576 11011  26516  21619  22312
    15 11  23802  5918  4417  36782 14520 11002  26306  21596  22347
    16 11  23802  5918  4417  36457 14496 10813  26146  21605  22336
    12 12   7450  1998  1130  10910  4826  3102   7983   6508   7125
    13 12   7450  1998  1130  10676  4753  2939   7883   6379   7047
    14 12   7450  1998  1130  10519  4689  2872   7817   6325   6998
    15 12   7450  1998  1130  10315  4618  2793   7673   6252   6918
    16 12   7450  1998  1130  10068  4516  2742   7489   6180   6813
    12 13  13795  3227  2066  20665  8589  6233  14809  11888  13273
    13 13  13795  3227  2066  20697  8588  6120  14950  11924  13283
    14 13  13795  3227  2066  20338  8468  5892  14784  11765  13180
    15 13  13795  3227  2066  19950  8297  5690  14599  11665  13010
    16 13  13795  3227  2066  19498  8114  5565  14268  11539  12853
    12 14   7041  2398  1045  10858  4705  3093   7931   6640   6795
    13 14   7041  2398  1045  10838  4711  3039   7962   6640   6846
    14 14   7041  2398  1045  10821  4736  3027   7945   6693   6859
    15 14   7041  2398  1045  10731  4681  3003   7891   6688   6867
    16 14   7041  2398  1045  10533  4546  2910   7807   6616   6876
    12 15   7658  2676  1163  12130  5110  3549   8777   7361   7471
    13 15   7658  2676  1163  12209  5125  3534   8855   7440   7554
    14 15   7658  2676  1163  12202  5059  3619   8738   7482   7598
    15 15   7658  2676  1163  12087  5050  3597   8672   7424   7594
    16 15   7658  2676  1163  11992  5041  3553   8600   7421   7503
    12 16  28425  5145  7579  43121 16518 12768  31020  25371  25881
    13 16  28425  5145  7579  42583 16566 12555  30661  25311  25627
    14 16  28425  5145  7579  42368 16632 12488  30484  25180  25729
    15 16  28425  5145  7579  42545 16675 12645  30540  25418  25793
    16 16  28425  5145  7579  42419 16621 12685  30405  25319  25907
    12 17  31904  6344  7069  44451 18942 12351  32734  26211  28235
    13 17  31904  6344  7069  44385 18739 12380  32663  26227  28293
    14 17  31904  6344  7069  44201 18576 12316  32500  26290  28253
    15 17  31904  6344  7069  43866 18406 12174  32328  26174  28180
    16 17  31904  6344  7069  43493 18247 11965  32146  26158  28058
    12 18   6996  1812   763  10445  4527  3028   7595   6263   6718
    13 18   6996  1812   763  10346  4474  2965   7552   6223   6664
    14 18   6996  1812   763  10042  4373  2838   7378   6098   6572
    15 18   6996  1812   763   9940  4296  2820   7273   6091   6581
    16 18   6996  1812   763   9571  4128  2662   7054   5961   6434
    12 19   5767  2140   786   8924  4333  2218   6850   5438   5528
    13 19   5767  2140   786   8871  4321  2162   6842   5426   5472
    14 19   5767  2140   786   8586  4281  2073   6636   5492   5394
    15 19   5767  2140   786   8441  4231  2001   6557   5386   5338
    16 19   5767  2140   786   8221  4150  1921   6422   5369   5212
    12 20  21417  5051  3953  30480 12844  8820  22077  18365  19590
    13 20  21417  5051  3953  30305 12838  8669  22045  18340  19546
    14 20  21417  5051  3953  30216 12713  8684  21949  18361  19553
    15 20  21417  5051  3953  30017 12594  8648  21815  18239  19596
    16 20  21417  5051  3953  29485 12327  8472  21435  18073  19385
    end
    label values n_county n_county
    label def n_county 1 "Alabama Autauga", modify
    label def n_county 2 "Alabama Baldwin", modify
    label def n_county 3 "Alabama Barbour", modify
    label def n_county 4 "Alabama Bibb", modify
    label def n_county 5 "Alabama Blount", modify
    label def n_county 6 "Alabama Bullock", modify
    label def n_county 7 "Alabama Butler", modify
    label def n_county 8 "Alabama Calhoun", modify
    label def n_county 9 "Alabama Chambers", modify
    label def n_county 10 "Alabama Cherokee", modify
    label def n_county 11 "Alabama Chilton", modify
    label def n_county 12 "Alabama Choctaw", modify
    label def n_county 13 "Alabama Clarke", modify
    label def n_county 14 "Alabama Clay", modify
    label def n_county 15 "Alabama Cleburne", modify
    label def n_county 16 "Alabama Coffee", modify
    label def n_county 17 "Alabama Colbert", modify
    label def n_county 18 "Alabama Conecuh", modify
    label def n_county 19 "Alabama Coosa", modify
    label def n_county 20 "Alabama Covington", modify
    Any help would be appreciated

    Thanks,

    Using Stata 12.1 on Mac OSx

  • #2
    I don't understand what you want; I don't see how it is possible to do what you describe. Can you explain, for example, how you would calculate by hand the value of this new variable would be in the first observation of your example data?

    Comment


    • #3
      Thanks for responding. The way I conceptualize the new variable for example for 2011 the variable gender (for male/female) would be created by dividing the total persons who were male in 2011 over the total population for that year and the same would be for the female. I am having a hard time conceptualizing how I would include the separate variables for population male population female in a xtreg model and be able to interpret the results.

      Comment


      • #4
        Code:
        gen proportion_male = popmale/(popmale+popfemale)
        will do that. You cannot include both a male and female proportion in the model, as the two variables would always sum to 1, so one would be omitted due to collinearity.

        But you can't generalize that to a multi-level construct like education. You would need more than one variable for that purpose.

        Comment


        • #5
          I have previously created proportions for education using the following code:

          Code:
          egen edu_total = rowtotal(edu_*), missing
          foreach v of varlist edu_* {
              gen prop_`v' = `v'/edu_total
          }
          this generated proportions of the different education levels. But I presume that in building my model I would still have to include each variable separately:

          Code:
          xtset n_county
          xtreg opvisit i.fqhc i.year fqhc1##statemedicaid prop_edu_hs prop_edu_hsplus prop_male prop_female, fe //question: would I leave edu_college as reference variable?

          Comment


          • #6
            Yes, you have to leave out one of the education variables. The proportions will always sum to 1 (except possibly for very small rounding errors) and so be colinear. So you must leave one out.

            Now, you may be wondering, what happens if I put them all in? Probably Stata will figure out that they are colinear and will pick one to omit for you. But there is a small risk that due to rounding error, the sum will not always be exactly 1 and Stata will not pick up the collinearity. In that case, you will have a set of variables that are almost exactly colinear but not quite. That will result in their coefficients being estimated with very large standard errors. If these variables are being included only to adjust for their confounding effects, and you are not really interested in them directly, then that isn't a problem. But if you are actually interested in the effects of these variables on opvisit, then the results would not be usable. So just pick one of them to omit. It could be prop_edu_college, or it could be one of the others--it doesn't matter which.

            Comment


            • #7
              Thanks Clyde. You have been a great help.

              Comment

              Working...
              X