
  • How to control for age when I have the % of people at each age

    Hi, currently my data look like this:

    [CODE]
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str43 name str9 code int year byte quarter float compost long inc float(hhsize popden)
    "Adur District Council" "E07000223" 2013 2 11.354862 19656 2.37 14.423
    "Adur District Council" "E07000223" 2016 2 13.037227 21528 2.39 14.772
    "Adur District Council" "E07000223" 2015 3 10.610566 21456 2.38 14.713
    "Adur District Council" "E07000223" 2012 2 9.733083 18881 2.36 14.276
    "Adur District Council" "E07000223" 2012 1 3.635365 18881 2.36 14.346
    "Adur District Council" "E07000223" 2014 1 4.6309547 19928 2.38 14.557
    "Adur District Council" "E07000223" 2016 3 12.508617 21528 2.39 14.772
    "Adur District Council" "E07000223" 2014 4 6.039579 19928 2.38 14.557
    "Adur District Council" "E07000223" 2012 4 5.109999 18881 2.36 14.276
    end
    [/CODE]

    I would like to control for age somehow. I currently have data on the percentage of people at each age, from 1 to 96, in each local authority (Adur is just the first local authority). Any suggestions on how I could use these data would be greatly appreciated. Is there a way I can capture the age variation in the regression?
    The age data look like this (local authority, year, then the percentage at each age; only the first few age columns are shown):

    "County Durham" 2012 1.0507407 1.1110009 1.0972707 1.0412058 1.0980335 1.0400616
    "Darlington" 2012 1.2217664 1.1626188 1.2032827 1.1681639 1.2790655 1.1459835
    "Hartlepool" 2012 1.1752046 1.1794244 1.2131826 1.1920837 1.1646552 1.1709849
    "Middlesbrough" 2012 1.3896546 1.468821 1.3076608 1.2991786 1.275146 1.2822144
    "Northumberland" 2012 .9301378 .9381616 .9684049 .9640844 1.0183991 .930755
    "Redcar and Cleveland" 2012 1.0885394 1.1505356 1.0813304 .9832898 1.0294266 1.0438443
    "Stockton-on-Tees" 2012 1.2718617 1.249387 1.25143 1.2555165 1.283099 1.227934
    "Tyne and Wear (Met County)" 2012 1.1399528 1.1167859 1.0998087 1.0817703 1.0952107 1.0589571
    "Gateshead" 2012 1.1080061 1.1783558 1.0913959 1.193989 1.0620835 1.1158228
    "Newcastle upon Tyne" 2012 1.1778002 1.2048107 1.182065 1.1145388 1.1408385 1.0484341
    "North Tyneside" 2012 1.1243176 1.1022534 1.0974568 1.0331827 1.1041721 1.0283861
    "South Tyneside" 2012 1.1034878 1.0304438 1.0526179 1.0226176 1.0734876 1.0500091
    "Sunderland" 2012 1.1567024 1.0422335 1.0514193 1.035874 1.0789765 1.0556588
    "NORTH WEST" 2012 1.219739 1.2045753 1.1805521 1.1674157 1.1848013 1.1503912
    "Blackburn with Darwen" 2012 1.5792457 1.4118727 1.4766623 1.5724968 1.5077072 1.515806
    "Blackpool" 2012 1.291145 1.1498393 1.1221323 1.1013521 1.0847279 1.1276737

    Thank You!!

  • #2
    Well, you will need to do a fair amount of work for this. First, your age distribution file needs to be -reshape-d to long layout, so that you have one observation for each combination of local authority, year, and age (1-96). Next, you will need to convert those percentages of the population to counts of people. Then you will need to -merge- that file with your first dataset, using local authority name, year, and age as the key variables. At that point you can use -svyset- to specify post-stratification, with those population counts as the post-stratification weights. From there, whatever analyses you do, you run with the -svy:- prefix.
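
    The steps above might be sketched roughly as follows. This is only an outline under assumptions that are not stated in the thread: the file names age_distribution and main_data are hypothetical, the wide age file is assumed to hold hypothetical variables pct1-pct96 plus a total-population variable totpop, and the analysis file is assumed to contain an age variable to merge on (the excerpt in #1 does not show one).

    ```stata
    * --- Hypothetical names throughout: age_distribution, main_data,
    * --- pct1-pct96, totpop, agecount, ps. None come from the thread.

    * Step 1: reshape the age file from wide to long, giving one
    * observation per local authority, year, and age
    use age_distribution, clear
    reshape long pct, i(name year) j(age)

    * Step 2: convert percentages of the population to counts of people
    generate double agecount = totpop * pct / 100

    tempfile age_long
    save `age_long'

    * Step 3: merge the counts into the analysis data on name, year, age
    * (assumes main_data already has an age variable)
    use main_data, clear
    merge m:1 name year age using `age_long', keep(match) nogenerate

    * Step 4: declare post-stratification. One post-stratum per
    * authority-year-age cell, so that the count is constant within it
    egen long ps = group(name year age)
    svyset _n, poststrata(ps) postweight(agecount)

    * Step 5: run analyses with the svy: prefix
    svy: regress compost inc hhsize popden
    ```

    If the population counts are already available, as mentioned in #3, Step 2 can be skipped and the count variable used directly in postweight().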



    • #3
      Thank you so much, Clyde. I actually already have the data as numbers of people; I had converted them to percentages because I thought that was more intuitive. Sorry, I am still a little confused and am new to Stata. What is -svyset-, and what is post-stratification?



      • #4
        -svyset- is a Stata command that lets you declare how your data depart from a simple random sample, so that later analyses account for the design. Post-stratification is precisely what you are asking to do here: adjust an analysis to take into account an externally specified distribution of some attributes of a population. Post-stratification is not a specific Stata command; it is a general statistical procedure which, in Stata, is handled using the post-stratification options of the -svyset- command.
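
        As a rough illustration of the idea (a textbook special case, not something spelled out in this thread): for a simple random sample divided into post-strata k, here age groups, with known population counts N_k, the post-stratified estimate of a mean weights each group's sample mean by its known population share. The symbols below are introduced only for this illustration.

        ```latex
        % N_k       : known population count in post-stratum k (e.g. age k)
        % N         : total population, N = \sum_k N_k
        % \bar{y}_k : sample mean of the outcome within post-stratum k
        \hat{\bar{Y}}_{\mathrm{ps}} \;=\; \sum_{k} \frac{N_k}{N}\,\bar{y}_k
        ```

        So if a sample over-represents some ages relative to the population, their contribution is scaled back to match the external age distribution.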

        The details are too numerous and complicated to spell out here. Open the [SVY] volume of the PDF documentation that comes with your Stata installation and read the chapter about the -svyset- command. The documentation is well written, although if you do not have an understanding of the general statistical characteristics of survey designs, you may find it a bit rough going.

