Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Assigning state-level values to individual observations by year

    Hi, I am working with a large Stata dataset that contains individual-level information (file1.dta), for example
    Name State Salary Year Height
    John Nebraska 35000 2016 178
    Mary Texas 42000 2014 157
    I would like to merge this dataset with a state-level file (file2.xls) that contains data on GDP, population, and so on for each state per year. The second file looks like this:
    State GDP 2014 GDP 2016 Population 2014 Population 2016
    Nebraska 20 25 100 150
    Texas 30 35 200 300
    The final file should look as follows:
    Name State Salary Year Height GDP Population
    John Nebraska 35000 2016 178 25 150
    Mary Texas 42000 2014 157 30 200
    I wonder what is the easiest way to achieve this. Any suggestion would be most welcomed. The individual data file has 42 million observation, so a manual approach is unfeasible.

    Thanks.

  • #2
    You first need to reshape file2.xls to long format. One way to do this is to import into Stata using -import excel-. Assuming the variable names are in this format: gdp2014, population2014, etc., after you import them, you can reshape the data using this code:
    Code:
    reshape long gdp population,i(state) j(year)
    You can merge the two datasets once you have file2.dta in long format. In file1:
    Code:
    merge 1:1 state year using "file2.dta"

    Comment


    • #3
      Many thanks for your kind reply Ali. I am a bit lost on how to do the first step. I tried and I got the following error:

      Code:
      variable state not found
      I am not sure what should I be writing instead of i(state). Apologies for the confusion. Thanks.

      Comment


      • #4
        When you imported file2.xls, you may have kept the variable name as State, instead of state. In that case, you can use
        Code:
        rename State state
        to rename it.

        Comment


        • #5
          Many thanks for your help so far. I still get the error "Variable State not found", even though both files have a State variable which seems ok. I attach three screenshots:
          Click image for larger version

Name:	error1.png
Views:	1
Size:	184.4 KB
ID:	1582611


          error2.png

          Attached Files

          Comment


          • #6
            See how the variable State is defined in both data sets.
            Click image for larger version

Name:	error4.png
Views:	1
Size:	153.2 KB
ID:	1582621

            Comment


            • #7
              My only thought is that in the last screenshot you seem to be trying to merge using "broadband.dta," but when you sum the State variable you're using "broadbandlong.dta". Perhaps that has something to do with it?
              Last edited by Ali Atia; 20 Nov 2020, 07:20.

              Comment


              • #8
                I cannot read your screenshots (please read the FAQ) but I notice that in your command you type "state" in lower case, while in #6 you type "State" - Stata will treat these as different

                Comment


                • #9
                  Thanks Ali, again most helpful. I have a final question that is somewhat independent of this. I managed to save the file with individual-level variables and state=level variables. I have millions of observations for each US state for around 30 years. Some variables are the same for all people in the state for a year, such as temperature.

                  I want to run now a regression where the unit of observation is the temperature at each year and at each state, is explained by variables that are variant across state but not time (say dimension of state) and some variables that are varying across state and years (say population density). How would you run such regression? Ideally it would be something like (here is a very naive approach):

                  reg temperature dimensionstate population density

                  Thanks for your helpful comments.

                  Comment


                  • #10
                    Thanks Ali, again most helpful. I have a final question that is somewhat independent of this. I managed to save the file with individual-level variables and state=level variables. I have millions of observations for each US state for around 30 years. Some variables are the same for all people in the state for a year, such as temperature.

                    I want to run now a regression where the unit of observation is the temperature at each year and at each state, is explained by variables that are variant across state but not time (say dimension of state) and some variables that are varying across state and years (say population density). How would you run such regression? Ideally, it would be something like (here is a very naive approach):

                    reg temperature dimensionstate population density

                    Thanks for your helpful comments.

                    Comment

                    Working...
                    X