
  • PSID Individual Cross Sectional weights

    Dear Statalist Users:

    I am interested in the PSID location and mobility variables. I intend to use the variables in the PSID family and individual level data for 'current residence' and 'moved since last spring' to calculate the total numbers of individuals moving from, say, state i to state j, for all pairs of adjacent states in the US.

    For this work, I have selected the period 1997-2011, and I now have all the data downloaded in Excel. Since my analysis does not require a longitudinal angle, and is individual-based rather than family-based, I am using the individual cross-sectional weights for each year. My question is: if I want to calculate the above-mentioned flows of people, how exactly should I treat the individual cross-sectional weights? In other words, does anybody know how those weights work for the PSID variables and, in particular, how to sum single observations, each of which comes with a specific (individual) weight and each of which, in the example I have given above, would represent one individual migrating from, say, AL to FL?

    Thank you so much for any help you provide here. It would be highly appreciated.

  • #2
    Elena,

    I wonder what you mean by "I have all the data downloaded in Excel". If I were you I would use Stata, especially since you have asked this on Statalist.

    To answer your question we need more information. What is the content of your variables, and how have you organised your data set (i.e. wide or long)? Your variable "moved since last spring" sounds like a binary response variable to me. If this is the case, you will have to use the lagged values of current residence to get an answer to your research question, and your research problem will change to a longitudinal one.

    If I were you, I would use my user-written ado-package "psidtools" to do something like this:

    Code:
    * Install PSID data
    . psid install
    
    * Extract the data
    . psid use || residence  [97]var1 [98]var2 ...
    
    * Make it long
    . psid long
    
    * Analyze
    * Lag within person (id stands for the person identifier in your data)
    . bys id (wave): gen lag = residence[_n-1]
    . bys wave : tab lag residence [aw = weight]

    However, this is only a starting point. There is much left to say about gaps in the data, the construction of appropriate longitudinal weights, and the merits of Stata's -xt- commands such as -xttrans-. -psidtools- can be installed with

    Code:
    . ssc install psidtools
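
    The lag step above can be tried on toy data before touching the real PSID files. This is only a minimal sketch: the variable names id, wave, residence and weight are placeholders, and all values are made up. The point it illustrates is that the lag has to be taken within each person, so that one respondent's first wave never inherits the previous respondent's last residence.

    ```stata
    * Toy long-format panel (all values are invented for illustration)
    clear
    input id wave str2 residence weight
    1 1997 "AL" 1.2
    1 1999 "FL" 1.1
    1 2001 "FL" 1.3
    2 1997 "GA" 0.8
    2 1999 "GA" 0.9
    2 2001 "AL" 0.7
    end

    * Previous residence within each person, ordered by wave
    bysort id (wave): gen lag = residence[_n-1]

    * The first observed wave of each person has no origin, so drop it
    drop if missing(lag)

    * Weighted origin-by-destination table, separately for each wave
    bysort wave: tabulate lag residence [aw = weight]
    ```

    Note that [_n-1] picks the previous observed row, not the previous survey year, so a respondent with a gap would get a lag spanning more than one wave. Handling that is part of the extra work mentioned above.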




    • #3
      Ulrich: My objective is to find out the number of people moving from state i to state j in any given year, and their characteristics, such as employment status, age, and similar items. For the PSID data, each observation for an individual comes with a cross-sectional weight, so my question was simply how to use those weights to calculate the total numbers of movers.



      • #4
        Assuming you have, for a given year, the variable origin for the state of origin and the variable destination for the state of destination, and you have crosssec as the weighting variable, one way to get numbers is

        . tabulate origin destination [aw = crosssec]

        Whether these numbers answer your research question depends on the peculiarities of your data.
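
        As a hedged illustration of how the weight type changes the numbers, the sketch below uses made-up data with the variable names from above. With aweights, Stata rescales the weights so that they sum to the number of observations; with iweights, the weights are used as they are, so each cell is the plain sum of the weights of the people in it. Whether such sums can be read as population counts depends on how the PSID weight is scaled, which is an assumption to check against the PSID documentation.

        ```stata
        * Toy data: one row per person in a single year (values are invented)
        clear
        input str2 origin str2 destination crosssec
        "AL" "FL" 1200
        "AL" "FL"  800
        "AL" "GA" 1500
        "GA" "FL"  950
        "GA" "GA" 2000
        end

        * aweights: rescaled so that the table total equals the sample size
        tabulate origin destination [aw = crosssec]

        * iweights: used as they are, so cells are sums of the weights
        tabulate origin destination [iw = crosssec]

        * The same sums as a dataset, one row per origin-destination pair
        collapse (sum) flow = crosssec, by(origin destination)
        list, clean
        ```

        The collapsed dataset is convenient if the flows are to be merged with other state-pair information, for example an indicator for adjacent states.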
        Last edited by Ulrich Kohler; 25 Feb 2015, 09:03. Reason: hopefully better formulation



        • #5
          Thank you so much for your help! I am trying it today!



          • #6
            Ulrich, that's an amazing program you wrote -- I wish I knew about it before I extracted/cleaned the PSID a year ago!

            How many people use the sampling weights? I've rarely seen papers use them because the weights have their own set of concerns and representativeness issues.



            • #7
              I don't know.

              It seems economists tend not to use them, because they want to study causal effects using models which they believe to be correct. However, if you wish to describe a variable, a relationship, or a process for a fixed population during a fixed period of time which you have sampled with varying sampling probabilities, I think one should use correctly defined probability weights. There is an entertaining and instructive debate in Groves (2004) between a modeller (against using weights) and a describer (in favour of using weights) on that issue that is worth reading.

              My opinion on this is that the describer's weights should be correct, just like the modeller's theoretical models. In practice, neither of them is correct, really, because they simplify things, as they should. So you have to find your way between Scylla and Charybdis.

              Uli





              • #8
                Hello, sorry for this intrusion, but I need to ask a question relating to the PSID weights. For my research work on mobility, I need to understand how to use PSID longitudinal weights to run a panel regression. The problem so far has been the following: whenever I try to use the longitudinal weights with the 'xtreg' command in Stata, I receive an error message saying that Stata needs the weights to be constant within groups, or a similar message. I feel I must be missing something simple. Since I need to control for individual-specific effects, I need a panel regression, but as I noted, the longitudinal weights from the PSID questionnaire, e.g. the variables known as "CORE/IMM INDIVIDUAL LONGITUDINAL WT," somehow cannot be used in conjunction with the ‘xtreg’ command. Any help you can provide here is greatly appreciated. Thanks, Elena Quercioli
                Last edited by Elena Quercioli; 19 Jun 2015, 13:43.



                • #9
                  Elena,

                  The longitudinal weights are organized such that they represent the probability of being sampled up to a given year. Therefore you should construct a variable that is constant within respondents, holding the longitudinal weight for the person's last year of the observation period.

                  If the longitudinal weights are stored in the variable lweight, time is time, and the respondent ID is id, a variant of

                  bysort id (time): gen weight = lweight[_N]

                  should do the trick.

                  I wrote "a variant of" because some extra work might be necessary: missing data or other problems of an unbalanced panel design might come up.
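
                  One possible variant, sketched on made-up data, takes each respondent's last non-missing longitudinal weight and then passes it to a fixed-effects regression. The outcome y, the regressor x and the helper variable has_w are hypothetical names used only for illustration, and the toy data are far too small for meaningful estimates.

                  ```stata
                  * Toy panel; respondent 2 has no weight in the last wave
                  clear
                  input id time y x lweight
                  1 1997 2.0 1.0 10
                  1 1999 2.6 1.5 12
                  1 2001 3.1 2.2 15
                  2 1997 1.0 0.5 20
                  2 1999 1.9 1.0 22
                  2 2001 2.2 1.4  .
                  3 1997 4.0 2.0  8
                  3 1999 4.3 2.1  9
                  3 2001 5.1 3.0 11
                  end

                  * Sort rows with a weight last, so [_N] is the latest non-missing one
                  gen byte has_w = !missing(lweight)
                  bysort id (has_w time): gen weight = lweight[_N]

                  * Check that the new weight is constant within respondents
                  bysort id: assert weight == weight[1]

                  * Fixed-effects regression with the constant weight
                  xtset id time
                  xtreg y x [pw = weight], fe
                  ```

                  Whether the last non-missing weight is the right choice for a respondent who drops out early is a substantive decision and not something the code can settle.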

                  Uli




                  • #10
                    Ulrich, I just want to restate the thanks for that awesome program. This will make my life so much easier in the future!
