  • Merging two cross-sectional data sets as a pseudo-panel

    Hi everyone, I have an urgent request and would be grateful for your help.
    I wish to merge two cross-sectional data sets so as to estimate the pre- and post-policy effect of a policy.
    I guess this is pseudo-panel data, and I want to know whether it would be appropriate to use the -merge- command with year of birth and sex as the one-to-one key variables.
    I would be glad to hear of different ways of doing this.
    Thank you for your timely response.

  • #2
    More information is needed to answer your question. It would be helpful if you showed brief, representative examples from both data sets. You also need to describe what kind of analysis you plan to do with the data. In general, if you want to make a pseudo-panel out of two cross-sections, the -append- command would more often be appropriate than -merge-. But perhaps you are trying to form matched pairs based on birth year and sex. There are ways to do that, and they may involve -merge-, but it would be surprising to see a data set in which year of birth and sex uniquely identify the observations, unless the observations are actually aggregate data.
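
A minimal sketch of the -append- route described above, using simulated stand-ins for the two cross-sections. The variable names (birth_year, sex, income, post) and the sample sizes are assumptions for illustration only and are not taken from the poster's data.

```stata
* Simulated stand-in for the second survey (hypothetical names and sizes)
clear
set seed 12345
set obs 250
gen birth_year = 1960 + floor(30*runiform())
gen sex = runiform() < 0.5
gen income = exp(rnormal(10.2, 1))
tempfile survey2
save `survey2'

* Simulated stand-in for the first survey
clear
set obs 200
gen birth_year = 1960 + floor(30*runiform())
gen sex = runiform() < 0.5
gen income = exp(rnormal(10, 1))

* Stack the second survey under the first.
* generate(post) is 0 for rows already in memory and 1 for appended rows.
append using `survey2', generate(post)
label define postlbl 0 "First survey" 1 "Second survey"
label values post postlbl

* birth_year and sex do not uniquely identify respondents here,
* which is why a 1:1 -merge- on those keys would not be valid
duplicates report birth_year sex
```

The pooled file keeps one row per respondent, with post recording which survey the row came from.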

    To show examples of your data, download and install the -dataex- command by running -ssc install dataex-. Read -help dataex- for the simple instructions, and then use it for each of the two data sets. Then describe the analysis you plan to do, specifically referring to the variables in your data by name. You will be able to get a more specific answer then.
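
For illustration, the -dataex- steps above might look like this on a data set shipped with Stata. The choice of auto.dta and of the variables listed is an assumption made only to keep the example self-contained.

```stata
* Install -dataex- only if it is not already available
capture which dataex
if _rc ssc install dataex

* Example with a shipped data set; substitute your own data and variables
sysuse auto, clear
dataex make price mpg in 1/5
```

The output of -dataex- can be pasted directly into a post so that others can recreate the example rows.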



    • #3
      Akwa:
      as an aside to Clyde's excellent advice, it would also be interesting to know whether the two pseudo-panels refer to the same sample units (e.g. the dependent variable is the gross domestic product of the same set of nations before and after a hypothetical embargo) or not.
      Kind regards,
      Carlo
      (Stata 18.0 SE)



      • #4
        Thank you, Clyde and Carlo. Yes, the data come from a household survey of Uganda. Respondents were asked the same questions in both surveys, which contain demographic and economic information on the respondents. What I intend to do is pair these respondents by their month and year of birth together with their sex and ethnic background. I am hoping that matches are possible. I intend to find the effect of a wage policy change for public sector workers in Uganda, which came after the first survey but before the second survey. So I want to know how to go about this. Though I can pool the data and use DID for this, I would like to support it with a matching technique by having the two income variables and running propensity score matching too.
        Unfortunately, Clyde, I don't have the data set with me now, as I am not at the computer with the data. I will surely do as you suggested with -dataex-. Thank you.
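
A minimal sketch of the pooled DID regression mentioned above, run on simulated data. The variable names (post, public, ln_income) and the effect size of 0.25 are assumptions for illustration; survey weights and design variables are deliberately left out.

```stata
clear
set seed 2024
set obs 1000
gen post   = _n > 500            // 0 = survey before the policy change, 1 = survey after it
gen public = runiform() < 0.3    // 1 = public sector worker (hypothetical treatment group)

* Simulated log income with a true policy effect of 0.25 for public workers after the change
gen ln_income = 10 + 0.1*public + 0.05*post + 0.25*public*post + rnormal(0, 0.5)

* The coefficient on 1.public#1.post is the difference-in-differences estimate
regress ln_income i.public##i.post, vce(robust)
```

Because the two surveys are pooled rather than matched, this regression does not require the same individuals to appear in both waves.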



        • #5
          Akwa:
          thanks for providing more details.
          You have two surveys and respondents were not (necessarily) the same.
          Before any analysis, you should get familiar with the Stata -svy- prefix.
          Kind regards,
          Carlo
          (Stata 18.0 SE)
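
As a rough illustration of the -svy- prefix, here is a sketch using the nhanes2 example data set distributed by StataCorp. The design variables below belong to that example data set; the design variables of the Ugandan surveys are not known from this thread and would need to be taken from their documentation.

```stata
* Example data with a stratified, clustered design
webuse nhanes2, clear

* Declare the design: primary sampling unit, sampling weight, and strata
svyset psuid [pweight = finalwgt], strata(stratid)

* Design-based estimates use the -svy:- prefix
svy: mean height
svy: regress weight height age
```

Once -svyset- has been run, any estimation command that supports the prefix picks up the declared design automatically.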



          • #6
            Thank you very much, Mr. Carlo Lazarro.
            Yes, I checked the -svy- command, and I also read a paper by Deaton (1985). What I gathered is that, since this is a repeated cross-section, using the ID from each survey to build a panel would be wrong: the person with a given ID in the first survey might have a different ID in the second one. Hence the use of time-invariant characteristics like year of birth and sex. So merging the data has to be done using variables that are plausibly invariant.
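
To make the point about invariant characteristics concrete, here is a small sketch on simulated data. All names (wave, id, birth_year, sex, birth_cohort, cohort) and the five-year banding are assumptions for illustration, not a prescription from Deaton (1985).

```stata
clear
set seed 42
set obs 600
gen wave = 1 + (_n > 300)
* Survey-specific ID: the same numbers recur in both waves but belong to different people
gen id = cond(wave == 1, _n, _n - 300)
gen birth_year = 1960 + floor(20*runiform())
gen sex = runiform() < 0.5

* Cohort identifier built only from time-invariant characteristics
gen birth_cohort = 5*floor(birth_year/5)     // five-year birth bands
egen cohort = group(birth_cohort sex), label

* Cell sizes by cohort and wave; small cells give noisy cohort means
bysort cohort wave: gen n_cell = _N
tabulate cohort wave
```

Wider birth bands give larger cells at the cost of fewer cohorts, which is the usual trade-off when choosing cohort definitions.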



            • #7
              Akwa:
              I'm not sure I follow your last statements, whereas I agree that a panel cannot be created from your data, as the ids from the two surveys are, in all likelihood, different.
              Just an aside: as per the FAQ, please post the full reference for everything you quote, as that contribution might be useful for others on the list. Thanks.
              Kind regards,
              Carlo
              (Stata 18.0 SE)



              • #8
                Thank you Carlo.
                Here is the paper: Deaton, Angus (1985), "Panel Data from Time Series of Cross Sections", Journal of Econometrics, 30, 109–126.



                • #9
                  Akwa:
                  well done, thanks.
                  Kind regards,
                  Carlo
                  (Stata 18.0 SE)



                  • #10
                    If I understand correctly, you need to create a pseudo-panel data set from two cross-sectional data sets. The observations in the two cross-sections are different individuals, so you need to form cohorts of individuals based on birth year and sex. Say you have variables x1 x2 x3; the suggested code is:

                    Code:
                    * Cohort means of x1 x2 x3 by birth year and sex.
                    * by() keeps both identifiers, so sex is not lost after -collapse-.
                    use data.dta, clear
                    collapse (mean) x1 x2 x3 if inlist(sex, 0, 1), by(birth_year sex)
                    sort sex birth_year
                    gen ID = _n
                    save pseudo.dta, replace
                    Now you have a pseudo-panel data set in which each observation (ID) is a cohort sharing the same sex and birth year. The total number of observations is at most N(birth_year)*N(sex), and fewer if some combinations do not occur in the data.
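
The same idea can be extended to two cross-sections by adding a wave identifier to by(), so that each cohort appears once per survey. The sketch below uses simulated data; the names wave, n and ID and the sample sizes are assumptions for illustration.

```stata
clear
set seed 7
set obs 2000
gen wave = 1 + (_n > 1000)                   // 1 = first survey, 2 = second survey
gen birth_year = 1960 + floor(10*runiform())
gen sex = runiform() < 0.5
gen x1 = rnormal()
gen x2 = x1 + rnormal()

* One row per cohort (birth_year x sex) and wave, keeping the cell size
collapse (mean) x1 x2 (count) n = x1, by(birth_year sex wave)

* Cohort identifier that is constant across waves
egen ID = group(birth_year sex)
xtset ID wave

* Fixed-effects regression on the cohort means
xtreg x2 x1, fe
```

Keeping the cell size n makes it possible to check later how many respondents stand behind each cohort mean.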



                    • #11
                      Hello Diana Abdwahab, thank you for your suggestion. I am sorry for the late reply.



                      • #12
                        I have been working on this and came up with the code below. I would like to know whether this is the right way of forming pseudo-panels as proposed by Deaton (1985).


                        Code:
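                        clear
                        webuse nlswork
                        * Birth-year groups: 41-43 are pooled into 43, and 54 into 53
                        gen Byear = birth_yr
                        recode Byear (41/43 = 43) (54 = 53)
                        tab Byear
                        tab race
                        tab year
                        * Cohort-by-year means, stored on every individual row
                        bysort Byear race year: egen newincome = mean(ln_wage)
                        bysort Byear race year: egen newgrade = mean(grade)
                        bysort Byear race year: egen newwks = mean(wks_work)
                        bysort Byear race year: egen newexp = mean(ttl_exp)
                        sum ln_wage grade wks_work ttl_exp newincome newgrade newwks newexp
                        * Cohort identifier
                        egen Cohorts = group(Byear race)
                        * Fixed effects on the cohort means
                        xtset Cohorts
                        xtreg newincome newgrade newwks newexp, fe
                        estimates store FE1
                        * Fixed effects on the individual panel, for comparison
                        xtset idcode
                        xtreg ln_wage grade wks_work ttl_exp, fe
                        estimates store FE2
                        * -esttab- is part of the estout package (ssc install estout)
                        esttab FE1 FE2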
                        I look forward to hearing your take on this. Thank you.
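
For comparison, a variant that reduces the data to one row per cohort and year before estimation, so that each cohort-year cell enters the regression once rather than once per individual. This is only a sketch of one common way to set up cohort-level data, not a confirmed implementation of Deaton (1985); the name n_cell is an assumption.

```stata
webuse nlswork, clear
gen Byear = birth_yr
recode Byear (41/43 = 43) (54 = 53)

* One row per cohort (Byear x race) and survey year, with the cell size
collapse (mean) ln_wage grade wks_work ttl_exp (count) n_cell = ln_wage, by(Byear race year)

* Declare cohorts as panel units and year as the time variable
egen Cohorts = group(Byear race)
xtset Cohorts year
xtreg ln_wage grade wks_work ttl_exp, fe
```

Note that nlswork is a genuine panel, so forming cohorts here serves only to illustrate the mechanics.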



                        • #13
                          Hi everyone, I am a Ph.D. scholar in economics. My study uses primary data, and the unit of analysis is the household. Please advise: can I collect two observations on income (past income at the start of the job and current income at the time of the interview) at a single point in time through a questionnaire? I cannot use longitudinal data.



                          • #14
                            Please also advise me on pseudo-panel data and how I can use it. I am restricted to primary data: my topic has been approved by the advanced board, so I have no other option.



                            • #15
                              For mobility analysis, the data must cover a period of time so that we can examine the effect of current and past income.
