Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Parsing and then Merging CPS data

    I am looking at the effects of the Affordable Care Act on the job mobility of young adults using CPS data, looking at the variable EMPSAME, which indicates whether or not the respondent was employed by the same employer and the same job he/she reported working as his/her main job in the previous month's survey.

    Right now, I have the data all in one big dataset, and am curious at looking at the change in responses to EMPSAME across the first four "waves" (defined as MISH in the dataset (month in sample household)) of a survey. I have the variables as shown below along with some demographic variables such as state, race, education, and marital status that have all been modified such that the response in the first wave remains the same for the three subsequent waves.



    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    
    clear
    
    input int year byte(month mish age) double cpsidp byte empsame
    
    2008 1 1 25 20080100006601 99
    
    2008 2 2 25 20080100006601  2
    
    2008 3 3 25 20080100006601  2
    
    2008 4 4 25 20080100006601  2
    
    2008 1 1 26 20080100006602 99
    
    2008 2 2 27 20080100006602  2
    
    2008 3 3 27 20080100006602  2
    
    2008 4 4 27 20080100006602  2
    
    2008 4 4 22 20080100006606 99
    
    2008 1 1 16 20080100007203 99
    
    2008 2 2 16 20080100007203 99
    
    2008 3 3 16 20080100007203 99
    
    2008 4 4 16 20080100007203 99
    
    2008 1 1 21 20080100007204 99
    
    2008 2 2 21 20080100007204 99
    
    2008 3 3 22 20080100007204 99
    
    2008 4 4 22 20080100007204 99
    
    2008 1 1 18 20080100008002 99
    
    2008 2 2 18 20080100008002 99
    
    2008 3 3 18 20080100008002 99
    
    2008 4 4 18 20080100008002 99
    
    2008 1 1 19 20080100008003 99
    
    2008 2 2 19 20080100008003 99
    
    2008 3 3 19 20080100008003 99
    
    2008 4 4 20 20080100008003 99
    
    2008 1 1 21 20080100008004 99
    
    2008 2 2 22 20080100008004 99
    
    2008 3 3 22 20080100008004 99
    
    2008 4 4 22 20080100008004 99
    
    2008 1 1 30 20080100010701 99
    
    2008 2 2 31 20080100010701  2
    
    2008 3 3 31 20080100010701  2
    
    2008 4 4 31 20080100010701  2
    
    2008 1 1 26 20080100010702 99
    
    2008 2 2 26 20080100010702  2
    
    2008 3 3 27 20080100010702  2
    
    2008 4 4 27 20080100010702  2
    
    2008 1 1 32 20080100010703 99
    
    2008 2 2 32 20080100010703  2
    
    2008 3 3 32 20080100010703  2
    
    2008 4 4 32 20080100010703  2
    
    2008 1 1 20 20080100016703 99
    
    2008 2 2 20 20080100016703  2
    
    2008 3 3 20 20080100016703  2
    
    2008 4 4 20 20080100016703 99
    
    2008 1 1 22 20080100016704 99
    
    2008 2 2 22 20080100016704  2
    
    2008 3 3 22 20080100016704 99
    
    2008 4 4 22 20080100016704 99
    
    2008 1 1 18 20080100020104 99
    
    2008 2 2 18 20080100020104 99
    
    2008 1 1 32 20080100022101 99
    
    2008 2 2 32 20080100022101  2
    
    2008 1 1 16 20080100022605 99
    
    2008 2 2 16 20080100022605 99
    
    2008 3 3 16 20080100022605 99
    
    2008 4 4 17 20080100022605 99
    
    2008 2 2 19 20080100028903 99
    
    2008 3 3 19 20080100028903 99
    
    2008 4 4 19 20080100028903 99
    
    2008 1 1 28 20080100032901 99
    
    2008 2 2 29 20080100032901  2
    
    2008 3 3 29 20080100032901  2
    
    2008 4 4 29 20080100032901  2
    
    2008 1 1 21 20080100033801 99
    
    2008 2 2 22 20080100033801  2
    
    2008 3 3 22 20080100033801  2
    
    2008 4 4 22 20080100033801  2
    
    2008 1 1 20 20080100033802 99
    
    2008 1 1 26 20080100033803 99
    
    2008 2 2 26 20080100033803 99
    
    2008 3 3 26 20080100033803 99
    
    2008 4 4 26 20080100033803 99
    
    2008 3 3 20 20080100033804 99
    
    2008 4 4 20 20080100033804 99
    
    2008 1 1 20 20080100038203 99
    
    2008 2 2 20 20080100038203 99
    
    2008 3 3 20 20080100038203 99
    
    2008 4 4 20 20080100038203 99
    
    2008 1 1 23 20080100038801 99
    
    2008 2 2 23 20080100038801 99
    
    2008 3 3 23 20080100038801 99
    
    2008 4 4 23 20080100038801 99
    
    2008 1 1 28 20080100038802 99
    
    2008 2 2 28 20080100038802  2
    
    2008 3 3 28 20080100038802  2
    
    2008 4 4 28 20080100038802  2
    
    2008 3 3 16 20080100043102 99
    
    2008 4 4 16 20080100043102 99
    
    2008 1 1 18 20080100050902 99
    
    2008 2 2 18 20080100050902 99
    
    2008 1 1 29 20080100051101 99
    
    2008 2 2 29 20080100051101  2
    
    2008 3 3 29 20080100051101  2
    
    2008 4 4 29 20080100051101  2
    
    2008 1 1 32 20080100051102 99
    
    2008 2 2 32 20080100051102  2
    
    2008 3 3 32 20080100051102  2
    
    2008 4 4 32 20080100051102  2
    
    2008 1 1 32 20080100062001 99
    
    end
    
    label values month month_lbl
    
    label def month_lbl 1 "January", modify
    
    label def month_lbl 2 "February", modify
    
    label def month_lbl 3 "March", modify
    
    label def month_lbl 4 "April", modify
    
    label values mish mish_lbl
    
    label def mish_lbl 1 "One", modify
    
    label def mish_lbl 2 "Two", modify
    
    label def mish_lbl 3 "Three", modify
    
    label def mish_lbl 4 "Four", modify
    
    label values age age_lbl
    
    label def age_lbl 16 "16", modify
    
    label def age_lbl 17 "17", modify
    
    label def age_lbl 18 "18", modify
    
    label def age_lbl 19 "19", modify
    
    label def age_lbl 20 "20", modify
    
    label def age_lbl 21 "21", modify
    
    label def age_lbl 22 "22", modify
    
    label def age_lbl 23 "23", modify
    
    label def age_lbl 25 "25", modify
    
    label def age_lbl 26 "26", modify
    
    label def age_lbl 27 "27", modify
    
    label def age_lbl 28 "28", modify
    
    label def age_lbl 29 "29", modify
    
    label def age_lbl 30 "30", modify
    
    label def age_lbl 31 "31", modify
    
    label def age_lbl 32 "32", modify
    
    label values empsame empsame_lbl
    
    label def empsame_lbl 2 "Yes", modify
    
    label def empsame_lbl 99 "NIU", modify


    How can I parse the dataset by waves such that I can then merge them back and have the dataset sorted by CPSIDP and have the responses to EMPSAME all in one row? Would it make sense to make a variable denoting the change in responses to EMPSAME before the parsing or once I have remerged the dataset (I just need to know if there was a change from "Yes" to "No" or "NIU" within those four waves for each participant). The "1" (No) response to EMPSAME exists, it's just not in this sample from the dataset, not sure if that would matter.

    For the parsing part, I am very new to loops so I tried the following but keep getting an invalid name error that I haven't been able to figure out:

    Code:
    forval i=1/4{
    use bigdata.dta
    keep if mish=='i'
    sort cpsidp 
    save data 'i'.dta
    Furthermore, there are responses in which a person does not show up for all four waves, will that conflict with the parsing & merge, and if so, how can I drop those observations from the dataset?

    Any help is much appreciated. Thanks!

  • #2
    You can do this data reorganization with -reshape wide- (see -help reshape- if you decide to proceed with it.) But I think you will probably regret doing it. Most Stata commands are optimized to work with data in the long layout you already have. Most analyses are difficult or impossible in the wide layout you are asking for. And unless you expect to need separate data sets for each wave for some other purpose, I certainly wouldn't waste time and disk space creating them. You don't need to do that to reorganize the data.

    The code below shows how to create new variables that mark a change in empsame from no to yes, both at the current observation, and at any time for the same respondent. Note that I have changed the coding of empsame from 1/2/99 to 0/1/missing value--which will work much better for almost any analysis you want to do with this variable. 1/2 coding of binary variables is really awkward to use in Stata: very few commands work well with it. And using numeric codes like 99 for missing value is a recipe for all kinds of errors.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    
    clear
    
    input int year byte(month mish age) double cpsidp byte empsame
    
    2008 1 1 25 20080100006601 99
    
    2008 2 2 25 20080100006601  2
    
    2008 3 3 25 20080100006601  2
    
    2008 4 4 25 20080100006601  2
    
    2008 1 1 26 20080100006602 99
    
    2008 2 2 27 20080100006602  2
    
    2008 3 3 27 20080100006602  2
    
    2008 4 4 27 20080100006602  2
    
    2008 4 4 22 20080100006606 99
    
    2008 1 1 16 20080100007203 99
    
    2008 2 2 16 20080100007203 99
    
    2008 3 3 16 20080100007203 99
    
    2008 4 4 16 20080100007203 99
    
    2008 1 1 21 20080100007204 99
    
    2008 2 2 21 20080100007204 99
    
    2008 3 3 22 20080100007204 99
    
    2008 4 4 22 20080100007204 99
    
    2008 1 1 18 20080100008002 99
    
    2008 2 2 18 20080100008002 99
    
    2008 3 3 18 20080100008002 99
    
    2008 4 4 18 20080100008002 99
    
    2008 1 1 19 20080100008003 99
    
    2008 2 2 19 20080100008003 99
    
    2008 3 3 19 20080100008003 99
    
    2008 4 4 20 20080100008003 99
    
    2008 1 1 21 20080100008004 99
    
    2008 2 2 22 20080100008004 99
    
    2008 3 3 22 20080100008004 99
    
    2008 4 4 22 20080100008004 99
    
    2008 1 1 30 20080100010701 99
    
    2008 2 2 31 20080100010701  2
    
    2008 3 3 31 20080100010701  2
    
    2008 4 4 31 20080100010701  2
    
    2008 1 1 26 20080100010702 99
    
    2008 2 2 26 20080100010702  2
    
    2008 3 3 27 20080100010702  2
    
    2008 4 4 27 20080100010702  2
    
    2008 1 1 32 20080100010703 99
    
    2008 2 2 32 20080100010703  2
    
    2008 3 3 32 20080100010703  2
    
    2008 4 4 32 20080100010703  2
    
    2008 1 1 20 20080100016703 99
    
    2008 2 2 20 20080100016703  2
    
    2008 3 3 20 20080100016703  2
    
    2008 4 4 20 20080100016703 99
    
    2008 1 1 22 20080100016704 99
    
    2008 2 2 22 20080100016704  2
    
    2008 3 3 22 20080100016704 99
    
    2008 4 4 22 20080100016704 99
    
    2008 1 1 18 20080100020104 99
    
    2008 2 2 18 20080100020104 99
    
    2008 1 1 32 20080100022101 99
    
    2008 2 2 32 20080100022101  2
    
    2008 1 1 16 20080100022605 99
    
    2008 2 2 16 20080100022605 99
    
    2008 3 3 16 20080100022605 99
    
    2008 4 4 17 20080100022605 99
    
    2008 2 2 19 20080100028903 99
    
    2008 3 3 19 20080100028903 99
    
    2008 4 4 19 20080100028903 99
    
    2008 1 1 28 20080100032901 99
    
    2008 2 2 29 20080100032901  2
    
    2008 3 3 29 20080100032901  2
    
    2008 4 4 29 20080100032901  2
    
    2008 1 1 21 20080100033801 99
    
    2008 2 2 22 20080100033801  2
    
    2008 3 3 22 20080100033801  2
    
    2008 4 4 22 20080100033801  2
    
    2008 1 1 20 20080100033802 99
    
    2008 1 1 26 20080100033803 99
    
    2008 2 2 26 20080100033803 99
    
    2008 3 3 26 20080100033803 99
    
    2008 4 4 26 20080100033803 99
    
    2008 3 3 20 20080100033804 99
    
    2008 4 4 20 20080100033804 99
    
    2008 1 1 20 20080100038203 99
    
    2008 2 2 20 20080100038203 99
    
    2008 3 3 20 20080100038203 99
    
    2008 4 4 20 20080100038203 99
    
    2008 1 1 23 20080100038801 99
    
    2008 2 2 23 20080100038801 99
    
    2008 3 3 23 20080100038801 99
    
    2008 4 4 23 20080100038801 99
    
    2008 1 1 28 20080100038802 99
    
    2008 2 2 28 20080100038802  2
    
    2008 3 3 28 20080100038802  2
    
    2008 4 4 28 20080100038802  2
    
    2008 3 3 16 20080100043102 99
    
    2008 4 4 16 20080100043102 99
    
    2008 1 1 18 20080100050902 99
    
    2008 2 2 18 20080100050902 99
    
    2008 1 1 29 20080100051101 99
    
    2008 2 2 29 20080100051101  2
    
    2008 3 3 29 20080100051101  2
    
    2008 4 4 29 20080100051101  2
    
    2008 1 1 32 20080100051102 99
    
    2008 2 2 32 20080100051102  2
    
    2008 3 3 32 20080100051102  2
    
    2008 4 4 32 20080100051102  2
    
    2008 1 1 32 20080100062001 99
    
    end
    
    label values month month_lbl
    
    label def month_lbl 1 "January", modify
    
    label def month_lbl 2 "February", modify
    
    label def month_lbl 3 "March", modify
    
    label def month_lbl 4 "April", modify
    
    label values mish mish_lbl
    
    label def mish_lbl 1 "One", modify
    
    label def mish_lbl 2 "Two", modify
    
    label def mish_lbl 3 "Three", modify
    
    label def mish_lbl 4 "Four", modify
    
    label values age age_lbl
    
    label def age_lbl 16 "16", modify
    
    label def age_lbl 17 "17", modify
    
    label def age_lbl 18 "18", modify
    
    label def age_lbl 19 "19", modify
    
    label def age_lbl 20 "20", modify
    
    label def age_lbl 21 "21", modify
    
    label def age_lbl 22 "22", modify
    
    label def age_lbl 23 "23", modify
    
    label def age_lbl 25 "25", modify
    
    label def age_lbl 26 "26", modify
    
    label def age_lbl 27 "27", modify
    
    label def age_lbl 28 "28", modify
    
    label def age_lbl 29 "29", modify
    
    label def age_lbl 30 "30", modify
    
    label def age_lbl 31 "31", modify
    
    label def age_lbl 32 "32", modify
    
    label values empsame empsame_lbl
    
    label def empsame_lbl 2 "Yes", modify
    
    label def empsame_lbl 99 "NIU", modify
    
    format cpsidp %20.0f
    
    recode empsame (1 = 0) (2 = 1) (99 = .), gen(same_employer)
    label define boolean    0    "No"    1    "Yes"
    label values same_employer boolean
    
    by cpsidp (mish), sort: gen byte yes_to_no = same_employer == 0 & same_employer[_n-1] == 1
    by cpsidp: egen any_yes_to_no = max(yes_to_no)
    Of course, since ther are no "No" responses to empsame in your example, both of these newly created variables are 0 for every observation.


    Comment


    • #3
      Great, thanks!

      What would be the way to three new variables to see employment change between waves, (i.e., between waves 1 & 2, 2 &3, and 3 & 4)? That's why I wanted to see if I could do a wide layout, so as to have three new columns indicating if an individual has switched jobs between each waves as opposed to at all (though that variable will still be useful in my analysis). I wouldn't want the variable 'mish' anymore or empsame, as I only care about the change in waves, and can get rid of the observations that are for the same person (merge by cpsidp). Although age changes in the dataset example I gave, I have another variable which keeps the age recorded in the first wave.
      year month age cpsidp any_yes_to_no change1to2 change2to3 change3to4
      I hope that makes sense! Thank you very much.
      Last edited by Sebastian Lara; 13 Jan 2018, 15:37.

      Comment


      • #4
        So, starting from where the code left off before, just do:

        Code:
        keep cpsidp mish yes_to_no any_yes_to_no
        drop if mish == 1 // NO CHANGE POSSIBLE IN FIRST WAVE
        reshape wide yes_to_no, i(cpsidp) j(mish)
        forvalues i = 2/4 {
            rename yes_to_no`i' changed_`=`i'-1'_to_`i'
        }

        Comment

        Working...
        X