
  • Panel dataset

    Hi everyone! I am working with a panel survey dataset that has two time periods. In the first period, a sampling weight (variable SWEIGHT) is recorded for each village/town (variable IDPSU); these weights were defined during the sampling process. In the second period, SWEIGHT is missing for every observation. Since the same individuals were re-interviewed in the second round of the survey, I want the second period to carry the same SWEIGHT values as the first. Before doing that, I need to check whether any values of IDPSU appear in the second period that did not appear in the first. A sample of my dataset is as follows:
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long IDPSU float time double SWEIGHT
     10505 0  5485.06
     11204 0  2627.93
     11204 1        .
     11303 1        .
     20103 1        .
     30111 0  2651.22
     30707 1        .
     31003 0  2651.22
     31702 1        .
     31703 1        .
     40006 0  2651.22
     51202 1        .
     60504 0   1510.9
     60506 1        .
     61704 1        .
     61811 0  1970.48
     61906 1        .
     71003 0  3122.56
     80504 1        .
     80506 0  4544.97
     80601 1        .
     80611 0  2450.21
     80804 0  1743.46
     81402 1        .
     81406 1        .
     81504 0  6576.17
     81506 1        .
     82003 0  4365.67
     82009 0  2450.21
     83201 1        .
     83207 1        .
     90508 0  1637.63
     93403 1        .
     94102 1        .
     94502 1        .
     94908 0  5517.84
     96503 1        .
     96803 0  7040.17
    100203 0  9208.41
    101402 0  7890.09
    101403 1        .
    101701 1        .
    102601 0  2738.81
    103205 0  6999.92
    103212 1        .
    103503 0  2738.81
    110002 0  1811.89
    110002 0  1811.89
    190103 1        .
    190602 0  5665.89
    190609 1        .
    190907 0 10091.31
    191011 0  4051.64
    191112 0  4051.64
    191705 1        .
    200208 0  2738.81
    210708 1        .
    220702 0  3880.68
    221012 0  3951.82
    221401 1        .
    230206 0  3951.82
    230906 0  4769.44
    231703 0  6600.27
    231707 0  1926.59
    232003 1        .
    232107 0  1106.11
    232107 0  1106.11
    232307 0  4331.54
    232703 0  4025.77
    232805 0  9251.27
    232806 0  5884.07
    240710 1        .
    240718 0  4170.95
    240805 0  2399.94
    241505 0  4170.95
    241904 0  5095.63
    270704 0  5258.26
    270802 1        .
    271803 0  3058.23
    271806 1        .
    272108 1        .
    272602 1        .
    273005 0  3600.81
    273007 1        .
    273108 1        .
    273201 1        .
    280302 1        .
    281303 0 13597.28
    281305 0  8054.43
    282011 0  4637.38
    290208 0  3042.74
    290536 1        .
    290603 1        .
    291002 0  1087.28
    291802 1        .
    291808 0  3892.85
    291813 1        .
    292732 1        .
    331201 1        .
    331217 0  5234.43
    end
    Here time=0 is the first time period and time=1 is the second.
    I need help with these two things:
    1) Determine whether there are values of IDPSU in the second time period that do not occur in the first.
    2) Assign to SWEIGHT in the second time period the value recorded for the same IDPSU in the first time period.
    Last edited by Yatharth Garg; 07 May 2021, 11:41. Reason: Correction

  • #2
    There are multiple observations for the same IDPSU in the example data; please check whether that is correct. The following code may help with each task:
    1)
    Code:
     bysort IDPSU:gen new = 1 if IDPSU[_n] == IDPSU[_n+1] & time[_n] != time[_n+1]
    2)
    Code:
     bysort IDPSU:replace SWEIGHT = SWEIGHT[_n-1] if dup[_n-1] == 1
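    Note that `dup` in the second command is not defined anywhere in this thread, so that line will not run as posted. As a possible alternative, here is a minimal sketch that relies on sorting by time within IDPSU rather than on neighbouring rows. It assumes that time has no missing values and that SWEIGHT is constant within an IDPSU in the first period; the variable name new_psu is made up for illustration, and the four rows are taken from the example in #1.

```stata
clear
input long IDPSU float time double SWEIGHT
10505 0 5485.06
11204 0 2627.93
11204 1 .
11303 1 .
end

* After sorting by time within IDPSU, time[1] is the earliest period observed
* for that IDPSU, so time[1] == 1 means the IDPSU has no first-period row.
bysort IDPSU (time): gen byte new_psu = (time[1] == 1)

* Assumption check: the first-period weight does not vary within an IDPSU.
bysort IDPSU (time): assert SWEIGHT == SWEIGHT[1] if time == 0

* Copy the first-period weight to the second-period rows of the same IDPSU.
* For an IDPSU that is new in the second period, SWEIGHT[1] is missing,
* so its rows stay missing.
bysort IDPSU (time): replace SWEIGHT = SWEIGHT[1] if time == 1 & missing(SWEIGHT)

list, sepby(IDPSU)
```

    If the assert fails on the full data, the weight differs across first-period rows of some IDPSU, and copying by IDPSU alone would not be well defined.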



    • #3
      Hi Mr Choudhary,
      Apologies, but the first command generates a different variable from the one I need. I have listed the resulting 'new' variable below. Within the same IDPSU it is sometimes missing and sometimes 1, irrespective of time. As for your second command, the values of SWEIGHT depend on IDPSU in the first time period, and I am not sure the code works in that case. Could you perhaps suggest another solution?
      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input long IDPSU float time double SWEIGHT float new
      10201 0 4752.38 1
      10201 1       . 1
      10201 0 4752.38 .
      10201 0 4752.38 1
      10201 1       . .
      10201 1       . 1
      10201 0 4752.38 1
      10201 1       . .
      10201 1       . 1
      10201 0 4752.38 1
      10201 1       . 1
      10201 0 4752.38 .
      10201 0 4752.38 1
      10201 1       . 1
      10201 0 4752.38 1
      10201 1       . .
      10201 1       . 1
      10201 0 4752.38 1
      10201 1       . .
      10201 1       . 1
      10201 0 4752.38 .
      10201 0 4752.38 .
      10201 0 4752.38 .
      10201 0 4752.38 1
      10201 1       . 1
      10201 0 4752.38 .
      10201 0 4752.38 1
      10201 1       . 1
      10201 0 4752.38 1
      10201 1       . 1
      10201 0 4752.38 1
      10201 1       . .
      10201 1       . .
      10201 1       . .
      10201 1       . .
      10201 1       . .
      10201 1       . 1
      10201 0 4752.38 .
      10201 0 4752.38 1
      10201 1       . .
      10201 1       . .
      10201 1       . .
      10201 1       . .
      10201 1       . 1
      10201 0 4752.38 1
      10201 1       . 1
      10201 0 4752.38 1
      10201 1       . .
      10201 1       . 1
      10201 0 4752.38 .
      10201 0 4752.38 1
      10201 1       . 1
      10201 0 4752.38 1
      10201 1       . .
      10201 1       . .
      10201 1       . 1
      10201 0 4752.38 1
      10201 1       . 1
      10201 0 4752.38 .
      10201 0 4752.38 .
      10201 0 4752.38 1
      10201 1       . .
      10201 1       . 1
      10201 0 4752.38 1
      10201 1       . 1
      10201 0 4752.38 1
      10201 1       . 1
      10201 0 4752.38 .
      10201 0 4752.38 1
      10201 1       . 1
      10201 0 4752.38 .
      10201 0 4752.38 1
      10201 1       . 1
      10201 0 4752.38 1
      10201 1       . .
      10201 1       . .
      10201 1       . .
      10201 1       . .
      10201 1       . .
      10201 1       . .
      10201 1       . .
      10201 1       . 1
      10201 0 4752.38 .
      10201 0 4752.38 .
      10201 0 4752.38 1
      10201 1       . 1
      10201 0 4752.38 .
      10201 0 4752.38 .
      10201 0 4752.38 1
      10201 1       . 1
      10201 0 4752.38 .
      10201 0 4752.38 .
      10201 0 4752.38 .
      10201 0 4752.38 .
      10201 0 4752.38 .
      10201 0 4752.38 1
      10201 1       . 1
      10201 0 4752.38 1
      10201 1       . .
      10201 1       . .
      end



      • #4
        The dataset shared in #1 has a different pattern from the one in #3. I have made some arbitrary changes to the #3 data (a second IDPSU with its own weight) to check that the code also works on the full version of the data that you might have.
        Code:
        replace IDPSU = 10345 in 51/100
        replace SWEIGHT = 1234 if IDPSU == 10345 & SWEIGHT !=.
        drop new
        bysort IDPSU: gen new = 1 if time == 1
        bysort IDPSU: egen SWEIGHT1 = mean(SWEIGHT)
        bysort IDPSU: replace SWEIGHT = SWEIGHT1 if time == 1 & SWEIGHT == .
        Let me know if this solves your problem. It is not the most elegant solution, but it should help you achieve the intended result.
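        Two things may be worth checking alongside the code above. `gen new = 1 if time == 1` marks every second-period row, not only the IDPSUs that are absent from the first period, and the group mean reproduces the first-period weight only when that weight is constant within IDPSU. The following is a minimal sketch of both checks, not a definitive implementation; the names n_base, new_psu, w_min and w_max are made up for illustration, and the four rows come from the example in #1.

```stata
clear
input long IDPSU float time double SWEIGHT
10505 0 5485.06
11204 0 2627.93
11204 1 .
11303 1 .
end

* Number of first-period rows per IDPSU; zero means the IDPSU
* first appears in the second period.
bysort IDPSU: egen n_base = total(time == 0)
gen byte new_psu = (n_base == 0)

* egen's min() and max() ignore missing values, so equality here means the
* non-missing weights within an IDPSU are all the same and the mean equals
* that common value. (If every weight is missing, both are missing and
* the comparison still holds.)
bysort IDPSU: egen w_min = min(SWEIGHT)
bysort IDPSU: egen w_max = max(SWEIGHT)
assert w_min == w_max

* Second-period rows that cannot be filled because the IDPSU is new.
count if time == 1 & new_psu
```

        A non-zero count means some second-period rows have no first-period weight to inherit and will stay missing after the fill.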
