Replacing column values with values of another variable by household id

danishussalam

Join Date: Jul 2014

Posts: 140
#1

Replacing column values with values of another variable by household id

31 Mar 2020, 19:58

Hi everyone,

I have 5 variables in my dataset namely, household id, wealth rank, and household ids of three friends for the household being interviewed.

Is there a way I can create 3 separate variables one for each friend that simply replaces the household id with the wealth rank of that household. For instance, HH 1 is friends with hh3, hh4 and hh6. I'd like to have 3 columns with wrank values 700, 100 and 300.

I feel like this is something very simple but I just can't get my head around it. Any help in this regard will be much appreciated.

Code:

* Example generated by -dataex-. To install: ssc install dataex clear input float(hh wrank fr1 fr2 fr3) 1 500 3 4 6 2 600 1 6 . 3 700 4 . . 4 100 5 2 . 5 200 1 . . 6 300 4 5 1 end
Tags: None
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#2

31 Mar 2020, 20:22

Code:

* Example generated by -dataex-. To install: ssc install dataex clear* input float(hh wrank fr1 fr2 fr3) 1 500 3 4 6 2 600 1 6 . 3 700 4 . . 4 100 5 2 . 5 200 1 . . 6 300 4 5 1 end reshape long fr, i(hh) j(_j) frame put hh wrank _j, into(friends) frlink m:1 fr _j, frame(friends hh _j) frget fr_wrank = wrank, from(friends) frame drop friends drop friends reshape wide fr fr@_wrank, i(hh) j(_j) order wrank, after(hh)

Added: Depending on what you will be doing next, you might consider skiping the -reshape wide- at the end and leaving the data in long layout. Most Stata analysis and data management commands work best with long data. Unless you are pretty sure that your subsequent work with this data involves the exceptions to that rule, you will make your life a lot easier if you leave it long.

Last edited by Clyde Schechter; 31 Mar 2020, 20:25.
1 like
Comment
danishussalam

Join Date: Jul 2014

Posts: 140
#3

31 Mar 2020, 21:02

Thanks, Clyde Schechter for both the code and the suggestion. My next steps involve making a cross-tabulation of wealth ranks for household (i-j) and further check how the composition of networks by wealth rank has changed as a result of an anti-poverty program. I believe my dataset is already in the long format as I have three years which appear in three rows and the columns have a bunch of variables that were recorded for each of the three years.

It's just the network questions that appear in the format I've shared above for each of the 3 years. My plan was to analyze the network information at the dyad level but I'm not sure. Any suggestions?

P.S. It's a massive dataset with 22,000 households for each year and I don't have complete network information. So it's possible that one of the friends mentioned may not have been interviewed.
Comment
danishussalam

Join Date: Jul 2014

Posts: 140
#4

01 Apr 2020, 07:00

Clyde Schechter - There seems to be a problem with the code above. After this code
frget fr_wrank = wrank, from(friends) , it gives an error saying that wrank not found even though wrank is in the data.
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30119

01 Apr 2020, 12:11

Re #4:

There seems to be a problem with the code above. After this code

frget fr_wrank = wrank, from(friends) , it gives an error saying that wrank not found even though wrank is in the data.

I don't know what to tell you. I can't replicate this error. I had tested the code on my setup before posting it. And I've rechecked it just now, and it runs without error.

Code:

. * Example generated by -dataex-. To install: ssc install dataex
. clear*

. input float(hh wrank fr1 fr2 fr3)

            hh      wrank        fr1        fr2        fr3
  1. 1 500 3 4 6
  2. 2 600 1 6 .
  3. 3 700 4 . .
  4. 4 100 5 2 .
  5. 5 200 1 . .
  6. 6 300 4 5 1
  7. end

.
. reshape long fr, i(hh) j(_j)
(note: j = 1 2 3)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                        6   ->      18
Number of variables                   5   ->       4
j variable (3 values)                     ->   _j
xij variables:
                            fr1 fr2 fr3   ->   fr
-----------------------------------------------------------------------------

.
. frame put hh wrank _j, into(friends)

. frlink m:1 fr _j, frame(friends hh _j)
  (6 observations in frame default unmatched)

. frget fr_wrank = wrank, from(friends)
(6 missing values generated)
  (1 variable copied from linked frame)

.
. frame drop friends

. drop friends

.
. reshape wide fr fr@_wrank, i(hh) j(_j)
(note: j = 1 2 3)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                       18   ->       6
Number of variables                   5   ->       8
j variable (3 values)                _j   ->   (dropped)
xij variables:
                                     fr   ->   fr1 fr2 fr3
                               fr_wrank   ->   fr1_wrank fr2_wrank fr3_wrank
-----------------------------------------------------------------------------

. order wrank, after(hh)

Re #3:
You describe your data as already being in long layout because you have longitudinal data, with separate observations for the same person in each year. I would call that a hybrid half-long layout. It is long with respect to years and wide with respect to network links.

I haven't done any network analysis, so I don't know which layout of your data is best. If you have used Stata's network commands before with this kind of layout and it works well, then, indeed, stick with it.

Comment

danishussalam

Join Date: Jul 2014

Posts: 140
#6

02 Apr 2020, 18:20

Clyde Schechter Apologies for asking the same question again but it turns out my actual dataset is a bit messier than the example I posted. I've tried to work around your code (it started working when I restarted Stata) but couldn't adjust it to the new setup of the data.

I've attached a smaller piece of the dataset here and would really appreciate it if I can a similar code that works. The situation is as follows:

a) hhno doesn't uniquely identify my observations so I have similar hhnos in spotno (village). There is no unique identifier in the dataset. It's possible to create one by combining branchid, spotno (village), and hhno which will be unique only if I restrict my data to one year.

b) Not every household id that is mentioned as a network was interviewed meaning the dataset doesn't have wealth rank on every person that is reported as a network.

I'd still want to replace the household ids mentioned in the food_borrow_ variable with the wealth rank of those ids and report a missing or 0 for those that were not found.

clear
input byte branchid int(spotno hhno year food_borrow_netid1 food_borrow_netid2 food_borrow_netid3) byte wealth_rank
1 1 46 2007 45 . . 6
1 1 46 2009 205 44 45 6
1 1 57 2007 56 . . 6
1 1 57 2009 202 93 47 6
1 1 85 2007 77 76 78 6
1 1 85 2009 86 21 90 6
1 1 86 2007 77 . . 6
1 1 86 2009 16 77 87 6
1 1 87 2007 77 . . 6
1 1 87 2009 77 86 90 6
1 2 6 2007 4 . . 2
1 2 6 2009 1 5 4 2
1 2 26 2007 23 24 25 5
1 2 26 2009 23 24 25 5
1 2 30 2007 31 . . 5
1 2 30 2009 30 31 36 5
1 2 36 2007 39 . . 5
1 2 36 2009 36 37 39 5
1 2 39 2007 38 . . 5
1 2 39 2009 38 40 37 5
1 2 54 2007 210 211 . 5
1 2 54 2009 50 55 56 5
1 2 80 2007 60 . . 5
1 2 80 2009 60 81 83 5
1 3 6 2009 10 11 13 6
1 3 9 2009 7 11 12 5
1 3 16 2009 14 15 13 6
1 3 17 2009 15 14 13 6
1 3 27 2009 21 24 20 6
1 3 41 2007 36 49 50 6
1 3 41 2009 36 49 40 6
1 3 46 2007 49 50 63 5
1 3 46 2009 49 50 63 5
1 3 47 2007 48 49 50 6
1 3 47 2009 48 49 50 6
1 3 51 2007 53 56 60 6
1 3 51 2009 53 65 60 6
1 3 52 2009 49 50 53 6
1 3 78 2009 77 82 81 4
1 3 98 2007 101 102 104 6
1 3 98 2009 93 94 2 6
1 4 2 2007 6 . . 4
1 4 12 2007 13 14 15 5
1 4 12 2009 13 14 15 5
1 4 12 2009 12 13 15 5
1 4 17 2007 12 . . 5
1 4 17 2009 14 15 16 5
1 4 33 2007 28 24 30 5
1 4 33 2009 37 32 30 5
1 4 34 2007 35 36 . 5
1 4 34 2009 35 36 31 5
1 4 44 2007 43 . 45 3
1 4 44 2009 43 46 45 3
1 4 46 2007 43 44 45 5
1 4 46 2009 46 45 44 5
1 4 51 2007 53 54 55 5
1 4 51 2009 53 54 55 5
1 4 56 2007 57 58 . 5
1 4 56 2009 54 55 57 5
1 4 64 2007 60 . . 5
1 4 71 2007 74 . . 5
1 4 71 2009 20 69 67 5
1 4 78 2007 77 74 . 4
1 4 78 2009 77 79 81 4
1 4 83 2007 82 77 78 5
1 4 83 2009 82 77 78 5
1 4 86 2007 118 . . 5
1 4 86 2009 118 90 89 5
1 4 90 2007 88 . . 3
1 4 90 2009 89 88 91 3
1 4 110 2009 73 111 112 1
1 4 120 2007 116 117 . 4
1 4 120 2009 116 112 111 4
1 4 130 2007 128 . . 3
1 4 130 2009 128 126 127 3
1 4 131 2007 134 135 . 5
1 4 131 2009 136 139 138 5
1 5 16 2009 17 18 28 5
1 5 24 2009 25 26 27 3
1 5 32 2007 28 29 . 5
1 5 32 2009 31 33 34 5
1 5 35 2007 36 38 . 5
1 5 35 2009 37 38 34 5
1 5 56 2009 57 58 67 5
1 5 76 2009 75 77 78 2
end
Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30119

02 Apr 2020, 18:49

Actually, even branched spotno hhno and year do not uniquely identify the observations in your data:

Code:

     +-----------------------------------------------------------------------------+
     | branchid   spotno   hhno   year   food_b~1   food_b~2   food_b~3   wealth~k |
     |-----------------------------------------------------------------------------|
 44. |        1        4     12   2009         13         14         15          5 |
 45. |        1        4     12   2009         12         13         15          5 |
     +-----------------------------------------------------------------------------+

shows two observations that agree on those variables, but contain contradictory information on the other variables. You must clean up your data set before you try to take this any farther. Performing complicated linkage operations on inconsistent data sets will only magnify their inconsistencies. Garbage in, garbage out, as they say. You need to chase down, in your real data set, all such problems and resolve them before proceeding.

Code:

duplicates tag branched spotno hhno year, gen(flag)
browse if flag

will show you all the offending observations. How you resolve them is not something I can help you with: it will depend on your understanding of this particular data set.

For the purposes of illustrating revised code that you can use for the original question once you clean up the data, I have removed one of these offending observations from your example.

Code:

clear*
* Example generated by -dataex-. To install: ssc install dataex
clear
input byte branchid int(spotno hhno year food_borrow_netid1 food_borrow_netid2 food_borrow_netid3) byte(wealth_rank flag)
1 1  46 2007  45   .   . 6 0
1 1  46 2009 205  44  45 6 0
1 1  57 2007  56   .   . 6 0
1 1  57 2009 202  93  47 6 0
1 1  85 2007  77  76  78 6 0
1 1  85 2009  86  21  90 6 0
1 1  86 2007  77   .   . 6 0
1 1  86 2009  16  77  87 6 0
1 1  87 2007  77   .   . 6 0
1 1  87 2009  77  86  90 6 0
1 2   6 2007   4   .   . 2 0
1 2   6 2009   1   5   4 2 0
1 2  26 2007  23  24  25 5 0
1 2  26 2009  23  24  25 5 0
1 2  30 2007  31   .   . 5 0
1 2  30 2009  30  31  36 5 0
1 2  36 2007  39   .   . 5 0
1 2  36 2009  36  37  39 5 0
1 2  39 2007  38   .   . 5 0
1 2  39 2009  38  40  37 5 0
1 2  54 2007 210 211   . 5 0
1 2  54 2009  50  55  56 5 0
1 2  80 2007  60   .   . 5 0
1 2  80 2009  60  81  83 5 0
1 3   6 2009  10  11  13 6 0
1 3   9 2009   7  11  12 5 0
1 3  16 2009  14  15  13 6 0
1 3  17 2009  15  14  13 6 0
1 3  27 2009  21  24  20 6 0
1 3  41 2007  36  49  50 6 0
1 3  41 2009  36  49  40 6 0
1 3  46 2007  49  50  63 5 0
1 3  46 2009  49  50  63 5 0
1 3  47 2007  48  49  50 6 0
1 3  47 2009  48  49  50 6 0
1 3  51 2007  53  56  60 6 0
1 3  51 2009  53  65  60 6 0
1 3  52 2009  49  50  53 6 0
1 3  78 2009  77  82  81 4 0
1 3  98 2007 101 102 104 6 0
1 3  98 2009  93  94   2 6 0
1 4   2 2007   6   .   . 4 0
1 4  12 2007  13  14  15 5 0
1 4  12 2009  13  14  15 5 1
1 4  17 2007  12   .   . 5 0
1 4  17 2009  14  15  16 5 0
1 4  33 2007  28  24  30 5 0
1 4  33 2009  37  32  30 5 0
1 4  34 2007  35  36   . 5 0
1 4  34 2009  35  36  31 5 0
1 4  44 2007  43   .  45 3 0
1 4  44 2009  43  46  45 3 0
1 4  46 2007  43  44  45 5 0
1 4  46 2009  46  45  44 5 0
1 4  51 2007  53  54  55 5 0
1 4  51 2009  53  54  55 5 0
1 4  56 2007  57  58   . 5 0
1 4  56 2009  54  55  57 5 0
1 4  64 2007  60   .   . 5 0
1 4  71 2007  74   .   . 5 0
1 4  71 2009  20  69  67 5 0
1 4  78 2007  77  74   . 4 0
1 4  78 2009  77  79  81 4 0
1 4  83 2007  82  77  78 5 0
1 4  83 2009  82  77  78 5 0
1 4  86 2007 118   .   . 5 0
1 4  86 2009 118  90  89 5 0
1 4  90 2007  88   .   . 3 0
1 4  90 2009  89  88  91 3 0
1 4 110 2009  73 111 112 1 0
1 4 120 2007 116 117   . 4 0
1 4 120 2009 116 112 111 4 0
1 4 130 2007 128   .   . 3 0
1 4 130 2009 128 126 127 3 0
1 4 131 2007 134 135   . 5 0
1 4 131 2009 136 139 138 5 0
1 5  16 2009  17  18  28 5 0
1 5  24 2009  25  26  27 3 0
1 5  32 2007  28  29   . 5 0
1 5  32 2009  31  33  34 5 0
1 5  35 2007  36  38   . 5 0
1 5  35 2009  37  38  34 5 0
1 5  56 2009  57  58  67 5 0
1 5  76 2009  75  77  78 2 0
end


reshape long food_borrow_netid, i(branchid spotno year hhno) j(_j)

frame put branchid spotno hhno year wealth_rank food_borrow_netid _j, into(friends)
frlink m:1 branchid spotno year food_borrow_netid _j, ///
    frame(friends branchid spotno year hhno _j)
frget food_borrow_netid_wealth_rank = wealth_rank, from(friends)

frame drop friends
drop friends

reshape wide food_borrow_netid food_borrow_netid@_wealth_rank, ///
    i(branchid spotno year hhno) j(_j)
order wealth_rank, after(hhno)

Comment

danishussalam

Join Date: Jul 2014

Posts: 140
#8

03 Apr 2020, 08:18

Clyde Schechter Thanks a bunch!
Comment
Romalpa Akzo

Join Date: Oct 2017

Posts: 369
#9

03 Apr 2020, 08:46

Cleaning up data, as advised by Professor Clyde, should be the very first step that you must do before moving on with any further steps.

When you are sure that branchid, spotno, hhno, and year have uniquely identify the observations, -rangestat- could give you a concise way to go.

Code:

*ssc install rangestat forval i = 1/3 { rangestat v`i'= wealth_rank, interval(hhno food_borrow_netid`i' food_borrow_netid`i') by(branchid spotno year) replace v`i' =. if food_borrow_netid`i' ==. }
1 like
Comment
danishussalam

Join Date: Jul 2014

Posts: 140
#10

03 Apr 2020, 16:39

Romalpa Akzo Thank you so much! works like a charm.
Comment

Announcement