Replicate Values from one column to the other in an n-dimension array

Enzo Wu

Join Date: Aug 2021

Posts: 9
#1


23 Aug 2021, 03:22

Dear Statalists,

I am a new Mata user, and I am trying to replicate values from one column to another in a Mata matrix.

Here is a summary of my data and code:

DATA:

There is a 32142 x 85 matrix in Mata called psid.
The first column is the individual id.
Columns 2 to 42 hold a variable called "RWH" (relation with head) for 41 years.
Columns 43 to 83 hold a variable called "FamNo" (family number) for the same 41 years.
Column 84 is income, and column 85 is parent income.

PROBLEM:
I am trying to extract values from column 84, subject to some conditions, and copy them to column 85.
Many values should be extracted from column 84; however, column 85 is still empty.

CODE:

Code:

mata:
// st_data() returns a copy of the data, not a view
psid = st_data( (1::32142), ( "indivID", "RelationWithHead*", "FamNo*", "IncomeofHead_mean", "parentinc", "test" ) )
max_sample = 32142
head = 10
kid = 30
for (n = 1; n <= max_sample; n++) {
    // waves 1-4: RWH in columns 2-5, FamNo of the same wave in columns 43-46
    for (w = 0; w <= 3; w++) {
        rwh_col = 2 + w
        fam_col = 43 + w
        if (psid[n, rwh_col] != head) continue
        // give the head's income (col 84) to each kid in the same family and wave
        for (m = 1; m <= max_sample; m++) {
            if (psid[m, fam_col] == psid[n, fam_col] & psid[m, rwh_col] == kid) psid[m, 85] = psid[n, 84]
        }
    }
}
// write column 85 back to the Stata variable parentinc
st_store((1::max_sample), "parentinc", psid[., 85])
end

I hope someone can help or offer some advice!

Best wishes to you all!

Last edited by Enzo Wu; 23 Aug 2021, 03:27.
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

23 Aug 2021, 08:11

I think you're unlikely to find much help on this. What you are attempting to do in Mata is much more easily, and appropriately, accomplished in Stata using the Stata tools designed for dealing with panel data like the PSID. Many members are reluctant to help someone with an inappropriate way of accomplishing a task. If you ultimately need data in Mata for your analysis, you are much better off transforming your raw data in Stata and then bringing the appropriate variables into Mata.
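A minimal sketch of bringing Stata variables into Mata, on a hypothetical three-observation dataset (the variables `income` and `parentinc` and their values are made up for illustration). It also shows one way Mata changes can fail to reach the dataset: `st_data()` returns a copy, whereas `st_view()` returns a view whose assignments write through to the Stata variables.

```stata
* Hypothetical toy data: 3 observations
clear
set obs 3
generate double income = 10 * _n
generate double parentinc = .

mata:
    // st_data() returns a copy: editing it leaves the dataset untouched
    X = st_data(., "income parentinc")
    X[., 2] = X[., 1]
    assert(missing(st_data(., "parentinc")) == 3)

    // st_view() returns a view: assignments write through to the dataset
    V = .
    st_view(V, ., "income parentinc")
    V[., 2] = V[., 1]
end

* parentinc now equals income in every observation
assert parentinc == income
```

Alternatively, a copy made with `st_data()` can be written back explicitly with `st_store()`.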

Niels Henrik Bruun

Join Date: Aug 2014
Posts: 555
#3

23 Aug 2021, 23:19

Although I agree with William Lisowski that this is easy to do in Stata, it is also quite easy to do in Mata.
Hopefully I've captured the main idea of your code in the following example:

Code:

cls
mata mata clear

mata: // settings
  N = 20
  colnbr = 5
  head = 10
  kid = 30
end  

mata: // data
  rseed(123)
  id = 1::N
  rwh = runiformint(N, colnbr, 8, head)
  famno = runiformint(N, colnbr, 25, kid)
end

mata: // solution = all rows where head in rwh row and kid in famno row
  answer = J(N,1,.)
  for(n=1;n<=N;n++) answer[n] = anyof(rwh[n, .], head) * anyof(famno[n,.], kid)
end

Output of my code:

Code:

:   id, rwh, famno, answer
         1    2    3    4    5    6    7    8    9   10   11   12
     +-------------------------------------------------------------+
   1 |   1    9   10   10    8    8   28   27   26   28   28    0  |
   2 |   2    9    9    9   10    8   29   26   29   30   30    1  |
   3 |   3   10   10   10    9    8   29   25   30   25   28    1  |
   4 |   4    8    8   10   10    9   29   30   28   27   28    1  |
   5 |   5    8   10   10    9   10   27   28   27   30   28    1  |
   6 |   6   10    8    8    9    8   26   30   28   26   28    1  |
   7 |   7    8   10    8    9    8   25   26   30   30   30    1  |
   8 |   8   10    8    9    9    8   30   28   30   29   30    1  |
   9 |   9   10   10   10    9   10   28   28   25   27   27    0  |
  10 |  10   10    8    9    9    8   26   30   28   26   28    1  |
  11 |  11    9   10    9   10    8   30   26   25   28   27    1  |
  12 |  12    8   10    9    9    9   26   30   30   27   28    1  |
  13 |  13    9    9    9   10   10   27   26   26   29   29    0  |
  14 |  14    9    8    8    9   10   26   30   26   26   27    1  |
  15 |  15    8    9   10    9    8   26   28   29   30   29    1  |
  16 |  16   10   10    8    9    8   29   28   30   28   30    1  |
  17 |  17   10    9    9    8   10   30   25   25   29   25    1  |
  18 |  18    9    9    8    8    8   28   26   25   25   25    0  |
  19 |  19    8   10   10    9    9   29   25   28   26   29    0  |
  20 |  20    8    9   10   10    9   27   30   30   27   29    1  |
     +-------------------------------------------------------------+

In Mata you often have to think quite differently than in Stata and most other programming languages.
A key feature of Mata is its matrices and the functions that operate on them.
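In that spirit, a sketch of how the row loop in the example above could be replaced by whole-matrix operations. It rebuilds the same simulated `rwh` and `famno` matrices so that it runs on its own; `answer_vec` is a name introduced here for illustration. `rwh :== head` compares element by element, giving a 0/1 matrix, and `rowmax()` then flags each row that contains at least one match, which is what `anyof()` does one row at a time.

```stata
mata:
    // same settings and simulated data as in the example above
    N = 20
    colnbr = 5
    head = 10
    kid = 30
    rseed(123)
    rwh   = runiformint(N, colnbr, 8, head)
    famno = runiformint(N, colnbr, 25, kid)

    // row-by-row loop, as in the example
    answer = J(N, 1, .)
    for (n = 1; n <= N; n++) answer[n] = anyof(rwh[n, .], head) * anyof(famno[n, .], kid)

    // whole-matrix version: :== gives a 0/1 matrix, rowmax() marks rows with any 1
    answer_vec = rowmax(rwh :== head) :* rowmax(famno :== kid)
    assert(answer_vec == answer)
end
```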

Kind regards

nhb


Enzo Wu

Join Date: Aug 2021

Posts: 9
#4

23 Aug 2021, 23:23

Originally posted by William Lisowski

I think you're unlikely to find much help on this. What you are attempting to do in Mata is much more easily, and appropriately, accomplished in Stata using the Stata tools designed for dealing with panel data like the PSID. Many members are reluctant to help someone with an inappropriate way of accomplishing a task. If you ultimately need data in Mata for your analysis, you are much better off transforming your raw data in Stata and then bringing the appropriate variables into Mata.

I really appreciate your response. In fact, I initially did the analysis in Stata, but looping over the data took a long time (weeks).

I think your suggestion is very practical!
Could you please offer some examples, or point me to related information that would help me understand this more fully?

Thanks again,
Enzo
William Lisowski

Join Date: Dec 2014

Posts: 10150
#5

24 Aug 2021, 11:10

If you were looping the data as you did in post #1, where you had nested loops running across all the observations, then I am unsurprised that it took a long time.

In general, that is not as productive an approach with Stata as it is with some alternative systems for statistical analysis. If indeed you come to this task with a background in another language, you need to work hard to unlearn the techniques that were effective there. Otherwise, you are like the traveller who responds to being in a country with an unfamiliar language by speaking their native language more slowly and more loudly.

Here is some advice on reading to increase your general knowledge of Stata and your specific knowledge in support of analyzing panel data. Perhaps you have done some of this already. A little later a second post will present sample code.

When I began using Stata in a serious way, I started, as have others here, by reading my way through the Getting Started with Stata manual relevant to my setup. Chapter 18 then gives suggested further reading, much of which is in the Stata User's Guide, and I worked my way through much of that reading as well. All of these manuals are included as PDFs in the Stata installation and are accessible from within Stata - for example, through the PDF Documentation section of Stata's Help menu.

The objective in doing the reading was not so much to master Stata - I'm still far from that goal - as to be sure I'd become familiar with a wide variety of important basic techniques, so that when the time came that I needed them, I might recall their existence, if not the full syntax, and know how to find out more about them in the help files and PDF manuals.

Stata supplies exceptionally good documentation that amply repays the time spent studying it - there's just a lot of it. The path I followed surfaces the things you need to know to get started in a hurry and to work effectively.

Stata also supplies YouTube videos, if that's your thing.

Specifically, the Stata Data Management Reference Manual PDF describes a number of the tools you will need to work with your longitudinal PSID data. Among them is the reshape command.

You have your data organized as it came from PSID, in what is called here a "wide" layout, with one observation per individual, and repeated observations of each variable for all the waves of the survey. By contrast, a "long" layout would have, for each individual, one observation for each wave the individual appears in. The experienced users here generally agree that, with few exceptions, Stata makes it much more straightforward to accomplish complex analyses using a long layout of your data rather than a wide layout of the same data. The reshape command is the tool that transforms wide layouts into long layouts, and long layouts into wide layouts.
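A minimal sketch of that round trip on a hypothetical two-person, three-wave dataset (the variable `income` and all values are made up). Note that `reshape long` keeps the wave in which ID 2 has a missing income, so the long layout has 2 x 3 = 6 observations.

```stata
* Hypothetical toy panel in wide layout: 2 individuals, 3 waves of income
clear
input ID income1 income2 income3
1 100 110 120
2 200   . 220
end

* wide -> long: one observation per individual per wave
reshape long income, i(ID) j(Wave)
assert _N == 6

* long -> wide: back to one observation per individual
reshape wide income, i(ID) j(Wave)
assert _N == 2
```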

In particular a long layout of the data is the basis of all the tools used for working with longitudinal data described in the Stata Longitudinal-Data/Panel-Data Reference Manual PDF, which you should review carefully before setting out to analyze longitudinal data using Stata.
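As a small illustration, again on hypothetical toy data: once the data are long, `xtset` declares the panel structure, after which time-series operators such as `L.` refer to the previous wave of the same individual, with no explicit loop over observations.

```stata
* Hypothetical toy panel in wide layout
clear
input ID income1 income2 income3
1 100 110 120
2 200   . 220
end

reshape long income, i(ID) j(Wave)

* declare the panel: individual ID observed across Wave
xtset ID Wave

* previous-wave income of the same individual (missing in wave 1)
generate lag_income = L.income
list, sepby(ID)
```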

You should definitely understand that Stata has been used for effective analysis for decades now, including on surveys like PSID. This would not be the case if such analyses required weeks to run. And it would not be the case if it required writing Mata code to speed the process. You need to ask yourself, and Statalist, "what is it about analyzing this data that I am not understanding?"

William Lisowski

Join Date: Dec 2014
Posts: 10150
#6

24 Aug 2021, 12:05

In this example, I have made up four waves of PSID data, and made up my own problem similar to yours.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input int(var1 var2 var3 var4 var5 var6 var7 var8 var9 var10)
1 10 10 10 10 108 214 327  46 1
2 20 10 20 20 108 817  62 123 2
3 30 30 30 10 108 817  62 429 2
4  .  . 10 10   .   .  62 123 1
end

rename var1 ID
rename var2-var5 RWH#, addnumber   // RWH1-RWH4
rename var6-var9 FamNo#, addnumber // FamNo1-FamNo4
rename var10 Gender

In the first wave, there is a single household, consisting of:

ID=1, a male Reference Person (RP), previously called "head", RWH=10
ID=2, a female Spouse/Partner (SP), previously called "wife", RWH=20
ID=3, a female Child, RWH=30

In the second wave, the RP and SP separate into two households.

ID=1, the male RP in the first household
ID=2, the female RP in the second household
ID=3, the female Child in the second household

In the third wave, the second household is joined by a male who becomes the RP:

ID=1, the male RP in the first household
ID=2, the female SP in the second household
ID=3, the female Child in the second household
ID=4, the male RP in the second household

In the fourth wave, the female child leaves home and forms a third household:

ID=1, the male RP in the first household
ID=2, the female SP in the second household
ID=3, the female Child in the third household
ID=4, the male RP in the second household

We know the Gender of each individual (1=male, 2=female) and wish to create a variable that gives, for each individual in each wave, the Gender of the RP for the household the individual is a member of.

Code:

reshape long RWH FamNo, i(ID) j(Wave)
drop if missing(FamNo)
order Wave, first
sort Wave FamNo RWH
by Wave FamNo: egen GenderRP = max(cond(RWH==10,Gender,.))
list, sepby(Wave FamNo)

Code:

. reshape long RWH FamNo, i(ID) j(Wave)
(j = 1 2 3 4)

Data                               Wide   ->   Long
-----------------------------------------------------------------------------
Number of observations                4   ->   16          
Number of variables                  10   ->   5           
j variable (4 values)                     ->   Wave
xij variables:
                     RWH1 RWH2 ... RWH4   ->   RWH
               FamNo1 FamNo2 ... FamNo4   ->   FamNo
-----------------------------------------------------------------------------

. drop if missing(FamNo)
(2 observations deleted)

. order Wave, first

. sort Wave FamNo RWH

. by Wave FamNo: egen GenderRP = max(cond(RWH==10,Gender,.))

. list, sepby(Wave FamNo)

     +---------------------------------------------+
     | Wave   ID   RWH   FamNo   Gender   GenderRP |
     |---------------------------------------------|
  1. |    1    1    10     108        1          1 |
  2. |    1    2    20     108        2          1 |
  3. |    1    3    30     108        2          1 |
     |---------------------------------------------|
  4. |    2    1    10     214        1          1 |
     |---------------------------------------------|
  5. |    2    2    10     817        2          2 |
  6. |    2    3    30     817        2          2 |
     |---------------------------------------------|
  7. |    3    4    10      62        1          1 |
  8. |    3    2    20      62        2          1 |
  9. |    3    3    30      62        2          1 |
     |---------------------------------------------|
 10. |    3    1    10     327        1          1 |
     |---------------------------------------------|
 11. |    4    1    10      46        1          1 |
     |---------------------------------------------|
 12. |    4    4    10     123        1          1 |
 13. |    4    2    20     123        2          1 |
     |---------------------------------------------|
 14. |    4    3    10     429        2          2 |
     +---------------------------------------------+
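The same `egen ... max(cond())` idea carries over to the problem in post #1. A sketch on hypothetical long-layout data (the variables `income`, `headinc`, and `parentinc` and all values are made up): within each wave and family, take the income of the head (RWH==10) and attach it to the children (RWH==30).

```stata
* Hypothetical long-layout data: one observation per person per wave
clear
input Wave ID RWH FamNo income
1 1 10 108 500
1 2 20 108 300
1 3 30 108   .
2 1 10 214 520
2 2 10 817 310
2 3 30 817   .
end

* income of the head (RWH==10) of each family in each wave, given to all members
bysort Wave FamNo: egen headinc = max(cond(RWH == 10, income, .))

* keep it only for children (RWH==30), as in post #1
generate parentinc = headinc if RWH == 30
list, sepby(Wave FamNo)
```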


William Lisowski

Join Date: Dec 2014

Posts: 10150
#7

24 Aug 2021, 14:37

In the example above, the description for the fourth wave should have read:
ID=3, the female RP in the third household
Enzo Wu

Join Date: Aug 2021

Posts: 9
#8

08 Apr 2022, 08:36

@Niels Henrik Bruun
I really appreciate your response, and please forgive me for the late reply.
Your solution helped me with the project and eventually worked well.
