Variable Management

Aamir Malik

Join Date: Jan 2020

Posts: 48
#1

21 Jan 2020, 19:02

> I got the following two variables in my dataset:
>
> Number1   Number2
>           000752
>           000752
> 48239P    000752
>           000752
>           000752
>           000752
> 89351Q    893895
>           893895
>           893895
>           893895
>           893895
> ....
>
> I want to fill up all the empty cells so that it looks like the following:
>
> Number1   Number2
> 48239P    000752
> 48239P    000752
> 48239P    000752
> 48239P    000752
> 48239P    000752
> 48239P    000752
> 89351Q    893895
> 89351Q    893895
> 89351Q    893895
> 89351Q    893895
> 89351Q    893895
>
> Each Number1 has a corresponding Number2.
Tags: categorical, loop, panel data
William Lisowski

Join Date: Dec 2014

Posts: 10150
#2

22 Jan 2020, 07:56

Welcome to Statalist.

Did you perhaps not take a look at your post after it was posted? You are expecting a lot of effort of other members to figure out what you presented.

With regard to the example data below, I believe you have variables number1 and number2 and wish to replace the contents of number1 with new1. You seem to show that for any given value of number2, the same value of number1 will apply to all observations.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str6(number2 number1 new1)
"000752" ""       "48239P"
"000752" ""       "48239P"
"000752" "48239P" "48239P"
"000752" ""       "48239P"
"000752" ""       "48239P"
"000752" ""       "48239P"
"893895" "89351Q" "89351Q"
"893895" ""       "89351Q"
"893895" ""       "89351Q"
"893895" ""       "89351Q"
"893895" ""       "89351Q"
end

The following seems to do what you want.

Code:

. generate seq = _n

. by number2 (number1), sort: replace number1 = number1[_N]
(9 real changes made)

. sort seq

. list seq number1 number2, sepby(number1)

     +-------------------------+
     | seq   number1   number2 |
     |-------------------------|
  1. |   1    48239P    000752 |
  2. |   2    48239P    000752 |
  3. |   3    48239P    000752 |
  4. |   4    48239P    000752 |
  5. |   5    48239P    000752 |
  6. |   6    48239P    000752 |
     |-------------------------|
  7. |   7    89351Q    893895 |
  8. |   8    89351Q    893895 |
  9. |   9    89351Q    893895 |
 10. |  10    89351Q    893895 |
 11. |  11    89351Q    893895 |
     +-------------------------+
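
The fill above relies on sort order: within each number2 group, empty strings sort before any non-empty string, so number1[_N] is the non-empty value. That only gives a sensible result if each number2 has at most one distinct non-empty number1, which the thread assumes but does not check. Here is a minimal sketch of such a check, meant to be run before the replace with the example data in memory. The names tagged and ndistinct are hypothetical helper variables, not from the thread.

```stata
* Sketch only: tagged and ndistinct are hypothetical helper variables.
* Tag one observation per distinct (number2, number1) pair, skipping empty number1.
egen byte tagged = tag(number2 number1) if number1 != ""
* Count the distinct non-empty number1 values within each number2.
egen ndistinct = total(tagged), by(number2)
* Stops with an error if any number2 maps to more than one number1.
assert ndistinct <= 1
drop tagged ndistinct
```

If the assert fails, listing the offending number2 values before dropping the helpers would show where the one-to-one assumption breaks.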

Now, a word of advice to improve your future posts. We want to help you solve your problems, but you need to help us understand your problems. Please take a few moments to review the Statalist FAQ linked to from the top of the page, as well as from the Advice on Posting link on the page you used to create your post. Note especially sections 9-12 on how to best pose your question. It's particularly helpful to copy commands and output from your Stata Results window and paste them into your Statalist post using code delimiters [CODE] and [/CODE], and to use the dataex command to provide sample data, as described in section 12 of the FAQ.

Aamir Malik

Join Date: Jan 2020

Posts: 48
#3

22 Jan 2020, 15:31

Thank you, William Lisowski, for your kind reply and help. I am sorry for not posting my data well. Here I present the data again in tabular form. Please take a look.

So I have 4 variables. Each subject (Subject ID) has multiple entries for Heart Rate, but the demographic information (Age, Gender) appears only in the subject's first row. When I analyze Heart Rate, I cannot get the results in the correct format by Subject ID. Is there a way in Stata to fill in each subject's information on every row? I am struggling with this because I have a very large dataset. I would really appreciate your help.

Subject ID	Age	Gender	Heart Rate
1	54	F	67
1			75
2	34	F	56
3	57	M	69
3			90
3			111
3			67
3			56
3			76
4			67
4			94
4			68
5	21	F	56
5			74
6	39	M	73
6			79
6			84
6			75
6			48
7	45	M	67
7			59
7			62
8	31	M	68
9	19	F	74
9			71
9			104
10	11	M	96
10			67

William Lisowski

Join Date: Dec 2014

Posts: 10150
#4

22 Jan 2020, 16:15

Here is example code. For clarity, I repost your example data as prepared by the dataex command.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input byte(subjectid age) str1 gender int heartrate
 1 54 "F"  67
 1  . ""   75
 2 34 "F"  56
 3 57 "M"  69
 3  . ""   90
 3  . ""  111
 3  . ""   67
 3  . ""   56
 3  . ""   76
 4  . ""   67
 4  . ""   94
 4  . ""   68
 5 21 "F"  56
 5  . ""   74
 6 39 "M"  73
 6  . ""   79
 6  . ""   84
 6  . ""   75
 6  . ""   48
 7 45 "M"  67
 7  . ""   59
 7  . ""   62
 8 31 "M"  68
 9 19 "F"  74
 9  . ""   71
 9  . ""  104
10 11 "M"  96
10  . ""   67
end

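* Record the original order so it can be kept within each subject
generate seq = _n
* Within each subject, copy the first observation's age and gender to every row
bysort subjectid (seq): replace age = age[1]
bysort subjectid (seq): replace gender = gender[1]
list, sepby(subjectid) abbreviate(12) noobs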

Code:

. list, sepby(subjectid) abbreviate(12) noobs

  +--------------------------------------------+
  | subjectid   age   gender   heartrate   seq |
  |--------------------------------------------|
  |         1    54        F          67     1 |
  |         1    54        F          75     2 |
  |--------------------------------------------|
  |         2    34        F          56     3 |
  |--------------------------------------------|
  |         3    57        M          69     4 |
  |         3    57        M          90     5 |
  |         3    57        M         111     6 |
  |         3    57        M          67     7 |
  |         3    57        M          56     8 |
  |         3    57        M          76     9 |
  |--------------------------------------------|
  |         4     .                   67    10 |
  |         4     .                   94    11 |
  |         4     .                   68    12 |
  |--------------------------------------------|
  |         5    21        F          56    13 |
  |         5    21        F          74    14 |
  |--------------------------------------------|
  |         6    39        M          73    15 |
  |         6    39        M          79    16 |
  |         6    39        M          84    17 |
  |         6    39        M          75    18 |
  |         6    39        M          48    19 |
  |--------------------------------------------|
  |         7    45        M          67    20 |
  |         7    45        M          59    21 |
  |         7    45        M          62    22 |
  |--------------------------------------------|
  |         8    31        M          68    23 |
  |--------------------------------------------|
  |         9    19        F          74    24 |
  |         9    19        F          71    25 |
  |         9    19        F         104    26 |
  |--------------------------------------------|
  |        10    11        M          96    27 |
  |        10    11        M          67    28 |
  +--------------------------------------------+
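
In the listing above, subject 4 still has no age or gender, because no row for that subject carries them in the example data. In a very large dataset it may help to find such subjects explicitly. Here is a minimal sketch, assuming the example data is in memory; nodemog is a hypothetical helper variable, not from the thread.

```stata
* Sketch only: nodemog is a hypothetical helper variable, not from the thread.
* nodemog is 1 for a subject only if every one of its rows lacks both age and gender.
bysort subjectid: egen byte nodemog = min(missing(age) & gender == "")
* Show the affected subjects and how many rows each has.
tabulate subjectid if nodemog
drop nodemog
```

This works either before or after the fill, since filling cannot create values for a subject that has none.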

If in your data there is another variable that tells you what order the observations belong in (like some sort of date, perhaps, or a visit number) you can use it instead of the sequence number I created for that purpose.
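
As a rough illustration of that substitution, here is a small sketch on toy data. The variable visit is a hypothetical visit number, not something in the original dataset, and the rows are stored out of order on purpose to show that the sort within subject does the work.

```stata
* Toy illustration only: visit is a hypothetical ordering variable,
* and the rows are deliberately stored out of visit order.
clear
input byte(subjectid visit age) str1 gender int heartrate
1 2  . ""  75
1 1 54 "F" 67
2 1 34 "F" 56
end

* Sort within subject by visit, then copy the first visit's values down.
bysort subjectid (visit): replace age = age[1]
bysort subjectid (visit): replace gender = gender[1]
list, sepby(subjectid) noobs
```

Note that clear drops whatever data is in memory, so try this in a fresh session rather than alongside your real dataset.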

Aamir Malik

Join Date: Jan 2020

Posts: 48
#5

22 Jan 2020, 16:28

WOW. Thank you! This is very helpful.