Extracting and using variables from different stata files of the same survey dataset

Nna Elue

Join Date: Jul 2021

Posts: 2
#1

13 Jul 2021, 19:52

Hi All,

I'm trying to analyze survey data, but the data came as 35 separate Stata files, and I only need a few variables from each file.

I've tried merging, but the result wasn't meaningful: because the files contain different variables, I was left with empty cells after merging.

Please, how can I extract the specific variables I need (up to 12 variables from 8 different files) without losing the survey's core characteristics? I learnt that, for a survey analysis to produce reliable variance estimates, the full sample should be used.

Below are examples of two of the Stata files, each with a variable I'd like to extract for use in a single analysis. For instance, college graduates are likely to be better informed about taking care of their health, so their chances of being physically disabled may be lower.

Thank you for your time.

sect2_education.dta
What is the highest educational level [NAME] completed?

sect3_health.dta
Did [NAME] have to stop his/her usual activities because of this [ILLNESS/INJURY
Tags: categorical, data, interaction
Ken Chui

Join Date: Aug 2014

Posts: 1058
#2

14 Jul 2021, 06:36

Welcome to Statalist.

First off, I want to clarify that there are two major ways to combine data. Merge adds variables (new columns); it is usually used when two sets of different data were collected from the same people and we want to combine that information. Append adds cases (new rows); it is usually used when we have two sets of similar data collected from two different groups and we want to create a longer data set to analyze the combined sample.
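To make the difference concrete, here is a minimal sketch using tiny made-up data sets. The variable names (id, educ, sick) and all values are invented for illustration and do not come from the survey in question. It is meant to be run as one do-file so that the temporary file name stays defined.

Code:

```stata
* Toy data: every name and value here is made up for illustration
clear
input id educ
1 12
2 16
3 9
end
tempfile education
save `education'

* A second data set about the SAME three people
clear
input id sick
1 0
2 1
3 0
end

* merge: same people, new columns (still 3 rows, now with educ added)
merge 1:1 id using `education'
list

* A third data set with the same variables for DIFFERENT people
clear
input id educ
4 14
5 10
end

* append: new rows (5 rows in total, same columns)
append using `education'
list
```

After the merge, the data still have three observations but carry both sick and educ. After the append, the data have five observations with the same two variables.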

I'm assuming that you mean merge. Most multi-file survey distributions use a common identification number. This ID number must exist in all the files for merging to work.

You said that:

I've tried merging, but it wasn't meaningful since the variables are different in different files, so, I was left with empty cells after merging.

This makes me think that a fuller understanding of how merging works would help. When merging, it is normal for the two data sets to contain different variables; that is exactly why we need the ID number to connect them. Try the command:

Code:

help merge

and browse the command syntax and examples to understand how that ID variable works.
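As a rough sketch of how the ID variables come into play when only a few variables are needed from each file: the file names below are the ones mentioned in #1, and hhid and indiv are the identifiers that appear later in this thread. Everything else is an assumption on my part. In particular, educ_level and stopped_activity are hypothetical variable names, and I am assuming that hhid and indiv together identify exactly one person in each file.

Code:

```stata
* Assumptions: hhid + indiv identify one person per row in both files;
* educ_level and stopped_activity are hypothetical variable names
use hhid indiv educ_level using "sect2_education.dta", clear

* Stops with an error if hhid + indiv do not uniquely identify the rows
isid hhid indiv

* Bring in only the one health variable that is needed
merge 1:1 hhid indiv using "sect3_health.dta", keepusing(stopped_activity)

* 1 = education file only, 2 = health file only, 3 = found in both
tabulate _merge
```

By default merge keeps unmatched observations from both files, so no respondent is dropped. Those unmatched rows are where the empty (missing) cells come from, and _merge shows how many there are.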

In addition, please take some time to read the FAQ (http://www.statalist.org/forums/help) on how to write a question that is more likely to get useful answers. As it stands, we can't really provide much help because there are no sample data, no details on the survey structure, no display of the code that you used, and no error message.
Nna Elue

Join Date: Jul 2021

Posts: 2
#3

14 Jul 2021, 18:51

Thank you very much for your time. I'm sorry; code can be challenging for new users to understand. I read help merge and watched some YouTube videos for practice, and I was able to solve my problem using the code below:

. use mmm
. merge m:1 hhid indiv using "D:\Desktop\Data\LSS\STATA\Household\sect4a1_labour.dta", generate(pap)
(label zone_id already defined)
(label state_id already defined)
(label sector already defined)
(label lga already defined)

    Result                  # of obs.
    not matched                     0
    matched                   116,320  (pap==3)

. save "mmm4"
file mmm4.dta saved
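
For anyone following along, a few optional checks on the saved file can confirm that the merge did what was intended. This is only a sketch: mmm4.dta, hhid, indiv and pap are taken from the log above, and whether hhid and indiv repeat in the merged data is something the first command reports rather than something I know.

```stata
* Sketch only: file and variable names are taken from the log above
use mmm4, clear

* Reports whether any hhid-indiv pair occurs more than once
duplicates report hhid indiv

* pap records the merge result: 1 = master only, 2 = using only, 3 = matched
tabulate pap

* Fails loudly if a later version of the data no longer matches completely
assert pap == 3
```

If duplicates report shows no repeated hhid-indiv pairs, merge 1:1 would be the stricter choice than m:1, because it stops with an error when the identifiers are not unique.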