Using Nested Data in Cross-Sectional Format

Carly Roman

Join Date: Jul 2019

Posts: 2
#1

23 Jul 2019, 15:13

I have a level one variable called spid and a level two variable called opid; each spid has 1-5 opid's listed. Using data from both spid and opid, I have created variables that characterize spid. Specifically, I used the age of the spid and the ages of its opid's to create a dichotomous variable called peerintergen25, and I used information from the opid's to characterize them as family or friends to the spid (famorfriend) and as helpers or non-helpers to the spid (helpsSP). I also created a categorical variable that combines all of this information using egen's group() function, and then created dummy variables for the possible combinations from this categorical variable. For example, an spid with 2 opid's listed may have 1 opid with a 0 for peerintergen25, 1 for famorfriend, and 1 for helpsSP (correspondingly, that opid falls in one category of the 8-category variable that groups these characteristics, and has a 1 on the dummy variable for that category), while the other opid may have a different combination of these characteristics. I also used "egen spidflag = tag(spid)" to tag the first observation for each spid, given that some spid's have 1 opid listed and others have 5 opid's listed.

Without losing information from the different opid's, I would like to create variables at level one only, with one row for each spid. I cannot do this using spidflag, because it would only take information from the first opid. I have tried using the collapse command with (count) for variables that vary by opid (like peerintergen25, famorfriend, helpsSP, the 8-category variable, and the dummy variables made from the 8-category variable) and (first) for variables that only apply to spid (like gender), but the output shows the same cross-tabs for every count variable created.

I realize I can use xtset and look at this as a panel, but I would like to use a regular logistic regression to understand the odds of having a 1 for peerintergen25, with the potential to stratify analyses by or control for whether or not the peerintergen25 0/1 status varies by famorfriend or helpsSP.
Tags: clustered, create variables, cross-sectional, nested, panel
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#2

23 Jul 2019, 21:01

After reading this post three times, I am unable to understand what your data consists of, what you have done with it, and what you want to do. Perhaps if I am sufficiently persistent, it will become clearer. But I also see that nobody has responded and your post has been up for about 6 hours now. I will be surprised if you get a helpful response to this post within a reasonable time frame.

I suggest you rework your question. First, show some example data, using the -dataex- command.* Next, show the code you have used to create your additional variables. Then show the unsuccessful code involving collapse that you tried, along with the output you got from it, and explain how that output differs from what you hoped to get. If you do those things, it is more likely that somebody will understand your situation and be able to give useful advice.

* If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
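
As a rough illustration of the -dataex- step suggested above, a call along these lines would post a small excerpt of the nested data. The variable names are the ones mentioned in post #1; the choice of the first 20 observations is only an assumption for the sketch.

```stata
* Run once only if -dataex- is not already part of your installation
* ssc install dataex

* Export an excerpt: the variables under discussion, first 20 observations
dataex spid opid gender peerintergen25 famorfriend helpsSP in 1/20
```

The output can be pasted directly into a post, and others can run it to recreate the example data.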
Carly Roman

Join Date: Jul 2019

Posts: 2
#3

26 Jul 2019, 14:06

Thank you for the feedback. I think I have solved my issue - I created dummy variables to describe the characteristics of each opid listed for spid's. Using collapse with the (count) statistic did not work, because (count) counts every nonmissing value and so also counted the 0's for opid's without those characteristics. Instead, I replaced all of the 0's in the dummy variables with missing values and used (firstnm) rather than (count) with the collapse command. This allows me to determine whether each spid has any opid's with the specific characteristics (such as peerintergen25, famorfriend, helpsSP), collapsing the dataset into 1 row for each spid rather than as many rows as opid's for each spid.
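
For readers following along, here is a minimal sketch of one way to get "any opid with this characteristic" per spid without recoding the 0's to missing, using (max) instead of (firstnm). This is not the code used in the thread: the toy data values and the new variable names (n_opid, any_peer, any_fam, any_help, n_peer) are made up for illustration, and only spid, opid, gender, peerintergen25, famorfriend, and helpsSP come from the posts above. It uses /// line continuation, so it should be run from a do-file.

```stata
* Toy long-format data: one row per opid, nested within spid (values invented)
clear
input spid opid gender peerintergen25 famorfriend helpsSP
1 1 0 0 1 1
1 2 0 1 0 1
2 1 1 0 0 0
3 1 1 1 1 0
3 2 1 0 1 0
3 3 1 0 0 1
end

* (count) counts nonmissing values, so 0's are counted along with 1's.
* (max) of a 0/1 dummy is 1 if any opid has the trait and 0 if none does.
* (sum) of a 0/1 dummy is the number of opid's with the trait.
collapse (count) n_opid   = opid            ///
         (max)   any_peer = peerintergen25  ///
                 any_fam  = famorfriend     ///
                 any_help = helpsSP         ///
         (sum)   n_peer   = peerintergen25  ///
         (first) gender, by(spid)

* One row per spid
list, clean

* An ordinary logistic regression could then be run on the collapsed data,
* for example (left commented out because the toy data are too small):
* logit any_peer i.gender any_fam any_help
```

One difference from the recode-to-missing approach may matter for the regression: with (firstnm) on dummies whose 0's were set to missing, an spid with no qualifying opid ends up with a missing value rather than 0, and logit drops observations with a missing outcome. With (max) on the original 0/1 dummies, those spid's stay in the data as 0.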