generating a sample pseudo dataset

Olivier Ma

#1

29 Aug 2015, 01:11

While learning basic Stata operations, I want to create a pseudo dataset to play with. The dataset (panel data) should look something like this:
year  country  growth
1991  UK       0.1
1991  US       0.2
1992  UK       0.3
1992  US       0.4
1993  UK       0.5

So I guess what I need to do for the three variables is:

for the year variable, create a number sequence and repeat it;
for the country variable, create a string macro and repeat it;
for the growth variable, create a series of random numbers with a random-number generator.

This should be quite easy in other programming languages, but how can I do it in Stata?

More generally, links on how to handle this kind of task would be much appreciated. I googled, but nothing relevant came up (maybe I'm searching with the wrong keywords).

Last edited by Olivier Ma; 29 Aug 2015, 01:14.
Tags: data, panel data, pseudo data

Nick Cox

29 Aug 2015, 01:26

Some technique.

Code:

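set obs 6                                          // 3 years x 2 countries
egen year = seq(), from(1991) to(1993) block(2)    // 1991 1991 1992 1992 1993 1993
gen country = cond(mod(_n, 2), "UK", "US")         // odd rows UK, even rows US
gen growth = 1 + rnormal()                         // normal draws, mean 1, SD 1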

Olivier Ma

#3

29 Aug 2015, 01:36

Originally posted by Nick Cox

Some technique.

Code:

set obs 6
egen year = seq(), from(1991) to(1993) block(2)
gen country = cond(mod(_n, 2), "UK", "US")
gen growth = 1 + rnormal()

Exactly what I need, thanks! I'll look into the details of these functions.

Olivier Ma

#4

29 Aug 2015, 02:10

Originally posted by Nick Cox

Some technique.

Code:

set obs 6
egen year = seq(), from(1991) to(1993) block(2)
gen country = cond(mod(_n, 2), "UK", "US")
gen growth = 1 + rnormal()

A small follow-up question: if I have many more countries, say 50 or 100, is there a generic way to generate the country variable without referring to each country name individually?

I tried

Code:

local country_names "UK US DE FR" // all the country names
generate country = cond(mod(_n, 4), `country_names')

but got an error

Code:

USUKDEFR not found
r(111);

I should say that I understand why I got the error: substituting the actual country names for the country_names macro gives the wrong syntax for the cond() function. I just don't know how to get it right.

Last edited by Olivier Ma; 29 Aug 2015, 02:15.

Nick Cox

29 Aug 2015, 02:41

Code:

local country_names "UK US DE FR" 
gen country = word("`country_names'", 1 + mod(_n-1, 4) )

More generally

Code:

local country_names "<list>"                                   // space-separated country names
local nc : word count `country_names'                          // number of countries
gen country = word("`country_names'", 1 + mod(_n-1, `nc') )   // picks word 1, 2, ..., nc, 1, 2, ...
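Putting the pieces of this thread together, a full balanced country-year panel for any number of countries can be sketched as below. This is a minimal sketch, assuming a hypothetical four-country list and the years 1991 to 1993 from the original question; the encode and xtset lines are an addition that declares the data as a panel so the xt commands can be tried on it.

```stata
clear
set seed 2015                                   // reproducible random draws
local country_names "UK US DE FR"               // hypothetical list; any length works
local nc : word count `country_names'           // number of countries
local first 1991                                // hypothetical first year
local last 1993                                 // hypothetical last year
local N = `nc' * (`last' - `first' + 1)         // one row per country-year
set obs `N'
// each year repeated nc times: 1991 x4, 1992 x4, 1993 x4
egen year = seq(), from(`first') to(`last') block(`nc')
// countries cycle within each block of nc rows
gen country = word("`country_names'", 1 + mod(_n-1, `nc'))
gen growth = 1 + rnormal()                      // normal draws, mean 1, SD 1
encode country, gen(cid)                        // numeric panel identifier (alphabetical codes)
sort cid year
xtset cid year                                  // declare the panel structure
```

For 50 or 100 countries only the country list and the two year locals need to change; the mod() cycle stays aligned with the year blocks as long as block() equals the number of countries.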

Comment

Olivier Ma

Join Date: Aug 2015

Posts: 13
#6

29 Aug 2015, 02:51

Thanks, Nick!

elizabeth nanziri

#7

05 Feb 2016, 01:58

Dear all, I am a new member. I tried to post this question as a new thread but failed, so I am using this space; apologies if this is not permissible. I have been following examples of constructing a pseudo panel in Stata, but the procedure is not yet clear to me. I have two cross-sections from the same population, sampled at two different times. I want to create a pseudo panel using age and location (9 districts). In each dataset, I have constructed a categorical variable that combines those two variables. I am not sure whether I am supposed to merge the datasets, in which case I have to create a unique identifier on which to merge, or simply to append them. Appending does not seem to give the right format for panel data analysis, if the pseudo panel is supposed to approximate panel data. Can someone help me with the steps to follow? A pointer to a pseudo-panel handbook would also be highly appreciated.
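One common way to build a pseudo panel from repeated cross-sections is to append (not merge) the waves, since they contain different individuals, and then average within cohorts defined by characteristics that do not change over time. The sketch below simulates two hypothetical waves so that it runs on its own; the survey years 2010 and 2014, the variable names (age, district, income) and the ten-year birth bands are all assumptions. Cohorts are defined by birth year rather than by age at the survey date, because a fixed age band contains different generations in each wave; the sketch also assumes people do not move between districts.

```stata
clear
set seed 2016
// Hypothetical wave 1 (survey year 2010): simulated stand-in for a real dataset
set obs 500
gen year = 2010
gen age = 20 + floor(40*runiform())          // ages 20-59
gen district = 1 + floor(9*runiform())       // 9 districts
gen income = 100 + age + rnormal(0, 10)      // hypothetical outcome
tempfile wave1
save `wave1'
// Hypothetical wave 2 (survey year 2014): a fresh sample of the same population
clear
set obs 500
gen year = 2014
gen age = 24 + floor(40*runiform())          // same birth years, four years older
gen district = 1 + floor(9*runiform())
gen income = 100 + age + rnormal(0, 10)
tempfile wave2
save `wave2'
// Stack the waves: different people, so append rather than merge
use `wave1', clear
append using `wave2'
// Cohort = birth decade x district (time-invariant characteristics)
gen birthband = 10*floor((year - age)/10)
egen cohort = group(birthband district)
// One row per cohort and survey year: cohort means plus cell sizes
collapse (mean) income (count) n = income, by(cohort birthband district year)
xtset cohort year
```

The collapsed data can then be analysed with the xt commands. Cohort means from small cells are noisy, so the cell sizes in n are worth inspecting before estimation.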
Latha Kadalayil

#8

23 Aug 2023, 15:54

This is my first post here, I think. I am a beginner trying to learn to write Stata code the parsimonious way! I would like to generate 12 date variables named fol_up_`i', where `i' takes the values 8, 16, 24, ..., 96 (multiples of 8). These new variables are the dates of 8-weekly follow-up from rand_date (the date of randomisation). Right now my data look like this (dates converted to numeric and formatted):

trialID rand_date
1 01sep2022
2 01sep2022
3 01nov2022
4 01sep2022
5 01sep2022
6 01nov2022
7 01sep2022
8 01sep2022
9 01nov2022
10 01sep2022
11 01sep2022
12 01nov2022
13 01sep2022
14 01sep2022
15 01nov2022
16 01sep2022
17 01sep2022
18 01nov2022
19 01sep2022
20 01sep2022

Can someone help me get started, please? Thanks.
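A forvalues loop can create all twelve variables at once. Stata daily dates count days, so a follow-up `i' weeks after randomisation is rand_date + 7*`i'. This is a minimal sketch, assuming each follow-up falls an exact number of 7-day weeks after randomisation; the input lines only recreate three rows of the example data, since in the real dataset rand_date is already a numeric daily date.

```stata
clear
input trialID str9 rand_str
1 "01sep2022"
2 "01sep2022"
3 "01nov2022"
end
gen rand_date = daily(rand_str, "DMY")    // string -> numeric daily date
format rand_date %td
drop rand_str
// creates fol_up_8, fol_up_16, ..., fol_up_96 (weeks after randomisation)
forvalues i = 8(8)96 {
    gen fol_up_`i' = rand_date + 7*`i'
    format fol_up_`i' %td
}
```

With real data, only the forvalues loop is needed; 8(8)96 steps from 8 to 96 in increments of 8, giving the twelve variables.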