Predictions

Barbora Sedova

Join Date: Apr 2017

Posts: 63
#1

08 Aug 2018, 12:25

Dear all,
I have a simple first-difference model with two time periods. My two independent variables capture average temperature (T) and precipitation (P), and my dependent variable is the number of migrants a household has.

I create variables that capture the differences between the two periods. I use a binary dependent variable (M): 1 if the household increased its number of migrants, 0 otherwise. I then run a simple OLS regression, i.e. a linear probability model for this limited dependent variable: reg M c.T c.P if URBAN==0, noconstant vce(cluster DISTRICT)

I would like to predict the total number of households in the population that increased their number of migrants as a result of a) temperature variation, b) precipitation variation, and c) both.

I know I need to use sampling weights.
But otherwise I am not sure how to do that.

My idea was the following:
1. run the model
2. predict the probabilities
3. generate a variable by multiplying the probability by the total number of migrants
4. the total of that variable should be the number of displaced people in response to P and T

Is this a correct approach? How can I do it separately for the two variables?
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#2

08 Aug 2018, 15:37

This is not the best approach. First of all, because you have survey data, you should svyset the data. To advise on how to do this, we need to know more about the sampling design. From your regress statement, it appears that the primary sampling units (PSUs) were districts, but I can't be sure. So please describe the design in detail.

Last edited by Steve Samuels; 08 Aug 2018, 15:39.

Steve Samuels
Statistical Consulting
[email protected]

Stata 14.2
Barbora Sedova

Join Date: Apr 2017

Posts: 63
#3

09 Aug 2018, 01:31

Dear Steve, thank you for your reply. My unit of observation is a household, and my treatment variables P and T are at the district level. I only look at rural households and the probability that they increase the number of migrants.
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#4

09 Aug 2018, 02:28

Thanks for this useful information, but I'm asking about the sampling design: the process that led to selection of the study households. We can get at this in another way if I can look at study documentation. What's the name of the survey? Is there a web site? What countries? What are the time periods for your analysis?

Last edited by Steve Samuels; 09 Aug 2018, 02:31.

Barbora Sedova

Join Date: Apr 2017

Posts: 63
#5

09 Aug 2018, 02:48

Dear Steve, sorry for the confusion. I am using both rounds of the IHDS data: https://ihds.umd.edu/
The data should be representative of the whole of India. I only look at the rural areas.

Last edited by Barbora Sedova; 09 Aug 2018, 02:59.
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#6

09 Aug 2018, 10:12

Thank you, Barbora. The documentation is not very clear about the identity of the sampling strata. I've written to the IHDS for clarification.

Steve Samuels

Join Date: Mar 2014

Posts: 1786
#7

09 Aug 2018, 11:14

While we wait, I'd like some clarification of your study question: you say first that you "would like to predict the total number of households" that increased the number of migrants, but then you say you want to estimate the number of displaced people.

So, is it households or people? It makes a difference, as IHDS-II did not interview all of the IHDS-I households. It apparently added HH and followed people into "split" households. So, I don't think that you can estimate the percent of HH with an increase in migrants, as the paired data are not complete. As you have the data, you can investigate this yourself.

The sampling weights will allow you to estimate weighted statistics for strata, probably districts. You should be able to estimate the average number of migrants per HH in each round and the proportion of HH with migrants in each round. You can get totals if you have independent estimates of the number of HH in the rural villages or strata.

Last edited by Steve Samuels; 09 Aug 2018, 11:16.

Steve Samuels

Join Date: Mar 2014

Posts: 1786
#8

09 Aug 2018, 14:09

I do have some more questions:

I see in the Village module, a question: VE11:VQ5 5.11:

From how many households have at least one member migrated out for seasonal work?

But I also see VJ7A: VW3 3.7A:

Did people come to this village from outside to work during the last year? [IF YES]: How many people came to work during the last year?

The codes for this are 1 for "Less than 20" and 0 for "More than 20".

So the questions:
1. What do you mean by "how many migrants a household has"?
2. How do you intend to calculate the total number of migrants (displaced people)? Please be explicit about which variables/questions you will use to do the calculation.

Steve Samuels

Join Date: Mar 2014

Posts: 1786
#9

10 Aug 2018, 15:30

I did hear from IHDS. The original sampling strata are no longer relevant. I recommend that you create pseudo-strata that are the unique districts.

Code:

egen newstrat = group(STATEID DISTID)
svyset PSUID [pw = WT], strata(newstrat)
keep if URBAN == 0 // keep rural villages only (rural coded 0, as in the original regress command)

This will cluster on village. However, if you plan on studying district-level effects, you'll want to write a mixed model with district as the highest level.

Aside: I find it very tiresome to shift case back and forth between Stata commands (lower case only) and variable names (upper case). Therefore, before any analysis, I convert all variable names to lower case and save the result as a new data set. I suggest that you do the same. It will greatly speed up typing code.

Code:

rename *,lower

Last edited by Steve Samuels; 10 Aug 2018, 15:46.

Steve Samuels

Join Date: Mar 2014

Posts: 1786
#10

11 Aug 2018, 07:36

To consider:

• These variables are in the Individual Codebook for 2012. I haven't looked at the 2005 Codebooks.
• In the Household Codebook for 2012, there is also a variable WT, which appears to be the HH weight, and a variable INDWT, which appears to be the one to use if you want to weight up summaries of individuals in the household.
• In the Individual & HH Codebooks for 2005, the design weights are named SWEIGHT.

If you want to combine the data sets from the two rounds, as you apparently want to do, you should create a new weight variable (e.g. WTNEW) which is SWEIGHT for the 2005 data and WT for the 2012 data.
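A minimal sketch of such a combined weight, assuming the two rounds have been appended into one file. The round indicator `round` is a hypothetical name, and the weight values are made-up toy numbers used only so the block runs on its own.

```stata
* Toy data standing in for the appended 2005 and 2012 household files.
* `round` is a hypothetical survey-round indicator; weight values are made up.
clear
input round SWEIGHT WT
2005 1200    .
2005  950    .
2012    . 1310
2012    . 1020
end

* Use SWEIGHT for 2005 records and WT for 2012 records
generate double WTNEW = cond(round == 2005, SWEIGHT, WT)

* Every record should now carry a weight
assert !missing(WTNEW)
list round SWEIGHT WT WTNEW, noobs
```

With the real files, only the generate line would be needed after appending, provided a round indicator exists under whatever name the data use.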

According to the FAQ (https://ihds.umd.edu/faq-page#n227), if you want to link HH or individuals from the two rounds and do a panel study, use the 2005 SWEIGHT. I don't know what other variable names have changed.

Sources of information for both surveys:

Professor Sonalde Desai: [email protected]
Professor Reeve Vanneman: [email protected]

Barbora Sedova

Join Date: Apr 2017

Posts: 63
#11

15 Aug 2018, 04:27

Dear Steve,

many thanks for all the replies. I was indeed probably using the wrong weights. My analysis is a panel analysis at the household level where the treatment is at the district level.
I want to calculate, first, how many households engaged in permanent migration (an increase in the number of non-resident members).
So if I use the SWEIGHT variable, does it give me the number of households or the number of individuals? I am a bit confused.
Also, about the stratification: how would you do it in my case? Just the way you described it above?

Many thanks.
Barbora Sedova

Join Date: Apr 2017

Posts: 63
#12

15 Aug 2018, 06:10

OK, I think it is as follows:
SWEIGHT can be used as a weight at the hh level for a panel, as you said.
In order to translate the overall number of hh into the overall number of individuals, one needs to create a new weight that multiplies SWEIGHT by the number of hh members. This is not what I do in my analysis, though. I only use SWEIGHT at the hh level.

As regards the calculation of the number of displaced people, I intend to do the following:
1) run the model described in my original post
2) predict the probability
3) multiply the probabilities by the total size of the rural population -> number of households that engaged in migration
4) multiply the number obtained in 3) by the average increase in the number of migrants per hh.

Does that make sense?
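Steps 1) to 3) can be sketched as follows, under stated assumptions. The data are made-up toy values so the block runs on its own, and SWEIGHT is assumed to be the household weight. Because the model is linear and has no constant, the fitted value splits exactly into a temperature part and a precipitation part, which is one way to look at the two variables separately. Fitted values from a linear probability model can fall outside [0, 1], so the weighted totals are only rough.

```stata
* Toy household data; all values are made up for illustration.
* T and P are district-level differences, M is the 0/1 outcome.
clear
input DISTRICT URBAN SWEIGHT T P M
1 0 1200  0.5 -10 1
1 0  900  0.5 -10 0
2 0 1500  1.0   5 1
2 0 1100  1.0   5 1
3 0 1000 -0.2  20 0
3 0 1300 -0.2  20 0
4 0  800  0.8  -5 1
4 0  950  0.8  -5 0
4 1 1000  0.8  -5 0
end

* Step 1: weighted linear probability model on rural households
regress M c.T c.P if URBAN == 0 [pw = SWEIGHT], noconstant vce(cluster DISTRICT)

* Step 2: fitted probabilities and their split into T and P parts
predict double phat if e(sample), xb
generate double phat_T = _b[T]*T if e(sample)
generate double phat_P = _b[P]*P if e(sample)

* With no constant, the two parts add up to the fitted value
assert reldif(phat, phat_T + phat_P) < 1e-8 if e(sample)

* Step 3: weighted totals = estimated numbers of rural households
total phat phat_T phat_P [pw = SWEIGHT] if URBAN == 0
```

The standard errors reported by total here ignore the sampling design; declaring the design with svyset and using the svy prefix would be needed for design-based inference.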

What I am still puzzled about is stratification.
Does it make sense the way you suggested it if my level of observation is a hh?
Barbora Sedova

Join Date: Apr 2017

Posts: 63
#13

15 Aug 2018, 06:28

Also, why do you suggest clustering at the village level?
svyset PSUID [pw = WT], strata(newstrat)
Steve Samuels

Join Date: Mar 2014

Posts: 1786
#14

16 Aug 2018, 15:22

1. The 2005 rural survey was stratified, but the strata are not identified. State can certainly stand in as a stratum; I'm not sure about District, so that could be omitted from the definition, i.e. just use strata(State Variable).
2. Villages were the primary sampling units, as this document states on page 2:

Villages and urban blocks (comprising of 150-200 households) formed the primary sampling unit (PSU) from which the households were selected.

Note that when you estimate the numbers of migrants in each HH in 2005 and 2012 and the per HH difference, you should use the 2005 household sampling weight SWEIGHT.
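A minimal sketch of that calculation, assuming state as the stratum and village (PSUID) as the PSU. The names mig2005 and mig2012 are hypothetical placeholders for the per-HH migrant counts in each round, and the toy values are made up so the block runs on its own.

```stata
* Toy linked-household data; all values are made up for illustration.
* mig2005 and mig2012 are hypothetical names for migrants per HH in each round.
clear
input STATEID PSUID SWEIGHT URBAN mig2005 mig2012
1 101 1200 0 0 1
1 101 1200 0 1 1
1 102  900 0 2 1
1 102  900 1 0 0
2 201 1500 0 0 2
2 201 1500 0 1 1
2 202 1100 0 0 0
2 202 1100 1 1 2
end

* Per-HH change and an indicator for an increase in migrants
generate dmig = mig2012 - mig2005
generate byte M = dmig > 0 if !missing(dmig)

* Declare the design: villages as PSUs, states as strata, 2005 HH weight
svyset PSUID [pw = SWEIGHT], strata(STATEID)

* Rural households as a subpopulation rather than dropping urban records
svy, subpop(if URBAN == 0): mean mig2005 mig2012 dmig
svy, subpop(if URBAN == 0): total M
```

Using subpop() instead of keep if leaves the full design in memory, which is the usual way to get correct standard errors for a subgroup.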

Good luck!

Sara George

Join Date: Oct 2022

Posts: 31
#15

28 Oct 2022, 21:08

Is the command svyset PSUID [pw = SWEIGHT] if you use SWEIGHT?
I'm getting this after I run it. What does this mean?

svyset PSUID [pw = SWEIGHT]

pweight: SWEIGHT
VCE: linearized
Single unit: missing
Strata 1: <one>
SU 1: PSUID
FPC 1: <zero>