Merging two datasets with specific dates of start and end of period within a year

Mario Ferri

Join Date: Jul 2019

Posts: 190
#46

06 Mar 2020, 17:28

Originally posted by Clyde Schechter

The relationship between the 6 categories and the three dichotomous variables is not entirely clear. It looks to me like this:
6-levels   weak v strong   single v multiple   non-old v old
1          1               0                   ?
2          1               1                   ?
3          1               1                   ?
4          0               0                   ?
5          0               1                   ?
6          ?               ?                   1

In the above table, ? signifies that from the descriptions given of the 6 levels and the three dichotomies, the dichotomy is indeterminate, indeed not applicable, in that level. Note also that the original levels 2 and 3 are not distinguishable in the new three dichotomy classification. Note also that the third dichotomy only takes on values 1 and ?, which is problematic. Perhaps there are aspects of levels 1 through 5 that qualify for a determinate value of old vs non-old that I am overlooking?

Perhaps I am misunderstanding the 6-levels and the three dichotomies--and feel free to correct my interpretation where it is wrong.

Anyway, taking this at face value, you could code this as follows:

Code:

label define weak_strong 0 "Weak" 1 "Strong" .n "N/A"
forvalues i = 1/2 {
    gen weak_strong`i':weak_strong = inlist(t`i', 1, 2, 3)
    replace weak_strong`i' = .n if t`i' == 6
}

label define single_multiple 0 "Single" 1 "Multiple" .n "N/A"
forvalues i = 1/2 {
    gen single_multiple`i':single_multiple = inlist(t`i', 2, 3, 5)
    replace single_multiple`i' = .n if t`i' == 6
}

label define old_non_old 0 "Non-Old" 1 "Old" .n "N/A"
forvalues i = 1/2 {
    gen old_non_old`i':old_non_old = (t`i' == 6)
    mvencode old_non_old`i', mv(.n = 0)
}

Note: code not tested. Beware of typos or other errors.

I have written this code, and it seems to be what I am looking for. It is tested and works. This is for t1; t2 is no longer needed.

Code:

generate leadership = t1
recode leadership (1/3=1) (4/6=0)

generate animal_group = t1
recode animal_group (1 4 6=0) (2/3 5=1)

generate old = t1
recode old (1/5=0) (6=1)
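For anyone who wants to check this mapping before applying it to real data, here is a minimal sketch on toy data with one observation per level of t1. It uses the -generate()- option of -recode- rather than copying t1 first, which is only an alternative way of writing the same recoding. Note that, unlike the earlier code with .n, this coding places level 6 in the 0 category of the first two dichotomies.

```stata
* Toy data: one observation per level of t1 (levels 1-6 as in the thread)
clear
input byte t1
1
2
3
4
5
6
end

* recode's generate() option leaves t1 untouched and creates the new variable
recode t1 (1/3=1) (4/6=0),     generate(leadership)
recode t1 (1 4 6=0) (2/3 5=1), generate(animal_group)
recode t1 (1/5=0) (6=1),       generate(old)

* Confirm that each dichotomy equals 1 on exactly the intended levels
assert leadership   == inlist(t1, 1, 2, 3)
assert animal_group == inlist(t1, 2, 3, 5)
assert old          == (t1 == 6)

list, noobs
```

If any -assert- fails, Stata stops with an error, which makes a mistyped recode rule easy to spot.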

One last thing: I will need to collapse these variables to the country-year level, so weighting them by the percentage of the calendar year makes sense. They should be added in the collapse code line given in #36.

Last edited by Mario Ferri; 06 Mar 2020, 17:32.
Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#47

06 Mar 2020, 17:51

They should be added in the collapse code line given in #36

Not just to the -collapse- command but also to the -foreach- loop immediately before that which calculates the weighted values.

Mario Ferri
#48

06 Mar 2020, 18:21

Originally posted by Clyde Schechter

Not just to the -collapse- command but also to the -foreach- loop immediately before that which calculates the weighted values.

Excellent! Thank you wholeheartedly, Clyde!

Mario Ferri
#49

07 Mar 2020, 17:20

Originally posted by Clyde Schechter

So, take the code in #36 and stop just before the -collapse- command. Then:

Code:

gen byte n_governments = 1
collapse (max) growth1 growth2 (count) n_governments (first) ID weighted_*, by(country ts)
gen byte more_than_one_govt = (n_governments > 1)

Why? There is seldom any need to do this in Stata. If you are going to do some kind of regression and you want to include the number of governments as a discrete predictor, there is no need to create indicator ("dummy") variables for this purpose. Use factor-variable notation instead. (-help fvvarlist-)

Code:

regression_command outcome ...i.n_governments …

Stata will create "virtual" indicator variables and use them in the regression.
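As a rough illustration of the factor-variable notation mentioned above, the sketch below simulates toy data and fits a regression with n_governments as a categorical predictor. The variable outcome, the simulated values, and the choice of -regress- are assumptions made only for this example; they are not from the thread's data.

```stata
* Simulated toy data (hypothetical; not the thread's dataset)
clear
set seed 12345
set obs 200
gen byte n_governments = 1 + floor(3*runiform())   // takes values 1, 2 or 3
gen double outcome = 0.5*(n_governments == 2) + 1.0*(n_governments == 3) + rnormal()

* i. asks Stata to treat n_governments as categorical;
* the lowest level (1) is the base category by default
regress outcome i.n_governments

* The base category can be changed without creating any new variables
regress outcome ib2.n_governments
```

No indicator variables are added to the dataset in either case; the categories exist only inside the estimation command.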

The code works fine and I have managed to create what I wanted. One thing, nonetheless: when I collapsed everything, the variable more_than_one_govt returned 0 in all observations, which was not supposed to happen. It was supposed to flag the cases with more than one government.

Also, some string variables were not collapsed and returned errors as well (type mismatch, r(109)). Any idea why this happened?

Clyde Schechter
#50

07 Mar 2020, 18:10

Well, when I apply the code to the example data you have provided so far, none of these errors are encountered. There is really no way that n_governments, after -collapse- can come out zero if you are following the code I showed. And none of the variables listed in the -collapse- command are string variables except for country--but that's OK because country only appears in the -by()- option, where strings are allowed.

So your actual data must have changed (pretty radically, in fact) from what you were working with previously. Please show current example data and we can try to figure it out. Also please show the exact code you ran and the exact output including error messages--it is possible that you have modified the code I proposed in ways that are not compatible with your data.

Mario Ferri

#51

07 Mar 2020, 19:57

Here is a sample of my data after running the weighting code:


Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long ts str48 country float(growth1 growth2 ID weighted_1 weighted_2)
1990 "Australia"          .         . 1 1        0
1990 "Australia" -.25998396  .8315406 1 1        0
1991 "Australia"          .         . 1 1        0
1991 "Australia"  1.3334774 2.1083677 1 1        0
1992 "Australia"  3.2923484 3.4185865 1 1        0
1993 "Australia"          .         . 1 1        0
1993 "Australia"   3.715279 2.6728065 1 1        0
1994 "Australia"  1.6375794 1.2805543 1 1        0
1995 "Australia"  -.9972533 -.9632453 1 1        0
1996 "Australia"          .         . 1 1 .8082192
1996 "Australia"   2.770518  3.154481 1 1 .8082192
1997 "Australia"   3.889898 3.9895246 1 1        1
end

I ran this code for a panel:

Code:

gen byte n_governments = 1

collapse (max)  growth1 growth2  n_governments (first)  ID weighted_*, by(country ts )

gen byte more_than_one_govt = (n_governments > 1)

and I get


Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long ts str48 country float(growth1 growth2 ID weighted_1 weighted_2) byte(n_governments more_than_one_govt)
1990 "Australia" -.25998396  .8315406 1 1        0 1 0
1991 "Australia"  1.3334774 2.1083677 1 1        0 1 0
1992 "Australia"  3.2923484 3.4185865 1 1        0 1 0
1993 "Australia"   3.715279 2.6728065 1 1        0 1 0
1994 "Australia"  1.6375794 1.2805543 1 1        0 1 0
1995 "Australia"  -.9972533 -.9632453 1 1        0 1 0
1996 "Australia"   2.770518  3.154481 1 1 .8082192 1 0
1997 "Australia"   3.889898 3.9895246 1 1        1 1 0
end

I got the desired results when I ran the code once for a single country, but not for the panel.

Note that I have many variables to include in the collapse command, but I guess that does not play any role.

Any help is appreciated.

Thank you

Last edited by Mario Ferri; 07 Mar 2020, 19:59.

Clyde Schechter
#52

07 Mar 2020, 21:17

So, with regard to always getting more_than_one_government = 0, this is happening because you misapplied the code shown in #39. You have the -collapse- calculating the max of n_governments, but the code there says, correctly, that you want to collapse to the count of n_governments. So your collapse command should be:

Code:

collapse (max) growth1 growth2 (count) n_governments (first) ID weighted_*, by(country ts )

I got the desired results when I ran the code once for a single country, but not for the panel.

Then why do you now post example data with only a single country? How can anyone troubleshoot the problem when you post data that does not reproduce the problem?

Note that I have many variables to include in the collapse command, but I guess that does not play any role.

Well, it probably does play a role in the problem of getting type mismatch messages from -collapse-. It must mean that you are including some string variables there--which is not allowed (and isn't meaningful anyway). So review your full collapse command and see which variables are string variables there. Then get rid of them. You cannot collapse string variables. If the string variables are constant within groups of observations defined by country and ts, then you can carry them into the collapsed data set by including them in the -by()- option. If the string variables vary among observations within country and ts, then you have to eliminate them, or, before -collapse- select a single correct value for the entire country#ts group and then add the string variable to the -by()- option.
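To make the two points above concrete, here is a minimal sketch on invented toy data. It contrasts (max) with (count) applied to a variable that is 1 in every observation, and it carries a string variable that is constant within country and ts through -collapse- by listing it in -by()-. The variables n_max and region are hypothetical and exist only for this illustration.

```stata
* Toy data (invented): two observations for 1990, one for 1991
clear
input int ts str20 country float growth1 str10 region
1990 "Australia" .   "Oceania"
1990 "Australia" 0.8 "Oceania"
1991 "Australia" 1.3 "Oceania"
end

gen byte n_governments = 1
gen byte n_max = 1        // hypothetical copy, only to contrast (max) with (count)

* region is a string that is constant within country-ts, so it goes in by()
collapse (max) growth1 n_max (count) n_governments, by(country ts region)

gen byte more_than_one_govt = (n_governments > 1)

* (max) of a variable that is always 1 is always 1, so it can never flag anything
assert n_max == 1
* (count) returns the number of observations in each country-ts group
assert more_than_one_govt == (ts == 1990)

list, noobs
```

Moving region out of -by()- and into the variable list of -collapse- should reproduce the type mismatch error, since the statistics requested there are only defined for numeric variables.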

Mario Ferri
#53

08 Mar 2020, 09:44

Originally posted by Clyde Schechter

So, with regard to always getting more_than_one_government = 0, this is happening because you misapplied the code shown in #39. You have the -collapse- calculating the max of n_governments, but the code there says, correctly, that you want to collapse to the count of n_governments. So your collapse command should be:

Code:

collapse (max) growth1 growth2 (count) n_governments (first) ID weighted_*, by(country ts )

Then why do you now post example data with only a single country? How can anyone troubleshoot the problem when you post data that does not reproduce the problem?

Well, it probably does play a role in the problem of getting type mismatch messages from -collapse-. It must mean that you are including some string variables there--which is not allowed (and isn't meaningful anyway). So review your full collapse command and see which variables are string variables there. Then get rid of them. You cannot collapse string variables. If the string variables are constant within groups of observations defined by country and ts, then you can carry them into the collapsed data set by including them in the -by()- option. If the string variables vary among observations within country and ts, then you have to eliminate them, or, before -collapse- select a single correct value for the entire country#ts group and then add the string variable to the -by()- option.

There was an error with the dataset which I managed to solve manually. The code works extremely well!

Thank you for the help with writing the code. I am grateful for your help because you helped me solve an important issue I was facing.

I really appreciate it, and most of all the time you spent on my requests. I am really indebted to you!

Again, thank you wholeheartedly for the code and the work!

Mario Ferri

Mario Ferri
#54

10 Mar 2020, 10:28

One last thing, which just came up while testing my hypothesis: I will need to include some additional incoming variables for the next period, pm and gv, weighted as in #36, as well as the incoming t1 dichotomous variables for the next period, as in the code I created in #46, and then include everything with all the other variables in the collapse code in #52. Thanks, and I hope this is the last inquiry on this thread.

Mario Ferri
#55

15 Mar 2020, 11:40

Originally posted by Clyde Schechter

So, with regard to always getting more_than_one_government = 0, this is happening because you misapplied the code shown in #39. You have the -collapse- calculating the max of n_governments, but the code there says, correctly, that you want to collapse to the count of n_governments. So your collapse command should be:

Code:

collapse (max) growth1 growth2 (count) n_governments (first) ID weighted_*, by(country ts )

Then why do you now post example data with only a single country? How can anyone troubleshoot the problem when you post data that does not reproduce the problem?

Well, it probably does play a role in the problem of getting type mismatch messages from -collapse-. It must mean that you are including some string variables there--which is not allowed (and isn't meaningful anyway). So review your full collapse command and see which variables are string variables there. Then get rid of them. You cannot collapse string variables. If the string variables are constant within groups of observations defined by country and ts, then you can carry them into the collapsed data set by including them in the -by()- option. If the string variables vary among observations within country and ts, then you have to eliminate them, or, before -collapse- select a single correct value for the entire country#ts group and then add the string variable to the -by()- option.

One last thing, which just came up while testing my hypothesis: I will need to include some additional incoming variables for the next period, pm and gv, weighted as in #36, as well as the incoming t1 dichotomous variables for the next period, as in the code I created in #46, and then include everything with all the other variables in the collapse code in #52. Thanks, and I hope this is the last inquiry on this thread.
That means, at time T0, the expected pm and gv for the next period (time T1), and likewise the expected dichotomous variables from #46 at time T0 for time T1, assuming people adapt their behavior for the next period.

Many thanks