to Add

Steve Bereznai

Join Date: Jun 2022

Posts: 16
#1

04 Jun 2022, 04:38

Hello
I have a question.
My merge file is huge (163,718 observations). There was an event. Before the event there were 2 surveys (wave 1 and wave 2); after the event there were 2 surveys (wave 4 and wave 5). The data look like this:

Mergeid  Wave
AT-01    1
AT-01    4
AT-02    1
AT-02    2
AT-02    5
BF-05    1
BF-05    2
CZ-44    2
CZ-44    4
CZ-45    5

My condition: I need only those observations whose mergeid participated at least once in wave 1 or 2 (before the event) and at least once in wave 4 or 5 (after the event).
E.g. AT-01 is needed, because it appears once in wave 1 and once in wave 4.
AT-02 is needed, because it appears twice before the event and once after the event.
BF-05 is not needed, because it appears twice before the event and 0 times after the event.
CZ-44 is needed, because it appears once before the event and once after the event.
CZ-45 is not needed, because it appears 0 times before the event and once after the event.

My task: drop the observations that are not needed.

Does anybody have an idea how to do that?
Tags: margins, simple

Nick Cox

Join Date: Mar 2014
Posts: 36058

04 Jun 2022, 04:48

Somebody yesterday had about 160 million observations...

So, you want a positive count for wave 1 or 2 and also for wave 4 or 5. Your data example leaves ambiguous whether your identifier is a string variable or a numeric variable with value labels. But code could be the same either way. Please note our longstanding request to use dataex.
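To settle the string-versus-numeric question and produce a postable data example, something like the following could be run. This is a minimal sketch, assuming the merged data are already in memory and contain variables named `mergeid` and `wave`; `dataex` ships with recent versions of Stata, and older versions can install it from SSC.

```stata
* Assumes the merged data (with mergeid and wave) are already in memory.
* describe shows the storage type (str# = string, byte/int/long = numeric)
* and, for numeric variables, any attached value label.
describe mergeid wave

* Report directly whether mergeid is stored as a string.
capture confirm string variable mergeid
display cond(_rc == 0, "mergeid is a string variable", "mergeid is numeric")

* Produce a data example that can be copied and pasted into a post.
dataex mergeid wave
```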

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input str5 mergeid byte wave
"AT-01" 1
"AT-01" 4
"AT-02" 1
"AT-02" 2
"AT-02" 5
"BF-05" 1
"BF-05" 2
"CZ-44" 2
"CZ-44" 4
"CZ-45" 5
end

. bysort mergeid : egen cond1 = total(wave == 1 | wave == 2)

. by mergeid : egen cond2 = total(wave == 4 | wave == 5)

. gen wanted = cond1 & cond2

. l, sepby(mergeid)

     +-----------------------------------------+
     | mergeid   wave   cond1   cond2   wanted |
     |-----------------------------------------|
  1. |   AT-01      1       1       1        1 |
  2. |   AT-01      4       1       1        1 |
     |-----------------------------------------|
  3. |   AT-02      1       2       1        1 |
  4. |   AT-02      2       2       1        1 |
  5. |   AT-02      5       2       1        1 |
     |-----------------------------------------|
  6. |   BF-05      1       2       0        0 |
  7. |   BF-05      2       2       0        0 |
     |-----------------------------------------|
  8. |   CZ-44      2       1       1        1 |
  9. |   CZ-44      4       1       1        1 |
     |-----------------------------------------|
 10. |   CZ-45      5       0       1        0 |
     +-----------------------------------------+

keep if wanted
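An equivalent variant, offered as a sketch rather than anything posted in the thread: `inlist()` shortens the wave conditions, and `egen max()` yields a 0/1 indicator per identifier directly instead of a count. A count is positive exactly when the maximum of the 0/1 condition is 1, so the same observations are kept. The variable names `anybefore` and `anyafter` are illustrative choices.

```stata
clear
input str5 mergeid byte wave
"AT-01" 1
"AT-01" 4
"AT-02" 1
"AT-02" 2
"AT-02" 5
"BF-05" 1
"BF-05" 2
"CZ-44" 2
"CZ-44" 4
"CZ-45" 5
end

* 1 if this mergeid appears at least once in wave 1 or 2 (before the event)
bysort mergeid : egen byte anybefore = max(inlist(wave, 1, 2))
* 1 if this mergeid appears at least once in wave 4 or 5 (after the event)
by mergeid : egen byte anyafter = max(inlist(wave, 4, 5))

* Keep identifiers observed both before and after the event
keep if anybefore & anyafter
list, sepby(mergeid)
```

With the example data this keeps the 7 observations for AT-01, AT-02 and CZ-44.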

Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#3

04 Jun 2022, 09:12

Somebody yesterday had about 160 million observations...

It's funny how much the discipline changes things. I don't know what discipline the OP works in, but this past Wednesday I was in an interview for a research analyst position, and they asked me whether I was comfortable working with big datasets. I asked them to define "big", since that is relative to one's field. I usually work with 100,000+ observations in some form, but to a psychologist that would (presumably) be gigantic, and to people in other disciplines, I don't know, puny.
Nick Cox

Join Date: Mar 2014

Posts: 36058
#4

04 Jun 2022, 09:33

To be fair, I looked again at the post I had in mind, and the person really said 60 million.

When I started teaching statistics, the rule of thumb around here was that a reasonable size for a dataset was 20 to 30 observations: fair for hand calculation with at most a small electronic calculator or (a bit later) what each student could fairly be expected to type into whatever software was being used.
Jared Greathouse

Join Date: Sep 2021

Posts: 2172
#5

04 Jun 2022, 09:50

"Type in"? I don't mean to sound like this, but did CSV files exist in those days?

I couldn't imagine manually typing in data points.
Nick Cox

Join Date: Mar 2014

Posts: 36058
#6

04 Jun 2022, 09:59

Typing at a keyboard was easy -- compared with punching your own cards (routine when I started with **the** University computer) or paper tape (routine for many contemporaries).

I still type in small datasets from books or papers.

Now consider early text file editors in which you couldn't see the file in question while you were editing it (and printing it out wasn't always trivial either).
Andrew Musau

Join Date: Oct 2014

Posts: 10482
#7

04 Jun 2022, 10:14

When I was pursuing my PhD, my econometrics instructor (now retired) explained truncation as follows: a secretary is inputting data from a sheet of paper and accidentally spills ink on the bottom half of the sheet, completely losing the bottom half of the data (the data are ordered in some way, e.g., by individuals' heights). That is right truncation. I am not that old (barely in my 30s), but I get it, since we used fountain pens and ink when I was younger and spilling fountain-pen ink was very common. I also remember typewriters, as computers only became mainstream (in my view) in the late 1980s / early 1990s. I suspect that the truncation story will soon draw blank stares from a new generation of students.
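The ink-spill story can be mimicked in a few lines. This is a hypothetical simulation, not something from the thread: it assumes heights are normal with mean 170 cm and SD 10 cm and that the sheet lists them in ascending order, so losing the bottom half of the sheet removes every value above the median.

```stata
clear
set seed 2022
set obs 1000
* Hypothetical heights in cm (assumed normal, mean 170, sd 10)
generate height = rnormal(170, 10)
summarize height

* Sheet ordered from shortest to tallest
sort height
* The spill destroys the bottom half of the sheet: the 500 tallest are gone
keep in 1/500

* Only values below the cutoff survive, so the mean is biased downward
summarize height
```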
Steve Bereznai

Join Date: Jun 2022
Posts: 16

05 Jun 2022, 13:58

Originally posted by Nick Cox

Hello
It works, but only if I type in the observations manually.
I already have a merge file with 163,718 observations.
How can I use my merge file?

I tried this, but it did not work:
input str5 mergeid byte wave
use "dataset9.dta", clear
end

Thank you
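The `input ... end` lines in the answer only type in the toy example. With real data they are replaced by loading the file, and anything placed between `input` and `end` is read as data rather than run as a command. A minimal sketch, assuming `dataset9.dta` is in the current working directory and contains variables named `mergeid` and `wave`:

```stata
* Load the existing merged file instead of typing data in with -input-
* (assumes dataset9.dta contains variables named mergeid and wave)
use "dataset9.dta", clear

* Per mergeid: number of observations in waves 1-2 (before the event)
bysort mergeid : egen cond1 = total(wave == 1 | wave == 2)
* Per mergeid: number of observations in waves 4-5 (after the event)
by mergeid : egen cond2 = total(wave == 4 | wave == 5)

* Keep identifiers seen at least once before and at least once after
generate wanted = cond1 & cond2
keep if wanted
```

If the helper variables are no longer needed afterwards, `drop cond1 cond2 wanted` removes them.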
