Selecting specific rows from imported excel spreadsheet data

Akib Khan

Join Date: Jun 2016

Posts: 15
#1

Selecting specific rows from imported excel spreadsheet data

06 Jun 2016, 13:11

Dear All,

I have imported data from an excel spreadsheet which looks like this:
table 1

a b c d e

1 3 1 3 5

4 4 4 4 7

5 2 5 2 2

6 2 6 2 3

table 2

t u y q p

Note that this is only an excerpt but the pattern repeats itself in the data-set. I admit that the format looks a bit weird but this is what I get after converting the pdf to excel!

Now my query is: How do I code to keep data only from the red-shaded rows? Variable names are not a concern for this exercise. Please assume that they are var1, var 2, and so on.

Thanks in advance!

Pleasure,

Akib

Last edited by Akib Khan; 06 Jun 2016, 13:29.
Tags: None
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

06 Jun 2016, 13:25

Hello Akib,

Welcome to the Stata Forum.

Unfortunately, I cannot see any color in #1.

But I believe you may wish to take a look at - cellrange - option.

Something like: cellrange(start :end)

Best,

Marcos

Best regards,

Marcos
Comment
Akib Khan

Join Date: Jun 2016

Posts: 15
#3

06 Jun 2016, 13:34

Originally posted by Marcos Almeida View Post

Hello Akib,

Welcome to the Stata Forum.

Unfortunately, I cannot see any color in #1.

But I believe you may wish to take a look at - cellrange - option.

Something like: cellrange(start :end)

Best,

Marcos

Sorry, Marcos. Edited the post to show the colors. I am aware of the 'cellrange' option. However, given that this pattern repeats itself (and the number of rows b/w 'table1' and 'table2' isn't identical across the occurrences), that option isn't sufficient for my purpose.
Comment

Marcos Almeida

Join Date: Apr 2014
Posts: 4047

06 Jun 2016, 13:58

Ok, now I can see it.

Please check whether such a spreadsheet (plus commands in Stata) would do the trick for you:

Click image for larger version

Name: mytest.png
Views: 1
Size: 19.9 KB
ID: 1344222

Code:

. import excel "D:\Dados\Documents\Statistics\Exercises\File_stata.xlsx", sheet("Plan1") cellrange(A2:E6)

. list

     +-------------------+
     | A   B   C   D   E |
     |-------------------|
  1. | 2   3   4   6   3 |
  2. | 3   2   5   7   4 |
  3. | 4   3   4   8   5 |
  4. | 5   4   5   5   3 |
  5. | 6   5   5   4   6 |
     +-------------------+

Best regards,

Marcos

Comment

ashar ata

Join Date: Nov 2014

Posts: 29
#5

06 Jun 2016, 13:59

Akib-
Do the data look like this in stata or in excel?
If its in stata and the names a, b, c etc and t, u y, etc and the string table1, table2 do not matter i.e. you can delete them, then all you need to do is sort the databy the column that has word "table1" in it and then delete all the string data as it will sort and accumulate on top or bottom of your datasheet.

If the data look like this in excel then you can do the same thing in excel as well.

Or do you want to import each table separately into stata?

Thanks
Ashar
Comment

Akib Khan

Join Date: Jun 2016
Posts: 15

06 Jun 2016, 14:24

Dear Marcos and Ashar,

Thanks so much, first of all, for your suggestions. I feel that I should have been more explicit about my data structure. Let me post a better example:

var1	var2	var3	var4	var5
table 1
a	b	c	d	e
1	3	1	3	5
4	4	4	4	7
5	2	5	2	2
6	2	6	2	3
table 2
t	u	y	q	p
1	3	1	3	5
4	4	4	4	7
5	2	5	2	2
6	2	6	2	3
table 3
s	k	b	g	r
1	3	1	3	5
4	4	4	4	7
5	2	5	2	2
6	2	6	2	3
table 1
a	b	c	d	e
1	3	1	3	5
4	4	4	4	7
5	2	5	2	2
6	2	6	2	3
table 2
t	u	y	q	p
1	3	1	3	5
4	4	4	4	7
5	2	5	2	2
6	2	6	2	3
table 3
s	k	b	g	r
1	3	1	3	5
4	4	4	4	7
5	2	5	2	2
6	2	6	2	3

Firstly, this is what the data looks like within STATA and as before, please note that this pattern repeats itself hundreds of times within the data-set. Now, I need to keep the red-shaded observations i.e., rows that lie b/w 'table1' and 'table2' (excluding that containing alphabets a, b, c, ...). What would the code look like?

Thanks again, in advance!

Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30117

06 Jun 2016, 14:44

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str7 var1 str1(var2 var3 var4 var5)
"table 1" ""  ""  ""  ""
"a"       "b" "c" "d" "e"
"1"       "3" "1" "3" "5"
"4"       "4" "4" "4" "7"
"5"       "2" "5" "2" "2"
"6"       "2" "6" "2" "3"
"table 2" ""  ""  ""  ""
"t"       "u" "y" "q" "p"
"1"       "3" "1" "3" "5"
"4"       "4" "4" "4" "7"
"5"       "2" "5" "2" "2"
"6"       "2" "6" "2" "3"
"table 3" ""  ""  ""  ""
"s"       "k" "b" "g" "r"
"1"       "3" "1" "3" "5"
"4"       "4" "4" "4" "7"
"5"       "2" "5" "2" "2"
"6"       "2" "6" "2" "3"
"table 1" ""  ""  ""  ""
"a"       "b" "c" "d" "e"
"1"       "3" "1" "3" "5"
"4"       "4" "4" "4" "7"
"5"       "2" "5" "2" "2"
"6"       "2" "6" "2" "3"
"table 2" ""  ""  ""  ""
"t"       "u" "y" "q" "p"
"1"       "3" "1" "3" "5"
"4"       "4" "4" "4" "7"
"5"       "2" "5" "2" "2"
"6"       "2" "6" "2" "3"
"table 3" ""  ""  ""  ""
"s"       "k" "b" "g" "r"
"1"       "3" "1" "3" "5"
"4"       "4" "4" "4" "7"
"5"       "2" "5" "2" "2"
"6"       "2" "6" "2" "3"
end

gen obs_no = _n
by obs_no, sort: gen table = var1 if strpos(var1, "table")
replace table = table[_n-1] if missing(table)
keep if table == "table 1"
gen first_char = substr(var1, 1, 1)
drop if strpos("`c(alpha)'", first_char)

should do it, I think.

In this instance, I think you had good reason to present your data in tabular form because you wanted to color-code the observations you needed to keep. But, in general, going forward, please use -dataex- to post sample data. (-ssc install dataex-, -help dataex-).

By the way, your explanation contradicts your color-coding. You said in words that you wanted to keep tables 1 and 2, but you red-coded only table 1. The code above keeps only table 1. If you actually want both tables 1 and 2, then replace -keep if table == "table 1"- with -keep if inlist(table, "table 1", "table 2")-.

Last edited by Clyde Schechter; 06 Jun 2016, 14:46.

Comment

Akib Khan

Join Date: Jun 2016
Posts: 15

12 Jul 2016, 05:32

Originally posted by Clyde Schechter View Post

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input str7 var1 str1(var2 var3 var4 var5)
"table 1" "" "" "" ""
"a" "b" "c" "d" "e"
"1" "3" "1" "3" "5"
"4" "4" "4" "4" "7"
"5" "2" "5" "2" "2"
"6" "2" "6" "2" "3"
"table 2" "" "" "" ""
"t" "u" "y" "q" "p"
"1" "3" "1" "3" "5"
"4" "4" "4" "4" "7"
"5" "2" "5" "2" "2"
"6" "2" "6" "2" "3"
"table 3" "" "" "" ""
"s" "k" "b" "g" "r"
"1" "3" "1" "3" "5"
"4" "4" "4" "4" "7"
"5" "2" "5" "2" "2"
"6" "2" "6" "2" "3"
"table 1" "" "" "" ""
"a" "b" "c" "d" "e"
"1" "3" "1" "3" "5"
"4" "4" "4" "4" "7"
"5" "2" "5" "2" "2"
"6" "2" "6" "2" "3"
"table 2" "" "" "" ""
"t" "u" "y" "q" "p"
"1" "3" "1" "3" "5"
"4" "4" "4" "4" "7"
"5" "2" "5" "2" "2"
"6" "2" "6" "2" "3"
"table 3" "" "" "" ""
"s" "k" "b" "g" "r"
"1" "3" "1" "3" "5"
"4" "4" "4" "4" "7"
"5" "2" "5" "2" "2"
"6" "2" "6" "2" "3"
end

gen obs_no = _n
by obs_no, sort: gen table = var1 if strpos(var1, "table")
replace table = table[_n-1] if missing(table)
keep if table == "table 1"
gen first_char = substr(var1, 1, 1)
drop if strpos("`c(alpha)'", first_char)

Thanks so much! It worked beautifully!

Announcement