Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Selecting specific rows from imported excel spreadsheet data

    Dear All,

    I have imported data from an excel spreadsheet which looks like this:
    table 1
    a b c d e
    1 3 1 3 5
    4 4 4 4 7
    5 2 5 2 2
    6 2 6 2 3
    table 2
    t u y q p
    Note that this is only an excerpt but the pattern repeats itself in the data-set. I admit that the format looks a bit weird but this is what I get after converting the pdf to excel!

    Now my query is: How do I code to keep data only from the red-shaded rows? Variable names are not a concern for this exercise. Please assume that they are var1, var 2, and so on.

    Thanks in advance!

    Pleasure,

    Akib
    Last edited by Akib Khan; 06 Jun 2016, 13:29.

  • #2
    Hello Akib,

    Welcome to the Stata Forum.

    Unfortunately, I cannot see any color in #1.

    But I believe you may wish to take a look at - cellrange - option.

    Something like: cellrange(start :end)

    Best,

    Marcos
    Best regards,

    Marcos

    Comment


    • #3
      Originally posted by Marcos Almeida View Post
      Hello Akib,

      Welcome to the Stata Forum.

      Unfortunately, I cannot see any color in #1.

      But I believe you may wish to take a look at - cellrange - option.

      Something like: cellrange(start :end)

      Best,

      Marcos

      Sorry, Marcos. Edited the post to show the colors. I am aware of the 'cellrange' option. However, given that this pattern repeats itself (and the number of rows b/w 'table1' and 'table2' isn't identical across the occurrences), that option isn't sufficient for my purpose.

      Comment


      • #4
        Ok, now I can see it.

        Please check whether such a spreadsheet (plus commands in Stata) would do the trick for you:

        Click image for larger version

Name:	mytest.png
Views:	1
Size:	19.9 KB
ID:	1344222

        Code:
        . import excel "D:\Dados\Documents\Statistics\Exercises\File_stata.xlsx", sheet("Plan1") cellrange(A2:E6)
        
        . list
        
             +-------------------+
             | A   B   C   D   E |
             |-------------------|
          1. | 2   3   4   6   3 |
          2. | 3   2   5   7   4 |
          3. | 4   3   4   8   5 |
          4. | 5   4   5   5   3 |
          5. | 6   5   5   4   6 |
             +-------------------+
        Best regards,

        Marcos

        Comment


        • #5
          Akib-
          Do the data look like this in stata or in excel?
          If its in stata and the names a, b, c etc and t, u y, etc and the string table1, table2 do not matter i.e. you can delete them, then all you need to do is sort the databy the column that has word "table1" in it and then delete all the string data as it will sort and accumulate on top or bottom of your datasheet.

          If the data look like this in excel then you can do the same thing in excel as well.

          Or do you want to import each table separately into stata?

          Thanks
          Ashar

          Comment


          • #6
            Dear Marcos and Ashar,

            Thanks so much, first of all, for your suggestions. I feel that I should have been more explicit about my data structure. Let me post a better example:
            var1 var2 var3 var4 var5
            table 1
            a b c d e
            1 3 1 3 5
            4 4 4 4 7
            5 2 5 2 2
            6 2 6 2 3
            table 2
            t u y q p
            1 3 1 3 5
            4 4 4 4 7
            5 2 5 2 2
            6 2 6 2 3
            table 3
            s k b g r
            1 3 1 3 5
            4 4 4 4 7
            5 2 5 2 2
            6 2 6 2 3
            table 1
            a b c d e
            1 3 1 3 5
            4 4 4 4 7
            5 2 5 2 2
            6 2 6 2 3
            table 2
            t u y q p
            1 3 1 3 5
            4 4 4 4 7
            5 2 5 2 2
            6 2 6 2 3
            table 3
            s k b g r
            1 3 1 3 5
            4 4 4 4 7
            5 2 5 2 2
            6 2 6 2 3
            Firstly, this is what the data looks like within STATA and as before, please note that this pattern repeats itself hundreds of times within the data-set. Now, I need to keep the red-shaded observations i.e., rows that lie b/w 'table1' and 'table2' (excluding that containing alphabets a, b, c, ...). What would the code look like?

            Thanks again, in advance!

            Comment


            • #7
              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input str7 var1 str1(var2 var3 var4 var5)
              "table 1" ""  ""  ""  ""
              "a"       "b" "c" "d" "e"
              "1"       "3" "1" "3" "5"
              "4"       "4" "4" "4" "7"
              "5"       "2" "5" "2" "2"
              "6"       "2" "6" "2" "3"
              "table 2" ""  ""  ""  ""
              "t"       "u" "y" "q" "p"
              "1"       "3" "1" "3" "5"
              "4"       "4" "4" "4" "7"
              "5"       "2" "5" "2" "2"
              "6"       "2" "6" "2" "3"
              "table 3" ""  ""  ""  ""
              "s"       "k" "b" "g" "r"
              "1"       "3" "1" "3" "5"
              "4"       "4" "4" "4" "7"
              "5"       "2" "5" "2" "2"
              "6"       "2" "6" "2" "3"
              "table 1" ""  ""  ""  ""
              "a"       "b" "c" "d" "e"
              "1"       "3" "1" "3" "5"
              "4"       "4" "4" "4" "7"
              "5"       "2" "5" "2" "2"
              "6"       "2" "6" "2" "3"
              "table 2" ""  ""  ""  ""
              "t"       "u" "y" "q" "p"
              "1"       "3" "1" "3" "5"
              "4"       "4" "4" "4" "7"
              "5"       "2" "5" "2" "2"
              "6"       "2" "6" "2" "3"
              "table 3" ""  ""  ""  ""
              "s"       "k" "b" "g" "r"
              "1"       "3" "1" "3" "5"
              "4"       "4" "4" "4" "7"
              "5"       "2" "5" "2" "2"
              "6"       "2" "6" "2" "3"
              end
              
              gen obs_no = _n
              by obs_no, sort: gen table = var1 if strpos(var1, "table")
              replace table = table[_n-1] if missing(table)
              keep if table == "table 1"
              gen first_char = substr(var1, 1, 1)
              drop if strpos("`c(alpha)'", first_char)
              should do it, I think.

              In this instance, I think you had good reason to present your data in tabular form because you wanted to color-code the observations you needed to keep. But, in general, going forward, please use -dataex- to post sample data. (-ssc install dataex-, -help dataex-).

              By the way, your explanation contradicts your color-coding. You said in words that you wanted to keep tables 1 and 2, but you red-coded only table 1. The code above keeps only table 1. If you actually want both tables 1 and 2, then replace -keep if table == "table 1"- with -keep if inlist(table, "table 1", "table 2")-.

              Last edited by Clyde Schechter; 06 Jun 2016, 14:46.

              Comment


              • #8
                Originally posted by Clyde Schechter View Post
                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input str7 var1 str1(var2 var3 var4 var5)
                "table 1" "" "" "" ""
                "a" "b" "c" "d" "e"
                "1" "3" "1" "3" "5"
                "4" "4" "4" "4" "7"
                "5" "2" "5" "2" "2"
                "6" "2" "6" "2" "3"
                "table 2" "" "" "" ""
                "t" "u" "y" "q" "p"
                "1" "3" "1" "3" "5"
                "4" "4" "4" "4" "7"
                "5" "2" "5" "2" "2"
                "6" "2" "6" "2" "3"
                "table 3" "" "" "" ""
                "s" "k" "b" "g" "r"
                "1" "3" "1" "3" "5"
                "4" "4" "4" "4" "7"
                "5" "2" "5" "2" "2"
                "6" "2" "6" "2" "3"
                "table 1" "" "" "" ""
                "a" "b" "c" "d" "e"
                "1" "3" "1" "3" "5"
                "4" "4" "4" "4" "7"
                "5" "2" "5" "2" "2"
                "6" "2" "6" "2" "3"
                "table 2" "" "" "" ""
                "t" "u" "y" "q" "p"
                "1" "3" "1" "3" "5"
                "4" "4" "4" "4" "7"
                "5" "2" "5" "2" "2"
                "6" "2" "6" "2" "3"
                "table 3" "" "" "" ""
                "s" "k" "b" "g" "r"
                "1" "3" "1" "3" "5"
                "4" "4" "4" "4" "7"
                "5" "2" "5" "2" "2"
                "6" "2" "6" "2" "3"
                end
                
                gen obs_no = _n
                by obs_no, sort: gen table = var1 if strpos(var1, "table")
                replace table = table[_n-1] if missing(table)
                keep if table == "table 1"
                gen first_char = substr(var1, 1, 1)
                drop if strpos("`c(alpha)'", first_char)
                should do it, I think.

                In this instance, I think you had good reason to present your data in tabular form because you wanted to color-code the observations you needed to keep. But, in general, going forward, please use -dataex- to post sample data. (-ssc install dataex-, -help dataex-).

                By the way, your explanation contradicts your color-coding. You said in words that you wanted to keep tables 1 and 2, but you red-coded only table 1. The code above keeps only table 1. If you actually want both tables 1 and 2, then replace -keep if table == "table 1"- with -keep if inlist(table, "table 1", "table 2")-.

                Thanks so much! It worked beautifully!

                Comment

                Working...
                X