Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How do I create a dataset of observations from a table

    Hey there SL,

    This might be pretty basic but I am struggling to do this or find an answer in the forums.

    I have information on a sample of several thousands of patients, but I have it in the form of a 9x4 table (9 values of category A as rows, 4 variables of category B as columns, and then the frequency of patient to each pairing of A:B). I needed to convert this into a dataset of observations so I can run tests on it.

    Example, I have this:

    __B1 B2 B3 B4
    A1 5 20 15 5
    A2 ....
    A3
    A4

    And I need to have this:

    ___varA varB
    ID1 1. 1
    ID2 1. 1
    ID3 1 1
    ID4 1 1
    ID5 1 1
    ID6 1 2
    ....

    Is there any way of converting the table into a list other than by hand? The only similar thing I could find was the cci command that directly does a chi2 from a inputed table, but it only works for 2x2 and I am restricted to chi2.

    Thank you!




    Last edited by Miguel Rato; 24 May 2023, 12:43.

  • #2
    The first step really depends in detail on the layout of this table. Assuming it is in a plain text file and is tab or comma separated, and that the B's appear in the very first row of the table, you can do something like this:
    Code:
    import delimited using name_of_table_file, varnames(1)
    Then you need to verify in the Data Browser that the result you have looks like a table. Also, what shows up as the name of the first variable? If it is __b1, then you need to give that a new variable name and rename the other b's. That would go like this:
    Code:
    rename __b1 a
    rename b# freq#, renumber(1)
    rename var5 freq4
    (If after the -import delmited- the name of the first variable is __, or something similar, just -rename- it to A.)

    The next step is a reshape and cleaning up the A variable:
    Code:
    reshape long freq, i(a) j(b)
    destring a, ignore("A") replace
    That should give you what you are looking for.

    The main problem is that I can't be sure from your description of the original table how it will look after -import delimited-, and if I have that wrong, the rest of the code may not get what you want. If what I have suggested here does not work, please post back and use the -dataex- command to show what you get from -import delimited-, and then we can figure out how to go from there. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    Comment


    • #3
      Dear Clyde,

      Thank you so much for your help. I'm sorry I wasn't clear in my explanation. Maybe if I show you a mock example:

      Imagine I have this:

      Click image for larger version

Name:	Screenshot 2023-05-24 at 22.27.35.png
Views:	1
Size:	97.6 KB
ID:	1714835


      I need to have, instead of that, a dataset that has 727 observations (the sum of all), that has one variable (id) that can range from a to e, and another variable V that ranges from X to Z (or a variable X that is either 0 or 1, one Y that is either 0 or 1, and one Z that is either 0 or 1). I need each subject represented as frequency on the boxes of the table to be its own observation, that has a pair of values (a:e) and (x:z).

      Sorry if I'm not making myself clear, and thank you for the help!

      Best,

      Miguel


      Comment


      • #4
        Now I see what you have and what you want.
        Code:
        * Example generated by -dataex-. For more info, type help dataex
        clear
        input str1 id float(X Y Z)
        "a" 10 59 90
        "b" 20 80 90
        "c" 10 78 10
        "d" 40 90 10
        "e" 50 80 10
        end
        
        rename (X Y Z) freq=
        reshape long freq, i(id) j(V) string
        
        expand freq
        drop freq
        
        sort id V
        In the future, please avoid using screenshots to show data examples. First, they often are not readable on the Forum (although in this case it is). More important, even when readable, while they give a visual impression of the data, they do not reveal metadata, which can be important for writing correct code, and there is no way to import them into Stata to test out code. In this case the problem is clear enough and the code required simple enough that it is possible to work with this. But usually it is not. The most effective way to show example data is with the -dataex- command. In #2, I explained how to find and use the -dataex- command--please always use that in the future when showing data. You can see the output that -dataex- produces in my code above. When it runs in Stata., it produces a faithful replica of the example data, and code can then be run and tested on that.

        Comment


        • #5
          Perfect.
          Thank you so much.

          Comment

          Working...
          X