  • Cleaning dataset and aligning data table

    Hello, I'm new to Stata and need to figure out how to clean an Excel dataset using Stata. My original dataset looks like this:
    (1) (2) (3) (4) (5) (6) (7) (8) (9) (10)
    no ses sdi tr_sdi bg_sdi tdi tr_tdi bg_tdi adi tr_adi bg_adi
    VARIABLES gender race charlson no controls no controls no controls no controls no controls no controls no controls no controls no controls
    gender = 1, Male 0.0230***
    (0.00326)
    RECODE of rti_race (Research Triangle Institute (RTI) Race Code) = 2, Black 0.1676***
    (0.00683)
    RECODE of rti_race (Research Triangle Institute (RTI) Race Code) = 3, Hispanic 0.2041***
    (0.00850)
    RECODE of rti_race (Research Triangle Institute (RTI) Race Code) = 4, Asian 0.1061***
    (0.01413)
    RECODE of rti_race (Research Triangle Institute (RTI) Race Code) = 5, Other 0.0956***
    (0.01761)
    chronic pulmonary disease 0.0472***
    (0.00394)
    rheumatic disease 0.0461***
    (0.00812)
    Any malignacy including lymphoma and lukemia, except malignant neoplasm of skin 0.0015
    (0.00564)
    metastatic cancer/metastatic solid tumor -0.0038
    (0.01462)
    hiv/aids 0.0820
    (0.06783)
    5 quantiles of raw_sdi = 2 0.0212***
    (0.00516)
    5 quantiles of raw_sdi = 3 0.0232***
    (0.00503)
    5 quantiles of raw_sdi = 4 0.0445***
    (0.00486)
    5 quantiles of raw_sdi = 5 0.1118***
    (0.00489)
    5 quantiles of raw_sdi = 2 0.0223***
    (0.00527)
    5 quantiles of raw_sdi = 3 0.0356***
    (0.00540)
    5 quantiles of raw_sdi = 4 0.0705***
    (0.00580)
    5 quantiles of raw_sdi = 5 0.1656***
    (0.00697)
    5 quantiles of raw_sdi = 2 0.0188***
    (0.00548)
    5 quantiles of raw_sdi = 3 0.0373***
    (0.00567)
    5 quantiles of raw_sdi = 4 0.0719***
    (0.00605)
    5 quantiles of raw_sdi = 5 0.1537***
    I need to figure out how to align the data and create a yes table of controls using Stata, like this:
    (1) (2) (3) (4) (5) (6) (7) (8)
    SES Measure no ses no ses SDI TDI ADI ADI_MOD SVI SVI_MOD
    zcta zcta zcta zcta zcta zcta
    quintile quintile quintile quintile quintile quintile
    Controls
    Age cubic polynomial
    Gender
    Gender*age cubic polynomial
    Race/ethnicity
    Gender*race/ethnicity
    Reduced Charlson comorbidities
    Poverty
    SES Quintile
    SES Quintile = 2 0.0153*** 0.0148*** 0.0208*** 0.0203*** 0.0053*** 0.0140***
    SES Quintile = 3 0.0318*** 0.0261*** 0.0429*** 0.0422*** 0.0196*** 0.0311***
    SES Quintile = 4 0.0532*** 0.0375*** 0.0620*** 0.0615*** 0.0455*** 0.0534***
    SES Quintile = 5 0.0893*** 0.0658*** 0.0874*** 0.0908*** 0.0831*** 0.0902***
    Age cubic polynomial
    age 0.0093*** 0.0081***
    Gender
    Male 0.0231*** 0.0255***
    Race/ethnicity
    Black 0.1214*** 0.1223***
    Hispanic 0.1021*** 0.1127***
    Asian 0.0943*** 0.1044***
    Other 0.0795*** 0.0869***
    Reduced Charlson comorbidities
    chronic pulmonary disease 0.1104***
    rheumatic disease 0.0469***
    Any malignacy including lymphoma and lukemia, except malignant neoplasm of skin 0.0451***
    metastatic cancer/metastatic solid tumor 0.0072***
    hiv/aids 0.0961***
    Poverty Quintile
    Poverty Quintile = 2
    Poverty Quintile = 3
    Poverty Quintile = 4
    Poverty Quintile = 5
    I'm confused about how to align the data and substitute the variable names. Any help is much appreciated.
    Last edited by Haley Cha; 26 Jul 2023, 18:28. Reason: Did not figure it out

  • #2
    Hi Haley, welcome to the forum.

    This doesn't look like a dataset. It looks like regression results, with stars indicating significance and, in some cases, standard errors in parentheses. Can you say a bit more about what you're trying to do here?



    • #3
      I'm not sure if the data are regression results - I was just given them in Excel format. The job is basically to align the quintile results and clean the worksheet, substituting variable names and constructing a yes table for controls. The second table is what the final format should look like, but I'm not sure how to use Stata to rename the variables and align the data.



      • #4
        If you want code for this, please generate a data example using the dataex command. You'll probably want to give us the entire dataset. Wrap the data example in CODE tags (see the hashtag symbol in the editor) when you paste the data into a post. Right now it looks like you don't even have valid column names. That said:

        the job is to basically align the quintile results and clean the worksheet,
        This should be very straightforward to do by hand in Excel. You've basically already done this just by generating the second example. Does this need to be automated with Stata for some reason? We can help if needed, given a data example, but it just seems easier to do by hand.
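
        For reference, a data example can be produced roughly as follows. This is only a sketch: the two-row dataset typed in below is a made-up stand-in for your worksheet, and A and B are simply the names Stata assigns when a sheet is imported without a header row. dataex is included in recent versions of Stata; on older ones it can be installed with ssc install dataex.

        ```stata
        * Made-up stand-in for the imported worksheet. With the real file you
        * would instead run something like: import excel using "yourfile.xlsx", clear
        clear
        input str40 A str12 B
        "gender = 1, Male" "0.0230***"
        "" "(0.00326)"
        end

        * Print a copy-and-paste data example for the forum
        dataex
        ```

        The output of dataex is what should be pasted between the CODE tags.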

        I'm not sure how to use Stata to rename the variables
        Ditto. Why not just paste the variable names you've already written down here into Excel?
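
        If it really does have to happen in Stata, relabelling is a matter of replace on a string variable. A minimal sketch, assuming the row labels end up in a string variable (called rowlabel here, a hypothetical name) and using a few labels copied from your first table:

        ```stata
        * Hypothetical example: rowlabel stands in for the label column of the sheet
        clear
        input str60 rowlabel
        "gender = 1, Male"
        "5 quantiles of raw_sdi = 2"
        "chronic pulmonary disease"
        end

        * Exact-match replacement for a single label
        replace rowlabel = "Male" if rowlabel == "gender = 1, Male"

        * Swap the first occurrence of the stem and keep the "= 2" suffix
        replace rowlabel = subinstr(rowlabel, "5 quantiles of raw_sdi", "SES Quintile", 1)

        list, clean noobs
        ```

        Labels that match neither rule, such as the comorbidity names, are left untouched.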

        constructing a yes table for controls.
        What is a "yes table"?
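
        If it means the block of Yes/No rows that marks which controls each column includes, that is normally generated when the regressions are run rather than typed in afterwards. A rough sketch using the auto dataset that ships with Stata, assuming the community-contributed estout package is installed (ssc install estout); the models and the "Size controls" label are illustrative only and have nothing to do with your data:

        ```stata
        * Illustration only: two toy models on the built-in auto data
        sysuse auto, clear

        regress price mpg
        estimates store m1

        regress price mpg weight length
        estimates store m2

        * se puts standard errors in parentheses; indicate() hides the
        * coefficients on weight and length and prints a Yes/No row instead
        esttab m1 m2, se star(* 0.10 ** 0.05 *** 0.01) ///
            indicate("Size controls = weight length")
        ```

        Whoever produced your worksheet may have used a different command, so treat this as one possibility rather than a description of how it was made.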
