Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • New variables using longitudinal data

    May you assist with syntax for longitudinal data.

    1. Within each record "personid" I would like to compute "total episodes of homes in each year", where each record is a unique episode = "tot_episode_YEAR"

    2. I also would like to compute the total number of unique home types for each individual in each year: 2015;2016 etc = "tot_home_type" and values appear in each row of data. This is based on "home_type"

    See below the final output and variables:
    personid date_episode home_type tot_episode_YEAR tot_home_type
    1 4-Nov-15 rough 2 1
    1 20-Nov-15 rough 2 1
    1 23-Feb-16 transit 3 2
    1 23-May-16 episodic 3 2
    1 6-Nov-16 transit 3 2
    1 10-Mar-17 private 4 4
    1 6-Jun-17 rough 4 4
    1 9-Aug-17 transit 4 4
    1 15-Dec-17 private 4 4
    2 6-Jun-15 transit 2 2
    2 8-May-15 private 2 2
    2 6-Nov-16 rough 1 1

  • #2
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte personid float date_episode str9 home_type
    1 20396 "rough "   
    1 20412 "rough "   
    1 20507 "transit "
    1 20597 "episodic "
    1 20764 "transit "
    1 20888 "private "
    1 20976 "rough "   
    1 21040 "transit "
    1 21168 "private "
    2 20245 "transit "
    2 20216 "private "
    2 20764 "rough "   
    end
    format %td date_episode
    
    gen year = yofd(date_episode)
    
    by personid year, sort: gen tot_episode_year = _N
    
    by personid year home_type, sort: gen home_type_flag = (_n == 1)
    by personid year: egen tot_home_type = total(home_type_flag)
    Note: for tot_home_type this code produces results that differ from your calculation in #1. For personid 1 in 2017, you show tot_home_type = 4. But looking at the home type variable, there are only three distinct home types: private (x2), rough, and transit. My code counts this as 3, which I believe is what you actually want.

    Based on your questions here and in an earlier post today, I think you would greatly benefit from reading the PDF manual sections on -by- and -egen-. These are key elements to the solution of many common data management problems. More generally, if you have not done so previously, read the Getting Started [GS] and User's Guide [U] sections of the PDF manual. They will give you an overview of the most core Stata commands. It's a fairly long read and you won't remember all the details--but you will come away from it being able to resolve most basic data management and analysis questions either completely on your own, or with a quick read of the help file for a command or two to refresh your memory on the specifics of the syntax. The time you invest in doing this will be amply repaid.

    In the future, when showing data examples, please use the -dataex- command to do so, as I have here. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

    When asking for help with code, always show example data. When showing example data, always use -dataex-.

    Comment


    • #3
      Thank you so much for the detailed, accurate and very useful response.
      1. First, yes I made a mistake on my calculations since I did this manually with the few examples.
      2. I will definitely read the manual. Can I download this from STATA's help command/ website where can I get this resource?
      3. I will also use the dataex as recommended. I have just installed it now.

      Lastly, I realized one omission in my question. How can I modify the syntax to exclude records where home_type=0 i.e. home type is "not stated" in my calculation of "tot_home_type" ;

      Thanks once again.

      Comment


      • #4
        Re 2. To get to the manual, launch Stata. Click on Help in the bar at the top of the page, and select PDF Documentation from the menu that drops down. (In version 16 it's at the top of the menu, in earlier versions it was at or near the bottom.) Your PDF reader will launch and open the documentation. You can then navigate through it the way you would any other PDF to get to the appropriate sections.

        Re Lastly:

        Code:
        by personid year home_type, sort: gen home_type_flag = (_n == 1)
        by personid year: egen tot_home_type = total(cond(home_type != 0, home_type_flag, .))

        Comment


        • #5
          New post following from the above:
          This is longitudinal data based on one individual and episodes of homelessness over time
          I would like some help to to replace "episode90" variable with a new variable "episodeNEW"
          "homeless" variable is based on any recorded episode from "ethos_1 rough, ethos_2 temp....
          "episode90" is a variable created based on recorded homeless episode, but with a difference of 90 days or more from previous episode, i.e. episode90="episode_length>=90 & homeless==1"
          However if an episode is recorded first time i.e. "homeless==1" FOR THAT particular "year" even if episode_length<90 days, please TAG this as episodeNEW shown.
          Input str9 episode_length float date_housing float year str9 episode90 str9 episodeNEW str9 homeless str9 ethos_1rough str9 ethos_2emerg str9 ethos_3temp str9 ethos_6institution

          Episode_length Date_housing Year episode90 episodeNEW homeless ethos_1rough ethos_2emerg ethos_3temp ethos_6institution
          . 13-Jan-14 2 0 0 0 0 0 0 0
          11 24-Jan-14 2 0 0 0 0 0 0 0
          108 12-May-14 2 0 0 0 0 0 0 0
          36 17-Jun-14 2 0 0 0 0 0 0 0
          220 23-Jan-15 2 0 0 0 0 0 0 0
          1 24-Jan-15 2 0 0 0 0 0 0 0
          10 3-Feb-15 2 0 0 0 0 0 0 0
          6 9-Feb-15 2 0 0 0 0 0 0 0
          20 1-Mar-15 2 0 0 0 0 0 0 0
          1 2-Mar-15 2 0 0 0 0 0 0 0
          8 10-Mar-15 2 0 0 0 0 0 0 0
          2 12-Mar-15 2 0 0 0 0 0 0 0
          34 15-Apr-15 2 0 0 0 0 0 0 0
          1 16-Apr-15 2 0 0 0 0 0 0 0
          1 17-Apr-15 2 0 0 0 0 0 0 0
          1 18-Apr-15 2 0 0 0 0 0 0 0
          0 18-Apr-15 2 0 0 0 0 0 0 0
          3 21-Apr-15 2 0 1 1 0 0 1 0
          4 25-Apr-15 2 0 0 0 0 0 0 0
          1 26-Apr-15 2 0 0 0 0 0 0 0
          1 30-Oct-15 2 0 0 0 0 0 0 0
          22 21-Nov-15 3 0 0 0 0 0 0 0
          5 26-Nov-15 3 0 0 0 0 0 0 0
          56 21-Jan-16 3 0 1 1 0 1 0 1
          26 16-Feb-16 3 0 0 1 1 0 0 0
          16 3-Mar-16 3 0 0 0 0 0 0 0
          3 6-Mar-16 3 0 0 1 1 0 1 0
          120 6-Jul-16 3 1 1 1 0 0 1 0
          3 6-Mar-16 3 0 0 1 0 0 1 0
          2 8-Mar-16 3 0 0 1 0 0 1 0
          0 10-Mar-16 3 0 0 1 1 0 1 0
          2 10-Mar-16 3 0 0 1 1 0 0 1



          Comment


          • #6
            The example data you show is not -dataex- output. It looks like perhaps you start to post -dataex- output, but then replaced it with output from -list-. Please post back with actual -dataex- output as example data. Refer to #2 for information on finding the -dataex- program.

            Also please clarify the following:
            1. Suppose a person has episodes on 1-Jan-16, 1-Mar-16, and 1-May-16. Each of the last two episodes is within 90 days of the preceding one. But 1-May-16 is beyond 90 days from 1-Jan-16. So is this one episode that goes beyond 90 days, so that only the 1-Jan-16 episode is considered "new", or does the 1-May-16 episode qualify as "new?"

            2. You have "year" in quotation marks. Does this refer to the variable named year in your data set? Or to the calendar year of the date_housing variable?

            Comment


            • #7
              Thanks Clyde. I think I didnt quite understand how to use dataex and hope this time I have used it correctly. The variable "year" replaced with "delay10" below which is not the year extracted from "date_housing". In response to your question above, if 1-Jan-16 is the first record or the 1st record in that year where homeless==1, then this is considered "new", any other episodes from there would be counted as "new" if they are 90 or more days from the previous record. So in the example above, only 1-Jan-16 is the only "new episode".

              Data shown below:

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input float(episode_length date_housing delay10 episode90 homeless_all ethos_1rough ethos_2emerg ethos_6institution)
                . 19736 1 . 0 0 0 0
               11 19747 1 . 0 0 0 0
              108 19855 1 . 0 0 0 0
               36 19891 1 . 0 0 0 0
              220 20111 2 . 0 0 0 0
                1 20112 2 . 0 0 0 0
               10 20122 2 . 0 0 0 0
                6 20128 2 . 0 0 0 0
               20 20148 2 . 0 0 0 0
                1 20149 2 . 0 0 0 0
                8 20157 2 . 0 0 0 0
                2 20159 2 . 0 0 0 0
               34 20193 2 . 0 0 0 0
                1 20194 2 . 0 0 0 0
                1 20195 2 . 0 0 0 0
                1 20196 2 . 0 0 0 0
                0 20196 2 . 0 0 0 0
                3 20199 2 0 1 0 0 0
                4 20203 2 . 0 0 0 0
                1 20204 2 . 0 0 0 0
                1 20205 2 . 0 0 0 0
                1 20206 2 . 0 0 0 0
                1 20207 2 . 0 0 0 0
                1 20208 2 . 0 0 0 0
                6 20214 2 . 0 0 0 0
                1 20215 2 . 0 0 0 0
               10 20225 2 . 0 0 0 0
                3 20228 2 . 0 0 0 0
                3 20231 2 . 0 0 0 0
                1 20232 2 . 0 0 0 0
                8 20240 2 0 1 1 0 0
                2 20242 2 0 1 0 1 0
                6 20248 2 . 0 0 0 0
               64 20312 2 . 0 0 0 0
               26 20338 2 0 1 1 0 0
               11 20349 2 . 0 0 0 0
               11 20360 2 0 1 0 1 0
                2 20362 2 . 0 0 0 0
                0 20375 2 0 1 0 0 0
               13 20375 2 0 1 0 0 0
                1 20376 2 0 1 0 1 0
                1 20377 2 0 1 0 1 0
                5 20382 2 0 1 0 1 0
                1 20383 2 0 1 0 1 0
               30 20413 3 0 1 0 1 0
                5 20418 3 0 1 0 1 0
               56 20474 3 0 1 1 0 0
               26 20500 3 0 1 1 0 0
               16 20516 3 . 0 0 0 0
                0 20519 3 0 1 1 0 0
                0 20519 3 0 1 1 0 0
                3 20519 3 0 1 1 0 0
                2 20521 3 0 1 1 0 0
                0 20523 3 0 1 1 1 0
                2 20523 3 0 1 1 1 0
                0 20523 3 0 1 1 1 0
                2 20525 3 0 1 0 0 1
                3 20528 3 0 1 0 0 1
                1 20529 3 0 1 1 0 0
                4 20533 3 0 1 0 0 0
                4 20537 3 0 1 1 0 0
                5 20542 3 0 1 1 0 0
                0 20543 3 0 1 0 1 0
                1 20543 3 . 0 0 0 0
               11 20554 3 0 1 0 1 0
                0 20563 3 0 1 1 1 0
                9 20563 3 0 1 1 1 0
                1 20564 3 0 1 1 0 0
                2 20566 3 0 1 0 1 0
                3 20569 3 0 1 1 1 0
                1 20570 3 0 1 1 0 0
                7 20577 3 0 1 1 0 0
                8 20585 3 0 1 1 0 0
                8 20593 3 0 1 1 0 0
               10 20603 3 0 1 0 0 0
                2 20605 3 0 1 1 0 0
                1 20606 3 0 1 1 0 0
                3 20609 3 . 0 0 0 0
                1 20610 3 0 1 1 0 0
                1 20611 3 0 1 0 0 0
                3 20614 3 0 1 0 0 1
               14 20628 3 . 0 0 0 0
                2 20630 3 0 1 1 0 0
                1 20631 3 0 1 0 0 1
                1 20632 3 . 0 0 0 0
                1 20633 3 0 1 0 0 1
                2 20635 3 0 1 1 0 0
                4 20639 3 0 1 0 0 1
                1 20640 3 0 1 0 0 1
                1 20641 3 0 1 1 0 0
                0 20642 3 0 1 0 1 0
                1 20642 3 0 1 0 1 0
               11 20653 3 0 1 0 1 0
                0 20656 3 . 0 0 0 0
                3 20656 3 . 0 0 0 0
                2 20658 3 0 1 0 0 0
                1 20659 3 . 0 0 0 0
                2 20661 3 0 1 0 0 0
                1 20662 3 0 1 0 0 1
                2 20664 3 0 1 1 0 0
              end
              format %d date_housing
              label values delay10 delay10
              label def delay10 1 "0-1 year", modify
              label def delay10 2 "1-2 years", modify
              label def delay10 3 "2-3 years", modify

              Comment


              • #8
                Code:
                gen int year = year(date_housing)
                
                by year (date_housing), sort: gen byte episode_new = ///
                    (_n == 1) | (date_housing - date_housing[_n-1] > 90)
                is, I think, what you are looking for.

                If your full data set contains more than one person, then I would imagine you will want to do this separately for each person. If the variable person_id identifies people, then the code is exactly the same except that -by year (date_housing)...- becomes -by person_id year (date_housing)...-

                Comment


                • #9
                  Dear Sir
                  Thanks for the code above. However, it didnt generate the data I wanted, I think what may be missing in the code is adding "if homeless_all==1" because we are interested in getting episode_new only for those who had a homelessness episode i.e. homeless_all==1. So I am not sure where I can add this to the code.

                  Comment


                  • #10
                    Yes, sorry, I forgot about the homeless_all == 1 condition. Actually, using -if homeless_all == 1- isn't quite the right fix here, because that will leave all the other observations with missing value for episode_new. Since this should be a 0/1 variable, we have to tack -&homeless_all == 1- onto the end of the logical expression. Do also note that an additional set of parentheses enclosing the original logical expression is required:

                    Code:
                    by year (date_housing), sort: gen byte episode_new = ///
                        ((_n == 1) | (date_housing - date_housing[_n-1] > 90)) & homeless_all == 1

                    Comment


                    • #11
                      Thanks very much

                      Comment


                      • #12
                        Dear Sir

                        I would like some assistance on how to generate 2 variables from the data below:

                        1. Number of days from "indexdate" to first admission date "dr_admdate" of a mental health diagnosis i.e. "mh_any==1"

                        2. Number of days from "indexdate" to the last date of admission i.e. last record in "dr_admdate"

                        The variables are determined for each person: unique id = "personid". and "indexdate" is fixed for each individual. The sequence of records variable is "seq_vaed"


                        Code:
                        * Example generated by -dataex-. To install: ssc install dataex
                        clear
                        input double personid long indexdate int dr_admdate float(mh_any seq_vaed)
                         2337 19689 19701 0  1
                         2337 19689 20214 0  2
                         2337 19689 20842 0  3
                         2337 19689 20930 0  4
                         2337 19689 20932 0  5
                         2337 19689 21145 0  6
                         2337 19689 21189 0  7
                         2337 19689 21215 0  8
                         3172 19501 19847 0  1
                         3172 19501 20062 1  2
                         3172 19501 20499 0  3
                         3172 19501 20937 1  4
                         3172 19501 20938 1  5
                         3172 19501 21274 1  6
                         3653 19512 19577 0  1
                         3653 19512 19919 0  2
                         3653 19512 19920 0  3
                         3653 19512 20509 0  4
                         3653 19512 20907 0  5
                         3653 19512 21128 0  6
                         3653 19512 21324 0  7
                         3653 19512 21354 0  8
                         5625 19397 19401 1  1
                         5625 19397 19582 1  2
                         5625 19397 19624 1  3
                         5625 19397 19656 1  4
                         5625 19397 20172 1  5
                         5625 19397 20935 1  6
                        10045 19481 21023 0  1
                        11758 19780 19784 1  1
                        11758 19780 19789 1  2
                        11758 19780 19793 1  3
                        11758 19780 19825 1  4
                        11758 19780 19825 1  5
                        11758 19780 19826 1  6
                        11758 19780 19828 1  7
                        11758 19780 19952 1  8
                        11758 19780 20003 1  9
                        11758 19780 20137 1 10
                        11758 19780 20164 1 11
                        11758 19780 20164 1 12
                        11758 19780 20166 1 13
                        11758 19780 20171 1 14
                        11758 19780 20185 1 15
                        11758 19780 20189 1 16
                        11758 19780 20226 1 17
                        11758 19780 20453 1 18
                        11758 19780 20468 1 19
                        11758 19780 20776 1 20
                        14193 20020 21170 0  1
                        17206 19451 20682 0  1
                        17206 19451 21080 0  2
                        17353 19885 20220 0  1
                        19252 20074 21267 1  1
                        19949 19926 20004 1  1
                        19949 19926 20065 0  2
                        19949 19926 20394 0  3
                        19949 19926 20879 0  4
                        19949 19926 21047 0  5
                        22038 19813 20821 1  1
                        23662 19431 19846 0  1
                        23662 19431 20570 1  2
                        23662 19431 21034 0  3
                        23662 19431 21096 0  4
                        23662 19431 21167 0  5
                        24787 19820 19904 1  1
                        24787 19820 19942 0  2
                        24787 19820 19963 0  3
                        24787 19820 19981 0  4
                        24787 19820 20277 0  5
                        24787 19820 21046 0  6
                        24787 19820 21070 0  7
                        24787 19820 21072 0  8
                        24787 19820 21105 0  9
                        26920 19702 21186 1  1
                        27533 20030 20294 0  1
                        27533 20030 21251 0  2
                        27533 20030 21357 0  3
                        28539 19875 20199 0  1
                        28539 19875 20584 1  2
                        28539 19875 21186 0  3
                        28548 19967 19984 1  1
                        28548 19967 20136 1  2
                        28548 19967 20139 1  3
                        28548 19967 20265 1  4
                        28548 19967 20364 1  5
                        28548 19967 20552 0  6
                        28548 19967 20556 1  7
                        28548 19967 20587 1  8
                        28548 19967 20588 1  9
                        28548 19967 20963 1 10
                        28548 19967 20994 0 11
                        28548 19967 21003 1 12
                        28548 19967 21180 0 13
                        28548 19967 21207 1 14
                        28548 19967 21227 0 15
                        28818 19362 19766 0  1
                        28818 19362 20829 0  2
                        28818 19362 21235 0  3
                        28818 19362 21238 0  4
                        end
                        format %d indexdate
                        format %td dr_admdate

                        Comment


                        • #13
                          Code:
                          by personid (seq_vaed), sort: egen int first_mh_adm_date = ///
                              min(cond(mh_any, dr_admdate, .))
                          gen wanted1 = first_mh_adm_date - indexdate
                          by personid (seq_vaed): gen wanted2 = dr_admdate[_N] - indexdate
                          Note: There are some personid's for whom there are no records with mh_any == 1. For these, wanted1 is created as a missing value.

                          Comment


                          • #14
                            Thanks so much.

                            Comment


                            • #15
                              Is there a STATA manual to learn about some of this syntax for longitudinal data e.g.
                              "dr_admdate[_N]"
                              "min(cond(mh_any, dr_admdate, .))"

                              Comment

                              Working...
                              X