
  • generating a sample pseudo dataset

    When learning basic Stata operations, I want to create a pseudo dataset to play with. The dataset (panel data) should look something like this:
    year country growth
    1991 UK 0.1
    1991 US 0.2
    1992 UK 0.3
    1992 US 0.4
    1993 UK 0.5
    So I guess what I need to do for the three variables is:

    for the year variable, create a number sequence and repeat it;
    for the country variable, create a string macro and repeat it;
    for the growth variable, create a series of random numbers with the random number generators.

    This should be quite easy in other programming languages, but how could I manage this in Stata?

    More generally, links on how to handle this kind of task would be much appreciated. I googled, but nothing relevant came up (maybe I'm searching for the wrong keywords).
    Last edited by Olivier Ma; 29 Aug 2015, 01:14.

  • #2
    Some technique.

    Code:
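    * six observations: 3 years x 2 countries
    set obs 6
    * years 1991 to 1993, each repeated twice (blocks of 2)
    egen year = seq(), from(1991) to(1993) block(2)
    * odd-numbered observations are UK, even-numbered ones US
    gen country = cond(mod(_n, 2), "UK", "US")
    * standard normal draws shifted to have mean 1
    gen growth = 1 + rnormal()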



    • #3
      Originally posted by Nick Cox
      Some technique.

      Code:
      set obs 6
      egen year = seq(), from(1991) to(1993) block(2)
      gen country = cond(mod(_n, 2), "UK", "US")
      gen growth = 1 + rnormal()
      Exactly what I need, thanks! I'll look into the details of these functions.



      • #4
        A small follow-up question: if I have many more countries, say 50 or 100, is there a generic way to generate the country variable without referring to each and every country name?

        I tried
        Code:
        local country_names "UK US DE FR" //all the country names
        generate country = cond(mod(_n, 4), `country_names')
        but got an error
        Code:
        USUKDEFR not found
        r(111);
        I should say that I understand why I got the error (replacing the country_names macro with the actual country names gives the wrong syntax for the cond() function). I just don't know how to get it right.
        Last edited by Olivier Ma; 29 Aug 2015, 02:15.



        • #5
          Code:
          local country_names "UK US DE FR" 
          gen country = word("`country_names'", 1 + mod(_n-1, 4) )
          More generally

          Code:
          local country_names "<list>" 
          local nc : word count `country_names'
          gen country = word("`country_names'", 1 + mod(_n-1, `nc') )
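
          A minimal sketch putting the pieces of this thread together for any number of countries. The four-country list, the seed, and the 1991 to 1993 range are illustrative assumptions, not part of the original posts; encode and xtset are included only to show one way the result could be declared as a panel.

          Code:
          clear
          set seed 12345
          * illustrative list; replace with your own names
          local country_names "UK US DE FR"
          local nc : word count `country_names'
          * assumed span of years
          local first = 1991
          local last = 1993
          local N = `nc' * (`last' - `first' + 1)
          set obs `N'
          * each year repeated once per country
          egen year = seq(), from(`first') to(`last') block(`nc')
          * country names cycle within each year
          gen country = word("`country_names'", 1 + mod(_n - 1, `nc'))
          gen growth = rnormal()
          * numeric panel identifier, needed by xtset
          encode country, gen(id)
          xtset id year

          Because the number of observations is computed from the list length and the year range, adding countries to the macro is the only change needed to grow the panel.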



          • #6
            Thanks, Nick!



            • #7
              Dear all, I am a new member; I tried to post this as a new question but failed, so I am using this thread. Apologies if this is not permissible. I have been following the examples on constructing a pseudo panel in Stata, but the procedure is not yet clear to me. I have two cross-sections from the same population, sampled at two different times. I want to create a pseudo panel using age and location (9 districts). In each dataset, I have constructed a categorical variable that combines those two variables. I am not sure whether I am supposed to merge the datasets, in which case I would have to create a unique identifier on which to merge, or simply to append them. The latter does not seem to be the right format for panel data analysis, if the pseudo panel is supposed to be an approximation of panel data. Can someone help me with the steps to follow? A pointer to a handbook on pseudo panels would also be highly appreciated.
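
              One possible route, offered as a sketch rather than a definitive recipe: stack the two cross-sections with append (the individuals differ between waves, so there is no person-level key to merge on), then collapse to one row per cohort per wave. Everything below is simulated and hypothetical: the variable names cohort, y, and wave, the 500 observations per wave, and the 9 cohort codes are stand-ins for the real data. It also assumes the cohort variable is numeric and identifies the same group in both waves (for example, a birth-year band rather than age at interview).

              Code:
              * simulate wave 1 (stand-in for the first real cross-section)
              clear
              set seed 2015
              set obs 500
              gen cohort = 1 + floor(9 * runiform())
              gen y = rnormal()
              gen wave = 1
              tempfile w1
              save `w1'

              * simulate wave 2 (stand-in for the second real cross-section)
              clear
              set obs 500
              gen cohort = 1 + floor(9 * runiform())
              gen y = rnormal()
              gen wave = 2

              * stack the waves rather than merging them
              append using `w1'

              * cohort means and cell sizes: one row per cohort per wave
              collapse (mean) y (count) n = y, by(cohort wave)

              * the cohort now plays the role of the panel unit
              xtset cohort wave

              After collapse, the appended data do have a panel layout, with cohorts as units and waves as time; the cell sizes in n could be used as weights if the cohorts differ a lot in size.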



              • #8
                This is my first post here, I think. I am a beginner trying to learn Stata coding the parsimonious way! I would like to generate 12 date variables named fol_up_`i', where `i' takes the values 8, 16, 24, ..., 96 (multiples of 8). These new variables are the dates of 8-weekly follow-up counted from rand_date (the date of randomisation). Right now my data look like this (dates made numeric and formatted):

                trialID rand_date
                1 01sep2022
                2 01sep2022
                3 01nov2022
                4 01sep2022
                5 01sep2022
                6 01nov2022
                7 01sep2022
                8 01sep2022
                9 01nov2022
                10 01sep2022
                11 01sep2022
                12 01nov2022
                13 01sep2022
                14 01sep2022
                15 01nov2022
                16 01sep2022
                17 01sep2022
                18 01nov2022
                19 01sep2022
                20 01sep2022

                Can someone help me get started please? Thanks
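
                A possible starting point, as a sketch: loop over the multiples of 8 with forvalues and add the corresponding number of days to rand_date. It assumes rand_date is a numeric Stata daily date (as described above) and that `i' counts weeks since randomisation, so fol_up_8 falls 56 days after rand_date. The three-row dataset and the helper variable rand_str are hypothetical and only make the example self-contained.

                Code:
                * tiny stand-in for the real data
                clear
                input trialID str9 rand_str
                1 "01sep2022"
                2 "01sep2022"
                3 "01nov2022"
                end
                gen rand_date = date(rand_str, "DMY")
                format rand_date %td
                drop rand_str

                * one follow-up date every 8 weeks, from week 8 to week 96
                forvalues i = 8(8)96 {
                    gen fol_up_`i' = rand_date + 7 * `i'
                    format fol_up_`i' %td
                }

                The range 8(8)96 means "from 8 to 96 in steps of 8", which gives the 12 variables asked for without typing each name.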

