
  • Filling empty cells down the column

    Hello everyone,

    I have a huge dataset that I imported into Stata. I have a country variable that looks like the one below. I need a way to have Stata fill all the empty cells between United Kingdom and Australia with "United Kingdom", the cells between Australia and the next country with "Australia", and so on. Thanks in advance for the help.

    Country
    United Kingdom
    United Kingdom
    United Kingdom
    United Kingdom
    United Kingdom
    .
    .
    .
    .
    .
    .
    .
    Australia
    .
    .


  • #2
    This is an FAQ:

    http://www.stata.com/support/faqs/da...issing-values/
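
    A minimal sketch of the kind of fill-down asked about in #1, not a quotation from the FAQ. It assumes the variable is a string named country (a name taken from the column header in #1) and that empty cells are stored either as "" or as a literal "."; the small input block is made-up demonstration data.

    ```stata
    * Toy data mirroring the layout in #1 (values are illustrative only)
    clear
    input str20 country
    "United Kingdom"
    "United Kingdom"
    ""
    ""
    "Australia"
    ""
    end

    * replace works through observations in order, so each filled cell
    * is available to the next one and the value cascades downward.
    * Do not sort the data before this step: the row order carries the information.
    replace country = country[_n-1] if inlist(country, "", ".") & _n > 1

    list
    ```

    If the variable were numeric rather than string, the condition would be missing(country) instead of the inlist() test.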

    Please read and respect the FAQ Advice. There is a link at top left, or you can look at http://www.statalist.org/forums/help

    Section 3 explains various places you should look for solutions before posting, including the FAQs.

    Section 6 explains our strong preference for full real names. Many members are reluctant to support posters who do not respect that. It also explains how to change your identifier.



    • #3
      Thank you, Nick. I found the solution in the FAQ, and I have emailed the admins to change my name.



      • #4
        Dear Dr. Cox (and everyone),

        I read your reply above about handling missing values on Dec 20, 2014 (the first link in that post). Super helpful. Thanks very much!

        May I please ask a few follow-up questions about my pooled cross-sectional student cohort data?

        Each student is observed across a different total number of years (based on their enrollment and graduation time), within the same 9-year span. I constructed a first-year cumulative GPA variable based on year and GPA variables recorded by semester. As can be expected, within each block of observations for a given student, only one cell is filled. I then populated all observations for this student with the same GPA value.
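
        The populate step described above might look like the sketch below. The variable names student_id and fy_gpa are hypothetical, and it assumes fy_gpa is numeric and nonmissing in exactly one row per student.

        ```stata
        * Numeric missing values sort last, so within each student the single
        * nonmissing fy_gpa sorts to the first row and can be copied to the rest.
        bysort student_id (fy_gpa): replace fy_gpa = fy_gpa[1]

        * Sanity check: fy_gpa should now be constant within student
        bysort student_id (fy_gpa): assert fy_gpa == fy_gpa[1]
        ```

        An equivalent route under the same assumptions would be egen with max() by student_id, which creates a new variable instead of overwriting the original.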

        Is this the correct way to prepare my data? After reading your post, I feel like it is correct, but I am not very sure and I am still confused about two things.

        1) How does having repeated values of GPA (as an outcome variable) affect the calculation of the variance and standard errors in a regression analysis? After populating the column, I used egen with tag() to assign a 1 to exactly one observation per student. I then summarized GPA using all the repeated observations (a much larger n), and again with if tag==1 (so that n is the number of unique students). The means and SDs differ. I guess that if I had a perfectly balanced panel (or maybe I should call it cross-section clusters), where each student has the same number of observations, the repeated values shouldn't lead to a different mean and SD whether I use the larger n (with repeated values) or just the unique individuals. But I don't know how to think further about the real data at hand. Could you please share some thoughts?
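
        The comparison in 1) can be sketched as below, again with the hypothetical names student_id and fy_gpa. In the pooled summary each student counts once per row, so the pooled mean is a weighted mean with weights equal to the number of rows per student. With the same number of rows k for each of N students, the two means coincide, while the SDs still differ slightly because the divisor changes from N - 1 to kN - 1.

        ```stata
        * Flag exactly one row per student
        egen tag = tag(student_id)

        * All rows: students observed in more years carry more weight
        summarize fy_gpa

        * One row per student: each student counts once
        summarize fy_gpa if tag

        * Rows per student, to see how unbalanced the data are
        bysort student_id: generate n_rows = _N
        summarize n_rows if tag
        ```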

        2) I don't think I need cluster-robust standard errors (vce(cluster)), since every student is observed at only one point in time for their first-year GPA. However, because I am not clear about exactly what difference the repeated values of the outcome (GPA) make, I have not reached a conclusion. Could you please talk a bit about selecting the right variance estimator for my data situation?
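
        The two options weighed in 2) could be compared with a sketch like the one below. It is not a recommendation: x1 and x2 are hypothetical predictors, student_id and fy_gpa are hypothetical names, and tag is assumed to have been created with egen's tag() as in 1).

        ```stata
        * Option A: one row per student, so no outcome value is repeated
        * (sensible only if x1 and x2 do not vary across a student's rows)
        regress fy_gpa x1 x2 if tag

        * Option B: all rows, with standard errors that allow for
        * dependence among the rows belonging to the same student
        regress fy_gpa x1 x2, vce(cluster student_id)
        ```

        Running all rows with the default standard errors would treat the copied GPA values as independent observations, which is the concern raised in 1).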

        Thank you so very much for your help, Dr. Cox, and potential answers from anyone else.

        My best,
        Difei
