  • beginner question: graph mean age as a function of age

    Hello. I usually use Matlab for my own research but am using Stata for a student project, and I'm finding it harder than I thought to do pretty basic stuff. I'd really appreciate it if someone could answer this admittedly very elementary question.

    I'm using a data set (from the Current Population Survey) with about 140,000 observations. I'd like to create a graph with age on the x-axis, and the average earnings of that age group on the y-axis.

    There are a couple of steps I'd like to take after that, like doing different graphs for different education levels, but my guess is that such extensions won't be too hard once I have the basic scheme right.

    In a way, what I'd really like to do is create a much smaller data set that just contains the different ages in the first column, and then the mean earnings for each age in the second column. Actually, I've even done this, using the command
    "mean totalearn, over(age)"
    But that just gives a table that Stata prints in plain text. I don't know how to turn it into a graph, or how to export it properly to some other graph-making program like Excel.

    Thank you!
    James

  • #2
    Making some progress. Figured out that I could copy the table properly into Excel using the "copy table" command. This helps. But it would be nice to know how to do the graph in Stata as well.


    • #3
      Code:
      sysuse nlsw88, clear
      graph bar (mean) wage, over(age) ytitle("Average hourly wage, USD") note("NLSW88 data")


      • #4
        Thank you for the reply. This is quite good. Is there a similar command for line graphs? For some reason replacing "bar" or "graph bar" in your code with "line" or "twoway" doesn't seem to work.

        Anyway, I'm finding it not as easy to extend this as I had hoped. Two problems below, one with an ad hoc solution, and one without a solution:

        First, I can't append an "if" statement to what I'm doing, either using the table method or the one you give. For example,
        graph bar (mean) totalearn, over(age) if male==1
        gives the error message
        option if not allowed
        This is actually not so bad, as I've already started to work around the problem, i.e. by creating a separate variable e.g. called "MALEEARNINGS", using code like this:
        gen MALEEARNINGS = 0
        replace MALEEARNINGS = totalearn if MALE==1
        replace MALEEARNINGS = . if MALE==0
        Then I can do a graph or a table of MALEEARNINGS over age.

        However, what I have no idea how to figure out is how to use weights. The CPS includes weights to make the sampling more representative, and it would be great to make use of those. But again,
        graph bar (mean) totalearn, over(age) using suppweight
        gives the error message
        option using not allowed
        Am I missing something simple here, or is it actually quite difficult to use weights in this type of analysis?

        Thank you!
        James



        Last edited by James Green-Armytage; 21 Nov 2014, 00:01.


        • #5
          James: We prefer full real names here, please, i.e. including family name. This is explicit in the FAQ Advice.

          To continue Sergiy's example

          Code:
          sysuse nlsw88, clear
          egen mean = mean(wage), by(age)
          line mean age, sort ytitle("Average hourly wage, USD") note("NLSW88 data")

          We all struggle with new software. I struggle with MATLAB on the odd occasions that I use it. But your errors arise mostly because you are just guessing at syntax; that won't work most of the time.

          For example, your problem with if is that if qualifiers (not if statements) must come before the comma that introduces the options (with an exception not pertinent here). See the schematic syntax diagram at the start of almost any help file.
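
          As a minimal sketch of that placement, here is the bar graph from #3 restricted to married women. The variable married comes from the nlsw88 example data, not from the CPS file; with the CPS data the same pattern would presumably be "graph bar (mean) totalearn if male == 1, over(age)", assuming male is the variable name.

          ```stata
          sysuse nlsw88, clear
          * The if qualifier sits before the comma; options such as over() follow it.
          graph bar (mean) wage if married == 1, over(age) ytitle("Average hourly wage if married, USD") note("NLSW88 data")
          ```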

          Weights can be applied to many graphs, but the weights must be (functions of) variables in the dataset, and certainly not in external files, which is what using implies.
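
          A sketch of the weight syntax, which goes in square brackets before the comma. The nlsw88 data have no sampling weight, so hours is used here purely as a stand-in to show the form. The commented line is a guess at the CPS version and assumes that suppweight is a variable in the dataset.

          ```stata
          sysuse nlsw88, clear
          * hours is only an illustrative weight; it is not a sampling weight.
          graph bar (mean) wage [aweight = hours], over(age) ytitle("Weighted average hourly wage, USD") note("NLSW88 data")
          * Assumed CPS equivalent, with suppweight as a variable in the dataset:
          * graph bar (mean) totalearn [pweight = suppweight], over(age)
          ```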

          Your syntax creating a new variable could be condensed to

          Code:
          gen MALEEARNINGS = totalearn if MALE == 1

          Note that this not only works but gives a valid example of the if qualifier in use. Here is another example:

          Code:
          sysuse nlsw88, clear
          egen mean_married = mean(wage) if married == 1, by(age)
          line mean_married age, sort ytitle("Average hourly wage if married, USD") note("NLSW88 data")
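
          The smaller dataset described in #1, with one row per age and the mean for that age alongside it, could be sketched with collapse. This is one possible route, not the only one; the variable name mean_wage and the file name are made up for illustration.

          ```stata
          sysuse nlsw88, clear
          * Keep a copy of the full data so that it can be restored afterwards.
          preserve
          * Reduce the data to one observation per age, holding the mean wage.
          collapse (mean) mean_wage = wage, by(age)
          list, clean
          line mean_wage age, sort ytitle("Average hourly wage, USD") note("NLSW88 data")
          * Optional: write the small dataset to a spreadsheet (illustrative file name).
          export excel using "mean_wage_by_age.xlsx", firstrow(variables) replace
          restore
          ```

          Because collapse replaces the data in memory, preserve and restore bracket the block so that the original observations come back at the end.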


          • #6
            Thank you for the information. I'll absorb that for a while. I have learned, I think, a decent amount for one night.

            I'm happy to change my user name to my real name, which is James Green-Armytage. Is this possible? Or should I delete this account and start a new one?


            • #7
              James: changing your registered name is easy -- just hit "Contact Us" on the right-hand side of the blue bar at the bottom of the page, and make your request. It'll get done pretty fast. (More information in the Forum FAQ -- hit the black bar at the top of the page.) Thanks.
