  • How would you organise this data?

    Hi!

    I'm a bit stuck, as I have little experience with Stata. I want to find out how the number of children in a household affects household income, but I can't figure out which graph or table would best illustrate this relationship. I tried a scatterplot, but it doesn't look good because number of children is a discrete variable. Ideally, I'd like a table with the number of children (1-10) against income levels (preferably also grouped into 1-10). Another alternative I've tried is a pie chart showing the percentage of households with each number of children, but this doesn't help since it doesn't take income into account.

    Thanks in advance!
    Cheers
    /Jonas

  • #2
    In the auto dataset, the variable price is continuous and rep78 is categorical. A bar graph is a common way to visualize the relationship between a continuous and a categorical variable. In your case, if few families have, say, more than 6 kids, you may want to merge those categories before graphing, because a mean computed from only a few observations is unreliable.

    Code:
    sysuse auto, clear
    set scheme s1mono
    gr bar price, over(rep78) ytitle(Price)
    [Attached image: Graph.png]
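
    As a rough sketch of the merging step, again using the auto dataset: rep78 has very few cars in categories 1 and 2, so one option is to pool them before graphing. The variable name rep78_m and the choice of which categories to pool are only illustrative assumptions; for the number of kids, the analogous step would pool everything above some cutoff.

    Code:
    sysuse auto, clear
    * check how many observations fall in each category
    tabulate rep78
    * pool the sparse categories 1 and 2; rep78_m is a hypothetical name
    recode rep78 (1 2 = 2 "1-2"), generate(rep78_m)
    * mean price over the pooled categories (mean is the default statistic)
    gr bar price, over(rep78_m) ytitle("Mean price")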



    • #3
      Jonas:
      welcome to this forum.
      Given that income is a continuous variable, you may want to use -regress-:
      Code:
      regress income i.children, vce(cluster family)
      See -fvvarlist- for -i.- notation.
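
      If it helps, a possible follow-up (a sketch, not something the regression above requires) is to turn the coefficients into predicted mean income for each number of children and plot them with confidence intervals. The variable names income, children and family are the same placeholders as above and need replacing with the names in your data.

      Code:
      regress income i.children, vce(cluster family)
      * predicted mean income at each level of children
      margins children
      * plot those means with their confidence intervals
      marginsplot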
      Last edited by Carlo Lazzaro; 15 May 2022, 10:28.
      Kind regards,
      Carlo
      (Stata 19.0)



      • #4
        Originally posted by Andrew Musau
        In the auto dataset, the variable price is continuous and rep78 is categorical. A bar graph is a common way to visualize the relationship between a continuous and a categorical variable. In your case, if few families have, say, more than 6 kids, you may want to merge those categories before graphing, because a mean computed from only a few observations is unreliable.

        Code:
        sysuse auto, clear
        set scheme s1mono
        gr bar price, over(rep78) ytitle(Price)
        Hi Andrew,

        Thanks for the response. I tried this command: -graph bar (median) tnfi (mean) numch (mean) hgcrev-, where tnfi is "total net family income" and numch is the number of children in a household. The thing is, I don't quite understand what the diagram represents, especially with price on the y-axis. How does Stata combine two different variables, and what do the bars represent?

        Thanks again
        /Jonas



        • #5
          In #1, you said that you wanted to show how income varies over the number of kids. I gave you an example of how price varies over repair record in the auto dataset. Assuming that you want to focus on median family income (which is less influenced by extreme observations in categories with small sample sizes), the code is

          Code:
          gr bar (median) tnfi, over(numch) blab(total, format("%10.0f")) b1title("Number of kids", size(medsmall)) ytitle("Median family income")

          The height of each bar is the median income for one value of the number of kids: 1 kid, 2 kids, and so on.
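
          Since #1 also asked for a table, here is a minimal sketch of two options, assuming the same variables tnfi and numch. The name inc10 is made up for illustration, and splitting income into ten equal-sized groups is just one way to get income levels between 1 and 10.

          Code:
          * number of households, mean and median income for each number of kids
          tabstat tnfi, by(numch) statistics(n mean median)
          * income deciles: 1 = lowest tenth, 10 = highest tenth
          xtile inc10 = tnfi, nquantiles(10)
          * cross-tabulate kids against income decile, with row percentages
          tabulate numch inc10, row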
          Last edited by Andrew Musau; 16 May 2022, 03:53.

