
  • Frequency Weighted Scatter Plot

    Hello all,

    Relative Stata newbie here, so sorry if this is a basic question, but I cannot seem to find a solution.

    I am working with a set of discrete variables and I would like to make a scatter plot in which the marker size is reflective of the relative frequency of the observations.

    For example, a subset of my data might look like this:

    (1,3)
    (2,2)
    (2,2)
    (3,1)
    (3,1)
    (3,1)

    So I would want the plot point at (2,2) to be twice the size of the plot point at (1,3) and two-thirds the size of the marker at (3,1).

    Thank you for any suggestions!

  • #2
    This is discussed in the help file; see
    Code:
    help scatter##remarks14
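    For the six-point example in the first post, a minimal sketch of that approach: reduce the data to one row per distinct point with a frequency count, then pass the count as a frequency weight. The variable names x and y, and the use of -contract- to build the counts, are illustrative choices made here, not taken from the help file.

```stata
* Six observations from the example in the first post
clear
input byte(x y)
1 3
2 2
2 2
3 1
3 1
3 1
end

* One row per distinct (x, y) point; contract creates _freq by default
contract x y
list

* Weighted markers: marker size increases with _freq (1, 2 and 3 here)
scatter y x [fweight = _freq]
```

    Note that -contract- replaces the data in memory, so use preserve/restore (or a copy of the data) if the original observations are still needed.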



    • #3
      When I code "scatter stage factor (fweight=wvar)" (my interpretation of the help resources), I receive an error that wvar is not found. So perhaps a more precise question is: how do I set wvar to be the observation frequency?
      Last edited by Ty Kennedy; 24 Oct 2022, 12:40.
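      In the help file, wvar is only a placeholder name: Stata looks for an existing variable of that name, which is why the command fails until one is created. A minimal sketch of creating it, using hypothetical toy data in which stage and factor stand in for the variables in the command above (the names wvar and first are illustrative):

```stata
* Hypothetical toy data; stage and factor stand in for the real variables
clear
input byte(stage factor)
0 1
0 1
0 2
1 1
1 1
1 1
end

* wvar must be created: here it holds the number of observations
* that share each (stage, factor) pair
bysort stage factor: generate wvar = _N

* Flag the first observation within each pair so each point is drawn once
by stage factor: generate byte first = (_n == 1)

* Weights go in square brackets after the variable list
scatter stage factor [fweight = wvar] if first
```

    The flag matters because a plain `if _n == 1` outside a by: prefix would select only the first observation of the whole dataset, not the first within each pair.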



      • #4
        Please read the FAQ and provide example data using -dataex-, posted within CODE blocks.



        • #5
          Very sorry if I've done this improperly.

          My code right now is simply
          Code:
          scatter stage mgmt || lfit stage mgmt
          However, because of the nature of my data, many observations fall on exactly the same points, so the scatter plot is incomprehensible even though my results are statistically significant. I therefore want to show that my observations cluster in a way that reflects the fit line, by making points with higher frequency appear bigger in the scatter plot.

          My data look like this:
          Code:
          * Example generated by -dataex-. For more info, type help dataex
          clear
          input byte(mgmt stage capacity geo)
          2 0 1 0
          2 0 3 0
          1 0 1 0
          1 0 0 0
          1 0 1 0
          1 1 1 0
          0 0 0 0
          2 0 1 0
          1 0 1 0
          1 1 2 0
          1 2 2 0
          1 0 1 0
          2 0 0 0
          2 1 1 0
          0 0 3 0
          2 0 0 1
          1 1 3 1
          1 1 1 0
          3 0 3 1
          end
          Last edited by Ty Kennedy; 24 Oct 2022, 16:37.



          • #6
            One way to do what you're asking is as follows:

            Code:
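            * tag one observation per distinct (stage, mgmt) pair
            egen tag = tag(stage mgmt)
            * count the observations in each (stage, mgmt) cell
            bys stage mgmt: egen _freq = count(mgmt)
            * one weighted marker per cell; the if applies only to scatter, so lfit uses all observations
            scatter stage mgmt [fw = _freq] if tag || lfit stage mgmt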
            But I'm not sure you'll be happy with the way your graph looks. You might also want to check out the community-contributed command binscatter, available via
            Code:
            ssc install binscatter



            • #7
              I agree with Hemanshu Kumar and go further. On the face of it you have two categorical variables; stage at least is presumably ordered and perhaps mgmt is ordered too, but either way a linear regression seems inappropriate to me.

              We would need to see the full dataset, but from your sample there is perhaps a complex association, or little association, so putting a straight line through the data seems brave but not helpful. Perhaps you're under instruction to do that, and if so, I disagree with your instructors.

              Code:
              * Example generated by -dataex-. For more info, type help dataex
              clear
              input byte(mgmt stage capacity geo)
              2 0 1 0
              2 0 3 0
              1 0 1 0
              1 0 0 0
              1 0 1 0
              1 1 1 0
              0 0 0 0
              2 0 1 0
              1 0 1 0
              1 1 2 0
              1 2 2 0
              1 0 1 0
              2 0 0 0
              2 1 1 0
              0 0 3 0
              2 0 0 1
              1 1 3 1
              1 1 1 0
              3 0 3 1
              end 
              
              tabplot stage mgmt , showval scheme(s1color) fcolor(blue*.3) lcolor(blue) yasis
              Here I used tabplot from the Stata Journal.

              [Attached image: tyler.png (tabplot graph)]
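              For readers without tabplot installed, the underlying counts can be checked with the built-in -tabulate- command. A minimal sketch, reusing the example data posted above; the cross-tabulation only shows the cell frequencies and is not a replacement for the graph:

```stata
* Example data posted above with -dataex-
clear
input byte(mgmt stage capacity geo)
2 0 1 0
2 0 3 0
1 0 1 0
1 0 0 0
1 0 1 0
1 1 1 0
0 0 0 0
2 0 1 0
1 0 1 0
1 1 2 0
1 2 2 0
1 0 1 0
2 0 0 0
2 1 1 0
0 0 3 0
2 0 0 1
1 1 3 1
1 1 1 0
3 0 3 1
end

* Two-way table of the two categorical variables: each cell count is the
* frequency that a weighted marker (or a tabplot bar) represents
tabulate stage mgmt
```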

