
  • Newbie question - Understanding the bysort command, theoretically

    Hi! I am new to this forum, have read the relevant FAQs, browsed through the forum, looked at STATA documentation, and yet, I am sure this question must be answered somewhere but I don't know where - so, sorry in advance for the newbie mistake.

    Essentially, I have a data set where it looks something like this (I made the actual numerical values up, but the actual has a lot more rows). I am using Stata 11.
    Culture   Independent Variable   Dependent Variable
    1         1.5                    3
    2         2                      4
    3         1                      2
    3         5                      4
    2         9                      3
    2         8                      5
    1         4                      2
    3         3                      2
    1         2                      5
    I was hoping to do a correlation between the IV and DV, so ok, simple pwcorr IV DV, star (0.05).

    I then wanted to do a correlation only among those with the culture value == 1 (so that I can correlate the IV and DV only for those belonging to a certain culture), and here is where I get stuck. I tried multiple variations of if statements, e.g. pwcorr IV DV, star (0.05) if culture == 1 or if culture ==1, pwcorr IV DV, star (0.05). They don't seem to work, but when I try bysort (culture): pwcorr IV DV, star (0.05), it does!

    so my questions are:
    a) I find if statements pretty confusing in Stata, and have been reading a bunch of articles online to no avail. Not sure if anyone can point me to a beginner friendly explanation on whether I can even use if statements (in the classic way) because Stata seems to have very constricted options for if
    b) why is there a need to do bysort for my data set - i.e., wouldn't by just work? couldn't Stata just correlate based on my by category? Why is the sort function necessary?

    Thank you so much! Much, much appreciated for a newbie.
    Jen

  • #2

    The if qualifier -- often needed and useful as soon as you start using Stata (*) -- is quite different from the if command -- rarely needed or useful before you get into Stata programming in the strong sense. That is the main reason for any confusion.

    Otherwise, if is simpler than you fear.

    The if qualifier belongs in a command before any options are specified.


    Code:
    pwcorr IV DV if culture == 1, star (0.05) 
    is what you need. The help for pwcorr makes explicit that the allowed syntax is

    pwcorr [varlist] [if] [in] [weight] [, pwcorr_options]
    where the relevant details are that certain options are allowed -- after a comma -- and that an if qualifier is allowed and, if typed, must go before that comma.
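    To make the distinction concrete, here is a minimal sketch, assuming the auto dataset shipped with Stata; foreign, price and mpg are stand-ins for culture, IV and DV and are not from the original question. It shows the if qualifier placed before the comma, and then the if command, which is evaluated once rather than observation by observation.

```stata
* Sketch only: the auto dataset and its variables stand in for the
* poster's data (foreign ~ culture, price ~ IV, mpg ~ DV).
sysuse auto, clear

* if qualifier: after the varlist, before the comma that starts the options.
* The correlation is computed only for observations with foreign == 1.
pwcorr price mpg if foreign == 1, star(0.05)

* if command: the expression is evaluated once, not row by row.
* A bare variable name here refers to its value in the first observation.
if foreign == 1 {
    display "the first observation is a foreign car"
}
else {
    display "the first observation is a domestic car"
}
```

    In Stata, help if documents the qualifier and help ifcmd documents the command.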

    bysort here is something of a red herring. The command you mention does what you want, but typically other stuff too, so it is somewhere between indirect and inefficient for your purpose.
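    On why the sort matters at all: by requires the data to be sorted on the by-variable already, and bysort is shorthand for sorting first and then running by. A minimal sketch, again assuming the auto dataset shipped with Stata, with foreign standing in for culture:

```stata
* Sketch only: foreign, price and mpg stand in for culture, IV and DV.
sysuse auto, clear

* Deliberately put the data in an order not based on foreign.
sort price

* by alone now fails with a "not sorted" error; capture traps the error
* so that the rest of the sketch keeps running.
capture noisily by foreign: pwcorr price mpg, star(0.05)
display "return code from by without a prior sort: " _rc

* bysort sorts on foreign and then runs the command once per group.
bysort foreign: pwcorr price mpg, star(0.05)

* The equivalent two-step version.
sort foreign
by foreign: pwcorr price mpg, star(0.05)
```

    Either by form reports a correlation for every group, which is why the if qualifier is the more direct route when only one group is of interest.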

    As for "a bunch of articles online" the unfortunately empty advice is to read them only when they are better or closer to your needs and level than Stata's own documentation. A while back it struck me that the explanation of by and bysort commands (not functions) in the official documentation was a little fragmented, so I wrote

    https://www.stata-journal.com/articl...article=pr0004

    I haven't read it for a while, but I think it should hold up fairly well over 18 years.

    The strategy I recommend for finding things out in Stata is

    1. help -- here you know already which command you are using and its help does answer your question, although you may need help syntax as well.

    2. search in Stata using keywords. This finds e.g. StataCorp FAQs, material in the Stata Journal and so on.

    3. Google, or use some other search engine.

    Unfortunately many people seem to start with #3, which is a puzzling and often poor choice. Many users do understand Stata well and explain it clearly -- otherwise Statalist could hardly work -- but there is a lot of rather scrappy material out there (and even some downright awful sources).

    (*) Our own FAQ Advice explains that the spelling is Stata.
