
  • Regression and Graphs

    Hello Statalist community,
    This is my Code:
    Code:
    clear all
    cd "/Users/Yeshwin1/Desktop/fall 24/Econ 4550"
    use "lfs_2013.dta"
    append using "lfs_2014.dta"
    append using "lfs_2015.dta"
    append using "lfs_2016.dta"
    append using "lfs_2017.dta"
    append using "lfs_2018.dta"
    append using "lfs_2019.dta"
    append using "lfs_2020.dta"
    append using "lfs_2021.dta"
    append using "lfs_2022.dta"
    append using "lfs_2023.dta"
    
    keep if lfsstat==1 | lfsstat==2
    keep if !missing(survyear, survmnth, sex, hrlyearn)
    
    keep if survyear==2013 & !missing(naics_18) |survyear==2014 & !missing(naics_18)| survyear==2015 & !missing(naics_18)| survyear==2016 & !missing(naics_18) | survyear==2017 & !missing(naics_21)|survyear==2018 & !missing(naics_21) | survyear==2019 & !missing(naics_21)| survyear==2020 & !missing(naics_21)| survyear==2021 & !missing(naics_21)| survyear==2022 & !missing(naics_21)| survyear==2023 & !missing(naics_21)
    
    gen date= ym(survyear, survmnth)
    format date %tm
    sort date
    
    
    drop if naics_18==1 | naics_18==2 | naics_18==3 | naics_18==10 | naics_18==13 | naics_18==14 | naics_18==15 | naics_18==16 |naics_21==1 | naics_21==4 | naics_21==12 | naics_21==16 | naics_21==17 | naics_21==18 | naics_21==19 | naics_21==5 
    
    egen wagePublic=mean(hrlyearn) if naics_18==9 |naics_18==18 |naics_21==11 | naics_21==21, by(sex date)
    * private = not in any public-sector code (a chain of != joined by | is always true)
    egen wagePrivate=mean(hrlyearn) if !(naics_18==9 | naics_18==18 | naics_21==11 | naics_21==21), by(sex date)
    gen incomePublic = wagePublic*1
    gen incomePrivate = wagePrivate*1
    separate incomePublic, by(sex)
    separate incomePrivate, by(sex)
    line incomePublic1 incomePublic2  incomePrivate1 incomePrivate2 date, legend(subtitle("Average Hourly Wage") order(1 "Public Sector Male" 2 "Public Sector Female" 3 "Private Sector Male" 4 "Private Sector Female"))
    I would appreciate your help with a couple of things:

    1) Can I add a line of division to the following graph? (I want to add a vertical line to it at the 2018m8 point)
    2) How do I add a Y axis label that says "CAD" and change X-axis labels to just the year without the m1..m12, using code?
    3) Instead of the actual data I want to generate 4 lines of best fit on the same graph, is that possible?
    [Image: WageGraph.jpg]

  • #2
    1) Can I add a line of division to the following graph? (I want to add a vertical line to it at the 2018m8 point)
    Add a -xline(`=tm2018m8')- option to the -line- command.

    2) How do I add a Y axis label that says "CAD" and change X-axis labels to just the year without the m1..m12, using code?
    Add a -ytitle("CAD")- option.
    Add the option -xlabel(, format(%tmCCYY))-.

    3) Instead of the actual data I want to generate 4 lines of best fit on the same graph, is that possible?
    Add
    Code:
    || lfit incomePublic1 date || lfit incomePublic2 date || lfit incomePrivate1 date || lfit incomePrivate2 date



    • #3
      Hi Clyde, Thank you for getting back to me. The Y-axis and X-axis labels worked.

      But
      Code:
       
       || lfit incomePublic1 date || lfit incomePublic2 date || lfit incomePrivate1 date || lfit incomePrivate2 date
      does not work, it returns
      Code:
       | is not a valid command name
      Code:
      lfit incomePublic1 date
      Returns, "too many variables specified"

      Code:
      xline(`=tm2018m8')
      Returns "tm2018m8 not found"

      How should I proceed? Thank you again



      • #4
        I think I wasn't clear about how to use the || lfit... part. It's not a separate command. It's an extension of the existing -line- command. As for the -xline()- part, I made a mistake in that. Sorry about that. Here's how it all should fit together:
        Code:
        line incomePublic1 incomePublic2  incomePrivate1 incomePrivate2 date, ///
            legend(subtitle("Average Hourly Wage") order(1 "Public Sector Male" ///
            2 "Public Sector Female" 3 "Private Sector Male" 4 "Private Sector Female")) ///
            xline(`=tm(2018m8)') ytitle("CAD") ///
            || lfit incomePublic1 date || lfit incomePublic2 date ///
            || lfit incomePrivate1 date || lfit incomePrivate2 date
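
For anyone who wants to try the pattern without the LFS files, here is a minimal sketch on synthetic data. The variable names wageA and wageB, the seed, and the generated values are all made up for illustration. The point is that tm() is a function, so `=tm(2018m8)' is evaluated to a number before -xline()- sees it, whereas `=tm2018m8' is read as a name that does not exist.

```stata
* Synthetic monthly data (hypothetical values, not the LFS data)
clear
set seed 12345
set obs 126
gen date = ym(2013, 1) + _n - 1      // 2013m1 through 2023m6
format date %tm
gen wageA = 25 + 0.05*_n + rnormal(0, 0.5)
gen wageB = 22 + 0.04*_n + rnormal(0, 0.5)

* tm(2018m8) is the month count since 1960m1, which is 703
display tm(2018m8)

* The || lfit pieces extend the same graph command; they are not run separately
line wageA wageB date, ///
    xline(`=tm(2018m8)') ytitle("CAD") xlabel(, format(%tmCCYY)) ///
    legend(order(1 "Series A" 2 "Series B" 3 "Fit A" 4 "Fit B")) ///
    || lfit wageA date || lfit wageB date
```

The legend order follows the plot order: the two -line- series come first, then each -lfit- in the order it is written.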



        • #5
          On a variety of grounds, ranging from statistical to economic, fitting straight lines directly seems less likely to be helpful here than working on log scale.

          The graphs don't show good linear patterns except as a lousy first approximation. All are convex down.

          At least, you should try that first. To avoid intercepts that make no sense, I would work with a shifted origin:

          Code:
          gen myDate = date - ym(2018. 1) 
          
          foreach v in incomePublic1 incomePublic2 incomePrivate1 incomePrivate2 { 
                poisson `v' myDate, vce(robust)
                predict `v'_Poi
          }
          You've then a choice between plotting on log scale or plotting on conventional scale.
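
As a rough sketch of those two plotting choices, the following uses synthetic data rather than the LFS series. The variable wage, the seed, the growth rate, and the graph names are assumptions made for illustration only. With the origin shifted to 2018m1, exp(_b[_cons]) is the fitted wage in January 2018, which is easier to interpret than an intercept at 1960m1.

```stata
* Synthetic wage series with roughly exponential growth (hypothetical values)
clear
set seed 2024
set obs 126
gen date = ym(2013, 1) + _n - 1
format date %tm
gen myDate = date - ym(2018, 1)      // months relative to 2018m1
gen wage = 28*exp(0.002*myDate) + rnormal(0, 0.3)

* Poisson regression fits the mean as exp(b0 + b1*myDate)
poisson wage myDate, vce(robust)
predict wage_Poi                     // default after poisson: predicted mean

* Choice 1: conventional scale, where the fitted curve bends
line wage wage_Poi date, ytitle("CAD") xlabel(, format(%tmCCYY)) ///
    name(conventional, replace)

* Choice 2: log scale, where the fitted curve is a straight line
line wage wage_Poi date, yscale(log) ytitle("CAD (log scale)") ///
    xlabel(, format(%tmCCYY)) name(logscale, replace)
```

Stata prints a note about a noncount dependent variable when the outcome is not an integer; the model is still estimated.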



          • #6
            Originally posted by Clyde Schechter
            I think I wasn't clear about how to use the || lfit... part. It's not a separate command. It's an extension of the existing -line- command. As for the -xline()- part, I made a mistake in that. Sorry about that. Here's how it all should fit together:
            Code:
            line incomePublic1 incomePublic2 incomePrivate1 incomePrivate2 date, ///
            legend(subtitle("Average Hourly Wage") order(1 "Public Sector Male" ///
            2 "Public Sector Female" 3 "Private Sector Male" 4 "Private Sector Female")) ///
             xline(`=tm(2018m8)') ytitle("CAD") ///
            || lfit incomePublic1 date || lfit incomePublic2 date ///
            || lfit incomePrivate1 date || lfit incomePrivate2 date
            Thank you so much Clyde!!! I was able to get it to work. And I am really sorry for all the questions; I am extremely new to Stata.

            This is what my Code was:
            Code:
            line incomePublic1 incomePublic2  incomePrivate1 incomePrivate2 date, ///
                title("Hourly Earnings by Sex (January, 2013- June, 2023)") ///
                legend(order(1 "Public Sector Male" 2 "Public Sector Female" ///
                3 "Private Sector Male" 4 "Private Sector Female")) ///
                ytitle(" Log Wage") xlabel(, format(%tmCCYY)) xline(`=tm(2019m1)') ///
                xtitle("Pre-Legislation                    Post-Legislation") ///
                || qfit  incomePublic1 date || qfit incomePublic2 date ///
                || qfit incomePrivate1 date || qfit incomePrivate2 date
            [Image: LogWage.jpg]



            A few follow up questions that I would appreciate greatly if you could help me with:
            1) How would I make the xline dotted?
            2) Is there any way for me to obtain the following dotted lines (including the corresponding values on the y-axis) using code?
            [Image: LogWage copy.jpeg]



            • #7
              In #5
              Code:
                
               ym(2018. 1) 
              should be
              Code:
              ym(2018, 1)



              • #8
                To get the line at 2018m8 dotted, change the -xline()- specification to -xline(`=tm(2018m8)', lpattern(dot))-

                Is there any way for me to obtain the following dotted lines (including the corresponding values on the y-axis) using code?
                -line- (and all other -twoway- graphs in Stata) supports a -yline()- option that works exactly the same way as the -xline()- option. I don't know what values of y are supposed to anchor these lines, so I'll leave that to you. But once you have those values in hand, it works exactly like -xline()-.
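
A minimal sketch of how that could look, on a made-up series. The variable wage and the choice of anchor (the series value in 2019m1) are assumptions, since the intended y values are not stated in the thread. The -add- suboption of -ylabel()- keeps the default axis labels and adds the anchor value to them.

```stata
* Synthetic, noise-free series (hypothetical values, not the LFS data)
clear
set obs 126
gen date = ym(2013, 1) + _n - 1
format date %tm
gen double wage = 25 + 0.05*_n

* Assumed anchor: the value of the series in the month of the vertical line
summarize wage if date == tm(2019m1), meanonly
local y0 : display %5.2f r(mean)

line wage date, xlabel(, format(%tmCCYY)) ytitle("CAD") ///
    xline(`=tm(2019m1)', lpattern(dot)) ///
    yline(`y0', lpattern(dot)) ///
    ylabel(`y0', add)
```

Several anchors can be listed in one go, for example -yline(28 30, lpattern(dot))- together with -ylabel(28 30, add)-.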
