
  • Descriptive analysis scatterplots - panel data

    Background of question
    I am an economics student, currently writing my bachelor thesis, and quite inexperienced with Stata. I would be grateful for any help!

    The purpose of my research is to analyse the drivers of export sophistication of Malaysian exports.

    The dependent variable is the natural logarithm of the export sophistication index, more specifically the export sophistication of Malaysian exports to 171 countries.

    The independent variables are:
    • Foreign Direct Investment (FDI) proxied by the stock and flow of FDI inflow, FDIS and FDIF respectively
    • Research and Development (R&D) proxied by Gross Domestic Expenditure on R&D as a percentage of GDP and Number of researchers per thousand in the labour force, GDE and RES respectively
    Control variables are Malaysia’s GDP per capita PPP (current international $) proxying for the level of economic development (GDPc); Malaysia’s total population proxying for the country size (POPc); Malaysia’s gross enrolment ratio of the tertiary education segment proxying for Malaysia’s human capital (HCc); and the rule of law proxying for Malaysia's institutional quality (INSc).

    Important here is that the data for the independent and control variables do not vary between the countries (id), only throughout the years since the data is specific to Malaysia.


    My question
    Whilst doing the descriptive analysis, I have encountered problems plotting the dependent against the independent variables. I simply used the scatter command. My aim is to check for the regression assumptions of linearity and homoscedasticity, but unfortunately, I am not able to draw any conclusions from the graphs.
    I presume this is due to the fact that the data is the same throughout the ids…
    Please find the graphs attached.

    Please let me know if you need any clarification, I would be grateful for any advice/hint.
    Kind regards,
    Julie

  • #2
    My aim is to check for the regression assumptions of linearity and homoscedasticity
    You can look these up under the term "regression postestimation diagnostic plots". They are:

    rvfplot residual-versus-fitted plot
    avplot added-variable plot
    avplots all added-variables plots in one image
    cprplot component-plus-residual plot
    acprplot augmented component-plus-residual plot
    rvpplot residual-versus-predictor plot
    lvr2plot leverage-versus-squared-residual plot
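
As a minimal sketch of how these commands are used, the following runs a few of them after -regress- on Stata's bundled auto dataset. The dataset and variable names (price, mpg, weight) are stand-ins for illustration only, not the thesis data.

```stata
* Illustrative only: bundled auto dataset, not the export sophistication data
sysuse auto, clear
regress price mpg weight

* Residuals against fitted values: look for curvature or a funnel shape
rvfplot, yline(0) name(rvf, replace)

* Residuals against a single predictor
rvpplot mpg, yline(0) name(rvp, replace)

* Component-plus-residual plot with a lowess smooth, to judge linearity in mpg
cprplot mpg, lowess name(cpr, replace)
```

Each plot command uses the results of the most recent -regress-, so the model has to be fitted first.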
    Best regards,

    Marcos



    • #3
      Dear Marcos, thank you for your quick response!

      Unfortunately, the same pattern can be seen once I check the residuals (e.g. rvfplot residual-versus-fitted plot).



      • #4
        Perhaps accounting for the (apparent) discrete nature of the data will help. The following uses Nick Cox's -tabplot-, Ben Jann's -heatplot-, and Michael Stepner's -binscatter-. All can be downloaded from SSC with -ssc inst tabplot-, etc.

        Code:
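        clear
        set seed 12345
        set obs 10000
        // Two standard-normal variables with correlation 0.75
        matrix C = (1, .75 \ .75, 1)
        corr2data y1 y2, corr(C)
        // Map to (0,1) with the normal CDF
        gen u1 = normal(y1)
        gen u2 = normal(y2)
        // Discretise: x1 takes integers 25 to 45, x2 takes integers 1 to 5
        gen x1  = floor((45 -25 +1)*u1 + 25)
        gen x2  = floor((5 - 1+1)*u2 + 1)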
        
        //Not too meaningful
        scatter x1 x2, name(gr1,replace)
        //Other options
        heatplot x1 x2,  xdiscrete ydiscrete name(gr2,replace)
        binscatter x1 x2, discrete name(gr3,replace)
        tabplot x1 x2, yreverse name(gr4,replace)
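
The same idea can be sketched for a panel like the one in the question, where the regressor takes one value per year for every id. Everything below is simulated and hypothetical (the variable names, the 10 years, the coefficients); it uses only built-in commands and plots the mean of y at each distinct x, which is roughly what -binscatter- with the discrete option displays.

```stata
* Hypothetical simulated panel: 171 ids observed over 10 years
clear
set seed 12345
set obs 171
gen id = _n
expand 10
bysort id: gen year = 2000 + _n
* x varies only by year, so it is identical across ids within a year
gen x = 1 + 0.5*(year - 2000)
gen y = 2 + 0.3*x + rnormal()

* Raw scatter: points stack in one vertical strip per year
scatter y x, name(raw, replace)

* Mean of y at each distinct value of x
preserve
collapse (mean) ybar = y, by(x)
scatter ybar x, name(means, replace)
restore
```

With only as many distinct x values as there are years, the plot of means makes any trend easier to see than the raw vertical strips do.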



        • #5
          It worked! Thank you so much!

          I used:
          binscatter x1 x2, discrete name(gr3,replace)

          Thank you and best regards,
          Julie
