  • Pooled cross section data

    Hi Stata users
    I am working with pooled cross-section data (26,000 randomly sampled SMEs over the last 10 years). The SMEs were surveyed every year about the status of their investment in R&D, sales, productivity, etc. The responses look like this: 10,000 SMEs said "productivity increased", 10,000 said "productivity decreased", and 6,000 said "don't know". I want to examine whether investment in R&D is correlated with productivity or sales. What type of regression model do I need for such data, and how should I code Increased, Decreased, and Don't know for use in a regression analysis? Do I need to use the svy command and set up the sampling design before analysing such data in Stata?
    I appreciate your advice.
    Thanks.

  • #2
    This sounds like panel data if you have more than one observation on a given SME. If you don't have panel data, then it is going to be hard to interpret almost any analysis. Does R&D influence productivity, or does productivity influence R&D?

    With such categories, it is often best just to put i.productivity in your regression. Stata will then automatically create the necessary dummy variables. The alternative (coding decrease as -1, no change as 0, and increase as 1) imposes a metric where there is none and so is less desirable. If productivity were your dv, then you would use ordinal logit or ordinal probit, which assumes that no change ranks above decrease but does not impose a size on that difference.
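
    A minimal sketch of the factor-variable approach, assuming the answers are stored in a string variable called productivity_str and that rd_invest is a 0/1 indicator of R&D investment (both names are hypothetical, not taken from the attached data):
    Code:
    * Hypothetical names: productivity_str (string answers), rd_invest (0/1)
    label define prodlbl 1 "Decreased" 2 "Increased" 3 "Don't know"
    * encode reuses prodlbl, so the numeric codes follow the order defined above
    encode productivity_str, generate(productivity) label(prodlbl)
    * i.productivity enters one dummy per category, omitting the base level (Decreased)
    logit rd_invest i.productivity
    The codes 1, 2, 3 act only as category identifiers here; the model estimates a separate coefficient for each non-base category.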

    • #3
      Hi Phil, thanks for your reply; your suggestion is really helping me understand how to analyse the data.
      I am still unsure about the data structure. Is it panel, pooled cross-section, or count data? Please see the attached file, which shows the data structure as it looks. Having looked at the data set, how do you think I should restructure it if I want to analyse the performance indicators and examine the link between them and R&D or ICT usage by SMEs (which is recorded in a similar structure)?
      Yes, it's a good question whether R&D causes productivity or vice versa.
      Thanks again. I look forward to hearing from you.
      Regards

      Attached Files

      • #4
        Muhammad:
        you do not say whether it is the same sample of SMEs that is followed up for 10 years (or whether that is not necessarily so).
        I would also take a look at the literature in your research field to check whether the levels of your dependent variable are usually considered ordered or simply nominal.
        Kind regards,
        Carlo
        (Stata 19.0)

        • #5
          Thanks Carlo. The sample is usually almost the same, but not exactly the same, as described in the methodology of the data set. This suggests that a small fraction of the sample might be changing over the years. I look forward to hearing from you.
          regards

          • #6
            Muhammad:
            you do not seem to have a panel dataset, technically speaking.
            If your data were collected using survey methodology, you may want to consider:
            Code:
            svy: regress <depvar> <indepvar>
            Kind regards,
            Carlo
            (Stata 19.0)
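
            The survey design has to be declared before any svy: command will run. A sketch of that step, assuming the data contain a primary sampling unit identifier psu_id, a stratum identifier stratum_id, and a sampling weight samp_wt (all three names are hypothetical; the survey documentation would give the actual ones):
            Code:
            * Hypothetical design variables: psu_id, stratum_id, samp_wt
            svyset psu_id [pweight = samp_wt], strata(stratum_id)
            * Summarise the declared design (strata, PSUs, observations) as a check
            svydescribe
            svy: regress <depvar> <indepvar>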

            • #7
              Thanks Carlo. Isn't it pooled cross-section data? Or should I label it as count data?
              Regards

              • #8
                Muhammad:
                I'm still not clear whether you have a survey or something else.
                That said, assuming that you have a survey, I should amend my previous reply a bit, since you do not have a continuous regressand:
                Code:
                svy: logistic <depvar> <indepvar>
                or, if your regressand is ordered:

                Code:
                svy: ologit <depvar> <indepvar>
                With the svy: prefix, clustering is declared once in svyset rather than through a vce() option.
                I would, however, rule out a count data regression model.
                Kind regards,
                Carlo
                (Stata 19.0)

                • #9
                  Thanks Carlo for your reply. Regards

                  • #10
                    Hi Carlo, I need a little more clarification on how I can use ordered logit in Stata. If you look at my data file of performance indicators, attached above, I have 4 different answers (decrease, increase, stay same, don't know) for each performance variable. Say I only consider decrease, stay same, and increase and denote them as -1, 0, and +1. In that case I will have 3 different values for each variable (sales, profitability, market share, productivity). How do I use them in estimating an ologit model?
                    I appreciate your advice.
                    Thanks
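
                    One way this recoding could look, assuming the raw answers sit in a numeric variable productivity_raw coded 1 = decrease, 2 = stay same, 3 = increase, 4 = don't know, with a 0/1 regressor rd_invest and a survey-year variable year (all hypothetical names and codes):
                    Code:
                    * Assumed raw coding: 1 decrease, 2 stay same, 3 increase, 4 don't know
                    recode productivity_raw (1 = -1 "Decrease") (2 = 0 "Stay same") (3 = 1 "Increase") (4 = .), generate(prod3)
                    * ologit uses only the ordering of the outcome values, not their spacing
                    ologit prod3 i.rd_invest i.year
                    Setting "don't know" to missing drops those firms from the estimation sample, which is a modelling choice rather than a requirement. The same steps would apply to sales, profitability, and market share.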
