  • How to standardize two different samples?


    Greetings to all. I'm new to the forum and my English is not the best, but I'll try to ask my question.
    I'm running a regression to identify the drivers that affect the comments and likes on Facebook posts. I have data from some companies with a high number of followers and others with a low number. Of course, this implies that the like rate of companies with many followers is considerably higher. Could you recommend options to standardize or normalize the data? I would like to be able to compare both samples in the same model. I have thought about including the number of followers as an independent variable or as an interaction term, but I'm not sure. I appreciate any recommendation.

  • #2
    I think this question is too general, and too dependent on content expertise, to be helpfully answered here as is. There are many ways of attempting to statistically adjust for imbalances in comparing different populations or groups, and their appropriateness depends on the specific situation and the specific type of analysis you have in mind. If you can pose a more specific example of the kind of analysis you are considering, you can probably get some help. Even then, however, you would be well advised to seek guidance from others with expertise in this content area and from whatever literature there is about it.

    For example, you say that "Of course, this implies that the like rate of companies with many followers is considerably higher." It may be true. But is it? I wouldn't say "Of course" here, as I can easily imagine it being false. Some sites may have a small but highly enthusiastic and dedicated group of followers, whereas others may be followed by large numbers of people who are just minimally interested in them. The latter might get lower like rates (though they might still have a larger raw number of likes). Even if there is a general rule that sites with fewer followers tend to have lower like rates, what does that relationship look like quantitatively? Is it a linear relationship? Or are there diminishing returns? Or perhaps there is an explosive exponential relationship? Have others studied this question? What did they find? Are their findings applicable to the particular sites you are planning to study? Do you have some data that you can explore for issues like this?



    • #3

      Thanks for your reply, Mr. Clyde. You are right in your approach. Actually, some brands can get more engagement despite their low like rate. Maybe I didn't use the right word. In this particular case, the companies with the largest number of followers obtain a higher rate of likes. I have been able to verify this by reviewing the data, and even when applying Poisson and negative binomial regression models, the number of fans is significant. Since likes are a count variable, Poisson and negative binomial regression models are usually used in the literature. Although some of these papers mention problems of overdispersion in the data (sometimes there are posts with 0 likes and others with thousands), so far I have not found a solution to deal with imbalances between different population groups. What I can say is that when I model the data with the samples separated (companies with many fans and companies with few fans), the results differ from those obtained when including all the data. In the latter case, it seems that the strategies most used by companies with many fans tend to be significant; however, when separating the samples these variables are no longer significant.

      Regarding the data: I still do not have the entire sample, but I'm worried about this problem that I have been encountering.

      I understand that my question is general and basic. However, I would appreciate any help in starting to explore alternatives.




      • #4
        So, since you are analyzing a count variable outcome and plan to use -poisson- or -nbreg-, I would consider using the variable containing the number of followers in the -exposure()- option of those regression commands. That's precisely what it's there for.
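
        To spell out what that option does, here is a short sketch of the model it implies. The symbols are only illustrative and not from your data: likes_i is the like count for post i, followers_i is the follower count of the company that made the post, and x_i stands for whatever predictors you include.

        ```latex
        % Count model with followers as the exposure variable:
        % the log of the exposure enters the linear predictor
        % with its coefficient constrained to 1.
        \begin{align}
        \log \mathrm{E}[\text{likes}_i \mid x_i]
          &= \log(\text{followers}_i) + x_i'\beta \\
        % Equivalently, the model describes likes per follower:
        \frac{\mathrm{E}[\text{likes}_i \mid x_i]}{\text{followers}_i}
          &= \exp(x_i'\beta)
        \end{align}
        ```

        So the coefficients describe effects on the like rate per follower rather than on the raw count, which is what lets large and small companies sit in the same model. Note that this assumes expected likes are proportional to followers. If that seems doubtful, one way to check would be to drop the option, include the log of followers as an ordinary predictor, and see whether its coefficient is close to 1.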



        • #5
          Thanks for your help!!
