Dear users,
I have a large dataset in which each observation can belong to one, two or three groups.
Most observations belong to only one group, but a few belong simultaneously to two, or even three, distinct groups.
If an observation belongs to one group, it appears once in the dataset. If it belongs simultaneously to two groups, it appears twice, and if it belongs to three groups, it appears three times.
Now, I want to run a negative binomial regression, and I want to weight observations according to how often they appear in the dataset. That is: if an observation appears twice, each copy should count as 1/2 in the regression. If an observation appears three times, each copy should count as 1/3.
I did this to create a variable counting the number of times that each observation appears in the dataset:
Code:
. unique id_document
Number of unique values of id_document is  6888177
Number of records is  7553910

. unique id_document dom_cat
Number of unique values of id_document dom_cat is  7553910
Number of records is  7553910

. bys id_document: gen freq =_n

. tab freq

       freq |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |  6,888,177       91.19       91.19
          2 |    644,570        8.53       99.72
          3 |     21,163        0.28      100.00
------------+-----------------------------------
      Total |  7,553,910      100.00
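For reference, here is a minimal sketch of the weighting described above, on a toy dataset. The six rows and the variable names n_groups and wt are hypothetical and not taken from my data. It assumes that the total number of rows per id_document (_N) is the relevant count, whereas _n, as used above, is only the running index within id_document:

```stata
* Toy data: document 1 is in one group, document 2 in two, document 3 in three
clear
input id_document dom_cat
1 1
2 1
2 2
3 1
3 2
3 3
end

* _N is the number of rows within each id_document (the same on every copy);
* _n would instead number the copies 1, 2, 3 within each id_document
bysort id_document: generate n_groups = _N

* Inverse-frequency weight: 1, 1/2 or 1/3 per row
generate double wt = 1 / n_groups

list, sepby(id_document)
```

In this sketch wt equals 1 for documents in one group, 1/2 for documents in two groups and 1/3 for documents in three, so the weights of all copies of a document sum to 1.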
What I understood from my readings is that:
- For tabulations and descriptives I should add [pweight=freq]
- For regressions, I should add [fweight=freq]
Could you confirm that I have understood this correctly?
Thanks in advance for the help.
Regards,
Oscar