  • Client Engagement Survey Data using Multivariate Regression

    Hi,

    So I have been assigned a project at work to figure out how different independent variables affect the dependent variable in question, namely customer satisfaction. I am trying to run a multivariate regression to accomplish this. I understand the basics of multivariate regression but need to figure out how my independent variables should be defined. The independent variables in question are assets under management, age, gender, features linked to account, and number of appointments in 2016. These variables all take on numerical values, but some are discrete and bounded. For example, features linked to account has a minimum of 0 and a maximum of 5. Also, number of appointments in 2016 goes from 0 to 12. Lastly, the customer satisfaction rating (the dependent variable) goes from 0 to 10. I have already defined gender as a dummy variable, and recognize that assets under management and age are continuous variables. I am only wondering how I should define features on account, number of appointments, and satisfaction level. Since these variables take on only specific values, how should they be defined for proper multivariate regression analysis? Thank you for any help you can provide! Let me know if you would like a screenshot of my data.

    Best,
    Connor

  • #2
    You need some basic introduction to statistical analysis. There are a million texts and probably on-line resources to do this.

    Re your question: it makes a difference whether the variable is the dependent variable (dv) or on the right-hand side (rhs).

    The general problem with satisfaction measures (e.g., scales of 1 to 10) is that 2 (poor) is not twice 1 (very poor) and 10 (excellent) is not really 10 times 1 (very poor).

    In such a case, for a dependent variable, you properly would want to use ordinal logit or ordinal probit (ologit or oprobit in Stata). This assumes 2 is above 1 but not twice 1, etc. However, given your level of understanding, a standard regression is probably easier to work with and will probably give you reasonable answers.
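
    A minimal sketch of the two options, assuming hypothetical variable names (satisfaction, aum, age, female, features, appts2016) that are not taken from the actual dataset:

    ```stata
    * Ordinal model: treats the 0-10 rating as ordered categories,
    * not as equally spaced numbers
    ologit satisfaction aum age i.female features appts2016

    * Simpler alternative: ordinary linear regression on the 0-10 score
    regress satisfaction aum age i.female features appts2016
    ```

    The coefficients from the two commands are on different scales (log odds versus rating points), so compare signs and significance rather than magnitudes.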

    You don't say exactly how your features are coded. If they're different variables indicating different features, then including each feature as a dummy variable would work. If it is number of features, including the count would work (noting you're assuming all features have equal value in influencing satisfaction). Number of appointments is a count so there is no problem including it as a regressor. Note that doing so does assume satisfaction is linear in number of appointments (e.g., that moving from 1 to 2 appointments has the same influence as moving from 10 to 11).
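
    The two codings of features might look like this. It is only a sketch: feat1 to feat5 and the other variable names are hypothetical stand-ins for whatever the data actually contain.

    ```stata
    * Option A: one 0/1 indicator per feature, so each feature gets its own effect
    regress satisfaction aum age i.female feat1 feat2 feat3 feat4 feat5 appts2016

    * Option B: a single count of features (0 to 5), which assumes every
    * feature has the same effect; rowtotal() treats missing values as 0
    egen features = rowtotal(feat1 feat2 feat3 feat4 feat5)
    regress satisfaction aum age i.female features appts2016
    ```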

    If you don't want to assume number of appointments has a linear influence, you can let Stata make dummies for each number of appointments (i.e., a dummy for 1 appointment, another for 2 appointments, etc.), assuming number of appointments doesn't go on forever. This is easy in Stata: include the variable with i. in front of it, e.g., regress y i.x
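
    Applied to this problem, a sketch with hypothetical variable names (satisfaction, aum, age, female, features, appts2016) could be:

    ```stata
    * i.appts2016 enters one dummy per observed number of appointments;
    * the lowest value (0 appointments) is the omitted base category
    regress satisfaction aum age i.female features i.appts2016

    * Joint test that all the appointment dummies are zero
    testparm i.appts2016
    ```

    With 0 to 12 appointments this adds up to 12 coefficients, so it costs degrees of freedom compared with the single linear term.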

    There is a separate issue that number of appointments may be influenced by satisfaction, what is called endogeneity. This is manageable, but not with your level of skills. You might check the results with and without including number of appointments.
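
    One way to run that check, sketched with hypothetical variable names (satisfaction, aum, age, female, features, appts2016):

    ```stata
    * Model including number of appointments
    regress satisfaction aum age i.female features appts2016
    estimates store with_appts

    * Same model without it
    regress satisfaction aum age i.female features
    estimates store without_appts

    * Side-by-side coefficients, standard errors, N and R-squared
    estimates table with_appts without_appts, b(%9.3f) se stats(N r2)
    ```

    If the other coefficients change little between the two columns, the conclusions about them do not hinge on including appointments. This comparison does not by itself resolve the endogeneity.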



    • #3
      By the way, in future, you're more likely to get a helpful answer if you follow the FAQ on asking questions - provide Stata code in code delimiters, readable Stata output (tables become unreadable in variable spacing fonts), and sample data using dataex.
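
      For instance, a data excerpt could be produced along these lines. The variable names are hypothetical, and on older Stata versions dataex may first need to be installed from SSC.

      ```stata
      * Only needed if dataex is not already available in your Stata
      * ssc install dataex

      * Print the first 20 observations of the relevant variables in a form
      * that can be pasted into a post and read back into Stata
      dataex satisfaction aum age female features appts2016 in 1/20
      ```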
