MadSci Network: Other
I'm attempting to perform a discriminant analysis on a fairly large data set (approx. 440 cases in my experimental group and 150 in my control group), and I have a couple of problems that I need some help with.

I know that multiple regression depends on variables being continuous and normally distributed. I have a number of dichotomous (yes/no) variables that I would like to include in my analysis, and I have coded them as dummy variables (0 = no, 1 = yes). The problem is that many of them are heavily skewed (80-95% of responses are "no"). One text I was reading suggested performing log or square root transformations. While this appears to correct the skew for continuous variables (I tested it on "weight", one of the continuous variables in my data set), it has no effect on my dummy variables. Do you have any suggestions as to how I might fix this problem so that these variables come closer to normality? If not, are there any guidelines for how far a variable can deviate from normality without having a major impact on the overall analysis?

I'm also wondering whether you have any suggestions as to how to deal with missing values. The analysis program I'm using (SPSS) removes every case that has a missing value on any variable. This reduces my N considerably, even though each variable has only a few missing values.
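Both problems described above can be illustrated with a small sketch. It uses made-up numbers, not figures from the data set in question: a hypothetical dummy variable with 90% "no" responses, and a hypothetical set of 20 variables that are each missing in 2% of cases, independently of one another. The log transform is taken as log(x + 1), an assumption made here because log(0) is undefined. The helper names `skewness` and `complete_case_fraction` are illustrative only.

```python
import math


def skewness(values):
    """Population skewness: third central moment divided by the cubed standard deviation."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def complete_case_fraction(missing_rate, n_variables):
    """Expected share of cases with no missing value on any variable.

    Assumes each variable is missing independently at the same rate.
    """
    return (1.0 - missing_rate) ** n_variables


# Hypothetical dummy variable: 90 "no" (0) and 10 "yes" (1) responses.
dummy = [0] * 90 + [1] * 10

# sqrt maps 0 -> 0 and 1 -> 1, so the variable is literally unchanged.
sqrt_dummy = [math.sqrt(v) for v in dummy]
# log(x + 1) maps 0 -> 0 and 1 -> log(2): still only two values, just rescaled.
log_dummy = [math.log(v + 1) for v in dummy]

skew_raw = skewness(dummy)
skew_sqrt = skewness(sqrt_dummy)
skew_log = skewness(log_dummy)
print("skewness raw / sqrt / log:", skew_raw, skew_sqrt, skew_log)

# Hypothetical: 20 variables, each missing in 2% of cases.
kept = complete_case_fraction(0.02, 20)
print("share of cases kept after listwise deletion:", kept)
```

A two-valued variable stays two-valued under any transformation, and skewness does not change when a variable is merely rescaled by a positive factor, so all three skewness figures come out the same. The second part suggests why listwise deletion can cost many cases even when each variable has only a few gaps: under the independence assumption, the share of complete cases shrinks multiplicatively with every variable added.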
Re: Normalization variables in regression analysis question