What is a Box Cox Transformation?
A Box Cox transformation is a way to transform a non-normal dependent variable so that it more closely follows a normal distribution. Normality is an important assumption for many statistical techniques; if your data isn’t normal, applying a Box-Cox transformation may let you run a broader range of tests.
Running the Test
At the core of the Box Cox transformation is an exponent, lambda (λ), which varies from -5 to 5. Candidate values of λ across that range are considered and the optimal value for your data is selected; the “optimal value” is the one that results in the best approximation of a normal distribution curve. The transformation of Y has the form:

$$
y(\lambda) =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\
\log y, & \lambda = 0
\end{cases}
$$
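As a minimal sketch, assuming the standard one-parameter form, (y^λ - 1)/λ for λ ≠ 0 and log y for λ = 0, the transformation of a single value can be written in Python as below. The function name `box_cox` and the sample values are illustrative only and do not come from any library.

```python
import math

def box_cox(y, lam):
    """One-parameter Box-Cox transform of a single positive value (illustrative helper)."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return math.log(y)            # the lambda = 0 case is the natural log
    return (y ** lam - 1.0) / lam     # power form for every other lambda

data = [1.0, 2.0, 4.0, 8.0]           # made-up positive sample
for lam in (-1, 0, 0.5, 1):
    print(lam, [round(box_cox(v, lam), 4) for v in data])
```

Note that with λ = 1 the data are only shifted by 1, so their shape is unchanged.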
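This transformation only works for positive data. However, Box and Cox did propose a second formula that can be used for negative y-values:

$$
y(\lambda) =
\begin{cases}
\dfrac{(y + \lambda_2)^{\lambda_1} - 1}{\lambda_1}, & \lambda_1 \neq 0 \\
\log(y + \lambda_2), & \lambda_1 = 0
\end{cases}
$$

Here the shift λ2 is chosen so that y + λ2 is positive for every observation.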
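The shifted, two-parameter form can be sketched the same way. This assumes the usual statement of the second formula, in which a constant λ2 is added to y before the power transform with exponent λ1 is applied; the function name `box_cox_shifted` and the sample values are hypothetical.

```python
import math

def box_cox_shifted(y, lam1, lam2):
    """Two-parameter Box-Cox: shift y by lam2, then apply the power transform with lam1."""
    shifted = y + lam2
    if shifted <= 0:
        raise ValueError("lam2 must make y + lam2 strictly positive")
    if lam1 == 0:
        return math.log(shifted)
    return (shifted ** lam1 - 1.0) / lam1

data = [-3.0, -1.0, 0.0, 2.0]          # made-up sample containing negatives
lam2 = 4.0                             # any shift larger than -min(data) works here
print([round(box_cox_shifted(v, 0.5, lam2), 4) for v in data])
```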
The formulae are deceptively simple. Testing all possible values by hand is unnecessarily labor intensive; most software packages will include an option for a Box Cox transformation, including:
- R: use the command boxcox(object, …) from the MASS package.
- Minitab: click the Options box (for example, while fitting a regression model) and then click Box-Cox Transformations/Optimal λ.
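To show roughly what such software does, here is a small grid search over λ from -5 to 5. The source does not say which criterion defines the "best" normal approximation; this sketch assumes the commonly used profile log-likelihood, -(n/2)·log(σ̂²(λ)) + (λ - 1)·Σ log y, where σ̂² is the variance of the transformed data. The helper names and the sample are made up for illustration.

```python
import math

def box_cox(y, lam):
    """One-parameter Box-Cox transform of a single positive value."""
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam

def log_likelihood(data, lam):
    """Profile log-likelihood of lambda (assumed criterion, up to an additive constant)."""
    n = len(data)
    transformed = [box_cox(v, lam) for v in data]
    mean = sum(transformed) / n
    variance = sum((t - mean) ** 2 for t in transformed) / n
    return -0.5 * n * math.log(variance) + (lam - 1.0) * sum(math.log(v) for v in data)

def best_lambda(data):
    """Try lambda = -5.00, -4.99, ..., 5.00 and keep the value with the highest likelihood."""
    grid = [i / 100 for i in range(-500, 501)]   # integer steps keep 0.0 exact
    return max(grid, key=lambda lam: log_likelihood(data, lam))

sample = [1.2, 1.5, 1.7, 2.1, 2.4, 3.0, 3.8, 5.2, 7.9, 12.5]  # made-up right-skewed data
print("selected lambda:", best_lambda(sample))
```

A step of 0.01 is an arbitrary choice; in practice the selected λ is often rounded to a nearby interpretable value such as 0 or 0.5.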
|Common Box-Cox Transformations|
|Lambda value (λ)|Transformed data (Y’)|
|-3|Y^(-3) = 1/Y^3|
|-2|Y^(-2) = 1/Y^2|
|-1|Y^(-1) = 1/Y^1|
|-0.5|Y^(-0.5) = 1/√(Y)|
|0|log(Y)**|
|0.5|Y^(0.5) = √(Y)|
|1|Y^1 = Y|
**Note: the transformation for λ = 0 is log(Y); otherwise all data would transform to Y^0 = 1.
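The logarithm at λ = 0 is not an arbitrary patch. Assuming the standard form (Y^λ - 1)/λ, it is the limit of that expression as λ approaches zero, which a short derivation using L'Hôpital's rule makes explicit:

```latex
\lim_{\lambda \to 0} \frac{Y^{\lambda} - 1}{\lambda}
  = \lim_{\lambda \to 0} \frac{e^{\lambda \ln Y} - 1}{\lambda}
  = \lim_{\lambda \to 0} \frac{(\ln Y)\, e^{\lambda \ln Y}}{1}
  = \ln Y
```

This is why the family of transformations is continuous in λ: values of λ close to zero behave almost like a log transformation.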
The transformation doesn’t always work well, so make sure you check your data after the transformation with a normal probability plot.
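One way to put a number on the normal probability plot is to compute the plot's points and the correlation between them; a value close to 1 means the points fall near a straight line. The sketch below uses only the Python standard library. The plotting positions (i + 0.5)/n are one common convention and are an assumption here, as are the helper names and sample data.

```python
import math
from statistics import NormalDist, mean

def normal_plot_points(data):
    """Return (theoretical normal quantile, ordered value) pairs for a probability plot."""
    n = len(data)
    ordered = sorted(data)
    std_normal = NormalDist()
    quantiles = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(quantiles, ordered))

def plot_correlation(points):
    """Pearson correlation between the two coordinates of the plot points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

raw = [1.2, 1.5, 1.7, 2.1, 2.4, 3.0, 3.8, 5.2, 7.9, 12.5]   # made-up skewed sample
logged = [math.log(v) for v in raw]                          # Box-Cox with lambda = 0
print("raw:", plot_correlation(normal_plot_points(raw)))
print("log:", plot_correlation(normal_plot_points(logged)))
```

Comparing the correlation before and after the transformation gives a quick check, but it does not replace looking at the plot itself.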
Box, G. E. P. and Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society, Series B, 26, 211-252.