Transformation of Data

Note: It should be emphasized that transformation of data, if needed, must take place right at the beginning of the statistical analysis.
The validity of analysis of variance depends on certain important assumptions: normality of errors and random effects, independence of errors, homoscedasticity of errors, and additivity of effects. The analysis is likely to lead to faulty conclusions when some of these assumptions are violated. A very common violation concerns the assumption of constant error variance. One alternative in such cases is a weighted analysis of variance, wherein each observation is weighted by the inverse of its variance. This requires an estimate of the variance of each observation, which may not always be feasible. Quite often, the data are instead subjected to a scale transformation such that the constant variance assumption holds on the transformed scale. Some of these transformations can also correct for departures from normality, because unequal variance is often related to the distribution of the variable. The major aims of transforming data are to bring the data closer to a normal distribution, to reduce the relationship between mean and variance, to reduce the influence of outliers, to improve linearity in regression, to reduce interaction effects, and to reduce skewness and kurtosis. Methods are available for identifying the transformation needed for a particular data set, but one may also resort to certain standard transformations depending on the nature of the data. The transformations most commonly used in the analysis of experimental data are the arcsine, logarithmic and square root transformations. They can be carried out using the following options.

Arcsine Transformation: The arcsine transformation is appropriate for data on proportions, i.e., data obtained from counts and expressed as decimal fractions or percentages. Such percentages follow the binomial distribution, and the arcsine transformation brings their distribution closer to normal. Since the role of the arcsine transformation is not properly understood, there is a tendency to apply it to any percentage. However, only percentage data derived from count data, such as % barren tillers (the ratio of the number of non-bearing tillers to the total number of tillers), should be transformed, and not percentage data such as % protein or % carbohydrates, which are not derived from counts.
In the case of proportions derived from frequency data, the observed proportion p can be changed to a new form $\theta = \sin^{-1}(\sqrt{p})$.
This type of transformation is known as the angular or arcsine transformation. However, when nearly all values in the data lie between 0.3 and 0.7, there is no need for such a transformation. It may be noted that the angular transformation is not applicable to proportion or percentage data that are not derived from counts. For example, percentage of marks, percentage of profit, percentage of protein in grains, oil content in seeds, etc., cannot be subjected to the angular transformation. The angular transformation does not work well when the data contain values of 0 or 1 for p. In such cases the transformation is improved by replacing 0 with 1/(4n) and 1 with 1 - 1/(4n) before taking angular values, where n is the number of observations on which p is based in each group.
ASIN gives the arcsine of a number. The arcsine is the angle whose sine is the given number, which must lie between -1 and 1. The returned angle is given in radians in the range -π/2 to π/2. To express the arcsine in degrees, multiply the result by 180/π. To apply the angular transformation, go to the cell where the transformed value is required, write =ASIN(SQRT(cell))*180/PI(), where cell is the reference of the cell holding the proportion, and press ENTER. Then copy the formula for all observations.
Example: ASIN(0.5) equals 0.5236 (π/6 radians) and ASIN(0.5)*180/PI() equals 30 (degrees).
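For readers working outside a spreadsheet, the angular transformation and the replacement rule for proportions of 0 and 1 can be sketched in Python as follows. This is only an illustrative sketch: the function name angular_transform and the tiller counts are hypothetical and are not part of the original material.

```python
import math

def angular_transform(count, n):
    """Angular (arcsine) value, in degrees, of the proportion count/n."""
    p = count / n
    # Replace the extreme proportions before taking angular values:
    # 0 becomes 1/(4n) and 1 becomes 1 - 1/(4n).
    if p == 0:
        p = 1 / (4 * n)
    elif p == 1:
        p = 1 - 1 / (4 * n)
    return math.degrees(math.asin(math.sqrt(p)))

# Hypothetical data: number of barren tillers out of 20 tillers per plot.
barren = [0, 3, 5, 10, 20]
angles = [round(angular_transform(c, 20), 2) for c in barren]
print(angles)
```

A proportion of 0.5 maps to 45 degrees, and the adjusted values for 0 and 1 are placed symmetrically around it.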

Logarithmic Transformation: The logarithmic transformation is suitable for data where the variance is proportional to the square of the mean, i.e., the coefficient of variation (S.D./mean) is constant, or where effects are multiplicative. These conditions are generally found in data that are whole numbers and cover a wide range of values, as is usually the case with growth measurements. The transformation squeezes the bigger values and stretches the smaller ones. A simple plot of group standard deviations against group means will show linearity in such cases. A good example is data from an experiment involving various types of insecticides: for the effective insecticides, insect counts on the treated experimental units may be small, while for the ineffective ones the counts may range from 100 to several thousand. When zeros are present in the data, it is advisable to add 1 to each observation before making the transformation. The log transformation is particularly effective in normalizing positively skewed distributions. It is also used to achieve additivity of effects in certain cases.
LN gives the natural logarithm of a positive number. Natural logarithms are based on the constant e (approximately 2.718). Go to the cell where the transformed value is required, write =LN(cell), where cell is the reference of the cell to be transformed, and press ENTER. Then copy the formula for all observations.
Example: LN(86) equals 4.45, LN(2.7182818) equals 1, LN(EXP(3)) equals 3 and EXP(LN(4)) equals 4. Further, EXP returns e raised to the power of a given number, LOG returns the logarithm of a number to a specified base and LOG10 returns the base-10 logarithm of a number.
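The rule of adding 1 to every observation when zeros are present can be sketched in Python as below. The function name log_transform and the insect counts are hypothetical, and base-10 logarithms are an assumption made for this illustration; natural logarithms would serve equally well.

```python
import math

def log_transform(values):
    """Base-10 log of each observation; 1 is added to all values if any is zero."""
    shift = 1 if any(v == 0 for v in values) else 0
    return [math.log10(v + shift) for v in values]

# Hypothetical insect counts covering a wide range and including a zero.
counts = [0, 9, 99, 999]
print([round(x, 4) for x in log_transform(counts)])  # → [0.0, 1.0, 2.0, 3.0]
```

Counts that span three orders of magnitude end up equally spaced on the log scale, which is the squeezing and stretching effect described above.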

Square Root Transformation: This transformation is appropriate for data sets where the variance is proportional to the mean, as discernible from a graph of group variances against group means. Such a linear relationship between mean and variance is commonly observed when the data consist of small whole numbers, for example counts of rare events (wildlings per quadrat, weeds per plot, earthworms per square metre of soil, insects caught in traps, etc.). Such data generally follow the Poisson distribution, and the square root transformation brings the Poisson distribution closer to the normal distribution. The transformation consists of taking the square root of each observation. When the observed values fall within the range of 1 to 10, and especially when zeros are present, the transformation should be $\sqrt{y + 0.5}$, where y is the original observation.
SQRT gives the square root of a non-negative number. Go to the cell where the transformed value is required, write =SQRT(cell), or =SQRT(cell + 0.5) when 0.5 is to be added to each observation, where cell is the reference of the cell to be transformed, and press ENTER. Then copy the formula for all observations. If the number is negative, SQRT returns the #NUM! error value.
Example: SQRT(16) equals 4, SQRT(-16) equals #NUM! and SQRT(ABS(-16)) equals 4.
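The effect of the square root transformation on the relation between group means and group variances can be checked numerically. The following Python sketch uses hypothetical weed counts, and the helper names sqrt_transform and mean_and_variance are illustrative assumptions rather than part of the original material.

```python
import math
import statistics

def sqrt_transform(values, offset=0.0):
    """Square root of each observation; use offset=0.5 for small counts or zeros."""
    return [math.sqrt(v + offset) for v in values]

def mean_and_variance(groups):
    """(mean, sample variance) of each group of observations."""
    return [(statistics.mean(g), statistics.variance(g)) for g in groups]

# Hypothetical weed counts per plot for three treatments.
groups = [[0, 1, 2, 1], [4, 6, 3, 7], [9, 14, 11, 18]]

# On the original scale the variance grows with the mean.
print(mean_and_variance(groups))

# On the square root scale the group variances are much closer together.
transformed = [sqrt_transform(g, offset=0.5) for g in groups]
print(mean_and_variance(transformed))
```

Comparing the two printed lists shows the spread of the group variances before and after transformation; a plot of the same pairs would give the graph of group variances against group means mentioned in the text.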

Box-Cox Transformation: 
By now we know that if the relation between the variance of the observations and the mean is known, this information can be utilized in selecting the form of the transformation.

We now elaborate on this point and show how it is possible to estimate the form of the required transformation from the data.

Box-Cox transformation is a power transformation of the original data.

Let $y_{ut}$ be the observation pertaining to the u-th plot. The power transformation then implies that we use the $y_{ut}$'s as

$y_{ut}^{*} = y_{ut}^{\lambda}$     --- eq(1)
Box and Cox (1964) have shown how the transformation parameter λ, the power in eq(1), may be estimated simultaneously with the other model parameters (overall mean and treatment effects) using the method of maximum likelihood. The procedure consists of performing, for various values of λ, a standard analysis of variance on

$y_{ut}^{(\lambda)} = \dfrac{y_{ut}^{\lambda} - 1}{\lambda \, \dot{y}^{\lambda - 1}}$ for $\lambda \neq 0$, and $y_{ut}^{(\lambda)} = \dot{y} \ln y_{ut}$ for $\lambda = 0$     --- (A)

where $\dot{y}$ is the geometric mean of the observations. The maximum likelihood estimate of λ is the value for which the error sum of squares, say $SS_e(\lambda)$, is minimum. Notice that we cannot select the value of λ by directly comparing the error sums of squares from analyses of variance on $y^{\lambda}$, because for each value of λ the error sum of squares is measured on a different scale. Equation (A) rescales the responses so that the error sums of squares are directly comparable.

Therefore, λ can be estimated in three different ways, i.e., by minimizing these error sums of squares.

This is a very general transformation and the commonly used transformations follow as particular cases. The particular cases for different values of λ are given below.


λ        Transformation
1        No transformation
1/2      Square root
0        Log
-1/2     Reciprocal square root
-1       Reciprocal


If any one of the observations is zero, the geometric mean becomes zero and expression (A), which has the geometric mean in its denominator, cannot be computed. To solve this problem, a small quantity is added to each of the observations.
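The search for the transformation parameter described above can be sketched in Python for a one-way classification. This is a minimal illustration under stated assumptions: the responses are hypothetical and strictly positive, the grid is limited to the five special cases listed above, and the helper names boxcox_scaled and error_ss are invented for this example.

```python
import math

def boxcox_scaled(y, lam, gm):
    """Rescaled power transformation of expression (A); gm is the geometric mean."""
    if lam == 0:
        return gm * math.log(y)
    return (y ** lam - 1) / (lam * gm ** (lam - 1))

def error_ss(groups, lam):
    """Error (within-treatment) sum of squares of the rescaled responses."""
    all_y = [y for g in groups for y in g]
    # Geometric mean computed through logarithms; requires positive data.
    gm = math.exp(sum(math.log(y) for y in all_y) / len(all_y))
    ss = 0.0
    for g in groups:
        z = [boxcox_scaled(y, lam, gm) for y in g]
        mean = sum(z) / len(z)
        ss += sum((v - mean) ** 2 for v in z)
    return ss

# Hypothetical positive responses for three treatments.
groups = [[2.1, 2.9, 3.4], [5.8, 9.1, 7.4], [14.0, 22.5, 31.0]]

grid = [-1, -0.5, 0, 0.5, 1]  # candidate values of the parameter
best = min(grid, key=lambda lam: error_ss(groups, lam))
print(best, error_ss(groups, best))
```

A finer grid, or a plot of the error sum of squares against the parameter, would locate the minimum more precisely; the coarse grid is used here only to keep the sketch short.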

Once the transformation has been made, the analysis is carried out on the transformed data and all conclusions are drawn on the transformed scale. However, when presenting the results, the means and their standard errors are transformed back into the original units. In doing so, certain corrections have to be made for the means. In the case of log transformed data, if the mean value is $\bar{y}$, the mean value in the original units will be antilog($\bar{y} + 1.15\,V(\bar{y})$) instead of antilog($\bar{y}$). If the square root transformation has been used, the mean in the original scale would be $(\bar{y} + V(\bar{y}))^{2}$ instead of $(\bar{y})^{2}$. Here $V(\bar{y})$ represents the variance of $\bar{y}$. No such correction is generally made in the case of the angular transformation, for which the inverse transformation is $p = (\sin\theta)^{2}$.
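Why a correction is needed when transforming means back can be seen from a small numerical check. The Python sketch below uses hypothetical values and base-10 logarithms (an assumption), and shows only the uncorrected back-transformation; the function name back_from_angle is illustrative.

```python
import math

def back_from_angle(theta_degrees):
    """Inverse of the angular transformation: p = (sin theta)^2."""
    return math.sin(math.radians(theta_degrees)) ** 2

print(round(back_from_angle(45), 4))  # → 0.5

# Hypothetical observations analysed on the log10 scale.
values = [10, 100, 1000]
mean_log = sum(math.log10(v) for v in values) / len(values)
naive = 10 ** mean_log            # plain antilog of the mean of the logs
arith = sum(values) / len(values)  # mean on the original scale
print(round(naive, 1), arith)      # → 100.0 370.0
```

The plain antilog of the mean of the logs is the geometric mean of the observations, which falls below their arithmetic mean; this gap is what the correction to the log-scale mean is meant to address.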

Note: Examples discussed are for MS-Excel.