Author(s): Ram B Jain
Objective: Log-transformations are commonly used to normalize chemical data. However, log-transformations do not always normalize the data. Thus, the objective of this study was to recursively use Tukey’s exploratory techniques to erect fences towards the data extremes until normality or near normality was achieved for the data lying within these fences.
Design: Data for 27 variables from the National Health and Nutrition Examination Survey (NHANES) for the period 2003–2004 were used to conduct this study. The variables included serum folate, serum transferrin receptor, urinary perchlorate, and the serum polychlorinated biphenyls (PCBs) PCB-44, PCB-28, PCB-87, and PCB-52. Tukey’s exploratory techniques were applied recursively to erect fences towards the data extremes until the data lying within these fences were normal or nearly normal. Robust techniques were then used to estimate statistical parameters for the reduced data lying within the fences. The statistical properties of the reduced data were evaluated and compared with those of the original log-transformed data.
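The recursive fencing step can be sketched as follows. This is a minimal illustration, not the study's actual procedure: the fence multiplier k = 1.5 (Tukey's conventional inner fence), the moment-based near-normality check with its tolerance, and all function names are assumptions introduced here, and NHANES survey weights are ignored.

```python
import math
import statistics


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def skew_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (both 0 for a normal)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        return 0.0, 0.0
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def is_near_normal(values, tol=0.5):
    """Assumed stopping rule: |skewness| and |excess kurtosis| within tol."""
    skew, kurt = skew_and_excess_kurtosis(values)
    return abs(skew) <= tol and abs(kurt) <= tol


def recursive_trim(values, k=1.5, max_rounds=20):
    """Re-erect fences on the retained data until it looks near normal
    or the fences stop removing observations."""
    kept = sorted(values)
    trimmed_low, trimmed_high = [], []
    for _ in range(max_rounds):
        if is_near_normal(kept):
            break
        lo, hi = tukey_fences(kept, k)
        inside = [v for v in kept if lo <= v <= hi]
        if len(inside) == len(kept):
            break  # fences no longer remove anything
        trimmed_low += [v for v in kept if v < lo]
        trimmed_high += [v for v in kept if v > hi]
        kept = inside
    return kept, trimmed_low, trimmed_high


# Made-up concentrations (not NHANES data), log-transformed before fencing.
raw = [1.0, 1.2, 1.5, 1.7, 2.0, 2.2, 2.5, 2.8, 3.1, 3.5, 250.0]
logged = [math.log(x) for x in raw]
kept, low, high = recursive_trim(logged)
```

Returning the trimmed lower-tail and upper-tail observations separately keeps them available for later inspection instead of discarding them.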
Setting: Cross-sectional data from the National Health and Nutrition Examination Survey (NHANES) for the period 2003–2004 for 27 variables.
Subjects: NHANES 2003–2004 participants, numbering 1790 to 8363 depending on the variable of interest.
Results: The use of non-normal data for statistical analysis can lead to under- or overestimation of the measures of central tendency (means and geometric means), depending on the relative number and magnitude of the observations that are identified as potential outliers and trimmed from the lower and upper tails of the original distributions to achieve normality. The standard deviations are always overestimated, and the widths of the confidence intervals around the means are overestimated as well. Additional insight into the demographic characteristics of the participants whose values were trimmed from the extreme tails can be very valuable.
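The direction of these effects can be illustrated with a small made-up example; it does not use NHANES data. The sketch assumes ordinary (non-robust) estimators on the log scale, a normal-approximation 95% confidence interval (z = 1.96), and no survey weights. The `summarize` helper is hypothetical.

```python
import math
import statistics


def summarize(log_values, z=1.96):
    """Mean, geometric mean, SD and CI width for log-transformed data."""
    n = len(log_values)
    mean = statistics.fmean(log_values)
    sd = statistics.stdev(log_values)
    return {
        "n": n,
        "mean": mean,
        "geometric_mean": math.exp(mean),
        "sd": sd,
        "ci_width": 2 * z * sd / math.sqrt(n),
    }


# Made-up concentrations with a single extreme upper-tail value.
raw = [1.0, 1.2, 1.5, 1.7, 2.0, 2.2, 2.5, 2.8, 3.1, 3.5, 250.0]
full = [math.log(x) for x in raw]
reduced = full[:-1]  # same data with the extreme value trimmed

before = summarize(full)
after = summarize(reduced)
for key in ("mean", "geometric_mean", "sd", "ci_width"):
    print(f"{key}: untrimmed={before[key]:.3f} trimmed={after[key]:.3f}")
```

Here the only trimmed value lies in the upper tail, so the untrimmed mean and geometric mean are too high. A trimmed set dominated by lower-tail values would shift them the other way.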
Conclusion: To obtain correct estimates of descriptive statistics, it is worthwhile to temporarily trim a small percentage of the data (probably < 5%) to achieve normality or near normality. Evaluating the trimmed data can provide insight into the characteristics of the persons whose concentrations of the chemicals of interest are too low or too high for a given variable.