While it is essential for every researcher to obtain data that are highly accurate, complete, representative and comparable, in practice missing values, outliers and censored values are common characteristics of water quality data-sets. Random and systematic errors at various stages of a monitoring program tend to produce erroneous values, which complicate statistical analysis. For example, summary statistics, particularly the mean and standard deviation, can be distorted by a single grossly inaccurate data point. An error that goes undetected initially and is later incorporated into a decision-making tool, such as a water quality index (WQI) or a model, could subsequently lead to costly consequences for humans and the environment.
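The sensitivity of the mean and standard deviation to a single bad value can be illustrated with a short sketch. The readings below are hypothetical dissolved-oxygen values (mg/L) invented for illustration, and the gross error is assumed to be a misplaced decimal point (8.0 recorded as 80.0); the median is included only as a point of comparison.

```python
import statistics

# Hypothetical dissolved-oxygen readings (mg/L), for illustration only.
clean = [7.8, 8.1, 7.9, 8.0, 8.2]
# The same series plus one grossly inaccurate value, assumed here to be
# a misplaced decimal point (8.0 recorded as 80.0).
with_error = clean + [80.0]


def summarise(values):
    """Return the mean, sample standard deviation and median of a series."""
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "median": statistics.median(values),
    }


for label, series in (("clean", clean), ("with error", with_error)):
    s = summarise(series)
    print(f"{label:>10}: mean={s['mean']:.2f} "
          f"stdev={s['stdev']:.2f} median={s['median']:.2f}")
```

With these values the single erroneous reading raises the mean from 8.0 to 20.0 mg/L and the standard deviation from about 0.16 to about 29.4 mg/L, while the median moves only from 8.0 to 8.05 mg/L.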
Checking for erroneous and anomalous data points should be a routine, initial stage of any data analysis study. However, distinguishing between a genuine data point and an error requires experience. For example, outliers may actually be valid results that require statistical attention before a decision can be made to discard or retain them. Human judgement, based on knowledge, experience and intuition, thus continues to be important in assessing the integrity and validity of a given data-set. It is therefore essential for water resources practitioners to be knowledgeable about the identification and treatment of errors and anomalies in water quality data before undertaking an in-depth analysis.
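As one possible screening step, the sketch below flags candidate outliers with Tukey's interquartile-range fences rather than deleting them, leaving the decision to discard or retain each value to the analyst. The function name `flag_iqr_outliers`, the conventional multiplier `k=1.5` and the sample readings are illustrative assumptions, not a method prescribed by this text.

```python
import statistics


def flag_iqr_outliers(values, k=1.5):
    """Return indices of values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper: flagged values are candidates for review, not
    automatic deletion. Each one still has to be checked against field
    notes, laboratory records and knowledge of the water body.
    """
    if len(values) < 4:
        raise ValueError("need at least four values to estimate quartiles")
    # 'inclusive' treats the data as the whole population and interpolates
    # linearly between order statistics (Python 3.8+).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical readings (mg/L); the last value is a suspected recording error.
readings = [7.8, 8.1, 7.9, 8.0, 8.2, 80.0]
for i in flag_iqr_outliers(readings):
    print(f"review reading #{i}: {readings[i]} mg/L")
```

Here only the 80.0 mg/L reading (index 5) is flagged. Because water quality data are often skewed, a fence rule like this may also flag legitimate extreme values, which is why its output is treated as a list for review rather than as a deletion rule.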