The most frequently used statistics in geology: an RG discussion

Author:
(Figure: plot of pebble masses, from arrowsmith410-598.asu.edu)
Dear friends, this post originated from a question I asked on the ResearchGate (RG) network.

I teach basic hydrogeology at the undergraduate level in the Department of Geology. My lectures take a quantitative, statistical approach, while most classes in basic geology are more qualitative. I invite opinions from the RG community: which statistical analyses are most frequently used in geology?

The question received many responses from the following RG members:

  • Prahar Iqbal from Indonesian Institute of Sciences
  • Jon Salmanton-Garcia from Instituto de Salud Carlos III
  • Aline Concha Dimas, independent researcher
  • Ketil Haarstad from Bioforsk Soil and Water Environment
  • Theodoros M. Tsapanos from Aristotle University of Thessaloniki, Dept of Geology, Greece
  • Alessandro Comunian from University of Milan, Dept of Earth Sciences
  • Evangelos Tziritis from Hellenic Agricultural Organization
  • Ahmed S. Elshall, independent researcher
Jon acknowledged that statistical analysis has been developed mostly in fields outside geology, e.g. psychology, anthropology and medicine, but he said that the need for statistics in geology is immediately apparent. One of the problems is how to rank the importance of the many geological variables in a dataset.
Aline mentioned structural geology as the branch of geology that relies most on statistics, specifically to analyse the orientations of planes (joints, fractures, faults) in a stereographic projection (or stereonet). She also talked about granulometric analysis, which uses the histogram as its main tool to characterise sediments and the history of their particles. Paleontology and micropaleontology also use a great deal of statistics, including principal components and cluster analysis.
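As a small illustration of the orientation statistics Aline refers to, here is a minimal Python sketch of a circular mean. It is not taken from the discussion: the function name `mean_direction` and the strike values are hypothetical, and it handles only azimuths in a horizontal plane, not full three-dimensional plane orientations as plotted on a stereonet.

```python
import math

def mean_direction(azimuths_deg, axial=False):
    """Return (mean azimuth in degrees, mean resultant length).

    axial=True treats the data as undirected lines (e.g. strikes), where
    10 and 190 degrees are the same orientation: angles are doubled before
    averaging and the result is halved again.
    """
    factor = 2.0 if axial else 1.0
    s = sum(math.sin(math.radians(factor * a)) for a in azimuths_deg)
    c = sum(math.cos(math.radians(factor * a)) for a in azimuths_deg)
    # Mean resultant length: 1 = perfectly aligned, 0 = no preferred direction
    r = math.hypot(s, c) / len(azimuths_deg)
    mean = math.degrees(math.atan2(s, c)) / factor
    # Wrap into [0, 360) for directions or [0, 180) for axial data
    return mean % (360.0 / factor), r

# Hypothetical joint strikes, measured from either end of the same lines
strikes = [40, 45, 50, 220, 225, 230]
mean_strike, r = mean_direction(strikes, axial=True)
print(f"mean strike = {mean_strike:.1f} deg, resultant length = {r:.2f}")
```

An ordinary arithmetic mean would be misleading here, because 350° and 10° average to 180° rather than to north; averaging unit vectors avoids that.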
Ketil proposed statistical moments as the most common statistical procedure in geology. He gave a link to his paper comparing distribution tests and interval estimators (Haarstad, K. 1996. A comparison of distribution tests and relevant point and interval estimators. Journal of Environmental Quality, 25(3), 578-583).
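To make the idea of statistical moments concrete, the following Python sketch computes the first four moments of a sample. It is only an illustration: the function name `sample_moments` and the pebble masses are made up, and the moments are the plain divide-by-n versions rather than bias-corrected estimators.

```python
def sample_moments(values):
    """Mean, variance, skewness and excess kurtosis of a sample.

    Uses the simple moment definitions (dividing by n). Raises
    ZeroDivisionError if all values are identical or the list is empty.
    """
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "mean": mean,
        "variance": m2,
        "skewness": m3 / m2 ** 1.5,            # 0 for a symmetric sample
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,  # 0 for a normal distribution
    }

# Hypothetical pebble masses in grams
masses = [12.1, 15.4, 9.8, 30.2, 14.7, 11.3, 18.9, 13.5]
for name, value in sample_moments(masses).items():
    print(f"{name}: {value:.3f}")
```

In granulometric work the same four numbers are often used to describe a grain-size distribution: its average size, sorting, asymmetry and peakedness.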

Theodoros argued that Mathematical Statistics and Data Analysis should be one of the core courses in geology departments. He noted the need to include probability theory, random variables, limit theory, distributions, survey sampling, comparison of two samples, analysis of variance, Bayesian decision theory, cluster analysis, and factor analysis. He considers all of these topics very important for students' knowledge and their future careers.

Evangelos pointed out that hydrogeochemistry and hydrogeology are branches of geology that frequently use statistical analyses, for example multivariate statistical methods (factor analysis, principal component analysis, cluster analysis) as well as regression analysis. For aquifer vulnerability problems (still within hydrogeology), he also mentioned Random Forest, a promising machine-learning method that is still rarely applied in the geosciences.
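Of the methods Evangelos lists, simple regression is the easiest to show in a few lines. The sketch below fits a straight line by ordinary least squares; the function name `linear_fit` and the conductivity and chloride values are hypothetical and serve only to show the calculation.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r_squared). Raises ZeroDivisionError if
    all x values or all y values are identical.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical groundwater samples: electrical conductivity (uS/cm)
# against chloride concentration (mg/L)
conductivity = [310, 450, 520, 700, 880, 1020]
chloride = [18, 31, 36, 55, 70, 84]
slope, intercept, r2 = linear_fit(conductivity, chloride)
print(f"chloride = {slope:.3f} * EC + {intercept:.2f}  (R^2 = {r2:.3f})")
```

The multivariate methods in the same list (factor analysis, principal component analysis, cluster analysis) extend this idea to many variables at once and are usually run with a statistics package rather than written by hand.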
Among the various branches of statistics, geostatistics is still the most frequently used in earth science. Two of the respondents, Alessandro and Prahara, also mentioned two publications:
  1. fractal-based statistics as proposed in Bailey R and Smith D (2010, doi:10.3997/1365-2397.2010001) and
  2. connectivity metrics in hydrogeology in a paper by Renard P and Allard D (2013, doi:10.1016/j.advwatres.2011.12.001).
Ahmed suggested the following references:
  1. Jef Caers (2011), Modeling Uncertainty in the Earth Sciences, Wiley
  2. Martin Trauth et al. (2010), MATLAB® Recipes for Earth Sciences, Springer

As Ahmed noted, the first book deals with modeling uncertainty in the Earth sciences and covers key issues such as spatial and temporal aspects; high complexity and dimensionality; computational power; the costs of 'engineering' the Earth; and uncertainty in the modeling and decision process.

The second book covers a wide range of applications of MATLAB in the geosciences, such as image processing in remote sensing, the generation and processing of digital elevation models, and the analysis of time series. It introduces methods of data analysis using MATLAB, including basic statistics for univariate, bivariate and multivariate datasets, jackknife and bootstrap resampling schemes, gridding and contouring, geostatistics and kriging, processing and georeferencing of satellite images, digitizing from the screen, linear and nonlinear time-series analysis, and the application of linear time-invariant and adaptive filters.