A preliminary text classification of the precursory accelerating seismicity corpus: inference on some theoretical trends in earthquake predictability research from 1988 to 2018

انور ال محمد · August 14, 2019

https://link.springer.com/article/10.1007/s10950-019-09833-2

Text analytics based on supervised machine learning has shown great promise in a multitude of domains but has yet to be applied to seismology. We describe some common classifiers (Naïve Bayes, k-Nearest Neighbors, Support Vector Machines, and Random Forests) as well as the standard steps of supervised learning (training, validation of model parameter adjustments, and testing). To illustrate text classification on a seismological corpus, we use a hundred articles on the topic of precursory accelerating seismicity, published between 1988 and 2010. This corpus was labelled by Mignan [Tectonophysics, 2011] according to whether the precursor is explained by critical processes (i.e., cascade triggering) or by other processes (such as the signature of main fault loading). We investigate how the classification process can be automated to help analyze larger corpora and so better understand trends in earthquake predictability research. We find that the Naïve Bayes model performs best, in agreement with the machine learning literature for small datasets, with cross-validation accuracies showing the model’s predictive ability for both a binary classification (“critical process” or not) and a multiclass classification (“non-critical process,” “agnostic,” “critical process assumed,” “critical process demonstrated”). Prediction on a dozen articles published since 2011, however, shows weak generalization, which can be explained, in part, by the empirical variance of the small training set. This preliminary study demonstrates the potential of supervised learning to reveal textual patterns in the seismological literature. Manual labelling remains essential but is made transparent by an investigation of Naïve Bayes keyword posterior probabilities.
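The workflow the abstract describes (a Naïve Bayes text classifier, cross-validation on a small labelled corpus, and inspection of keyword posterior probabilities) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' code: the toy corpus, the two label names, the whitespace tokenizer, the Laplace smoothing value, and the leave-one-out scheme are all assumptions made for the example, and the paper's actual preprocessing and validation setup may differ.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus, invented for illustration only (not the Mignan 2011 labels).
TOY_CORPUS = [
    ("critical point power law acceleration cascade triggering", "critical"),
    ("self organized criticality cascade triggering of aftershocks", "critical"),
    ("power law time to failure critical point model", "critical"),
    ("critical cascade correlation length growth", "critical"),
    ("stress accumulation fault loading stress shadow", "non-critical"),
    ("main fault loading elastic rebound stress", "non-critical"),
    ("stress shadow loading silent slip on fault", "non-critical"),
    ("elastic rebound loading of the main fault", "non-critical"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train_nb(docs, alpha=1.0):
    """Fit a multinomial Naive Bayes model with Laplace smoothing `alpha`."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        tokens = tokenize(text)
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    n_docs = sum(class_docs.values())
    log_prior = {c: math.log(n / n_docs) for c, n in class_docs.items()}
    log_lik = {}
    for c in class_docs:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return {"log_prior": log_prior, "log_lik": log_lik, "vocab": vocab}


def predict(model, text):
    """Return the class with the highest log posterior; unseen words are skipped."""
    tokens = [w for w in tokenize(text) if w in model["vocab"]]
    scores = {
        c: prior + sum(model["log_lik"][c][w] for w in tokens)
        for c, prior in model["log_prior"].items()
    }
    return max(scores, key=scores.get)


def loo_accuracy(docs):
    """Leave-one-out cross-validation accuracy over the labelled documents."""
    hits = 0
    for i, (text, label) in enumerate(docs):
        model = train_nb(docs[:i] + docs[i + 1:])
        hits += predict(model, text) == label
    return hits / len(docs)


def keyword_posterior(model, word):
    """P(class | word) for a single in-vocabulary keyword, via Bayes' rule."""
    joint = {
        c: math.exp(model["log_prior"][c] + model["log_lik"][c][word])
        for c in model["log_prior"]
    }
    norm = sum(joint.values())
    return {c: p / norm for c, p in joint.items()}


if __name__ == "__main__":
    model = train_nb(TOY_CORPUS)
    print("LOO accuracy:", loo_accuracy(TOY_CORPUS))
    print("P(class | 'cascade'):", keyword_posterior(model, "cascade"))
    print("P(class | 'loading'):", keyword_posterior(model, "loading"))
```

Ranking vocabulary words by `keyword_posterior` is one way to see which terms push an article toward a given label, which is the sense in which the abstract says manual labelling is "made transparent". With only a handful of documents per class, such posteriors are noisy, consistent with the weak generalization the abstract reports.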
