This book presents established and state-of-the-art methods in Language Technology (including text mining, corpus linguistics, computational linguistics, and natural language processing) and demonstrates how humanities scholars working with textual data can apply them. The landscape of humanities research has recently changed thanks to the proliferation of big data and large textual collections such as Google Books, Early English Books Online, and Project Gutenberg. These resources have yet to be fully explored by new generations of scholars, and the authors argue that Language Technology has a key role to play in the exploration of large-scale textual data. The authors use a series of illustrative examples from various humanistic disciplines (mainly but not exclusively History, Classics, and Literary Studies) to demonstrate basic and more complex use-case scenarios. This book will be useful to graduate students and researchers in humanistic disciplines working with textual data, including History, Modern Languages, Literary Studies, Classics, and Linguistics. It will also be useful to anyone teaching or learning Digital Humanities who is interested in the basic concepts of computational linguistics, corpus linguistics, and natural language processing.
Barbara McGillivray is a Turing Research Fellow at the University of Cambridge and The Alan Turing Institute, UK. She has published two monographs, Methods in Latin Computational Linguistics (2014) and Quantitative Historical Linguistics: A Corpus Framework (2017).
Gábor Mihály Tóth is a Research Fellow at the Shoah Foundation and the Signal Analysis and Interpretation Laboratory (SAIL), Viterbi School of Engineering, University of Southern California, USA.
Chapter 1: Language Technology for the Humanities
Chapter 2: Design of Text Resources and Tools
Chapter 3: Frequency
Chapter 4: Collocation
Chapter 5: Word Meaning in Texts
Chapter 6: Mining Textual Collections
Chapter 7: Closing Remarks