Marc Chagall, “Crossing of the Red Sea,” 1966, with grid overlay (source)


Anyone who picks up the ETCBC Hebrew data soon notices the lack of semantic categories. Lexical word classes, semantic functions/roles, and robust lexeme glosses are nowhere to be found, even though such categories are frequently used in grammatical studies. For example, lexical aspect categories like States, Activities, Accomplishments, and Achievements (cf. Vendler) play a key role in describing Biblical Hebrew verb semantics. These categories show, for example, why it is nonsensical to say things like: “This book was in Hebrew for five days,” since ‘BEING IN HEBREW’ is a kind of state, and such a state would not hold for only ‘five days’ (Harry Potter-type books notwithstanding).

The reason the ETCBC lacks these categories can be summarized in a phrase: form-to-function. This is the center’s guiding methodology, which states that formal patterns must be fully described before moving on to functional meanings. This is a thoroughly text-immanent approach. The motivation is to describe objective data only, so that the data can then inform our interpretation of the grammar and text. In implementing this principle, we at the ETCBC have generally taken “formal patterns” to mean syntactic patterns. In our textlinguistic approach, syntax is extended beyond the level of the clause to inter-clausal “discourse” syntax. This is reflected in the database structure (e.g. clause “mothers”). But this narrow focus leaves unanswered questions about the relationship between syntax and content. In focusing only on syntactic patterns, we have overlooked the potential of semantic patterns.

Firth famously summarized the concept of semantic patterning with the phrase “you shall know a word by the company it keeps.” In other words, lexemes occur around terms with semantic overlap. Think of examples such as “darkness” and “night”, “blood” and “red”, “kitchen” and “spatula.” These words likely occur frequently in close proximity with one another due to their common domain. To take the last example, “kitchen” and “spatula”: imagine if we were to query all other words that occur alongside “kitchen” in a statistically significant way. It is likely we would find other co-occurring terms such as “fork”, “ladle”, “tongs”, etc. These are all terms that could be grouped in the same semantic category as “spatula”. With some further work, we may even construct an entire semantic hierarchy of things found in the kitchen.
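The “kitchen” query described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the corpus is an invented toy, the function name `pmi_collocates` is hypothetical, and pointwise mutual information (PMI) stands in as a simple association measure rather than a proper significance test. It is not the ETCBC’s method or data.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each inner list is one context (e.g. a sentence).
# The data are invented for illustration only.
corpus = [
    ["kitchen", "spatula", "fork"],
    ["kitchen", "ladle", "tongs"],
    ["kitchen", "spatula", "table"],
    ["garden", "shovel", "rake"],
    ["garden", "rake", "table"],
    ["kitchen", "fork", "ladle"],
]


def pmi_collocates(contexts, target):
    """Score every word that shares a context with `target` by PMI.

    PMI = log2( P(target, word) / (P(target) * P(word)) ), where the
    probabilities are estimated as the share of contexts containing the word(s).
    """
    n = len(contexts)
    word_counts = Counter()   # number of contexts containing each word
    pair_counts = Counter()   # number of contexts containing word AND target
    for ctx in contexts:
        types = set(ctx)      # count each word once per context
        word_counts.update(types)
        if target in types:
            pair_counts.update(types - {target})

    p_target = word_counts[target] / n
    scores = {}
    for word, joint in pair_counts.items():
        p_word = word_counts[word] / n
        scores[word] = math.log2((joint / n) / (p_target * p_word))
    # Strongest associations first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


collocates = pmi_collocates(corpus, "kitchen")
for word, score in collocates:
    print(f"{word:8s} {score:+.3f}")
```

In this toy corpus the utensils receive positive scores, while “table”, which appears in both kitchen and garden contexts, scores below zero. With real data, raw PMI is known to favor rare words, so a frequency threshold or a significance test would be a natural refinement.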

The governing dynamic behind these co-occurrence patterns is known as the distributional principle. The recognition of semantic distributions has given rise to a new, exciting field called empirical semantics (see Geeraerts 2010; Stefanowitsch 2010; Levshina 2014). This field is working on testing accepted semantic theories by measuring semantic patterns. Through this kind of rigorous testing, it is hoped that the field might develop semantic classes that derive from hard data rather than pure intuition. The application of tools developed in this field thus holds great promise for the ETCBC’s method of analyzing the Hebrew Bible.
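One common way to measure such patterns is to represent each word by the counts of the words it occurs with, and then compare those count vectors. The sketch below assumes an invented toy corpus and uses cosine similarity as the comparison; the helper names `context_vectors` and `cosine` are hypothetical, and other measures could be swapped in.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy contexts, invented for illustration (not ETCBC data).
contexts = [
    ["cook", "stir", "spatula"],
    ["cook", "stir", "ladle"],
    ["cook", "flip", "spatula"],
    ["dig", "soil", "shovel"],
    ["dig", "plant", "shovel"],
    ["cook", "serve", "ladle"],
]


def context_vectors(contexts):
    """Map each word to a Counter of the other words sharing its contexts."""
    vectors = defaultdict(Counter)
    for ctx in contexts:
        for word in ctx:
            for other in ctx:
                if other != word:
                    vectors[word][other] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


vectors = context_vectors(contexts)
sim_same_domain = cosine(vectors["spatula"], vectors["ladle"])
sim_cross_domain = cosine(vectors["spatula"], vectors["shovel"])
print(f"spatula ~ ladle:  {sim_same_domain:.3f}")
print(f"spatula ~ shovel: {sim_cross_domain:.3f}")
```

Here “spatula” and “ladle” never occur in the same context, yet they come out as similar because they keep the same company (“cook”, “stir”). “Spatula” and “shovel” share no context words, so their similarity is zero.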

When viewed as a single text, the Hebrew Bible itself assumes a kind of taxonomy of the world, and this taxonomy is projected through the collocations of lexemes and constructions. With computational tools we can rigorously catalogue and model these patterns. Though we have a limited corpus, and hence limited data, we can use the analysis of common terms to understand rare terms, which is exactly what lexicographers do. Perhaps, in the end, we could achieve an entire map of the text’s world. Doing so would allow us to build out towards more complex grammatical questions, such as the nature of the Hebrew verbal system. This approach to semantics also addresses an often-unnoticed problem in cognitive-semantic approaches to Biblical Hebrew, namely that we have no direct access to the authors’ cognitive processes (see esp. Van Hecke, From Linguistics to Hermeneutics, 2011). The empirical semantic approach, however, maps for us the one thing we do have: the world of the text itself.



This post was inspired by my 2018 Master’s thesis, “Toward a Distributional Approach to Verb Semantics in Biblical Hebrew” (Vrije Universiteit Amsterdam), the results of which can be perused in my GitHub repository. Further inspiration is derived from the following excellent resources:

Forbes, A. Dean. “Distributionally-Inferred Word and Form Classes in the Hebrew Lexicon: Known by the Company They Keep.” In Foundations for Syriac Lexicography II, edited by Peter J. Williams, 1–34. Piscataway, NJ: Gorgias Press, 2009.

Geeraerts, Dirk. “The Doctor and the Semantician.” In Quantitative Methods in Cognitive Semantics: Corpus-Driven Approaches, edited by Dylan Glynn and Kerstin Fischer, 63–78. Cognitive Linguistics Research 46. Berlin: De Gruyter Mouton, 2010.

Kravchenko, Alexander V. “The Experiential Basis of Speech and Writing as Different Cognitive Domains.” Pragmatics and Cognition 17, no. 3 (2009): 527–48.

Levshina, Natalia, and Kris Heylen. “A Radically Data-Driven Construction Grammar: Experiments with Dutch Causative Constructions.” In Extending the Scope of Construction Grammar, edited by Ronny Boogaart, Timothy Colleman, and Gijsbert Rutten, 17–46. Cognitive Linguistics Research 54. Berlin: De Gruyter Mouton, 2014.

Stefanowitsch, Anatol. “Empirical Cognitive Semantics: Some Thoughts.” In Quantitative Methods in Cognitive Semantics: Corpus-Driven Approaches, edited by Dylan Glynn and Kerstin Fischer, 355–80. Cognitive Linguistics Research 46. Berlin: De Gruyter Mouton, 2010.

Stefanowitsch, Anatol, and Stefan Th. Gries. “Collostructions: Investigating the Interaction of Words and Constructions.” International Journal of Corpus Linguistics 8, no. 2 (2003): 209–43.