A: Yes, absolutely. For instance, researchers take short texts and present them to people word by word, while simultaneously "feeding" the same material to a large language model. For the model,
surprisal is computed immediately — a number indicating how unexpected a given word is to the model. Common words yield low surprisal; rare or anomalous ones yield high surprisal. In humans, unexpectedness is captured by EEG sensors, as well as by longer pauses, longer reading times, and gaze fixations on the word. If a sentence is "broken," two characteristic peaks appear in the graph of the brain's electrical activity: N400 and P600. N400 arises roughly 0.4 seconds after word onset when a word is grammatically well-formed but semantically absurd ("I spread socks on the bread"); P600 emerges about two-tenths of a second later when a word violates grammar ("He spread-plural jam on the bread").
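The per-word surprisal described above can be sketched in a few lines. Here a hand-written toy distribution stands in for a real language model; the words and probabilities are invented for illustration:

```python
import math

def surprisal(prob: float) -> float:
    """Surprisal in bits: -log2 of the word's predicted probability."""
    return -math.log2(prob)

# Hypothetical next-word probabilities after "I spread ... on the bread"
next_word_probs = {
    "jam": 0.60,      # expected continuation -> low surprisal
    "butter": 0.30,
    "socks": 0.0001,  # semantically absurd -> high surprisal
}

for word, p in next_word_probs.items():
    print(f"{word:7s} {surprisal(p):6.2f} bits")
```

In an actual experiment the probabilities would come from a language model's softmax over the vocabulary at each position, but the arithmetic linking probability to surprisal is exactly this.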
The results are intriguing. Model surprisal tracks the N400 well: when the algorithm and the brain encounter an unexpected word, both "stumble" at the same point. The model also shows an analogue of the P600 response — not as a discrete signal, but as the cost of rebuilding its predictions (the difference between surprisal on the final and penultimate word, or the KL divergence between the two predictive distributions). This allows the computational and neurophysiological levels of describing language processing to be linked more precisely.
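The "cost of prediction reassembly" can be made concrete: compare the model's predictive distribution before and after the critical word, measured as KL divergence. A minimal sketch with invented distributions over a tiny shared vocabulary:

```python
import math

def kl_divergence(p: dict, q: dict) -> float:
    """KL(p || q) in bits over a shared vocabulary."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p if p[w] > 0)

# Hypothetical next-word beliefs before and after reading an absurd word:
# the anomaly forces the model to reshuffle its predictions.
before = {"eat": 0.7, "wear": 0.2, "discard": 0.1}
after  = {"eat": 0.1, "wear": 0.7, "discard": 0.2}

cost = kl_divergence(after, before)
print(f"reassembly cost: {cost:.2f} bits")
```

A large divergence means the word forced a big revision of the model's beliefs — the proposed computational counterpart of the P600's "repair" signal.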
The second line of research uses large language models as a testbed for probing the limits of human cognition. Here, scientists deliberately place models and humans under identical conditions and look for divergences. For example, a model can be artificially constrained to "hold" no more than two phrases at once; with this limitation in place, its predictions of reading times and error rates begin to resemble human data — supporting the hypothesis that humans consider only a small number of parses in parallel. In a similar fashion, researchers test the so-called frequency indistinguishability threshold: humans barely notice the difference between very rare words, whereas a model distinguishes them effortlessly. When a simplified "threshold" is imposed on the model, its numbers once again approximate reader behavior.
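The "hold no more than two phrases" manipulation amounts to truncating the context the model conditions on. A toy sketch of the idea, assuming a hand-built lookup table in place of a real model (the sentences and probabilities are invented):

```python
import math

def predict(context: tuple, table: dict) -> dict:
    """Look up next-word probabilities for the visible context window."""
    return table.get(context, {"<unk>": 1.0})

def surprisal_with_window(tokens: list, target: str, table: dict, window: int) -> float:
    """Surprisal of `target`, conditioning on at most `window` context tokens."""
    visible = tuple(tokens[-window:])  # the imposed memory limit
    probs = predict(visible, table)
    return -math.log2(probs.get(target, 1e-6))

# Invented table: the full context disambiguates, a short window does not
table = {
    ("the", "old", "man"): {"boats": 0.9, "sighed": 0.1},
    ("old", "man"):        {"boats": 0.3, "sighed": 0.7},
}

full  = surprisal_with_window(["the", "old", "man"], "boats", table, window=3)
short = surprisal_with_window(["the", "old", "man"], "boats", table, window=2)
print(full, short)  # the constrained model finds "boats" more surprising
```

The point of the manipulation is exactly this contrast: restricting the window raises surprisal on words that full context would predict, and it is the constrained model's surprisal profile that better matches human reading times.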
Hahn, M., Futrell, R., Levy, R., & Gibson, E. (2022). A resource-rational model of human processing of recursive linguistic structure. Proceedings of the National Academy of Sciences, 119(43), e2122602119. https://doi.org/10.1073/pnas.2122602119

de Varda, A., & Marelli, M. (2024). Locally Biased Transformers Better Align with Human Reading Times. In T. Kuribayashi, G. Rambelli, E. Takmaz, P. Wicke, & Y. Oseki (Eds.), Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics (pp. 30–36). Association for Computational Linguistics. https://doi.org/10.18653/v1/2024.cmcl-1.3

Kuribayashi, T., Oseki, Y., Brassard, A., & Inui, K. (2022). Context limitations make neural language models more human-like. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022). https://doi.org/10.18653/v1/2022.emnlp-main.712