
“One very interesting question is what we might learn from recent developments in AI about how humans learn and process language. The full story is a little more complicated because, of course, humans don't learn from the same kind of data as language models. Language models learn from internet-scraped data that includes New York Times articles, blog posts, thousands of books, a bunch of programming code, and so on. That's very different from the linguistic input children learn from, which is child-directed speech from parents and relatives that is much simpler linguistically. That said, there is a lot of interesting research that tries to train smaller language models on data similar to the input a child might receive while learning language. You can, for example, strap a head-mounted camera onto a young child for a few hours every day and record whatever the child hears, which would include any language spoken around or directed at the child. You can then transcribe those recordings and use the resulting dataset to train a language model on the child-directed speech that a real human would have received during development.
So some artificial models are learning from child-directed speech, and that might eventually go some way towards advancing what scientists and philosophers call the ‘nativism versus empiricism’ debate about language acquisition (the nature/nurture debate): Are we born with an innate universal grammar that enables us to learn the rules of language, as linguists like Noam Chomsky have argued, or can we learn language and its grammatical rules from raw data alone, just from hearing and being exposed to language as children? If language models trained on these kinds of realistic datasets of child-directed speech manage to learn grammar, and to use language the way children do, then that might put some pressure on the nativist claim that there is an innate component to language learning that is part of our DNA, as opposed to learning purely from exposure to language itself.”