See this file:
And some additional comments...
A few points...
1) There's not a dichotomy between embodiment based and corpus based statistics. So the real question is whether *adding* embodiment based statistics to corpus based statistics can provide a significant improvement for NLP. It doesn't really matter from a practical perspective how well one could do with embodiment based statistics *only* ... though this is interesting from a theoretical perspective.
2) To really make a good test of the value of embodiment based statistics for NLP, one would need a large "corpus" of experiences. With a very small collection of experiences, one can do "toy" small-scale tests that demonstrate the principle of embodiment based disambiguation ... but one can't expect to get really useful results.
A human child interacting in the world all day every day, for a few years, *does* gather a fairly large corpus of experiences. On the other hand, a robot that's turned on for a couple of hours each week, for experimentation in a fairly empty robot lab, really does not. Nor do our virtual dogs right now, in our simple "toy" virtual world. What we really need are robots that interact with people and objects all day long every day ... or virtual agents that interact with players all day in a popular video game, etc.
3) It's not quite fair to compare the **first ever attempt** at embodiment-based disambiguation, with current algorithms for corpus-based disambiguation. After all, there have been decades of effort by thousands of people on corpus-based disambiguation (including a lot of folks at Microsoft, Google, etc.). Whereas embodiment-based disambiguation is just beginning, and surely we have a lot of very basic things still to learn about it. The first attempts may not be that great in performance, but there is going to be a *lot* of room for improvement....
4) The really hard thing about getting accurate frequencies from embodied experience is contextualization. For instance, suppose we're calculating the frequency with which a person is the subject of "sleep". Do we want

P( subj(sleep, person) )
P( subj(sleep, person) | person is indoors )
P( subj(sleep, person) | person is in their bedroom )
P( subj(sleep, person) | person is at work )

etc.?
Once you know what the relevant context is, then if you're a reasonably broadly experienced agent, you can calculate very useful embodied-experience-based probabilities. But knowing what is the relevant context is a hard problem.
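To make the point concrete, here is a minimal sketch of how an agent might estimate such context-conditioned probabilities from its experience log. The event records and contexts below are entirely made up for illustration; the only assumption is that the agent can log (predicate, argument, context) triples as it observes the world.

```python
# Hypothetical toy event log from an embodied agent's experience.
# Each record: (predicate, subject_argument, situational_context).
events = [
    ("sleep", "person", "bedroom"),
    ("sleep", "person", "bedroom"),
    ("sleep", "person", "work"),
    ("read",  "person", "bedroom"),
    ("type",  "person", "work"),
    ("type",  "person", "work"),
]

def p_subj(pred, arg, context=None):
    """Estimate P(subj(pred, arg)), optionally conditioned on the
    agent's situational context, by simple relative frequency."""
    pool = [e for e in events if context is None or e[2] == context]
    if not pool:
        return 0.0
    hits = sum(1 for p, a, _ in pool if p == pred and a == arg)
    return hits / len(pool)

print(p_subj("sleep", "person"))             # unconditional: 3/6 = 0.5
print(p_subj("sleep", "person", "bedroom"))  # 2/3
print(p_subj("sleep", "person", "work"))     # 1/3
```

The counting itself is trivial; the hard problem the text identifies is deciding which `context` field is the relevant one to condition on in the first place.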
This problem doesn't exist so badly with the simple examples in my paper, because the virtual agents we're working with now only operate in a single context ;-)
Of course, the same problem exists with corpus based linguistics. But I think this is a big advantage of the embodiment based approach: contexts are easier for young minds to identify, in the embodied world in which they live. The way we identify contexts when reading text, is largely by analogy to how we do it in the embodied world...
Large, passive corpus
Much of corpus linguistics is based on analyzing texts of human interaction. If a large collection of movements, gestures, interactions and chat were available for a virtual world, then many of the same techniques that are applied in linguistics could be applied to the analysis of this corpus. In either case, the learning is passive, not active, in that the observer is not interacting with the world, but is just watching it.
For example, in linguistics, one looks at the conditional probabilities (entropies, mutual information) of having one word occur near another, or be in some relationship to another. One could gather similar statistics to correlate movements, actions, and utterances. In linguistics, things don't become "interesting" until one has mined fairly "deep" patterns; just how "deep" one can go in observing 3D virtual interactions is unclear.
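As a sketch of how the same machinery carries over, here is pointwise mutual information computed over co-occurring utterance words and actions, exactly as one would compute it over word pairs in a text corpus. The (word, action) pairs are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical log of (utterance_word, concurrent_action) pairs
# observed in a virtual world; the data below are made up.
pairs = [
    ("fetch", "run"), ("fetch", "run"), ("fetch", "grab"),
    ("sit", "sit_down"), ("sit", "sit_down"), ("sit", "run"),
]

n = len(pairs)
word_counts = Counter(w for w, _ in pairs)
act_counts = Counter(a for _, a in pairs)
pair_counts = Counter(pairs)

def pmi(word, action):
    """Pointwise mutual information: how much more often the word
    and action co-occur than independence would predict."""
    p_joint = pair_counts[(word, action)] / n
    if p_joint == 0:
        return float("-inf")
    p_w = word_counts[word] / n
    p_a = act_counts[action] / n
    return math.log2(p_joint / (p_w * p_a))

print(pmi("fetch", "run"))     # positive: "fetch" co-occurs with running
print(pmi("sit", "sit_down"))  # positive association
```

This is the "shallow" end of the mining; the open question raised above is how much deeper one can go with patterns over 3D interactions.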