AGENT: A Benchmark for Core Psychological Reasoning
This research paper was published at ICML (International Conference on Machine Learning) on 23rd July, 2021 by researchers from MIT, IBM Watson AI Lab, and Harvard. It is about giving machines an understanding of human mental life so that they can interact with people in the real world.
If you want to check the original research paper, the link is here: https://arxiv.org/pdf/2102.12321.pdf. And if you want the videos of agents applying common sense, along with the AGENT dataset, check the link here: https://www.tshu.io/AGENT/
If you want a brief idea of what this research paper is about, read the following abstract.
Inspired by cognitive-development studies on intuitive psychology, AGENT (Action, Goal, Efficiency, coNstraint, uTility) is a large dataset of generated 3D animations structured around four scenarios: 1. Goal preferences, 2. Action efficiency, 3. Unobserved constraints, 4. Cost-reward trade-offs. The results suggest that to pass this designed test of core intuitive psychology at human levels, a model must acquire or have built-in representations of how an agent plans and makes decisions, combining utility computation with core knowledge of objects and physics.
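To make the utility idea concrete, here is a minimal sketch of the cost-reward trade-off (the "utility calculus") that a rationally acting agent in these scenarios is assumed to follow. The function names and numbers are illustrative assumptions of mine, not values from the paper.

```python
# Minimal sketch of the cost-reward trade-off ("utility calculus").
# All names and numbers are illustrative, not from the AGENT paper.

def utility(reward: float, cost: float) -> float:
    """A rational agent's utility: the reward for reaching a goal minus
    the cost (effort, path length, etc.) of the plan that reaches it."""
    return reward - cost

def choose_goal(options: dict[str, tuple[float, float]]) -> str:
    """Pick the goal whose (reward, cost) pair maximizes utility."""
    return max(options, key=lambda g: utility(*options[g]))

# Example: jumping a high wall toward a preferred object can lose to
# walking to a less-preferred object if the jump is costly enough.
options = {
    "preferred_object_over_wall": (10.0, 8.0),  # high reward, high cost
    "other_object_on_flat_ground": (6.0, 1.0),  # lower reward, low cost
}
print(choose_goal(options))  # -> "other_object_on_flat_ground"
```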
Commonsense Reasoning - the ability to make acceptable and logical assumptions or decisions in everyday human life.
Now, talking about the AGENT dataset itself. The figure above gives you an idea of what trials in the AGENT dataset consist of. There are four scenarios, and every trial has two phases: 1. Familiarization phase: the observing model is shown multiple videos of the typical behaviour of a particular agent, i.e. it is familiarized with how that agent acts. 2. Test phase: the model then watches the same agent acting either in a new physical situation (A, B, and D in the figure) or under constraints it has never seen before (C in the figure).
The test videos are divided into two types: expected and surprising. In an expected test video, the agent behaves consistently with its actions from the familiarization videos (i.e. it pursues the same goal, acts efficiently given its constraints, and maximizes its rewards). In a surprising test video, the agent does the opposite: it behaves inconsistently with the familiarization videos and pursues its goal inefficiently, at a cost to itself. There are also scenarios with new setups of the physical scenes, enabling harder tests of generalization for the observer (the model, playing the role of the infant in the original studies).
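As a way to picture this trial structure, here is a hypothetical data layout for one trial. The field names are my own invention for illustration; the released dataset's actual schema may differ.

```python
from dataclasses import dataclass

# Hypothetical layout of one AGENT trial (field names are assumptions).

@dataclass
class Video:
    path: str            # path to the rendered 3D animation
    duration_s: float    # clip length in seconds

@dataclass
class Trial:
    scenario: str                 # e.g. "goal_preferences"
    familiarization: list[Video]  # typical behaviour of one agent
    test: Video                   # same agent, new situation/constraint
    test_type: str                # "expected" or "surprising"

trial = Trial(
    scenario="action_efficiency",
    familiarization=[Video("fam_0.mp4", 8.2), Video("fam_1.mp4", 7.9)],
    test=Video("test.mp4", 9.1),
    test_type="surprising",
)
```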
AGENT contains 8,400 videos in total, each 5.6 to 25.5 seconds long at 35 fps. These are organized into 3,600 trials, divided into 1,960 training trials, 480 validation trials, and 960 test trials (480 with expected and 480 with surprising test videos). The training and validation sets contain only expected test videos; surprising videos appear only in the test set.
Two baseline methods are evaluated: 1. Bayesian Inverse Planning and Core Knowledge (BIPaCK) and 2. a Theory of Mind neural network (ToMnet-G). Comparing the two on AGENT, BIPaCK came the closest to human performance. Different generalization tests were also run on the two models, and BIPaCK was again ahead, especially in the strong-generalization setting.
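To give a flavour of what Bayesian inverse planning involves, here is a minimal sketch: score how likely the observed trajectory is under each candidate goal assuming the agent plans efficiently, combine that with a prior from the familiarization videos, and turn the result into a surprise rating. The softmax-style likelihood, the parameter beta, and all names are illustrative assumptions, not the authors' BIPaCK implementation.

```python
import math

# Minimal sketch of Bayesian inverse planning: infer an agent's goal from
# how efficient its observed path is toward each candidate goal, then rate
# how "surprising" a test video is. Everything here is an assumption, not
# the BIPaCK implementation.

def likelihood(path_cost: float, optimal_cost: float, beta: float = 2.0) -> float:
    """P(trajectory | goal): paths close to the optimal cost are
    exponentially more likely under a rationally acting agent."""
    return math.exp(-beta * (path_cost - optimal_cost))

def posterior(observed: dict[str, tuple[float, float]],
              prior: dict[str, float]) -> dict[str, float]:
    """P(goal | trajectory) ∝ P(trajectory | goal) * P(goal)."""
    scores = {g: likelihood(c, opt) * prior[g] for g, (c, opt) in observed.items()}
    z = sum(scores.values())
    return {g: s / z for g, s in scores.items()}

# Familiarization suggested the agent prefers goal "A" (high prior).
prior = {"A": 0.9, "B": 0.1}
# Test video: (observed path cost, optimal path cost) toward each goal.
test = {"A": (9.0, 3.0),   # very inefficient if the goal were A
        "B": (3.1, 3.0)}   # near-optimal if the goal were B
post = posterior(test, prior)
surprise = -math.log(post["A"])  # surprisal that the agent still wants A
print(post, round(surprise, 2))
```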