2020, Vol. 2020, No. AGI-016, p. 03-
To make it possible for non-experts to operate robots in human environments, instruction following, that is, operating a robot through natural-language instructions, has attracted attention. The recently released ALFRED dataset is the first large dataset annotated with both high-level instructions, which specify the task, and low-level instructions, which describe the action the robot should take at each step. The robot must accomplish the task while observing photo-realistic images and interacting with objects in the environment, and completing these tasks requires many steps. However, the ALFRED baseline is not robust in such long-horizon settings. In this work, we aim to build a robot that follows natural-language instructions in a realistic environment using the ALFRED dataset. To make the robot robust in long-horizon settings, we propose two methods: one that splits the task into easier sub-tasks using the natural-language instructions, and one that adds an auxiliary task of predicting abstract high-level actions. Our experiments show that these methods improve the task success rate.
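An auxiliary task of this kind is commonly trained by adding a second prediction loss to the main objective. The sketch below illustrates that generic pattern only; it is an assumption, not the authors' implementation. It assumes that at every step the agent outputs logits over low-level actions and logits over abstract high-level actions, and that the two cross-entropy losses are summed with a weight. The names `cross_entropy`, `trajectory_loss`, and `aux_weight` (and its default of 0.5) are hypothetical.

```python
import math

def cross_entropy(logits, target):
    """Negative log-likelihood of class `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]

def trajectory_loss(steps, aux_weight=0.5):
    """Average per-step loss: main low-level action loss plus a weighted
    auxiliary loss for predicting the abstract high-level action.

    Each step is (low_logits, low_target, high_logits, high_target).
    `aux_weight` is a hypothetical hyperparameter, not taken from the paper.
    """
    if not steps:
        raise ValueError("trajectory must contain at least one step")
    total = 0.0
    for low_logits, low_target, high_logits, high_target in steps:
        main = cross_entropy(low_logits, low_target)    # which low-level action to take now
        aux = cross_entropy(high_logits, high_target)   # which high-level sub-goal is active
        total += main + aux_weight * aux
    return total / len(steps)
```

Setting `aux_weight=0` recovers the main loss alone, so the weight controls how strongly the shared representation is pushed to encode the current high-level action.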