In this paper we show how multilinear forms can be used in the perception-action cycle. First, these forms can be used to reconstruct an unknown (or partially known) scene from image sequences alone. Second, from this reconstruction the motion of the camera relative to the scene can be computed, which solves the so-called hand-eye calibration problem. Once this relative orientation is known, action can be carried out. We show that it is sufficient to use either bilinear forms between every pair of successive images together with bilinear forms between every pair of images two steps apart, or trilinear forms between successive triplets of images. We also present a robust and accurate method, based on multilinear forms, for obtaining the reconstruction and the hand-eye calibration from a sequence of images taken by uncalibrated cameras. This algorithm requires no initialisation and gives a generic solution in a sense that is clearly specified. Finally, the algorithms are illustrated using real image...
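As a small illustration (not the authors' algorithm), the bilinear form between two views is the epipolar constraint x2^T F x1 = 0 on corresponding image points. The sketch below assumes the simplest possible setting, calibrated cameras P1 = [I | 0] and P2 = [I | t] related by a pure translation t, for which F reduces to the skew-symmetric matrix [t]_x. The helper names `skew`, `bilinear_form` and `project` are hypothetical and chosen only for this example.

```python
# Minimal sketch of the bilinear (epipolar) constraint x2^T F x1 = 0.
# Assumption: calibrated cameras P1 = [I | 0], P2 = [I | t] (pure translation),
# so that F reduces to the cross-product matrix [t]_x.

def skew(t):
    """Return the 3x3 matrix [t]_x, so that skew(t) applied to v equals t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty],
            [tz, 0.0, -tx],
            [-ty, tx, 0.0]]

def bilinear_form(x2, F, x1):
    """Evaluate x2^T F x1 for homogeneous image points x1 and x2."""
    Fx1 = [sum(F[i][j] * x1[j] for j in range(3)) for i in range(3)]
    return sum(x2[i] * Fx1[i] for i in range(3))

def project(X, t):
    """Project 3D point X with camera [I | t] and scale so the last coordinate is 1."""
    x = [X[0] + t[0], X[1] + t[1], X[2] + t[2]]
    return [x[0] / x[2], x[1] / x[2], 1.0]

if __name__ == "__main__":
    t = (1.0, 0.5, 0.2)
    F = skew(t)
    for X in [(0.3, -0.2, 4.0), (1.0, 2.0, 5.0)]:
        x1 = project(X, (0.0, 0.0, 0.0))  # image in the first view
        x2 = project(X, t)                # image in the second view
        # The constraint holds up to floating-point rounding.
        print(abs(bilinear_form(x2, F, x1)) < 1e-9)
```

The constraint vanishes because x2 is proportional to X + t and x1 to X, and (X + t) · (t × X) = 0. With uncalibrated cameras, as in the paper, F is a general rank-2 matrix, but the form x2^T F x1 is still bilinear in the two image points.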