In most current AR applications, interaction is limited to the motion of the handheld device or HMD as a way to change the point of view with respect to the scene that contains the virtual elements. In many cases this is sufficient, if the objective is only the visualisation of those elements. But what happens when the user wants to select different types of information, interact with the virtual elements to modify their behaviour, or even use them as control inputs for some physical system? Handheld approaches can make use of the touch interface to select these elements, open menus, and choose options.

Conversely, HMD-based applications are typically hands-free, and selections or interactions must be made using buttons on the helmet itself, or using a gamepad or other handheld device. There is more freedom for interaction in this case, as the surrounding environment remains visible: keyboards, button pads, or any traditional device may be used.

 

A. Limitations in direct manipulation

All the above-mentioned interaction mechanisms can be used to modify the behaviour of the application, or even of the virtual objects added to the scene. However, it feels unnatural to change object properties such as position or orientation through one of these indirect interaction mechanisms. The natural and intuitive way of moving an object is by direct manipulation.

Nevertheless, touching virtual objects is still restricted to the use of rod-like interfaces of haptic devices, which allow the user to touch the objects not with the fingers but with a rod or pencil held in the hand. Other approaches, such as vibro-tactile gloves, aim at using vibrations to represent touch, with some level of success. Setting touch aside, interaction with virtual objects through direct manipulation has been at the centre of attention of several researchers. Using the hands to rotate and move objects is natural for users. The main difficulty lies in how to reliably track hands and their gestures, given the high number of degrees of freedom of their articulated structure and the constant creation of self-occlusions. Although vision- or image-and-depth-based approaches have shown good results, as is the case of the Leap Motion, they are still limited to configurations where the gestures occur in a restricted volume. Some attempts to mount Leap Motion devices on the HMD have shown good results, but hands are better tracked from below, given that their natural poses generate many occlusions when observed from above.

 

IV. Implementing Augmented Reality Applications

To create an augmented reality application, independently of whether the target device is a handheld or an HMD, the principle is the same: to create the illusion that virtual objects or entities are integrated into the environment, or into some of its elements, through the view that is transmitted to the user. Excluding see-through devices, which raise a new set of problems, the remaining systems employ one or two cameras to capture the view of the environment that is shown on the device's screen or on the pair of displays of an HMD. The virtual elements are displayed combined with the image, and this should be sufficient to create the intended perception. But for this to be true, the appearance of the virtual elements must evolve exactly as that of the "neighbouring" real elements (NRE) does. The principle is therefore that we need to estimate changes in the relative pose between the NRE and the camera. This can be done using any type of technology that enables tracking of both parts, or of one with respect to the other.

Two main types of technologies are used: marker detection and pose estimation based on the camera images, and inertial measurement units (IMUs), i.e. sensors that measure quantities related to camera displacement and rotation.

Both have advantages and disadvantages that can be summarised as follows. Visual marker-based AR tends to be shaky and illumination dependent, and typically does not behave well when the supporting markers are not fully visible, making the virtual models appear or disappear abruptly, or stop moving and thus fail to follow the camera/marker movements.
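As a concrete illustration of this pose-estimation principle, the following sketch (in Python, assuming OpenCV ≥ 4.7 with the ArUco module; the camera intrinsics and marker size are placeholder values, not calibrated data) estimates the camera pose relative to a fiducial marker in each frame and draws the coordinate frame that a virtual object would follow:

# Sketch of marker-based pose estimation: the pose of the marker (a stand-in
# for a "neighbouring" real element) is estimated relative to the camera in
# every frame, and a coordinate frame is drawn where a virtual object would
# be rendered. Intrinsics and marker size are illustrative placeholders.
import cv2
import numpy as np

camera_matrix = np.array([[800.0, 0.0, 320.0],
                          [0.0, 800.0, 240.0],
                          [0.0, 0.0, 1.0]])            # assumed calibration
dist_coeffs = np.zeros(5)
s = 0.05                                               # marker side (m), assumed
marker_corners_3d = np.array([[-s / 2,  s / 2, 0], [ s / 2,  s / 2, 0],
                              [ s / 2, -s / 2, 0], [-s / 2, -s / 2, 0]],
                             dtype=np.float32)

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    corners, ids, _ = detector.detectMarkers(frame)
    if ids is not None:
        for c in corners:
            # Relative pose marker -> camera; a virtual object rendered with
            # this rotation/translation follows the marker as the camera moves.
            _, rvec, tvec = cv2.solvePnP(marker_corners_3d, c[0],
                                         camera_matrix, dist_coeffs)
            cv2.drawFrameAxes(frame, camera_matrix, dist_coeffs,
                              rvec, tvec, s * 0.5)
    cv2.imshow("marker AR", frame)
    if cv2.waitKey(1) == 27:                           # Esc quits
        break
cap.release()

When the marker is partially occluded, detection fails and no pose is produced, which is exactly the abrupt appear/disappear behaviour described above.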

On the other hand, IMUs do not provide direct pose information: position has to be estimated by double integration of the acceleration measurements, which accumulates error rapidly. Fortunately, for orientation it is possible to obtain much more reliable estimates. For this reason, IMU-based AR applications typically use only the estimated orientation of the camera to manipulate the view, not allowing for lateral, up-down, or proximity-changing movements.
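A minimal numerical sketch of this asymmetry (the filter gain and bias values are illustrative assumptions): a single integration of the gyroscope rate, corrected by the gravity direction measured by the accelerometer, keeps the orientation error bounded, whereas naive double integration of acceleration lets any bias grow quadratically in time:

# Sketch of why orientation is more usable than position from an IMU.
# A complementary filter fuses one integration of the gyro rate with the
# absolute (but noisy) tilt given by gravity; position needs two integrations
# of acceleration, so a constant bias b grows as 0.5*b*t^2.
import numpy as np

def complementary_tilt(angle_prev, gyro_rate, accel_yz, dt, alpha=0.98):
    """One update of a single-axis tilt estimate (rad)."""
    angle_gyro = angle_prev + gyro_rate * dt             # integrate once
    angle_accel = np.arctan2(accel_yz[0], accel_yz[1])   # gravity reference
    return alpha * angle_gyro + (1.0 - alpha) * angle_accel

def integrate_position(pos_prev, vel_prev, acc, dt):
    """Naive double integration of acceleration; drifts without a reference."""
    vel = vel_prev + acc * dt
    pos = pos_prev + vel * dt
    return pos, vel

# With a small accelerometer bias of 0.05 m/s^2, the position estimate drifts
# by roughly 0.5 * 0.05 * 60^2 = 90 m after one minute, while the filtered
# tilt remains anchored to the gravity direction.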

 

A. Direct manipulation for virtual objects

To interact with the virtual elements using direct manipulation there are several possibilities, but we can expect that, if we see the objects in front of us, we should be able to touch them. This can be done in various ways; if we focus on the use of our hands, we can either track an object that mediates the interaction (e.g. pliers, forceps, or a stick), or track the hand itself. A low-cost wireless hand tracker was developed at ISR-UC. The device was designed as a set of six modules: one main board that connects to five smaller boards, one for each finger. Each board contains an inertial measurement unit (IMU) that, through a special-purpose adaptation of the complementary filter estimation algorithm, enables the acquisition of the hand motions. The constructed prototype is shown in Figure 1, where the parts are individually identified.

As the motion capture provided by this device is based on inertial sensors, only finger flexion-extension (and adduction-abduction) and hand orientation are considered, given that position (translation) estimation would suffer from error accumulation that rapidly makes it unusable. The principles and a description of the design of this device are also available.
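As an illustration only (the quaternion conventions and sensor mounting are assumptions, not the documented design of the device), a finger's flexion can be expressed as the orientation of its IMU module relative to the module on the back of the hand:

# Hedged sketch: per-finger flexion derived as the rotation of a finger IMU
# relative to the IMU on the main (back-of-hand) board. Conventions assumed.
import numpy as np

def quat_conj(q):
    w, x, y, z = q
    return np.array([w, -x, -y, -z])

def quat_mul(a, b):
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def flexion_angle(q_hand, q_finger):
    """Rotation angle (rad) of the finger module relative to the hand module."""
    q_rel = quat_mul(quat_conj(q_hand), q_finger)
    w = np.clip(abs(q_rel[0]), 0.0, 1.0)
    return 2.0 * np.arccos(w)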

 

Figure 1. Prototype of hand motion tracker developed around the MPU-9150 and the ESP-01 wireless processing board.

This device, which is accessible through a normal TCP/IP connection, can be included in different types of applications that require the capture of hand motions, possibly for both hands if two devices are employed. Being wireless, it can be used in a variety of applications, namely in AR or VR. In particular, if used in conjunction with a Kinect sensor, it is possible to capture full body and hand movements, even in configurations where the hands are occluded by other body parts or objects.
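Since the communication protocol is not detailed here, the following client sketch uses a hypothetical line-oriented text format and a placeholder address simply to illustrate how an AR/VR application could consume the streamed hand data over TCP/IP:

# Hypothetical TCP/IP client for the hand tracker. Host, port, and the
# line-oriented "qw qx qy qz f1 f2 f3 f4 f5" message format are assumptions
# made for illustration; the real device protocol is not specified here.
import socket

HOST, PORT = "192.168.4.1", 5000      # placeholder address of the tracker

def update_virtual_hand(values):
    # Placeholder: forward orientation and flexion values to the renderer.
    print(values)

with socket.create_connection((HOST, PORT)) as sock:
    buffer = b""
    while True:
        chunk = sock.recv(1024)
        if not chunk:
            break
        buffer += chunk
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            values = [float(v) for v in line.split()]
            update_virtual_hand(values)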

Figure 2 shows an image where a virtual hand replicates the user's hand movements.

 

 

Figure 2. Example of hand tracking and model animation, using the developed hand tracker prototype.

 

B. Touchable but virtually modifiable objects

One of the aspects that defies realism is the lack of a sense of touch when manipulating virtual objects. Despite the attempts at creating haptic devices, these still have limitations in terms of the stimuli provided and/or the constraints they impose on manipulation. One possibility is to create physical objects that can be easily tracked, or even instrumented, and then replace their views with modified ones. Using the appropriate sensors, it is possible to give controllable perceptions of some of the object's properties, such as its rigidity~\cite{trcocacola}.
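As a rough illustration of the idea (the linear stiffness model and the sensor interface are assumptions for this sketch, not the approach of the cited work), the deformation rendered on the modified view of the tracked object can be driven by the measured grip pressure and a chosen virtual stiffness:

# Illustrative mapping from a measured grip pressure on an instrumented
# physical object to the deformation shown on its virtual replacement view.
def rendered_deformation(pressure, virtual_stiffness, max_deform=0.03):
    """Return the deformation (m) to apply to the virtual model.

    pressure: measured grip force (N) from the instrumented object
    virtual_stiffness: chosen stiffness (N/m); lower values look 'softer'
    """
    deform = pressure / max(virtual_stiffness, 1e-6)
    return min(deform, max_deform)

# The same grip force yields a larger rendered deformation for a "soft" object
# than for a "rigid" one, changing the perceived rigidity without changing the
# physical prop:
soft = rendered_deformation(pressure=10.0, virtual_stiffness=500.0)
rigid = rendered_deformation(pressure=10.0, virtual_stiffness=5000.0)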