Augmented Reality: Basic concepts

Augmented reality has recently received an enormous amount of attention from both the general public and commercial companies.

The game industry has long been attentive to the promise of augmented reality, but technical difficulties limited the achievable quality and postponed its inclusion in commercial games until recently. The increasing computational power of game consoles and the general public's current awareness of this type of technology make it impossible for a company to ignore it without risking a loss of visibility to competitors that invest in it.

From the users' side, it is mainly the novelty that catches the attention, in particular among the younger generations. On the other hand, many companies exploit this new concept to promote their products by adding marker cards to packaging (e.g. breakfast cereal boxes) that give access to simple games, played via dedicated applications that can be downloaded to smartphones or tablets.

There are indeed several areas where augmented reality may create new opportunities and added value. One example is the sale of fashion-related products such as sunglasses and clothes, where customers can virtually try the items on and evaluate how they fit. This is certainly appreciated by people who do not feel comfortable going to a store to try the items, or who simply do not have the time. With AR, people can try them on at home, or even in the store, without needing to put clothes on and take them off several times.

AR (and VR) will certainly play a role in the industries of the future. Manufacturers will benefit from AR-based systems in assembly, inspection, and maintenance tasks.

A first use is to provide guidance about the sequence of operations to be executed: showing not only what to do next but also how to do it simplifies the operator's task. Beyond guidance, such a system may also provide direct visualization of the quantities being measured at a given instant, or of operating parameters, during verification, maintenance or tuning operations.

It is therefore clear that engineering students should be introduced to the concepts of AR, as they will most likely encounter this type of technology in their future workplace. Beyond the questions of what AR is and how it can be used, the question of what it is built upon may be explored either by the curious student or in the context of specific courses such as computer vision (CV) or human-machine interaction (HMI).

In the case of CV, AR can be used to give practical examples of the use of various subjects, ranging from pattern recognition to projective geometry. In the case of HMI, it opens the possibility of creating new interaction mechanisms that may support activities such as AR-guided minimally invasive surgery, immersive teleoperation of micro or remote robots, tele-surgery and tele-diagnosis, to name a few.

The remainder of this paper presents some of the subjects covered in this U-Academy module. The next section discusses concepts needed to understand the difference between AR and simply showing information or graphics on top of images, how reality is perceived, and what ingredients are required to create systems capable of inducing augmented reality perceptions. The section after that analyses the main types of interaction used nowadays, their limitations, and the need to develop direct manipulation mechanisms.

 

Augmented Reality Concepts

There are indeed several misconceptions about augmented reality (AR), especially among programmers and companies willing to use the current hype to promote their products. The most common one is the notion that, to create AR, one only needs to get some nice 3D model and superimpose it on live video. In fact, that can be part of it, but it is not enough to create "augmented reality" by itself. We can use TV examples to understand the difference. During some interviews and movies, subtitles appear on the TV screen to help viewers understand what is being said. This is the case of movies shown in their original version with translated subtitles. The subtitles are not part of the scenes being shown, so they do not belong to the reality shown there. On the other hand, it has become common in sports broadcasts to display virtual field marks in replays of important or controversial moments, to help the viewer understand why a referee made a certain decision, or why someone claims that the decision was wrong. In this case, the marks are perceived as lying on the field, so they "augment" the perceived field. These latter cases can be seen as examples of augmented reality.

 

A. So, what is reality augmentation?

To know how to augment reality, we first need to understand what reality is. Is it some absolute truth, or is it the result of a set of cognitive processes that involve learnt concepts, mental models and perception mechanisms?

As human beings, we can only verify (and accept as true) what we see, touch, hear, smell, or taste, and compare with memories of previous experiences or with acquired concepts.

  We can say that it is the combination of what is acquired through the senses, its processing and matching with pre-learnt models that results in the perception of reality.

So, reality is not solely what can be perceived by our senses; it goes beyond that, just as our perception mechanisms do. In fact, perception involves acquired models and concepts that may completely change the interpretation of any sensed (acquired) information.

An example of how knowledge may affect the perception of reality is the following: an adult and a child walk in a field and encounter a strawberry poison-dart frog (Oophaga pumilio). The child will probably be excited by the beauty of the frog and will want to catch it, while the adult will be terrified and will stop the child from making this potentially dangerous move. Here the two people have completely different notions of reality for exactly the same situation.

 

B. Cognition and perception of reality

Because our senses and cognitive processes are limited in both acquisition and processing capabilities, we have developed impressive capabilities of inference, recognition and reasoning, even in the face of incomplete data.

This is probably the result of our evolution, in terms of anticipating dangers or gaining survival advantages. This capability of using partial data has made possible the development of our visual system, which is based on 2D projections of the 3D world and which, from these 2D representations, infers the 3D structures and deals with them. On the other hand, the 2D nature of this perceptual system leads to illusions, which are simply the result of a model-fitting process applied to incomplete or ambiguous data.

Although the two-eye configuration has an important role in the perception of 3D structures, our brain's great capability for integrating sensory information over time enables us to use self-motion to obtain more information about the neighbouring 3D structures, in particular when stereo vision alone is not enough for that purpose.

These movements, which are frequently performed automatically and unconsciously, have the purpose of removing ambiguities or breaking misinterpretations. In other words, this is the way we check how \emph{realistic} what we perceive is.

This can be seen as a geometry-related consistency verification, where we move to check whether the 2D structure we are perceiving respects the 3D mental model that was selected as a hypothesis.
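The ambiguity of a single 2D view, and the way self-motion resolves it, can be illustrated with a minimal sketch. It assumes an idealised pinhole camera looking along the z axis; the function name `project`, the focal length and the point coordinates are illustrative choices, not part of any particular AR system.

```python
# Pinhole projection of a 3D point (x, y, z) seen by a camera that has
# moved sideways by cam_x; the focal length f is an illustrative value.
def project(point, cam_x=0.0, f=1.0):
    x, y, z = point
    return (f * (x - cam_x) / z, f * y / z)

near = (0.5, 0.25, 1.0)  # hypothesis 1: a point close to the observer
far = (2.0, 1.0, 4.0)    # hypothesis 2: a point four times farther away

# From the initial viewpoint both hypotheses fall on the same image
# location, so a single 2D view cannot tell them apart.
same_view = project(near) == project(far)

# After a sideways step of 0.5 the two hypotheses predict different images.
moved_near = project(near, cam_x=0.5)
moved_far = project(far, cam_x=0.5)

print(same_view, moved_near, moved_far)
```

After the sideways step, the near point shifts much more in the image than the far one. Comparing the observed shift with the shift predicted by each hypothesis is one simple form of the consistency check described above.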

 

C. Augmenting (the perceived) reality

To produce augmented reality, one must generate, through some mediating technology, the sensory stimuli that enable the perception of virtual elements perfectly integrated with the real (physical) ones.

Because our perception is mostly devoted to extracting geometric properties, it is fundamental that the integration of the virtual models and the "real" scene exhibits spatial coherence. Let us take as an example a virtual object in a real scene composed of other "real" objects on top of a table that is also real.

For the augmented scene to be credible (or realistic), the virtual element must always appear in the same relative position and pose with respect to the physical ones. That is, as an observer moves towards, away from or around the table, the view of that element must undergo exactly the same perspective and rigid transformations that are applied to the rest of the scene.
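As a hedged illustration of this requirement, the sketch below projects a real point and a virtual point anchored to it using the same camera pose. It assumes the pose (position plus a rotation `yaw` about the vertical axis) is supplied by some tracking system. All names and numeric values are hypothetical.

```python
import math

def world_to_camera(point, cam_pos, yaw):
    """Rigid transform: express a world point in the camera frame.

    The camera sits at cam_pos and is rotated by yaw (radians) about the
    vertical y axis; with yaw = 0 it looks along the world z axis.
    """
    dx = point[0] - cam_pos[0]
    dy = point[1] - cam_pos[1]
    dz = point[2] - cam_pos[2]
    c, s = math.cos(yaw), math.sin(yaw)
    # Apply the inverse of the camera rotation to the offset vector.
    return (c * dx - s * dz, dy, s * dx + c * dz)

def project(p_cam, f=1.0):
    """Perspective projection of a camera-frame point (f is illustrative)."""
    x, y, z = p_cam
    return (f * x / z, f * y / z)

def view(point, cam_pos, yaw):
    return project(world_to_camera(point, cam_pos, yaw))

table_corner = (0.0, 0.0, 2.0)  # a real point on the table
virtual_top = (0.0, 0.5, 2.0)   # virtual element anchored 0.5 above it

poses = [
    ((0.0, 0.0, 0.0), 0.0),           # initial viewpoint
    ((0.0, 0.0, 1.0), 0.0),           # observer moves towards the table
    ((2.0, 0.0, 0.0), -math.pi / 4),  # observer walks around and turns
]

# Real and virtual points go through exactly the same transformations.
frames = [(view(table_corner, pos, yaw), view(virtual_top, pos, yaw))
          for pos, yaw in poses]
for real_px, virtual_px in frames:
    print(real_px, virtual_px)
```

In every frame the virtual point stays directly above the table corner, and its image offset grows or shrinks with the observer's distance. An overlay drawn at a fixed screen position, like a subtitle, would keep the offset of the first frame and would therefore fail the consistency check as soon as the observer moves.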

Passing this consistency check enables us to perceive the virtual element as being part of the scene, and therefore in our vicinity, so that we develop the feeling that we could touch it.

We can say that when we achieve a sense of tangibility or sense of presence, as defined by Sheridan~\cite{sheridan1992musings}, we tend to accept the scenario as real, but for that it must pass all the voluntary and involuntary consistency checks we perform.

 

D. HMD versus handheld visualisation

Although it may seem debatable whether the right way to produce AR is to use head-mounted displays (HMDs) or handheld devices such as tablets and smartphones, both have advantages and disadvantages, as we will see.

An HMD with one camera or a pair of coupled cameras seems to be the right choice for creating AR experiences. This can indeed be true, since the user sees the augmented scenario whichever direction they look in.

On the other hand, a handheld device can also be seen as an instrument that we look through to obtain different, augmented views of the surrounding environment, much as we would use a portable magnifier.

There is no distinction between them in terms of the principles involved. In both cases the device enables the user to explore the surrounding environment and see it with added content. The difficulties are also similar. On one side, the visualization pose in the environment must be estimated in a perfectly stable way. On the other, extracting the 3D structure of the environment would enable correct handling of occlusions between real and virtual elements, but this is still a hard task given its computational cost. As a result, both cases work well in simplified scenarios, such as planar surfaces containing detectable markers, or in complex ones for which a priori models of the environment exist and precise localisation technologies, such as magnetic field-based trackers, are in use.
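To make the occlusion issue concrete, the following minimal sketch composites a virtual layer over a real image with a per-pixel depth comparison. It assumes that a depth value for every real pixel is already available, which is precisely the hard part mentioned above. The function `composite` and the toy one-row "images" are hypothetical.

```python
def composite(real_img, real_depth, virt_img, virt_depth):
    """Per-pixel occlusion test between a real image and a virtual layer.

    virt_depth holds None where the virtual object does not cover a pixel.
    """
    out = []
    for r_row, rd_row, v_row, vd_row in zip(real_img, real_depth,
                                            virt_img, virt_depth):
        out_row = []
        for r, rd, v, vd in zip(r_row, rd_row, v_row, vd_row):
            # Show the virtual pixel only where it exists and is closer
            # to the camera than the real surface at that pixel.
            out_row.append(v if vd is not None and vd < rd else r)
        out.append(out_row)
    return out

# Toy example: a one-row image with labels instead of colours.
real_img = [["wall", "cup", "wall"]]
real_depth = [[3.0, 1.0, 3.0]]       # the cup is close, the wall is far
virt_img = [[None, "robot", "robot"]]
virt_depth = [[None, 2.0, 2.0]]      # virtual robot between cup and wall

result = composite(real_img, real_depth, virt_img, virt_depth)
print(result)
```

Here the real cup hides part of the virtual robot, while the robot in turn hides the wall behind it. Without the real depth values, the robot would simply be drawn over the cup, breaking the spatial coherence of the scene.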

The differences between the two systems lie in the application scenarios, not in the processing or algorithms involved. We can say that AR on HMDs is adequate for tasks that require the use of both hands and/or visualization from a user-centred perspective. A handheld device can be more favourable when it is used for short periods of time, so that the AR visualization tool can be picked up, used to examine the object or scene for no more than a few minutes, and then put down.

One should note that although AR can make use of different kinds of visual markers to detect and select the information to visualize, if the visualised object does not appear perfectly integrated in the environment, we cannot say that we are using AR. In such a case it is just a QR-code (or other) reader application that fetches and displays the related information. In many cases we do not really need AR, or, even worse, using it makes the task more difficult than using a simple code reader that selects the appropriate information to show. The reason is that, in most situations, it is more practical to scan the code and then look down at the device in its normal handling position than to hold it up in front of a marker to read the same information. There are, however, situations where it can be very useful to use the device like a hand magnifier and interactively visualize information about objects, devices, or places just by passing the handheld device in front of them.

 

 

Figure 3. A cube whose faces are markers for visual pose estimation that contains a wireless processing board with IMU and pressure sensors.

V. Conclusion

The concepts presented in this article are available in a U-Academy module on the subject "Augmented Reality: From basic principles to connected subjects". The module is still under construction, but it will integrate images, videos, pointers to demonstrators and example code to enable students to learn about the principles that support the construction of AR applications.