Mapping out the future of VR and AR

Mobile World Congress
Shanghai, China
30 June – 1 July, 2017

The Yu Garden (豫园) with the Shanghai Tower (上海中心大厦) behind. Photo by Mark Pegrum, 2017. May be reused under CC BY 4.0 licence.

After flying up from Guilin on 29 June, I managed to catch the last two days of the Mobile World Congress in Shanghai. An enormous event that brought together technologists, marketers and investors, and showcased new technologies from phones to drones and robots to cars, it also hosted a series of summits on specific themes. I spent Friday 30 June at the VR and AR Summit, where industry speakers offered their perspectives on the latest developments and the current challenges facing VR and AR.

In his presentation, What is the future of VR & AR?, Christopher Tam (from Leap Motion) argued that there are five key elements of VR and AR, namely immersion, imagination, availability, portability and interaction. Before the advent of VR/AR, it was as if our computing platforms only allowed us to peek at the possibilities through a tiny keyhole, but now we can open the door into a utopian world, he said.

Immersion needs high quality graphics and rapid refresh rates; imagination needs good content; but interaction is hard to measure. One way of measuring interaction is by considering human-machine interaction bandwidth. This is a fundamental factor in unlocking the mainstream adoption of VR/AR and, while a lot of progress has been made on the other elements, this remains a bottleneck which the industry is currently focused on addressing. The leap from 1D to 2D computing required the invention of the mouse to accompany the keyboard. A mouse works for 2D because it allows one-to-one mapping; however, it is not sufficient in a 3D world, because in such a world we need to do more than move, select, point or click. Interaction in a 3D world should be inspired by the way we interact with the real world; we should use the model of ‘bare hands’ interaction, given that this is our primary way of interacting with the real world. It is natural, universal, unencumbered, and accessible.

In education, children can study in a hands-on style, with more fun and better retention; this is how children learn in the real world. In training, people can practise how to handle complex situations in hands-on ways. In commerce, consumers can enjoy the digital world and be impressed at the first try. In healthcare, we can enable diagnosis, physical therapies and rehabilitation; this removes the barrier between healthcare givers and their patients. In art and design, we can express ourselves by creating in a 3D manner with no restraints. In social relations, we can hang out and interact with friends. In entertainment, there will be easier, more intuitive control, and deeper immersion; users can become the protagonists in the stories we are telling, not just operating a person but becoming that person. Thus, hand tracking brings to life the advantages of VR/AR in almost all verticals. He concluded by demonstrating Leap Motion’s hand tracking technology.

In his presentation, The future of virtual reality in China, James Fong (from Jaunt China) suggested that VR is the next stage in a long human quest to experience and interact with captured and created realities; this stretches from cave art through painting, photography, gramophones, motion pictures, television and 3D films to AR and VR. He suggested that there is no need to separate VR and AR as they will merge soon. He briefly pointed out some questions of looming importance: we want Star Trek’s Holodeck or the Matrix experience, but we need to ask how this affects our humanity. Will we become isolated from each other? Will we appreciate human connections? Will we not want to leave the perfect VR/AR world?

In VR/AR storytelling, we can be part of a scripted narrative or take our own pathway through a free-form construct; engage in first-person participation or third-person observation; venture alone or interact with n-number of participants; and focus on private enjoyment or share experiences with family, friends and the world. It will however take a long time for high quality and compelling content to arrive, in part because VR will disrupt every element of content creation. We are used to third-person stories and it will take time to get used to first-person stories. We haven’t yet developed the creative language for working with VR. However, all of the major companies that run operating systems are moving to support VR natively, and this will usher in major developments.

He wrapped up by looking at the Chinese market, where there is no Google, Facebook, Amazon or Twitter, and where the market is dominated by local players like Baidu, WeChat, Weibo, iQiyi, Youku, Tencent, Alipay and WeChat Pay. Therefore a lot of international products don’t work in this country. Some challenges in China are the same as in the rest of the world (e.g., poor headset viewing experiences; market experimenting with live and 360) and some are different (VR experience centres/cafés in China keep interest high; content quality has not improved due to a lack of financing; and the camera and higher quality headset market is starting to pick up). He predicted that China could be the largest VR market in the world by 2018.

The slogan of the 2017 Mobile World Congress, Shanghai. Photo by Mark Pegrum, 2017. May be reused under CC BY 4.0 licence.

In a panel discussion moderated by Sam Rosen (ABI Research), with panel members Alvin Wang (Vive), James Fong (Jaunt China) and Christopher Tam (Leap Motion), it was suggested that 5G will make a big difference to VR/AR adoption because if processing is done online at high speed, we will be able to use much less bulky headsets with less drain on batteries. Alvin Wang mentioned that it will soon be possible to wear headsets that incorporate facial recognition and emotion recognition based on microgestures, allowing interviewers to sense whether an interviewee is nervous or lying, or teachers to sense whether a student understands. He claimed that one of the scarcest commodities in the world is good teachers, but AI technology can give everyone personalised access to the best teachers. He mentioned a project to put 360 cameras in MIT classes so that anyone in the world can join a class taught by a high-profile professor. James Fong talked about the power of VR to give people a sense of real-world events; he gave the example of being able to place viewers in the context of refugees arriving in another country, seeing the scale of the phenomenon, maybe being able to touch the boat the refugees arrived on, and thereby building more empathy than is possible with traditional news reports on TV.

In his presentation, The next big test for HMDs: Is the industry prepared?, Tim Droz (from SoftKinetic) said the aim of VR and AR is to take you somewhere other than your current location. There are two types of interaction which are theoretically possible in VR and AR environments: inbound interaction through sight, hearing, smell, taste, and haptics; and outbound interaction through the mind, gaze, facial expression, voice, touch, pushing, knocking, grabbing (etc.), gesture, body expression, and locomotion. At the moment only a few of these are available, but as more are built into our equipment, it will become bulkier and more unwieldy. However, for mass adoption, a lighter and more seamless experience is needed. He demonstrated some SoftKinetic hardware (like the time-of-flight sensor) and software (like human tracking and full body tracking software) which will make a contribution to interaction through hand movements. This greatly strengthens users’ sense of presence.

In his presentation, 360° and VR User Generated Content – Millions of 360° cameras and smartphones in 2017!, Patrice Roulet (from ImmerVision) suggested that it will soon become normal for everyday smartphones to be used to record and share 360 content, in such a way that it captures your entire environment and the entire moment. It will only take two clicks to share such content on social media. To capture this content, it’s necessary to have a very good lens (such as ImmerVision’s panomorph lens which provides a high quality image across the whole field of view, can be miniaturised for mobile devices, and allows multi-platform sharing and viewing), and advanced 360 image processing. The panomorph lens can be used for much more than capturing 360 images; the internet of things (IoT) is about to evolve from connected devices to smart devices, and this technology has the potential to play a role as part of artificial intelligence (AI) in the upcoming ‘Cambrian explosion’ of the IoT.

In his presentation, VR content: Where do we go next?, Andrew Douthwaite (from WEARVR) stated that one key question is what comes first: adoption of hardware or high quality content; it’s something of a chicken and egg situation. He showed an example of a rollercoaster VR experience on a headset linked to a desktop computer; he noted that many people initially experience some nausea due to the sensory conflict that arises from, for example, sitting still while immersed in a moving VR experience. The emergence of mobile VR is now bringing VR experiences to a much wider audience; Google Cardboard is currently the most widespread example. There is a lot of 360 content on YouTube, and games like Raw Data are helping to drive the industry forward. Google Earth VR is another great example and will help VR reach the mass market, and could impact travel and tourism. New software is now making it possible for users to create VR characters and then inhabit their bodies and act as those characters.

Important future developments are wireless and comfortable VR headsets and more natural input mechanisms, including hand presence. One problem is that much 360 video content is currently of low quality; there is no point in having high quality headsets unless there is also high quality content available. The future of content, he said, lies in storytelling and narrative-based content; social interaction; healthcare; property; training; education; tourism; therapy and mental health (e.g., mindfulness and meditation); serialised content; lifestyle and productivity (though this might be more AR); and WebVR (an open standard which is a kind of metaverse, allowing you to have VR experiences in your web browser).

In his presentation, VR marketing, Philip Pelucha (from 3D Redshift) suggested that the next generation of commerce will not be browser-based; he gave the example of a 360 video of a product leading to a pop-up store allowing customers to further engage with the product. Noting that we already have online universities, he asked how long before virtual reality universities appear. He mentioned that soon we won’t have to commute to work because our phones and laptops will turn the world into our virtual office. In fact, he said, this is already beginning to happen, and when today’s children grow up, they won’t understand why you would have to go to an office to work, or to a shop to buy something. He also spoke about one major area of current development as being language education; a VR/AR app for immersive learning, or to support you when travelling, could be extremely helpful.

In his presentation, Bring the immerse experience to entertainment, movie and live event, Francis Lam (from Isobar China) showcased innovative examples of 360 videos. He showed the B(V)RAIN headset that combines VR with neural sensors; as your emotions change, what you see changes. In effect, the hardware allows you to visualise your mental state, and this can have consequences such as the targets you face in a shooter game, or the taste combinations in drinks that are recommended to you.

He concluded with some issues for consideration. Bad VR, he pointed out, can make you feel sick, so it needs to be high quality and low latency. VR is not just about watching, but rather about experiencing; it is about how, from a first-person point of view, you can go into a scene and experience it. VR is not just visual; audio is important, and there can also be other sensors and tactile feedback. We should also ask to what extent VR can be a shared experience, where someone wearing a headset can interact with others who are not. VR is good for communication, a point which is well understood by Facebook; for example, with VR you can make eye contact in a way that is not possible in video chat. VR can allow us to explore new possibilities, such as experimenting with genders. In fact, VR hasn’t arrived yet; there is much more development to happen. Finally, he stated, VR is really not content, it is a medium.

China Mobile display, 2017 Mobile World Congress, Shanghai. Photo by Mark Pegrum, 2017. May be reused under CC BY 4.0 licence.

There is no doubt that industry perspectives on new technologies differ in some ways from those usually heard at academic and educational conferences, but it is important that there is an awareness, and an exchange, of differing views between technologists and educators. After all, we face many of the same challenges, and we stand to gain from collaboratively developing solutions that will work in the educational and other spheres.

e-language

Mark Pegrum's Conference Blog