How do we learn a new language? We often assume children learn faster than adults simply because young minds are more plastic. However, another very important factor to take into consideration is that children have environmental advantages when learning languages that most adults don't have.

In this project, I created VR scenarios for language learning. The motivation is that language learning is difficult, especially for adults, yet VR is an ideal medium because it provides the kind of immersive environment a child naturally grows up in.


Adults learn English much more slowly than children.

One of the most important reasons is that adults rarely get to learn in an immersive linguistic environment. Moreover, very young children receive little formal instruction, yet still acquire language through immersion; an adult who can also draw on instruction could plausibly learn even faster through immersion than a child.


Our project simulates an immersive language-learning environment for non-English-speaking users.

  • Stimuli

  • Interaction with avatar

  • Body Movement and Emotional Audio




Team

1 Prototype Designer, 1 Developer

Time Frame

2 Weeks

My Role

Asset Creation + Animation, UI/UX + 3D Prototyping




  • Media Screen

  • Prior Work


  • Storyboard


  • Asset Creation

  • Mock up in Unity


Media Screen

Children have environmental advantages that let them learn through a natural, intuitive, and responsive environment.

In order to validate our design and understand the differences in learning behavior between children and adults, we reviewed current studies online. Here are some highlights that helped me understand the problem:

  • Children have environmental advantages when learning language that most adults don’t have. They learn by being immersed in multilingual environments.

  • Children learn through their responsive environment, i.e. by passively “absorbing” the language through context, rather than through verb conjugation and exams.

  • Children have fewer inhibitions. It’s much easier to learn a language if you’re comfortable making mistakes and sounding foolish, a hurdle that makes most adults extremely anxious.

Prior Work

Useful interactions in VR: voice recognition, sound effects, expressions, gestures, and contextual cues

We also looked at what other VR products have done in the realm of language learning, specifically Mondly VR and ImmerseMe. Trying these products as a first-time user gave me a better sense of which interactions are helpful and what is feasible in VR:

  • Voice recognition can support pronunciation practice. The built-in microphone on a VR headset works, with a phone's microphone as a fallback where one is not available.

  • Feedback from the environment, such as sound effects, expressions, and gestures, is an important set of indicators that helps users understand and trust a totally unfamiliar language.

Mondly VR







Storyboard

Considering the technical constraints and friends' language-learning experiences, we adopted contextual dialogue, tooltips, and flash cards.

Based on those insights, we then began to sketch out how those methods might be displayed in VR and when they might be delivered. The scenario starts with the player opening their eyes and finding themselves in a cradle. They then realize they are embodied in an infant's body, and a girl standing in front of the cradle is trying to teach them the word “bear.” After finishing this task, the user is free to browse the room.

At the same time, we also had to decide which interactions to support and factor them into the story. The interaction choices were inspired by my friends' English-learning experiences; I asked them which methods they had found most effective. Given our limited time and the technical constraints, we finally decided to adopt contextual dialogue, tooltips, and flash cards.



Asset Creation

When it comes to creating a VR experience, the most important thing you need to impart to your player is a sense of presence. They need to feel as if they really are in another world. Because of this, both character and scene design are vital to ensure this sense of immersion.

Scene Design

I made ample use of the Unity Asset Store for basic geometry, then reassigned textures, materials, and scale to create the child's room.


Character Design

In order to deliver a fast prototype, I grabbed basic geometry from Sketchfab, then cleaned it up to make it ready for animation.


Body Movement

I used Mixamo to auto-rig and animate the avatar.



Pointing at the bear







Mock up in Unity

There are three interactions, each corresponding to one of the three main parts of English learning: listening and speaking, vocabulary, and understanding. Below are the development details.

Contextual Dialogue by Voice Recognition

Task: Listening and Speaking

The user learns the meaning of “bear” and how to pronounce it by interacting with the avatar, particularly through gestures, emotional feedback, and context.


Animation Controller & Code

We used KeywordRecognizer, a built-in Unity class that provides speech recognition backed by Microsoft's Windows speech APIs.

Avatar’s animation controller/task flow


Code for keyword recognition

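The keyword-recognition setup can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the keyword list, class name, and the "Praise" animator trigger are assumptions, and KeywordRecognizer requires a Windows build target.

```csharp
using UnityEngine;
using UnityEngine.Windows.Speech;

public class BearKeywordListener : MonoBehaviour
{
    public Animator avatarAnimator;        // drives the avatar's animation controller
    private KeywordRecognizer recognizer;
    private readonly string[] keywords = { "bear" };

    void Start()
    {
        // Listen for the target vocabulary word.
        recognizer = new KeywordRecognizer(keywords);
        recognizer.OnPhraseRecognized += OnPhraseRecognized;
        recognizer.Start();
    }

    private void OnPhraseRecognized(PhraseRecognizedEventArgs args)
    {
        // When the user says "bear", fire the avatar's praise animation
        // so the environment gives immediate emotional feedback.
        if (args.text == "bear")
            avatarAnimator.SetTrigger("Praise");
    }

    void OnDestroy()
    {
        if (recognizer != null && recognizer.IsRunning)
            recognizer.Stop();
        recognizer?.Dispose();
    }
}
```

Hooking the recognized phrase directly into the animator is what ties pronunciation practice to the gesture and mood feedback described above.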



The demo shows how the avatar teaches the user what a bear is using gestures, body movement, contextual cues, and mood.


Vocabulary by Tooltips

In “Vocabulary Mode,” we labeled each object and reinforced the word–object connection with matching colors and a connecting line.



I used a VRTK tooltip prefab to label the objects and assigned each label a color similar to its object to strengthen the connection. In addition, to maximize readability, the card always faces the headset.

Prototyping with VRTK prefab

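The "always face the headset" behavior is a standard billboard pattern. A minimal sketch (the script name is an assumption; VRTK's own tooltip component offers similar behavior out of the box):

```csharp
using UnityEngine;

// Attach to a world-space tooltip card so it rotates to face the
// headset camera every frame, keeping the text readable.
public class FaceHeadset : MonoBehaviour
{
    void LateUpdate()
    {
        Transform head = Camera.main.transform;  // the VR headset camera
        // Point the card's forward axis away from the viewer so the
        // front of the card (and its text) faces them without mirroring.
        transform.rotation =
            Quaternion.LookRotation(transform.position - head.position);
    }
}
```

Running this in LateUpdate ensures the card reorients after the headset pose has been updated for the frame.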




Understanding by Flash Card

To help build the connection between a word and its true meaning, flash cards display associated information including spelling, a short introduction, and images or videos. I kept the card very simple, highlighting the picture and the word to imply the connection between them.



Once we had the prototypes down, it was time to see how they performed with people. I asked four friends for feedback, focusing on the following four aspects:

Understandability: Could you understand what the avatar was conveying?
Interactivity: How much did you want to play with it?
Believability: Did you believe you were embodied in a foreign baby's body?
Comfort: How comfortable were you speaking and interacting with the avatar?

And here are some highlights:

  • Contextual learning was reported to be very helpful for learning words and dialogue

  • The avatar's animation surprisingly attracted the most attention, and almost everyone tried different gestures to trigger it

  • It is more comfortable to speak in VR because “no one is judging you here”

  • They liked the simplicity of the flash card and thought it was really easy to use

However, many of them felt they were not in a real world and thought they were talking to a fake person. I agree, and I think this is mostly due to the visual style and level of simulation of the scenario, along with the limited interactions available in this version.


This project was creative and full of surprises. It helped me transfer my 3D modeling skills from architectural design to VR design. I also honed my C# skills and my prototyping skills in a 3D engine.

Here are some takeaways:

  • 2D prototyping is still the most efficient way to prototype in the early stage

  • Empathetic design is essential in VR, which means it requires higher fidelity than generic UX design

  • Prototype fast, get feedback and iterate more