Unveiling the Story Behind Ray-Ban Meta
An intimate conversation with Meta's smart glasses visionary
On November 26th, we hosted our first meet-up, “Future of AI Robotics and Wearable Devices” in the heart of Menlo Park. During the event, Thomas Luo, Founder and CEO of GenAI Assembling, engaged in an insightful conversation with Kenan Jia, Device PM lead for Meta’s AI/AR smart glasses team.
Kenan, the visionary behind the widely acclaimed Ray-Ban Meta smart glasses, is currently at the helm of Meta’s efforts to develop its next-generation smart glasses.
In a session titled “Creating a New AI Hardware: The Art of Trade-offs,” the discussion explored the untold story of how Ray-Ban Meta was brought to life, the complex trade-offs involved in designing AI hardware, and forward-looking perspectives on the future of this transformative technology.
Below is the full transcript of their conversation:
Ray-Ban Meta: From Smart Glasses to Everyday Companion
Thomas Luo: I want to raise my first question from the user's perspective - how often do you wear these Meta glasses, and what are your most frequent use cases?
Kenan Jia: Hi everyone, I'm Kenan, the device PM on the Meta glasses team. I'm personally working on the next generation of our smart glasses, but happy to share more about Ray-Ban Meta today.
The top use cases for me are hands-free capture for photos and videos, especially when I'm traveling or at events. I always bring my Ray-Ban Meta nowadays. For example, I went to Disneyland in LA and rode the Incredicoaster. You can't take your phone on rides because it's dangerous, but I had my Ray-Ban Meta sunglasses on and captured the entire ride - it was amazing.
I went to two concerts this year and didn't need to hold up my phone to record. I could actually enjoy the concert while recording. I use them often for events, weekends, and traveling with friends. In Hawaii, while riding ATVs, the glasses let me enjoy music and capture moments while staying present, which is one of the top goals for our Ray-Ban Meta products.
And I'm starting to use the AI features more frequently now, especially when traveling. In museums, instead of reading entire catalogs, I can ask Meta AI specific questions. When I was recently in Europe, it could quickly translate menus and posters. It was super convenient and easy to use.
Thomas Luo: Let me share an example of how I used the AI features of the Meta Ray-Ban. Last month, I was at the Museum of Flight in Seattle. When I was standing in front of a MiG-3, I just said "Hey Meta," and it took a photo. Then I asked it to tell me the background story of this aircraft. Within seconds, the voice came from the frames and told me that it was a MiG-3, made in 1940 and rapidly adopted by the Soviet Union in the war with Germany. It explained the development history of this aircraft. I'm particularly interested in how AI combines with the wearing experience. Could you explain in more detail how this technology works on my glasses? We understand that some AI features may run in the cloud or use edge AI - could you walk us through those details?
Kenan Jia: Of course. The architecture is actually more interesting than many people think. There are three parts to how it works. First, there are the glasses themselves. Then there's the Meta View companion app on your phone. Finally, there's the server side.
When you say "Hey Meta," the wake word model runs on device on your glasses, and then AI starts listening and uses speech recognition to understand your query.
If you're just issuing device-control queries like "take a photo," the request completes directly on the glasses. But for more complex questions, like multimodal queries where you take a photo and ask about an aircraft, it will call the Llama model on the server side. That path starts with Bluetooth connecting the glasses to the phone's companion app - the Meta View app - which then sends the query to the cloud server using either Wi-Fi or cellular connectivity.
On the server side, it's a complex flow involving knowledge grounding, getting the actual answer, privacy and security filters, and then sending the response back through the Meta View app before audio streaming to the glasses.
There's an interesting debate in this field about whether processing should be on-device or in the cloud. There are many trade-offs, especially for the glasses form factor. The device is very small and people want it even lighter. It's power-constrained, and if you hit thermal limits, it can impact functionality. The beauty of this three-part system running complex tasks on the server side is the very low power consumption - you're just using Bluetooth to connect to the phone, with computation happening server-side. The benefit of server-side processing is getting the best quality answers from larger models.
The challenge is optimizing latency. We don't want users to wait for a long time for the response. Right now, we're optimizing response latency for both voice and multi-modal queries.
Some features, like device control and live speech translation, run on device for speed and reliability. But for more complex multimodal queries, it's server-side. Our team has to optimize across hardware, systems, power and thermal management, and AI to ensure quality, latency, and reliability all work well when you ask a simple question.
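To make the split concrete, here is a minimal sketch of that routing logic, assuming hypothetical intent names and stand-in functions; it illustrates the pattern Kenan describes, not Meta's actual code.

```python
# Purely illustrative sketch of on-device vs. server-side routing.
# All names here are hypothetical stand-ins, not Meta's actual APIs.
from dataclasses import dataclass
from typing import Optional

DEVICE_CONTROL_INTENTS = {"take_photo", "start_video", "play_music"}

@dataclass
class Query:
    transcript: str                 # produced by on-device speech recognition
    image: Optional[bytes] = None   # attached for multimodal questions

def classify_intent(transcript: str) -> str:
    """Stand-in for a small on-device intent classifier."""
    return "take_photo" if "photo" in transcript else "open_ended"

def run_on_device(intent: str) -> str:
    """Simple device-control commands complete directly on the glasses."""
    return f"executed {intent} locally"

def forward_to_cloud(query: Query) -> str:
    """Stand-in for the Bluetooth -> companion app -> Wi-Fi/cellular -> server path."""
    return "answer generated server-side by a large model"

def handle_query(query: Query) -> str:
    intent = classify_intent(query.transcript)
    if intent in DEVICE_CONTROL_INTENTS and query.image is None:
        # Low latency, minimal power: no round trip to the phone or server.
        return run_on_device(intent)
    # Complex or multimodal queries go to the server for answer quality,
    # at the cost of latency and connectivity dependence.
    return forward_to_cloud(query)

print(handle_query(Query("take a photo")))
print(handle_query(Query("what aircraft is this?", image=b"...")))
```

The design point is the one Kenan raises: the on-device branch stays cheap and fast, while the server branch buys answer quality from larger models at the cost of latency that then has to be optimized.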
Thomas Luo: That's great. Let's go deeper into AI plus wearables. You mentioned your most frequent use cases are taking photos while traveling and recording concerts without holding a phone. Most of these scenarios, as well as the translation mode, don't require extensive AI or large language model-based reasoning and inference, right? My curiosity is - I remember the Meta Ray-Ban was launched last October, correct?
Kenan Jia: The second generation was last October. The first generation Ray-Ban Stories was in 2021.
Thomas Luo: And the new AI features were added this April. From your experience or user surveys, have the usage scenarios or use cases changed during the past year?
Kenan Jia: This is a great question. First, there's been strong adoption and growth for AI, especially after we rolled out the feature. What's interesting about Ray-Ban Meta is that when Gen 1 launched in 2021 as Stories, our positioning and goal was really as camera + audio glasses - for hands-free capture, music, and phone calls. People loved it.
What's unique about this product is that beyond just the large language model and AI use cases, people think they're really cool glasses they love to wear. They love the design and hands-free capture. Surprisingly, audio is one of the highest adoption and retention use cases because people realize they don't need to constantly put on and take off their AirPods or other Bluetooth headphones.
The AI features rolled out in April, and capabilities will continue to improve as we have newer models optimized for glasses. All of these features come together as a package because this is a standalone product. Unlike a phone where you might use one AI feature in an app, this is something you're carrying every day along with your phone and keys. The mental load is significant, so you either have to really love the product or it needs to offer substantial value.
AI use cases are starting to show real value. Some are traditional voice assistant features like weather, setting timers, or hands-free photo/video capture. We're seeing interesting emerging trends in translation and multimodal use cases like plant identification or quick queries on the go. There's huge potential here as models improve and we optimize for glasses, because it's always on your face - it sees what you see and hears what you hear, with audio playing directly to your ear.
It's not going to be as suitable for productivity use cases where you need long-form responses like on ChatGPT or Llama models - that's better suited for laptops. But it's very effective when you're on the go, whether at a museum or landmark, for quick queries with direct responses.
As we announced at Meta Connect this September, we will roll out real-time AI features, which I think will be one of the highest potential categories.
This needs to be viewed holistically with all the other features people already love. Compare that with AI-only devices: when the AI use cases aren't strong enough, especially in niche markets, they often end up as gadgets that people try and then put in a drawer. That's why Ray-Ban Meta is doing pretty well.
The Art of Trade-offs in Smart Glasses Technology
Thomas Luo: One thing that raises my curiosity - I heard integrating Llama into the glasses was particularly challenging during the development process. Would you mind sharing what kind of difficulties or challenges the team met? Since you had to work with both research and development teams to overcome these challenges, especially regarding putting a large language model into this device for multimodality features, what challenges did your team overcome?
Kenan Jia: We have a very strong AI team, both on the model / research side, and for integrating it into smart glasses. Going back to the architecture of glasses, phone, and server side with direct speech response, there were several challenges. For example, how do you optimize responses for glasses and audio feedback? With ChatGPT, you usually get very long responses. If you don't modify the LLM response for glasses, having a two or three-minute text-to-speech response would be really impractical. A lot of work went into optimizing and providing the most relevant responses on the spot.
Then there are latency challenges, which we discussed. You have multiple legs - processing on the frame, transmission, and backend connectivity. Our teams broke this down step by step to improve overall latency and reliability. If your Bluetooth connectivity drops, it stops, and as a user, you wouldn't know what happened - you'd just think Meta AI doesn't work.
Even after launch, we're working on improvements to take AI from good to great. For instance, the natural interaction - currently, you have to say "Hey Meta, look and tell me this." We want it to be more natural, like simply asking "What is this plant?" We've invested heavily in natural interaction so people can talk directly to the AI without specific trigger words.
We've also worked on multi-turn conversations. Earlier, you had to say the wake word multiple times. Now, when you're at a museum, for example, the LED indicates it's listening for a few more seconds for follow-up questions before responding.
The key areas we focused on were making AI reliable, fast, and high-quality on glasses. Post-launch, we've invested in developing more valuable use cases, making interactions more natural, and expanding language availability - internationalization is also very challenging.
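As a rough picture of the multi-turn behavior Kenan describes, the sketch below keeps listening for a short window after each response so follow-ups don't need the wake word; the window length and helper names are assumptions for illustration only.

```python
import time
from typing import Callable, Optional

# Hypothetical value: how long the glasses keep listening for a follow-up
# after responding. Not an actual product parameter.
FOLLOW_UP_WINDOW_S = 5.0

def conversation_loop(listen: Callable[[float], Optional[str]],
                      answer: Callable[[str], str]) -> None:
    """Illustrative multi-turn loop: the first query follows the wake word,
    follow-ups within the window do not need it."""
    utterance = listen(FOLLOW_UP_WINDOW_S)   # first query, after "Hey Meta"
    while utterance is not None:
        print(answer(utterance))             # in the product: audio streamed to the glasses
        # Keep the microphone open briefly; if nothing arrives before the
        # window closes, drop back to waiting for the wake word.
        utterance = listen(FOLLOW_UP_WINDOW_S)

# Tiny stand-ins so the sketch runs end to end.
queries = iter(["what is this plant?", "is it safe for cats?"])
def fake_listen(window_s: float) -> Optional[str]:
    return next(queries, None)

def fake_answer(q: str) -> str:
    return f"(answer to: {q})"

conversation_loop(fake_listen, fake_answer)
```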
Thomas Luo: Let's go deeper into discussing these priorities of latency, reliability, and response quality. I think these trade-offs and balances are crucial because I truly believe glasses will be one of the most promising options for the next generation of AI devices. My reasoning is that the human head contains our most important sensors - our eyes, ears, mouth, and more. Essentially, the human head is like a carbon-based multimodality system that can naturally interact with silicon-based multimodality AI through glasses. Glasses have a 700-year history, and people are comfortable wearing them - it doesn't look weird. So, for these trade-offs and balances, could you give us more detail about the priorities? What's the most important thing when producing AI-powered glasses? If something has to be given up, what's the baseline we need to maintain to make this happen?
Kenan Jia: I wish I had a one-liner answer for these trade-offs to make a great pair of glasses. The reality is challenging because when you ask people about Ray-Ban Meta or AR glasses, they always want them lighter and smaller while simultaneously wanting better performance and battery life, plus better image quality. These goals are at odds - you can't have both at the same time.
I can give you an example that shows our trade-off thinking. For Ray-Ban Meta's second generation, we greatly improved image quality and FOV by moving from a 5-megapixel to a 12-megapixel camera. However, we went from two cameras in the first generation to just one ultra-wide camera on the left side in the second generation. Initially, with two cameras, we thought about depth sensing and creative photo formats, but these wouldn’t be heavily used. For gen two, we knew people primarily wanted better quality photos for social sharing.
Having two cameras allows different lenses, zoom capabilities, and broader field of view, but in such a small device, where the battery sits in the temple arm, another camera would reduce mechanical space for the battery. It would also impact power consumption when processing video feeds from two cameras. Cost is another important factor.
We evaluated how to improve image quality within these constraints and landed on a single 12-megapixel camera with improved FOV. This decision met our quality requirements for social sharing while freeing up about 10% more space for battery life. It also improved power and thermal performance. Users have responded positively to the improved image quality.
This example shows how we evaluate specific options across hardware, user experience, system requirements, and mechanical constraints. If there's a bottom line, it's balancing performance with size, weight, fitment for human factors, and power/thermal considerations for a face-worn device.
Thomas Luo: Is AI not the most critical factor?
Kenan Jia: The previous example was specifically about camera decisions. When we discussed on-device versus server-side processing earlier, there are different trade-offs around latency for different use cases.
Thomas Luo: How does your team work on these trade-offs? Trade-offs can't be made by a single team like the product team - they require collaboration and discussion with other teams. Could you share a scenario or story about how multiple teams work together and debate to make these trade-offs?
Kenan Jia: Let's use the camera architecture example. As the product team, your role isn't just to make the decision - it's to frame the challenge and problem to solve. In this case, we wanted to improve camera and image quality while balancing considerations around mechanical battery space, power, thermals, and costs.
You work with different teams to evaluate various options against these different considerations. You end up with almost like a heat map table showing what each option is good at. No option is great at everything. But you might see that a single camera is great at improving quality to "better" (though maybe not "best") while excelling in other considerations.
The teams work together to agree on this evaluation. For decision-making, we need logic about what's most important. In this case, we determined that maintaining battery life while staying within power and thermal constraints was more critical than maximizing image quality. Some teams might prefer a different decision, but they shouldn't disagree on the evaluation or the final decision once they understand the logic.
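Kenan's "heat map table" can be pictured with a toy example: score each option against the shared considerations, then apply the agreed decision logic (here, weights that favor battery space and power/thermal over peak image quality). The options, scores, and weights below are invented purely to illustrate the method, not real Meta data.

```python
# Toy illustration of a heat-map style trade-off evaluation.
# Scores are on a 1-5 scale; all numbers are invented for illustration.
options = {
    "dual 5MP cameras":       {"image quality": 3, "battery space": 2, "power/thermal": 2, "cost": 2},
    "single 12MP ultra-wide": {"image quality": 4, "battery space": 4, "power/thermal": 4, "cost": 4},
}

# Decision logic: battery space and power/thermal judged more critical
# than maximizing image quality.
weights = {"image quality": 0.2, "battery space": 0.3, "power/thermal": 0.3, "cost": 0.2}

for name, scores in options.items():
    total = sum(weights[criterion] * score for criterion, score in scores.items())
    print(f"{name}: weighted score {total:.2f}")
```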
Thomas Luo: Any challenges working with supply chains or manufacturers?
Kenan Jia: You need to hit performance targets at certain yield rates, cost targets, and launch volumes. It's not just about setting specs and moving forward - there are many manufacturing challenges around reliability and quality, especially with new custom modules. We're learning and collaborating with partners worldwide to build understanding and improve processes.
For glasses specifically, the go-to-market side is also challenging. Traditional glasses are sold through optometrists and LensCrafters, while consumer electronics go through Best Buy and Amazon. We're working with channel partners, both traditional consumer electronics retailers and glasses channels. We partner with EssilorLuxottica, which owns Ray-Ban and many retail channels. It's a huge learning process because we're combining style with technology.
Breaking Through in AI Hardware: Lessons, Competition, and Future Scenarios
Thomas Luo: Let's talk about Meta's previous VR device experience. Are there lessons learned from VR development that can be applied to AR glasses?
Kenan Jia: There are common learnings but also very different device constraints. Both are complex systems with display, audio, and mechanical modules, but VR headsets are bigger and primarily used at home, while glasses need to be light and wearable on the go. Both aim for miniaturization - we're trying to make Quest smaller and cheaper while maintaining performance, similar to AI glasses and future AR glasses.
The display modules differ between AR and VR, though both need to handle 2D and 3D objects and consider interaction models like hand tracking, eye tracking, or EMG (electromyography) rather than mouse or touchscreen input. While we share learnings across teams in areas like display, optics, manufacturing, and go-to-market, the specific use cases and trade-offs remain quite different and aren't likely to converge in the near term.
Thomas Luo: Any lessons learned from smartphone players, or is that not worth learning from?
Kenan Jia: There are interesting learnings from phones and other consumer electronics. I previously worked on smart displays like Portal.
Looking at other form factors like smartwatches or smart displays, we've learned that standalone devices need to provide significant value beyond what phones can do. People can already do so much with their phones, so you either need to target a niche market deeply (like creators) or provide compelling general market value through combined features like AI and other use cases.
Thomas Luo: You mentioned it's hard for single-feature AI devices to get people to pick them up again. Can we talk more about other AI wearables or portable devices? We see things like AI necklaces for video recording, Rabbit R1, and Plaud for meeting summaries - each solving a single problem perfectly. How do you view these devices, given that Ray-Ban Meta isn't an AI-only or single-feature device?
Kenan Jia: I think the problem isn't just about being single-feature. The challenge with many AI devices is finding that sharp use case with enough value to make people remember to pick it up. We see many concepts with good marketing, but the question remains: why pick up that device over a phone that can run any model and is optimized for multimodal use?
I've recently bought other non-AI single-purpose devices that I love and use frequently. For example, reMarkable, a Norwegian E-ink note device. It's expensive but has sold millions because it perfectly serves people who prefer digital minimalism over tablet note-taking. Similarly, I have a Freewrite E-ink typewriter. These devices do one thing exceptionally well and are truly optimized for specific users and scenarios.
That's the dilemma for many AI-forward devices - they're neither single-feature focused enough nor multi-purpose enough. They're in an awkward in-between situation where people question why they need another device. It's an interesting stage where people are trying different form factors and use cases. We'll see what works. I believe glasses are one of the most promising form factors.
Thomas Luo: Let's talk about the future of AI/AR-enabled glasses. Do you know how many such devices are being manufactured in China now?
Kenan Jia: Definitely more than 20.
Thomas Luo: How do you view this heated-up market category? What's the key competence needed? Like Tesla was a breakthrough in autonomous driving and EVs, but China has 5-6 competitive brands performing well. How do you view this competition?
Kenan Jia: We're at an interesting stage where the category is exploding with interest from consumers, brands, and manufacturers. It's positive - different players will try various form factors, designs, and use cases. However, regardless of company size or location, the challenge is that this is an integrated device requiring optimization across hardware, software, and go-to-market.
If you're software-only, it's hard to optimize for the new form factor since we haven't stabilized the glasses architecture. If you're hardware-only without owning the model side, it's difficult to optimize for the challenges we discussed. That's why you see big companies entering who can optimize all these aspects, including unique brands and channels for sales, returns, and retail demos.
Thomas Luo: It's the combination of hardware, software, and go-to-market. What's Meta's advantage? The hardware part is competitive, and with Llama, the software side is similar. How do you evaluate Meta's hardware DNA, since Meta hasn't been well-known for hardware?
Kenan Jia: The partnership with EssilorLuxottica really helps in this category. While we are a software company with leading models, Reality Labs has been making hardware for 10 years - I've been there over 6 years. So Meta now has a lot of experience in consumer electronics.
What works well with EssilorLuxottica and Ray-Ban is their expertise in industrial design, making fashionable products people love. We've learned a lot from them about channels - we understand CE channels from Meta Quest VR, but glasses are different as they're medical devices. We've learned about lenses, transitions, clear coating, purchase flows, branding, and design. We're reinventing ourselves, especially with glasses, while doing well in VR at the same time.
Thomas Luo: Looking to the future, what are your most interesting imaginations about AI hardware products, beyond Ray-Ban Meta?
Kenan Jia: I'm excited to see different form factors in various scenarios - education, museums, interactive learning and creation through robotics or toys. While general-purpose devices are challenging, people will try different approaches, both personalized products and ambient solutions. For example, how do you create an interactive museum tour guide that also understands personal context? I'm excited to see different form factors landing in various usage scenarios like education, entertainment, and travel. These innovations will fundamentally change how we live and work, and I’m very excited to be part of the future.