Developing robust audio-video communication for hundreds of simultaneous users navigating freely in the 3D environments of the bublr.co metaverse.
Bublr users roam freely. They can be talking to a single user or attending a meeting with many others, and the transition from one situation to the other is seamless. After studying the different communication architectures, we opted for an SFU (Selective Forwarding Unit) architecture, which allows more flexibility despite the higher load on the download stream.
SFU architecture
With an SFU, each user sends a single stream to the server, no matter how many users are around them. In return, each user receives one stream from every user around them. Depending on the quality of their network, a user receives a low- or high-quality stream. The conversion to the different quality levels is done by the server.
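The stream counts described above can be sketched as simple arithmetic. This is a minimal illustration, not Bublr's or 100ms.live's code: the names `StreamLoad`, `mesh_load`, `sfu_load`, and `pick_quality` are hypothetical, the peer-to-peer mesh is included only as a point of comparison, and the 500 kbps threshold is an arbitrary assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamLoad:
    uploads: int    # streams one user sends
    downloads: int  # streams one user receives


def mesh_load(nearby_users: int) -> StreamLoad:
    """Peer-to-peer mesh (for comparison): one upload and one download per nearby user."""
    return StreamLoad(uploads=nearby_users, downloads=nearby_users)


def sfu_load(nearby_users: int) -> StreamLoad:
    """SFU: a single upload to the server, one download per nearby user."""
    return StreamLoad(uploads=1, downloads=nearby_users)


def pick_quality(downlink_kbps: float, threshold_kbps: float = 500.0) -> str:
    """Choose the stream quality for a receiver.

    The threshold is an assumed value for illustration only.
    """
    return "high" if downlink_kbps >= threshold_kbps else "low"


if __name__ == "__main__":
    for n in (1, 9, 49):
        print(n, "nearby users -> mesh:", mesh_load(n), "sfu:", sfu_load(n))
    print("quality at 300 kbps:", pick_quality(300))
```

The upload side stays constant with an SFU, while the download side still grows with the number of nearby users, which is the extra stress on the download stream mentioned earlier.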
We narrowed down the most common situations for future Bublr users:
We ended up working with 100ms.live, a young Indian startup. We were one of their early customers and helped them fine-tune their API and overall solution along the way.
Their SFU technology is based on Pion-Ion, a "Real-Distributed RTC System by pure Go and Flutter," which is somewhat less compute-intensive and cheaper than other options on the market.
100ms.live was not the fastest provider on the market, which also includes Agora, Twilio, and Jitsi, but it was close enough at a lower price. Being able to communicate directly with the team was also a real plus for us.
Stress testing
To compete in this challenging market, we also needed to support a large number of users on our platform. The first tests topped out at 20 users, which was not sufficient. Commercially, we needed at least 50, and ideally 100.
We worked on three axes:
- optimizing the video signal
- adjusting the maximum distance of audio and video
- optimizing the 3D rendering

Optimizing the video signal
After analysis, we realized that we were sending a lot of unnecessary image data: large backgrounds, a high frame rate, and a high resolution.
We ended up cutting bandwidth by 65% with equivalent results.
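To give a feel for how resolution and frame rate drive the size of a video signal, here is a rough back-of-the-envelope sketch. It uses the raw pixel rate (width × height × fps) as a proxy for bandwidth, which is only an approximation because real encoders do not scale linearly. The function names and the before/after settings are hypothetical and are not the values used at Bublr.

```python
def pixel_rate(width: int, height: int, fps: int) -> int:
    """Raw pixels per second, used here as a rough proxy for bandwidth."""
    return width * height * fps


def relative_saving(before: tuple[int, int, int], after: tuple[int, int, int]) -> float:
    """Fraction of the pixel rate saved when going from `before` to `after`.

    Each tuple is (width, height, fps).
    """
    return 1 - pixel_rate(*after) / pixel_rate(*before)


if __name__ == "__main__":
    # Illustrative settings only (assumptions, not Bublr's actual configuration).
    before = (640, 480, 30)
    after = (480, 360, 20)
    print(f"pixel rate saved: {relative_saving(before, after):.1%}")  # prints 62.5%
```

Even modest reductions on each axis compound, since the pixel rate is the product of all three values.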
Adjusting the maximum distance of audio and video

We adjusted the maximum distance beyond which you stop seeing and hearing other users and we display their avatar picture instead. This is set either manually for specific scenes or automatically on the fly.
This saved a significant amount of bandwidth and also reduced the computation needed for the 3D scene.
Throughout development, we ran tests in real conditions, both with real users located around the world and, when we could gather enough people, with bots.
We reached decent performance with up to 80 simultaneous users.
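The distance cutoff described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual Bublr implementation: `User`, `partition_by_distance`, and `auto_max_distance` are hypothetical names, and the automatic rule (shrink the radius so that at most `max_live` users keep live audio and video) is just one plausible way to adjust the distance on the fly.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    x: float
    y: float
    z: float


def distance(a: User, b: User) -> float:
    """Euclidean distance between two users in the 3D scene."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


def partition_by_distance(me: User, others: list[User], max_distance: float):
    """Split other users into those with live audio/video and those shown as an avatar picture."""
    live, avatar_only = [], []
    for other in others:
        target = live if distance(me, other) <= max_distance else avatar_only
        target.append(other.name)
    return live, avatar_only


def auto_max_distance(me: User, others: list[User], max_live: int, default_distance: float) -> float:
    """Assumed automatic rule: shrink the radius so about `max_live` users stay live.

    Requires max_live >= 1. Users at exactly the same distance may push the
    count slightly above `max_live`.
    """
    distances = sorted(distance(me, other) for other in others)
    if len(distances) <= max_live:
        return default_distance
    return min(default_distance, distances[max_live - 1])


if __name__ == "__main__":
    me = User("me", 0, 0, 0)
    others = [User("a", 3, 0, 0), User("b", 0, 4, 0), User("c", 10, 0, 0)]
    print(partition_by_distance(me, others, max_distance=5.0))
    print(auto_max_distance(me, others, max_live=1, default_distance=5.0))
```

Users in the second list would not need a subscription to their audio or video stream at all, which is where the bandwidth and rendering savings come from.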