
Unity aims to cut AI training time and budgets with synthetic datasets

Published on April 19th, 2021

Unity, a platform for creating and operating real-time 3D (RT3D) content, has launched Unity Computer Vision Datasets, an offering that aims to reduce the cost of developing computer vision applications and to speed up AI training for the manufacturing, retail, and security industries.

Computer vision solution providers can now purchase bespoke datasets for their artificial intelligence (AI) training needs while maintaining strict privacy and regulatory standards.

Synthetic data is valuable because it can be generated to meet specific needs or conditions that are not covered by existing (real) data. This is useful in numerous cases, such as when privacy requirements limit data availability or dictate how data can be used. One common use for synthetic data is testing a pre-release product, when data either does not exist or is not available to the testers.

Training data is also a major requirement for machine learning algorithms. However, especially in the case of self-driving cars, such data is expensive to collect in the real world. With today’s launch of Unity Computer Vision Datasets, Unity says cost is no longer a barrier to obtaining high-quality synthetic datasets that can accelerate AI and machine learning training.

“By creating a synthetic version of datasets that mirror validated privacy rules and accurately reflect real-world data, we enable these groundbreaking datasets to get into the hands of more innovators,” says Dr. Danny Lange, SVP of artificial intelligence & machine learning, Unity.

“Essentially, these datasets empower companies to plan for and simulate scenarios they haven’t yet experienced, with a sizable increase in user data that mimics what they’d find over time in the real world. As a result, we’re seeing smarter indoor environments, such as cashierless grocery stores, and more as our customers discover new applications.”

Unity’s Computer Vision Datasets make use of a technique known as “domain randomisation” to create diverse datasets that improve quality and control bias in applications. The process outputs permutations of how objects of interest are positioned and orientated, including variations in lighting and camera angles, as well as the countless possible configurations of the Unity environment.
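As a rough illustration of the idea behind domain randomisation, the Python sketch below samples random scene configurations by varying object position, orientation, lighting, and camera angle. It is not Unity's implementation or API: the names SceneConfig, sample_scene, and generate_configs, and all of the value ranges, are hypothetical and chosen only for illustration. In a real pipeline each sampled configuration would be passed to a renderer to produce a labelled image.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SceneConfig:
    """One randomised scene. All field names are hypothetical."""
    position: Tuple[float, float, float]      # object position (x, y, z), arbitrary units
    rotation_deg: Tuple[float, float, float]  # object orientation (yaw, pitch, roll)
    light_intensity: float                    # relative brightness of the scene light
    camera_azimuth_deg: float                 # camera angle around the object
    camera_elevation_deg: float               # camera angle above the ground plane


def sample_scene(rng: random.Random) -> SceneConfig:
    """Draw every scene parameter independently from an assumed range."""
    return SceneConfig(
        position=(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0), rng.uniform(0.0, 0.5)),
        rotation_deg=(rng.uniform(0.0, 360.0), rng.uniform(0.0, 360.0), rng.uniform(0.0, 360.0)),
        light_intensity=rng.uniform(0.2, 2.0),
        camera_azimuth_deg=rng.uniform(0.0, 360.0),
        camera_elevation_deg=rng.uniform(10.0, 80.0),
    )


def generate_configs(count: int, seed: int = 0) -> List[SceneConfig]:
    """Return `count` randomised scenes; a fixed seed makes the set reproducible."""
    rng = random.Random(seed)
    return [sample_scene(rng) for _ in range(count)]


if __name__ == "__main__":
    for config in generate_configs(3):
        print(config)
```

Seeding the generator is a deliberate choice in this sketch: the same seed reproduces the same set of scenes, which makes a generated dataset repeatable.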

Unity’s synthetic datasets also avoid the privacy pitfalls and uncontrolled biases that arise from processes that often include images of real people and places scraped off the Internet or manually captured from the real world using labour-intensive operations.

With real-world data, the price of annotation increases with the complexity of the annotation type. Unity offers one low price for any label type, so customers pay the same price for simple and complex industry-standard label types such as 2D and 3D bounding boxes, class segmentation, or instance segmentation.

Datasets are available for purchase under a tiered pricing model in which the price per image decreases as the volume of synthetic images ordered increases.

“Synthetic data is revolutionising the training of machine learning models as it overcomes many of the shortcomings of manually collected and labelled real-world data,” adds Lange. “Exploring what’s possible, and connecting creators with the affordable data they need to make the right decisions continues to drive Unity, no matter the industry. This is why our team will be available to assist customers in ensuring that the datasets produced meet the right criteria for their needs.”
