DiffuseDrive builds photorealistic imagery such as this from real-world data sets. Source: DiffuseDrive

Robots and artificial intelligence need copious amounts of data to train on, and if that data is synthetic, it needs to be as realistic as possible. Capturing real-world data can be expensive and time-consuming, while simulation-based data has typically come from game engines, leaving sim-to-real gaps. DiffuseDrive Inc. claimed that its generative AI platform evaluates existing data, identifies what is missing, and uses proprietary diffusion models to create photorealistic data.

Balint Pasztor, an engineer, and Roland Pinter, a physicist, founded DiffuseDrive in 2023 after meeting at Bosch. They then relocated the company from Hungary to San Francisco.

“We previously worked on Level 4 autonomous driving for Porsche,” Pasztor told The Robot Report. “Data scarcity is the missing piece to solving the puzzle of physical AI, which spans manufacturing, monitoring, agriculture, and aerospace.”

DiffuseDrive co-founders: CTO Roland Pinter (left) and CEO Balint Pasztor (right).

AI needs data specific to the domain

“Industry has been using the same models since the early 2010s, and automakers and robotics developers don’t have enough realistic data covering their operational design domains,” said Pasztor, who is now CEO of DiffuseDrive.

“Synthetic data from simulations wasn’t realistic enough for safety or mission-critical functions,” he added. “We needed AI-generated data that was indistinguishable from real life.”

Even at this year’s IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), people in the space scored only 50% when trying to tell the generated images from real ones, he recalled. “They were just guessing,” Pasztor said.

Commercial robotics applications require large amounts of relevant data. Self-driving vehicles and item recognition for e-commerce picking have known and growing data sets, but automation can flexibly serve many more applications, provided it is properly trained.

DiffuseDrive identifies, understands gaps to fill

DiffuseDrive can bridge the simulation-to-reality gap by generating suggestions based on business logic, explained Pasztor. This allows it to create relevant data sets in days rather than months or years, he asserted.

“Engines like GPT or DALL-E can generate models, but you need a quality assurance [QA] layer like DiffuseDrive,” he said. “The QA layer is built on the application or use case from aerospace, etc., and the reasoning model understands what has already been presented.”

DiffuseDrive uses both classical and new methods of statistical analysis to contextually understand existing data and build out data points, similar to a point cloud, Pasztor said.

“We use a separate system to understand what clients already have, essentially building a decision tree,” he said. “For example, for Level 2 autonomous driving, we built a heat map of parking scenarios and object location distribution. DiffuseDrive then identified that it was missing large and close items at certain times. By getting to a wider distribution of data, we improved performance by 40%.”
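
The kind of gap analysis Pasztor describes can be pictured with a small sketch. The example below is not DiffuseDrive's method, which the company has not published. It simply bins labeled objects by size and distance and flags the cells with too few samples. The function name `find_coverage_gaps`, the bin edges, the threshold, and the sample values are all assumptions made for illustration.

```python
import numpy as np

def find_coverage_gaps(sizes_m, distances_m, size_edges, dist_edges, min_count=1):
    """Count samples per (size, distance) cell and list the under-filled cells.

    Hypothetical helper: a 2D histogram stands in for the "heat map" of
    object distribution mentioned in the article.
    """
    counts, _, _ = np.histogram2d(sizes_m, distances_m, bins=[size_edges, dist_edges])
    gaps = []
    for i, j in np.argwhere(counts < min_count):
        # Report each sparse cell as ((size_lo, size_hi), (dist_lo, dist_hi)).
        gaps.append(((size_edges[i], size_edges[i + 1]),
                     (dist_edges[j], dist_edges[j + 1])))
    return counts, gaps

# Made-up object annotations: size in meters, distance from the vehicle in meters.
sizes = np.array([0.5, 0.6, 0.4, 1.8, 2.5, 0.7])
distances = np.array([1.0, 6.0, 12.0, 7.0, 15.0, 2.5])

size_edges = [0.0, 1.0, 3.0]         # small, large
dist_edges = [0.0, 5.0, 10.0, 20.0]  # close, mid-range, far

counts, gaps = find_coverage_gaps(sizes, distances, size_edges, dist_edges)
print(counts)
print(gaps)  # the only empty cell is large objects at close range
```

In this toy data set, the empty cell corresponds to large, close objects, the same kind of gap cited in the parking example. A generation step would then target those cells to widen the distribution.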

Customers control the ODD data

At the same time, DiffuseDrive does not develop domain expertise. Instead, the company digests its customers’ documentation and real-world operational design domain (ODD) data.

“They’re the domain experts and are in control in terms of generating their requirements,” said Pasztor. “They don’t want anyone to take over their jobs but want us to augment them.”

Once it has the basic data, DiffuseDrive uses semantic segmentation, contextual and visual labeling, as well as 2D and 3D bounding boxes. “Every time they generate images, the data-point map fills up, not just filling gaps but also expanding ODD knowledge,” Pasztor said.

Customers control their domain data, which is then rapidly analyzed for gaps. Source: DiffuseDrive.

DiffuseDrive sees market opportunities

The global market for AI in robotics could experience a compound annual growth rate of 38.5%, expanding from $12.77 billion in 2023 to $124.77 billion by 2030, according to Grand View Research.
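
As a quick check, the cited growth rate follows from the two market figures. The sketch below assumes seven compounding years (2023 to 2030), and the function name `cagr` is chosen here for illustration.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1.0 / years) - 1.0

# Figures from the Grand View Research estimate, in billions of dollars.
rate = cagr(12.77, 124.77, 2030 - 2023)
print(f"{rate:.1%}")  # prints 38.5%
```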

“Our vision is to eventually have every autonomous system use DiffuseDrive data — it could be an enterprise or an individual’s project,” said Pasztor. “We decided to build on our experience with cars and drones, since autonomous vehicles still need a lot of data, and most companies don’t have the scale of Tesla.”

DiffuseDrive is onboarding its third wave of customers, following drone pilots and then autonomous driving and security monitoring. They include AISIN, Continental, and Denso. The company said it also sees potential in defense, warehousing, construction, and agriculture.

“At CVPR, we spoke with 50 potential customers from the Fortune 500, several of which are producing not only autonomous systems but also stationary ones like industrial robots,” Pasztor said. “Healthcare people were also interested in closing the data loop.”

In May, DiffuseDrive raised $3.5 million in seed funding, adding to $1 million it previously received from E2VC. It also appointed Jordan Kretchmer, a senior partner at Outlander VC and co-founder of Rapid Robotics Inc., to its board.

“Jordan has experience in robotics investment, and our thesis is to be industry-agnostic, from manufacturing applications like QA all the way to household picking robots,” Pasztor said. “Realistic imagery should spread quickly between different verticals, as we’re learning from everyone. The differentiator is not the synthetic data anymore; it’s creating the data engine.”

“As my co-founder says, ‘Software is developed iteratively, so why isn’t data?’” he concluded.



The post DiffuseDrive addresses data scarcity for robot and AI training appeared first on The Robot Report.
