Unlock the full potential of AI models with Databricks’ groundbreaking Test-time Adaptive Optimization (TAO) technique, which harnesses synthetic training data and reinforcement learning to boost performance without relying on clean, labeled data.
Boosting AI Model Performance with Synthetic Training Data
Databricks’ latest innovation allows customers to improve the performance of their AI models without relying on clean, labeled data.
The Problem with Dirty Data
Dirty data is a major challenge for businesses looking to deploy reliable AI models. Jonathan Frankle, chief AI scientist at Databricks, notes that ‘nobody shows up with nice, clean fine-tuning data that you can stick into a prompt or an application programming interface’ for a model.
Combining Reinforcement Learning and Synthetic Training Data
Databricks’ technique exploits the idea of ‘best-of-N’: given enough attempts, even a relatively weak model will often produce at least one strong answer on a given task or benchmark. The company trained a separate model to predict, based on examples, which best-of-N result human testers would prefer. Scoring candidate outputs with this preference model and keeping the winners created synthetic training data for further fine-tuning the model.
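A minimal sketch of the best-of-N idea described above: sample several candidate answers, let a learned preference model pick the best one, and keep the resulting prompt–answer pairs as synthetic fine-tuning data. The `generate` and `reward_model.score` helpers are hypothetical placeholders, and this is an illustration of the general idea rather than Databricks’ actual implementation:

```python
# Hypothetical sketch of best-of-N sampling with a learned preference model.
# `generate` and `reward_model.score` are placeholder callables, not a real API.

def best_of_n(prompt: str, generate, reward_model, n: int = 16) -> str:
    """Sample N candidate answers and keep the one the preference model scores highest."""
    candidates = [generate(prompt) for _ in range(n)]
    scores = [reward_model.score(prompt, c) for c in candidates]
    best_index = max(range(n), key=lambda i: scores[i])
    return candidates[best_index]

def build_synthetic_dataset(prompts, generate, reward_model):
    """Pair each prompt with its best-of-N answer to form synthetic fine-tuning examples."""
    return [(p, best_of_n(p, generate, reward_model)) for p in prompts]
```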
Synthetic training data refers to artificially generated data that mimics real-world scenarios.
This type of data is used to train artificial intelligence and machine learning models, enabling them to learn from diverse and realistic examples.
Synthetic data can be created using various techniques, such as generative adversarial networks (GANs) or automated data generation tools.
It offers numerous benefits, including improved model accuracy, increased data diversity, and reduced costs associated with collecting and labeling real-world data.
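As a simple illustration of automated synthetic data generation, the sketch below builds labeled question/answer pairs from templates instead of collecting and hand-labeling real examples. The templates and fields are invented for illustration and are not a Databricks tool or dataset:

```python
import random

# Toy template-based synthetic data generator: every example is created
# programmatically, so the label is known by construction.
TEMPLATES = [
    ("What is {a} plus {b}?", lambda a, b: str(a + b)),
    ("What is {a} times {b}?", lambda a, b: str(a * b)),
]

def make_synthetic_examples(n=5, seed=0):
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        question, answer_fn = rng.choice(TEMPLATES)
        a, b = rng.randint(1, 99), rng.randint(1, 99)
        examples.append({"prompt": question.format(a=a, b=b),
                         "label": answer_fn(a, b)})
    return examples

for example in make_synthetic_examples():
    print(example)
```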
Test-time Adaptive Optimization (TAO)

Databricks calls its new approach Test-time Adaptive Optimization (TAO). By using some relatively lightweight reinforcement learning, TAO ‘basically bakes the benefits of best-of-N into the model itself,’ Frankle says. The method has been tested on FinanceBench, a benchmark that measures how well language models answer financial questions.
Test-time adaptive optimization is a technique used in machine learning to adapt model parameters for improved performance on specific input data.
This approach involves modifying the model's behavior during inference, allowing it to adjust to changing conditions and optimize its output.
By doing so, test-time adaptive optimization can enhance model accuracy, reduce computational costs, and improve overall efficiency.
Studies have shown that this technique can result in up to 10% improvement in model performance on certain tasks.
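TAO’s exact training recipe is not public; the sketch below uses a DPO-style preference loss as a stand-in for the ‘relatively lightweight reinforcement learning’ Frankle describes, showing how a preferred (best-of-N) answer and a rejected answer can be turned into a fine-tuning signal. The log-probability values and the `beta` parameter are illustrative assumptions, not Databricks’ numbers:

```python
import math

# DPO-style preference loss (a stand-in, not Databricks' actual TAO recipe):
# compare how much the current model favors the preferred answer over the
# rejected one, relative to a frozen reference model.

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Lower loss when the model prefers the chosen answer more strongly than the reference does."""
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))  # -log(sigmoid(beta * margin))

# Example with made-up log-probabilities: the fine-tuned model assigns a higher
# relative log-probability to the answer the preference model ranked best.
print(dpo_loss(logp_chosen=-12.0, logp_rejected=-15.0,
               ref_logp_chosen=-13.0, ref_logp_rejected=-14.0))
```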
Real-World Applications
TAO can be used to boost the performance of AI models without relying on clean labeled data. This is particularly useful for companies looking to deploy agents, such as those used in finance or health insurance. Databricks tested TAO on a customer’s health-tracking app and saw significant improvements in reliability.
Expert Validation
Christopher Amato, a computer scientist at Northeastern University, notes that ‘the general idea is very promising’ and that the lack of good training data is a big problem. He agrees that the TAO method could allow for more scalable data labeling and improved performance over time. However, Amato also cautions that reinforcement learning can sometimes behave in unpredictable ways, requiring careful use.
Reinforcement learning is a subfield of machine learning that involves training agents to take actions in an environment to maximize rewards.
It's based on trial and error, where the agent learns from its interactions with the environment.
Q-learning and SARSA are two popular algorithms used for reinforcement learning.
The goal is to find the optimal policy that maximizes cumulative rewards.
Reinforcement learning has applications in robotics, game playing, and autonomous vehicles.
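As a concrete illustration of the trial-and-error loop described above, here is a minimal tabular Q-learning sketch on a toy five-state corridor. The environment, reward, and hyperparameters are invented for illustration and are unrelated to Databricks’ work:

```python
import random

# Minimal tabular Q-learning on a toy corridor: the agent starts at state 0
# and earns a reward of 1.0 for reaching state 4.
N_STATES, ACTIONS = 5, [-1, +1]          # move left or right
alpha, gamma, epsilon = 0.1, 0.9, 0.2    # learning rate, discount, exploration rate
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for episode in range(300):
    state = 0
    while state != N_STATES - 1:
        # Epsilon-greedy action selection: explore occasionally, otherwise exploit.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state = min(max(state + action, 0), N_STATES - 1)
        reward = 1.0 if next_state == N_STATES - 1 else 0.0
        # Q-learning update: move Q toward reward plus discounted best future value.
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state

# After training, the learned policy should move right from every non-terminal state.
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)})
```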
Conclusion
Databricks’ TAO technique offers a promising solution for businesses looking to deploy reliable AI models without relying on clean labeled data. By combining reinforcement learning with synthetic training data, TAO shows promise for improving language models and has real-world applications in industries such as finance and health insurance.