Auto XGBoost: Effortless Integration of Advanced Modeling in Production
Yuval Ben Dror, Data Science Researcher, Earnix & Yitzhak Yahalom, Senior Data Science Researcher, Earnix
March 3, 2025

Earnix Analytical and Technical Blog Series
How can insurers and banks tackle today’s toughest analytical challenges? At Earnix, we believe it starts with asking the right questions, challenging assumptions, and finding better ways forward.
In this blog series, we explore key issues in financial analytics—addressing complex problems, improving models, and staying competitive. Our first post covered Model Analysis, and you can check it out here.
These technical posts are designed for professionals in actuarial science, data science, and analytics, with a focus on clarity and practical insights. The second topic of the series, which we will cover today, is Auto-XGBoost. Let’s get started!
Introduction: The Rise of the Trees
In the past, Generalized Linear Models (GLMs) were the undisputed kings of insurance modeling. They were loved for many reasons, but the truth is that they simply didn't have any competition: limited data availability and minimal computational power made advanced modeling impractical. In recent years, however, the world of ML has changed dramatically. Increased data availability and the rise of cloud computing gave rise to some of the most accurate and robust machine learning algorithms. Many of those algorithms, especially in the world of tabular data, are based on decision trees.

Decision trees are a type of supervised learning model used for both classification and regression tasks. They work by splitting data into smaller and smaller subsets based on feature values, forming a tree structure where each internal node represents a decision based on a feature, and each leaf node represents a final prediction. Their simplicity and interpretability make them popular, but they are prone to overfitting, especially when trees become deep and complex. Additionally, they can be sensitive to small changes in the data, leading to high variance and poor generalization on unseen data.
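To make the splitting idea concrete, here is a minimal, illustrative sketch (not any production implementation) of how a single regression-tree node picks a split: it tries each observed value of a feature as a threshold and keeps the one that minimizes the squared error of predicting the mean on each side.

```python
# Illustrative sketch: find the best single split on one numeric feature
# by minimizing the summed squared error of the two leaf means.

def split_sse(xs, ys, threshold):
    """Sum of squared errors if we split xs at `threshold` and
    predict the mean of y on each side."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    sse = 0.0
    for side in (left, right):
        if side:
            mean = sum(side) / len(side)
            sse += sum((y - mean) ** 2 for y in side)
    return sse

def best_split(xs, ys):
    """Try every observed value as a candidate threshold; return the best."""
    return min(set(xs), key=lambda t: split_sse(xs, ys, t))

xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 5.2, 4.8, 20.0, 19.5, 20.5]
print(best_split(xs, ys))  # → 3, cleanly separating the low group from the high one
```

A full tree simply repeats this search recursively inside each resulting subset, which is also why deep trees can chase noise and overfit.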
XGBoost
To address these limitations, boosting was introduced as a technique to combine multiple weak learners—usually shallow decision trees—into a stronger model. Boosting works iteratively, where each new tree corrects the errors made by the previous ones.
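The iterative error-correction loop can be sketched in a few lines. This is a toy gradient-boosting routine for squared error, not XGBoost itself: each round fits a one-split decision stump to the residuals of the current ensemble and adds it with a learning rate.

```python
# Toy gradient boosting for squared error: each weak learner is a
# one-split decision stump fitted to the current residuals.

def fit_stump(xs, residuals):
    """Find the threshold whose two leaf means best fit the residuals."""
    def sse(t):
        total = 0.0
        for side in ([r for x, r in zip(xs, residuals) if x <= t],
                     [r for x, r in zip(xs, residuals) if x > t]):
            if side:
                m = sum(side) / len(side)
                total += sum((r - m) ** 2 for r in side)
        return total
    t = min(set(xs), key=sse)
    left = [r for x, r in zip(xs, residuals) if x <= t]
    right = [r for x, r in zip(xs, residuals) if x > t]
    lmean = sum(left) / len(left) if left else 0.0
    rmean = sum(right) / len(right) if right else 0.0
    return lambda x: lmean if x <= t else rmean

def boost(xs, ys, n_rounds=20, learning_rate=0.3):
    stumps = []
    pred = [0.0] * len(xs)
    for _ in range(n_rounds):
        # Each new stump is trained on what the ensemble still gets wrong.
        residuals = [y - p for y, p in zip(ys, pred)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        pred = [p + learning_rate * stump(x) for p, x in zip(pred, xs)]
    return lambda x: sum(learning_rate * s(x) for s in stumps)

model = boost([1, 2, 3, 10, 11, 12], [5.0, 5.2, 4.8, 20.0, 19.5, 20.5])
```

After a handful of rounds the ensemble's predictions converge toward the targets, even though no single stump is a good model on its own.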
XGBoost (Extreme Gradient Boosting) builds on this idea with several optimizations, such as regularization techniques to control overfitting, parallelization to improve computation speed, and handling sparse data more efficiently. These improvements allow XGBoost to achieve high performance, making it one of the most popular machine learning algorithms in practice today.
Released in 2014, XGBoost is an open-source library. Alongside LightGBM and CatBoost, it's one of the most renowned tree-based modeling frameworks. In fact, according to Google searches, it's nearly twice as popular as both of them combined.

Much of XGBoost's popularity over the years can be attributed to its role in helping numerous participants in machine learning competitions secure victories. Some of its benefits compared to other tree-based models:
Regularization: XGBoost includes L1 (Lasso) and L2 (Ridge) regularization, reducing overfitting by controlling the complexity of the model.
Parallelization: XGBoost supports parallel computation, making it significantly faster during training compared to traditional gradient boosting or random forests.
Handling Missing Data: XGBoost automatically handles missing values, determining the best path for missing data points during tree construction, unlike some models that require preprocessing.
Pruning: XGBoost grows trees up to a configured maximum depth and then prunes backward, keeping only the splits whose loss reduction clears a threshold (the gamma parameter). This leads to more efficient and interpretable trees compared to models like random forests, which grow full trees.
Custom Objective Functions: XGBoost allows for custom objective functions, giving users flexibility to optimize different loss functions beyond standard regression or classification loss.
Memory Efficiency: XGBoost is optimized for memory usage, especially with sparse data formats, which allows it to handle larger datasets with greater efficiency.
Flexibility and Control: It offers a rich set of hyperparameters for fine-tuning model performance, providing more granular control over how trees are constructed and how the boosting process is managed.
Robustness to Outliers: With careful tuning, XGBoost can be more robust to outliers, as its regularization helps mitigate the influence of extreme values.
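The regularization and pruning points above can be seen directly in XGBoost's textbook formulas. The sketch below computes the closed-form optimal leaf weight, w* = -G / (H + lambda), and the gain of a candidate split (where G and H are the sums of gradients and hessians in a leaf); it is an illustration of the published formulas, not the library's internal code.

```python
# XGBoost's regularized leaf weight and split gain (textbook formulas).
# G = sum of gradients in the leaf, H = sum of hessians.

def leaf_weight(G, H, lam):
    """Optimal leaf weight: w* = -G / (H + lambda).
    A larger lambda (L2 regularization) shrinks the weight toward zero."""
    return -G / (H + lam)

def split_gain(GL, HL, GR, HR, lam, gamma):
    """Gain of a candidate split; splits with non-positive gain are pruned.
    gamma is the minimum loss reduction required to keep the split."""
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma

# Without regularization this split looks worthwhile...
print(split_gain(GL=-4.0, HL=3.0, GR=4.0, HR=3.0, lam=0.0, gamma=0.0))   # positive
# ...but L2 regularization plus gamma prunes it away.
print(split_gain(GL=-4.0, HL=3.0, GR=4.0, HR=3.0, lam=10.0, gamma=3.0))  # negative
```

This is exactly how the regularization terms control model complexity: they shrink leaf weights and raise the bar a split must clear to survive pruning.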
Advanced Modeling in Earnix
So, why not just deploy the Python XGBoost framework directly into production? In practice, production systems often can't run Python scripts, and those that can may suffer significant performance issues. At Earnix, for example, while Python scripts can be leveraged for automation, they aren't executed online.
Over the years, Earnix has added support for various ML platforms like H2O and DataRobot. However, advanced modeling was still limited to these specific platforms.
Implementing XGBoost in production through these platforms was cumbersome, requiring extensive work and overhead. And even where Python integration is possible, building the model itself can be a complex process with many parameters to define, necessitating domain knowledge and coding skills. This is where the Auto XGBoost Lab comes into play.
Earnix Labs - Driving Innovation at Earnix
How do new ideas go from “Wouldn’t it be cool if…?” to real tools that insurers and banks can use? At Earnix, we explore those questions in Earnix Labs—our space for testing fresh ideas and early-stage features. For more information, refer to our previous blog.
Leveraging ONNX
One of the main technological advancements we leverage in this new app is ONNX, a model format that was added to Price-It in version 12.0. Without delving into technicalities, think of ONNX as a universal translator for ML models across different platforms and hardware. It's a library that offers great flexibility in creating platform-independent, graph-based models. Over the years, it has gained popularity, leading to numerous pre-built ONNX conversion libraries for a wide array of advanced ML models.

Before ONNX was integrated into Earnix systems, using XGBoost with Price-It was infeasible. However, the availability of ONNX paved the way for the creation of the Auto XGBoost Lab. This new app completely abstracts not only the coding of XGBoost but also, perhaps more importantly, the complexities of converting it to ONNX for production use.
Auto XGBoost: How does it work?
The Auto XGBoost Lab serves as a tool designed to streamline the process of building and deploying XGBoost models, specifically tailored for insurers and banks. By providing an intuitive user interface, automatic hyperparameter selection, and automatic ONNX conversion, the lab enables users to create XGBoost models seamlessly within the Earnix Price-It platform. Below, we detail the various functionalities of the lab that facilitate ease of use and enhance the modeling experience:
User-Friendly Interface: The design of the Auto XGBoost Lab emphasizes usability. The user-friendly interface enables professionals, regardless of their technical background, to harness advanced modeling without delving deeply into the intricacies of coding or algorithm optimization. This accessibility promotes broader adoption of machine learning techniques across organizations, facilitating a data-driven culture and empowering teams to make informed decisions.
Hyperparameter Search Configuration: The ability to tune a model is essential for optimizing performance. The lab allows users to configure hyperparameter searches with ease, incorporating sophisticated validation techniques. Notably, the lab leverages random search to optimize multiple hyperparameters simultaneously.
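A random search is conceptually simple: sample candidate configurations from the search space and keep the best-scoring one. The sketch below uses pure Python with hypothetical parameter ranges and a stand-in objective; the Lab's actual search space and validation scheme are not documented here.

```python
# Minimal random hyperparameter search sketch (hypothetical ranges).
import random

SEARCH_SPACE = {
    "max_depth": lambda: random.randint(2, 8),
    "learning_rate": lambda: 10 ** random.uniform(-2, -0.5),
    "n_estimators": lambda: random.randint(50, 500),
    "reg_lambda": lambda: 10 ** random.uniform(-2, 1),
}

def random_search(evaluate, n_trials=20, seed=0):
    """Sample n_trials configurations; return the best (score, params)."""
    random.seed(seed)
    best = None
    for _ in range(n_trials):
        params = {name: draw() for name, draw in SEARCH_SPACE.items()}
        score = evaluate(params)  # e.g. validation RMSE -- lower is better
        if best is None or score < best[0]:
            best = (score, params)
    return best

# Stand-in objective: pretend the ideal depth is 5 and learning rate is 0.1.
def toy_eval(p):
    return abs(p["max_depth"] - 5) + abs(p["learning_rate"] - 0.1)

score, params = random_search(toy_eval)
```

In a real setup, `evaluate` would train an XGBoost model with the sampled parameters and return its validation error; random search's appeal is that it explores many hyperparameters at once without an exhaustive grid.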

Seamless Categorical Encoding Integration: Unlike some other tree-based frameworks such as CatBoost, XGBoost requires careful handling of categorical variables in its processing pipeline. The Auto XGBoost Lab integrates the encoding of nominal columns seamlessly, reducing the risk of errors and enhancing the overall efficiency of the modeling process.
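To illustrate what this encoding step involves, here is a minimal one-hot encoding sketch in pure Python. It is only an illustration of the concept, not the Lab's implementation; note the fit/transform separation, which matters because a model in production must tolerate categories it never saw in training.

```python
# Minimal one-hot encoding sketch: fit a category -> column mapping on
# training data, then encode new data against that fixed mapping.

def fit_one_hot(values):
    """Learn the category -> column-index mapping from training data."""
    categories = sorted(set(values))
    return {cat: i for i, cat in enumerate(categories)}

def transform_one_hot(values, mapping):
    """Encode each value as a 0/1 vector; unseen categories become all
    zeros, mirroring how a fitted encoder must handle new production data."""
    rows = []
    for v in values:
        row = [0] * len(mapping)
        if v in mapping:
            row[mapping[v]] = 1
        rows.append(row)
    return rows

mapping = fit_one_hot(["car", "home", "car", "life"])
print(transform_one_hot(["home", "pet"], mapping))
# → [[0, 1, 0], [0, 0, 0]]   ("pet" was never seen in training)
```

Keeping this mapping consistent between training and scoring is exactly the kind of detail where manual pipelines tend to break, and which the Lab handles automatically.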
Automatic ONNX Conversion: One of the most significant hurdles in deploying machine learning models is the conversion process, often fraught with manual errors. The Auto XGBoost Lab automates the conversion of models to ONNX format, allowing for effortless deployment across different platforms. By doing so, it ensures that models are production-ready without the repetitive and error-prone steps that typically accompany manual conversions.
Explicit Variable Mapping: Often, implicit mapping can lead to confusion or misunderstandings in a model’s operational context after importation. The lab alleviates this concern by managing variable mappings explicitly, ensuring clarity in how features relate to model predictions.
Data Integration: One of the biggest advantages of the Auto XGBoost Lab is its capability to build models using data directly managed in Price-It. This functionality eliminates the cumbersome need for external data sources, thereby ensuring data integrity and coherence throughout the modeling process. Users can focus on analysis without the distraction or complexity of managing multiple data repositories.
Beyond these features, the Auto XGBoost Lab reflects Earnix's focus on practical solutions. By utilizing modern technology and listening to feedback from users, we aim to create tools that fit the changing needs of financial analytics. The main goal: make models easier to build while keeping their quality intact.
Conclusion
Building and deploying machine learning models in production can be complex, especially when dealing with data integrity, hyperparameter tuning, and seamless integration into pricing systems. At Earnix, we developed the Auto XGBoost Lab to simplify this process—eliminating inefficiencies, reducing manual effort, and ensuring that models are production-ready from day one. By automating model creation and ONNX conversion, insurers and banks can focus on what truly matters: making smarter, faster, and more reliable decisions.