For us data scientists, the benefit of using statistical modeling and optimization techniques for our business is clear. Although we spend our days coding in SQL, Python, R or other frameworks, and our models are pieces of software code or neat mathematical functions, our aim is to use sophisticated statistical models to analyze our business and customers. Being able to calculate the impact of certain decisions on our business and share our findings with decision makers enables them to select and implement the best strategy.
But this is just the beginning.
To improve business results, what we really want is to deploy the best strategy into the market quickly and effectively. Being faster to market is a competitive advantage, as is model sophistication. Organizations that have already invested in this capability have seen its value and reaped its rewards. So, in this blog, I will not focus on the importance of having such a production model; rather, I will focus on the practical features needed when bringing your models into a financial organization’s production system.
What kind of production system do you have?
Depending on your organization, you might have either a real-time or a batch production system. Real-time production systems react to events as they occur. For example, you can provide a quote to a customer for a new product, loan or renewal within a second of the request. With batch processing, on the other hand, production models are executed one or more times a day on batches of data. This is well suited to cases where you initiate the communication with the customer, e.g., renewals and marketing campaigns.
What does it take to bring your models into production?
Building an effective production system takes more than just connecting your model to that production system. The system needs to be designed to be robust, scalable, responsive and maintainable over time. Below, I have outlined 11 attributes of a production system that are essential for deploying and executing models successfully.
- Input variables
Incorporating data into models is an operational challenge because the data can come from various sources that all need to feed into one model. There are several kinds of input variables that you should consider for your models:
• Data provided as is from customer inputs/customer records in the database
• Data transformed or extracted from an internal system
• Data extracted from an external system (such as vehicle history)
- Edge cases
When developing a model, the focus is on the results for the specific data sample you have. When taking the model into production, the focus should be on its live performance. One thing to consider is handling all kinds of data that you didn’t see in the sample: for example, zeros, missing data, values that push your model outside its numeric range, and so on.
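As a minimal sketch of the idea above, the function below cleans a single input before it reaches the model. The variable name, the valid range and the fallback value are all hypothetical and chosen only for illustration; the right treatment of missing or out-of-range values depends on your own model.

```python
import math
from typing import Optional

# Hypothetical valid range for an illustrative "vehicle age" input.
MIN_AGE, MAX_AGE = 0.0, 50.0
DEFAULT_AGE = 8.0  # assumed fallback, e.g. the median of the training sample


def clean_vehicle_age(raw: Optional[float]) -> float:
    """Return a value the model can safely score, whatever arrives in production."""
    # Missing values (None or NaN) fall back to the default.
    if raw is None or math.isnan(raw):
        return DEFAULT_AGE
    # Values outside the range seen in development are clamped to its boundaries.
    return min(max(raw, MIN_AGE), MAX_AGE)
```

Clamping and defaulting are only one possible policy; rejecting the request or routing it to a manual process may be more appropriate in some cases.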
- Automated deployment
Automated deployment is the cornerstone of the production system. The aim is to completely automate the system so that, with the click of a button, you can see your strategy deployed into production within a short period of time. Fast cycles let you react quickly to incremental changes in the market before they get out of hand.
- Rollback
In cases where your new strategy doesn’t work as well as expected, you should be able to roll back to a previous strategy that has proven to work well. This gives you time to readjust the new strategy or pick another one without causing damage while you make the necessary adjustments.
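One way to picture deployment and rollback is a registry that keeps every deployed strategy version. The sketch below is an illustration under that assumption, not a description of any particular production system; the class name, method names and version labels are hypothetical.

```python
from typing import Callable, Dict, List

Strategy = Callable[[dict], float]


class StrategyRegistry:
    """Keeps every deployed strategy version so an earlier one can be restored."""

    def __init__(self) -> None:
        self._versions: Dict[str, Strategy] = {}
        self._history: List[str] = []  # deployment order, newest last

    def deploy(self, version: str, strategy: Strategy) -> None:
        """Make the given strategy the live one."""
        self._versions[version] = strategy
        self._history.append(version)

    def rollback(self) -> str:
        """Drop the live version and return the name of the one that becomes live."""
        if len(self._history) < 2:
            raise RuntimeError("no previous strategy to roll back to")
        self._history.pop()
        return self._history[-1]

    def price(self, customer: dict) -> float:
        """Score a customer with whichever strategy is currently live."""
        return self._versions[self._history[-1]](customer)


registry = StrategyRegistry()
registry.deploy("v1", lambda customer: 100.0)
registry.deploy("v2", lambda customer: 90.0)
registry.rollback()  # "v1" is live again
```

Because old versions are never deleted, rolling back is a bookkeeping change rather than a new deployment, which keeps it fast.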
- New model testing
Handling of edge cases, accuracy and calculation times are all essential in a production system. You want to know about any problem with the model before you deploy it. The best practice is to create a process that automatically tests the model as part of the deployment process.
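The three checks named above (edge cases, accuracy and calculation time) could be combined into one gate that runs before every deployment. The following is a sketch only; the function name, its parameters and the default thresholds are assumptions made for this example.

```python
import time
from typing import Callable, List, Sequence, Tuple


def pre_deployment_checks(
    model: Callable[[dict], float],
    edge_cases: Sequence[dict],
    reference: Sequence[Tuple[dict, float]],  # (input, expected output) pairs
    tolerance: float = 1e-6,
    max_seconds: float = 0.05,
) -> List[str]:
    """Return a list of failures; an empty list means the model passed."""
    failures: List[str] = []
    # Edge cases: the model must not raise on unusual inputs.
    for case in edge_cases:
        try:
            model(case)
        except Exception as exc:
            failures.append(f"edge case {case!r} raised {exc!r}")
    # Accuracy and calculation time, measured on known reference results.
    for inputs, expected in reference:
        start = time.perf_counter()
        result = model(inputs)
        elapsed = time.perf_counter() - start
        if abs(result - expected) > tolerance:
            failures.append(f"{inputs!r}: expected {expected}, got {result}")
        if elapsed > max_seconds:
            failures.append(f"{inputs!r}: took {elapsed:.3f}s")
    return failures
```

A deployment script would then refuse to continue whenever the returned list is not empty.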
- A/B testing
When making a significant change to the strategy, or when there is not enough data to see which strategy is better, it is recommended to perform A/B testing between two or more strategies in the market. In practice, this means randomly assigning strategies according to a predefined ratio and comparing their results after a certain time period.
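A sketch of ratio-based assignment follows. Note that it swaps a plain random draw for a hash of the customer ID: the split is still pseudo-random, but a returning customer is always assigned the same strategy. The function name and the example ratios are hypothetical.

```python
import hashlib
from typing import Dict


def assign_strategy(customer_id: str, ratios: Dict[str, float]) -> str:
    """Assign a customer to a strategy according to the predefined ratios."""
    if abs(sum(ratios.values()) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    # Map the customer ID to a repeatable, roughly uniform point in [0, 1).
    digest = hashlib.sha256(customer_id.encode("utf-8")).hexdigest()
    point = int(digest[:8], 16) / 16**8
    # Walk the cumulative ratios until the point falls inside a bucket.
    cumulative = 0.0
    for name, share in ratios.items():
        cumulative += share
        if point < cumulative:
            return name
    return name  # guards against floating-point round-off in the last bucket


# Example: send roughly 80% of customers to the current strategy, 20% to the new one.
chosen = assign_strategy("customer-123", {"current": 0.8, "new": 0.2})
```

The assignment should be stored with each quote so that the strategies can be compared once the test period ends.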
- Execution time
For a real-time production system, the calculation of the model has to be fast. In cases where functions rely on large data structures, these should all be pre-loaded to enable acceptable response times.
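The pre-loading idea can be illustrated as follows: the large structure is loaded once when the scoring service starts, and each request only performs an in-memory lookup. The table contents and the function names are invented for this sketch; in practice the table might be read from a file or a database.

```python
from typing import Dict


def load_rate_table() -> Dict[str, float]:
    """Stand-in for an expensive load from disk or a database (hypothetical data)."""
    return {"north": 1.10, "south": 0.95}


# Loaded once at start-up, not once per request.
RATE_TABLE = load_rate_table()


def quote(base_price: float, region: str) -> float:
    """Score a request using only in-memory lookups."""
    return base_price * RATE_TABLE.get(region, 1.0)
```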
- Scalability
Hopefully, your business is booming, which means that your models will be put to work for more and more customers. Make sure you can scale to millions of calls per day while preserving acceptable response times.
- Load balance and failover
In case of any failure, such as a hardware or networking failure, a backup system should be in place so that work can continue as normal. Furthermore, especially in large organizations, the load should be balanced across multiple servers. This optimizes the use of resources and reduces response times by avoiding overload on any single resource.
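In real deployments, load balancing and failover are usually handled by infrastructure rather than by model code, so the sketch below only illustrates the logic. It rotates requests over several scoring backends and moves on to the next one when a backend fails. The class name is hypothetical, and treating `ConnectionError` as the failure signal is an assumption.

```python
import itertools
from typing import Callable, Optional, Sequence

Scorer = Callable[[dict], float]


class BalancedScorer:
    """Round-robin over scoring backends, failing over when one is unreachable."""

    def __init__(self, backends: Sequence[Scorer]) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = list(backends)
        self._next = itertools.cycle(range(len(self._backends)))

    def score(self, request: dict) -> float:
        start = next(self._next)  # spreads the load across backends
        count = len(self._backends)
        last_error: Optional[Exception] = None
        for offset in range(count):
            backend = self._backends[(start + offset) % count]
            try:
                return backend(request)
            except ConnectionError as exc:  # failover: try the next backend
                last_error = exc
        raise RuntimeError("all backends failed") from last_error
```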
- Monitoring model performance
Today, changes in the market happen very quickly. If you monitor your models’ performance, you might be able to see these changes soon after they happen. Monitoring your models once a day might be good enough for a quick reaction to the market.
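A daily monitoring job could be as simple as comparing today's prediction error with a baseline measured when the model was deployed. The sketch below uses mean absolute error and a relative tolerance; both choices, as well as the function and parameter names, are assumptions made for illustration.

```python
from statistics import mean
from typing import Sequence


def drift_alert(
    predicted: Sequence[float],
    actual: Sequence[float],
    baseline_error: float,
    tolerance: float = 0.25,
) -> bool:
    """Return True when today's mean absolute error exceeds the baseline
    by more than the tolerated relative margin."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("predicted and actual must be non-empty and of equal length")
    todays_error = mean(abs(p - a) for p, a in zip(predicted, actual))
    return todays_error > baseline_error * (1.0 + tolerance)
```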
- Gathering data for the next step
Being a nimble player in your market requires quick reaction to market changes, so make sure to continuously improve your models. To re-run your models, you must have access to new data (predicted and explanatory variables). Best practice is to automatically collect full or random samples of the data and make this data available in the development environment.
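Automatic collection of a full or random sample might look like the sketch below, where each scored request is kept with a fixed probability. The class and its in-memory list are hypothetical simplifications; a real system would write to storage that the development environment can read.

```python
import random
from typing import List, Optional


class ScoringLog:
    """Collects a random sample of scored requests for later model updates."""

    def __init__(self, sample_rate: float, seed: Optional[int] = None) -> None:
        if not 0.0 <= sample_rate <= 1.0:
            raise ValueError("sample_rate must be between 0 and 1")
        self._rate = sample_rate  # 1.0 keeps every request (a full sample)
        self._rng = random.Random(seed)
        self.records: List[dict] = []

    def record(self, features: dict, prediction: float) -> None:
        """Keep the explanatory variables and the prediction with probability sample_rate."""
        if self._rng.random() < self._rate:
            self.records.append({**features, "prediction": prediction})
```

The observed outcome for each record would be joined in later, once it is known, so that the models can be refitted on fresh data.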
Above are the most common best practices for moving your models into production. However, depending on your organization, you may have additional steps and requirements due to specific business goals, company structures or legacy systems. Implementation is a complicated process, and in most organizations IT teams are responsible for it. If some of the steps have to be done manually, the time-to-market loop becomes longer. In my next blog in this series, How to Transform Your Models into a Fully-integrated Analytics System, I will discuss the model lifecycle process, because in a fully-integrated system a model will typically be in a constant loop of adjusting, testing, and deploying to production.