I am always excited to see more organizations wanting to use analytical models at the center of their business processes. Like most things in life, the road to success is not straight; sometimes there are a few road bumps to navigate. The same is true for integrating models into the production environment. For instance, even the most accurate pricing models in the development environment can quote customers the wrong prices, and cost the organization money, if they are not integrated into the production environment correctly. However, with the right foresight and expertise, data scientists can make the model deployment process smoother.
Welcome to the last blog in the “Creating a successful production system” series. In my first blog, Eleven Best Practices for Building a Successful Production System, we discussed the attributes a production system needs in order to deploy and execute models successfully. In the second blog, How to Transform Your Models into a Fully-integrated Analytics System, we described the requirements of an integrated end-to-end analytical system. In this blog, we will look at the challenges of building and operating such an effective production system. So let’s dive right in.
Road bumps: What to look out for
After investing hours in building models, the final part of the process is deploying them in a production environment so that the analytical results can be used in daily decision-making. The deployment process can be quite time-consuming, and there can be many challenges along the way. Certain factors can make model deployment, monitoring, and adaptation take longer. Below I describe a few of the most common road bumps:
- Code Deployment:
Until it is deployed live in the production environment, your model is just code. Having tested it in your own environment and handed it to your IT department as “ready” does not remove the time needed to test and deploy it manually into your production system.
- Function Deployment:
Your model is a formula or a table structure (or a combination of the two). Sometimes it takes another step to export it from your development environment to the testing and production environments. It then has to be imported, sometimes manually, into your production system and tested. This takes time and makes the process prone to errors.
- Variable Conversion:
Converting data to another format, or calculating any new derived data, should be done automatically, via mathematical functions or SQL. Making it a manual process (e.g., when a DBA has to do it) is something you would like to avoid because it can be very cumbersome.
- Large Lookups:
Sometimes there are multiple lookups embedded in large amounts of data. For example, it could be finding a score for a postcode or a data point in a huge rating sheet. These lookups might be slow to run, and the structures behind them are hard to update and redeploy quickly, so maintaining them can become burdensome. There are a few ways to overcome this: at times it is possible to store the score with a user profile and make it an input to the model; otherwise, use existing solutions such as persistent or in-memory databases, or pre-cache those structures in memory.
- Getting Real Data:
It can take days or weeks to get updated input data for your models, as well as monitoring data on how your strategy is performing. Under these circumstances, you are less able to react quickly to market changes.
- Integration with Existing Systems:
In some organizations, IT systems might be complex and hard to integrate with. This slows down the implementation of an automated, integrated solution or makes it practically impossible.
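Two of the road bumps above, Variable Conversion and Large Lookups, lend themselves to a small illustration. The following is only a sketch under stated assumptions: the rating sheet contents, the fallback score, and the function names `postcode_score` and `age_in_years` are all hypothetical, and a real system would load the table once at start-up from a file or database rather than hard-coding it.

```python
from datetime import date

# Hypothetical rating sheet: postcode outward code -> score.
# Assumption: loaded once at start-up and kept in memory (pre-cached).
RATING_SHEET = {"SW1": 0.82, "M1": 1.10, "EH1": 0.95}
DEFAULT_SCORE = 1.00  # assumed fallback for postcodes not in the sheet


def postcode_score(postcode: str) -> float:
    """Return the pre-cached score for a postcode, using its first part."""
    parts = postcode.strip().upper().split()
    outward = parts[0] if parts else ""
    return RATING_SHEET.get(outward, DEFAULT_SCORE)


def age_in_years(date_of_birth: date, as_of: date) -> int:
    """Derived variable computed in code instead of by hand."""
    had_birthday = (as_of.month, as_of.day) >= (
        date_of_birth.month,
        date_of_birth.day,
    )
    return as_of.year - date_of_birth.year - (0 if had_birthday else 1)


if __name__ == "__main__":
    print(postcode_score("sw1 2aa"))
    print(age_in_years(date(1990, 6, 15), date(2020, 6, 14)))
```

Because the lookup is a plain in-memory dictionary, each call avoids a round trip to a database, and updating the rating sheet means reloading one structure rather than redeploying the model.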
In certain cases, models in the production environment do not work as planned and it’s hard to determine what the reason is for their sub-par performance. When trying to solve the problem, it can be hard to get everything you need – input data, actual intermediate and final results – and bring them into your development environment.
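One way to make that diagnosis easier is to record the inputs, intermediate values, and final result of every scoring call, so the same call can be replayed in the development environment. The sketch below assumes a made-up pricing formula and hypothetical names (`ScoringTrace`, `score_with_trace`, `to_log_line`, `replay`); it shows the idea, not a specific product's logging format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class ScoringTrace:
    """Everything needed to reproduce one scoring call."""
    inputs: dict
    intermediate: dict = field(default_factory=dict)
    result: Optional[float] = None


def score_with_trace(inputs: dict) -> ScoringTrace:
    """Score one request and keep each intermediate value."""
    trace = ScoringTrace(inputs=dict(inputs))
    # Hypothetical pricing formula, used only for illustration.
    base = 100.0 * inputs["risk_factor"]
    trace.intermediate["base_price"] = base
    loaded = base * (1.0 + inputs["loading"])
    trace.intermediate["loaded_price"] = loaded
    trace.result = round(loaded, 2)
    return trace


def to_log_line(trace: ScoringTrace) -> str:
    """Serialize the trace as one JSON line for a log or message queue."""
    return json.dumps(asdict(trace), sort_keys=True)


def replay(log_line: str) -> ScoringTrace:
    """Re-run the model in development from a logged production call."""
    record = json.loads(log_line)
    return score_with_trace(record["inputs"])
```

Comparing the replayed trace with the logged one, value by value, shows whether a discrepancy comes from the input data, an intermediate step, or the final calculation.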
Building a solution: Navigating the road bumps
Reducing the challenges associated with integrating models into production systems is not a small undertaking. Organizational and technical obstacles can reduce deployment process efficiency. For instance, often IT departments will say it will take years to build and deploy the production systems because they have legacy IT systems in place, a very demanding priority list and limited resources.
As with any large project, when possible, the project should be broken down into smaller manageable tasks and deliverables. Furthermore, if possible, existing solutions for some or all requirements should be used instead of trying to find new solutions.
Below are six tips on how to break down a large project into pieces:
- Prioritize your needs first. Usually there is a huge list of to-dos. Make sure to prioritize the list according to the must-haves. Tackle those first; all the rest can wait.
- Focus on deliverables. Place your attention on the first few deliverables and work on those. Don’t touch “the rest” yet. Once you have completed and delivered the outputs, you can move on to the next set of deliverables on your list.
- Go out and check for existing solutions. Crowdsource your problem by talking to other organizations that might have already tackled the same or similar projects.
- Use modern technology and cloud infrastructure. Use modern technology to its maximum: make sure that the solution is flexible and built the right way, and that you have the latest updates and the support you need for the technology you use. Similarly, the cloud infrastructure should be flexible, and you should get the necessary support from the cloud provider.
- Use service architecture to separate your production systems with clear interfaces. Each interface should be clearly defined so that any dependency on other systems is explicit and accounted for. If not, you run the risk of miscommunication between systems, which leads to sub-optimal performance.
- Work in an agile environment. There are too many unknowns in making software solutions and it is impossible to plan for every eventuality. It is important to design for the long-term but implement for the short-term. By doing this, feedback and correction can happen quickly so that small glitches can be fixed before they become big issues.
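To make the tip about service architecture and clear interfaces more concrete, here is a minimal sketch of what an explicitly defined interface could look like. The request and response fields, the `PricingService` protocol, the `SimplePricingService` class, and its formula are all assumptions made for illustration; the point is only that the contract between systems is written down in one place.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class PriceRequest:
    """What a calling system must send (hypothetical fields)."""
    customer_id: str
    risk_factor: float


@dataclass(frozen=True)
class PriceResponse:
    """What the pricing service promises to return."""
    customer_id: str
    price: float
    model_version: str


class PricingService(Protocol):
    """The interface other systems depend on, not a concrete model."""
    def price(self, request: PriceRequest) -> PriceResponse: ...


class SimplePricingService:
    """One possible implementation; the formula is illustrative only."""
    MODEL_VERSION = "v1"

    def price(self, request: PriceRequest) -> PriceResponse:
        if request.risk_factor <= 0:
            raise ValueError("risk_factor must be positive")
        value = round(100.0 * request.risk_factor, 2)
        return PriceResponse(request.customer_id, value, self.MODEL_VERSION)


def quote(service: PricingService, request: PriceRequest) -> str:
    """A caller that relies only on the interface."""
    response = service.price(request)
    return f"{response.customer_id}: {response.price:.2f} ({response.model_version})"
```

Because `quote` depends only on the `PricingService` interface, a new model version can replace `SimplePricingService` without changes to the calling system, as long as it honors the same request and response types.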
I hope that this blog series has given you some insights into what to look out for when taking your models into production. Building data-driven products or using models for real-time predictions relies on deploying models into the production environment effectively and making sure they stay maintainable and reliable over the long term. Feel free to ask questions or leave comments in the comment section below.