I recently had one of those experiences where parenting intersected with analytics… The other day I told my daughter she has way too many shoes lying around the house. She replied with “well, I bet the Wilsons next door have more.” After I recovered from the initial shock of the smart-mouth-red-haired-monkey’s answer, I pointed out that the Wilsons have three kids as opposed to two, all girls as opposed to one girl and one boy, and that her brother only has one pair of sneakers! As I started forming the dataset in my mind, I realized that this is not a traditional data structure. To solve this problem I would need to build a predictive model at the child level to estimate each child’s number of shoes, and then combine this information at the household level with other household characteristics. When I came out of my analytics daydream the kids were already outside playing…and I was left to pick up the shoes on my own.
In most modeling scenarios I’ve encountered among insurers, data is assembled in a single flat table, which holds attributes of various types and sources, all aggregated to the policy level, with each policy represented by a single row. This is often good enough for common analytical practices. However, in some scenarios, greater accuracy and better predictions are possible by separating attributes into different categories, allowing for a more dynamic data structure. In several common scenarios, hierarchical data structures, in which each type of data is maintained in a separate table and the tables are linked together by reference keys, provide a much more natural and intuitive data representation and support models with stronger predictive power. The benefits gained from using such hierarchical data structures, or nested tables as they are sometimes called, range from reduced database complexity and size, through more elastic aggregations of data, to enabling models that would be fundamentally wrong if built any other way.
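To make the difference concrete, here is a minimal Python sketch of what gets lost when item-level data is aggregated into a single policy row. The class names, field names, and numbers are all hypothetical and chosen only for illustration; a real schema will look different.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Vehicle:
    # Item-level attributes (hypothetical)
    car_value: float
    driver_convictions: int


@dataclass
class Policy:
    # Policy-level attributes (hypothetical), plus the linked items
    credit_score: int
    tenure_years: int
    vehicles: List[Vehicle] = field(default_factory=list)


def flatten(policy: Policy) -> dict:
    """Collapse a policy and its vehicles into one flat, policy-level row."""
    return {
        "credit_score": policy.credit_score,
        "tenure_years": policy.tenure_years,
        "vehicle_count": len(policy.vehicles),
        "total_car_value": sum(v.car_value for v in policy.vehicles),
        "total_convictions": sum(v.driver_convictions for v in policy.vehicles),
    }


# Same totals, but the convictions sit on different cars.
policy_a = Policy(720, 4, [Vehicle(20000.0, 0), Vehicle(10000.0, 2)])
policy_b = Policy(720, 4, [Vehicle(20000.0, 2), Vehicle(10000.0, 0)])

flat_a = flatten(policy_a)
flat_b = flatten(policy_b)

print(flat_a == flat_b)                        # the flat rows are identical
print(policy_a.vehicles == policy_b.vehicles)  # the nested data is not
```

The two policies become indistinguishable once flattened, even though the convictions are attached to the expensive car in one case and to the cheap car in the other. A hierarchical structure keeps that distinction available to the model.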
Common use-cases for using hierarchical data structures (besides estimating “shoes per household”) include:
1. Separating Risk and Demand modeling to Different Levels
In such use-cases, there is a clear distinction between policy level attributes, such as credit score, tenure, cross-selling discounts etc.; and item level attributes such as car value or number of driver convictions, or property size and location (see figure A). In such scenarios, the risk is calculated on a per-item basis, while the demand, or decision to purchase, is made on the policy level as a whole. This is often the case for North American insurers, which as a practice do not sell individual car policies but rather household-level policies, while prices are still calculated at the vehicle level. This ultimately results in a disconnect between pricing and customer behavior that can lead to suboptimal pricing decisions.
Figure A: Typical motor insurance hierarchical structure.
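The two-level logic described above can be sketched in a few lines of Python: risk is priced per vehicle, and the purchase decision is modeled once per policy on the combined price. Both formulas and every coefficient below are made-up assumptions for illustration, not an actual rating or demand model.

```python
import math


def vehicle_risk_premium(car_value: float, driver_convictions: int) -> float:
    """Item level: hypothetical rate of 3% of car value, +25% per conviction."""
    return 0.03 * car_value * (1 + 0.25 * driver_convictions)


def policy_purchase_probability(total_premium: float, tenure_years: int) -> float:
    """Policy level: hypothetical logistic demand curve on the total price."""
    utility = 2.0 - 0.002 * total_premium + 0.1 * tenure_years
    return 1 / (1 + math.exp(-utility))


# One household policy with two vehicles: (car_value, driver_convictions)
vehicles = [(20000.0, 0), (10000.0, 2)]

# Risk is calculated per item...
premiums = [vehicle_risk_premium(value, convictions) for value, convictions in vehicles]

# ...while demand is evaluated on the policy as a whole.
total_premium = sum(premiums)
probability = policy_purchase_probability(total_premium, tenure_years=4)

print(premiums, total_premium, probability)
```

The point of the split is that a price change on one vehicle moves the purchase probability of the whole household policy, which a purely vehicle-level model cannot express.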
2. Redistribution of Collective Policies’ Risk Across All Individual Policies
An example of such a policy would be health policies, owned by a central body such as an employer, while each item is individually eligible. In such scenarios, the collective risk must be accounted for during the pricing stage and distributed in a reasonable fashion across all members, although the risk factor may vary significantly. Such collective policies may hold hundreds or thousands of individual policies, all of which expect to be given similar conditions. Using hierarchical modeling methods allows for optimal cross-subsidies while accounting for both individual and global constraints.
In healthcare, hierarchical structures also allow maintaining dynamic relationships between independent items, such as households and Health Maintenance Organizations (HMOs) (see figure B), with each household connected (typically) to a single HMO.
Figure B: Representative health insurance hierarchical structure.
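As a rough illustration of cross-subsidy within a collective policy, the Python sketch below pulls each member's premium toward the group average while keeping the total collected unchanged. The linear blend and the `alpha` parameter are assumptions picked for simplicity; they stand in for whatever optimization a real pricing engine would run, and the cost figures are invented.

```python
from typing import List


def redistribute(expected_costs: List[float], alpha: float) -> List[float]:
    """Blend each member's expected cost with the group mean.

    alpha = 1.0 charges every member their own expected cost (no subsidy);
    alpha = 0.0 charges every member the group mean (full pooling).
    The group total is preserved for any alpha.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    mean = sum(expected_costs) / len(expected_costs)
    return [mean + alpha * (cost - mean) for cost in expected_costs]


# Hypothetical expected annual costs for three members of one employer policy
costs = [800.0, 1200.0, 4000.0]
premiums = redistribute(costs, alpha=0.25)

print(premiums)                    # individual premiums, pulled toward the mean
print(sum(premiums), sum(costs))   # the collective total stays the same
```

Here the individual constraint is "members get similar conditions" (a small `alpha`) and the global constraint is "the group still covers its total expected cost".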
3. Brand Selection
Companies that front several brands for the same product line often require the ability to price several offers with slight variations. However, considerations such as brand power and brand loyalty have a significant effect on customer shopping preferences. Hierarchical data structures serve this purpose well by allowing companies to separate non-brand related attributes from brand-specific attributes, thereby calculating risk and non-brand factors jointly for all brands, while the selection is modeled separately using brand-specific attributes. Such approaches allow for more precise models, with greater flexibility in implementation.
Figure C: Representative multi-brand motor insurance hierarchical structure.
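One possible way to picture this separation in Python: the risk cost is computed once and shared by all brands, while each brand carries its own attributes that drive price and selection. The multinomial logit used for selection, the brand names, and every number here are assumptions made for the sake of the example, not a description of any particular product.

```python
import math
from typing import Dict


def choice_probabilities(utilities: Dict[str, float]) -> Dict[str, float]:
    """Multinomial logit (softmax) over brand utilities."""
    highest = max(utilities.values())  # subtract the max for numerical stability
    exps = {brand: math.exp(u - highest) for brand, u in utilities.items()}
    total = sum(exps.values())
    return {brand: e / total for brand, e in exps.items()}


# Non-brand part: calculated once, jointly for all brands (hypothetical value)
shared_risk_cost = 500.0

# Brand-specific part: hypothetical price loading and loyalty effect per brand
brands = {
    "BrandA": {"loading": 1.20, "loyalty": 0.8},
    "BrandB": {"loading": 1.05, "loyalty": 0.1},
}

prices = {name: shared_risk_cost * attrs["loading"] for name, attrs in brands.items()}

# Hypothetical utility: loyalty helps, price hurts
utilities = {name: brands[name]["loyalty"] - 0.005 * prices[name] for name in brands}
probabilities = choice_probabilities(utilities)

print(prices)
print(probabilities)
```

In this toy setup the stronger brand can carry a higher price and still be chosen more often, which is the kind of effect a single brand-agnostic table would hide.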
In addition to the above use-cases, there are also benefits for high-variability multi-item policies (such as car fleets or multi-property small-medium enterprises), as well as IT and usability benefits from reducing the size and complexity of data tables, since sparse attributes (e.g. claims and convictions data) no longer have to be duplicated across all items.
Working with hierarchical data structures is not without its challenges. From a data management perspective there is the obvious overhead of maintaining several data tables and the links between them. Operations that would be simple row operations on a flat table can become complicated SQL exercises involving inter-table queries, loops and aggregations. From the analytical perspective, the drawbacks center mainly on the need to maintain a clear distinction between hierarchies. Not every interaction between data attributes is viable, and users must keep a close eye on which analytical operation is performed on which level of the data. A deep understanding of the mathematical as well as the business logic is required to keep track of “what goes where”. However, in spite of these limitations and, in some cases, considerable ELT and analytical overhead, using hierarchical data structures reduces overall complexity or, in some cases, allows for analytical modeling approaches that are impossible any other way.
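For a sense of that overhead, here is a small sketch using Python's built-in sqlite3 module. The table and column names are hypothetical. Getting "total car value per policy" would be a single column lookup in a flat table; with linked tables it takes a join and an aggregation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two linked tables: policy-level data and vehicle-level data (hypothetical schema)
conn.executescript("""
CREATE TABLE policy (
    policy_id    TEXT PRIMARY KEY,
    credit_score INTEGER
);
CREATE TABLE vehicle (
    vehicle_id TEXT PRIMARY KEY,
    policy_id  TEXT REFERENCES policy(policy_id),
    car_value  REAL
);
INSERT INTO policy VALUES ('P1', 720), ('P2', 650);
INSERT INTO vehicle VALUES
    ('V1', 'P1', 20000),
    ('V2', 'P1', 10000),
    ('V3', 'P2', 15000);
""")

# Roll vehicle-level values up to the policy level
rows = conn.execute("""
    SELECT p.policy_id,
           p.credit_score,
           COUNT(v.vehicle_id) AS vehicle_count,
           SUM(v.car_value)    AS total_car_value
    FROM policy AS p
    LEFT JOIN vehicle AS v ON v.policy_id = p.policy_id
    GROUP BY p.policy_id, p.credit_score
    ORDER BY p.policy_id
""").fetchall()
conn.close()

for row in rows:
    print(row)
```

The upside is that policy attributes such as credit score are stored once per policy rather than repeated on every vehicle row.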
To find out whether hierarchical data structures are right for you, we encourage you to contact the Earnix Professional Services team, who can help you understand which use-cases might apply to you and advise on whether and how these can empower your analytical team.
Lastly, while hierarchical data structures can assist with a wide variety of business problems, you may still find yourself tripping over misplaced shoes in the dim of night, since as advanced as predictive analytics has gotten, children’s behavior is still a conundrum.