A User’s Guide to the “Perfect” Segmentation


Jay L. Weiner, Ph.D
Chief Methodologist & VP, Analytics & Data Management

A really good segmentation benefits many users. The product development team needs to design products and services for key target segments. The marketing team needs to develop targeted communications. The data scientists need to score the database for targeting current customers. The sales force needs to develop personalized pitches. Last, but not least, the finance department uses segmentation to help allocate the resources of the firm. With so many interested parties, it’s easy to see why getting buy-in up front is critical to the success of any segmentation.

A “perfect” segmentation solution would offer insights for each user to help them execute the strategic plan. What does this mean from an analytical perspective? It means we have differentiation on needs for the product development folks, attitudes for the marketing folks and a predictive scoring model for the internal database team. That sounds easy enough, but in practice it is difficult. Attitudes are not always predictive of behaviors. For example, I’m concerned about the environment. I have solar panels on my roof. You’d think I would drive a zero-emission vehicle (ZEV), and yet I drive a 400 HP, V8, gas-powered car that burns high-octane fuel. I don’t feel too bad about that since I don’t really drive much. That said, my next car could be the Volkswagen I.D. Buzz, an all-electric, nostalgic take on the original VW van, but I digress.

Segmentation is not a property of the market. It is an activity. It’s usually helpful to evaluate several potential segmentation schemes to see how well they deliver the key objectives. We do this by prioritizing the objectives. Getting nice differentiation on attitudes to help create more effective marketing campaigns might be more important than getting high accuracy on scoring the database.
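One simple way to make that prioritization concrete is a weighted scorecard. The sketch below is only an illustration: the objective names, the weights and the 0-to-1 ratings are all hypothetical, and in practice they would come from the stakeholders rather than from the analyst.

```python
# Hypothetical objective weights (they sum to 1); not values from any real study.
WEIGHTS = {
    "attitude_differentiation": 0.5,
    "needs_differentiation": 0.3,
    "scoring_accuracy": 0.2,
}

# Hypothetical 0-1 ratings of two candidate schemes on each objective.
SCHEMES = {
    "attitudinal": {
        "attitude_differentiation": 0.9,
        "needs_differentiation": 0.6,
        "scoring_accuracy": 0.4,
    },
    "behavioral": {
        "attitude_differentiation": 0.3,
        "needs_differentiation": 0.5,
        "scoring_accuracy": 0.9,
    },
}


def weighted_score(ratings, weights):
    """Sum of each objective's rating multiplied by its priority weight."""
    return sum(weights[obj] * ratings[obj] for obj in weights)


def rank_schemes(schemes, weights):
    """Return (scheme name, score) pairs, best score first."""
    scored = {name: weighted_score(r, weights) for name, r in schemes.items()}
    return sorted(scored.items(), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_schemes(SCHEMES, WEIGHTS):
        print(f"{name}: {score:.2f}")
```

With these made-up weights, the scheme that differentiates well on attitudes ranks first even though it scores the database less accurately. Shifting weight toward scoring accuracy would reverse the order.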

My colleague Brant Cruz recently listed leveraging existing data sources as one of the keys to successful segmentation. This is often one of the biggest challenges we face in segmentation. How well can we classify the customer database? What’s in the database? Most often it’s behavioral data like monthly spend, products purchased and points redeemed. These data are the most accurate representation of what happened and when it happened. What they don’t help explain is why it happened and, in some cases, who did it. For example, many families subscribe to streaming music and video services. If family members don’t remember to log in under their own profiles, then the behavior is correct for the family, but not necessarily attributable to a specific user.

Appending demographic and attitudinal data to the database can help provide the links. When such data are available, we have to verify the source of those data. Many companies offer the ability to populate demographic and potentially attitudinal data. If this is the source of the append, then is it an actual value for the specific customer, or is it a proxy for that customer based on nearest-neighbor values? In either case, we would still need to determine the age of the appended data. How often do these values get updated? Are some values missing? For example, if you have recently signed up for an account, then your 90-day behavioral data elements won’t get populated for some period of time. This means that I would need to either remove these customers from my file or build a unique model for new customers. How accurately we can predict the segments is contingent in part on how accurate our data are.
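The new-customer problem can be handled with a simple split before modeling. The sketch below is a minimal illustration, assuming each customer record is a dictionary with hypothetical `signup_date` and `spend_90d` fields; a real database will use its own field names and may need more checks.

```python
from datetime import date


def split_by_tenure(customers, as_of, min_days=90):
    """Separate records with a full behavioral window from newer accounts.

    A record counts as established only if the account is at least
    `min_days` old and its 90-day field is actually populated.
    """
    established, new = [], []
    for customer in customers:
        tenure_days = (as_of - customer["signup_date"]).days
        if tenure_days >= min_days and customer.get("spend_90d") is not None:
            established.append(customer)
        else:
            new.append(customer)
    return established, new


if __name__ == "__main__":
    # Hypothetical records for illustration only.
    sample = [
        {"id": 1, "signup_date": date(2023, 1, 15), "spend_90d": 240.0},
        {"id": 2, "signup_date": date(2024, 5, 20), "spend_90d": None},
    ]
    established, new = split_by_tenure(sample, as_of=date(2024, 6, 1))
    print(len(established), "established,", len(new), "new")
```

The established group can go to the main scoring model, while the new group is either set aside or scored with a separate model built on whatever fields are populated at signup.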

The most accurate solution would be to simply segment using only information in the database. But if our ultimate goal is to help the client with prospecting for new business, a segmentation of customers is not likely to be too helpful, because prospects are not in the database. This means that I need to collect primary data and ask survey questions that serve as surrogates for the values in the database. A concurrent sample of customers would help with any need to calibrate the survey responses for over- or understatement.
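As a rough illustration of that calibration, the sketch below compares what sampled customers say in the survey with what the database records for them, then applies the resulting ratio to other stated values. The single-ratio adjustment is an assumption made for this example, not a prescribed method, and the function names are hypothetical.

```python
def calibration_factor(customer_pairs):
    """Ratio of total actual to total stated values.

    `customer_pairs` is a list of (stated_in_survey, actual_in_database)
    tuples for the concurrent customer sample. A factor below 1 means
    respondents overstate; above 1 means they understate.
    """
    total_stated = sum(stated for stated, _ in customer_pairs)
    total_actual = sum(actual for _, actual in customer_pairs)
    if total_stated == 0:
        raise ValueError("stated values sum to zero; cannot calibrate")
    return total_actual / total_stated


def calibrate(stated_values, factor):
    """Scale survey-stated values by the calibration factor."""
    return [value * factor for value in stated_values]


if __name__ == "__main__":
    # Hypothetical monthly spend: customers say 100 and 200, records show 80 and 160.
    factor = calibration_factor([(100, 80), (200, 160)])
    print(factor, calibrate([50, 150], factor))
```

A single ratio is the crudest possible adjustment. If overstatement varies by segment or by spend level, a separate factor per group would be a natural refinement.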

When we start to mix database values with primary survey data, we typically do two things. First, we dilute the differences in attitudes and needs. Second, we reduce the accuracy of scoring the database. There are ways to improve the scoring accuracy. We can provide a list of attributes that could be appended to the database to increase the correct classification. Sometimes, the data scientists may be able to identify additional variables in the database that were not provided up front. Other times, it’s simply a matter of figuring out how to collect these values and have them appended to the database.
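Scoring accuracy is easiest to discuss when it is measured the same way every time. The sketch below computes the overall correct classification rate and the rate within each segment, assuming we have the survey-based segment and the database-predicted segment for the same respondents. The segment labels are hypothetical.

```python
from collections import Counter


def classification_rates(actual, predicted):
    """Return (overall rate, {segment: rate}) of correct classification.

    `actual` holds the survey-based segment for each respondent and
    `predicted` holds the segment assigned by the database scoring model.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    totals = Counter(actual)
    hits = Counter(a for a, p in zip(actual, predicted) if a == p)
    overall = sum(hits.values()) / len(actual)
    per_segment = {segment: hits[segment] / n for segment, n in totals.items()}
    return overall, per_segment


if __name__ == "__main__":
    actual = ["A", "A", "B", "B", "C"]
    predicted = ["A", "B", "B", "B", "A"]
    overall, per_segment = classification_rates(actual, predicted)
    print(overall, per_segment)
```

The per-segment view matters because a respectable overall rate can hide a segment that the database almost never identifies correctly. Rerunning the same calculation after new attributes are appended shows whether the append actually helped.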

One part of the evaluation is to determine how many segments to have. Just because you have a segment doesn’t mean you have to target that segment. You should have at least one more segment than you intend to target. Why? This lets you identify an opportunity that you have left in the market for your competitors. Just because there are segments of folks interested in zero-emission vehicles or self-driving cars does not mean you need to make them. Most companies can only afford to target a small number of segments. Database segmentations with targeted digital campaigns are often easy to execute with a larger number of segments.

How long can you expect your solution to last? Typically, segmentation schemes last as long as there are no major changes in the market. Changes can come from technological innovations. ZEVs and self-driving cars have changed the auto industry. Shifts in the size of the segments over time are just one indication that the segmentation could use refreshing.
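Tracking segment sizes is straightforward to automate. The sketch below compares each segment's share of the base between two scoring runs and flags a possible refresh when any share moves by more than a threshold. The five-point threshold and the segment counts are hypothetical and chosen only for illustration.

```python
def segment_share_shifts(baseline_counts, current_counts):
    """Change in each segment's share of the total, current minus baseline."""
    base_total = sum(baseline_counts.values())
    current_total = sum(current_counts.values())
    if base_total == 0 or current_total == 0:
        raise ValueError("each period needs at least one scored record")
    segments = set(baseline_counts) | set(current_counts)
    return {
        segment: current_counts.get(segment, 0) / current_total
        - baseline_counts.get(segment, 0) / base_total
        for segment in segments
    }


def needs_refresh(shifts, threshold=0.05):
    """True if any segment's share moved by more than `threshold` (assumed value)."""
    return any(abs(delta) > threshold for delta in shifts.values())


if __name__ == "__main__":
    baseline = {"A": 50, "B": 30, "C": 20}
    current = {"A": 40, "B": 30, "C": 30}
    shifts = segment_share_shifts(baseline, current)
    print(shifts, needs_refresh(shifts))
```

A flag from a check like this is a prompt to investigate, not a verdict. Shares can also move because of changes in who is being scored rather than changes in the market.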