Murphy Choy

An interesting observation about people doing customer segmentations.

In Uncategorized on May 16, 2011 at 8:08 am

It was a busy day for me. I was having a short discussion with one of my friends about cluster analysis and noticed something that seems to be a common occurrence in customer segmentation.

One of the key problems widely encountered in customer segmentation is non-normal data. This is compounded by a heavy concentration of zeros, which skews the distribution even more drastically. Even so, many people attempt to cluster on variables such as amount, where zeros are widespread, only to find many outliers and unusual diagnostic plots.

There are a few ways to solve this. The first is to rank the variables, so that clustering works on each customer's relative position rather than on the raw, skewed values. The second is to convert the values into common comparatives. Both can be easily achieved in SAS using PROC RANK and arrays.
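To make the ranking idea concrete, here is a minimal sketch in Python rather than SAS. It is only an illustration under my own assumptions: the `amounts` list is made-up data, the function names `average_ranks` and `to_fraction` are hypothetical, and ties are given the mean of their ranks, which is one of the tie-handling choices PROC RANK offers. It is not the exact SAS code discussed above.

```python
def average_ranks(values):
    """Rank values in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # ranks are 1-based
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def to_fraction(ranks):
    """Divide each rank by the number of observations (assumed rescaling step)."""
    n = len(ranks)
    return [r / n for r in ranks]


# Hypothetical spend amounts: mostly zeros plus one extreme value.
amounts = [0, 0, 0, 0, 12.5, 40.0, 95.0, 15000.0]

ranks = average_ranks(amounts)
fractions = to_fraction(ranks)

for amount, rank, frac in zip(amounts, ranks, fractions):
    print(amount, rank, frac)
```

In this sketch the four zeros collapse onto a single shared rank and the extreme value of 15000 ends up only one step above 95, so neither the spike at zero nor the outlier dominates the distances a clustering procedure would compute.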

Hope this will help.


