Murphy Choy

Variable creation and how it relates to the problem

In SAS on May 18, 2011 at 5:46 am

I was thinking about some analytic problems that some of my contemporaries are facing, and I realized that the main problem lies in variable creation. Many of their problems are modeled with variables that are too naive by any standard. This is compounded by extensive abuse of mathematical transformations, which creates all sorts of odd variables that are hard to interpret.

One important principle of variable creation is relevance to the question. For example, anyone trying to answer a question about the life span of an individual will look for information on the person's lifestyle and family history, as well as some general population information on life span. However, what I have noticed is that many people will just throw in a whole bunch of weird information that might not make sense, such as the number of children or whether the parents are alive.

Another important point to note is transformation and skewness of data. While a transformation can make data look more normal, it does not work all the time. This is compounded by the massive effect of skewness, which needs to be addressed. Sometimes the problem with skewed data is that we are cutting the data too finely, which causes unnecessary problems related to the curse of dimensionality.
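To make the point about transformations concrete, here is a minimal sketch in Python (not SAS, and not taken from any real project). The two variables are simulated and purely hypothetical: one is right-skewed (log-normal) and one is left-skewed (Beta(5, 1)). The `skewness` helper is my own illustration using the simple moment-based formula. The idea is only to show that a log transform helps in the first case and makes things worse in the second.

```python
import math
import random


def skewness(values):
    """Moment-based skewness: third central moment / (second central moment)^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical right-skewed variable (log-normal draws).
right_skewed = [rng.lognormvariate(0.0, 1.0) for _ in range(10_000)]

# Hypothetical left-skewed variable (Beta(5, 1) draws, values in (0, 1)).
left_skewed = [rng.betavariate(5.0, 1.0) for _ in range(10_000)]

raw_right = skewness(right_skewed)
log_right = skewness([math.log(v) for v in right_skewed])

raw_left = skewness(left_skewed)
log_left = skewness([math.log(v) for v in left_skewed])

print(f"right-skewed: raw={raw_right:.2f}, after log={log_right:.2f}")
print(f"left-skewed:  raw={raw_left:.2f}, after log={log_left:.2f}")
```

For the right-skewed variable the log brings the skewness close to zero. For the left-skewed one the log stretches the left tail further, so the skewness moves away from zero. The transformation has to match the shape of the data rather than being applied by habit.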

Hope this helps.

