I was meditating on some analytic problems that some of my contemporaries are facing, and I realized that the main problem lies in variable creation. Many of their problems are modeled with variables that are too naive by any standard. This is compounded by extensive abuse of mathematical transformations, which creates all sorts of funny variables that are hard to interpret.

One important principle of variable creation is relevance to the question. For example, anyone trying to answer a question about the life span of an individual will be seeking information on the person's lifestyle and family history, as well as some general population information on life span. However, what I have noticed is that many people will just put in a whole bunch of weird information that might not make sense, such as the number of children or whether the parents are alive.

Another important point to note is transformation and skewness of data. While a transformation can make data look more normal, it does not work all the time. This is compounded by the massive effect of skewness, which needs to be addressed. Sometimes the problem with skewed data is that we might be cutting the data too fine, which leaves very few observations in each slice and causes unnecessary problems related to the curse of dimensionality.
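
To make the point about transformation concrete, here is a minimal sketch in Python using only the standard library. The two simulated datasets, the skewness helper and all the variable names are my own assumptions for illustration, not something taken from a real project. It compares skewness before and after a log transform on data where the log is a natural fit (log-normal) and on data where it is not (left-skewed values between 0 and 1).

```python
import math
import random
import statistics


def skewness(values):
    """Population skewness: third central moment divided by std cubed."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    n = len(values)
    return sum((v - mean) ** 3 for v in values) / (n * std ** 3)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical right-skewed variable (think of something like income).
right_skewed = [rng.lognormvariate(0.0, 1.0) for _ in range(5000)]

# Hypothetical left-skewed variable bounded in (0, 1).
left_skewed = [rng.betavariate(5.0, 1.0) for _ in range(5000)]

skew_raw = skewness(right_skewed)
skew_log = skewness([math.log(v) for v in right_skewed])

skew_beta = skewness(left_skewed)
skew_beta_log = skewness([math.log(v) for v in left_skewed])

print(f"right-skewed: raw={skew_raw:.2f}, after log={skew_log:.2f}")
print(f"left-skewed:  raw={skew_beta:.2f}, after log={skew_beta_log:.2f}")
```

In this sketch the log transform pulls the right-skewed variable close to symmetric, but it pushes the left-skewed variable further from symmetric. So it is worth checking the skewness before and after, rather than applying a transformation by habit.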

Hope this helps.