Murphy Choy

Data preparation, not just another data exercise

In Uncategorized on June 1, 2011 at 10:24 am

Preparing data in the appropriate manner for modeling work is usually interesting. Raw data contains so many unusual things that it is almost a miracle for an analyst to get hold of clean data. This data quality issue also translates into additional work when one has to prepare the data so that it is ready for modeling. One of the most commonly encountered problems is the format of the data passed between analysts.

Very often, one encounters data in the form of an Excel table. While in theory this is not a very difficult data format, the layout often makes the data hard to use elsewhere. This is especially the case when there are many merged cells and empty spaces in the data. It is also problematic when titles are repeated, with large multi-row headers indicating the range of the data.
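As a rough illustration of the merged-cell problem, here is a minimal Python sketch. It assumes the sheet has already been read into a list of rows, and that a merged cell in the first column shows up as a blank (None or an empty string) on every row after the first. The function name fill_merged_cells and the sample figures are made up for the example.

```python
def fill_merged_cells(rows):
    """Forward-fill blank first-column cells and drop fully empty rows."""
    cleaned = []
    last_label = None
    for row in rows:
        # Skip spacer rows that contain no data at all.
        if all(cell in (None, "") for cell in row):
            continue
        label = row[0]
        if label in (None, ""):
            # Blank left by a merged cell: reuse the label from the row above.
            label = last_label
        else:
            last_label = label
        cleaned.append([label] + list(row[1:]))
    return cleaned


# Hypothetical sample: "North" and "South" were merged cells spanning two rows.
raw_rows = [
    ["North", "Q1", 120],
    [None, "Q2", 135],
    [None, None, None],
    ["South", "Q1", 90],
    ["", "Q2", 101],
]
cleaned_rows = fill_merged_cells(raw_rows)
```

After this step every row carries its own label, so the rows can be sorted, filtered or joined without depending on their position in the sheet.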

Data to be used in modeling is better described as a simple table that captures observations at a given time frame. This tends to lead to better results and easier manipulation.
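One possible reading of "a simple table that captures observations" is one row per entity per period. The sketch below, in plain Python, reshapes a spreadsheet-style layout with one column per period into that form. The function name wide_to_long, the output field names and the sample values are assumptions for illustration only.

```python
def wide_to_long(header, rows, id_column=0):
    """Turn one row per entity (a column per period) into one row per observation."""
    long_rows = []
    for row in rows:
        entity = row[id_column]
        for col, period in enumerate(header):
            if col == id_column:
                continue  # the identifier column is not an observation
            value = row[col]
            if value in (None, ""):
                continue  # empty cell: nothing was observed for this period
            long_rows.append({"entity": entity, "period": period, "value": value})
    return long_rows


# Hypothetical wide table: one column per year.
header = ["region", "2010", "2011"]
wide_rows = [
    ["North", 120, 135],
    ["South", 90, None],
]
observations = wide_to_long(header, wide_rows)
```

Each resulting record stands on its own, which is what most modeling tools expect, and adding a new period means adding rows rather than changing the table's columns.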
