Murphy Choy

Cable gate data and opportunities

October 4, 2011

I recently had a short discussion with fellow researchers who specialise in text analytics about the potential of the cable gate data. They have strong reservations about its usability in general. Most of the worry rests on the sensitivity of the data as well as its reliability.


In my opinion, there are two major areas that we can research. The first is document similarity analysis, which can help identify documents written by the same author. This is useful for several reasons, such as estimating the number of unique sources and assessing the reliability of the data. Of course, this is no easy feat, but the potential is too large to ignore.
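To make the first idea a little more concrete, here is a minimal sketch of one common way to compare documents: cosine similarity over character trigram counts. This is only an illustration, not a method we have settled on. The function names and the choice of trigrams are my own assumptions, and any real authorship study would need far more careful features and validation.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams after normalising case and whitespace."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two n-gram count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(docs, n=3):
    """Pairwise similarity for a list of document strings."""
    profiles = [char_ngrams(doc, n) for doc in docs]
    return [[cosine_similarity(p, q) for q in profiles] for p in profiles]
```

Pairs of documents with a high score are only candidates for sharing an author. A high score can just as easily come from a shared template or topic, so it is a starting point for investigation rather than evidence of authorship.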


The second area of analysis would be to identify the lexical structures within the documents, which are so often ignored by researchers. This would allow us to build factual lexical identification algorithms that might prove useful in many other situations.
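As a rough illustration of what looking at structure rather than content could mean, the sketch below reduces every token to its "shape" (uppercase run, lowercase run, digit run, punctuation) and counts the shapes. This is an assumption on my part about one possible starting point, not a definition of lexical structure, and the function names are hypothetical.

```python
import re
from collections import Counter


def word_shape(token):
    """Collapse a token to its character-class pattern, e.g. 'Hello' -> 'Xx'."""
    shape = []
    for ch in token:
        if ch.isupper():
            c = "X"
        elif ch.islower():
            c = "x"
        elif ch.isdigit():
            c = "9"
        else:
            c = ch  # keep punctuation and other symbols as they are
        # Merge consecutive characters of the same class into one symbol.
        if not shape or shape[-1] != c:
            shape.append(c)
    return "".join(shape)


def structure_profile(text):
    """Count token shapes in a text; the words themselves are discarded."""
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return Counter(word_shape(t) for t in tokens)
```

For example, `structure_profile("Hello, World 42.")` counts two capitalised words, one number, one comma and one full stop, without keeping any of the original words.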


Due to the sensitive nature of the data, it would be very unwise to look into contents that might in one way or another compromise the security of another nation’s people and interests. However, structures do not point to any information that is directly related to something we could easily identify from existing information.


