Data Mining

From Machine Learning, my interest moved on to Data Mining. Well, most Data Mining techniques use machine learning algorithms anyway. But why Data Mining?

Data Mining = given a bunch of data, Data Mining can (automatically or semi-automatically) extract interesting information or patterns from it. For example:
  • Given the transaction data of a supermarket, Data Mining could help you find: "Aha, most of the time, customers who buy product A also buy product B". Then, of course, the shop manager can do something with this information: offer a promotional discount, bundle the products together, or even rearrange the shelves so that products A and B are located close to each other.
  • Suppose that a bank has a good credit risk estimator, let's say Mr. T. Mr. T's main task is to decide whether a customer's credit should be approved or not. What happens if Mr. T is promoted to a higher position? The bank then has to find a new person to perform Mr. T's task (as a credit risk estimator). But this could lead to poorer estimates, since the new person does not have as much experience as Mr. T. So the idea is to have a machine that learns from Mr. T's experience (given the historical data files and Mr. T's decisions). As you can guess, this machine could then (we hope) produce decisions similar enough to Mr. T's.
There are many Data Mining tasks. The first example above is Association. The second is Classification. Others include Clustering, Prediction, Pattern Analysis, and Time-Series analysis.
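
To make the Association example a bit more concrete, here is a minimal sketch in Python. The transactions and product names are made up for illustration, and the two measures shown (support and confidence) are the standard way to score a rule like "customers who buy A also buy B". Real association mining tools do much more than this (for instance, searching over all item combinations efficiently).

```python
# Toy transaction data; the baskets and product names are invented for illustration.
transactions = [
    {"A", "B", "milk"},
    {"A", "B"},
    {"A", "bread"},
    {"B", "milk"},
    {"A", "B", "bread"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of all transactions that contain itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing antecedent, the fraction that also contain consequent."""
    n_antecedent = count(antecedent, transactions)
    if n_antecedent == 0:
        return 0.0
    return count(antecedent | consequent, transactions) / n_antecedent

print("support(A and B):", support({"A", "B"}, transactions))
print("confidence(A -> B):", confidence({"A"}, {"B"}, transactions))
```

In this toy data, 3 of the 5 baskets contain both A and B, and 3 of the 4 baskets that contain A also contain B. A high confidence like that is the kind of "Aha" the shop manager would act on.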

There was once a competition similar to the Mr. T example above. The University of Melbourne launched a competition to predict the outcome of grant applications: http://www.kaggle.com/c/unimelb. The competition was launched because the university receives so many grant applications (and its professors have to review them all). To lower the professors' workload in reviewing grant applications, the idea is to have a machine that predicts a probability of success for each application. The university could then pass only the applications that score 80% or above to its professors for a final decision. This approach could greatly reduce the number of applications the professors have to review. Of course, the algorithm used by the machine has to be a good one.
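
The screening step described above can be sketched in a few lines of Python. This is only an illustration: the application IDs and scores are made up, the function name `screen_applications` is hypothetical, and the scores are assumed to come from some already-trained model (not shown here).

```python
def screen_applications(scores, threshold=0.8):
    """Split applications into those forwarded to the professors and those held back.

    scores maps an application ID to a predicted probability of success (0.0 to 1.0).
    """
    forwarded = [app_id for app_id, score in scores.items() if score >= threshold]
    held_back = [app_id for app_id, score in scores.items() if score < threshold]
    return forwarded, held_back

# Hypothetical model output: application ID -> predicted success probability.
scores = {"G-001": 0.92, "G-002": 0.35, "G-003": 0.80, "G-004": 0.79}

forwarded, held_back = screen_applications(scores)
print("For final review:", forwarded)   # ['G-001', 'G-003']
print("Held back:", held_back)          # ['G-002', 'G-004']
```

The threshold is the interesting design choice here: set it too high and good applications never reach a professor, set it too low and the workload barely shrinks.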

The story of the University of Melbourne competition above is a perfect example of how machines can really help with human tasks. The Mr. T example could also be extended: rather than the machine alone making the decision, its results should also be monitored by a human.
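
As a rough sketch of that "machine plus human" idea, the Python below learns from a handful of past decisions and flags uncertain cases for a person to check. Everything here is an assumption for illustration: the history data is invented, the two features (income and debt) are arbitrary, and a simple k-nearest-neighbour vote stands in for whatever algorithm a real bank would use.

```python
from collections import Counter
import math

# Hypothetical history: (income, existing debt), both in thousands -> Mr. T's decision.
history = [
    ((60, 5), "approve"),
    ((55, 10), "approve"),
    ((70, 8), "approve"),
    ((20, 15), "reject"),
    ((25, 30), "reject"),
    ((30, 25), "reject"),
]

def classify(applicant, history, k=3):
    """Return (decision, needs_review) based on the k most similar past cases."""
    # Sort past cases by Euclidean distance to the new applicant and keep the k closest.
    nearest = sorted(history, key=lambda case: math.dist(case[0], applicant))[:k]
    votes = Counter(decision for _, decision in nearest)
    decision, top_votes = votes.most_common(1)[0]
    # If the neighbours do not all agree, ask a human to look at the case.
    needs_review = top_votes < k
    return decision, needs_review

print(classify((65, 6), history))    # ('approve', False)
print(classify((40, 18), history))   # ('reject', True)
```

The first applicant looks like the past approvals, so the machine decides alone. The second one sits between the two groups, the neighbours disagree, and the case is flagged for a human.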

For more about Data Mining, see its Wikipedia article.


