The growing interest in data mining is motivated by a common problem
across disciplines: how does one store, access, model, and ultimately
describe and understand very large data sets? Historically, different
aspects of data mining have been addressed independently by different
disciplines. This is the first truly interdisciplinary text on data
mining, blending the contributions of information science, computer
science, and statistics.
The book consists of three sections. The first, foundations, provides
a tutorial overview of the principles underlying data mining
algorithms and their application. The presentation emphasizes
intuition rather than rigor. The second section, data mining
algorithms, shows how algorithms are constructed to solve specific
problems in a principled manner. The algorithms covered include trees
and rules for classification and regression, association rules, belief
networks, classical statistical models, nonlinear models such as
neural networks, and local "memory-based" models. The third section
shows how all of the preceding analysis fits together when applied to
real-world data mining problems. Topics include the role of metadata,
how to handle missing data, and data preprocessing.
