Abstract:
Imagine that you wish to classify data consisting of tens of
thousands of examples residing in a twenty-thousand-dimensional
space. How can one apply standard machine learning algorithms to
data of this size? We describe the Parallel Problems Server
(PPServer) and MATLAB*P. Together, they allow users of networked
computers to work transparently on large data sets from within
MATLAB. This work is motivated by the desire to bring the
algorithms and computational power of scientific computing to
machine learning researchers. We demonstrate the usefulness of the
system on a number of tasks. For example, we perform {\em independent
component analysis} on very large text corpora consisting of tens
of thousands of documents, making only minimal changes to the original
Bell and Sejnowski MATLAB source. Applying machine learning techniques
to data previously beyond their reach leads to interesting analyses of
both the data and the algorithms.