
Snip!t from collection of Alan Dix



http://hunch.net/~large_scale_survey/

Categories

/Channels/AI


/Channels/techie/collective intelligence



This tutorial gives a broad view of modern approaches for scaling up machine learning and data mining methods on parallel/distributed platforms. Demand for scaling up machine learning is task-specific: for some tasks it is driven by the enormous dataset sizes, for others by model complexity or by the requirement for real-time prediction. Selecting a task-appropriate parallelization platform and algorithm requires understanding their benefits, trade-offs and constraints. This tutorial focuses on providing an integrated overview of state-of-the-art platforms and algorithm choices. These span a range of hardware options (from FPGAs and GPUs to multi-core systems and commodity clusters), programming frameworks (including CUDA, MPI, MapReduce, and DryadLINQ), and learning settings (e.g., semi-supervised and online learning). The tutorial is example-driven, covering a number of popular algorithms (e.g., boosted trees, spectral clustering, belief propagation) and diverse applications (e.g., speech recognition and object recognition in vision).
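To make the map/reduce-style parallelization mentioned above concrete, here is a minimal, illustrative sketch (not taken from the tutorial itself): the gradient of a least-squares objective decomposes into a sum over data shards, so each worker computes a partial gradient over its shard (the map step) and the partials are summed (the reduce step). The shard layout, model, and function names are invented for this example.

```python
# Sketch of data-parallel gradient computation in a map/reduce style.
# Each shard is a list of (x, y) pairs; w is the weight vector.
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def shard_gradient(shard, w):
    """Map step: partial least-squares gradient over one data shard."""
    g = [0.0] * len(w)
    for x, y in shard:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            g[j] += 2.0 * err * xj
    return g

def add_vectors(a, b):
    """Reduce step: element-wise sum of two partial gradients."""
    return [ai + bi for ai, bi in zip(a, b)]

def parallel_gradient(shards, w):
    """Fan the map step out across workers, then reduce the partials.
    A real framework (MapReduce, MPI, DryadLINQ) would distribute the
    shards across machines; a thread pool stands in for that here."""
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda s: shard_gradient(s, w), shards)
        return reduce(add_vectors, partials)
```

Because the reduce step is a plain sum, the result is identical to the single-machine gradient; the same pattern underlies many of the data-parallel algorithms the tutorial surveys.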

The tutorial is based on (but not limited to) material from our upcoming Cambridge U. Press edited book (http://www.cambridge.org/us/knowledge/isbn/item6542017), which is currently in production and will be available in December 2011.
