1.048.576 BD|CESGA

Providing quick access to ready-to-use Big Data solutions

Because Big Data doesn't have to be complicated

Scalable

Storage capacity: 816TB
Aggregated I/O throughput: 30GB/s
Aggregated RAM: 2432GB
10GbE connectivity between all nodes
456/912 cores

Hadoop Platform

Ready to use Hadoop ecosystem
Covers most use cases
Fully optimized for Big Data applications
Production ready

PaaS Platform

When you need something outside the Hadoop ecosystem
Includes a catalog of ready-to-use products, e.g. Cassandra, MongoDB, PostgreSQL

Spark

A fast and general engine for large-scale data processing

Speed

Easy

Spark ML

Spark's new machine learning library

Spark ML vs MLlib

As of Spark 2.0, the RDD-based APIs in the spark.mllib package have entered maintenance mode. The primary Machine Learning API for Spark is now the DataFrame-based API in the spark.ml package.

Why switch?

DataFrames provide a more user-friendly API than RDDs
The DataFrame-based API for MLlib provides a uniform API across ML algorithms and across multiple languages
The RDD-based API is expected to be removed in Spark 3.0

Spark ML

Demo

Summary

Using BD|CESGA and Spark ML you can easily scale your machine learning problems to larger datasets

Take advantage of parallelism!

Q&A

hadoop.cesga.es

Demo Source Code

Stay up to date by subscribing to our mailing list

Scalable Machine Learning: Spark ML on BD|CESGA platform
