BD|CESGA
Providing quick access to ready-to-use Big Data solutions
Because Big Data doesn't have to be complicated
Scalable
- Storage capacity 816TB
- Aggregated I/O throughput 30GB/s
- Aggregated RAM: 2432GB
- 10GbE connectivity between all nodes
- 456/912 cores
Hadoop Platform
- Ready to use Hadoop ecosystem
- Covers most use cases
- Fully optimized for Big Data applications
- Production ready
PaaS Platform
- When you need something outside the Hadoop ecosystem
- Includes a catalog of ready-to-use products, e.g. Cassandra, MongoDB, PostgreSQL
Spark
A fast and general engine for large-scale data processing
Speed
Easy
Spark ML
Spark's new machine learning library
Spark ML vs MLlib
As of Spark 2.0, the RDD-based APIs in the spark.mllib package have entered maintenance mode. The primary Machine Learning API for Spark is now the DataFrame-based API in the spark.ml package.
Why switch?
- DataFrames provide a more user-friendly API than RDDs
- The DataFrame-based API for MLlib provides a uniform API across ML algorithms and across multiple languages
- The Spark 2.x documentation expected the RDD-based API to be removed in Spark 3.0; it still ships in Spark 3.x, but only in maintenance mode
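A minimal sketch of what the DataFrame-based spark.ml API looks like in PySpark, to illustrate the points above. The toy data, the column names (x1, x2, label), the function name run_demo and the local[2] master are all assumptions made for illustration; on a cluster you would normally let the job submission set the master and read the data from distributed storage instead.

```python
# Toy training data: (label, x1, x2). Purely illustrative, not real data.
TRAINING_ROWS = [
    (0.0, 0.1, 0.2),
    (0.0, 0.3, 0.1),
    (1.0, 2.1, 1.9),
    (1.0, 1.8, 2.2),
]


def run_demo(rows=TRAINING_ROWS):
    """Fit a small spark.ml Pipeline and return (label, prediction) pairs."""
    # Imported inside the function so this file can be loaded on a machine
    # without Spark installed; calling run_demo() does require pyspark.
    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    # Assumption: local mode with 2 threads, only for trying the sketch out.
    spark = (
        SparkSession.builder
        .master("local[2]")
        .appName("spark-ml-sketch")
        .getOrCreate()
    )
    try:
        # The input is a DataFrame with named columns, not an RDD.
        df = spark.createDataFrame(rows, ["label", "x1", "x2"])

        # Stage 1: pack the numeric columns into a single vector column.
        assembler = VectorAssembler(inputCols=["x1", "x2"], outputCol="features")
        # Stage 2: an estimator that reads the "features" and "label" columns.
        lr = LogisticRegression(maxIter=10, featuresCol="features", labelCol="label")

        # Every stage follows the same fit/transform pattern.
        model = Pipeline(stages=[assembler, lr]).fit(df)
        predictions = model.transform(df).select("label", "prediction").collect()
        return [(row["label"], row["prediction"]) for row in predictions]
    finally:
        spark.stop()


if __name__ == "__main__":
    for label, prediction in run_demo():
        print(label, prediction)
```

Swapping LogisticRegression for another spark.ml estimator leaves the rest of the pipeline unchanged, which is the "uniform API across ML algorithms" mentioned above.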
Summary
Using BD|CESGA and Spark ML you can easily scale your machine learning problems to larger datasets
Take advantage of parallelism!