By the end of this course, you will have learnt:
- Apache Spark architecture
- How to use Spark with Java
- How to integrate Spark with NoSQL and other Big Data technologies
- How to scale calculations to a cluster of servers
- How to deploy Spark projects to the Cloud
- Machine Learning with Spark
Who should attend
This course is aimed at Data Scientists, Analysts, Software Developers, Database Developers, Data Warehouse Managers, Business Intelligence Specialists and Software Architects.
During the Spark course we will give you some fairly simple Scala code examples to run and edit.
Ideally, you will have some experience of software development or scripting, or of database development using a SQL-based RDBMS (e.g. SQL Server, Oracle, MySQL, DB2).
If you come from a more dashboard-oriented Business Intelligence background (or have good knowledge of a platform such as Excel or SAS), you should also benefit from this course. Please let us know if you have any questions or concerns.
Preparing for the course
If you wish to participate in the hands-on exercises, sign up for an Amazon AWS account at http://aws.amazon.com/ at least 48 hours before the course, and don't forget to save your login details! You may incur around $10 to $20 in cloud storage and compute charges during the course.
If you are interested in custom or on-site training in Spark or in real-time Big Data design and analysis for a team of any size, please get in touch.
We can take into account your existing technical skills, project requirements and timeframes, and specific topics of interest to tailor the most relevant and focussed course for you.
This can be particularly useful if you only need to learn the new features and best practices of Spark, or if you need extra topics to cover prerequisite skills.
Real-time Analysis Course Syllabus
Big Data Fundamentals
- Overview of Hadoop/HDFS and Amazon AWS/S3
Spark and TDD
- Creating a Spark project in IntelliJ
- Running and debugging a Spark project
- Building and deploying a Spark project with SBT on AWS
- Spark Core (RDD)
Spark Streaming with Kafka
- Kafka and Soak Streaming Examples
Spark Machine Learning
Real-time Analytics with Spark
- Spark architecture
- Installing and running Spark in the Cloud
- Programming with Spark
- Streaming data with Spark
- Integrating Spark with NoSQL and other Big Data technologies
- Spark demo with Avro, Pig and Hive
- Spark and Kafka integration
- Dynamic sampling
- Distinct count, cardinality estimation
- Moving average
Integration with third-party applications and languages
- R: examples for the beta distribution
- Lambda architecture