Apache Spark Training Course

Learn to create real-time data analysis & ML solutions with Spark

Our Apache Spark Training Course is aimed at Data Scientists, Analysts, Software Developers and Architects who need to gain hands-on experience using Apache Spark to create real-time data stream analysis and large-scale Machine Learning solutions.

The Spark training course has recently been expanded to include even more hands-on exercises. You are encouraged to bring along your own laptop so you can learn in a familiar environment and take away everything you have worked on during the class, to implement in your own projects or to display in your portfolio of work.

You'll be guided by an industry expert who has first-hand experience of designing and implementing commercial-scale Big Data analysis solutions. We run our Spark training in London, and through on-site customised courses.

By the end of this course, you will have learnt:

  • Apache Spark architecture
  • How to use Spark with Scala
  • How to integrate Spark with NoSQL and other Big Data technologies
  • How to scale calculations to a cluster of servers
  • How to deploy Spark projects to the Cloud
  • Machine Learning with Spark

Who should attend

This course is aimed at Data Scientists, Analysts, Software Developers, Database Developers, Data Warehouse Managers, Business Intelligence Specialists and Software Architects.


During the Spark course we will give you some fairly simple Scala code examples to run and edit.

It would be ideal if you have some experience of software development / scripting, or database development using a SQL-based RDBMS (e.g. SQL Server, Oracle, MySQL, DB2...).

If you are from a more dashboard-oriented Business Intelligence background (or have good knowledge of a platform such as Excel, SAS etc) you should also benefit from this course - please let us know if you have any questions or concerns.

Preparing for the course

If you wish to participate in the hands-on exercises you should sign up for an Amazon AWS account at least 48 hours prior to the course: http://aws.amazon.com/ - and don't forget to save your login details! You may incur around $10 - $20 of cloud storage and computation usage during the course.


If you are interested in custom or on-site training in Spark, or in real-time Big Data design and analysis, for any size of team, please get in touch.

We can take into account your existing technical skills, project requirements and timeframes, and specific topics of interest to tailor the most relevant and focussed course for you.

This can be particularly useful if you need to learn just the new features and best practices in Spark, or need to include extra topics to help with prerequisite skills.

Apache Spark Course Syllabus

Big Data Fundamentals

  • Overview of Hadoop/HDFS and Amazon AWS/S3

Spark and TDD

  • Creating a Spark Project in IntelliJ
  • Running and debugging a Spark project
  • Building and deploying a Spark Project with SBT on AWS
  • Spark Core (RDD)

Spark SQL

Spark Streaming with Kafka

  • Kafka and Spark Streaming Examples

Spark Machine Learning

Real-time Analytics with Spark

  • Spark architecture
  • Installing and running Spark in the Cloud
  • Programming with Spark
  • Streaming data with Spark
  • Integrating Spark with NoSQL and other Big Data technologies
  • Spark demo with Avro, Pig and Hive
  • Spark and Kafka integration

Streaming algorithms

  • Dynamic sampling
  • Distinct count, cardinality estimation
  • HyperLogLog
  • Moving average
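
To give a flavour of the streaming algorithms listed above, here is a minimal sketch of two of them in plain Python, with no Spark dependency. It is an illustration only, not course material: the function names `moving_average` and `reservoir_sample` are hypothetical, and reservoir sampling is assumed here as one common way to sample from a stream of unknown length, which may differ from the sampling technique covered in class. Both functions process one item at a time and keep only a small, bounded amount of state.

```python
import random
from collections import deque
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def moving_average(stream: Iterable[float], window: int) -> Iterator[float]:
    """Yield the mean of the most recent `window` values after each new value."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent = deque()  # holds at most `window` values
    total = 0.0       # running sum of the values currently in `recent`
    for value in stream:
        recent.append(value)
        total += value
        if len(recent) > window:
            # Drop the oldest value so the sum only covers the window.
            total -= recent.popleft()
        yield total / len(recent)


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Return up to `k` items chosen uniformly at random from `stream`.

    Only `k` items are held in memory, however long the stream is.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    print(list(moving_average([1, 2, 3, 4], window=2)))
    print(reservoir_sample(range(1000), k=5))
```

In a real Spark Streaming job the same ideas would typically be expressed with windowed operations over a stream rather than a Python loop.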

Integration with third-party applications and languages

  • Python
  • R – examples for beta distribution
  • Hadoop
  • Lambda architecture

Call us on 020 3137 3920 to find out how we can help
