Data Mining with R Training Course

Learn to wrangle data with R!

Our Data Mining with R training course is essentially about exploring and generating value from your data. This course provides an overview of methods of data exploration and covers some of the more important techniques in data analysis, i.e. looking for patterns and meaning in your data.

R is a language designed expressly to provide a statistical programming environment that includes extensive graphical capabilities. It has rapidly become the de facto standard for analysts across many disciplines.

You will learn how to produce general summary statistics and how to explore your data visually using graphical methods. You’ll also learn how to rearrange your data using cross-tabulation and contingency tables to help spot potential patterns. Finally, you will learn some of the basics of machine learning through cluster analysis (unsupervised machine learning) and regression analysis (supervised machine learning).

By the end of this course, you will have covered:

  • Summarising data numerically with R
  • Tabulating data with R
  • Visualisation - Graphical summary of data with R
  • Unsupervised machine learning with R
  • Supervised machine learning with R

Who should attend

Anyone who needs to analyse data will find this course useful. The methods demonstrated here are applicable to multiple disciplines.


You need to know some basics of working with R, but knowledge of statistics or analytics is not essential. You’ll need a laptop computer with R installed. No other software is needed, although a spreadsheet program would be useful.

On site

We are very happy to put together an on-site Data Mining with R training workshop based on your specific requirements, taking into account your existing programming experience and the types of statistical analysis project to which you will be applying R.

Course syllabus

Summarising data numerically

  • Getting averages and other summary statistics
  • R commands - mean, median, summary, apply, tapply, aggregate
  • Gaining some initial insights into your data
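
As a rough illustration of the commands listed above (not taken from the course materials), here is a minimal sketch. It assumes the built-in mtcars data set, and the variable names are chosen purely for the example.

```r
# mtcars ships with base R; it is used here only for illustration.
data(mtcars)

mean(mtcars$mpg)      # arithmetic mean of fuel economy
median(mtcars$mpg)    # middle value
summary(mtcars$mpg)   # minimum, quartiles, mean and maximum

# apply(): run a function over the columns (MARGIN = 2) of a table of numbers
col_means <- apply(mtcars[, c("mpg", "hp", "wt")], 2, mean)

# tapply(): summarise one variable split by a grouping variable
mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)

# aggregate(): the same grouped summary, returned as a data frame
mpg_by_cyl_df <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
```

tapply() and aggregate() compute the same group means here. The first returns a named array and the second a data frame, which is often easier to pass on to later steps.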

Tabulating data

  • Frequency (contingency) tables. Cross-tabulation
  • R commands - table, ftable, xtabs, colMeans, prop.table, margin.table, addmargins
  • How to transform data using contingency tables and cross-tabulation
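
The tabulation commands above could be combined along the following lines. This is a hedged sketch using the built-in mtcars data set rather than an excerpt from the course. The choice of variables (cyl, gear, am) is an assumption made for the example.

```r
# mtcars ships with base R; it is used here only for illustration.
data(mtcars)

# table(): one-way and two-way frequency (contingency) tables
cyl_counts  <- table(mtcars$cyl)
cyl_by_gear <- table(cyl = mtcars$cyl, gear = mtcars$gear)

# xtabs(): the same cross-tabulation written as a formula
cyl_by_gear_x <- xtabs(~ cyl + gear, data = mtcars)

# ftable(): flatten a three-way table into a compact two-dimensional layout
flat <- ftable(xtabs(~ cyl + gear + am, data = mtcars))

# prop.table(): proportions within each row (margin = 1)
row_props <- prop.table(cyl_by_gear, margin = 1)

# margin.table(): totals for one dimension; addmargins(): append totals to the table
gear_totals <- margin.table(cyl_by_gear, margin = 2)
with_totals <- addmargins(cyl_by_gear)

# colMeans(): column means of numeric columns
avg <- colMeans(mtcars[, c("mpg", "wt")])
```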

Graphical summary of data

  • Summary graphs. Exploratory graphs
  • R commands - stripchart, dotchart, plot
  • Visualising data
  • Graphs to help summarise data
  • Graphs to help explore data and view potential patterns
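
To give a flavour of the graphics commands named above, the sketch below draws three simple plots. It assumes the built-in mtcars data set, and the axis labels are illustrative choices, not course content.

```r
# mtcars ships with base R; it is used here only for illustration.
data(mtcars)

# stripchart(): one-dimensional scatter of a numeric variable, split by group
stripchart(mpg ~ cyl, data = mtcars, method = "jitter",
           xlab = "Miles per gallon", ylab = "Cylinders")

# dotchart(): Cleveland dot plot with one labelled point per car
dotchart(mtcars$mpg, labels = rownames(mtcars), cex = 0.7,
         xlab = "Miles per gallon")

# plot(): scatter plot to look for a relationship between two variables
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
```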

Unsupervised machine learning

  • Dissimilarity, Hierarchical clustering, K-means, Partitioning around Medoids, Fuzzy analysis
  • R commands - dist, hclust, cutree
  • Looking for clusters in your data
  • Methods of cluster analysis include hierarchical clustering, partitioning methods and agglomerative nesting
  • R commands - kmeans, pam, diana, agnes, fanny
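
A cluster analysis using some of these commands might be sketched as follows. The example assumes the built-in iris data set and a choice of three clusters, both picked only for illustration. pam() comes from the cluster package, which is normally installed alongside R, so the code checks for it first.

```r
# iris ships with base R; it is used here only for illustration.
data(iris)

# Standardise the four measurements so that no single variable dominates the distances
measurements <- scale(iris[, 1:4])

# dist(): pairwise Euclidean dissimilarities between observations
d <- dist(measurements)

# hclust(): agglomerative hierarchical clustering; cutree(): cut the tree into 3 groups
tree <- hclust(d, method = "average")
hier_groups <- cutree(tree, k = 3)

# kmeans(): partition into 3 clusters; the starting centres are random, so fix the seed
set.seed(1)
km <- kmeans(measurements, centers = 3, nstart = 20)

# pam(): partitioning around medoids, available if the 'cluster' package is installed
if (requireNamespace("cluster", quietly = TRUE)) {
  medoids <- cluster::pam(d, k = 3)
  # Cross-tabulate the two solutions to see how far they agree
  print(table(hierarchical = hier_groups, pam = medoids$clustering))
}
```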

Supervised machine learning

  • Regression analysis, Linear models, Curvilinear models, Non-Gaussian models
  • R commands - lm, coef, resid, plot, abline
  • Exploring relationships between factors in your data
  • Regression analysis includes linear and curvilinear models as well as non-Gaussian regression
  • R commands - glm
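
The regression commands above could be used roughly as shown below. This is a minimal sketch assuming the built-in mtcars data set. The models (fuel economy against weight, and transmission type against weight) are illustrative choices, not the course's worked examples.

```r
# mtcars ships with base R; it is used here only for illustration.
data(mtcars)

# lm(): straight-line model of fuel economy against weight
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)          # intercept and slope
head(resid(fit))   # first few residuals

# plot() and abline(): scatter plot with the fitted line drawn on top
plot(mpg ~ wt, data = mtcars)
abline(fit)

# Curvilinear model: add a quadratic term with I()
fit_quad <- lm(mpg ~ wt + I(wt^2), data = mtcars)

# glm(): non-Gaussian regression, here a logistic model for a binary response
fit_logit <- glm(am ~ wt, data = mtcars, family = binomial)
```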

Call us on 020 3137 3920 to find out how we can help
