About the course:
Our Machine Learning and Visualisation for Data Mining with R training course, while something of a mouthful, is essentially about exploring and generating value from your data.
This course provides an overview of methods of data exploration and covers some of the more important techniques in data analysis, i.e. looking for patterns and meaning in your data.
R is a language designed expressly to provide a statistical programming environment that includes extensive graphical capabilities. It has become a de facto standard for analysts across many disciplines.
You will learn how to produce general summary statistics and how to explore your data visually using graphical methods. You’ll also learn how to rearrange your data using cross-tabulation and contingency tables to help spot potential patterns. Finally, you will learn some of the basics of machine learning through cluster analysis (unsupervised machine learning) and regression analysis (supervised machine learning).
Learning outcomes
- Summarising data numerically with R
- Tabulating data with R
- Visualisation - Graphical summary of data with R
- Unsupervised Machine Learning with R
- Supervised Machine Learning with R
Who should attend
Anyone who needs to analyse data will find this course useful. The methods demonstrated here are applicable to multiple disciplines.
Prerequisites
You need to know some basics of working with R, but knowledge of statistics or analytics is not essential. You’ll need a laptop computer with R installed. No other software is needed, although a spreadsheet program would be useful.
On site
We are very happy to put together an on-site Data Mining with R training workshop based on your specific requirements, taking into account your existing programming experience and the types of statistical analysis projects to which you will be applying R.
Course syllabus
Summarising data numerically
- Getting averages and other summary statistics
- R commands - mean, median, summary, apply, tapply, aggregate
- Gaining some initial insights into your data
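As a rough illustration of the commands listed above, the following sketch summarises R's built-in iris data set. The choice of data set and the variable names (mean_sepal, species_means, agg) are assumptions made for this example, not part of the course material.

```r
# Built-in 'iris' data: four numeric measurements plus a Species factor
data(iris)

# Averages for a single variable
mean_sepal   <- mean(iris$Sepal.Length)
median_sepal <- median(iris$Sepal.Length)

# Quartiles and mean for every numeric column, counts for the factor
print(summary(iris))

# apply(): run one function over each numeric column (MARGIN = 2)
col_sds <- apply(iris[, 1:4], 2, sd)

# tapply(): summarise one variable within each group
species_means <- tapply(iris$Sepal.Length, iris$Species, mean)

# aggregate(): summarise several variables by group, returned as a data frame
agg <- aggregate(cbind(Sepal.Length, Petal.Length) ~ Species,
                 data = iris, FUN = mean)

print(species_means)
print(agg)
```

tapply returns a named array, which is handy for a quick look, while aggregate returns a data frame that is easier to pass on to later steps.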
Tabulating data
- Frequency (contingency) tables. Cross-tabulation
- R commands - table, ftable, xtabs, colMeans, prop.table, margin.table, addmargins
- How to transform data using contingency tables and cross-tabulation
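The tabulation commands above might be used along these lines. This is a minimal sketch using the built-in mtcars data set; the data set and the object names (tab, xt, ft and so on) are illustrative assumptions.

```r
# Built-in 'mtcars' data: one row per car model
data(mtcars)

# table(): two-way contingency table of cylinders by gears
tab <- table(cyl = mtcars$cyl, gear = mtcars$gear)

# xtabs(): the same cross-tabulation via a formula
xt <- xtabs(~ cyl + gear, data = mtcars)

# ftable(): flatten a three-way table into a readable two-dimensional layout
ft <- ftable(xtabs(~ cyl + gear + am, data = mtcars))

# margin.table(): totals for one dimension (1 = rows, here cylinders)
row_totals <- margin.table(tab, 1)

# prop.table(): proportions within each row
row_props <- prop.table(tab, 1)

# addmargins(): append row and column totals
with_totals <- addmargins(tab)

print(ft)
print(round(row_props, 2))
print(with_totals)
```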
Graphical summary of data
- Summary graphs. Exploratory graphs
- R commands - stripchart, dotchart, plot
- Visualising data
- Graphs to help summarise data
- Graphs to help explore data and view potential patterns
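A short sketch of the three plotting commands listed above, again using the built-in iris data as an assumed example. The name group_means is hypothetical; run non-interactively, R writes the plots to its default graphics device.

```r
data(iris)

# stripchart(): every observation shown, one strip per group;
# jitter spreads out overlapping points
stripchart(Sepal.Length ~ Species, data = iris,
           method = "jitter", vertical = TRUE, pch = 16,
           ylab = "Sepal length (cm)")

# dotchart(): compare one summary value per group
# (sapply over split() returns a plain named vector)
group_means <- sapply(split(iris$Petal.Length, iris$Species), mean)
dotchart(group_means, xlab = "Mean petal length (cm)")

# plot(): scatter plot of two variables, coloured by group,
# to look for patterns such as clusters or trends
plot(iris$Petal.Length, iris$Petal.Width,
     col = as.integer(iris$Species), pch = 16,
     xlab = "Petal length (cm)", ylab = "Petal width (cm)")
```

The strip chart and dot chart summarise a single variable, whereas the scatter plot is exploratory: it can hint at groupings that cluster analysis might later confirm.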
Unsupervised machine learning
- Dissimilarity, Hierarchical clustering, K-means, Partitioning around Medoids, Fuzzy analysis
- R commands - dist, hclust, cutree
- Looking for clusters in your data
- Methods of cluster analysis include hierarchical clustering, partitioning methods and agglomerative nesting
- R commands - kmeans, pam, diana, agnes, fanny
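The sketch below shows how a few of these clustering commands could fit together. It assumes the cluster package (which provides pam, diana, agnes and fanny, and normally ships with R as a recommended package) is installed; the iris data, the choice of three clusters and the object names are illustrative assumptions only.

```r
library(cluster)  # assumed installed; provides pam()

data(iris)
x <- scale(iris[, 1:4])   # standardise so each variable has equal weight

# Hierarchical clustering: dissimilarities -> tree -> cut into groups
d  <- dist(x)                          # Euclidean distance by default
hc <- hclust(d, method = "average")
hc_groups <- cutree(hc, k = 3)

# K-means: partition into k groups; several random starts reduce
# the risk of a poor local solution
set.seed(42)
km <- kmeans(x, centers = 3, nstart = 25)

# Partitioning around medoids: like k-means, but each cluster centre
# is an actual observation
pm <- pam(x, k = 3)

# Compare each clustering with the known species labels
print(table(hierarchical = hc_groups, species = iris$Species))
print(table(kmeans = km$cluster, species = iris$Species))
print(table(pam = pm$clustering, species = iris$Species))
```

The species labels are used here only to check the result afterwards; the clustering itself never sees them, which is what makes the approach unsupervised.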
Supervised machine learning
- Regression analysis, Linear models, Curvilinear models, Non-Gaussian models
- R commands - lm, coef, resid, plot, abline
- Exploring relationships between variables in your data
- Regression analysis includes linear and curvilinear models as well as non-Gaussian regression
- R commands - glm
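As a minimal sketch of the regression commands above, the following fits a linear, a curvilinear and a non-Gaussian model to the built-in mtcars data. The data set, the choice of predictors and the object names (fit, fit2, fit_glm) are assumptions made for illustration.

```r
data(mtcars)

# Linear model: fuel economy as a function of weight
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))            # intercept and slope
res <- resid(fit)           # residuals: observed minus fitted values

# Scatter plot with the fitted line added
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lb)", ylab = "Miles per gallon")
abline(fit)

# Curvilinear model: add a quadratic term; I() protects the arithmetic
fit2 <- lm(mpg ~ wt + I(wt^2), data = mtcars)
print(coef(fit2))

# Non-Gaussian model: logistic regression for a binary response
# (am: 0 = automatic, 1 = manual transmission)
fit_glm <- glm(am ~ wt, data = mtcars, family = binomial)
print(coef(fit_glm))
```

The quadratic model is still fitted with lm because it remains linear in its coefficients; glm is needed only when the response is not well described by a Gaussian distribution.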