Indicative Syllabus

Module 1

RStudio IDE; R language; Data classification and summary statistics.

In this module you will set up your working environment, connect to GitHub and clear the first big hurdle of importing data; you will learn how to do it properly, with a command in R. You will learn how to use the RStudio IDE, from installation through customisation to file navigation, and you will pick up good workflow habits for an R project. Once you are comfortable with the RStudio working environment, you will move on to the key features of the R language and be introduced to fundamental statistical concepts. We will not stop there: you will also be shown how to turn analyses into high-quality documents and presentations with R Markdown. With the knowledge from this module you will be able to create reproducible reports straight from your R code, documenting your analysis and its results as an HTML, PDF, slideshow or Microsoft Word document.

What you will learn:

Basic use of R/RStudio console
Good habits for workflow
Inputting and importing different data types
R environment: record keeping
Data classification
Descriptive summary statistics
Base R graphics
Authoring R Markdown reports; embedding R code; LaTeX to incorporate mathematical expressions
Using knitr to compile dynamic R code

Module 2

Data Wrangling and Visualising Data

In this module you will learn some of the fundamental techniques for data exploration and transformation using the dplyr package. This tidyverse package makes your exploration code intuitive to write and easy to read. You will learn dplyr’s key verbs for data manipulation, which will help you uncover and shape the information within the data so that it is easy to turn into informative plots. You will be introduced to the fundamental principles behind effective data visualisation. Using the grammar of graphics concepts implemented in the ggplot2 package, you will be able to create meaningful exploratory plots. You will also develop an understanding of how to think about the data transformations and summaries needed to produce an informative visualisation.

What you will learn:

dplyr’s key data manipulation verbs: select, mutate, filter, arrange and summarise/summarize
to aggregate data by groups
to chain data manipulation operations using the pipe operator
basic principles of effective data visualisation
to specify ggplot2 building blocks and combine them to create a graphical display
about the philosophy that guides ggplot2: grammatical elements (layers) and aesthetic mapping
visualising data with maps
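
As a rough illustration of the verbs, grouping and pipe listed above, the sketch below chains them on the built-in mtcars data set and passes the summary to ggplot2. The data set, the column choices, the unit conversion and the object names (mpg_by_cyl, p) are assumptions made for this example only and are not taken from the course materials.

```r
library(dplyr)
library(ggplot2)

# mtcars ships with R; it is used here purely for illustration.
mpg_by_cyl <- mtcars %>%
  select(mpg, cyl, wt) %>%                   # keep three columns
  filter(wt < 5) %>%                         # drop the heaviest cars (wt is in 1000 lbs)
  mutate(kpl = mpg * 0.4251) %>%             # miles per US gallon -> km per litre (approx.)
  group_by(cyl) %>%                          # aggregate by number of cylinders
  summarise(mean_kpl = mean(kpl), n = n()) %>%
  arrange(desc(mean_kpl))                    # most economical group first

# Aesthetic mapping plus one geometric layer, built up with `+`.
p <- ggplot(mpg_by_cyl, aes(x = factor(cyl), y = mean_kpl)) +
  geom_col() +
  labs(x = "Cylinders", y = "Mean fuel economy (km per litre)")
```

Printing p draws the plot. Each pipe step returns a data frame, so the chain can be run one verb at a time to see what each does.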

Module 3

Statistical Modelling I

Introduction to data analysis (DA) methodology and bivariate data analysis

In this module you will learn the fundamental concepts of statistical modelling, starting with exploring the data through appropriate plots and descriptive statistics, and moving on to the inferential statistics of parameter estimation and hypothesis testing. You will learn how to match data types with an appropriate statistical model, focusing on the ‘Measured vs Attribute’ (MvA) and ‘Measured vs Measured’ (MvM) types of bivariate data analysis problem. With the knowledge from this module you will be able to conduct basic MvA and MvM analyses, and to interpret and report their outcomes in an appropriate manner.

What you will learn:

The concept of statistical distribution
Exploring different data types
Common data-analysis methodology; hypothesis testing
To investigate relationships between measured (M) and attribute (A) variables
- Two-tailed t-test
- One-way ANOVA
To investigate relationships between two measured (M) variables
- Simple linear regression
Statistical reporting
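
The three methods listed above can be sketched in base R as follows. This is a minimal example, not the course's own analysis: the built-in mtcars and iris data sets, the choice of variables and the object names (tt, aov_fit, fit) are all assumptions made for illustration.

```r
# Measured vs Attribute, two groups: two-tailed t-test.
# am is the 0/1 transmission indicator in mtcars; t.test() uses
# the Welch (unequal variances) version by default.
tt <- t.test(mpg ~ am, data = mtcars, alternative = "two.sided")
tt$p.value

# Measured vs Attribute, more than two groups: one-way ANOVA.
aov_fit <- aov(Sepal.Length ~ Species, data = iris)
summary(aov_fit)

# Measured vs Measured: simple linear regression.
fit <- lm(mpg ~ wt, data = mtcars)
coef(fit)   # intercept and slope
```

In each call the formula puts the response on the left of `~` and the explanatory variable on the right, which mirrors the MvA and MvM framing.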

Module 4

Statistical Modelling II

Machine learning: Multifactor linear regression and classification modelling

You will be introduced to the broad ideas of supervised and unsupervised learning algorithms, as well as a number of core machine learning concepts. Regression modelling is the key modelling construct in this module: it will be introduced first and then developed. You will learn the importance of selecting an appropriate causal model for the circumstances at hand. You will also become familiar with the fundamental models and algorithms used in classification. With the knowledge from this module you will be able not only to conduct regression analysis and to interpret and report its outcomes in an appropriate manner, but also to implement key learning algorithms for uncovering patterns and structure within data.

What you will learn:

Multifactor linear regression modelling:
- fitting a linear model
- validating the model: the coefficient of determination
- interpretation of the parameters and reporting of the nature of the relationships
Classification modelling
Assessment of the efficacy of the fitted models using a rigorous training and testing framework
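
One possible shape of the regression part of this workflow is sketched below in base R: fit a multifactor linear model on a training subset, read off the coefficient of determination, then check predictions on held-out data. The mtcars data set, the 70/30 split, the seed, the predictors and all object names are assumptions chosen for the example; the course may use different data and tools.

```r
set.seed(1)                          # arbitrary seed, only to make the split repeatable

n <- nrow(mtcars)                    # 32 rows
train_idx <- sample(n, size = round(0.7 * n))
train <- mtcars[train_idx, ]         # rows used to fit the model
test  <- mtcars[-train_idx, ]        # held-out rows used to assess it

# Multifactor linear regression: two explanatory variables.
fit <- lm(mpg ~ wt + hp, data = train)

# Coefficient of determination on the training data.
r2_train <- summary(fit)$r.squared

# Out-of-sample check: root mean squared error on the test set.
pred <- predict(fit, newdata = test)
rmse_test <- sqrt(mean((test$mpg - pred)^2))
```

Comparing r2_train with the test-set error gives a first indication of whether the model generalises beyond the data it was fitted on.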