Indicative Syllabus

Module 1

RStudio IDE; R language; Data classification and summary statistics.

In this module you will set up the working environment, connect to GitHub and clear the first big hurdle: importing data, which you will learn to do properly using commands in R. You will learn how to use the RStudio IDE for R, from installation to customisation and file navigation, and you will pick up good workflow habits and practices for working within an R project. Once you are comfortable with the RStudio working environment, you will move on to mastering the key features of the R language and be introduced to fundamental statistical concepts. We will not stop there. You will be shown how to turn analyses into high-quality documents and presentations with R Markdown. With the knowledge from this module you will be able to create reproducible reports straight from your R code, allowing you to document your analysis and its results as an HTML page, PDF, slideshow or Microsoft Word document.

What you will learn:

  • Basic use of the R/RStudio console
  • Good habits for workflow
  • Inputting and importing different data types
  • R environment: record keeping
  • Data classification
  • Descriptive summary statistics
  • Base R graphics
  • Authoring R Markdown reports; embedding R code; using LaTeX to incorporate mathematical expressions
  • Using knitr to compile dynamic R code

Module 2

Data Wrangling and Visualising Data

In this module you will learn some of the fundamental techniques for data exploration and transformation using the dplyr package. This tidyverse package makes data exploration code intuitive to write and easy to read. You will learn dplyr’s key data manipulation verbs, which help you uncover the information within the data and shape it into a form that is easy to turn into informative plots. You will be introduced to the fundamental principles behind effective data visualisation. Using the grammar of graphics concepts implemented in the ggplot2 package, you will be able to create meaningful exploratory plots. You will also develop an understanding of how to think through the data transformations and summaries that lead to an informative visualisation.

What you will learn:

  • dplyr’s key data manipulation verbs: select, mutate, filter, arrange and summarise/summarize
  • to aggregate data by groups
  • to chain data manipulation operations using the pipe operator
  • basic principles of effective data visualisation
  • to specify ggplot2 building blocks and combine them to create graphical displays
  • about the philosophy that guides ggplot2: grammatical elements (layers) and aesthetic mapping
  • visualising data with maps

Module 3

Statistical Modelling I

Introduction to data analysis (DA) methodology and bivariate data analysis

In this module you will learn the fundamental concepts of statistical modelling, starting with exploring the data using appropriate plots and descriptive statistics, and moving on to the inferential statistics of parameter estimation and hypothesis testing. You will learn how to match data types with an appropriate statistical model, focusing on the ‘Measured vs Attribute’ (MvA) and ‘Measured vs Measured’ (MvM) types of bivariate data analysis problem. With the knowledge from this module you will be able to conduct basic MvA and MvM statistical analyses, and interpret and report their outcomes appropriately.

What you will learn:

  • The concept of a statistical distribution
  • Exploring different data types
  • Common data-analysis methodology; hypothesis testing
  • To investigate relationships between measured (M) and attribute (A) variables
    • Two-tailed t-test
    • One-way ANOVA
  • To investigate relationships between two M variables
    • Simple linear regression
  • Statistical reporting

Module 4

Statistical Modelling II

Machine learning: multifactor linear regression and classification modelling

You will be introduced to the broad ideas behind supervised and unsupervised learning algorithms, as well as a number of core machine learning concepts. In this module, regression modelling is the key modelling construct: it will first be introduced and then developed further. You will learn the importance of selecting an appropriate causal model depending on the particular circumstances. You will also become familiar with the fundamental models and algorithms used in classification. With the knowledge from this module you will not only be able to conduct regression analysis and interpret and report its outcomes appropriately, but also implement key learning algorithms for uncovering patterns and structure within data.

What you will learn:

  • Multifactor linear regression modelling:
    • fitting a linear model
    • validating the model: the coefficient of determination
    • interpretation of the parameters and reporting of the nature of the relationships
  • Classification modelling
  • Assessment of the efficacy of the fitted models using a rigorous training and testing framework

© 2020 Tatjana Kecojevic