Overview
An introduction to the statistical learning techniques commonly used to analyse high-dimensional (or multivariate) data. Penalised regression, classification trees, clustering, dimension reduction, bagging, stacking, boosting, random forests and ensemble learning.
This paper introduces students to many of the statistical learning techniques now used to analyse high-dimensional data. Students will learn the underlying rationale for each method and gain practice in applying it.
About this paper
| Paper title | Modelling High Dimensional Data |
|---|---|
| Subject | Statistics |
| EFTS | 0.15 |
| Points | 18 points |
| Teaching period | Semester 2 (On campus) |
| Domestic Tuition Fees (NZD) | $1,103.10 |
| International Tuition Fees | Tuition fees for international students are listed elsewhere on this website. |
- Prerequisite
- One of (ECON 210 or FINC 203 or STAT 210 or STAT 241) and STAT 260
- Restriction
- STAT 242, STAT 342, STAT 425
- Schedule C
- Arts and Music, Science
- Contact
- Teaching staff
- Paper Structure
The main topics of this paper are:
- Principal component analysis.
- Exploratory factor analysis.
- Clustering methods.
- Dimensionality reduction.
- Classification methods.
- Tree-based methods.
- Penalised regression.
- Multiple testing.
- Textbooks
Textbooks are not required for this paper.
- Graduate Attributes Emphasised
- Lifelong learning, Communication, Critical thinking, Research.
- Learning Outcomes
On completion of this paper, students will be able to:
- Describe the issues that arise when analysing high-dimensional data
- Use a range of statistical learning techniques to analyse real, high-dimensional data using R
- Determine which is the most appropriate technique for a given research objective
- Interpret the results of the analysis, including any assumptions or limitations
- Write a clear and succinct report on an analysis of high-dimensional data for a collaborator or potential client