    An introduction to the techniques used to prepare, integrate, manage and visualise complex data using modern software environments. Essential for students who need to manage data in science, business, or the humanities.

    Data science skills are in increasing demand in both industry and academia. Being able to manipulate and model data effectively and safely will be a key strategic advantage for future employment. COMP 120 introduces the fundamental concepts of data science through practical use of the industry-standard software environment R. You will learn how to program in R and how to manage and manipulate data effectively in R, and you will be exposed to the "round trip" of data science (Import, Tidy, Transform, Visualise, Model, and Communicate).

    Upon completion of COMP 120, you will be well equipped to undertake your own data acquisition and management in R, and well prepared for other papers that use R for analysis and modelling.

    About this paper

    Paper title: Practical Data Science
    Subject: Computer and Information Science
    EFTS: 0.15
    Points: 18 points
    Teaching period(s): Semester 1 (On campus), Semester 2 (On campus)
    Domestic Tuition Fees (NZD): $1,173.30
    International Tuition Fees: Tuition Fees for international students are elsewhere on this website.
    Schedule C: Arts and Music, Commerce, Science

    Teaching staff

    Associate Professor Tony Savarimuthu
    Department of Information Science

    Paper Structure

    This paper covers the following key themes:

    1. Introduction to programming in R and RStudio
    2. Importing and tidying data in R ("Data Wrangling")
    3. Plotting and visualising data in R
    4. Data aggregation and summarisation in R
    5. Semi-structured data manipulation using Web Scraping as an example
    6. Building models in R

    Note that the modelling aspect in this paper introduces the framework through which models are built in R, and is not intended as a substitute for other modelling papers.

    Teaching Arrangements

    2 x one-hour lectures per week

    1 x two-hour lab per week


    Wickham and Grolemund, R for Data Science, O’Reilly, 2016 (available online)

    Graduate Attributes Emphasised
    Communication, Information literacy, Research.
    Learning Outcomes

    Upon completion of COMP 120, students should be able to:

    1. Automate data manipulation tasks using a contemporary software package
    2. Develop basic scripts to perform data management tasks
    3. Use relevant software to clean, manage and integrate data
    4. Create visualisations from data sources using appropriate software
    5. Manage and share data projects using version control systems and repositories


    Semester 1

    Teaching method
    This paper is taught On Campus
    Learning management system

    Computer Lab

    Attend one stream from:

    Stream  Days     Times        Weeks
    A1      Tuesday  14:00-15:50  9-13, 15-22
    A2      Tuesday  16:00-17:50  9-13, 15-22

    Lecture

    Stream  Days     Times        Weeks
    A1      Monday   14:00-14:50  9-13, 15-22
            Tuesday  09:00-09:50  9-13, 15-22

    Semester 2

    Teaching method
    This paper is taught On Campus
    Learning management system

    Computer Lab

    Attend one stream from:

    Stream  Days     Times        Weeks
    A1      Monday   17:00-18:50  30
            Friday   10:00-11:50  30-35, 37-42
    A2      Tuesday  17:00-18:50  30
            Friday   12:00-13:50  30-35, 37-42

    Lecture

    Stream  Days     Times        Weeks
    A1      Monday   14:00-14:50  29-35, 37-42
            Tuesday  09:00-09:50  29-35, 37-42