An introduction to the techniques used to prepare, integrate, manage and visualise complex data using modern software environments. Essential to students needing to manage data in science, business, or humanities.
Data science skills are in increasing demand in both industry and academia. Being able to manipulate and model data effectively and safely will be a key strategic advantage for future employment. COMP 120 introduces the fundamental concepts of data science to students through practical use of the industry-standard software environment R. You will learn how to program in R, how to manage and manipulate data effectively in R, and be exposed to the "round trip" of data science (Import, Tidy, Transform, Visualise, Model, and Communicate).
Upon completion of COMP 120, you will be well equipped to embark on your own data acquisition and management in R, and well prepared for other papers that use R for analysis and modelling.
Paper title | Practical Data Science |
---|---|
Paper code | COMP120 |
Subject | Computer and Information Science |
EFTS | 0.15 |
Points | 18 points |
Teaching period(s) | Semester 1 (On campus), Semester 2 (On campus) |
Domestic Tuition Fees (NZD) | $1,141.35 |
International Tuition Fees | Tuition Fees for international students are elsewhere on this website. |
- Schedule C
Arts and Music, Commerce, Science
- Contact
Associate Professor Tony Savarimuthu
Department of Information Science
tony.savarimuthu@otago.ac.nz
- Teaching staff
Associate Professor Tony Savarimuthu
Department of Information Science
- Paper Structure
This paper covers the following key themes:
- Introduction to programming in R and RStudio
- Importing and tidying data in R ("Data Wrangling")
- Plotting and visualising data in R
- Data aggregation and summarisation in R
- Semi-structured data manipulation using Web Scraping as an example
- Building models in R
Note that the modelling aspect in this paper introduces the framework through which models are built in R, and is not intended as a substitute for other modelling papers.
- Teaching Arrangements
2 x one-hour lectures per week
1 x two-hour lab per week
- Textbooks
Wickham and Grolemund, R for Data Science, O’Reilly, 2016 (available online)
- Graduate Attributes Emphasised
Communication, Information literacy, Research.
- Learning Outcomes
Upon completion of COMP 120, students should be able to:
- Automate data manipulation tasks using a contemporary software package
- Develop basic scripts to perform data management tasks
- Use relevant software to clean, manage and integrate data
- Create visualisations from data sources using appropriate software
- Manage and share data projects using version control systems and repositories