
Document 18: 02_tutorial.pdf

Status: ready

S3 bucket: comp5349-pdf-596451156796

S3 key: uploads/1780636829_24ffdbfa7c9441d8ad57fc23b5e70b3d_02_tutorial.pdf

Uploaded: 2026-06-05 05:20:29.123754+00:00
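The S3 key appears to follow the pattern `uploads/<Unix seconds>_<32 hex characters>_<original filename>`. Read as Unix time, 1780636829 is 2026-06-05 05:20:29 UTC, which matches the Uploaded field to the second. The sketch below splits a key of that shape. The pattern is inferred from this single key and is not a documented convention. The helper name `parse_upload_key` and the `UploadKey` type are hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone

# Assumed key layout, inferred from one example (not a documented convention):
# uploads/<unix seconds>_<32 hex chars>_<original filename>
KEY_PATTERN = re.compile(r"^uploads/(\d+)_([0-9a-f]{32})_(.+)$")


@dataclass
class UploadKey:
    uploaded_at: datetime  # whole-second upload time encoded in the key
    upload_id: str         # 32-character hex identifier
    filename: str          # original filename as uploaded


def parse_upload_key(key: str) -> UploadKey:
    """Split an S3 key of the assumed form into its three parts."""
    match = KEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"unexpected key format: {key!r}")
    seconds, upload_id, filename = match.groups()
    return UploadKey(
        uploaded_at=datetime.fromtimestamp(int(seconds), tz=timezone.utc),
        upload_id=upload_id,
        filename=filename,
    )


if __name__ == "__main__":
    key = "uploads/1780636829_24ffdbfa7c9441d8ad57fc23b5e70b3d_02_tutorial.pdf"
    parsed = parse_upload_key(key)
    print(parsed.uploaded_at.isoformat())  # 2026-06-05T05:20:29+00:00
    print(parsed.filename)                 # 02_tutorial.pdf
```

The hex group has a fixed length, so underscores inside the original filename (as in `02_tutorial.pdf`) do not confuse the split.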

Processing Runs

Strategy                   Status      Chunks   Average length (characters)   Processing time   Error
Fixed-size chunking        completed   11       948.0                         0.417 sec         none
Paragraph-aware chunking   completed   9        935.3                         0.404 sec         none
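The numeric columns above can be produced for any chunking function by counting its output, taking the mean character length of the chunks, and timing the call. The following is a minimal sketch. The page does not say how its own timings were measured. `summarise_run` and the stand-in strategy `split_lines` are hypothetical names used only for illustration.

```python
import time
from statistics import mean
from typing import Callable, List


def summarise_run(strategy: Callable[[str], List[str]], text: str) -> dict:
    """Run one chunking strategy and report chunk count, mean length and elapsed time."""
    start = time.perf_counter()
    chunks = strategy(text)
    elapsed = time.perf_counter() - start
    return {
        "chunks": len(chunks),
        # Mean chunk length in characters, rounded to one decimal like the table.
        "average_length": round(mean(len(c) for c in chunks), 1) if chunks else 0.0,
        "processing_time_sec": round(elapsed, 3),
    }


def split_lines(text: str) -> List[str]:
    # Hypothetical stand-in strategy: one chunk per non-empty line.
    return [line for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    stats = summarise_run(split_lines, "abc\n\nde\nfghij")
    print(stats["chunks"], stats["average_length"])  # 3 3.3
```

Multiplying Chunks by Average length gives the total number of characters each run emitted. For fixed-size chunking, 11 × 948.0 = 10,428. For paragraph-aware chunking, 9 × 935.3 ≈ 8,418. The fixed-size total is larger, which is consistent with its chunks overlapping. In the samples, fixed-size chunk 1 repeats the end of chunk 0.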

Sample Chunks

Fixed-size chunking

Chunk 0 - 1000 characters

# Page 1

COMP5310 Tutorial Week 2
Main Goals: 
▪ Differences between types of data and applicable measures of dispersion and central
tendency.
▪ Idea and first experience with data cleaning.
▪ Data exploration using Microsoft Excel Live.
Exercise 1: Open the file and clean the data. 
1. Download the file WFH-Survey-Responses-NSW-dirty.xlsx from Canvas (Edited version 
from the original 2020 survey found at https://data.nsw.gov.au/data/dataset/nsw-remote-working-survey).
2. Open the file with Microsoft Excel Live.
a. Open your University email account.
b. Click on the blue cloud icon from the left-hand side menu. This will open 
OneDrive on a new tab.
c. On the top menu, click on Upload. A pop-up window will open to search for 
the file you want to upload to OneDrive. Navigate to the folder where you 
saved the file you downloaded from Canvas and select it.
d. The file will now be uploaded ...

Chunk 1 - 1000 characters

here you 
saved the file you downloaded from Canvas and select it.
d. The file will now be uploaded to your OneDrive.
e. Wait until the file is uploaded, and click on it once it appears in the list of
files. This will open the file in Excel Live.
3. Look at the responses and see if you need to clean the data.
Some observations: 
- Some responses have NA or an empty cell when the person didn’t respond to that
specific question. When cleaning the data, you can decide to keep all those responses,
fill the empty cells with NA, and count that as a possible answer. Otherwise, you can
remove the whole row of responses. It all depends on what your goal is and if the
number of affected rows would have a great impact on the result.
- Some questions have responses in different formats. You need to choose a preferred
format and update all the responses to that format.
- Multi-valued data (comma s...
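Chunk 1 starts mid-word ("here you") and repeats text that also appears near the end of chunk 0 ("saved the file you downloaded from Canvas and select it."). This suggests fixed-size windows that overlap and ignore word boundaries. The minimal sketch below assumes a 1000-character window. The 200-character overlap and the name `fixed_size_chunks` are hypothetical, since the page reports neither the overlap nor the implementation.

```python
from typing import List


def fixed_size_chunks(text: str, size: int = 1000, overlap: int = 200) -> List[str]:
    """Cut text into windows of `size` characters; consecutive windows share `overlap` characters.

    Boundaries ignore words and paragraphs, so a chunk can start mid-word.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    chunks = fixed_size_chunks("x" * 2500)
    print([len(c) for c in chunks])  # [1000, 1000, 900]
```

Only the last chunk can be shorter than `size`, so the average length sits slightly below the window size. The table's 948.0 average for a 1000-character window fits that pattern.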

Paragraph-aware chunking

Chunk 0 - 8 characters

# Page 1

Chunk 1 - 2024 characters

COMP5310 Tutorial Week 2
Main Goals: 
▪ Differences between types of data and applicable measures of dispersion and central
tendency.
▪ Idea and first experience with data cleaning.
▪ Data exploration using Microsoft Excel Live.
Exercise 1: Open the file and clean the data. 
1. Download the file WFH-Survey-Responses-NSW-dirty.xlsx from Canvas (Edited version 
from the original 2020 survey found at https://data.nsw.gov.au/data/dataset/nsw-remote-working-survey).
2. Open the file with Microsoft Excel Live.
a. Open your University email account.
b. Click on the blue cloud icon from the left-hand side menu. This will open 
OneDrive on a new tab.
c. On the top menu, click on Upload. A pop-up window will open to search for 
the file you want to upload to OneDrive. Navigate to the folder where you 
saved the file you downloaded from Canvas and select it.
d. The file will now be uploaded to your On...
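The paragraph-aware run produced an 8-character chunk containing only "# Page 1", followed by a 2024-character chunk. It therefore does not force chunks to a fixed size. One way to get this behaviour is to split on blank lines and greedily pack whole paragraphs up to a size limit, emitting an oversized paragraph as a chunk of its own. The page text in the sample has only single line breaks, so it would presumably count as one long paragraph. This is a sketch of that idea only. The blank-line rule, the 1000-character limit and the name `paragraph_chunks` are assumptions, not the system's documented algorithm.

```python
import re
from typing import List


def paragraph_chunks(text: str, max_chars: int = 1000) -> List[str]:
    """Pack whole paragraphs (separated by blank lines) into chunks of at most max_chars.

    A paragraph longer than max_chars is never split; it becomes its own chunk.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: List[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate          # still fits: keep packing
        else:
            if current:
                chunks.append(current)   # flush what has been packed so far
            current = para               # start a new chunk (may exceed max_chars)
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    # Synthetic input with the same shape as the sample: a short heading,
    # one long paragraph, then a short paragraph.
    doc = "# Page 1\n\n" + "A" * 2024 + "\n\nshort"
    print([len(c) for c in paragraph_chunks(doc)])  # [8, 2024, 5]
```

Keeping paragraphs intact means a chunk never cuts a sentence in half. The cost is uneven chunk sizes, from a bare 8-character heading to chunks well over the limit.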

Query Comparison