Analyzing Collected Data
As part of the Web-CAT Project, we have collected a large pile of data on student submission activities. ManuelPerez is very interested in the data from his Virginia Tech CS2 course.
As part of his dissertation research, PeteDePasquale (depasqua at tcnj.edu) has also collected a large pile of data about student development and compilation activities.
We are trying to devise a plan for how to analyze this data in order to prepare two SIGCSE papers for submission this fall.
This page is a working area where we can write up and inter-connect our ideas.
Matt Jadud's BlueJ Data Collection Extension
We are trying to work out a plan for getting access to and reusing Matt Jadud's BlueJ extension, which collects data on student development activities and POSTs the raw data to a web server. He's agreeable to working with us on using his tools, although of course he doesn't have time to maintain or distribute them "officially". Currently, there are several questions we need to answer to know how best to use Matt's tools:
- Exactly what data is sent via POST (i.e., what are the parameter names and value types, and what are they intended to contain)?
- Assuming all of the POST requests are being funneled into one (or a few) SQL tables, what do these tables look like?
- Do the POST requests contain student identity information?
- What happens when the student is working without being net-connected?
- What steps/actions are involved in cleaning the data and massaging it for data mining?
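To make these questions concrete, here is a rough sketch of what the receiving side might look like if we end up writing or adapting it ourselves. It is not a description of Matt's extension: the parameter names (`user`, `timestamp`, `file`, `success`, `message`), the `compile_events` table, and the helper functions are all placeholders we made up until we know what the real POSTs contain. It assumes a form-encoded POST body, a single SQL table, and that any student identifier gets replaced with a salted hash before storage.

```python
import hashlib
import sqlite3
from urllib.parse import parse_qs

# Hypothetical schema: every column name is a placeholder, not the real layout.
SCHEMA = """
CREATE TABLE IF NOT EXISTS compile_events (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    student_key TEXT NOT NULL,     -- salted hash, never the raw identity
    event_time  TEXT NOT NULL,     -- timestamp string as sent by the client
    file_name   TEXT,
    success     INTEGER NOT NULL,  -- 1 if the compile succeeded, else 0
    message     TEXT               -- compiler message, if any
)
"""


def pseudonymize(student_id: str, salt: str) -> str:
    """Replace a student identifier with a short salted SHA-256 digest."""
    digest = hashlib.sha256((salt + student_id).encode("utf-8")).hexdigest()
    return digest[:16]


def parse_event(body: str, salt: str) -> dict:
    """Turn a form-encoded POST body into one row for compile_events.

    The parameter names read here are assumptions for illustration only.
    """
    raw = parse_qs(body, keep_blank_values=True)
    fields = {name: values[0] for name, values in raw.items()}
    return {
        "student_key": pseudonymize(fields.get("user", "unknown"), salt),
        "event_time": fields.get("timestamp", ""),
        "file_name": fields.get("file"),
        "success": 1 if fields.get("success", "").lower() == "true" else 0,
        "message": fields.get("message"),
    }


def store_event(conn: sqlite3.Connection, event: dict) -> None:
    """Insert one parsed event into the (hypothetical) compile_events table."""
    conn.execute(
        "INSERT INTO compile_events "
        "(student_key, event_time, file_name, success, message) "
        "VALUES (:student_key, :event_time, :file_name, :success, :message)",
        event,
    )
    conn.commit()
```

A quick way to try it is to open `sqlite3.connect(":memory:")`, run `conn.execute(SCHEMA)`, and pass a hand-written body through `parse_event` and `store_event`. The sketch deliberately leaves the offline case open: it says nothing about what the client does when a POST cannot be delivered.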