Automated analysis of cognitive presence in MOOC discussions
The Community of Inquiry (CoI) framework has been widely used for two decades to analyse the learning experience in online discussion forums. Cognitive presence, a primary dimension of the CoI framework, reflects the processes of (re)constructing knowledge and solving problems in the learning experience. Researchers applying machine learning techniques to text analysis are making promising contributions to the automatic analysis of phases of cognitive presence in online discussions. However, most studies of automated cognitive presence analysis focus on improving the accuracy and reliability of the classifiers. They overlook another purpose of applying machine learning techniques in educational research: pinpointing research bias that scholars neither intended nor could have found without computer support. This session will present an example of such ‘research bias’, discovered through both manual and automated classification of cognitive phases, and invite scholars to rethink and improve the conflicting parts of the taxonomies of cognitive presence in the MOOC context.
The manual-classification rubric used to label the discussion messages of a target MOOC combines Garrison, Anderson and Archer’s scheme with Park’s revised version. The rubric describes four phases of cognitive presence (i.e. triggering event, exploration, integration and resolution) and the indicators of each phase in online discussions. We reported that the average inter-rater reliability between two human raters reached 95.4% agreement (N = 1002), with a Cohen’s weighted kappa of 0.96. Interestingly, the average inter-rater reliability decreased to 80.1% after we increased the size of the data sample (N = 1918) and the number of human raters to three. After we trained automated classifiers to predict the phases of cognitive presence, the confusion matrix indicated that most of the disagreements between computer raters occurred between adjacent phases of cognitive presence. The disagreements between human raters show the same pattern. We therefore hypothesise that additional categories may exist between the cognitive phases in such MOOC discussion messages. These details will be discussed during the presentation.
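The agreement measures mentioned above (percent agreement, Cohen's weighted kappa, and the share of disagreements that fall between adjacent phases) can be illustrated with a minimal sketch. The labels below are invented toy data, not the study's data; the function names are hypothetical, and linear weights are an assumption, since the weighting scheme behind the reported kappa is not stated.

```python
# Toy illustration only: the labels are made up and are NOT the study's data.
# Phases are coded by their order: 0 = triggering event ... 3 = resolution.
PHASES = ["triggering event", "exploration", "integration", "resolution"]


def confusion_matrix(rater_a, rater_b, n_classes):
    """Count how often rater A's label i co-occurs with rater B's label j."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for a, b in zip(rater_a, rater_b):
        matrix[a][b] += 1
    return matrix


def percent_agreement(matrix):
    """Share of messages on which both raters chose the same phase."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def weighted_kappa(matrix):
    """Cohen's weighted kappa with linear weights (assumed weighting scheme)."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    observed = 0.0
    expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) / (k - 1)  # penalty grows with phase distance
            observed += weight * matrix[i][j] / total
            expected += weight * row_totals[i] * col_totals[j] / total ** 2
    return 1.0 - observed / expected


def adjacent_disagreement_share(matrix):
    """Share of all disagreements that lie between neighbouring phases."""
    k = len(matrix)
    cells = [(i, j) for i in range(k) for j in range(k) if i != j]
    disagreements = sum(matrix[i][j] for i, j in cells)
    adjacent = sum(matrix[i][j] for i, j in cells if abs(i - j) == 1)
    return adjacent / disagreements if disagreements else 0.0


# Hypothetical labels from two raters for ten messages.
rater_a = [0, 1, 1, 2, 2, 3, 1, 2, 0, 3]
rater_b = [0, 1, 2, 2, 1, 3, 1, 2, 0, 2]

matrix = confusion_matrix(rater_a, rater_b, len(PHASES))
agreement = percent_agreement(matrix)
kappa = weighted_kappa(matrix)
adjacent = adjacent_disagreement_share(matrix)
print(f"agreement={agreement:.3f} kappa={kappa:.3f} adjacent={adjacent:.3f}")
```

The same functions apply whether the second "rater" is a human or a classifier. A high adjacent-disagreement share is the pattern described above: raters rarely confuse distant phases, but often split on neighbouring ones.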
 D. Garrison, T. Anderson, and W. Archer, “Critical Inquiry in a Text-Based Environment: Computer Conferencing in Higher Education,” Internet High. Educ., vol. 2, no. 2, pp. 87–105, 1999.
 D. Garrison, T. Anderson, and W. Archer, “Critical thinking, cognitive presence, and computer conferencing in distance education,” Am. J. Distance Educ., vol. 15, no. 1, pp. 7–23, 2001.
 V. Kovanović, S. Joksimović, D. Gašević, and M. Hatala, “Automated cognitive presence detection in online discussion transcripts,” in CEUR Workshop Proceedings, vol. 1137, 2014.
 V. Kovanović et al., “Towards automated content analysis of discussion transcripts,” Proc. Sixth Int. Conf. Learn. Anal. Knowl. - LAK ’16, pp. 15–24, 2016.
 E. Farrow, J. Moore, and D. Gašević, “Analysing discussion forum data: a replication study avoiding data contamination,” 9th Int. Learn. Anal. Knowl. Conf., Mar. 2019.
 C. Park, “Replicating the Use of a Cognitive Presence Measurement Tool,” J. Interact. Online Learn., vol. 8, no. 2, pp. 140–155, 2009.