Friday, February 8, 2013

Review of Coursera's "Computing for Data Analysis" MOOC

I signed up for this course primarily because I wanted to experience a Coursera massively open online course (MOOC), and also because I wanted to improve upon my very rudimentary knowledge of the R programming language.  For those of you too impatient to read the entire review, the summary is that I did learn a few things from the course and I'm glad that I took the time to participate, but there were many things about the course that really could be improved.

First, a comment about the word "open" in the acronym MOOC.  Although some of the first MOOCs really were open in the sense that the course content was made available under open source licenses, the Coursera course content is not made available under any open license.  This makes it effectively impossible for students and other instructors to build upon the MOOC and improve it.  Many of my criticisms of Computing for Data Analysis could be very effectively addressed by incremental improvements made in the typical fashion of open source software projects.

It is interesting that there are already many open resources for learning R programming in the form of online tutorials and notes.  These clearly aren't as interactive and engaging as the Coursera course, but it seems likely that the authors who produced these materials have enough knowledge of the subject that, with the support of instructional designers and producers, they could also produce a MOOC on R programming.  Thus the special sauce is clearly not in Roger Peng's expertise in R; lots of instructors have that background.

I also don't believe that the Coursera platform is a unique distinguishing feature.  Coursera and its competitors have all been able to produce MOOC platforms that can scale to handling hundreds of thousands of students with very little difficulty.  The technologies involved in these platforms, including lecture videos with subtitles and indexing, online multiple choice tests, banks of test questions, algorithmically generated questions, and even more sophisticated adaptive instruction techniques, are already available in many computer-based instruction systems.

Many universities already make use of the same textbooks and instructional software; they're not competing with each other by controlling access to the content that they're teaching.  Rather, they're competing with each other in other areas, including the quality of interaction between students and faculty, the scholarly reputation of faculty, and various aspects of the college experience unrelated to academics.  In fact, faculty are often encouraged to publish textbooks because this helps to bolster the reputation of an institution.  If universities don't want to compete in the area of content, but they're concerned about Coursera dominating the world of MOOCs, then they have an easy alternative.  They can cooperatively develop truly open content in the form of Open Educational Resources (OER).

OER also offers an escape route for faculty who fear that they may be reduced to nothing more than tutors and graders for courses in which an academic star is featured in recorded lectures.  With OER, instructors have the opportunity to incorporate previous work by other instructors into their own courses, but they also have an opportunity to bring their own work into the mix.  Everyone gets to participate, and instructors can focus on finding the best ways to present the material that they are most interested in.

For these reasons, I think that OER represent a real and significant alternative to the commercial MOOCs.

Now on to the Computing for Data Analysis (hereafter I'll just use the acronym CFDA) course itself.

One obvious issue in constructing a MOOC is whether to set the course up in self paced format or to run it on a schedule.  CFDA was run on a four week schedule, with weekly quizzes and programming assignments.  Given the way that the video lectures, quizzes, and programming assignments were presented, there was no technical reason that the course couldn't be presented in a self paced format with students working at their own pace.  The scheduled format did make it easier for participants to discuss the course in the online forums associated with the course.  However, it also caused problems for students who had to travel or were otherwise busy at some point during the four weeks of the course.  To help with this, students were given loose deadlines and a small number of extra days that they could spend on assignments.  I personally found the scheduled format to be overly restrictive, and I would have preferred to see the course presented in a self paced format.

The lectures in this course were presented as online videos of up to about half an hour in length.  The videos consisted of prepared slides with the lecturer's voice in the soundtrack.  To keep the videos from becoming too boring, there were multiple choice questions interspersed throughout the videos.  There were also some activities for the viewer to try at the R command line.  The production values were comparable to what I regularly see in lecture capture videos from conventional courses; these were not highly polished videos.

In most conventional courses, lectures are at least loosely tied to a textbook.  Although there are many open source texts on R and also many commercially available textbooks, CFDA was not tied to any particular text.  Using a commercial text would obviously have been problematic in a "free" course.  Using one of the open source resources would have been equally problematic given Coursera's closed source approach.

Not having a reference is problematic in a course about a programming language, since anyone learning a programming language is going to have to frequently look up the details of syntax and library functions.  It was quite irritating to me in doing the programming assignments that I frequently had a vague memory of some function from the lecture but didn't remember it in enough detail.  I then had the unpleasant choice of going back through the video or going to some other resource to find that information.

I was disappointed by the lack of depth in the lecture material.  This was a course in R programming rather than statistics, so any in-depth discussion of statistical issues would have been inappropriate, but I was surprised by how little discussion there was of the structure and semantics of the R programming language.  For example, R uses a fairly unusual scheme of lexical scoping that can be particularly useful in statistical work.  This was mentioned briefly in one lecture but not fully explained.
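To give a sense of why lexical scoping matters for statistical work, here is a minimal sketch of the idea.  It is written in Python rather than R, on the assumption that Python closures resolve free variables in the scope where a function is defined in much the same way, and it is an illustration rather than material from the course.  The names make_neg_log_lik, observations, and sigma are hypothetical.  A "function factory" captures a data set once and returns an objective function of the parameter alone, which can then be handed to any optimizer.

```python
import math


def make_neg_log_lik(data, sigma=1.0):
    """Build a negative log-likelihood for a normal model with known sigma.

    The returned function takes only mu; data, sigma, and n are found in
    the scope where the inner function was defined (lexical scoping).
    """
    n = len(data)

    def neg_log_lik(mu):
        # data, sigma, and n are free variables captured from the factory call
        sse = sum((x - mu) ** 2 for x in data)
        return 0.5 * n * math.log(2 * math.pi * sigma ** 2) + sse / (2 * sigma ** 2)

    return neg_log_lik


# Hypothetical observations, used only for illustration
observations = [1.8, 2.1, 2.4, 1.9, 2.3]
nll = make_neg_log_lik(observations)

# A crude grid search stands in for a real optimizer; it only needs a
# function of one argument, because the data travel with the closure.
candidates = [i / 100 for i in range(100, 301)]
best_mu = min(candidates, key=nll)
```

The optimizer never sees the data directly.  For this model the minimizing value of mu is the sample mean, so the grid search should settle on the grid point closest to it.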

The quizzes were presented in multiple choice format, with the student given the opportunity to retake a quiz up to 3 times to get a higher score.  Some of the questions were written in such a way that it was easy to guess the correct answer without actually understanding the question.  On retakes, the same questions reappeared, sometimes with the order of the answers randomized.  The system did not appear to be using algorithmic question generation or even a large library of similar questions.  Although multiple choice questions like these are easy to implement in a MOOC, they simply don't have the depth of questions that require a written answer or actual coding.

The programming assignments were automatically graded: the student would simply run test scripts that tested his code on various example inputs and then upload the resulting output for checking.  Although this kind of automated grading of programming assignments can be quite rigorous if it's done with carefully designed tests, the test cases used in this course seemed to be pretty easy.  None of my code ever lost a point.

In my opinion, this kind of automated grading of programming assignments often leads to poor programming practices, since it pushes students to debug until they "get the right answer", rather than developing a program that they know to be correct by analysis.  Furthermore, there's no opportunity to give the student feedback on the proper style of programming in the particular language.  The programs that I wrote for CFDA were not "pretty", and probably would have come across to an expert R programmer as being non-idiomatic R code.  All that mattered within the course was that they produced the correct answers on the test cases.

Although the online discussion forums were billed as one of the most important features of the course, I was very disappointed by them.  The way that the assignments were set up, students were focused on completing particular tasks rather than on developing a broader understanding of the R system.  As a result, much of the discussion in the forums was at the level of "How do I do in R?"  I found that I was able to more efficiently answer my questions of this sort by using other online resources on R and a bit of Google searching.

Although I think this course could have been much better, I certainly did learn some things about R programming that I wanted to pick up, and overall I'm glad that I took the time to participate in the course.


  1. I believe the course you chose to review was not a good example since my understanding was that it was really a prep course for the Data Analysis course that required R programming.
    The rigid schedule was to ensure students would complete the prep before the follow on course began.
    I agree that these courses would improve if they were opened, but really, considering how far we have come in such a short time, it is hard to fault the system.

    1. OK, but then is there anything that goes on in the Data Analysis course that requires everyone to take the course at the same pace? I believe that it's taught using the same structure of automatically graded quizzes and assignments.

  2. I also participated in the course and totally agree. Thanks for sharing your ideas on OER related to MOOCs, I had not thought of this connection before.

    If you're interested in my opinion of the course, maybe Google can help you translate my blog posts about it: