I went to a lunchtime seminar by John Barrie on Turnitin, the plagiarism detection software from iParadigms. I can see the practical merits of Turnitin, and I can see that it is scalable into the near future, but I wonder whether we have actually identified the correct problem to solve, and whether the Turnitin approach is scalable or even sensible in the longer term. The idea of the entire internet being fingerprinted is reminiscent of the scenario of enough monkeys and keyboards to produce Mozart … at what level is anything truly original?
The title of the presentation was “Vetting academic work for originality: Saving the world from unoriginality” – very catchy for sure, but perhaps not particularly interesting or realistic. At an undergraduate level, the content of most submitted work is not primarily focused on originality, but on accuracy. When writing a first year psych lab report on “The Stroop Effect”, perhaps there is a real limit to the number of ways of expressing the content before the information actually becomes incorrect in pursuit of originality. Maybe instead of detecting plagiarism, we should be trying to generate assessment tasks which are not affected by plagiarism – rather than have one academic grade 1000 papers, perhaps we would do better to have one academic produce ten papers of different quality on the one topic and have 1000 students grade those ten papers.
Alternatively, if university staff/student ratios were appropriate so that proper assessment of individual undergraduate students could take place (e.g. presenting a paper to a tutorial group and then submitting a written version for marking, with shared marking across tutorials), there would be a disincentive to cheat. The thing that would alert teaching staff to plagiarism would be a mismatch between the ability to present the content orally and in written form. If end-of-semester assessment were by essay-style (hand-marked) exams requiring generative capabilities, there would be more opportunity to match student voice with their written output.
At the undergraduate level, there is a serious question to be asked about whether it is more important to be able to generate an original piece of work than to recognise which piece of work most accurately reflects “the right answer” (assuming there is such a thing). If we can string together appropriate pieces of information, wherever they come from, to produce a coherent article (be it a “term paper”, an essay, a lab report, a computer program), we are at least showing that we understand the content area appropriately. I believe this is a necessary but not sufficient precursor to being able to produce something original. I actually have grave doubts as to whether true originality at the undergraduate level would be recognised by the average tutor, let alone encouraged or rewarded. It requires substantial academic expertise to evaluate the quality of original work in a discipline area.
It seems that plagiarism is considered a serious issue because we like to claim that a prime objective in tertiary teaching is to instil in our students the concepts of academic integrity and scholarship. However, to my mind, academic integrity (coupled with academic freedom) is associated with a whole moral philosophy regarding knowledge, the sharing of knowledge and how academic work contributes to the greater good of human endeavour. Academic values and moral philosophy are taught by example (intellectual and behavioural modelling) rather than by policing. If plagiarism is rampant in the younger generation, we should be looking to the values implicit in our education system rather than to policing strategies to effect cultural change.
If you look at the highly structured curriculum favoured by our secondary education sector and the templated way much of the “knowledge” is presented, it is not surprising that plagiarism is rampant – what is the difference between plagiarising and rote-learning? What is the difference between a “fact” and an “idea” and do facts as well as ideas have citable sources?
In terms of values and behavioural modelling, if you also look at business ethics (or attitudes to speed cameras) in the past 15 years, the emergent theme is that anything that is not expressly forbidden is implicitly allowed. No matter what the written rules say, if it isn’t policed, you’re allowed to do it. And if you’ve got away with doing it for a while, it violates your rights to suddenly start policing it. Steve Vizard and Rene Rivkin come to mind on the business front … what did they do that was wrong???
Intellectual property and copyright law seem not to be about integrity and moral philosophy at all, but much more about protecting the ability of an individual or institution to make money from creative endeavours, rather than sharing that creative output with the rest of the community (which in the past funded academic institutions to pursue the creation of new knowledge for the greater good of humanity).
Two other factors which have affected academic integrity in a subtle but seriously insidious way are mentioned in passing below. Both of them affect the behaviour of academics, which is then modelled by those who are learning from them, creating a different academic culture and set of standards.
1) Measuring research output by number of publications rather than quality of publications (counting is easier than assessing quality), so that there is a strong career incentive to make as much publication mileage as possible out of each random academic idea, even when that leads to institutionally endorsed rampant self-plagiarism, a proliferation of poor-quality journals, and/or a sense of dissatisfaction with the entire peer-review and publishing system.
2) A strong push to “reusable content”, without ever clarifying the difference between acceptable/appropriate reuse and plagiarism. Acknowledgement of the source is an obvious difference to a trained academic, but the fine line between paraphrasing and substituting synonyms is a tougher call to make for a layperson. Maybe, in the end, the only difference is the wider vocabulary available to most academics – an academic’s lexicon already contains the synonyms that a layperson searches for in a thesaurus, but the paraphrasing process is still the same. When does restating an idea “in your own words” become stating something original? And do I have to cite that Tom asked this question of me in the corridor tonight, or can you believe that I thought of it first? And if I did, have I now “beaten Tom to press”, so that he will have to cite me in the future?
Back to reuse of content, consider particularly the concept of reuse and acknowledgement of source in the teaching context (which is often the only context in which students see academics at their work). Clearly the ideas being presented in the classroom are not original, because we are teaching people about the current state of agreed-upon knowledge in a field.
Lectures and visual aids associated with lectures provide a context for assigned reading and other research activities. Often, lectures provide a specific context for or elaboration on material sourced from “the textbook”. Consider how the process of generating lecture resources for a “traditional lecture” has changed during my 20 years as an academic:
– (circa 1985) I gathered together a set of slides or overheads illustrating key points, and wrote key points on the blackboard
– (circa 1990) I prepared overhead transparencies with illustrations and key points
– (circa 1995) I prepared PowerPoint presentations which were distributed via an intranet
– (circa 2000) I prepared PowerPoint presentations which were placed on the web
By the early 2000s, in common parlance, the PowerPoint presentation became “The Lecture”, and because it resided in a public place, free of the context in which it was presented and of the words uttered explaining the origin and content of each idea and image, issues of copyright and intellectual property started to arise. The overhead of annotating each idea and image became a disincentive to preparing interesting additional resources for teaching, and the notion that it was unfair to provide enrichment to one group of students but not to all (where different staff taught different streams) undermined both the sense of academic responsibility for teaching material and the atmosphere of collegiality.
It seems to me that institutions have only recently become deeply interested in the issue of plagiarism detection in the context of selling curriculum, selling degrees, selling research output and gaining competitive advantage from the intellectual property of their workforce of academics. The sense of academic integrity and moral philosophy associated with being part of an international community of scholars whose combined knowledge belongs to humanity has been seriously eroded by treating academic output as a saleable commodity and applying “business models” to academia using totally inadequate analogies.
I guess one aspect of writing in a blog that is simultaneously a real strength and a serious weakness is about to be demonstrated – I want to post this now because I know I won’t come back to it properly in the next few weeks to fill in the gaping holes in the line of argument. I think I know how to fill them, but I don’t have the time right now. Is it better to put the half-baked idea “out there” (even if I’m the only person who goes back to read it), or is it best to let it drown in a sea of other half-baked ideas? And furthermore, is this enough to ensure that I at least mark a line in the sand to say “I thought like this on this day, even if I don’t get to rethink it and publish it properly until a lot later on …”?