Reference checking: do you check references manually or do you use a machine to save five working weeks of your year? Your choice!
I want to talk to you about references — the bibliographic sort found at the end of almost every academic paper and book — and citations, the in-text callouts to their reference partners. Neither is a new invention, so at the dawn of the 21st century I remember thinking: “Why have I had to check these for so long? Why hasn’t anyone tried to make it easier?” I had searched in vain for a simple, affordable program to help with checking references and citations. In the information age there surely had to be something out there, but my searches drew a blank. A small number of in-house and commercially available programs worked on references, but these seemed prohibitively expensive, inflexible, and restrictive in their functionality. Most were bibliographic authoring packages to help authors compile reference lists and citations as they prepared their work for submission, but none offered a simple way to check references and citations after the text had been written.
With a growing number of in-house and freelance copy-editors, there seemed to be a niche in the market for this type of software. After lengthy discussions with a programmer colleague, we settled on the name “ReferenceChecker” and drew up the following desiderata:
- It should be affordable
- It should be easy to install and use, with minimum prior knowledge of using add-ins in Word
- It should be flexible yet require minimum input from the user
- It should be fast to use
- Its user interface should be clean, simple, and easy to understand
- It should present clear results and point the user directly to the exact place in the text where the discrepancies can be found
- It should understand the character sets of most European languages, including letters with diacritics that are variations on letters in the Latin alphabet
- It should recognize and check the most commonly used referencing systems (APA, Harvard, and Vancouver) and their numerous variants.
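The two families of referencing systems named in that last point look quite different on the page: APA and Harvard use author–year callouts, while Vancouver uses bracketed numbers. ReferenceChecker’s actual detection rules are not published, so purely as an illustration, here are two deliberately simplified Python regular expressions for spotting each style; the patterns, sample text, and names are my own, not the software’s:

```python
import re

# Simplified, illustrative patterns only -- real-world citations have far
# more variants (multiple authors, "et al.", page numbers, and so on).

# Author-year (APA/Harvard): "(Smith, 2004)", "(Smith & Jones, 2004a)".
# In Python 3, \w is Unicode-aware, so surnames with diacritics such as
# "Müller" also match, though the leading [A-Z] keeps this sketch ASCII-first.
AUTHOR_YEAR = re.compile(
    r"\(([A-Z][\w'-]+(?:\s+(?:&|and)\s+[A-Z][\w'-]+)?),\s*(\d{4}[a-z]?)\)"
)

# Numeric (Vancouver): "[1]", "[2,5]", "[3-7]".
NUMERIC = re.compile(r"\[(\d+(?:\s*[,-]\s*\d+)*)\]")

text = "Earlier work (Smith & Jones, 2004a) disagrees with later findings [2,5]."
print(AUTHOR_YEAR.findall(text))   # [('Smith & Jones', '2004a')]
print(NUMERIC.findall(text))       # ['2,5']
```

Even these toy patterns hint at why the real rules needed months of refinement: every journal’s house variant of punctuation and ordering chips away at any single expression.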
Could all this be done? Would carrying out the manual checking tasks of an experienced copy-editor prove too much for a machine? We set about writing the code, then testing, retesting, and reconfiguring it many times to produce a working prototype. It coped well in the early stages with our variety of example references and citations, but we soon found variants of names, years, punctuation, ordering, and so on that warranted revisions to the software. Testing and development took about five months; finally, in early 2005, we were ready to unleash the beast onto the Wild World Web.
It has been a considerable challenge, though not an insurmountable one, to iron out problems along the way. Things arose that we didn’t anticipate, and there were less-than-straightforward elements in bibliographic referencing that required sophisticated code. Talking of code, the software is a Visual Basic application implemented in several thousand lines of code. Its simple interface and user experience belie its size and complexity; the user shouldn’t have to worry about how big or complex the software is, only how quick and easy it is to use.
Throughout its development, we introduced a number of additional useful features in ReferenceChecker:
- hyperlinked results that could be clicked on or scrolled through to take the user to the exact place in the text where the mismatch was found: either a reference item with no matching citation or a citation with no matching item in the reference list;
- the option to check with or without case sensitivity in author names;
- a feature to copy and paste the results;
- the option to view the results as a list of either (a) every single citation and reference item listed and checked or (b) mismatches only.
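Two of the features above — optional case sensitivity in author names, together with the earlier requirement to handle European letters with diacritics — come down to how surnames are normalized before they are compared. What follows is a hypothetical Python sketch of one way such a comparison could work, not ReferenceChecker’s actual code or method:

```python
import unicodedata

def normalize_surname(name: str, case_sensitive: bool = False) -> str:
    """Illustrative only: reduce a surname to a comparable form.

    Decomposes accented letters and strips the combining marks
    (e.g. 'Müller' -> 'Muller'), so a citation and its reference entry
    can still match if one of them lost its diacritics along the way.
    With case_sensitive=False, case differences are ignored too.
    """
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped if case_sensitive else stripped.casefold()

print(normalize_surname("Müller") == normalize_surname("MULLER"))  # True
print(normalize_surname("Müller", case_sensitive=True)
      == normalize_surname("MULLER", case_sensitive=True))         # False
```

Whether stripping diacritics is the right default is itself an editorial judgement — “Müller” and “Muller” may be different people — which is exactly why a user-facing case-sensitivity switch earns its place.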
In conceiving the idea for ReferenceChecker, I admit I had a few qualms. Would the “brain” of a machine perform as accurately and intelligently in this task as the brain of a human? When humans check references and citations, they have learned how to recognize and compare the separate elements that constitute each reference and citation. My underlying apprehension was this: if software could do in a few seconds the checking that would normally take a human, say, 15 minutes on an average-sized paper, could it open a Pandora’s box of expectations? The answer is quite complex. By using, and staying in full control of, software to check the parts of a text that can be processed far more quickly than by the laborious manual method, the copy-editor can concentrate on other, more important tasks that a machine cannot do, while maintaining a high level of accuracy. We are still a long way from machines being able to do the complete work of a copy-editor on a text, because of the vast complexities of human written language. Automated grammar, consistency, spell-checking, and text-analysis software frequently highlights false errors, because it hasn’t been programmed to “look around” either side of a word or phrase and understand the specific context and meaning intended by the author. In some cases the software can’t possibly know whether a spelling or punctuation mark is correct in a given context, because it doesn’t have a human’s life experience. Consider these examples:
- “man eating shark” or “man-eating shark”? Either could be correct, depending on the subject matter. Should the software flag either of these as incorrect?
- “It’s one mistake” or “Its one mistake”? Either could be correct, with or without the apostrophe, depending on the surrounding text.
- “… his parents, John Allen, and Rose Wood” or “… his parents, John Allen and Rose Wood”? Does this refer to two or four people? It could be either.
- “principal was investigated” or “principle was investigated”? Either is acceptable, but would it be appropriate for an automated checker to question the usage of either of these? Would it be remiss for it not to report a possible spelling error?
We encountered a few challenges while developing ReferenceChecker, and to deal with them we implemented several sets of rules: rules to detect references and citations; rules to parse them; and rules to extract author surnames and years of publication. A number of “post-processing rules” were then incorporated to clean up the extracted references and citations, ignore spurious names, and make sense of all the surnames and years of publication. A final set of rules compared references against citations and generated the list of mismatches for the user to look through. Throughout, ReferenceChecker has been designed to work as closely as possible to the way a human recognizes references and citations: if it looks like a citation or reference to the human eye, ReferenceChecker will recognize and check it. It’s heartening to know we’ve saved many people many hours of working time, and to receive feedback and suggestions from our users. For the average full-time copy-editor, who might work on, say, 15 average-sized papers in a working week (e.g. 30 pages of A4 with five pages of references), and estimating about 15 minutes of reference-checking time per paper, we’ve calculated time savings of up to 174 working hours per year—that’s almost five working weeks!
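The rule pipeline described above ends in a comparison stage. ReferenceChecker itself is a closed-source Visual Basic application, so the following is only a minimal Python sketch of what that final cross-check amounts to, assuming citations and reference entries have already been reduced to (surname, year) pairs; the function name and sample data are invented for illustration:

```python
# Hypothetical sketch of the final comparison stage: each citation and
# each reference-list entry has been boiled down to a (surname, year)
# key by the earlier parsing rules, and the two sets are checked in
# both directions for mismatches.

def find_mismatches(citations, references):
    """Return (citations with no reference entry, entries never cited)."""
    cited = set(citations)      # duplicates collapse: citing twice is fine
    listed = set(references)
    orphan_cites = sorted(cited - listed)   # cited but missing from the list
    uncited_refs = sorted(listed - cited)   # in the list but never cited
    return orphan_cites, uncited_refs

citations = [("smith", "2004"), ("jones", "1999"), ("smith", "2004")]
references = [("smith", "2004"), ("brown", "2001")]

orphans, uncited = find_mismatches(citations, references)
print(orphans)   # [('jones', '1999')]
print(uncited)   # [('brown', '2001')]
```

A set difference in each direction is all the comparison itself needs; the hard-won value of the real software lies in the detection, parsing, and post-processing rules that produce clean keys in the first place.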
Paul Sensecall is a full-time freelancer (www.pseditorialservices.com) with over 20 years’ experience of copy-editing and proofreading academic material in science, technology, and medicine; the social sciences; and the humanities. A free, fully functional trial version of ReferenceChecker can be downloaded from www.goodcitations.com, and licences for unlimited use of the software can be purchased from the site.