Plagiarism Detection Software and Its Appropriate Use

Megan von Isenburg, Marilyn H. Oermann, and Valerie Howard

Nurse Author & Editor, 2019, 29(1), 4

Plagiarism by authors in manuscripts and by students in assignments is not a new problem, but it has been compounded by the ease of “copying and pasting” content from articles and other information sources on the Internet. Using content developed by others may be intentional, but more often it is an inadvertent error. With the software tools now available, editors, teachers, reviewers, and others can more easily identify plagiarism and do not need to rely on their own ability to recognize similarities to previously published text (Higgins, Lin, & Evans, 2016). Plagiarism detection software can scan author manuscripts and student papers in a few minutes, matching what has been submitted against already published work. This article examines plagiarism detection software and provides some recommendations for appropriate use by editors and authors.

What is Plagiarism Detection Software?

Plagiarism detection software assesses the similarity of content in papers with published literature and other types of information. The software compares the author’s text against abstracts and citations in PubMed/MEDLINE; millions of journal articles, conference proceedings, and books from leading publishers (including Elsevier, Lippincott, Ovid, Sage, Springer, and Wiley Blackwell, among many others); and varied databases such as EBSCOHost, Gale InfoTrac, and ProQuest. In addition, plagiarism detection software searches the Internet for similar content. iThenticate, a leading software program, has its own web crawler that indexes more than “10 million web pages daily” (iThenticate, 2018). With plagiarism detection software, all types of documents can be checked: manuscripts, written assignments for courses, grants, theses and dissertations, other scholarly projects, and other types of reports.

With some plagiarism detection products, such as iThenticate, documents uploaded for assessment and their results are stored in a private and secure database available only to the author, who can delete the files from the system. Documents are not shared and cannot be used by others. However, this is not true of some other software programs, and authors should investigate how a product stores and uses submitted documents before using it.

Turnitin is a product designed to check the originality of student papers. It was made specifically for classroom use and for reviewing student work. Turnitin can be integrated within learning management systems so that students can review their own papers before submitting them and teachers can then assess those papers for plagiarized content. In addition to searching databases and the Internet as iThenticate does, Turnitin compares students’ papers against millions of other student papers in its database (Turnitin, 2018).

Use in Academic Publishing

Numerous health sciences and biomedical journals worldwide use software to detect plagiarism in manuscripts submitted for publication (Butler, 2010; Garner, 2012; Roach, Gospe, Ng, & Sahin, 2014). Despite this widespread adoption, there is no standard for what similarity rate, or percentage of text matching that of another document, constitutes plagiarism. Similarity rates in manuscripts as low as 15% and as high as 80% have been used to operationally define plagiarism (Garner, 2012; Higgins et al., 2016; Kalnins, Halm, & Castillo, 2015; Park et al., 2017).
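Because the cutoff varies so widely, the same manuscript can be classified differently depending on which operational definition is applied. The short Python sketch below illustrates this with the cutoffs mentioned in this article (15%, 20%, and 80%). The manuscript score of 30% and the function name are invented for illustration only and do not come from any study or product.

```python
def exceeds_cutoff(similarity_percent, cutoff_percent):
    """Return True when a similarity rate meets or exceeds the chosen cutoff."""
    return similarity_percent >= cutoff_percent


# Cutoffs reported in the literature cited in this article.
cutoffs = [15, 20, 80]

# Hypothetical similarity rate for one manuscript (assumed value).
manuscript_similarity = 30

for cutoff in cutoffs:
    flagged = exceeds_cutoff(manuscript_similarity, cutoff)
    print(f"cutoff {cutoff}%: flagged = {flagged}")
```

Under the two lower cutoffs this hypothetical manuscript would be labeled as plagiarized, and under the highest it would not, which is one reason a score cannot be interpreted without knowing the definition behind it.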

The prevalence of plagiarism has been measured in different settings. Research using plagiarism detection software has uncovered plagiarism rates of between 6% and 23% of scanned manuscripts (Butler, 2010). One study suggested that 3000 papers added to MEDLINE each year have strong similarity to papers already indexed in MEDLINE (Garner, 2011). Park et al. (2017) detected plagiarism (defined as a 20% similarity index) in 9% of articles scanned, though this rate declined over a 5-year period. Despite the labor and economic costs of scanning articles with plagiarism detection software, one study of plagiarism in two radiology journals concluded that the process is a cost-effective and important step for avoiding the risks of publishing plagiarized articles (Kalnins et al., 2015).

Editors and journal staff using software to detect plagiarism should consider the role of text recycling or “unattributed rehashing” (Garner, 2012) from authors’ prior manuscripts. Recent guidance from the publishing platform BioMed Central and the Committee on Publication Ethics (COPE) notes that some text recycling, also known as self-plagiarism, is justified. Those interpreting similarity rate reports from plagiarism detection software should take into consideration the amount of text recycling, where it occurs in the article, and whether it is acknowledged (BioMed Central, n.d.). In some cases, text recycling is positive, as it can provide a standard way of describing the same methods that resulted in multiple publications (Moskovitz, 2017).

Use in Student Papers

In an integrative review on plagiarism in nursing education, Lynch et al. (2017) argued that it is important to address plagiarism in the academic setting because of potential links to professional misconduct. They identified self-reported plagiarism rates among undergraduate nursing students of between 38% and 60%, but cautioned that not all plagiarism is deliberate. Some accidental plagiarism can be due to poor organization, a lack of skills, and a lack of knowledge about good writing practices.

Like publishers, many graduate and professional academic programs have implemented plagiarism detection software as a means to detect plagiarism in assignments that are turned in for a grade (Dahl, 2007; Marshall, Taylor, Hothersall, & Perez-Martin, 2011; Marusic, Wager, Utrobicic, Rothstein, & Sambunjak, 2016; Walker, 2010); as a tool for teaching writing skills (Hampton, 2018); and as a method for providing formative feedback to students on writing assignments before the assignment is due (Whittle & Murdoch-Eaton, 2008). Student perceptions of the use of plagiarism detection software to screen their writing are reported as being fairly positive (Dahl, 2007; Whittle & Murdoch-Eaton, 2008), but teachers should respect student privacy and should obtain student approval if the software stores a copy of the student’s work for future use (Brinkman, 2013).

In teaching about plagiarism and ethics in writing, one study of two master’s level health programs found that software alone is not enough to reduce the rate of plagiarism in assignments, but that adding lectures and modules on plagiarism resulted in lower plagiarism rates (Marshall et al., 2011). A systematic review of interventions to promote publication integrity found that while results varied across studies, plagiarism software, practical exercises, and other active learning techniques reduced plagiarism in some settings (Marusic et al., 2016).

How Does It Work?

After the document is uploaded at the product’s website, the software highlights text that matches text in its database. This alerts authors to check this content in their own paper before submitting it, and alerts editors and teachers to review the text that is similar to published work. The highlighted text includes content that has been cited properly by the author, and thus is not plagiarized, as well as text that is similar to published content. The highlighted text is used to generate a score that represents the percentage of similarity between the author’s document and published content. Varied colors are used for the highlighting to represent different percentages of similar text.
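The general idea of highlighting matched text and turning it into a percentage can be illustrated with a small Python sketch. This is not how iThenticate, Turnitin, or any other commercial product works internally, as their matching methods are not described here. It is a simplified illustration that assumes a match means a run of five consecutive words shared with a source document. The function names, the five-word run length, and the sample sentences are all assumptions made for the example.

```python
import re


def words(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def shingles(tokens, n):
    """Return the set of all runs of n consecutive words."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def similarity_report(submission, sources, n=5):
    """Flag each word of the submission that lies inside an n-word run
    also found in any source, and return the percentage of flagged words."""
    tokens = words(submission)
    known = set()
    for source in sources:
        known |= shingles(words(source), n)
    flagged = [False] * len(tokens)
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) in known:
            for j in range(i, i + n):
                flagged[j] = True
    percent = 100.0 * sum(flagged) / len(tokens) if tokens else 0.0
    return percent, flagged, tokens


# Assumed sample data: one "published" sentence and one draft sentence.
published = [
    "Plagiarism detection software assesses the similarity of content "
    "in papers with published literature."
]
draft = (
    "Our unit found that plagiarism detection software assesses the "
    "similarity of content in student work."
)

percent, flagged, tokens = similarity_report(draft, published)
print(f"Similarity: {percent:.0f}%")
# Show matched words in capitals as a stand-in for highlighting.
print(" ".join(t.upper() if f else t for t, f in zip(tokens, flagged)))
```

In this example, 9 of the 15 words in the draft fall inside a matched run, so the sketch reports a similarity of 60%. A properly quoted and cited passage would be flagged in exactly the same way, which is why the highlighted text still has to be read by a person.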

How to Use It

For responsible use of plagiarism detection software, editors and teachers cannot decide that text is plagiarized based only on the score generated by the software: they need to review every segment of highlighted text to determine whether the content is too similar to published work or has been properly cited and can be used. Some similarity is acceptable, such as commonly used terms and phrases in nursing, lists of author names in the text (for example, when using the American Psychological Association reference style), and some text recycling. Editors and publishers need to decide whether students’ theses and dissertations submitted to ProQuest are considered a form of published content. Note that both the title page and reference list should be excluded from the document when checking for plagiarism.
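Excluding the title page and reference list can be done by hand before uploading, and the step is simple enough to sketch in Python. The example below assumes the manuscript is available as a list of plain-text lines, that the number of title page lines is known, and that the reference list begins at a line consisting only of a heading such as "References". The function name, the heading list, and the sample manuscript are hypothetical and are not taken from any product.

```python
# Headings assumed to mark the start of the reference list (an assumption).
REFERENCE_HEADINGS = {"references", "reference list", "bibliography"}


def body_only(lines, title_page_lines=0):
    """Drop the first title_page_lines lines, then drop everything from
    the last reference-list heading to the end of the document."""
    kept = list(lines[title_page_lines:])
    # Search from the end so an earlier line that happens to read
    # "References" does not cut the body short.
    for i in range(len(kept) - 1, -1, -1):
        if kept[i].strip().lower() in REFERENCE_HEADINGS:
            return kept[:i]
    return kept


# Made-up manuscript used only to show the behavior.
manuscript = [
    "Sample Manuscript Title",
    "A. Author",
    "",
    "Introduction",
    "Some text recycling may be justified.",
    "References",
    "Placeholder reference entry.",
]

for line in body_only(manuscript, title_page_lines=3):
    print(line)
```

Only the introduction heading and the body sentence remain, so reference entries, which naturally match published citations, would not inflate the similarity score.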

Authors who use the software may find it worthwhile to check the document for plagiarism about halfway through writing the paper. This may help identify phrases and sentences that are too similar to published work and need to be revised before the entire paper is written.

Comparison of a Few Products

There are many plagiarism detection software programs including some that are free. Tables 1 and 2 provide a comparison of iThenticate and Turnitin with two free products.

Table 1. Comparison of iThenticate and Turnitin

Table 2. Comparison of Two Free Products, Grammarly and HelioBlast

Plagiarism remains an ongoing issue in publishing and in academia, and software programs are available to detect similarities to published content, previously submitted student papers, theses, dissertations, and websites. These free or for-purchase software programs can assist new writers with identifying instances of ‘unintentional’ plagiarism, and help editors and faculty members to identify all types of plagiarism. When using any plagiarism detection software, the user must be vigilant about reviewing the detailed results. A similarity score or index does not, alone, indicate plagiarism, as properly referenced materials may be reported as ‘similar’. In addition, some plagiarism detection sites retain ownership of the draft submissions in the database and use these submissions for future similarity checks. While the software programs can be extremely helpful in detecting plagiarism and counseling authors about it, we recommend that editors, faculty, researchers, and students receive proper instruction on the use, ultimate retention of submitted documents, meaning of results, and consequences of using any plagiarism detection system before implementation.


About the Authors

Megan von Isenburg, MSLS, is Associate Dean for Library Services and Archives at Duke University School of Medicine, Durham, North Carolina, USA.

Marilyn H. Oermann, PhD, RN, ANEF, FAAN, is Thelma M. Ingles Professor of Nursing, Duke University School of Nursing, Durham, North Carolina, USA. She is Editor of Nurse Educator and the Journal of Nursing Care Quality.

Valerie Howard, EdD, MSN, RN, CNE, FAAN, is Associate Dean for Academic Affairs and Professor of Nursing at Duke University School of Nursing, Durham, North Carolina, USA.

