
Daily impact, powered by iThenticate:
The challenge and scale of maintaining trustworthy information
The Wikimedia Foundation is a non-profit organization that operates a range of open-access content creation projects, most notably Wikipedia, the online encyclopedia. Believing that reliable information should be free and accessible to everyone around the world, Wikimedia relies on its community of volunteers to write and maintain content in support of its mission.
Since its first encyclopedic entry in 2001, Wikipedia has established an enduring legacy as the largest, most-read reference work in history, one that requires a tremendous amount of work behind the scenes to sustain the quality and integrity of published content. Honoring the project’s commitment to human-led content moderation, dedicated volunteers, or “Wikipedians,” take on the responsibility of managing ongoing edits in this living repository and addressing violations such as vandalism, copyright infringement, plagiarism, and citation issues when they occur.
Wikipedia’s volunteer community includes nearly 260,000 active users every month, and the site is edited an average of 342 times per minute, figures that point to the sheer scale of Wikipedia’s content moderation efforts and explain why automation is vital to the workflow.
In 2014, the Foundation adopted iThenticate, Turnitin’s research integrity solution, integrating it with CopyPatrol, its internal moderation tool for tracking edits and possible violations.
Looking to address gaps in their capacity to surface copyright violations, the Foundation sought to:
- Safeguard against content integrity breaches including copyright violations across Wikipedia.
- Embed automation to improve consistency and efficiency in content moderators’ workflows.
- Distinguish genuine violations from false positives within mirrored content republished from Wikipedia itself.
Formalizing the partnership between the Wikimedia Foundation and Turnitin in 2024 was a natural progression, improving the scalability of content checking across every edit and every language while upholding Wikipedia’s standards as knowledge evolves.
iThenticate brings newfound reliability to Wikipedia’s content moderation workflow
The stakes couldn’t be higher in maintaining a collaborative encyclopedia in which people worldwide expect trustworthy, timely information.
Beyond the fact-checking in Wikipedia’s content moderation, there are further integrity risks to overcome:
- Plagiarized text
- Unattributed material
- Copyright violations
Wikipedia enforces zero tolerance for such breaches to uphold community guidelines and the Wikimedia Foundation’s legal obligations.
Automation is key to making this moderation process consistent and scalable, and Wikimedia’s volunteers determined that iThenticate outperformed competitors in both accuracy and reliability. “The users spoke for us, and it was clear that people preferred iThenticate,” says Wikimedia software engineer Leon Ziemba.
How iThenticate adds value to Wikipedia’s content moderation workflow
Similarity checking
Wikipedia’s CopyPatrol tool tracks every user edit and submits it to iThenticate for comprehensive similarity checking against a database of 54 billion current and archived web pages, plus premium content from publishers in every major discipline and in dozens of languages.
Source verification
Previously, Wikipedia relied on community-developed tools that compared text similarity against search engine results but offered no insight into the original or archived source. Detailed source data in iThenticate’s Similarity Report helps moderators better understand matches and investigate possible copyright violations.
Match exclusions
iThenticate’s match exclusion features address Wikipedia’s unique challenge of flagging and excluding “backwards copy”: content copied from Wikipedia itself, which is not a violation. By allowlisting these “mirror sites” in match comparison and running plagiarism checks only on the remaining content, moderators can avoid false positives.
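The allowlist idea described above can be sketched as a simple post-filter over similarity matches. Everything here is illustrative: the domain list, the `source_url`/`score` fields, and the function itself are assumptions for explanation, not CopyPatrol’s or iThenticate’s actual API.

```python
from urllib.parse import urlparse

# Hypothetical allowlist of known Wikipedia mirrors; a real
# deployment would maintain a much larger, curated list.
MIRROR_ALLOWLIST = {"en.wikipedia.org", "wikiwand.com", "dbpedia.org"}

def filter_backwards_copy(matches, allowlist=MIRROR_ALLOWLIST):
    """Drop similarity matches whose source is an allowlisted mirror,
    so only matches against independent sources are flagged for review."""
    flagged = []
    for match in matches:
        domain = urlparse(match["source_url"]).netloc.lower()
        # Treat subdomains of allowlisted mirrors as mirrors too.
        if any(domain == d or domain.endswith("." + d) for d in allowlist):
            continue  # backwards copy: republished Wikipedia content, not a violation
        flagged.append(match)
    return flagged

matches = [
    {"source_url": "https://www.wikiwand.com/en/Some_Article", "score": 92},
    {"source_url": "https://example-news.com/story", "score": 81},
]
# Only the non-mirror match survives the filter.
print(filter_backwards_copy(matches))
```

The design point is that exclusion happens after matching rather than before, so moderators still see which text matched; only the verdict changes when the source is a known mirror.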
Managing backwards copy
Given that Wikipedia encourages distribution of its content provided there is attribution back to the site, iThenticate is a game changer in managing backwards copy. It removes the need for tedious, manual investigation methods that were rarely straightforward: “I don’t know how, but iThenticate knows better!” remarks Leon.
Wikipedia recognizes that while automation is critical to surfacing content violations, editorial decision-making must lie with humans who can apply their judgment and make the final determination. Community-developed tools afford users some insight into match sources, but those steps are slower and less intuitive than iThenticate’s. Leon shares that “Volunteers generally go straight to iThenticate’s Similarity Report to make their final conclusion on whether it’s a copyright violation.” He adds that while content moderators are a far smaller portion of their total users, their efforts and enthusiasm for iThenticate make a huge impact: “if they’re happy, we’re happy; our job is to make their lives easier.”
Biggest results and impact observed:
- Widespread adoption of iThenticate means content violations are being caught more consistently, saving moderators’ time and ensuring nothing goes undetected.
- Insights from iThenticate’s Similarity Report remove source ambiguity and inform editorial decision-making with unprecedented speed and accuracy.
- iThenticate’s match exclusions are heralded as the solution for mirror sites, filtering out backwards copy and refocusing efforts on flagging true violations.
The path ahead for Wikimedia in preserving online knowledge
The Wikimedia Foundation has witnessed many changes in the digital landscape, most recently the increasing role of generative AI and automation in content creation. Wikipedia’s value as a human-driven source of truth is an important anchor while embracing innovation, including AI; as the Foundation puts it, “we remain committed to ensuring that these technologies support—not replace—human judgment, accountability, and ethical decision-making.”
Wikimedia and Turnitin’s partnership is an alignment of shared values around quality and integrity of information. With iThenticate integrated into Wikipedia’s moderation workflow, Wikimedia strengthens its ability to uphold content integrity at scale and provide content stewardship in an evolving AI landscape.