The Effect of Authenticity of Research Data on Compliance with Benford's Law

Author(s)

Luke B. Marazzo

School Name

Spring Valley High School

Grade Level

10th Grade

Presentation Topic

Math and Computer Science

Presentation Type

Non-Mentored

Abstract

Scientific research fraud can harm patient care, divert research resources, influence health care policies, and erode trust in research institutions. In today’s globally connected world, fabricated or falsified scientific research can be published online immediately. Fraudulent data have also been published in well-respected medical journals and relied upon to make financial, scientific, and health care decisions. The purpose of this experiment was to determine whether data from authentic scientific research papers would comply with Benford’s Law more closely than data from fabricated and falsified papers. The goal was to determine whether compliance with Benford’s Law could be effective at identifying falsified or fabricated data in scientific research papers. Benford’s Law is currently used by accountants and investigators to identify fraudulent data in financial transactions. Recent studies suggested that fraudulent research data deviated from Benford’s Law, but no study has examined this finding in context by testing similar sets of authentic and fraudulent research data. It was hypothesized that if fraudulent data and authentic data were tested for compliance with Benford’s Law, the authentic data would comply more closely. The experiment compared two sets of data to the expected Benford’s Law distributions. Data from thirty-two published research papers were examined. A total of 3,578 datasets from “authentic” scientific papers and 3,472 datasets from “false” papers were extracted. Two groups of data were then created: one containing aggregate data from the “false” papers, and the other containing aggregate data from the “authentic” papers. The results were analyzed with Chi-Square, Z-statistic, and Mean Absolute Deviation (MAD) tests. The distributions predicted by Benford’s Law were used as the control. The hypothesis was partially supported. The fraudulent data complied more closely with Benford’s Law in the general “first digit test,” while the authentic data complied more closely in all three statistical measures for the “first order test,” which is the preferred and more comprehensive of the two measures.
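
For readers unfamiliar with the measures named in the abstract, the following is a minimal sketch of how first-digit frequencies might be compared with the Benford’s Law expectation, P(d) = log10(1 + 1/d) for d = 1 through 9. It is not the study’s procedure: the abstract does not give the formulas used or define the “first order test,” so this sketch covers only a first-digit comparison. The function names (`benford_expected`, `first_digit`, `benford_stats`) are hypothetical, and the continuity-corrected form of the Z-statistic is an assumption based on a formulation commonly used in Benford analyses.

```python
import math
from collections import Counter


def benford_expected():
    """Expected first-digit proportions under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """Return the leading nonzero digit of x, or None if x has none (e.g. 0)."""
    for ch in repr(abs(x)):
        if ch in "123456789":
            return int(ch)
    return None


def benford_stats(values):
    """Compare observed first-digit frequencies with Benford's Law.

    Returns the sample size, the chi-square statistic (8 degrees of freedom),
    the mean absolute deviation (MAD) across the nine digits, and a per-digit
    Z-statistic. The 1/(2n) continuity correction is an assumption, not
    something stated in the abstract.
    """
    digits = [d for d in map(first_digit, values) if d is not None]
    n = len(digits)
    if n == 0:
        raise ValueError("no values with a leading nonzero digit")

    counts = Counter(digits)
    expected = benford_expected()
    chi_square = 0.0
    total_abs_dev = 0.0
    z = {}

    for d in range(1, 10):
        p = expected[d]
        observed_count = counts.get(d, 0)
        observed_prop = observed_count / n

        chi_square += (observed_count - n * p) ** 2 / (n * p)
        total_abs_dev += abs(observed_prop - p)

        diff = abs(observed_prop - p)
        correction = 1 / (2 * n)
        if correction < diff:  # apply the correction only when it is smaller
            diff -= correction
        z[d] = diff / math.sqrt(p * (1 - p) / n)

    return {"n": n, "chi_square": chi_square, "mad": total_abs_dev / 9, "z": z}


if __name__ == "__main__":
    # Illustrative input only (powers of 2), not data from the study.
    sample = [2 ** k for k in range(1, 101)]
    result = benford_stats(sample)
    print(result["n"], result["chi_square"], result["mad"])
```

Lower chi-square and MAD values indicate closer agreement with the Benford distribution, which is the sense in which one group of data can be said to comply “more closely” than another.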

Start Date

4-11-2015 1:45 PM

End Date

4-11-2015 2:00 PM
