Accuracy Matters: Measuring Value in a Redaction Solution
Accuracy is central to the value a redaction vendor offers.
Vendors responding to an RFP have a responsibility to be as transparent as possible about their accuracy rates, providing clear and concise answers.
Below is an actual excerpt of a competitor’s response to the San Diego County Redaction RFP:
Provide a description of the method used to measure accuracy. Example: if software missed two (2) social security numbers on a single image, would that count as one or two?
“Software” has several measures of accuracy. For “page accuracy” the above example (i.e. multiple errors on one page) is counted as one error. For “redaction field accuracy” the above example is counted as two errors.
Page Accuracy Percentage = 100 – (Images with errors / Total number of processed images). Field Accuracy Percentage = 100 – (Total field errors / Total processed fields).
Field accuracy percentages are suspect as to their absolute accuracy as such requires you have complete knowledge on the total number of fields processed and the total number that are in error. If you are attempting to calculate an accuracy level to determine imperfections in the processing, obtaining these numbers at 100% accuracy itself is a suspect process. As such, “Software” accuracy is measured as the total number of images that are located in the grey queue (undistinguished flagged as not having redaction) divided by the total number of processed images, which is the true indication of images that have errors on them and would be “missed” in any processing. “Software” Accuracy Percentage = 100 – (Images in grey queue that require redaction / total number processed images). In a certification run, grey queue images are validated 2 times by team leads and “Redaction Software” automated reports compute the accuracy of the processing to exceed contracted-for levels.
Description of Color coded queues – “Redaction Software” provides for 4 colored queue classes upon completion of automated processing. The queue classes are red, yellow, green, and grey. Red is for mandatory inspection of instrument type that is expected to contain a SSN but none was found; Yellow is for suggested inspection as complex algorithms were invoked to inspect the image (i.e. such as cursive script); Green is for undistinguished having redactions, and Grey is for undistinguished not having redactions. All images in “Redaction Software’s” red, yellow, and grey queues will be inspected, and images in the grey queue that are under a specified quality level will be inspected as well. Inspection in this fashion by “Redaction Vendor” will produce an accuracy result in excess of 99.5%.
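The page-versus-field distinction in the excerpt above can be made concrete with a short sketch. The counts below are hypothetical (the quoted response gives no sample sizes), and the sketch assumes the percentages are formed from the error ratios, as the quoted formulas suggest:

```python
# Hypothetical sample: one image contains two missed SSNs,
# out of 100 processed images and 150 processed SSN fields.
images_with_errors, total_images = 1, 100
field_errors, total_fields = 2, 150

# Page accuracy counts the two misses as ONE error (one bad image);
# field accuracy counts them as TWO errors (two missed fields).
page_accuracy = 100 * (1 - images_with_errors / total_images)
field_accuracy = 100 * (1 - field_errors / total_fields)

print(f"Page accuracy:  {page_accuracy:.1f}%")   # 99.0%
print(f"Field accuracy: {field_accuracy:.1f}%")  # 98.7%
```

The same two missed Social Security numbers produce different accuracy figures depending on which denominator the vendor chooses.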
“How do you measure accuracy?”
This is a question that appears in every RFP. We believe a straightforward question such as this deserves a straightforward answer. Extract’s method of calculating accuracy is mathematically correct, precise, and unambiguous.
Scenario: Test sample contains 100 images, 25 of the images contain a total of 35 Social Security numbers.
Extract’s Pre-Manual Verification Calculation
If Extract automatically finds 33 SSNs, the calculation is simply 33/35 = 94.3%
Post Verification Calculation
If Extract finds 33 and the manual verification finds the remaining two, the calculation is 35/35 = 100%
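The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration of the calculation, not part of either vendor's software; the counts are those from the scenario:

```python
def accuracy(found, total):
    """Accuracy as the share of target fields (here, SSNs) redacted."""
    return 100 * found / total

total_ssns = 35  # SSNs present across the 100-image test sample

# Pre-manual-verification: automation alone finds 33 of 35
pre = accuracy(33, total_ssns)
print(f"Pre-verification:  {pre:.1f}%")   # 94.3%

# Post-verification: manual review catches the remaining two
post = accuracy(33 + 2, total_ssns)
print(f"Post-verification: {post:.1f}%")  # 100.0%
```

The denominator is always the total number of Social Security numbers in the sample, so the figure cannot be improved by excluding part of the sample from review.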
And that’s how we calculate accuracy for every project. The math is simple and accurate. Our approach goes a long way toward demystifying the process of measuring redaction accuracy.
We offer you two ways of thinking about accuracy: our way with Extract and our competitor’s way.
A Critique of our Competitor’s Approach to Accuracy:
Beyond the difficulty we had deciphering what claims are actually being made, we find several flaws in this response.
- The claim that field accuracy percentages are “suspect as to their absolute accuracy.” This is only true if a vendor is unwilling to spend the time and resources to make the calculation.
- The competition clearly advocates page accuracy over field accuracy. Extract disagrees with this logic because it is the Social Security number we are protecting, not the page. Under page accuracy, two Social Security numbers missed on one page count as only one error. But to the potential identity theft victims and the county, they could represent two lawsuits.
- The competition calculates accuracy by eliminating, rather than examining, whole segments of the sample set. If the county allows the vendor to exclude whole segments of the sample set, it has no contractual recourse to force improvements to the redaction process.