the power of machine learning

June 11, 2009

Your standard dot-matrix printer, often found in doctors’ offices.

A new study by Backes et al., currently in the process of being published, describes how sensitive information such as medical records printed on a dot-matrix printer can be recovered simply by listening to the printer print the words.

The authors used an array of techniques in this process, including various signal processing and speech recognition techniques. Most importantly, though, they used machine learning, which allows the software, in a sense, to “figure out” what the translation from sound to word is, rather than having humans define a translation. This process, of course, requires an initial (and often long) training phase, where the software is doing the learning, but it ultimately leads to more effective software.
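To make the “figure out the translation” idea a bit more concrete, here is a toy sketch of the general pattern: a training phase that learns from labeled examples, followed by recognition of a new sound. This is not the authors’ method. The two-number “sound features”, the word list, and the `train` and `recognize` functions are all made up for illustration, and a simple nearest-average classifier stands in for whatever the study actually used.

```python
import math
from collections import defaultdict

def train(examples):
    """Training phase: average the feature vectors seen for each word."""
    sums = {}
    counts = defaultdict(int)
    for features, word in examples:
        if word not in sums:
            sums[word] = [0.0] * len(features)
        sums[word] = [s + f for s, f in zip(sums[word], features)]
        counts[word] += 1
    return {w: [s / counts[w] for s in sums[w]] for w in sums}

def recognize(model, features):
    """Return the word whose learned average is closest to the new sound."""
    return min(model, key=lambda w: math.dist(model[w], features))

# Hypothetical training data: (sound features, word that was printed).
training_data = [
    ([0.9, 0.1], "printer"),
    ([1.1, 0.2], "printer"),
    ([0.2, 0.8], "cable"),
    ([0.1, 1.0], "cable"),
]

model = train(training_data)
print(recognize(model, [1.0, 0.0]))  # a new, unlabeled "sound"
```

Nobody wrote a rule saying which sound means “printer”; the mapping comes entirely from the examples, which is why the training phase matters so much.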

Here’s an example of text they printed from a dot-matrix printer:

In computing, a printer is a peripheral which produces a hard copy
(permanent human-readable text and/or graphics) of documents stored
in electronic form, usually on physical print media such as paper or
transparencies. Many printers are primarily used as local
peripherals, and are attached by a printer cable or, in most newer
printers, a USB cable to a computer which serves as a document
source. Some printers, commonly known as network printers, have
built-in network interfaces (typically wireless or Ethernet), and
can serve as a hardcopy device for any user on the network.
Individual printers are often designed to support both local and
network connected users at the same time.

And here’s the reconstructed text they produced with software that listened to the original printing:

In computing, a printer in a peripheral which produces a hard body
(permanent human-readable text and/or graphics) of documents source
in electronic form. usually as physical print media such as pages or
transparencies. Many Printers are primarily used go local
peripherals, end are attached go A printer could or, in most newer
printers; a USB cable go A computer which served de = document
source. some printers, commonly known go network printers; have
built-in network interfaces (typically wireless as ethernet), god
way serve As a hardcopy device for out year we who network.
Individual Printers use often designed so support born local god
network connected users as too some tree.

Not perfect, but pretty darn good, right? The authors report being able to recreate prescriptions (from dummy patients, of course) printed in a busy and crowded doctor’s office under very realistic conditions.
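If you want to put a rough number on “pretty darn good”, one simple option is to count how many words match position by position. The sketch below does that for just the first line of the two samples above. The `word_accuracy` function is my own illustration, not a metric taken from the study, and it assumes both texts contain the same number of words in the same order.

```python
def word_accuracy(original, reconstructed):
    """Fraction of words that match at the same position in both texts."""
    orig_words = original.split()
    recon_words = reconstructed.split()
    matches = sum(o == r for o, r in zip(orig_words, recon_words))
    return matches / len(orig_words)

# First line of the printed text and of the reconstruction.
original = "In computing, a printer is a peripheral which produces a hard copy"
reconstructed = "In computing, a printer in a peripheral which produces a hard body"

print(f"{word_accuracy(original, reconstructed):.0%} of words recovered")  # 83%
```

On that first line, 10 of the 12 words survive; the later lines are clearly rougher.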

The authors’ main purpose is to demonstrate the vulnerability of our medical records to eavesdropping by insurance companies and other evil people. Perhaps there is some distant danger of this eavesdropping but, I’ll be honest, I’m not that worried about it.

I’m mostly just incredibly encouraged by the promise that machine learning like this demonstrates. I’m somewhat familiar with the ideas and techniques used in this study but never would have imagined they could yield such impressive results. I’m usually not one for the “gee whiz, isn’t technology wonderful?” attitude, but if this study doesn’t make you optimistic about what we’ll be able to do in the future, as our techniques and processing power become all the more advanced, I don’t know what will.


One Response to “the power of machine learning”

  1. Fascinating notion this machine learning, I agree. More than the implications for medical records privacy though, I wonder about possible consequences for national security and intelligence gathering. Back in the Cold War years when dot-matrix was king, I’m sure the CIA or NSA (not to mention the KGB) would have loved to exploit such technology in their efforts to penetrate foreign government communications networks. Since our office still employs at least one dot-matrix printer for the printing of law citations, I wouldn’t put it past some FBI field office somewhere to have one as well. Is it possible a similar process could be used to eavesdrop on the sound of someone typing?

