New theory could yield more-reliable communication protocols.
Communication protocols for digital devices are very efficient but also very brittle: They require information to be specified in a precise order with a precise number of bits. If sender and receiver (say, a computer and a printer) are off by even a single bit relative to each other, communication between them breaks down entirely.
Humans are much more flexible. Two strangers may come to a conversation with wildly differing vocabularies and frames of reference, but they will quickly assess the extent of their mutual understanding and tailor their speech accordingly.
Madhu Sudan, an adjunct professor of electrical engineering and computer science at MIT and a principal researcher at Microsoft Research New England, wants to bring that type of flexibility to computer communication. In a series of recent papers, he and his colleagues have begun to describe theoretical limits on the degree of imprecision that communicating computers can tolerate, with very real implications for the design of communication protocols.
“Our goal is not to understand how human communication works,” Sudan says. “Most of the work is really in trying to abstract, ‘What is the kind of problem that human communication tends to solve nicely, [and] designed communication doesn’t?’ And let’s now see if we can come up with designed communication schemes that do the same thing.”
One thing that humans do well is gauging the minimum amount of information they need to convey in order to get a point across. Depending on the circumstances, for instance, one co-worker might ask another, “Who was that guy?”; “Who was that guy in your office?”; “Who was that guy in your office this morning?”; or “Who was that guy in your office this morning with the red tie and glasses?”
Similarly, the first topic Sudan and his colleagues began investigating is compression, or the minimum number of bits that one device would need to send another in order to convey all the information in a data file.
Uneven odds
In a paper presented in 2011, at the ACM Symposium on Innovations in Computer Science (now known as Innovations in Theoretical Computer Science, or ITCS), Sudan and colleagues at Harvard University, Microsoft, and the University of Pennsylvania considered a hypothetical case in which the devices shared an almost infinite codebook that assigned a random string of symbols, a kind of serial number, to every possible message that either might send.
Of course, such a codebook is entirely implausible, but it allowed the researchers to get a statistical handle on the problem of compression. Indeed, it’s an extension of one of the concepts that longtime MIT professor Claude Shannon used to determine the maximum capacity of a communication channel in the seminal 1948 paper that created the field of information theory.
In Sudan and his colleagues’ codebook, a vast number of messages might have associated strings that begin with the same symbol. But fewer messages will have strings that share their first two symbols, fewer still strings that share their first three symbols, and so on. In any given instance of communication, the question is how many symbols of the string one device needs to send the other in order to pick out a single associated message.
The answer to that question depends on the probability that any given interpretation of a string of symbols makes sense in context. By way of analogy, if your co-worker has had only one visitor all day, asking her, “Who was that guy in your office?” probably suffices. If she’s had a string of visitors, you may need to specify time of day and tie color.
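The shared-codebook idea above can be illustrated with a toy sketch (all names and parameters here are illustrative, not taken from the paper): each message gets a random string, and the sender transmits only the shortest prefix that singles out the intended message among the candidates the receiver considers plausible in context. The fewer the plausible candidates, the shorter the prefix.

```python
import random

def make_codebook(messages, length=16, seed=0):
    """Toy stand-in for the shared codebook: assign each possible
    message an independent random string of digit symbols."""
    rng = random.Random(seed)
    return {m: "".join(rng.choice("0123456789") for _ in range(length))
            for m in messages}

def shortest_prefix(codebook, target, plausible):
    """Send only as many symbols as needed to single out `target`
    among the messages the receiver finds plausible in context."""
    code = codebook[target]
    for k in range(1, len(code) + 1):
        prefix = code[:k]
        matches = [m for m in plausible if codebook[m].startswith(prefix)]
        if matches == [target]:
            return prefix
    return code  # fall back to the full string

messages = [f"visitor-{i}" for i in range(1000)]
book = make_codebook(messages)

# One other plausible visitor: a very short prefix usually suffices.
few = ["visitor-7", "visitor-42"]
# A thousand plausible visitors: more symbols are needed.
many = messages
print(len(shortest_prefix(book, "visitor-7", few)))
print(len(shortest_prefix(book, "visitor-7", many)))
```

Because the small candidate set is contained in the large one, the prefix needed for the large set is never shorter, mirroring the article's point that context determines how much must be sent.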
Existing compression schemes do, in fact, exploit statistical regularities in data. But Sudan and his colleagues considered the case in which sender and receiver assign different probabilities to different interpretations. They were able to show that, so long as protocol designers can make reasonable assumptions about the ranges within which the probabilities might fall, good compression is still possible.
For instance, Sudan says, consider a telescope in deep-space orbit. The telescope’s designers might assume that 90 percent of what it sees will be blackness, and they can use that assumption to compress the image data it sends back to Earth. With existing protocols, anyone attempting to interpret the telescope’s transmissions would need to know the precise figure, 90 percent, that the compression scheme uses. But Sudan and his colleagues showed that the protocol could be designed to accommodate a range of assumptions (from, say, 85 percent to 95 percent) that might be just as reasonable as 90 percent.
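A quick calculation, using standard information-theoretic quantities rather than the authors' actual construction, shows why a modest mismatch in the assumed probability costs little: a code tuned for a 90 percent blackness rate pays only a small penalty, in average bits per pixel, when the true rate is anywhere from 85 to 95 percent.

```python
from math import log2

def expected_bits(true_p, assumed_p):
    """Average bits per pixel when the code is tuned for `assumed_p`
    (the assumed probability a pixel is black) but the true rate is
    `true_p`. This is the cross-entropy H(true_p, assumed_p)."""
    return -(true_p * log2(assumed_p) + (1 - true_p) * log2(1 - assumed_p))

def entropy(p):
    """Ideal cost: the code's assumption matches the true rate."""
    return expected_bits(p, p)

# A code tuned for 90% blackness stays near-optimal across 85-95%.
for q in (0.85, 0.90, 0.95):
    penalty = expected_bits(q, 0.90) - entropy(q)
    print(f"true rate {q:.2f}: penalty {penalty:.3f} bits per pixel")
```

Running this shows the penalty stays under a few hundredths of a bit per pixel across the whole range, which is the intuition behind designing for a range of assumptions rather than one precise figure.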
Buggy codebook
In a paper being presented at the next ITCS, in January, Sudan and colleagues at Columbia University, Carnegie Mellon University, and Microsoft add even more uncertainty to their compression model. In the new paper, not only do sender and receiver have somewhat different probability estimates, but they also have slightly different codebooks. Again, the researchers were able to devise a protocol that would still provide good compression.
They also generalized their model to new contexts. For instance, Sudan says, in the era of cloud computing, data is constantly being duplicated on servers scattered across the Internet, and data-management systems need to ensure that the copies are kept up to date. One way to do that efficiently is by performing “checksums,” or adding up a bunch of bits at corresponding locations in the original and the copy and making sure the results match.
That method, however, works only if the servers know in advance which bits to add up, and if they store the files in such a way that data locations correspond perfectly. Sudan and his colleagues’ protocol could provide a way for servers using different file-management schemes to generate consistency checks on the fly.
“I shouldn’t tell you if the number of 1’s that I see in this subset is odd or even,” Sudan says. “I should send you some coarse information saying 90 percent of the bits in this set are 1’s. And you say, ‘Well, I see 89 percent,’ but that’s close to 90 percent; that’s actually a good protocol. We prove this.”
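The contrast Sudan describes can be sketched in a few lines (a hypothetical illustration of the idea, not the protocol the researchers actually prove correct): an exact parity check fails the moment the two copies disagree by a single bit, while a coarse check that compares the fraction of 1's within a tolerance still passes.

```python
def coarse_check(bits_a, bits_b, tolerance=0.02):
    """Pass if the two copies' fractions of 1's are close, rather
    than demanding that an exact checksum match bit for bit."""
    frac_a = sum(bits_a) / len(bits_a)
    frac_b = sum(bits_b) / len(bits_b)
    return abs(frac_a - frac_b) <= tolerance

def parity(bits):
    """Brittle exact check: is the count of 1's odd or even?"""
    return sum(bits) % 2

original = [1] * 900 + [0] * 100   # 90.0% ones
copy     = [1] * 889 + [0] * 111   # 88.9% ones: a slightly stale copy

print(coarse_check(original, copy))            # prints True
print(parity(original) == parity(copy))        # prints False
```

The coarse check tolerates the small drift Sudan's example describes (89 percent versus 90 percent), while the parity comparison, like a conventional checksum, already breaks.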
“This sequence of works puts forward a general theory of goal-oriented communication, where the focus is not on the raw data being communicated but rather on its meaning,” says Oded Goldreich, a professor of computer science at the Weizmann Institute of Science in Israel. “I consider this sequence a work of fundamental nature.”
“Following a dominant approach in 20th-century philosophy, the work associates the meaning of communication with the goal achieved by it and provides a mathematical framework for discussing all these natural notions,” he adds. “This framework is based on a general definition of the notion of a goal and leads to a problem that is complementary to the problem of reliable communication considered by Shannon, which established information theory.”