Claude Shannon
Claude Shannon is properly described as "the father of information theory" although he described his work as "communication theory." While others had loosely connected the idea of
information to its opposite, entropy, it was Shannon who put the communication of signals in the presence of noise on a sound mathematical basis.
In 1871, James Clerk Maxwell showed how an intelligent being could in principle sort out the disorder in a gas of randomly moving molecules, by gathering information about their speeds and sorting them into hot and cold gases, in apparent violation of the second law of thermodynamics. William Thomson (Lord Kelvin) called this being "Maxwell's intelligent demon."
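As early as the 1890s, Ludwig Boltzmann, who established the statistical physics foundation of thermodynamics, had described entropy as "missing information." The entropy formula that bears his name is
S = k log W
where W is the number of possible states of the physical system and k is Boltzmann's constant.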
In 1929, Leo Szilard imagined a gas with but a single molecule in a container. He then devised a mechanism that could behave like Maxwell's demon. It would insert a partition into the middle of the container, then gather the information about which of the two sides of the partition the molecule was in. This was a binary decision and it allowed Szilard to develop the mathematical form for the amount of entropy
S = k log 2
The factor log 2 reflects the two outcomes of the binary decision. The entropy generated by the measurement may be greater than this fundamental amount of negative entropy (information) created, but never smaller, or the second law, that overall entropy must increase, would be violated.
The earlier work of Maxwell, Boltzmann, and Szilard did not figure directly in Shannon's work. Shannon studied the design of early analog computers (specifically Vannevar Bush's differential analyzer at MIT, which was used by Coolidge and James to calculate the wave functions of the hydrogen molecule in 1936). Then, with John von Neumann and Alan Turing, he helped design the first digital computers, based on the Boolean logic of 1's and 0's and binary arithmetic. Shannon analyzed telephone switching circuits that used electromagnetic relay switches, then realized that the switches could solve some problems in Boolean algebra.
During World War II, Shannon worked at Bell Labs on cryptography and on sending control signals in the presence of noise. Alan Turing visited the labs for a couple of months and showed Shannon his 1936 ideas for a universal computer (the "Turing Machine"). Shannon's work on communications, control systems, and cryptography was initially classified, but it contained almost all of the mathematics that eventually appeared in his landmark 1948 article "A Mathematical Theory of Communication," which is the basis for modern information theory.
Norbert Wiener's work on probability theory in Cybernetics had an important influence on Shannon. There can be no new information in a world of certainty. Probability and statistics are at the heart of both information theory and quantum theory. Shannon developed his expression for an information (Shannon) entropy, which he showed has the same mathematical form as thermodynamic (Boltzmann) entropy.
Shannon Entropy and Boltzmann Entropy
Shannon entropy is the average (expected) value of the information contained in a received message. If there are many possible messages, we get a lot more information than when there are only two possibilities (one bit of information). When all messages are equally likely, the entropy is the base 2 logarithm of the number of possibilities. Entropy thus characterizes our uncertainty about the information in an incoming message, and increases for more possibilities with greater randomness. The less likely an event is, the more information it provides when it occurs. Shannon measured the information in a single event as the negative of the logarithm of its probability, and defined his entropy as the average of that quantity over the probability distribution. One bit of information is also known as one "shannon."
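The relation between the information in a single event and the average over all events can be sketched numerically. This is a minimal illustration, not Shannon's own notation: the function names `self_information` and `shannon_entropy` are assumptions chosen for this example, and logarithms are taken to base 2 so that results are in bits.

```python
import math

def self_information(p: float) -> float:
    """Information, in bits, gained when an event of probability p occurs."""
    return -math.log2(p)

def shannon_entropy(probs) -> float:
    """Average information in bits; zero-probability events contribute nothing."""
    return sum(p * self_information(p) for p in probs if p > 0)

two_messages = shannon_entropy([0.5, 0.5])     # two equally likely messages
eight_messages = shannon_entropy([1 / 8] * 8)  # eight equally likely messages
skewed = shannon_entropy([0.9, 0.1])           # one message is much more likely
```

Two equally likely messages carry one bit and eight carry three, while the skewed case carries less than one bit because the outcome is usually the expected one.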
Boltzmann entropy is maximized when the particle distribution is maximally random among positions in phase space, that is, when the number of microstates W is at its largest.
Counterintuitively, maximum Boltzmann entropy (no information) is maximal uncertainty
Historical Background
Information in physical systems was connected to a measure of the structural order in a system as early as the nineteenth century by William Thomson (Lord Kelvin) and Ludwig Boltzmann, who described an increase in the thermodynamic entropy as “lost information.”
In 1872, Boltzmann proved his “H-Theorem” that the entropy or disorder in the universe always increases. In 1877, he defined entropy S as proportional to the logarithm of the number W of possible states of a physical system, an equation now known as Boltzmann’s Principle, S = k log W.
In 1929, Leo Szilard showed that the mean value of the quantity of information produced by a 1-bit, two-possibility (“yes/no”) measurement is S = k log 2, where k is Boltzmann’s constant, connecting information directly to entropy.
Following Szilard, Ludwig von Bertalanffy, Erwin Schrödinger, Norbert Wiener, Claude Shannon, Warren Weaver, John von Neumann, and Leon Brillouin all expressed similar views on the connection between physical entropy and abstract “bits” of information. Schrödinger said the information in a living organism is the result of “feeding on negative entropy” from the sun. Wiener said “The quantity we define as amount of information is the negative of the quantity usually defined as entropy in similar situations.” Brillouin created the term “negentropy” because, he said, “One of the most interesting parts in Wiener’s Cybernetics is the discussion on ‘Time series, information, and communication,’ in which he specifies that a certain ‘amount of information is the negative of the quantity usually defined as entropy in similar situations.’”
Shannon, with a nudge from von Neumann, used the term entropy to describe his estimate of the amount of information that can be communicated over a channel, because his mathematical theory of the communication of information produced a mathematical formula identical to Boltzmann’s equation for entropy, except for a minus sign (the negative in negative entropy).
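Shannon described a set of n possible messages with probabilities p_{1}, p_{2}, ..., p_{n} and defined their entropy H as
H = - K Σ p_{i} log p_{i}
where K is a positive constant that amounts to a choice of a unit of measure.
To see the connection between the two entropies, we can note that Boltzmann assumed that all his probabilities were equal. For n equally probable messages, each p_{i} = 1/n, and
the sum over i of p_{i} log p_{i} = n × 1/n × log (1/n) = log (1/n) = - log n,
so that H = K log n.
If we set Shannon's number of possible messages n equal to Boltzmann's number of possible states W, and K equal to Boltzmann's constant k, Shannon's H takes the form of Boltzmann's S = k log W.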
Shannon showed that a communication that is certain to tell you something you already know (one of the messages has probability unity) contains no new information.
For Scholars
The Mathematical Theory of Communication (excerpts)
Introduction
The recent development of various methods of modulation such as PCM and PPM which exchange bandwidth for signal-to-noise ratio has intensified the interest in a general theory of communication. A basis for such a theory is contained in the important papers of Nyquist^{1} and Hartley^{2} on this subject. In the present paper we will extend the theory to include a number of new factors, in particular the effect of noise in the channel, and the savings possible due to the statistical structure of the original message and due to the nature of the final destination of the information.
The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that the actual message is one selected from a set of possible messages. The system must be designed to operate for each possible selection, not just the one which will actually be chosen since this is unknown at the time of design.
If the number of messages in the set is finite then this number or any monotonic function of this number can be regarded as a measure of the information produced when one message is chosen from the set, all choices being equally likely. As was pointed out by Hartley [and Szilard and Boltzmann] the most natural choice is the logarithmic function. Although this definition must be generalized considerably when we consider the influence of the statistics of the message and when we have a continuous range of messages, we will in all cases use an essentially logarithmic measure.
The logarithmic measure is more convenient for various reasons:
1. It is practically more useful. Parameters of engineering importance such as time, bandwidth, number of relays, etc., tend to vary linearly with the logarithm of the number of possibilities. For example, adding one relay to a group doubles the number of possible states of the relays. It adds 1 to the base 2 logarithm of this number. Doubling the time roughly squares the number of possible messages, or doubles the logarithm, etc.
2. It is nearer to our intuitive feeling as to the proper measure. This is closely related to (1) since we intuitively measure entities by linear comparison with common standards. One feels, for example, that two punched cards should have twice the capacity of one for information storage, and two identical channels twice the capacity of one for transmitting information.
3. It is mathematically more suitable. Many of the limiting operations are simple in terms of the logarithm but would require clumsy restatement in terms of the number of possibilities.
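The relay and transmission-time examples in this passage can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names `relay_states` and `message_count` are assumptions, and a two-symbol alphabet is assumed for the message count.

```python
import math

def relay_states(n_relays: int) -> int:
    """Number of distinct on/off configurations of a group of two-state relays."""
    return 2 ** n_relays

def message_count(alphabet_size: int, duration: int) -> int:
    """Number of distinct messages of a given length over a given alphabet."""
    return alphabet_size ** duration

# Adding one relay doubles the states and adds 1 to the base 2 logarithm.
bits_5 = math.log2(relay_states(5))
bits_6 = math.log2(relay_states(6))

# Doubling the time squares the number of messages and doubles the logarithm.
short = message_count(2, 10)
long = message_count(2, 20)
```

The logarithm turns the multiplicative growth in possibilities into the additive growth that matches engineering quantities such as relay count and transmission time.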
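The choice of a logarithmic base corresponds to the choice of a unit for measuring information. If the base 2 is used the resulting units may be called binary digits, or more briefly bits, a word suggested by J. W. Tukey. A device with two stable positions, such as a relay or a flip-flop circuit, can store one bit of information. N such devices can store N bits, since the total number of possible states is 2^{N} and log_{2} 2^{N} = N. If the base 10 is used the units may be called decimal digits. Since
log_{2} M = log_{10} M / log_{10} 2 = 3.32 log_{10} M,
a decimal digit is about 3 1/3 bits. A digit wheel on a desk computing machine has ten stable positions and therefore has a storage capacity of one decimal digit. In analytical work where integration and differentiation are involved the base e is sometimes useful. The resulting units of information will be called natural units. Change from the base a to base b merely requires multiplication by log_{b} a.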
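The change of units between logarithmic bases can be sketched as a single multiplication. The function name `convert_units` and its parameter names are assumptions made for this example; the factor it applies is the logarithm of the old base taken in the new base.

```python
import math

def convert_units(amount: float, from_base: float, to_base: float) -> float:
    """Convert information measured with logarithm base from_base into
    units of base to_base by multiplying by log_{to_base}(from_base)."""
    return amount * math.log(from_base, to_base)

bits_per_decimal_digit = convert_units(1.0, 10, 2)  # one decimal digit in bits
bits_per_nat = convert_units(1.0, math.e, 2)        # one natural unit in bits
```

One decimal digit comes out at roughly 3.32 bits, in line with the factor 3.32 in the relation between base 2 and base 10 logarithms.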
By a communication system we will mean a system of the type indicated schematically in Fig. 1. It consists of essentially five parts:
Fig. 1 Schematic diagram of a general communication system.
1. An information source which produces a message or sequence of messages to be communicated to the receiving terminal.
2. A transmitter which operates on the message in some way to produce a signal suitable for transmission over the channel.
3. The channel is merely the medium used to transmit the signal from transmitter to receiver.
4. The receiver ordinarily performs the inverse operation of that done by the transmitter, reconstructing the message from the signal.
5. The destination is the person (or thing) for whom the message is intended.
We wish to consider certain general problems involving communication systems. To do this it is first necessary to represent the various elements involved as mathematical entities, suitably idealized from their physical counterparts.
We may roughly classify communication systems into three main categories: discrete, continuous and mixed. By a discrete system we will mean one in which both the message and the signal are a sequence of discrete symbols. A typical case is telegraphy where the message is a sequence of letters and the signal a sequence of dots, dashes and spaces. A continuous system is one in which the message and signal are both treated as continuous functions, e.g., radio or television. A mixed system is one in which both discrete and continuous variables appear, e.g., PCM transmission of speech.
We first consider the discrete case. This case has applications not only in communication theory, but also in the theory of computing machines, the design of telephone exchanges and other fields. In addition the discrete case forms a foundation for the continuous and mixed cases which will be treated in the second half of the paper.
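A discrete communication system of this kind can be mimicked in a few lines. The sketch below is a toy model under stated assumptions, not a construction from the paper: the function names follow the parts of the schematic, the noise is assumed to flip each bit independently with a fixed probability, and the transmitter uses a simple three-fold repetition code chosen only for illustration.

```python
import random

def information_source(n_bits, rng):
    """Produce a message: a sequence of equally likely binary symbols."""
    return [rng.randint(0, 1) for _ in range(n_bits)]

def transmitter(message, repeat=3):
    """Encode the message as a signal by repeating every bit (assumed code)."""
    return [bit for bit in message for _ in range(repeat)]

def channel(signal, flip_prob, rng):
    """Noisy medium: each bit is flipped independently with probability flip_prob."""
    return [bit ^ 1 if rng.random() < flip_prob else bit for bit in signal]

def receiver(received, repeat=3):
    """Invert the transmitter by majority vote over each group of repeated bits."""
    groups = [received[i:i + repeat] for i in range(0, len(received), repeat)]
    return [1 if sum(group) * 2 > repeat else 0 for group in groups]

rng = random.Random(0)
message = information_source(1000, rng)
delivered = receiver(channel(transmitter(message), 0.05, rng))  # reaches the destination
errors = sum(sent != got for sent, got in zip(message, delivered))
```

The repetition code trades three channel uses per message bit for a lower error count at the destination, a crude instance of exchanging transmission resources for reliability.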
6. Choice, Uncertainty and Entropy
We have represented a discrete information source as a Markoff process. Can we define a quantity which will measure, in some sense, how much information is "produced" by such a process, or better, at what rate information is produced?
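Suppose we have a set of possible events whose probabilities of occurrence are p_{1}, p_{2}, ..., p_{n}. These probabilities are known but that is all we know concerning which event will occur. Can we find a measure of how much "choice" is involved in the selection of the event or of how uncertain we are of the outcome?
If there is such a measure, say H(p_{1}, p_{2}, ..., p_{n}), it is reasonable to require of it the following properties:
1. H should be continuous in the p_{i}.
2. If all the p_{i} are equal, p_{i} = 1/n, then H should be a monotonic increasing function of n. With equally likely events there is more choice, or uncertainty, when there are more possible events.
3. If a choice be broken down into two successive choices, the original H should be the weighted sum of the individual values of H. The meaning of this is illustrated in Fig. 6.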
Fig. 6.— Decomposition of a choice from three possibilities.
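At the left we have three possibilities p_{1} = 1/2, p_{2} = 1/3, p_{3} = 1/6. On the right we first choose between two possibilities each with probability 1/2, and if the second occurs make another choice with probabilities 2/3, 1/3. The final results have the same probabilities as before. We require, in this special case, that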
H(1/2, 1/3, 1/6) = H(1/2, 1/2) + 1/2 H(2/3, 1/3)
The coefficient 1/2 is the weighting factor introduced because this second choice only occurs half the time. In Appendix 2, the following result is established:
Theorem 2: The only H satisfying the three above assumptions is of the form:
H = - K Σ p_{i} log p_{i}
where K is a positive constant.
Quantities of the form H = - Σ p_{i} log p_{i} (the constant K merely amounts to a choice of a unit of measure) play a central role in information theory as measures of information, choice and uncertainty. The form of H will be recognized as that of entropy as defined in certain formulations of statistical mechanics^{8} where p_{i} is the probability of a system being in cell i of its phase space. We shall call H = - Σ p_{i} log p_{i} the entropy of the set of probabilities p_{1}, p_{2}, ..., p_{n}. If x is a chance variable we will write H(x) for its entropy; thus x is not an argument of a function but a label for a number, to differentiate it from H(y) say, the entropy of the chance variable y.
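The decomposition requirement for the three-possibility example, H(1/2, 1/3, 1/6) = H(1/2, 1/2) + 1/2 H(2/3, 1/3), can be verified numerically for the logarithmic form of H. This is a small check rather than a proof; the function name `H`, the keyword `K`, and the choice of base 2 logarithms are assumptions made for the example.

```python
import math

def H(*probs, K=1.0):
    """Entropy -K * sum(p_i * log2(p_i)) of a set of probabilities."""
    return -K * sum(p * math.log2(p) for p in probs if p > 0)

# Choosing directly among three possibilities.
direct = H(1 / 2, 1 / 3, 1 / 6)

# Choosing in two stages; the second choice occurs only half the time,
# so its entropy is weighted by 1/2.
two_stage = H(1 / 2, 1 / 2) + 0.5 * H(2 / 3, 1 / 3)
```

Both routes give about 1.459 bits, so the weighted sum of the staged choices equals the entropy of the direct choice.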
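The quantity H has a number of interesting properties which further substantiate it as a reasonable measure of choice or information.
1. H = 0 if and only if all the p_{i} but one are zero, this one having the value unity. Thus only when we are certain of the outcome does H vanish. Otherwise H is positive.
2. For a given n, H is a maximum and equal to log n when all the p_{i} are equal (i.e., 1/n). This is also intuitively the most uncertain situation.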
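Two standard properties of H, that it vanishes for a certain outcome and that for a given n it is largest (log n) when all probabilities are equal, can be spot-checked by sampling. The sketch assumes base 2 logarithms and n = 4; `entropy_bits` and `random_distribution` are hypothetical helper names, and random sampling only illustrates the bound, it does not prove it.

```python
import math
import random

def entropy_bits(probs):
    """Entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def random_distribution(n, rng):
    """A randomly chosen probability distribution over n outcomes."""
    weights = [rng.random() for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

n = 4
certain = entropy_bits([1.0, 0.0, 0.0, 0.0])  # outcome known in advance
uniform = entropy_bits([1 / n] * n)           # all outcomes equally likely
rng = random.Random(1)
samples = [entropy_bits(random_distribution(n, rng)) for _ in range(1000)]
```

Every sampled distribution has an entropy between the certain case and the uniform case, which for four outcomes is 2 bits.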