# CompSust09 Impressions, Part II

Suppose you are given a discrete probability distribution. If a random draw from the distribution occurs, at most how many yes/no questions would it take to figure out what value was drawn? If all the probability mass is concentrated in one value, the answer is 0. If it is split over two values with equal likelihood, the answer is 1; over four equally likely values, it is 2, etc. The more spread out the distribution, the greater the number of questions required. Generalizing the concept slightly to allow for distributions that are not uniform and/or do not have a power-of-two number of values leads to the notion of **entropy**. The entropy of a probability distribution is a measure of how spread out the distribution is. Note this is not the same as thermodynamic entropy. If the distribution of a random variable **X** consists of *n* values *x*₁ through *x*ₙ with corresponding probabilities *p*₁ through *p*ₙ, then the entropy **H(X)** is given by:

**H(X)** = −(*p*₁ log₂ *p*₁ + *p*₂ log₂ *p*₂ + … + *p*ₙ log₂ *p*ₙ)

(Logarithmic bases other than 2 can be used.) When I was working in the computer network modeling realm I came across a great paper by Katabi and Blake that used the entropy of interpacket times to passively monitor network traffic.
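In code, the definition is nearly a one-liner. This is just a quick sketch (the function name is mine), but it reproduces the yes/no-questions intuition above:

```python
import math

def entropy(probs, base=2):
    """Shannon entropy of a discrete distribution: -sum(p * log(p)).

    Zero-probability values contribute nothing, so they are skipped
    (lim p->0 of p*log(p) is 0).
    """
    return -sum(p * math.log(p, base) for p in probs if p > 0)

entropy([1.0])            # 0 questions: all mass on one value
entropy([0.5, 0.5])       # ≈ 1: one yes/no question
entropy([0.25] * 4)       # ≈ 2: two questions
entropy([0.7, 0.2, 0.1])  # ≈ 1.157: non-uniform case, between 1 and 2
```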

Getting back to CompSust09, **Steve Phillips** of AT&T applied an approach known as **maximum entropy modeling** (maxent) to species distributions. Maxent has been heavily applied to natural language processing, among other areas. Phillips’ work starts with a number of observations of weather and of occurrences of a particular species (he used a vireo) over a region. The goal is to estimate the probability distribution of the occurrence points. Under the method, the desired distribution must first satisfy some simple constraints: for example, its mean must equal the sample mean of an observed variable, or simple functions of such quantities must be equal. Among all distributions satisfying the constraints, the one with the maximum entropy is chosen. The underlying idea is that species tend to spread out, and maximizing entropy captures this notion in the distribution. The fairly restrictive equality constraints mentioned above can be relaxed by instead requiring the distribution mean (for instance) to lie in a confidence region about the sample mean. Phillips showed some nice results of the model applied to finding potential new habitat protection areas in Madagascar. There was a bit more to it, but that is it in a nutshell. Videos of all of the conference talks are now available on the conference schedule page, so see that for more information.
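The maxent idea can be sketched numerically in a toy setting. The following is emphatically not Phillips’ Maxent software — just an illustration, with a made-up function name, of choosing the maximum-entropy distribution over a small finite set of values subject to a single mean constraint. The standard result is that the solution has the Gibbs form *p*ᵢ ∝ exp(λ*x*ᵢ), and since the resulting mean is monotone in λ, we can find λ by bisection:

```python
import math

def maxent_with_mean(values, target_mean, tol=1e-10):
    """Maximum-entropy distribution over `values` with mean `target_mean`.

    The solution has the Gibbs form p_i proportional to exp(lam * x_i);
    the mean is monotone increasing in lam, so bisection finds lam.
    """
    def mean_for(lam):
        weights = [math.exp(lam * x) for x in values]
        z = sum(weights)
        return sum(w * x for w, x in zip(weights, values)) / z

    lo, hi = -50.0, 50.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean_for(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    weights = [math.exp(lam * x) for x in values]
    z = sum(weights)
    return [w / z for w in weights]

# When the target mean equals the uniform mean, the answer is uniform —
# the most spread-out distribution consistent with the constraint:
p = maxent_with_mean([1, 2, 3, 4], 2.5)   # ≈ [0.25, 0.25, 0.25, 0.25]
# Raising the target mean skews mass toward larger values, but only as
# much as the constraint forces:
q = maxent_with_mean([1, 2, 3, 4], 3.0)
```

Phillips’ real model works over environmental feature functions rather than a single mean, and uses the confidence-region relaxation described above, but the spirit is the same.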

A nice use of the **knapsack problem** in **species reserve locations** came up in a couple of talks. Given a fixed budget (the size of the knapsack), which potential reserve areas should be obtained in order to maximize overall reserve value? Each area has a cost of acquisition and a value in terms of species preservation.
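For the record, that formulation is the classic 0/1 knapsack, solvable by dynamic programming. A minimal sketch (the area costs and values below are made up for illustration, and the budget is assumed integral):

```python
def select_reserves(costs, values, budget):
    """0/1 knapsack: pick a subset of candidate reserve areas whose total
    acquisition cost fits the budget and whose total value is maximal.
    Returns (best_value, chosen_indices).  Costs and the budget are
    assumed to be integers (e.g., in thousands of dollars)."""
    n = len(costs)
    # best[b] = max value achievable with budget b
    best = [0] * (budget + 1)
    # keep[i][b] = True if area i is taken when solving with budget b
    keep = [[False] * (budget + 1) for _ in range(n)]
    for i in range(n):
        # Iterate budgets downward so each area is used at most once
        for b in range(budget, costs[i] - 1, -1):
            if best[b - costs[i]] + values[i] > best[b]:
                best[b] = best[b - costs[i]] + values[i]
                keep[i][b] = True
    # Trace back which areas were taken
    chosen, b = [], budget
    for i in range(n - 1, -1, -1):
        if keep[i][b]:
            chosen.append(i)
            b -= costs[i]
    return best[budget], sorted(chosen)

# Hypothetical candidate areas: acquisition cost and preservation value
costs = [3, 4, 5, 2]
values = [5, 6, 8, 3]
print(select_reserves(costs, values, budget=9))   # → (14, [1, 2])
```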

All told, it was quite an interesting conference. In the call to form the computational sustainability community, OR is among the disciplines mentioned. And as my posts indicate, operations researchers and others using OR techniques were active at the conference. Hopefully contributions from the OR community to some of the problems raised can increase over time. In addition, it would be nice to see CompSust10 branch out more fully beyond the ecological sustainability issues to cover energy, emissions, production, supply chains, transportation, etc. in more depth. Also, one point raised at the conference was that the attendees were primarily academics. So increasing the presence of industry and government (especially administrators) ought to be a goal going forward. Overall though, it seems like the computational sustainability movement is off to a good start.