## 07 August 2014

### Coursera Courses

In this post I review some of the Coursera courses that I completed.

Algorithms: Design and Analysis, Part 2. The course covers greedy algorithms, dynamic programming, and NP-completeness. The lectures were very clear, not too fast and not too slow. Caveat: I watched lectures only when I needed to. My strategy was to try to solve the problems first. If I had trouble, I looked at the slides. If the slides were cryptic, then I looked at the lectures. So, I ended up watching very few lectures. I even learned an algorithm that I knew I should know but didn't: Karger's minimum cut. I think it's my favorite example of a randomized algorithm.
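From memory, the algorithm is easy to sketch: repeatedly contract a random edge until only two super-vertices remain; the surviving edges form a cut; repeat the whole experiment many times so that finding a minimum cut becomes likely. A minimal sketch in Python (my own illustration, not course code):

```python
import random

def karger_min_cut(edges, n, trials=100, seed=0):
    """Estimate the minimum cut of a multigraph on vertices 0..n-1,
    given as a list of (u, v) edges, by repeated random contraction."""
    rng = random.Random(seed)
    best = len(edges)
    for _ in range(trials):
        parent = list(range(n))  # union-find over super-vertices

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        remaining = n
        while remaining > 2:
            u, v = edges[rng.randrange(len(edges))]
            ru, rv = find(u), find(v)
            if ru != rv:  # contract a non-loop edge
                parent[ru] = rv
                remaining -= 1
        # Edges whose endpoints ended up in different super-vertices
        # cross the resulting cut.
        cut = sum(1 for u, v in edges if find(u) != find(v))
        best = min(best, cut)
    return best
```

A single run succeeds with probability at least $2/(n(n-1))$, so repeating the contraction experiment is essential.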

Introduction to Genetics and Evolution. The course covers the basic math of genetics. I'll give two examples of things I remember now, more than a year later.

One idea is the Hardy-Weinberg equilibrium. Suppose each individual has one of the three genotypes $aa$, $aA$, and $AA$. I'll use the fancy word allele: it means one of $a$ and $A$. Reproduction works as follows: randomly choose one allele from each of the two parents, and put them together. Hardy-Weinberg says that in an infinite population, the frequency of each allele stays constant. So, if you start with half $a$s and half $A$s, you'll remain with half $a$s and half $A$s.
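The check takes one line of algebra: if $p$ is the frequency of $A$, then random mating produces genotypes $AA$, $aA$, $aa$ with frequencies $p^2$, $2p(1-p)$, $(1-p)^2$, and the new frequency of $A$ is $p^2 + \tfrac12 \cdot 2p(1-p) = p$. In code (my own toy illustration):

```python
def next_allele_freq(p):
    """Frequency of allele A after one generation of random mating in an
    infinite population: AA individuals carry only A alleles, and half
    the alleles of aA individuals are A."""
    return p * p + 0.5 * (2 * p * (1 - p))
```

The function is the identity on $[0,1]$, which is exactly the Hardy-Weinberg claim.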

The other idea is genetic drift. It describes what happens when the population is finite: allele frequencies fluctuate randomly from one generation to the next. Eventually, these fluctuations lead to the extinction of one of the alleles.
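A Wright-Fisher-style simulation shows this directly (my own sketch, not course material): a finite pool of alleles is resampled with replacement each generation, and the frequency of $a$ wanders until it hits 0 or 1.

```python
import random

def drift_until_fixation(n_alleles=20, p=0.5, seed=1, max_gen=100000):
    """Resample a finite allele pool each generation until one allele
    type goes extinct. Returns (generations elapsed, final count of a)."""
    rng = random.Random(seed)
    count_a = int(p * n_alleles)
    gen = 0
    while 0 < count_a < n_alleles and gen < max_gen:
        freq = count_a / n_alleles  # current frequency of allele a
        # each of the n_alleles offspring alleles is an independent draw
        count_a = sum(rng.random() < freq for _ in range(n_alleles))
        gen += 1
    return gen, count_a
```

With only 20 alleles, fixation typically happens within a few dozen generations, even though the expected frequency never changes.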

Programming Languages. This course uses ML, Racket, and Ruby to illustrate basic principles of programming languages. As usual in this kind of course, you write a few interpreters. The best time to take this course is after you've learned your second language. (I learned BASIC when I was 6; Pascal and C++ came when I was 14. I think I would've benefited most from this course when I was 15. So, if you are 15, then you should try it!)

The Modern World: Global History since 1760. What I remember most vividly about this course is rather strange. It is the soothing voice of the lecturer. Even when I was disagreeing with what he said, I felt compelled to keep listening.

From what I recall, the quizzes were boring: They simply tested a bunch of facts that happened to be mentioned in lectures. The videos, on the other hand, always told a nice story. These stories were clearly addressed to someone living in 2013. For example, the lectures not only said that the West was weak in 1760. They stressed this fact repeatedly, because a world in which the West is weak and insignificant is really difficult to imagine for the typical user of Coursera.

Natural Language Processing. For this course, as for the Algorithms one, I did the problems first. I looked at the slides and watched the lectures only when I needed to. I ended up looking at most slides and almost no lecture videos. Which, I guess, means that the slides are self-sufficient. From what I recall, doing the homeworks involved a substantial effort.

The topics covered include probabilistic grammars, tagging problems (e.g., noun, verb, …), log-linear models, and learning.

Discrete Optimization. Like for the other computer-related courses, I went directly for the homeworks. In addition, I delayed most of them until the last week. That turned out to be a huge mistake. I simply couldn't do the homeworks at the rate of one per day, as I had planned. At some point, I thought I wouldn't even get a passing grade. I did, but I also learned that Coursera sometimes has tough courses. So, what were the homeworks like? I recall: knapsack, graph coloring, Euclidean traveling salesman. There was also some combination of bin packing and Euclidean traveling salesman.

More importantly, I recall some tricks that I learned while doing the homeworks. For example, for knapsack I used two tricks to get a good solution. The standard dynamic programming runs in time proportional to the total weight, which is troublesome when weights are very large. In that case, use a trick from physics: pick your unit of measurement. Strictly speaking, the unit of measurement doesn't matter; but when one works in kilometers, one usually doesn't keep track of millimeters. Thus, by ‘picking the unit’ I actually mean ‘ignore the less important bits’. When ignoring bits, you'll want to round weights up: this way, you never try to fit more than possible in the knapsack. The other trick is to repeat this process on the items that were left out, with better precision.
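Here is a minimal sketch of the rounding trick (my reconstruction, not the actual homework code): drop the low bits of every weight, rounding item weights up and the capacity down, so that any solution of the coarse problem is guaranteed to fit in the original knapsack.

```python
def knapsack(items, capacity):
    """Classic 0/1 knapsack DP over weight: best[w] is the best total
    value achievable with total weight at most w."""
    best = [0] * (capacity + 1)
    for value, weight in items:
        for w in range(capacity, weight - 1, -1):
            best[w] = max(best[w], best[w - weight] + value)
    return best[capacity]

def knapsack_coarse(items, capacity, drop_bits):
    """The 'pick your unit' trick: measure weights in units of
    2**drop_bits, rounding item weights up (ceil) and the capacity
    down, then run the ordinary DP on a much smaller table."""
    unit = 1 << drop_bits
    coarse = [(v, -(-w // unit)) for v, w in items]  # ceil division
    return knapsack(coarse, capacity // unit)
```

The coarse answer is a lower bound on the true optimum; the second trick mentioned above, rerunning with better precision on the leftover items, recovers some of the loss.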

For the traveling salesman, I wrote a complicated solution that got a barely medium score. It did two main things, as I recall. First, it used dynamic programming to do local optimizations. Second, it greedily improved the tour using a set of moves that I noticed empirically. For an example of a move, think of a crossing of two edges. These edges are far apart in the current best tour, so local optimizations won't notice the crossing. But, of course, you can look at all pairs of edges and check for intersections. Once you notice the intersection, it's clear what to do to shorten the tour.
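This uncrossing move is the classic 2-opt move: reversing the tour segment between two edges replaces them with two edges that cannot cross. A greedy 2-opt sketch (my own illustration; the homework solution did more than this):

```python
import math

def tour_length(points, tour):
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def two_opt(points, tour):
    """Greedy 2-opt: repeatedly reverse a tour segment whenever doing so
    shortens the tour. Reversing tour[i+1..j] replaces the edges leaving
    positions i and j by the pair (i, j) and (i+1, j+1)."""
    tour = tour[:]
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n - (i == 0)):
                a, b = points[tour[i]], points[tour[i + 1]]
                c, d = points[tour[j]], points[tour[(j + 1) % n]]
                if (math.dist(a, c) + math.dist(b, d)
                        < math.dist(a, b) + math.dist(c, d) - 1e-12):
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour
```

In the plane, two crossing edges are always longer than their uncrossed replacements, so the distance test above catches crossings even between edges that are far apart in the tour.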

A Brief History of Humankind. This course was like the other history course: trivial (useless?) quizzes, but fascinating lectures. Where the Modern History course would talk about personalities and their decisions, the Humankind History course would talk about the way of life of a typical person. In this sense, the two courses complement each other nicely.

The most vivid image that this course left in my mind is that of ‘shared imaginary worlds’. Why does money have value? Because we believe that others believe that money has value. This kind of circular reasoning has a certain appeal for a computer scientist.

Linear and Integer Programming. This course was too easy for my taste. The instructors do recommend a resource that looks intriguing: the convex optimization book by Stephen Boyd.

Automata. This course had two kinds of homework: multiple-choice quizzes and programming assignments. Unexpectedly, I enjoyed the quizzes more. Each quiz question tries to test whether you understood some concept. You get treated as a computer program: the quiz presents you with some (smallish) input data, and you have to provide the right output. If you repeat the quiz, you get some other input data that you have to process. For one of the first quizzes, I managed to get a question wrong three times in a row. I was trying to do it quickly, and I ended up being very slow. The fourth time I moved slowly and checked my work. I think this phenomenon appears often in real life: people make fun of those who appear slow, but the slow ones often finish first.

Back to the course. Why did I not like the programming assignments? Because I was supposed to modify some Python scripts that were kinda crap.

To answer some questions, I had to look at the slides. These were extremely clear — I never needed to look at the video lectures.

Social and Economic Networks: Models and Analysis. I learned something new from almost every video lecture. The quizzes were rather easy, but only after I learned the vocabulary. It wasn't possible to grasp the definitions from the slides, but the videos were very clear. In other words, if you take this course and you don't already work in social networks, you'll have to watch the videos. Which is a good thing.

Here's an example of the kind of stuff I remember. Social networks are graphs. In various circumstances it makes sense to talk about the utility of a social network. A simple way to define this is as the sum of the utilities of the vertices. For an example, think of the co-author graph. The vertices are authors; the edges represent ongoing collaborations. An author benefits from the time given by their co-authors and, in turn, offers their own time to their co-authors. In the simplest model, the time is divided equally. Thus, the utility of a vertex $x$ is $1+\sum_{y \in N(x)} 1/|N(y)|$. (The $1$ comes from the time spent by $x$ on their own work, and $1/|N(y)|$ appears because $y$ divides its time equally among its neighbors $N(y)$, one of which is $x$.) Once you have such a simple model, you can ask questions like: Which configurations are pairwise stable (meaning that there is no local incentive to create or delete edges)? Which configurations correspond to global optima? For the answers, you'll have to take the course. (Well, you could find out the answers yourself, of course.)
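The model fits in a few lines (a toy sketch with hypothetical names, not course code):

```python
def utilities(adj):
    """adj maps each author to the set of their co-authors. The utility
    of x is 1 (their own work) plus, for each co-author y, the share
    1/|N(y)| of y's time that y gives to each of y's co-authors."""
    return {x: 1 + sum(1 / len(adj[y]) for y in nbrs)
            for x, nbrs in adj.items()}
```

For the path $a - b - c$, author $b$ gets utility $3$ (all of $a$'s and $c$'s collaboration time), while $a$ and $c$ get $1.5$ each.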

Statistical Mechanics: Algorithms and Computations. This course is mostly about Markov chains. You can think of Markov chains as a way to sample complicated probability distributions, or as a way to estimate integrals. Come to think of it, the course is more about sampling and integrating than about Markov chains. For example, when you get to the quantum stuff, you see that computing path integrals is better done using Lévy paths. These come from a direct sampling technique, which works better than the Markov chain approach in this case.
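The contrast between direct sampling and Markov-chain sampling fits in a toy example: estimating $\pi$ by throwing points at a square (my own sketch in the spirit of the course, not its actual code).

```python
import random

def pi_direct(n, seed=0):
    """Direct sampling: throw n independent points uniformly in the
    square [-1, 1]^2, count the fraction inside the unit disk."""
    rng = random.Random(seed)
    hits = sum(rng.uniform(-1, 1) ** 2 + rng.uniform(-1, 1) ** 2 <= 1
               for _ in range(n))
    return 4 * hits / n

def pi_markov(n, step=0.3, seed=0):
    """Markov-chain sampling: a random walk in the square; a proposed
    move that would leave the square is rejected, and the walker stays
    put (but the stay is still counted, to keep the chain unbiased)."""
    rng = random.Random(seed)
    x = y = 0.0
    hits = 0
    for _ in range(n):
        dx, dy = rng.uniform(-step, step), rng.uniform(-step, step)
        if abs(x + dx) <= 1 and abs(y + dy) <= 1:
            x, y = x + dx, y + dy
        hits += x * x + y * y <= 1
    return 4 * hits / n
```

Both estimators converge to $\pi$, but the Markov chain's samples are correlated, so its error shrinks more slowly. That is why direct sampling wins whenever it is available.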

Also, these techniques can be used for optimization problems. One example is the traveling salesman problem, which makes an appearance in one of the reviews above. I was conceited, and expected that the program I had written, with more than a day of work, would do better. After all, I looked at the Python program provided by the instructors and it seemed very similar, except: (1) it did no local optimization using dynamic programming, and (2) its vocabulary of moves was a small subset of the one I had accumulated. And yet, their implementation worked much better. Wat?? I had to take a look. It turns out there was only one difference: they sometimes allowed a move that made the tour worse. (They used simulated annealing, rather than greedy.) In other words, my solution was getting stuck in local optima. I'm not sure why I didn't think of this when I was doing the discrete optimization course. (It seems obvious in hindsight.) A good rule of thumb: whenever you use greedy for an optimization problem, at least consider trying simulated annealing.
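A minimal annealing skeleton for the traveling salesman (my own sketch, assuming segment-reversal moves and a linear cooling schedule; the instructors' code differed in its details):

```python
import math
import random

def anneal_tsp(points, n_steps=20000, t0=1.0, seed=0):
    """Simulated annealing for TSP with segment-reversal moves. Unlike
    greedy, a move that lengthens the tour by delta is still accepted
    with probability exp(-delta / t), so the search can climb out of
    local optima while the temperature t is high."""
    rng = random.Random(seed)
    n = len(points)
    tour = list(range(n))

    def edge(i, j):  # length of the edge between tour positions i and j
        return math.dist(points[tour[i % n]], points[tour[j % n]])

    length = sum(edge(i, i + 1) for i in range(n))
    for step in range(n_steps):
        t = t0 * (1 - step / n_steps) + 1e-9  # linear cooling schedule
        i, j = sorted(rng.sample(range(n), 2))
        if i == 0 and j == n - 1:
            continue  # reversing the whole tour changes nothing
        # Reversing tour[i..j] replaces edges (i-1, i) and (j, j+1)
        # by (i-1, j) and (i, j+1); all other edge lengths are unchanged.
        delta = (edge(i - 1, j) + edge(i, j + 1)
                 - edge(i - 1, i) - edge(j, j + 1))
        if delta < 0 or rng.random() < math.exp(-delta / t):
            tour[i:j + 1] = reversed(tour[i:j + 1])
            length += delta
    return tour, length
```

The only change relative to greedy is the `rng.random() < math.exp(-delta / t)` branch: early on, when `t` is high, bad moves are accepted freely; as `t` drops, the search hardens into greedy descent.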

Starting Soon. Out of these courses, the ones that start soon are the following: