## Notation, spanning sets, and gradients

I tried to post this last night, but for some reason the blog was unreachable. Perhaps I need to switch hosting providers. Anyway, on to the good stuff.

One of the reasons I struggle with reading math books is their use of notation. I should say that a different way. Based on my training, I'm expecting a certain notation, and when something different crops up, it takes me a little while to adjust. This is somewhat ironic, since notational flexibility is something that I harp on my students about, but there it is.

Case in point, there may be some technical reason that the authors of [1] use constructions like $w^\top v_i$ rather than a standard dot product notation, but I'm not sure what it might be. After a little thought, it's clear that's what is going on here, but my physics brain would be happier with bold vectors and a dot, or failing that, with the Dirac notation $$.

Perhaps the key to this mystery is the parenthetical at the bottom of page 15:

(Many times it will be convenient for us in this book to regard a set of vectors as a matrix whose columns are the vectors in the set.)

If you're thinking in terms of matrices, then taking the transpose is perhaps more natural than the (to me) more traditional inner product. But that's only a guess.

Enough about that. The real meat here is the idea of a positive spanning set and a positive basis. At first blush positive bases seem more cumbersome than the more pedestrian variety of basis. First, there's the question of how many vectors to put in the positive basis. It can have anywhere between $n+1$ (a minimal positive basis) and $2 n$ (a maximal positive basis) for an $n$-dimensional space. Second, orthogonality (which is one of my favorite things, as my upper level physics students will tell you) is irrelevant. The useful, and therefore desirable property which takes its place is that the angles between the basis vectors be uniform. The utility of this property is not as immediately obvious as with orthogonality, but I'm beginning to warm to it.

And then there's the cosine measure. It was a little difficult for me to wrap my brain around this one. This applies to a spanning set (more general than a basis in that you don't need linear independence). It's defined as

So, unpacking this, you're given $D$, which is a set of vectors. You want to pick an arbitrary vector $v$ that minimizes the maximum of the projection along one of the vectors in the spanning set. If I'm reasoning through this correctly, this amounts to picking a vector v which bisects the largest angle between vectors in the spanning set (which is at least one reason that having equal angles is useful); the cosine measure is then given by the direction cosine (or, in other words, the normalized inner product) of that vector $v$ and the vector from the spanning set closest to parallel with it (which is either of the two vectors making the angle it bisects). Whew.

Why use this measure? I'm really not sure yet. It does give an unambiguous measure for any spanning set, which might be useful. I'll have to wait and see whether this becomes more clear as I work through the book.

Another thing I find maddening about math books (and this is partly a language issue, and partly just a "feature" of highly technical writing) is that any time there is a theorem, I have to stop and puzzle through what it means. Take, for example, Theorem 2.3:

Let $[v_i \ldots v_r ]$, with $v_i \neq 0$ for all $i \in {1,\ldots ,r}$, span $\mathbb{R}^n$. Then the following are equivalent:

(i) $[v_i \ldots v_r ]$ spans $\mathbb{R}^n$ positively.

(ii) for every $i = 1, \ldots, r$, the vector $-v_i$is in the convex cone positively spanned by the remaining $r-1$ vectors.

(iii) There exist real scalars $\alpha_1, \ldots \alpha_r$with $\alpha_i \gt 0$, $i \in {1,\ldots,r}$, such that $\sum_{i=1}^r \alpha_i v_i = 0$.

(iv) For every nonzero vector $w \in \mathbb{R}^n$, there exists an index $i$ in ${1,\ldots,r}$ for which $w^\top v_i \gt 0$.

It took some head scratching for me to unpack these. (i) and (ii) seem fairly straightforward. (iii) is simply saying that you can add scalar multiples of at least some of the vectors in the set together to get the zero vector, or, thinking geometrically, it's possible to construct a closed path which includes the origin. (iv) is just saying that you're guaranteed to have a positive projection along at least one of the vectors in the set. Now, I'm fond of precision, and I can understand why the authors used the language they did. I can even understand why they didn't include the dumbed-down (and less precise) interpretations I've just given. But I constantly need to remind myself that I can't understand what I'm reading if I don't go through the exercise of converting the precise mathematical language I find in the text into the more intuitive language that makes sense to my visceral reasoning.

[1] A. R. Conn, K. Scheinberg, and L. N. Vicente, Introduction to Derivative Free Optimization
[Bibtex]
@book{DFO:2008,
author = {Andrew R. Conn and Katya Scheinberg and Luis N. Vicente},
title = {Introduction to Derivative Free Optimization},
publisher = {Society for Industrial and Applied Mathematics}
year = {2009}
}

Comments are disabled for this post