It seemed to me that the resampling algorithm explained in the class could be improved on, and this is what I came up with. It rus about 40 times faster (on my machine on the test data in HM3-6) than the code in the homework and as described in the videos. Results seem to right on par with Sebastion's, though in the very long run, it seems to produce slightly better results.
Here's the code from HW 3-6:
And here's my fast version of the code:
As I was getting ready to post this, I see another person posted basically the same idea in this thread: Resampling wheel: is this really faster and better? His code had some minor problems that I believe I addressed in my code by not dealing with the starting values.
The improved algorithm runs faster for two reasons. First, it never calls random() where as the slow code calls random N times. Second, it makes only one loop around the wheel, instead of many.
Using random() is overkill because in any sort of real application of particle filters, we have plenty of random noise in the w array to stat with. We can leverage the randomness that already exists in the w array to drive our random sampling without having to add more randomness in the form of a PRNG like random(). By not using random() we greatly reduce the CPU load.
Second, as Sebastian codded the particle filter selection wheel, it has a real computational weakness whenever a few particles turn out to be much better guesses than all the others. This is typical of the starting condition, and can reaper whenever the filter "losses" the robot. What happens, is that the one or few particles with the best answer end up dominating a huge slice of the "wheel". If the max w value for example is 1/4 of the entire wheel (maxW / sumW == 1/4), then the slow code will use beta increment of 2.0 * max, or 1/2 of the wheel. This means, that for each resample, the beta value will move (on average), half way around the wheel. So in order to get N samples from the wheel, that inner loop will have to go around the wheel on average N/2 times. Or, for the N=500 from the assignment, that means it must loop around the wheel 250 times. Each loop around the wheel requires that inner loop stepping through all N of the W values. Whenever this happens, the algorithm changes from O(N) into O(N^2). So any time the filter gets "lost" the re-sampling becomes O(N^2) until the filter once again reacquires a good estimation of the state and returns to near O(N).
Someone in that other thread suggested both approaches were O(N). They are not.
I added timing code to the homework assignment code and could see this happen. When it started, the re-sampling code was significantly slower for that first loop (10 to 100 times slower). Due to the type of sensors used in the homework, our robot would not tend to get badly "lost" after it acquired a good guess, so this effect was mostly just seen on the first step. But any real world robot that could get badly confused by unexpected conditions could see this filter algorithm turn O(N^2) and with a large set of particles, that could be a significant draw back.
So to fix this, what I did was to make just one pass around the wheel, and instead of taking random steps, I simply divided the wheel into N fixed slices. These "steps" land on random w slices due to the fact that the location of each slice, is itself a random function of all the w values that comes before it. So by taking fixed non-random steps around the wheel, we are still randomly picking particles. We allow the randomness built into all the w values to create the "random" function that determines which slices we pick for any resampling cycle.
So the code is basically the same as Sebastian's, except we use a far simpler, and faster, "beta += step" for each step. Where "step" is simply "sum(w)/N". This causes the algorithm to make one simple pass around the wheel keeping the code O(N) under all conditions.
The second problem is where the algorithm starts. We can't start at index 0 every time because it would create a bias forcing w to be picked more often than all the other filters. That could be ok of the rest of the system had some known effect to cause the particle at w to be randomized in each cycle. But that would not be expected based on how the rest of the code works. So we need to add some more randomness to control where we start.
Sebastian's code picks a random starting index. That could work. But it's got a minor problem as well. The index it picks will have a high bias of being selected first. But because the index is selected randomly from 1..N, it's violating the goal of selecting biased by weight. What that will do, is cause low value weights to be selected at a higher frequency than they should be. The bias is likely to be so small as to be insignificant, but we can do better.
We need to pick randomly in the "w value" space, instead of the index space. We need to pick a space randomly around the wheel (which is w space not index space). But because once we pick a staring point, we will then select N fixed spaced "spokes" around the wheel, we don't really need to pick a random number from 0 to W. We only need to pick an offset from 0 to W/N for how to "turn" that set of spoke. We do this simply by setting the starting value of beta. We pick a random number from 0 to "step". Using random() we could write it like this:
That would work great. However, just to allow me to get rid of all calls to random() in the resampling code, I used the sum(w) value as a "free" source of random data bits, and wrote it like this instead:
I don't know for sure if that might be introducing a bias to the value of beta. It might be creating a worse bias then the way Sebastian coded it by picking a random index value instead of a random beta value. However, in my testing, it seemed to work well, and it's low performance cost was appealing to me.
This technique however does have significant different selection characteristics than if we truly picked randomly from the full set N times. However, I suspect, this algorithm is actually better than picking randomly (in addition to being much faster and better behaved). That would have to be proven.
It does behave correctly in the fact that probability of a particle being picked by this algorithm does match the required expected value for each particle. But what's significantly different, is the variance of the selection. By this, I mean if you pick each particle randomly from the full set, then the expected value of the number of times a particle will be picked is P*N where P is the normalized w value (the probability a given particle is selected each time), times N, the number of times we select particles randomly. So if we have an N=500, and a w/w_sum normalized value of .01, we expect on average that particle to be picked 5 times. However, when picked randomly, we might select it 20 times in once pass, and 0 in another, etc. The variance on each side of the expected value will be fairly large (I don't know the formula to calculate it, but there must be one).
My fast algorithm, has as fixed variance of 0 to +1. for each resampling pass. So if the expected probability of a particle being picked is 2.5, then the particle will be picked, 2 or 3 times, and nothing else. Ir will never be picked only once, and it will never be picked 4 times, etc.
So this means the fast sampling algorithm creates a very small random variance - much smaller than would be seen with true random sampling and much smaller than seen with the slow version of the code.
But is this bad? That's a complex question. I suspect it's not. I really don't see the advantage of selecting a particle 5 times, when it's "weight" suggests it should be sampled 30 times. But at least, if you selected a heavy weighted particle far fewer times, it means you could (in theory) select a lot of low weighted particles to allow the algorithm to explore a wider range of alternate "theories" about the location of the robot. However, with true random resampling, if one large w particle is selected only 10 when it should be 30, we can also expect another large value particle to be over sampled and selected 50 times, instead of 30. If our weight calculation gave to particles a large value of w, indicating each should be sampled 30 times, I see no advantage to sampling one 10 times, and the other 50 times. These samples end up forming a "cloud" of possible particles around those two starting particles, and there's just no good reason in my mind to allow one cloud to be of population size 10 and the other size 50, when both particles were evaluated as being equally good guesses. So I'm left suspecting that the limitations of this fast sampling code is actually a better than picking randomly.
The important fact of the code, is that it maintains the condition that all particles can be picked, no matter how small their w value is, and it's expected value for picking particles, is correct.
I tested this fast algorithm on both the test in the HW3-6. As far as I can see, it produces the same sort of results as the slow algorithm, except it runs about 40 times faster on average. However, that number of 40 could be a bit off because the speed for the fast function for an N=500 was near the clock tick speed of the time() function on my machine (.016 ms). In testing, I saw numbers from 20x faster to over 100x faster. It's at least an order of magnitude faster, and more important for real time application, it's perforce is highly constant, and does not transform form O(n) to O(n^2) based on how "lost" the filter becomes.
On test case 2), I ran both both algorithms 10,000 total times, using the same generate_ground_truth() values for both algorithms. The final results were:
The "rate" is the percentage of True vs False returns from the check_output() function. We see the the fast code actually performed a little bit better over these trials. In that run, I actually restarted it many times until I got a starting condition here the new code was performing worse than the old code. But as we see, after 10,000 trials, the new code still edged out the old code.
"time" is the total accumulated execution time of the two routines in seconds. This was just for the resampling code above, not all the other code executed for each trial. This run took many hours on my (old slow) PC. But the "fast" code only took up a bit over a 1 minute total of that time, where as the slow version of the code took about 55 minutes total.
rms is a Root Means Square error value for each answer produced by the two filters averaged over all 10,000 samples. The higher the RMS value, the more off the filter was from the real robot location. So again, we see the fast algorithm edged out the slow algorithm with producing slightly better answers.
An important fact about using this algorithm however is that it assumes there will be lots of randomness in the w values. It's the randomness in the w values themselves that this algorithm is exploiting to do its random sampling. It requires that there be randomness in the w values. For real world sensory data coming from real robot sensors, that should never be a problem. The real world is full of great randomness which will manifest itself in the w values. But for toy computer simulations, this could easily become highly biased by poor w values. However, to fix that, the main weakness is in the setting of the initial value of beta. Just by reverting to a single call to random() to initialize that I think would likely fix any bad selection bias created by poor w values (unless the w values were really really bad).
This question is marked "community wiki".
I am not sure avoiding random would not break the mathematical underpinnings of the particle filter. Does it?
With w initialized like this:
the even particles should be 4 times as likely to make it into the output as the odd particles. The proposed algorithm ends up simply copying all 1000 particles to the output.
One might argue that this is an extreme example, and I agree that it is fairly unlikely such a pattern would appear in w in practice, although if one started with fairly evenly distributed particles in an environment where the robot is localizing by sensing ground markings and those markings are in a pattern, it is not impossible to get this kind of regular pattern in w.
Also, in environments where there is not a lot of range in the values the robot can sense (e.g., where it is sensing something that is binary like whether it is on a light square or dark square, as opposed to something continuous such as distance to reference points), as the particles start to converge on the robot, then if the robot is near a boundary where the sense data changes, you might easily get half the particles with one value and half with the other, and that could lead to sections of w exhibiting the kind of pattern given above.
Hi @Curt Welch,
Firstly, a very interesting formulation (also to the others with similar ideas)!
Something in your proposition seems 'off' to me, and after spending a few hours trying to formulate a valid argument, the best I've been able to do is an intuitive argument.
The thing that seems intuitively 'off' to me is that beta is always increased by w_sum/N - this places an undue importance on the ordering of the particles. Two particles, ordered next to each other, with values of w that are less than half w_sum/N, can NEVER both be chosen, where as if by some quirk of implementation they appeared in another order this would be possible. The initial solution allows this to happen in some cases by incrementing beta by a random value.
I like the manner you 'randomly' choose an initial value of beta by using the mantissa of w_sum, maybe something similar using the mantissa of the value of w corresponding to the last chosen particle?
As I said, very interesting possibilities, but will probably need to be studied in relation to the behaviour of the evaluation functions in different scenarios.
I came up with this idea in another thread on particle filters, and though I'd just re-share here...
Particle filters suffer from not representing the long tails of their distributions. They don't keep a low level of particles spread around the state space because their w values fall to such lower levels that they are quickly removed and reallocated. The correct population density of the filter is so low, that most the space ends up with no particles at all once the filter "locks on" to a strong set of good answers.
This means that if the value the filter is attempting to track gets away from it (bad sensor readings, a sudden unexpected external force moving the robot to a new location, etc), the filter could have a lot of difficulty re-acquiring the lost "robot" because it's particles are not spread all over the state space, and they may not spread out quick enough due to move() noise to reacquire the value for an application.
To solve this for any application that suffered unpredictable tracking loss like that on a regular basis, we could scatter a small number of the particles randomly about the state space to create a "fake" long tail distribution. And a simple way to do that I thought, is to only resample a subset of the particles (say 90%), and then fill out the missing 10% by generating new random particles. This would just cause a random sampling of the lowest valued particles, to get thrown out across the state space each pass.
How we would do that, would be tuned to the needs of the application. If the value we tracked with the particle filter could randomly jump to anywhere in the space at times, we would need to scatter particles evenly over the entire space. But if the random jumps tended to be smaller, we could scatter the random particles over a more limited area nearer to the current location. Of course, if the jumps are small enough, then the randomness in the move function would deal with that. But if the jumps were randomly cussed by outside forces, and were too large for the normal spread of particles per the move function, then this technique could come be a way to make the filter reacquire the location much faster.
Hi @Curt Welch,
I went a different way in resampling also. I'll try to reproduce your runtimes on my system.
I posted my "Dutch Wheel" (my name, until I can find a published name to give it prior credit) in another thread. http://www.udacity-forums.com/cs373/questions/25286/fast-resampling-algorithm-dutch-wheel I'd be curious if it runs well in other environments.
I need to revisit my Resample Wheel implementation to make sure I didn't code it poorly.