Fast Resampling Algorithm


It seemed to me that the resampling algorithm explained in the class could be improved on, and this is what I came up with. It runs about 40 times faster (on my machine, on the test data in HW 3-6) than the code in the homework and as described in the videos. Results seem to be right on par with Sebastian's, though in the very long run it seems to produce slightly better results.

Here's the code from HW 3-6:

    # HW 3-6 code:
    p3 = []
    index = int(random.random() * N)
    beta = 0.0
    mw = max(w)
    for i in range(N):
        beta += random.random() * 2.0 * mw
        while beta > w[index]:
            beta -= w[index]
            index = (index + 1) % N
        p3.append(p[index])
    p = p3

And here's my fast version of the code:

    # Curt Welch Fast Resample Code
    p3 = [0]*N  # minor speed up by pre-allocating list
    index = 0
    w_sum = sum(w)
    step = w_sum / N
    beta = (w_sum * w_sum) % step  # pseudo-random starting offset; see the discussion below for better choices
    for i in range(N):
        while beta > w[index]:
            beta -= w[index]
            index = (index + 1) % N
        beta += step
        p3[i] = p[index]
    p = p3

As I was getting ready to post this, I see another person posted basically the same idea in this thread: Resampling wheel: is this really faster and better? His code had some minor problems with the starting values that I believe my code addresses.

The improved algorithm runs faster for two reasons. First, it never calls random(), whereas the slow code calls random() N times. Second, it makes only one loop around the wheel, instead of many.

Using random() is overkill because in any sort of real application of particle filters, we have plenty of random noise in the w[] array to start with. We can leverage the randomness that already exists in the w[] array to drive our random sampling without having to add more randomness in the form of a PRNG like random(). By not using random() we greatly reduce the CPU load.

Second, as Sebastian coded the particle filter selection wheel, it has a real computational weakness whenever a few particles turn out to be much better guesses than all the others. This is typical of the starting condition, and can reappear whenever the filter "loses" the robot. What happens is that the one or few particles with the best answer end up dominating a huge slice of the "wheel". If, for example, the max w value is 1/4 of the entire wheel (maxW / sumW == 1/4), then the slow code draws each beta increment uniformly from 0 to 2.0 * max, i.e. up to 1/2 of the wheel, and 1/4 of the wheel on average. So for each resample the beta value moves, on average, a quarter of the way around the wheel, and to get N samples from the wheel the inner loop has to travel around the wheel roughly N/4 times. For the N=500 from the assignment, that means on the order of 125 trips around the wheel, and each trip requires the inner loop to step through all N of the w values. Whenever this happens, the algorithm changes from O(N) into O(N^2). So any time the filter gets "lost", the resampling becomes O(N^2) until the filter once again reacquires a good estimate of the state and returns to near O(N).
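Here's a minimal sketch of what I mean (my own instrumentation on made-up weights, not the homework code): it just counts how many times the inner while loop runs for an even set of weights versus a set where one particle owns about a quarter of the wheel.

    # Count inner-loop iterations of the resampling wheel (sketch, made-up weights).
    import random

    def wheel_inner_steps(w):
        N = len(w)
        index = int(random.random() * N)
        beta = 0.0
        mw = max(w)
        steps = 0
        for _ in range(N):
            beta += random.random() * 2.0 * mw
            while beta > w[index]:
                beta -= w[index]
                index = (index + 1) % N
                steps += 1
        return steps

    N = 500
    flat = [1.0 / N] * N              # healthy filter: weights roughly even
    lost = [1e-6] * N                 # "lost" filter: one particle dominates
    lost[0] = sum(lost[1:]) / 3.0     # make w[0] about 1/4 of the wheel

    print(wheel_inner_steps(flat))    # on the order of N
    print(wheel_inner_steps(lost))    # on the order of N*N/4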

Someone in that other thread suggested both approaches were O(N). They are not.

I added timing code to the homework assignment code and could see this happen. When it started, the resampling code was significantly slower for that first pass (10 to 100 times slower). Due to the type of sensors used in the homework, our robot would not tend to get badly "lost" after it acquired a good guess, so this effect was mostly just seen on the first step. But any real-world robot that could get badly confused by unexpected conditions could see this filter algorithm turn O(N^2), and with a large set of particles that could be a significant drawback.

So to fix this, what I did was to make just one pass around the wheel, and instead of taking random steps, I simply divided the wheel into N fixed slices. These "steps" land on random w slices because the location of each slice is itself a random function of all the w values that come before it. So by taking fixed, non-random steps around the wheel, we are still randomly picking particles. We let the randomness built into all the w values act as the "random" function that determines which slices we pick on any resampling cycle.

So the code is basically the same as Sebastian's, except it uses a far simpler, and faster, "beta += step" for each step, where "step" is simply sum(w)/N. This makes the algorithm take one simple pass around the wheel, keeping the code O(N) under all conditions.

The second problem is where the algorithm starts. We can't start at index 0 every time, because that would create a bias forcing w[0] to be picked more often than all the other particles. That could be OK if the rest of the system had some known effect that randomized which particle ends up at w[0] in each cycle, but that would not be expected based on how the rest of the code works. So we need to add some more randomness to control where we start.

Sebastian's code picks a random starting index. That could work, but it's got a minor problem as well: the index it picks has a higher chance of being selected first. Because the index is chosen uniformly from 0..N-1, that violates the goal of selecting in proportion to weight. What that will do is cause low-weight particles to be selected at a slightly higher frequency than they should be. The bias is likely so small as to be insignificant, but we can do better.

We need to pick randomly in the "w value" space, instead of the index space; that is, pick a spot randomly around the wheel (which is w space, not index space). But because once we pick a starting point we will then select N fixed, equally spaced "spokes" around the wheel, we don't really need to pick a random number from 0 to w_sum. We only need to pick an offset from 0 to w_sum/N to "turn" that set of spokes. We do this simply by setting the starting value of beta: we pick a random number from 0 to "step". Using random() we could write it like this:

    beta = random.random() * step

That would work great. However, just to allow me to get rid of all calls to random() in the resampling code, I used the sum(w) value as a "free" source of random data bits, and wrote it like this instead:

    beta = (w_sum * w_sum) % step

I don't know for sure whether that might be introducing a bias into the value of beta. It might be creating a worse bias than the way Sebastian coded it, by picking a random index value instead of a random beta value. However, in my testing it seemed to work well, and its low performance cost was appealing to me.

This technique does, however, have significantly different selection characteristics than if we truly picked randomly from the full set N times. However, I suspect this algorithm is actually better than picking randomly (in addition to being much faster and better behaved). That would have to be proven.

It does behave correctly in that the probability of a particle being picked by this algorithm does match the required expected value for each particle. What's significantly different is the variance of the selection. By this I mean: if you pick each particle randomly from the full set, then the expected number of times a particle will be picked is P*N, where P is the normalized w value (the probability a given particle is selected each time) and N is the number of times we select particles randomly. So if we have N=500 and a w/w_sum normalized value of .01, we expect on average that particle to be picked 5 times. However, when picked randomly, we might select it 20 times in one pass, and 0 in another, etc. The variance on each side of the expected value will be fairly large (the count is binomially distributed, so the variance is N*p*(1-p), just under 5 in this example).

My fast algorithm has a fixed spread of less than 1 on either side for each resampling pass: each particle is picked either the floor or the ceiling of its expected count. So if the expected number of picks for a particle is 2.5, then the particle will be picked 2 or 3 times, and nothing else. It will never be picked only once, and it will never be picked 4 times, etc.

So this means the fast sampling algorithm creates a very small random variance - much smaller than would be seen with true random sampling and much smaller than seen with the slow version of the code.
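Here's a rough sketch of that difference (my own check on made-up weights): it counts how often particle 0 gets picked under plain "roll the dice N times" multinomial resampling versus the fixed-step, one-pass scheme described above.

    # Compare pick-count variance: multinomial vs. fixed-step resampling (sketch).
    import bisect
    import random
    from collections import Counter

    N = 500
    w = [random.random() for _ in range(N)]
    w_sum = sum(w)
    expected = w[0] / w_sum * N           # expected picks for particle 0

    def multinomial_picks(w):
        # pick N indices independently, each proportional to its weight
        counts = Counter()
        cum, total = [], 0.0
        for wi in w:
            total += wi
            cum.append(total)
        for _ in range(len(w)):
            counts[bisect.bisect_left(cum, random.random() * total)] += 1
        return counts

    def systematic_picks(w):
        # one pass around the wheel with step = w_sum / N, as in the code above
        counts = Counter()
        n = len(w)
        step = sum(w) / n
        beta = random.random() * step
        index = 0
        for _ in range(n):
            while beta > w[index]:
                beta -= w[index]
                index = (index + 1) % n
            counts[index] += 1
            beta += step
        return counts

    print("expected:   ", expected)
    print("multinomial:", [multinomial_picks(w)[0] for _ in range(5)])  # spreads out
    print("systematic: ", [systematic_picks(w)[0] for _ in range(5)])   # floor or ceil of expected, only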

But is this bad? That's a complex question. I suspect it's not. I really don't see the advantage of selecting a particle 5 times when its "weight" suggests it should be sampled 30 times. At best, if you selected a heavily weighted particle far fewer times, it means you could (in theory) select a lot of low-weighted particles, allowing the algorithm to explore a wider range of alternate "theories" about the location of the robot. However, with true random resampling, if one large-w particle is selected only 10 times when it should be 30, we can also expect another large-w particle to be over-sampled and selected 50 times instead of 30. If our weight calculation gave two particles a large value of w, indicating each should be sampled 30 times, I see no advantage to sampling one 10 times and the other 50 times. These samples end up forming a "cloud" of possible particles around those two starting particles, and there's just no good reason in my mind to allow one cloud to be of population size 10 and the other size 50 when both particles were evaluated as equally good guesses. So I'm left suspecting that the limitations of this fast sampling code actually make it better than picking randomly.

The important facts about the code are that it maintains the condition that all particles can be picked, no matter how small their w value is, and that the expected number of times each particle is picked is correct.

I tested this fast algorithm on the tests in HW 3-6. As far as I can see, it produces the same sort of results as the slow algorithm, except it runs about 40 times faster on average. However, that number of 40 could be a bit off, because the time for the fast function with N=500 was near the clock-tick resolution of the time() function on my machine (about 16 ms). In testing, I saw numbers from 20x faster to over 100x faster. It's at least an order of magnitude faster, and more importantly for real-time applications, its performance is highly consistent, and does not degrade from O(N) to O(N^2) based on how "lost" the filter becomes.
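For anyone who wants to reproduce the comparison, here is a minimal timing sketch (not the exact harness behind the numbers below; it uses timeit on random stand-in weights, which side-steps the clock-tick problem, and uses random.random() * step for the starting offset just to keep the sketch simple).

    # Rough timing harness: one resampling pass of each version (sketch).
    import random
    import timeit

    N = 500
    p = list(range(N))                       # stand-in particles
    w = [random.random() for _ in range(N)]  # stand-in weights

    def slow_resample():
        p3 = []
        index = int(random.random() * N)
        beta = 0.0
        mw = max(w)
        for _ in range(N):
            beta += random.random() * 2.0 * mw
            while beta > w[index]:
                beta -= w[index]
                index = (index + 1) % N
            p3.append(p[index])
        return p3

    def fast_resample():
        p3 = [0] * N
        index = 0
        w_sum = sum(w)
        step = w_sum / N
        beta = random.random() * step        # random starting offset
        for i in range(N):
            while beta > w[index]:
                beta -= w[index]
                index = (index + 1) % N
            beta += step
            p3[i] = p[index]
        return p3

    print(timeit.timeit(slow_resample, number=1000))
    print(timeit.timeit(fast_resample, number=1000))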

On test case 2, I ran both algorithms 10,000 total times, using the same generate_ground_truth() values for both algorithms. The final results were:

    10000 trials   old rate:  80.4% rms= 11.6   time=3271.06
                   new rate:  80.9% rms= 11.3   time=  77.77   42.1x faster

The "rate" is the percentage of True vs False returns from the check_output() function. We see the the fast code actually performed a little bit better over these trials. In that run, I actually restarted it many times until I got a starting condition here the new code was performing worse than the old code. But as we see, after 10,000 trials, the new code still edged out the old code.

"time" is the total accumulated execution time of the two routines in seconds. This was just for the resampling code above, not all the other code executed for each trial. This run took many hours on my (old slow) PC. But the "fast" code only took up a bit over a 1 minute total of that time, where as the slow version of the code took about 55 minutes total.

rms is a Root Mean Square error value for each answer produced by the two filters, averaged over all 10,000 samples. The higher the RMS value, the farther off the filter was from the real robot location. So again, we see the fast algorithm edged out the slow algorithm by producing slightly better answers.

An important fact about using this algorithm, however, is that it assumes there will be lots of randomness in the w[] values. It's the randomness in the w values themselves that this algorithm exploits to do its random sampling. For real-world sensory data coming from real robot sensors, that should never be a problem: the real world is full of randomness which will manifest itself in the w values. But for toy computer simulations, the selection could easily become highly biased by poor w values. The main weakness is in the setting of the initial value of beta; just reverting to a single call to random() to initialize it would, I think, likely fix any bad selection bias created by poor w values (unless the w values were really, really bad).


asked 13 Mar '12, 22:51

Curt Welch

Hi,
Please pardon me if I did not get it right, because I only scanned through your long text quickly :)

Regarding the initial random starting value, beta:
    step = w_sum / N
    beta = (w_sum * w_sum) % step

Isn't w_sum a multiple of step, and hence beta will always be zero?

Liked your nice wheel-spoke type idea for sampling :)

(14 Mar '12, 00:36)

KK


Interesting. To be honest, I'm not completely sure why it's working, now that you point that out. I arrived at the formula by accident but stuck with it because my test code indicated it was working well to create a random offset. Maybe it needs more careful testing...

I had some idea as to why it would work, but now because of your question, I have to wonder if my thinking was all bogus.

I guess the reason it works is related to the rounding errors that happen in computer floating point math. When you multiply two floating point numbers, the exact product has more bits than the result can hold, and the low-order bits are rounded away. So when you divide to try to reverse the multiplication, you typically don't get back the starting number, though the error is often only a low-order bit or two.

I guess what's happening is that because N is mixed in there as well as w_sum, it accumulates a few random rounding errors. But a few random bits from rounding would not create a good random offset. Honestly, though, the test results I was seeing indicated the random distribution between 0 and step was better than you would expect from a few bits of rounding error. So now I have to wonder why. I think more testing is needed.

The idea, however, is that there should be plenty of randomness in the w_sum value to work with to create the starting random offset. But my formula might not have been a good way to leverage that randomness. Or maybe it is, and I just don't understand why yet.

Maybe, because w_sum is already part of step, the answer is to not use w_sum again, and just code it as something like:

    x = <some fixed random number>
    beta = x % step

To be safe, one can always just use: beta = random() * step
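Here's a quick probe of what the expression actually reduces to (my own scratch check, nothing authoritative). Working the algebra in exact arithmetic, (w_sum * w_sum) % (w_sum / N) comes out to frac(N * w_sum) * (w_sum / N), so as far as I can tell the offset mostly comes from the low-order digits of N * w_sum rather than from rounding alone.

    # Probe the (w_sum * w_sum) % step offset (sketch, made-up weights).
    import math
    import random

    N = 500
    w = [random.random() for _ in range(N)]
    w_sum = sum(w)
    step = w_sum / N

    beta = (w_sum * w_sum) % step
    frac = N * w_sum - math.floor(N * w_sum)   # fractional part of N * w_sum

    print(beta / step)   # offset as a fraction of a step
    print(frac)          # nearly the same number, up to rounding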

(14 Mar '12, 09:56)

Curt Welch


After more testing, I did find that my attempt to randomly set the starting value of beta using:

    beta = (w_sum * w_sum) % step

Wasn't as good as I thought. It does work fairly well (which I don't really understand), but it has a strong bias to pick low values of beta.

Oddly enough, using random() like this:

    beta = random.random() * step

which I thought would be safe, doesn't work much better. It too had an odd, strong bias to pick low values of beta. I guess it's just a weakness in the Python random() function?

This works the best of everything I've experimented with so far:

    beta = (frexp(w_sum)[0] * 2.0 - 1.0) * step

That uses the frexp() function to get the mantissa part of the floating point number, which is a value from 0.5 to just under 1.0, then rescales it to the range 0 <= beta < step. So it's using the mantissa of w_sum as the random number source, and that seems to work very well. Even without real-world sensor data feeding it, the large set of particles acts as a large-state PRNG, producing better-quality random numbers than random() does. So by using w_sum as the source of random numbers, we get a better random distribution than if we had used random().
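A minimal illustration of the frexp trick (standard library only, made-up weights): the mantissa of w_sum lies in [0.5, 1.0), so rescaling it gives an offset in [0, step).

    # frexp-based starting offset (sketch).
    from math import frexp
    import random

    N = 500
    w = [random.random() for _ in range(N)]
    w_sum = sum(w)
    step = w_sum / N

    mantissa = frexp(w_sum)[0]              # in [0.5, 1.0)
    beta = (mantissa * 2.0 - 1.0) * step    # rescaled to [0, step)

    assert 0.0 <= beta < step
    print(mantissa, beta, step)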

The particle filter seems to overall work equally well with any of these options. Even though some are better random numbers than the others, they all seem to be good enough to get the same quality of performance from the particle filter!

(14 Mar '12, 13:14)

Curt Welch


Hi, I tried random() in the console and it looks random to me :)
Using the mantissa to generate a random float looks like a good idea too. But w_sum should vary enough each time, or else the mantissa will be close to the previous mantissa. By simply printing the mantissa for numbers 8 to 16, you can see it increases in fixed increments. I wonder if this fixed granularity provides a truly good random number, but I guess it should be good enough for this purpose of resampling.

(14 Mar '12, 22:16)

KK


Yes, this algorithm assumes the w values will be very information-rich. If they are not, then the choice of using a particle filter was a poor idea to start with. And certainly this resampling algorithm can only work well in cases where the w values are information-rich. For all typical real applications of particle filters, the w values will be highly information-rich - as they are in our class assignment.

(15 Mar '12, 01:32)

Curt Welch


I would imagine that using beta += random.random() * 2 * step (where step is what you have computed above) at every step of the loop would get a more realistic variance while keeping the consistency your algorithm gets (and side-stepping unlikely scenarios where w[] has become strangely structured). It will be slightly slower, of course, since it calls the PRNG at every step, but it will never become O(n^2) because the beta increments are so much smaller.

Another possibility would be to randomize the order of w before running your systematic approach, which would add at most O(n) extra work. In practice, you could just swap a constant number of indices before each resampling to prevent copies of the same particle from staying clustered together.
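A sketch of what I mean by that second idea (the helper name is just for illustration): shuffle particles and weights with the same permutation before the one-pass resampling, so copies of the same particle don't stay clustered.

    # Shuffle particles and weights together before resampling (sketch).
    import random

    def shuffle_together(p, w):
        order = list(range(len(p)))
        random.shuffle(order)
        return [p[i] for i in order], [w[i] for i in order]

    # p, w = shuffle_together(p, w)   # then run the fixed-step resampling as usual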

(16 Mar '12, 00:30)

David Rutter


Essentially your algorithm is an implementation of stochastic universal sampling, but I think it should call random once for the initial beta assignment: beta = random.random() * w_sum.

(16 Mar '12, 11:42)

Eugene Baranov


AH, there's a "show more comments" button! I had gotten email notification about the comments but when I came to the page to look for them, I couldn't find them! Now I have!

@quintopia - yes, if you used random()*step in the inner step, that would nicely increase the variance of the picks and the cost would not be all that great. The code would still, on average only take one loop around the "wheel" so the real cost would just be the extra calls to random() but that would slow you down by a factor of about 5 probably. However, if your particle filter input data is truly complex (as it normally will be), there's really no need to do that. The w values will already be better random numbers than a typical language random() function will be. There is no need for a high variance in the selection.

If you are going to call random for each selection, the code posted in the other link below for "An algorithm slightly better than the wheel" is an even better way to go! It's as fast as mine EXCEPT for the overhead of calling random() for every particle, which makes it about 5 times slower than doing it this way without calling random(). His is 8 times faster than the algorithm from the class.

The built-in random() routines in most languages are chosen as a trade-off between speed and quality. Most applications of random() don't need very high-quality random numbers, so implementations pick generators with small internal state to keep them reasonably fast.

Here's a visual example of how poor the random() function in Sun's Java is by Tim Tyler:

http://www.alife.co.uk/nonrandom/

I don't know if Python's is any better.

The Python random() is described as: "Python uses the Mersenne Twister as the core generator. It produces 53-bit precision floats and has a period of 2**19937-1." So it's got roughly 20,000 bits of internal state, about 2.5K bytes. In the class code, the random() function is already used for applying all the measurement and movement noise and for the calculation of the w[] values to start with, so the w values we get have already been generated by the random() function. In addition, the state space of the particle filter gets added to the 2.5K bytes of the random() function, making the combination even larger. Each particle has 3 floating point numbers in its state, at 64 bits (8 bytes) each, so 24 bytes per particle, or about 12K bytes of state across 500 particles, added to the 2.5K state space of the random() function for a potential 14.5K bytes of PRNG state. The 500 particles operating as a particle filter are themselves a better random() function than random() alone. And if this were a real robot, with data coming in from the environment through real sensors instead of from a random() function, the state of the molecules in the real environment would be a source of randomness that far exceeds the state space of any computer PRNG.

There is a great performance advantage to be gained by not calling random() for every selection when we are already given a set of random numbers to pick from to start with. Using random() to try to make it "more random" is a fallacy.

It would increase the variance as you point out, but I don't think there is any need for high variance in particle filters. I think in fact it should work better with the variance profile of this technique than if you created true high variance random sampling.

(16 Mar '12, 12:38)

Curt Welch


Eugene - I looked at your stochastic universal sampling page and yes, it's EXACTLY the same as this algorithm. Thanks for sharing! I just chose to use the particle filter as my random number generator, by using w_sum as the random number in this step of the SUS algorithm:

Start := random number between 0 and F/N

Aka what I coded (after improvements as talked about on this page) as:

    beta = (frexp(w_sum)[0] * 2.0 - 1.0) * (w_sum/N)

Particle filters are an example of a genetic algorithm, so this is not just a similar application but in fact the exact same application as SUS. It's fun how these things keep getting re-invented!

(16 Mar '12, 13:01)

Curt Welch


@Curt Welch: You did not comment on my second approach, which would address the tendency of your algorithm to cluster copies of the same particle (plus noise) together in the list of particles, which will decrease the randomness in the particle set over each iteration of the particle filter algorithm, and that is randomness that you are depending on for the algorithm to work. As you know, the only way a typical GA avoids reaching local minima is by adding a suitable amount of mutation at every iteration. For our purposes, the only source of noise is the sensor and bearing noise. In practice, though, we expect the noise from these things to be low, and how do we measure whether they are high enough to compensate for the structure your algorithm will impose over time?

(16 Mar '12, 13:11)

David Rutter


Ah, sorry, for some reason I missed that second part of your comment!

I don't see how the particle clustering in the array is a problem. The mutation is not done by the sampling algorithm; it's done by the sensor and move functions. Every particle is mutated based on sensor and motion noise, and its location in the array doesn't bias that mutation process.

The selection process's most important function is really more about what it chooses to let die than what it picks. Clustering in the same part of the array doesn't make bad particles less likely to die under this algorithm. In fact, a problem with true random sampling is that it will at times kill (not select) the very best mutations, whereas this algorithm is guaranteed never to allow that to happen. The best mutations will always survive with this algorithm, and likewise, some of the worst will be allowed to live as well. With true random sampling, there will be times where it picks the worst while killing off the best (though those times are very rare).

Instead of thinking about what this approach picks, it might be better understood as a good way of decimating the weakest particles without harming the strongest. It picks particles based on the average weight value (w_sum/N). Any particle with a w value >= the average is guaranteed to be picked at least once, whereas particles below the average have an increasing chance of being killed based on how weak they are. The fact that a large number of very weak particles are clustered together in the same part of the array (because they share a common ancestor), or the fact that a very large number of very good particles are clustered together (again because of a common ancestor), does not change the fact that they are guaranteed to be picked if they are good, and that if they are weak, they are in "trouble" no matter where they are located in the array.

Why do you see the array clustering by ancestor as a problem? There are no crossover-style operations at work in particle filters that would depend on "who is close to whom", since it's a single-inheritance mutation process. Nothing in the particle filter depends on location except this selection algorithm, and the algorithm's basic properties that control what lives and what dies don't change based on how the particles are clustered in the array.

For example, let's say after 10 generations, all the particles left are descendants of only two of the starting particles. This means all of the first particle's descendants will be clustered in the first part of the array, and the second part will all be descendants of the second particle. There will be no intermixing of these "families" by location, due to how the algorithm works. But it doesn't matter. Any particle which has a w value greater than the average, no matter which "family" it is in, is guaranteed to be selected at least once. Particles below the average will be randomly dropped, depending on where on the "wheel" they happen to have ended up.

Though there can be a lot of particles between two "spokes" of the "selection wheel", that will ONLY happen if there are some very "strong" candidates elsewhere, since the spacing of the spokes is set to the average w value of the particles. A weak particle is just as likely to be skipped over whether it's adjacent to lots of other weak particles or sits between two very strong ones.

We could sort the list by w value every time, and then do the selection on it, and it still wouldn't harm (or improve) the basic characteristics of this algorithm as far as I can tell.

(16 Mar '12, 15:33)

Curt Welch


Wow, great work Curt!
You'd be right at home in the Stanford Cryptography course I'm taking online (or the one offered at Udacity later on this spring).

(21 Mar '12, 00:55)

Gibgezr


@Curt Welch: You agree that the mutation rate from sensor and move noise are too low to force a particular particle to be significantly different from its ancestors. So, really, the whole PF algorithm's goal is just to search for the particle (picked from the outset) that is closest to modelling actuality. So here's another potential algorithm to consider:

Repeatedly pick the top N particles (by the obvious linear-time select-and-filter algorithm), but decrease N with every iteration (the rate used by simulated annealing would probably be adequate, but a more adaptive method would look at the variance in importance weightings, recognizing when a few particles have a much higher importance than the rest and rapidly decreasing N in these cases) until N<threshold. Once at this point, you are very confident that you've picked the best particle from your initial set, so you randomly generate a new set of particles in the vicinity of your best particle and repeat.

How do you think this would compare, in terms of selection, with repeated application of your deterministic resampling algorithm. (Possibly with the same provision of throwing in new particles near the current winners when confidence is high.)

I believe your resampling step would be faster, but less reliable, occasionally throwing out the best candidate early on when uncertainty is high.

(21 Mar '12, 01:18)

David Rutter


" So, really, the whole PF algorithm's goal is just to
search for the particle (picked from the outset) that is
closest to modelling actuality."

No, that's really not what is at work here.

Particle filters are a sparse representation of a probability distributions. The whole point is to stop thinking as if there were "one answer" we were trying to find and instead, understand that the entire set of particles IS the answer. This is what probabilistic robotics is all about. Our answer to "where is the robot" is never X,Y, it's always a probability distribution. It's a bell curve.

When we have good information about the robot, our probability distribution will automatically become very narrow. When the data starts to be questionable, the distribution will automatically start to spread out.

There is no such thing as an absolute measurement. All measurements have a certain amount of noise in them, so all measurements are really only a guess as to the value. If a sensor tells us 12.6, that is not an absolute fact. There's some chance that it should have reported 12.4, or maybe even 8.2. So the sensor reading is not just a precise value; we understand it to be a probability distribution centered on the measurement we got. We must learn to think of all measurements and all data not as single numbers, but as probability distributions. All our knowledge should be thought of as various probability distributions.

So our particle filter (all the particles in the set) is itself a complex probability distribution of our current understanding of where the robot is. When we get new measurement data, that measurement data is itself, a probability distribution.

When we combine our particle filter with the new distribution we just got (the sensory data), there's a very precise, mathematically correct way to do this: the product of two probability distributions. The w[] values are the "answer" to what our new calculated probability distribution should be.

But instead of maintaining actual w[] values for each sample, the entire point of the particle filter is to use the population density of particles across the state space to represent the distribution. So we use the re-sampling algorithm to transform the probability distribution described by the w values, into a valid new population density distribution.

This isn't some random algorithm. It's done how it's done for precise mathematical reasons. It's the mathematics of doing information updates when everything is a probability distribution, instead of absolute values.

"So here's another potential algorithm to consider: ...
Repeatedly pick the top N particles ..."

If you change the resampling algorithm like that you BREAK the mathematics of how probability distributions are combined!

There is no point to it. If your measurement data is in fact fairly certain, then it will have a very narrow distribution, and that narrow distribution will be reflected in the new w[] values and will do what you are asking AUTOMATICALLY, without you having to pick numbers like N out of thin air.

For example, we are localizing a robot in a simple environment, and we start the particle filter with a flat distribution, which means they are scattered evenly over the entire area. We then get a measurement that the robot is 1 m from a wall, and the variance of the measurement is 2 cm. It's a highly accurate measurement.

We then use the accuracy of the measurement as part of our calculation of the w values. We do the math correctly to reflect the odds of the robot being, say, 10 m from the wall when a measurement with 2 cm of noise reports 1 m. The w values then, mathematically correctly, report the probability of each particle. Most particles are highly unlikely and will have extremely small w values (10^-6-like values), whereas the particles around the correct location will have values like 10^-2 (10,000 times more likely to be correct).
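As a toy sketch of that weight calculation (my own made-up numbers, treating the 2 cm as the Gaussian noise sigma):

    # Turn a distance measurement into importance weights (sketch).
    from math import exp, sqrt, pi

    def gaussian(mu, sigma, x):
        return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / sqrt(2 * pi * sigma ** 2)

    measured = 1.0      # metres from the wall
    sigma = 0.02        # measurement noise

    # predicted wall distances for a handful of hypothetical particles
    predicted = [1.01, 0.98, 1.30, 3.00, 9.00]
    w = [gaussian(d, sigma, measured) for d in predicted]

    for d, wi in zip(predicted, w):
        print(d, wi)    # particles near 1.0 m dominate; far-away ones get ~0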

The resampling algorithm MUST then convert the w values into a new population density model. It must select the likely particles at a rate 10,000 times higher than the others.

It does not just select the "N most likely", which would be a very crude solution. It very precisely selects them based on the exactly correct mathematics of probabilities.

Due to the accuracy of the sensing, it might mostly select the top 10 filters. Or if the measurement was a bit less likely, it would mostly select the top 20, with fading numbers of particles selected the future they are away from the likely location.

The point is, it doesn't just use some N value we think might be right. It uses a value which is mathematically correct for the accuracy of the sensory data we used, and for our prior probability represented by the particle population density.

"but a more adaptive method would look at the variance
in importance weightings, recognizing when a few particles
have a much higher importance than the rest and rapidly
decreasing N in these cases"

That's exactly what particle filters are ALREADY DOING! They do that all automatically, and all according to exact mathematically correct procedures for combining probability distributions.

You are suggesting taking a system proven mathematically correct and dumbing it down with ad-hoc solutions that will only make the approach worse!

Now, particle filters, as defined, are already "correct" and automatically do all the adjusting you were getting at. But they depend on being fed accurate information, such as the variance of the measurement data. Those values are often just sort of guessed at and set by hand. If you don't set them correctly, the particle filter won't produce as good an answer.

Also, in general, if we want to report a single x,y location of the robot (our current best guess) from the particle filter, we DO NOT look at the particle that seems the most likely and use its x,y value. The location of the robot is defined by all the particles together. We just average the x, y, theta values across all the particles, and that average is the particle filter's current best "guess" as to the location. And if we want to know the variance of the guess, we compute that across all the particles as well (the average squared error from the mean).

So at any time, the particle filter can tell us where it thinks the robot is located, and it can tell us the accuracy of that guess with real, correctly computed probabilities.
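Roughly what I mean, as a sketch (the (x, y, theta) particle layout is assumed, and I average theta on the circle here to sidestep the wrap-around problem):

    # Report a single best guess plus a spread from the particle set (sketch).
    from math import atan2, sin, cos

    def estimate(particles):
        n = float(len(particles))
        mx = sum(p[0] for p in particles) / n
        my = sum(p[1] for p in particles) / n
        # circular mean for the heading
        mtheta = atan2(sum(sin(p[2]) for p in particles) / n,
                       sum(cos(p[2]) for p in particles) / n)
        var_x = sum((p[0] - mx) ** 2 for p in particles) / n
        var_y = sum((p[1] - my) ** 2 for p in particles) / n
        return (mx, my, mtheta), (var_x, var_y)

    print(estimate([(1.0, 2.0, 0.1), (1.2, 1.9, 0.2), (0.9, 2.1, 6.2)]))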

The point of the particle filter is not to "find the one particle that is the correct location of the robot". The purpose of the particle filter is to create an accurate representation of the location of the robot as a probability distribution!

(21 Mar '12, 17:06)

Curt Welch


7 Answers:

I am not sure avoiding random would not break the mathematical underpinnings of the particle filter. Does it?


answered 21 Mar '12, 02:57

Sebastian Thrun

It doesn't seem to, because he is just replacing random with a value self-generated by the list w[]. His assumption is that if we are using a particle filter, we have lots of noise in our w[] list...and he figures if it is so noisy that we need a particle filter, it's noisy enough to use to generate 'random' values for the particle filter. It's an elegant approach, and as long as the measurement noise over all the w[] is not highly correlated, it looks like it works.

The re-sampling algorithm, other than replacing random, chooses numbers with a better correlation to the weighted probability, which shouldn't have an adverse impact.

(21 Mar '12, 11:44)

Gibgezr


As long as the rest of the system is full of high-information-content noise which shows up in the w[] values, I see no need to add even more noise in the resampling step. The most important part of the mathematical underpinnings is that the population density of the particles across the state space must be adjusted according to the w values, and this algorithm certainly does that correctly.

Because we can't sample a particle 3.6 times when we translate the w values into a population density, we are forced to round our sampling down to 3 or up to 4, and which way we round for each particle needs to be random to keep the error of that forced rounding distributed evenly across the filter. This algorithm does do that randomly; it just happens to use the w[] values themselves as the random numbers instead of calling a PRNG like random() to get them.

True random sampling doesn't just round the number down to 3 or up to 4; it adds far more randomness into the process, and I honestly don't see the value in doing that. That looks to me like adding noise into the process for no good reason at all.

In addition, if we want to add more randomness and noise to the particle filter as a whole, we have other, better options. We can just increase the variance of the noise we are already adding in the move() step, for example. I'm not seeing the advantage of adding more true random noise into the resampling step, since it does not help the filter search a more dispersed area of the state space; it only seems to add more noise to the shape of the filter's distribution curve, and I don't understand why adding more noise in that dimension would help the filter perform better.

I'm fairly new to all this (and I learned everything I know from Sebastian's lectures) so there could certainly be something big I'm failing to understand at work here.

(21 Mar '12, 21:53)

Curt Welch


Your proposed algorithm is the "systematic" version of resampling, as shown here. Not only is its complexity better, its statistical performance is also suggested to be superior.


answered 13 Mar '12, 23:57

David Zhang

Confirmation, I guess, that great minds think alike.

(13 Mar '12, 23:59)

melr


Excellent! Nothing like reinventing the wheel eh! :)

(14 Mar '12, 00:01)

Curt Welch


Here's another forum thread with another interesting implementation (not as fast as the one I talked about there, but still 8x faster than the one in the homework):

An algorithm slightly better than the wheel


answered 14 Mar '12, 00:06

Curt Welch

With w[] initialized like this:

N=1000
w=[1.0/N for i in range(N)]
for i in range(N/2):
    w[2*i] = w[2*i] * 2
    w[2*i+1] = w[2*i+1] / 2

the even particles should be 4 times as likely to make it into the output as the odd particles. The proposed algorithm ends up simply copying all 1000 particles to the output.

One might argue that this is an extreme example, and I agree that it is fairly unlikely such a pattern would appear in w[] in practice, although if one started with fairly evenly distributed particles in an environment where the robot is localizing by sensing ground markings and those markings are in a pattern, it is not impossible to get this kind of regular pattern in w[].

Also, in environments where there is not a lot of range in the values the robot can sense (e.g., where it is sensing something that is binary like whether it is on a light square or dark square, as opposed to something continuous such as distance to reference points), as the particles start to converge on the robot, then if the robot is near a boundary where the sense data changes, you might easily get half the particles with one value and half with the other, and that could lead to sections of w[] exhibiting the kind of pattern given above.


answered 15 Mar '12, 00:47

Tim Smith

edited 15 Mar '12, 00:56

That's right! But I think you have missed an important concept of how this algorithm is used. It assumes the w[] values are a rich source of information and will not be highly structured as in your example. It uses the w[] values as its random information source; that's how it gets away with not using a random() function. If you hand-pick a very non-random set of w[] values as you have done in this example, the algorithm will respond with a very poor selection of particles, just as you would expect if you used a very bad random() function to pick them. This algorithm can only work for particle filter applications where the w[] values are a very information-rich data source. Your example has what, maybe 2 bits of information total in the entire set of w[] values? That's not information-rich. High-information-content w[] values are the normal type of application where particle filters are used. Particle filters don't get used for applications where the kind of example you have given above would be a "typical" set of values.

If someone wanted to use a particle filter for an application where all the w values would have only one or two possible values (like your example), then this resampling algorithm would be a very poor choice. You would have to inject the randomness in the resampling code itself by using a random() function. But then again, choosing to use particle filters for that application would, I believe, be a poor choice as well. The whole point of particle filters is dealing with large state spaces in high-noise environments.

For real-world robotics applications, the sensory data coming from the environment contains HUGE amounts of random data that floods the state values of all the particles and, in turn, fills the w[] values as each particle's "fit" to the sensory data is calculated.

But even in a simulated environment like the code for the class, the complexity of the simulation, combined with all the calls to random() in the simulation fills the particle state space with plenty of complexity so that the W values have all the randomness they need to make the resampling function work just fine with this algorithm.

But anyone considering using this algorithm must make sure that their application does have plenty of complexity to it, and that w[] values like those in your example would never be the "norm" for their application. It's harmless if they show up at special times, like at the start of the particle filter, as long as such highly structured values are only a rare exception.

(15 Mar '12, 01:24)

Curt Welch


Hi @Curt Welch,

Firstly, a very interesting formulation (also to the others with similar ideas)!

Something in your proposition seems 'off' to me, and after spending a few hours trying to formulate a valid argument, the best I've been able to do is an intuitive argument.

The thing that seems intuitively 'off' to me is that beta is always increased by w_sum/N - this places an undue importance on the ordering of the particles. Two particles, ordered next to each other, with values of w[] that are each less than half of w_sum/N, can NEVER both be chosen, whereas if by some quirk of implementation they appeared in another order this would be possible. The initial solution allows this to happen in some cases by incrementing beta by a random value.

I like the manner you 'randomly' choose an initial value of beta by using the mantissa of w_sum, maybe something similar using the mantissa of the value of w[] corresponding to the last chosen particle?

As I said, very interesting possibilities, but will probably need to be studied in relation to the behaviour of the evaluation functions in different scenarios.


answered 21 Mar '12, 19:22

yasho

"Two particles, ordered next to each other, with
values of w[] that are less than half w_sum/N,
can NEVER both be chosen"

Yeah, that's the interesting ramification. So far, however, I can't spot any reason that dependency is bad. It certainly does create a clear non-random selection bias for a given pass. The other thing at work with the algorithm, which ties into that, is that when particles are selected multiple times, they end up next to each other in the array. So particles that are next to each other are likely to have a common recent ancestor, which also implies they are likely to be close together in the state space. But again, I've not yet thought of an example where any of this is bad. If 10 particles in a row all have w values less than half the average, that means somewhere else in the list are 10 particles with twice the average, and these 10 low-value particles need to be thinned because they are "bad" guesses. So what's the harm that they are thinned following the Roman decimation policy of counting 1-2-3, 1-2-3 down the line and killing every one numbered 1?

The only way I can think of that any of this would actually be a problem is if the state space itself, or the state transition (move()) function, were highly structured instead of chaotic. If particles became perfectly spaced out in a 3x10 grid of the state space, and the resampling, picking 1 out of 3, then removed all the particles from the right column because of its patterned sampling, that would create problems.

I think the largest, and most likely, problem to run into would be if the algorithm for calculating w values produced highly structured, low-information-content w values, like assigning .5 or 1 to every particle and no other value. This approach would not work as well for such low-information-content w values. It assumes high-information-content w values that cause a fairly random scattering of the particles' positions around the wheel.

(21 Mar '12, 20:52)

Curt Welch


Hi,

After a night spent thinking about this, I have still not been able to come up with a non-'hand-waving' explanation of why it is important to keep this possibility as part of the resampling process, except that the position of a particle in a data structure should not affect the result - but then we are talking about probabilistic algorithms, where lots of things are not straightforward at first sight.

I actually have another question/comment (though maybe @SebastianThrun can give a more informed opinion):
The primary benefit of this resampling is that it takes less time by avoiding the random function; however, it adds restrictions/expectations on the nature of w[]. I realize that any method of reducing the computational load is a good thing, but for particle filter implementations (used in SLAM for example), how critical is this?

Yasho

(22 Mar '12, 06:37)

yasho


I came up with this idea in another thread on particle filters, and thought I'd just re-share it here...

Particle filters suffer from not representing the long tails of their distributions. They don't keep a low level of particles spread around the state space because their w values fall to such lower levels that they are quickly removed and reallocated. The correct population density of the filter is so low, that most the space ends up with no particles at all once the filter "locks on" to a strong set of good answers.

This means that if the value the filter is attempting to track gets away from it (bad sensor readings, a sudden unexpected external force moving the robot to a new location, etc.), the filter could have a lot of difficulty re-acquiring the lost "robot", because its particles are not spread all over the state space, and they may not spread out quickly enough from move() noise alone to reacquire the value for an application.

To solve this for any application that suffered unpredictable tracking loss like that on a regular basis, we could scatter a small number of the particles randomly about the state space to create a "fake" long tail distribution. And a simple way to do that I thought, is to only resample a subset of the particles (say 90%), and then fill out the missing 10% by generating new random particles. This would just cause a random sampling of the lowest valued particles, to get thrown out across the state space each pass.

How we would do that would be tuned to the needs of the application. If the value we track with the particle filter can randomly jump to anywhere in the space at times, we would need to scatter particles evenly over the entire space. But if the random jumps tend to be smaller, we could scatter the random particles over a more limited area nearer the current location. Of course, if the jumps are small enough, then the randomness in the move function would deal with them. But if the jumps were randomly caused by outside forces, and were too large for the normal spread of particles from the move function, then this technique could be a way to make the filter reacquire the location much faster.
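Here's a rough sketch of the 90/10 idea (the make_random_particle callback is a placeholder for whatever the application needs, not class code):

    # Resample most slots, scatter the rest with fresh random particles (sketch).
    import random

    KEEP_FRACTION = 0.9   # resample 90%, scatter the remaining 10%

    def resample_with_scatter(p, w, make_random_particle):
        N = len(p)
        keep = int(N * KEEP_FRACTION)
        w_sum = sum(w)
        step = w_sum / keep
        beta = random.random() * step
        index = 0
        new_p = []
        for _ in range(keep):                  # one pass, as in the fast resampler
            while beta > w[index]:
                beta -= w[index]
                index = (index + 1) % N
            beta += step
            new_p.append(p[index])
        for _ in range(N - keep):              # fake "long tail": fresh particles
            new_p.append(make_random_particle())
        return new_p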


answered 21 Mar '12, 22:17

Curt Welch

I have in fact encountered a number of implementations (including a common implementation of SLAM using particle filters by Grisetti) that do something very similar: on recognizing a situation where none of the active particles has a value of w[] above a set threshold, they randomly pick new particles spread across the environment.

(22 Mar '12, 06:25)

yasho


I guess it's an obvious solution! Using low values of w to trigger it could be tricky, depending on the application and how w is calculated. In a robotics system where you have multiple sensors feeding you data, having one sensor go bad (just losing GPS input under an overpass, for example) could cause the average w values to drop - not because you had lost the location, but just because your certainty had dropped (the variance increased). So one would have to be careful how they used that. I was thinking that a constant n% approach was attractive just because it avoids having to define what level of w should trigger it. You could of course combine both: use a very low percentage by default (1%) and then increase it if the w values got low, to speed up the search.

It seems to me however, that while the confidence was high, most the random particles would get filtered on the next pass. But as confidence drops, more and more of the randomly scattered particles would automatically be just as good as the rest, so they would start to survive automatically causing the particles to disperse throughout the space very quickly, with little to no need to be conditional on some level of w to trigger the "assist". It would be fun to experiment with....

(22 Mar '12, 10:54)

Curt Welch


Hi @Curt Welch,

I went a different way in resampling also. I'll try to reproduce your runtimes on my system.
I like the straightforward approach you posted. It's a good mix of speedup, without introducing space overhead or complexity.

I posted my "Dutch Wheel" (my name, until I can find a published name to give it prior credit) in another thread. http://www.udacity-forums.com/cs373/questions/25286/fast-resampling-algorithm-dutch-wheel I'd be curious if it runs well in other environments.

I need to revisit my Resample Wheel implementation to make sure I didn't code it poorly.

#Codzart


answered 21 Mar '12, 23:38

Douglas Preston

Also, the effect of beta = (w_sum * w_sum) % step would be that the first loop always assigns p3[0] = p[0]

Not likely to affect the outcome.

(22 Mar '12, 02:58)

Douglas Preston
