|
I was thinking of a method for sampling that seems simpler and more intuitive than the resampling wheel method: 2) Randomly choose 1000 particles from the temp list. The particles with the highest probabilities will have the highest chance of being selected as there are more instances of them. What do you think of this method? If you need higher accuracy then you could do the same thing with a list 1000 bigger than the original. |
|
To distinguish 1000 particles you need 10 bits. For programming convenience, let's say you dedicated two bytes (16 bits) to identify each of 1000 particles. If the temporary list has 1,000,000 entries, we're talking about a grand total of 2MB. That's 0.002GB. Is 2MB an amount that runs the risk of using up all of memory these days? Lookup would be quite brisk using such a table, being O(1). There'd be one multiplication (that's a clock cycle) and one list entry retrieval (what's that, another clock cycle?). What are you aware of that's either faster or easier to understand? This approach should not be dismissed. As for losing out on some very low probability particles because we may not have chopped the list finely enough... so what? One entry in a list of 1,000,000 entries has a one-in-a-million chance of being selected. Are we supposed to be worrying about particles that have even less of a chance of being selected, too? If it aches at you that too much RAM is being used, consider making the list length 100,000 instead. Now you're using 200K (0.0002GB) and the process is even faster to set up (but probably runs at the same super-fast speed as before). Hardly any significance is lost when it comes to representing important particles. |
|
@max, following is one elegant way to implement your method #1. Features: 1. No extra memory needed for the new population or temporary array. 2.Preserve / No change to the value of existing arrays and,
2
Yes, that's the naive approach. It has O(N^2) complexity, so you pay the simplicity with run time. @auxV, I did it more or less the same way and I still think it's nicely simple, not naïve. ;-) But @mjl is of course right in that the worst-case complexity isn't very good. 1
Naive not used with a negative connotation here, it should be taken to mean "what first springs to mind and works". |
|
Why 1000 x 1000? I did it this way:
and tmp has ~1000 because sum(w)=1 so sum of all int(w[i]*N) is nearly N (or 1000) |
|
I did the same, first hand. Please note that you can normalize the weights and populate the bin at the same time:
The time complexity seems to be O(2N). Please note that the bin can be made smaller than the sample(without losing much precision), by example by setting N2 = N/10. In that case the time complexity will still be O(2N) but the time complexity of the append loop will only be O(0.1N). I think that the method is not bad, considering its simplicity. 1
Side note, the python random module has a couple of handy methods. One is random.choice(some_list) which just does what it says. @mjl, thanks. I've ran some comparisons(just counting number of loops) between this method (the bin/bucket method?) and the wheel method, and this method soon settles around O(2N), whereas the wheel method tends to O(1.5N) after a sufficiently large number of cycles. So, the wheel method is only marginally better in the long term, when particles are sufficiently grouped together. Regarding space/memory, this method is O(2N) while the wheel method is O(N), which again, is not a big difference. Not to be picky, but isn't writing "O(2N)" a severe abuse of notation? Because by mathematical definition O(aN)=O(N) for all a. Or is this common practice in computer science? It's of course quite clear what is meant. @Emil, I don't know if it's abuse of notation or not. I agree they share the same complexity class, regarding time and space complexity, and that's what O(aN) = O(N) implies. All that said, the real advantage of the wheel method seems to lie not in its performance, but in the fact that it seems to be relatively unbiased, whereas the bin/bucket method clearly isn't; it's discarding beforehand all the particles under a given threshold. @Emil, in the same way as it can be stated that a time complexity can be of the form O(N log N), I assume that it can be of the form O(2N), without necessarily incurring in abuse of notation. As you can see, it can be useful for comparing the performance of algorithms that belong to the same class. The important aspect is, nevertheless, the complexity class, that is, the form of the O() function on N, and in that aspect you're right, O(N) = O(aN) for all a. |
|
so you have 1000 particles, and your temporary list would then contain 1000000 particles. You'll run out of memory very soon. |
|
It's relatively easy to code but uses vastly more memory. Additionally, as you note, you sacrifice a bit of accuracy. Additionally, the runtime for k digit accuracy is Nlog((10^k)N) which is meaningfully worse even for k=2. |
|
My naive approach. Can someone tell me if it is correct?
|
So, as the resampling wheel already picks the particles according to their probabilities, what would be the advantage of this solution? (This is no critic, I'm just curious)