# Resampling wheel alternative

6  I was thinking of a method for sampling that seems simpler and more intuitive than the resampling wheel method:

1. Create a temporary list that is 100 times bigger than the original, where the particles are repeated according to their probabilities. So if a particle has a probability of 0.22, there will be 22 (0.22 * 100) copies of that particle in the temp list.
2. Randomly choose 1000 particles from the temp list. The particles with the highest probabilities will have the highest chance of being selected, as there are more instances of them.

What do you think of this method? If you need higher accuracy, you could do the same thing with a list 1000 times bigger than the original.

asked 05 Mar '12, 19:47 Omer Yousuf 181●1●6●13 accept rate: 0% Anne Paulson 9.5k●32●76●98

Since the resampling wheel already picks the particles according to their probabilities, what would be the advantage of this solution? (This is not a criticism, I'm just curious.) (06 Mar '12, 12:00) uli
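A minimal sketch of the proposal, assuming `p` is the particle list and `w` the corresponding weights. The function name, the expansion factor, and the sample size of 1000 are illustrative, taken from the question's numbers:

```python
import random

def resample_expanded(p, w, factor=100, n_out=1000):
    """Sketch of the question's idea: repeat each particle in a temporary
    list in proportion to its weight, then draw uniformly from that list.
    Note that weights below 1/(len(p) * factor) of the total get zero
    copies and can never be selected."""
    total = sum(w)
    tmp = []
    for particle, weight in zip(p, w):
        copies = int(weight / total * len(p) * factor)  # proportional copies
        tmp.extend([particle] * copies)
    return [random.choice(tmp) for _ in range(n_out)]
```

Since drawing from `tmp` is uniform, each particle's selection probability is proportional to its copy count, which approximates its weight to the resolution set by the expansion factor.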

5  To distinguish 1000 particles you need 10 bits. For programming convenience, let's say you dedicated two bytes (16 bits) to identify each of 1000 particles. If the temporary list has 1,000,000 entries, we're talking about a grand total of 2 MB. That's 0.002 GB. Is 2 MB an amount that runs the risk of using up all of memory these days? Lookup would be quite brisk using such a table, being O(1). There'd be one multiplication (that's a clock cycle) and one list-entry retrieval (what's that, another clock cycle?). What are you aware of that's either faster or easier to understand? This approach should not be dismissed.

As for losing out on some very-low-probability particles because we may not have chopped the list finely enough... so what? One entry in a list of 1,000,000 entries has a one-in-a-million chance of being selected. Are we supposed to be worrying about particles that have even less of a chance of being selected, too?

If it aches at you that too much RAM is being used, consider making the list length 100,000 instead. Now you're using 200 KB (0.0002 GB) and the process is even faster to set up (but probably runs at the same super-fast speed as before). Hardly any significance is lost when it comes to representing important particles.

answered 07 Mar '12, 12:47 melr 3.6k●19●30●41
2  @max, the following is one elegant way to implement your method #1. Features:

1. No extra memory needed for the new population or a temporary array.
2. No change to the values of the existing arrays.
3. Minimal calculation (just two loops and no multiplication/division).

```python
# assumes: import random; p = particles, w = weights, N = len(p)
p3 = []
sumW = sum(w)                      # sum of all weights
for i in range(N):                 # sample N times
    val = random.uniform(0, sumW)  # generate a random number in range (0, sumW)
    tmp = 0
    for j in range(N):             # scan the cumulative weights
        tmp += w[j]
        if tmp >= val:             # found the proper index into the p array
            p3.append(p[j])        # choose it as the sampling outcome, and
            break                  # skip to the next round of sampling
```

answered 07 Mar '12, 13:20 avxV 763●4●8●22

2  Yes, that's the naive approach. It has O(N^2) complexity, so you pay for the simplicity with run time. (07 Mar '12, 14:56) Martin J. La...

@avxV, I did it more or less the same way and I still think it's nicely simple, not naive. ;-) But @mjl is of course right that the worst-case complexity isn't very good. (08 Mar '12, 05:09) Emil

1  Naive is not used with a negative connotation here; it should be taken to mean "what first springs to mind and works". (08 Mar '12, 07:37) Martin J. La...
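The O(N^2) cost above comes from restarting the cumulative scan for every draw. For contrast, the standard low-variance (systematic) resampler, which is not from this thread, spaces the draw points evenly and walks the cumulative sum once, giving O(N). The function name is illustrative:

```python
import random

def systematic_resample(p, w, N):
    """Low-variance (systematic) resampling: one random offset,
    then a single ordered pass over the cumulative weights."""
    total = sum(w)
    step = total / N
    val = random.uniform(0, step)        # one random draw for the whole pass
    p3, cum, j = [], w[0], 0
    for _ in range(N):
        while cum < val and j < len(w) - 1:
            j += 1
            cum += w[j]
        p3.append(p[j])
        val += step                      # evenly spaced draw points
    return p3
```

Because the draw points never decrease, the index `j` only moves forward, so the total work across all N samples is a single pass over the weights.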
2  This is my approach, which seems simpler and more intuitive than the resampling wheel method:

```python
# assumes: import random; p = particles, w = weights, N = len(p)
p3 = []
s = sum(w)
while len(p3) != N:
    r = int(random.random() * N)
    p_i = random.random()
    if p_i < w[r] / s:
        p3.append(p[r])
```

answered 09 Mar '12, 10:54 Mirko 43●4

1  Let's check the intuitive part. p_i may satisfy p_i < w[r]/s, and p_i < w[r-1]/s, and p_i < w[r-2]/s, and so on. Let's say N-1 particles have weight α and one has 2α, so that α(N-1) + 2α = 1.
A. What is the probability you will pick up that particle? 1/N (from int(random.random() * N)).
B. What is the probability for that particle to pass the test? 2α (from p_i < w[r]/s == p_i < 2α => P = (2α - 0)/(1 - 0)).
Probability of that particle being sampled? P(A and B) = 2α/N in one try => over N tries P = 2α. Should be? P = 2α. It seems to work. (09 Mar '12, 11:13) Ruslan Ciurca

Seems really brilliant!!! (09 Mar '12, 12:04) Vadim Lebedev

But its time complexity is O(N²). See Pavel Kulikov's answer below; his method is similar and has some performance improvements. (09 Mar '12, 12:53) mauro-1
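mauro-1's O(N²) remark can be checked empirically: a trial accepts with probability w[r]/s averaged over r, which is 1/N, so collecting N particles takes about N^2 trials on average. A sketch of Mirko's loop with a trial counter added (the counter and function wrapper are the only changes):

```python
import random

def rejection_resample(p, w, N):
    """Mirko's rejection loop, instrumented with a trial counter."""
    s = sum(w)
    p3, trials = [], 0
    while len(p3) != N:
        trials += 1
        r = int(random.random() * N)
        if random.random() < w[r] / s:
            p3.append(p[r])
    return p3, trials
```

With N = 100 equal weights, the per-trial acceptance probability is 1/100, so roughly 10,000 trials are needed to collect 100 samples.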
1  Why 1000 × 1000? I did it this way:

```python
# assumes: import random; p = particles, w = weights, N = len(p)
p3 = []
s = sum(w)
for i in range(N):
    w[i] = w[i] / s            # normalize the weights
tmp = []
for i in range(N):
    num = int(w[i] * N)        # number of copies proportional to weight
    for j in range(num):
        tmp.append(p[i])
for i in range(N):
    index = int(random.random() * len(tmp))
    p3.append(tmp[index])
```

And tmp has ~1000 entries, because sum(w) = 1, so the sum of all int(w[i] * N) is nearly N (or 1000).

answered 06 Mar '12, 12:03 Ruslan Ciurca 2.9k●7●21●25
1  I did the same at first. Note that you can normalize the weights and populate the bin at the same time:

```python
# assumes: import random; p = particles, w = weights, N = len(p)
p3 = []
s = sum(w)
# Build a bin according to the (normalized) weights
b = []
N2 = N
for i in range(N):
    n = w[i] / s * N2
    for j in range(int(n)):
        b.append(i)
# Choose N times from the bin randomly
for i in range(N):
    r = int(random.random() * len(b))
    c = b[r]
    p3.append(p[c])
```

The time complexity seems to be O(2N). Note that the bin can be made smaller than the sample (without losing much precision), for example by setting N2 = N/10. In that case the overall time complexity will still be O(2N), but the complexity of the append loop will only be O(0.1N). I think the method is not bad, considering its simplicity.

answered 07 Mar '12, 13:04 mauro-1 244●2●8

1  Side note: the Python random module has a couple of handy methods. One is random.choice(some_list), which does just what it says. (07 Mar '12, 14:57) Martin J. La...

@mjl, thanks. I've run some comparisons (just counting the number of loops) between this method (the bin/bucket method?) and the wheel method. This method soon settles around O(2N), whereas the wheel method tends to O(1.5N) after a sufficiently large number of cycles. So the wheel method is only marginally better in the long term, when particles are sufficiently grouped together. Regarding space/memory, this method is O(2N) while the wheel method is O(N), which again is not a big difference. (08 Mar '12, 04:17) mauro-1

Not to be picky, but isn't writing "O(2N)" a severe abuse of notation? By mathematical definition, O(aN) = O(N) for all a. Or is this common practice in computer science? It's of course quite clear what is meant. (08 Mar '12, 05:42) Emil

@Emil, I don't know if it's abuse of notation or not. I agree they share the same complexity class, regarding time and space complexity, and that's what O(aN) = O(N) implies. All that said, the real advantage of the wheel method seems to lie not in its performance, but in the fact that it is relatively unbiased, whereas the bin/bucket method clearly isn't; it discards beforehand all the particles under a given threshold. (08 Mar '12, 05:48) mauro-1

@Emil, in the same way that a time complexity can be stated as O(N log N), I assume it can be stated as O(2N) without necessarily incurring abuse of notation. As you can see, it can be useful for comparing the performance of algorithms that belong to the same class. The important aspect is, nevertheless, the complexity class, that is, the form of the O() function of N, and in that respect you're right: O(N) = O(aN) for all a. (10 Mar '12, 08:00) mauro-1
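The truncation bias mauro-1 points out (particles below the bin resolution are dropped entirely) can be avoided with a cumulative-weight table plus binary search, which samples exactly according to the weights in O(N log N). This is a standard technique rather than something from the thread, and the function name is illustrative:

```python
import bisect
import random

def resample_cumulative(p, w, N):
    """Exact weighted sampling: build the cumulative weights once,
    then binary-search a uniform draw for each of the N samples."""
    cum = []
    total = 0.0
    for weight in w:
        total += weight
        cum.append(total)
    return [p[bisect.bisect_left(cum, random.uniform(0, total))]
            for _ in range(N)]
```

Every particle with nonzero weight occupies a nonzero interval of the cumulative table, so none is discarded beforehand, no matter how small its weight.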
0  So if you have 1000 particles, your temporary list would then contain 1,000,000 particles. You'll run out of memory very soon.

answered 05 Mar '12, 19:48 David Zhang 4.3k●11●23●43
0  It's relatively easy to code, but it uses vastly more memory. As you note, you also sacrifice a bit of accuracy. Additionally, the runtime for k-digit accuracy is N log((10^k) N), which is meaningfully worse even for k = 2.

answered 05 Mar '12, 19:53 Adam Sherwin ♦♦ 17.9k●21●76●125
0  Ruslan Ciurca, that's essentially what I did, and since rounding can sometimes give fewer than N entries, I doubled it. I don't see any need to make it 100 or 1000 times larger, as max did. So memory-wise it doesn't seem bad, and I don't think your approach loses any accuracy. I'm curious what flaws, if any, there are in this approach.

answered 06 Mar '12, 22:34 Mike McGurrin 475●2●3●9

1  With Ciurca's approach, if any particle has a probability of less than 1/1000, it won't get added to the tmp list at all, since int(1/1001 * 1000) = 0. So any particle with probability below 1/1000 is approximated as having probability 0. (06 Mar '12, 23:36) Omer Yousuf

In that case it probably makes sense to use "while j < w[i] * N" rather than "for ..." over "int(w[i] * N)". (07 Mar '12, 06:33) Ruslan Ciurca
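Omer's point is easy to verify directly: with N = 1000, any particle whose normalized weight is below 1/1000 contributes int(w * N) = 0 copies to the temp list and can never be resampled. The values below are just the ones from the comment:

```python
N = 1000
w_small = 1.0 / 1001              # just below the 1/N threshold
w_large = 0.25

copies_small = int(w_small * N)   # 0: the particle is silently dropped
copies_large = int(w_large * N)   # 250 copies
```

Ruslan's suggested fix, looping `while j < w[i] * N` instead of `for j in range(int(w[i] * N))`, guarantees at least one copy for any nonzero weight, at the cost of slightly over-representing the smallest weights.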
0  How about this implementation?

```python
# assumes: import random; p = particles, w = weights, N = len(p)
p3 = []
sumw = sum(w)
maxw = max(w)
for i in range(N):
    take = 0
    while not take:
        index = int(random.random() * N)
        if w[index] / sumw > random.random() * maxw * N / 2.0:
            take = 1
            p3.append(p[index])
```

answered 08 Mar '12, 04:56 Pavel Kulikov 30●6

It works, but it's O(N^2). (08 Mar '12, 05:21) mauro-1

Changing the code a little, with * maxw * N/2.0, makes it faster and still correct. (08 Mar '12, 05:24) Pavel Kulikov

Much better in terms of time complexity, but it's still around O(20N). (08 Mar '12, 05:30) mauro-1
0  My naive approach. Can someone tell me if it is correct?

```python
normalizer = sum(w)
w = [wt / normalizer for wt in w]
s = 0
buckets = []
for wt in w:
    s += wt
    buckets.append(s)          # cumulative weights

random.seed()
N = len(p)

def inbetween(num, s, e, li):
    """binary search for the particle index"""
    pos = s + (e - s) / 2
    if num <= li[pos] and num > li[pos - 1]:
        return pos
    if num > li[pos] and num <= li[pos + 1]:
        return pos + 1
    (s, e) = (s, pos) if li[pos] > num else (pos, e)
    return inbetween(num, s, e, li)

random.seed()
p3 = [p[inbetween(random.random(), 0, N, buckets)] for i in range(N)]
```

answered 09 Mar '12, 02:07

Write a couple of tests without random numbers. (09 Mar '12, 13:19) Charles Merriam
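Taking Charles Merriam's advice literally exposes a corner case: when the draw falls in the first bucket (num <= li[0]), the check num > li[pos - 1] ends up reading li[-1], which in Python wraps around to the last (and largest) cumulative value, so the search can never return index 0 and recurses forever. Below is a self-contained sketch with a guard for pos == 0 added (and integer division made explicit so it also runs on Python 3), plus fixed-input checks:

```python
def inbetween(num, s, e, li):
    """Binary search over cumulative weights li for the bucket holding num.
    Guard for pos == 0 added: the original li[pos - 1] test wraps to li[-1],
    so a draw in the first bucket would recurse forever."""
    pos = s + (e - s) // 2
    if num <= li[pos] and (pos == 0 or num > li[pos - 1]):
        return pos
    if li[pos] < num <= li[pos + 1]:
        return pos + 1
    (s, e) = (s, pos) if li[pos] > num else (pos, e)
    return inbetween(num, s, e, li)

# Deterministic checks against known bucket boundaries:
buckets = [0.1, 0.3, 0.6, 1.0]   # cumulative weights of four particles
```

With these buckets, draws of 0.05, 0.25, 0.45, and 0.95 should map to indices 0, 1, 2, and 3 respectively, which is easy to verify by hand against the cumulative values.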