Fast Weighted Random Choice In PHP
Sometimes you may need to randomly select items from a list so that some items are selected more frequently than others. For example, you might take a list of applications and their download counts, and randomly pick a “featured application” based on the number of downloads.
There are several ways to accomplish this in PHP. In this post I’ll show you two approaches to weighted random selection – one suited for a small list of possible choices, and another optimized for a larger number of items.
Simple Weighted Random Selection
Here’s one simple and very common algorithm :
- Pick a random number between zero and the sum of all weights.
- Scan down the list of choices adding each element’s weight to a counter.
- Check if the counter is above or equal to the picked random number. If yes, return the current element. Otherwise go to Step #2.
This algorithm is easy to implement and pretty fast when the number of possible choices is small, or when you only need to do the selection once. Below is a function that takes an array of possible choices and a matching array of weights, and returns a randomly selected element from the first array. You can use any positive integer as a weight.
/** * weighted_random_simple() * Pick a random item based on weights. * * @param array $values Array of elements to choose from * @param array $weights An array of weights. Weight must be a positive number. * @return mixed Selected element. */ function weighted_random_simple($values, $weights){ $count = count($values); $i = 0; $n = 0; $num = mt_rand(0, array_sum($weights)); while($i < $count){ $n += $weights[$i]; if($n >= $num){ break; } $i++; } return $values[$i]; }
To illustrate, here’s an example script will output either A, B, or C with probabilities of 15%, 35% and 50% respectively :
$values = array('A', 'B', 'C'); $weights = array(3, 7, 10); echo weighted_random_simple($values, $weights);
Randomly Choosing From Thousands Of Elements
The above algorithm can become very slow when the list of choices is large and you need to do several selections. This is because it has to scan the entire array each time.
However, the algorithm can be extended to make it significantly faster. Instead of calculating the total weight (step #1) and the counter (step #2) every time, lets do it only once and store the counter values in a sorted array. Then we’ll be able to use binary search to quickly select the right element.
Here’s the modified script :
/** * weighted_random() * Randomly select one of the elements based on their weights. Optimized for a large number of elements. * * @param array $values Array of elements to choose from * @param array $weights An array of weights. Weight must be a positive number. * @param array $lookup Sorted lookup array * @param int $total_weight Sum of all weights * @return mixed Selected element */ function weighted_random($values, $weights, $lookup = null, $total_weight = null){ if ($lookup == null) { list($lookup, $total_weight) = calc_lookups($weights); } $r = mt_rand(0, $total_weight); return $values[binary_search($r, $lookup)]; } /** * calc_lookups() * Build the lookup array to use with binary search * * @param array $weights * @return array The lookup array and the sum of all weights */ function calc_lookups($weights){ $lookup = array(); $total_weight = 0; for ($i=0; $i<count($weights); $i++){ $total_weight += $weights[$i]; $lookup[$i] = $total_weight; } return array($lookup, $total_weight); } /** * binary_search() * Search a sorted array for a number. Returns the item's index if found. Otherwise * returns the position where it should be inserted, or count($haystack)-1 if the * $needle is higher than every element in the array. * * @param int $needle * @param array $haystack * @return int */ function binary_search($needle, $haystack) { $high = count($haystack)-1; $low = 0; while ( $low < $high ){ $probe = (int)(($high + $low) / 2); if ($haystack[$probe] < $needle){ $low = $probe + 1; } else if ($haystack[$probe] > $needle) { $high = $probe - 1; } else { return $probe; } } if ( $low != $high ){ return $probe; } else { if ($haystack[$low] >= $needle) { return $low; } else { return $low+1; } } }
The above script also contains two new utility functions – calc_lookups
which calculates the lookup data, and binary_search
which is used to find a randomly picked number in the lookup array. Use the new functions like this :
//Set up the lookups (once) list($lookup, $total_weight) = calc_lookups($weights); //.... //Each time you need a weighted random selection : $val = weighted_random($values, $weights, $lookup, $total_weight);
In Conclusion
To give you an idea of how fast these two algorithms are : I used each one to select a random entry from a list of 10 000 possibilities, 10 000 times in a row. The first algorithm took 13 seconds in total. The improved algorithm took only 0.09 seconds.
Of course, this is not the limit. You can find some interesting hints about even faster algorithms here (Python).
Related posts :
nice tutorial..
With the calc_lookups function, you can remove the count($weights) from the parameters of the for loop to speed up execution:
$cweights = count($weights);
for ($i=0; $i<count($weights); $i++) {
instead of:
for ($i=0; $i<count($weights); $i++) {
that way the 'count' function isn't called each iteration of the loop.
You could also do small things like remove function paramters from the main function and just force a call to list($lookup, $total_weight) = calc_lookups($values, $weights);, or use incrementation like ++$i; instead of $i++, which executes faster.
This is an amazing article. I really didn’t want to write this function myself! You saved me a lot of work 🙂
@Nitin: Compiler writers good at making such optimizations as assigning values only once, if the the variable is independent of the loop and many more. We just need to write the code that looks good for us. Even then with your solution, it would be more to write as
for ($i=0, $cweights = count($weights); $i < count($weights); $i++) {
than
$cweights = count($weights);
for ($i=0; $i<count($weights); $i++) {
btw, the article seems to be look. I haven't read it yet.
Nice. But biased to the first element.
function weighted_random_simple($values, $weights){
$total = array_sum($weights);
$n = 0;
$num = mt_rand(1, $total);
foreach ($values as $i => $value) {
$n += $weights[$i];
if ($n >= $num){
return $values[$i];
}
}
}
in mt_rand() the $min value must be 1, not zero, since $min and $max are both included. Begining with zero you give the first element the probability of zero plus the probability of the weight.
[…] for: php random string with weighting – Google Search Good call, this seems a good solution – Fast Weighted Random Choice In PHP | W-Shadow.com Cheers for the help! __________________ Flickr me | My WordPress […]
Am I overlooking something, or is $values not really needed by the calc_lookup() function?
Thanks for the code I think I’ll use it.
You’re right. Edited.
Btw using a function in the for statement will execute the function on every loop so it will be faster if you take count() outside of the loop.
$cweights = count($weights);
for ($i=0; $i<$cweights; $i++) {
Also you may want to give the parameters as key-value pairs, not two separate arrays.
Array (
‘id’ => ‘weight’
}
I think instead of
$num = mt_rand(0, array_sum($weights));
you should use
$num = mt_rand(1, array_sum($weights));
because mt_rand includes both min and max values in its outputs.
With the current version, if you have
$weights = [0, 1];
you will still get a fifty-fifty distribution among the two values (which is obviously not desired).
[…] This tutorial walks you through it, in PHP, with multiple cut and paste solutions. Note that this routine is slightly modified from what you’ll find on that page, as a result of the comment below. […]
[…] This tutorial walks you through it, in PHP, with multiple cut and paste solutions. Note that this routine is slightly modified from what you’ll find on that page, as a result of the comment below. […]
[…] 本教程将向您介绍PHP中的多种剪切和粘贴解决方案。请注意,由于下面的注释,对该例程进行了一些修改,与您在该页面上看到的有所不同。 […]
[…] This tutorial walks you through it, in PHP, with multiple cut and paste solutions. Note that this routine is slightly modified from what you’ll find on that page, as a result of the comment below. […]