Fast Weighted Random Choice In PHP

Sometimes you may need to randomly select items from a list so that some items are selected more frequently than others. For example, you might take a list of applications and their download counts, and randomly pick a “featured application” based on the number of downloads.

There are several ways to accomplish this in PHP. In this post I’ll show you two approaches to weighted random selection – one suited for a small list of possible choices, and another optimized for a larger number of items.

Simple Weighted Random Selection

Here’s one simple and very common algorithm :

  1. Pick a random number between zero and the sum of all weights.
  2. Scan down the list of choices adding each element’s weight to a counter.
  3. Check if the counter is above or equal to the picked random number. If yes, return the current element. Otherwise go to Step #2.

This algorithm is easy to implement and pretty fast when the number of possible choices is small, or when you only need to do the selection once. Below is a function that takes an array of possible choices and a matching array of weights, and returns a randomly selected element from the first array. You can use any positive integer as a weight.

/**
 * weighted_random_simple()
 * Pick a random item based on weights.
 *
 * @param array $values Array of elements to choose from 
 * @param array $weights An array of weights. Weight must be a positive number.
 * @return mixed Selected element.
 */
function weighted_random_simple($values, $weights){ 
    $count = count($values); 
    $i = 0; 
    $n = 0; 
    $num = mt_rand(0, array_sum($weights)); 
    while($i < $count){
        $n += $weights[$i]; 
        if($n >= $num){
            break; 
        }
        $i++; 
    } 
    return $values[$i]; 
}

To illustrate, here’s an example script will output either A, B, or C with probabilities of 15%, 35% and 50% respectively :

$values = array('A', 'B', 'C');
$weights = array(3, 7, 10);
echo weighted_random_simple($values, $weights);

Randomly Choosing From Thousands Of Elements

The above algorithm can become very slow when the list of choices is large and you need to do several selections. This is because it has to scan the entire array each time.

However, the algorithm can be extended to make it significantly faster. Instead of calculating the total weight (step #1) and the counter (step #2) every time, lets do it only once and store the counter values in a sorted array. Then we’ll be able to use binary search to quickly select the right element.

Here’s the modified script :

/**
 * weighted_random()
 * Randomly select one of the elements based on their weights. Optimized for a large number of elements. 
 *
 * @param array $values Array of elements to choose from 
 * @param array $weights An array of weights. Weight must be a positive number.
 * @param array $lookup Sorted lookup array 
 * @param int $total_weight Sum of all weights
 * @return mixed Selected element
 */
function weighted_random($values, $weights, $lookup = null, $total_weight = null){
	if ($lookup == null) {
		list($lookup, $total_weight) = calc_lookups($values, $weights);
	}
 
	$r = mt_rand(0, $total_weight);
	return $values[binary_search($r, $lookup)];
}
 
/**
 * calc_lookups()
 * Build the lookup array to use with binary search
 *
 * @param array $values 
 * @param array $weights
 * @return array The lookup array and the sum of all weights
 */
function calc_lookups($values, $weights){
	$lookup = array();
	$total_weight = 0;
 
	for ($i=0; $i<count($weights); $i++){
		$total_weight += $weights[$i];
		$lookup[$i] = $total_weight;
	}
	return array($lookup, $total_weight);
}
 
/**
 * binary_search()
 * Search a sorted array for a number. Returns the item's index if found. Otherwise 
 * returns the position where it should be inserted, or count($haystack)-1 if the
 * $needle is higher than every element in the array.
 *
 * @param int $needle
 * @param array $haystack
 * @return int
 */
function binary_search($needle, $haystack)
{
    $high = count($haystack)-1;
    $low = 0;
 
    while ( $low < $high ){
	$probe = (int)(($high + $low) / 2);
	if ($haystack[$probe] < $needle){
            	$low = $probe + 1;
	} else if ($haystack[$probe] > $needle) {
		$high = $probe - 1;
	} else {
		return $probe;
	}
    }
 
    if ( $low != $high ){
    	return $probe;
    } else {
	if ($haystack[$low] >= $needle) {
		return $low;
	} else {
		return $low+1;
	}
    }
}

The above script also contains two new utility functions – calc_lookups which calculates the lookup data, and binary_search which is used to find a randomly picked number in the lookup array. Use the new functions like this :

//Set up the lookups (once)
list($lookup, $total_weight) = calc_lookups($values, $weights);
//....
//Each time you need a weighted random selection :
$val = weighted_random($values, $weights, $lookup, $total_weight);

In Conclusion

To give you an idea of how fast these two algorithms are : I used each one to select a random entry from a list of 10 000 possibilities, 10 000 times in a row. The first algorithm took 13 seconds in total. The improved algorithm took only 0.09 seconds.

Of course, this is not the limit. You can find some interesting hints about even faster algorithms here (Python).

Share :
  • Reddit
  • del.icio.us
  • Digg
  • StumbleUpon
  • DZone
  • Ping.fm
  • Sphinn
Related posts :

One Response to “Fast Weighted Random Choice In PHP”

  1. 1
    cecill's says:

    nice tutorial..

Leave a Reply