just normal random? that's 1999? what time it is now?
Weighted Random allows one thing have more chance to pop up than the other! and you define the weight(chance)!
When I first met PHP, I tried to make a lot of scripts, weighted random is one of them. I soon forgot about it after I build it. Now, after a long time of reading other people's code, I think I can improve my first version a lot.
My first version
function rand_string_pro($array){
foreach($array as $var){
$i = 0;
while($i < $var['w']){
$string[] = $var['s'];
$i++;
}
}
return $string[array_rand($string)];
}
The input is a array like this(where $i can be any interger)
$array[$i]['s'] = 'string';
$array[$i]['w'] = 30;//only interger
I thought that code is perfect, until weeks ago I saw the flaws and start writing improvements. The list of flaws:
*huge weight can slow the script down, even produce memory overflow
*huge string with reasonable amount of weight can produce memory overflow too
*it does not seems FAST...
This list does not include error handling, because I don't think I need to program something to notice kind experienced programmers. Yes Yes, I know it is a bad act.
I did some improvement and released my 1.0 version
/*
Weighted Random Ver 1.0 by Mgccl(mgcclx@gmail.com)
Usage: input an numbered array
$array[$i][?s?] is the string you want to return
$array[$i][?w'] is the weight of the string
you can allow the function to find the GCD
sometimes it speeds up the script
To use GCD, use the function like
rand_string_pro($array, true);
Note: You need the math functions I wrote in
order to use GCD, you can find them here:
/2006/11/21/some-math-functions/
*/
function rand_string_pro($seed, $gcd = false){
$count = count($seed);
if($gcd == true){
foreach($seed as $var){
$gcd_array[] = $var[?w?];
}
$gcd = math_gcd_array($gcd_array);
if($gcd != 1){
$i = 0;
while($i < $count){
$seed[$i][?w?] /= $gcd;
++$i;
}
}
unset($gcd, $gcd_array);
}
$i = 0;
while($i < $count){
$n = 0;
while($n < $seed[$i][?w?]){
$key[] = $i;
++$n;
}
++$i;
}
return $seed[$key[mt_rand(0, count($key)-1)]][?s?];
}
The new version have these improvements:
* Weight divided by the GCD so we have smaller weights
* Produce a new array to store the key, and the key refer to the "key name" of the variable seed. No more huge array in the size of value
* Used ++$i instead of $i++, small speed up
But there is still a flaw in the system. It still creates a huge array if the GCD is small compare to the weight. The performance could still be slow.
I released the version 1.1, more like a usability improvement version
/*
Weighted Random Ver 1.1 by Mgccl(mgcclx@gmail.com)
Update: Nov/25/06 allow non-weighted random, separate the
string storing array and the weight storing array
Usage: input $array[$i] = 'string' format(where $i is a number)
$array[$i]is the string you want to return
$weight[$i]is the weight of the string
if one of the $array[$i] does not have a
$weight[$i] as a match, it automatically
set $weight[$i] as 1
To allow use weighted function, call the function like this
rand_string_pro($array, $weight);
you can allow the function to find the GCD
sometimes it speeds up the script
To use GCD, use the function like
rand_string_pro($array, $weight, true);
Note: You need the math functions I wrote in
order to use GCD, you can find them here:
/2006/11/21/some-math-functions/
*/
function rand_string_pro($seed, $weighted = false, $gcd = false){
$count = count($seed);
if($weighted === false){
return $seed[mt_rand(0, $count - 1)];
}else{
if($gcd === true){
if(count($weighted) != $count){
}else{
foreach($weighted as $var){
$gcd_array[] = $var;
}
$gcd = math_gcd_array($gcd_array);
if($gcd != 1){
$i = 0;
while($i < count($gcd_array)){
$weighted[$i] /= $gcd;
++$i;
}
}
unset($gcd, $gcd_array);
}
}
$i = 0;
while($i < $count){
if(empty($weighted[$i])){
$key[] = $i;
}else{
$n = 0;
while($n < $weighted[$i]){
$key[] = $i;
++$n;
}
}
++$i;
}
return $seed[$key[mt_rand(0, count($key)-1)]];
}
}
this version separated the input of the string array and the weight array
it also made the function usable when there is no weight present.
And finally, Version 1.2 is where I made the speed to the max.
/*
Weighted Random Ver 1.2 by Mgccl(mgcclx@gmail.com)
http://mgccl.com
Update: Nov/29/06
Faster Speed
allow non-weighted random, separate the
string storing array and the weight storing array
Usage: input $array[$i] = 'string' format(where $i is a number)
$array[$i]is the string you want to return
$weight[$i]is the weight of the string
if one of the $array[$i] does not have a
$weight[$i] as a match, it automatically
set $weight[$i] as 1
To allow use weighted function, call the function like this
rand_string_pro($array, $weight);
*/
function rand_string_pro($seed, $weighted = false){
$count = count($seed);
if($weighted === false){
return $seed[mt_rand(0, $count - 1)];
}else{
$i = 0; $n = 0;
$num = mt_rand(0, array_sum($weighted) + ($count-count($weighted)));
while($i < $count){
if(empty($weighted[$i])){
++$n;
}else{
$n += $weighted[$i];
}
if($n >= $num){
break;
}
++$i;
}
return $seed[$i];
}
}
This code is way more faster than the old version because:
1. The loop will break at the selected string
2. It use one numeric variable instead of a huge array
Basically, this one will be the fastest code possible until anyone find anything faster. And I have t o post my final copy of the code. Version 1.3 Alpha, it have some more functions:
/*
Weighted Random Ver 1.3 by Mgccl(mgcclx@gmail.com)
Update: Dec/4/06
Allow user to input the amount of the array to be chose
Allow user to chose only unique items
Update: Nov/29/06
Faster speed
allow non-weighted random, separate the
string storing array and the weight storing array
Usage: input $array[$i] = 'string' format(where $i is a number)
$array[$i]is the string you want to return
$weight[$i]is the weight of the string
if one of the $array[$i] does not have a
$weight[$i] as a match, it automatically
set $weight[$i] as 1
To allow use weighted function, call the function like this
rand_string_pro($array, $weight);
chose the amount you want to pick
rand_string_pro($array, $weight, 3);
chose only the unique
rand_string_pro($array, $weight, 3, 1);
*/
function rand_string_pro($array, $weight = 0, $amount = 1, $unique = 0){
$count = count($array);
if($amount == 1){
if($weight == 0){
return $array[mt_rand(0, $count - 1)];
}else{
$weighted_sum = array_sum($weight) + ($count-count($weight));
$i = 0; $n = 0;
$num = mt_rand(0, $weighted_sum);
while($i < $count){
if(empty($weight[$i])){
++$n;
}else{
$n += $weight[$i];
}
if($n >= $num){
break;
}
++$i;
}
return $array[$i];
}
}else{
if($weight == 0){
$rand = array_rand($array, $amount);
foreach ($rand as $var){
$result[] = $array[$var];
}
return $result;
}else{
if($unique == 0){
$max = array_sum($weight)+ $count - count($weight);
$i = 0;
while($i<$amount){
$random[] = mt_rand(0, $max);
++$i;
}
sort($random);
$i = 0; $n = 0; $m = 0;
$random_count = count($random);
while(1){
while($m >= $random[$i]){
$key[] = $array[$n-1];
++$i;
if($i == $random_count){
break 2;
}
}
if($weight[$n]){
$m += $weight[$n];
}else{
++$m;
}
++$n;
}
}else{
$i = 0;
if($amount >= $count){
return $array;
}else{
$weighted_sum = array_sum($weight) + ($count-count($weight)) - 1;
$sub = 0;
while($i<$amount){
$weighted_sum -= $sub;
$n = 0; $m = 0;
$num = mt_rand(0, $weighted_sum);
while ($m <= $num){
if($weight[$n]){
$m += $weight[$n];
$sub = $weight[$n];
++$n;
}else{
++$m;
$sub = 1;
++$n;
}
}
if($n != 0){
$n = $n - 1;
}
$key[] = $array[$n];
unset($weight[$n]);
array_merge($weight);
++$i;
}
}
}
return $key;
}
}
}
Because I want to know if there is any other way to make it faster, I join forums and started a Contest for the fastest weighted Random script. Here is mine result of the benchmark:
test1: 0.116928815842 s
test2: 0.145362615585 s
test3: 0.141038894653 s
score: 1.38378739357
and here are the guys who send their work.
Submission from pterodactyl
function weightedRandom($array, $weight = array())
{
$count = count($array);
$sum = array_sum($weight) + $count - count($weight);
$key = mt_rand(0, $sum);
$n = 0; $i = -1;
do {
$n += isset($weight[++$i]) ? $weight[$i] : 1;
} while ($key > $n);
return $array[$i];
}
Result:
test1: 0.140782833099 s
test2: 0.145872592926 s
test3: 0.14186835289 s
score: 1.4365336895
Submission from krt
function weightedRandom($array, $weight=null) {
if ($weight) {
$rand = mt_rand(1, array_sum($weight));
$cummulativeWeight = 0;
foreach ($array as $k=>$v) {
$cummulativeWeight += $weight[$k];
if ($cummulativeWeight >= $rand) { $v; break; }
}
} else {
return $array[array_rand($array)];
}
}
Result:
test1: 0.106962442398 s
test2: 0.139887094498 s
test3: 0.133788108826 s
score: 1.31472468376
krt did not make the script accept a weight array with a condition to check if the $weight[$i] is set or not.
Submission from Daniel15
function rand_weighted($values, $weights) {
// Loop through all values passed. If a weight isn't set for a value, assume 1
foreach ($values as $key => $value) if (!isset($weights[$key])) $weights[$key] = 1;
// Now, get the total of the weights
$total = array_sum($weights);
// Stuff for percentiles
$percentiles = array();
$percentileTemp = 0;
$percentileLast = 0;
// Loop through each weight
foreach ($weights as $key => $weight) {
// Get the percentage chance for this
$percent = ($weight / $total) * 100;
// Add this to the running total, for the percentile
$percentileTemp += $percent;
// The last percentile value used. Used to find the minimum
$percentileLast = $percent;
// Now, set the percentile
$percentiles[$key]['min'] = $percentileTemp - $percentileLast;
$percentiles[$key]['max'] = $percentileTemp;
}
// Get a random value (0 to 100). Get 4 decimal places (not sure if there's a better way to do this)
$random = mt_rand(0, 1000000) / 10000;
//echo $random, '<br />';
//echo '<pre>'; print_r($percentiles); echo '</pre>';
// Loop through all percentile values
foreach ($percentiles as $key => $percentile) {
// If value falls in this percentile...
if ($percentile['min'] < $random && $random < $percentile['max'])
// We've found it. Return this string
return $values[$key];
}
}
Result:
test1: 0.238054037094 s
test2: 0.225346565246 s
test3: 0.227444648742 s
score: 2.28517484665
Daniel15 got a detailed description, but getting the percentile is not necessary.
codestryke
function get_random_item($content, $weight, $value) {
// Add all the weights together.
$total_weight = 0;
foreach ($content as $entry) {
$total_weight += $entry[$weight];
}
// Generate random selection between 0 and totalweight - 1
$pick = mt_rand(0, $total_weight - 1);
// Find the content entry that matches the random number.
$cumulative_weight = 0;
foreach ($content as $entry) {
$cumulative_weight += $entry[$weight];
if ($pick < $cumulative_weight) {
return $entry[$value];
}
}
return false;
}
Result:
test1: 0.17635011673 s(input get_random_item($array, $weight=array())
test2: 0.193833112716 s
test3: 0.168747425079 s
score: 1.82810807228
I have changed part of the code (take out the $value part) in order for the testing to work. codestryke include the function to specify the name of the array's output string.
Recent comments
12 hours 45 min ago
21 hours 58 min ago
2 days 7 hours ago
3 days 11 hours ago
3 days 15 hours ago
3 days 15 hours ago
3 days 15 hours ago
4 days 7 hours ago
4 days 8 hours ago
4 days 10 hours ago