{"id":59,"date":"2008-04-13T17:38:23","date_gmt":"2008-04-13T16:38:23","guid":{"rendered":"http:\/\/jimblackler.com\/blog\/?p=59"},"modified":"2008-04-13T17:38:23","modified_gmt":"2008-04-13T16:38:23","slug":"selecting-a-number-of-random-elements-from-a-list-without-repeats-an-implementation-in-java","status":"publish","type":"post","link":"https:\/\/jimblackler.com\/?p=59","title":{"rendered":"Selecting a number of random elements from a list, without repeats. An implementation in Java."},"content":{"rendered":"<p>A project I&#8217;ve been working on required a random selection of postings to be shown to users. The entries were to be drawn from a larger list : a cache of references to posts held in memory.<\/p>\n<p>My first effort at the selection code simply copied the original list, shuffled it, then selected the first &#8216;n&#8217; entries (where &#8216;n&#8217; is the number of required postings). A colleague remarked that this was hardly an optimal use of time or memory. So I decided to write a a general-purpose function to select &#8216;n&#8217; unique entries from a list of &#8216;m&#8217; entries &#8211; using as little memory and processing time as possible.<\/p>\n<h2>A naive solution<\/h2>\n<p>The most obvious solution to the problem is to repeatedly draw random entries from 0.. n of the list. To reduce the repeats, keep a record of the indices of the random entries already selected. Then, as each new item is selected, the existing list is checked to see if the entry was already taken.<\/p>\n<p>The problem with this is that as more entries are selected the likelihood of such a collision increases. The execution time of the function cannot therefore be predicted. In some cases where the number to select was very large this could make the function quite slow. You could speed it by using different containers and such, but a better approach seems to be avoid collisions in the first place.<\/p>\n<p>Consider a case of selecting five random entries from a list of ten. The first entry index can be found by picking a random number between zero and nine. When making the second selection however there are now only nine remaining unselected entries. So if we pick from zero to eight, then find a way of converting this number (the index of <em>unused entries<\/em>) into a full source table index, we have avoided the possibility of collision altogether.<\/p>\n<h2>The method<\/h2>\n<p>My solution calls for items chosen using a random number generated between 0 and m-c where &#8216;c&#8217; is the number of already selected items. This represents the relative index of <em>unchosen<\/em> entries, not the actual entries themselves. This number is then converted into the actual entry index by an algorithm.<\/p>\n<p>This algorithm involves iterating backwards through the list of already picked random numbers. The principal is at each iteration to convert the index from 0&#8230; m-c space to 0 .. 1+m-c space. In other words the space of the previously selected element. One single operation can do this, and at the same time avoid any potential collision with the last drawn entry in its target space. The operation says &#8220;if the already selected entry&#8217;s unchosen entry index is the same or less than the current index number then increment this index by one.&#8221;<\/p>\n<p>When all already selected entries have been processed in this way, the resulting index is guaranteed to be in 0.. n space and clear of any collisions with already selected entries.<\/p>\n<p>The method has an order of execution n*m \/ 2, and temporary storage requirements of n indices. This is stable and predictable however so it is a in improvement on collision avoidance methods that would mean repeating random selections.<\/p>\n<h2>Code<\/h2>\n<p>Here is a ready-to-use implementation in Java.<\/p>\n<div id=\"ig-sh-1\" class=\"syntax_hilite\">\n\n\t\n\t<div class=\"code\">\n\t\t<ol class=\"java\" style=\"font-family:monospace\"><li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\"><span style=\"color: #000000;font-weight: bold\">import<\/span> <span style=\"color: #006699\">java.util.List<\/span><span style=\"color: #339933\">;<\/span><\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\"><span style=\"color: #000000;font-weight: bold\">import<\/span> <span style=\"color: #006699\">java.util.Random<\/span><span style=\"color: #339933\">;<\/span><\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\"><span style=\"color: #000000;font-weight: bold\">import<\/span> <span style=\"color: #006699\">java.util.ArrayList<\/span><span style=\"color: #339933\">;<\/span><\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp;<\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\"><span style=\"color: #000000;font-weight: bold\">public<\/span> <span style=\"color: #000000;font-weight: bold\">class<\/span> ListUtil <span style=\"color: #009900\">&#123;<\/span><\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp;<\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; <span style=\"color: #008000;font-style: italic;font-weight: bold\">\/**<\/span><\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\"><span style=\"color: #008000;font-style: italic;font-weight: bold\">&nbsp; &nbsp; &nbsp;* Create a new list which contains the specified number of elements from the source list, in a<\/span><\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\"><span style=\"color: #008000;font-style: italic;font-weight: bold\">&nbsp; &nbsp; &nbsp;* random order but without repetitions.<\/span><\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\"><span style=\"color: #008000;font-style: italic;font-weight: bold\">&nbsp; &nbsp; &nbsp;*<\/span><\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\"><span style=\"color: #008000;font-style: italic;font-weight: bold\">&nbsp; &nbsp; &nbsp;* @param sourceList &nbsp; &nbsp;the list from which to extract the elements.<\/span><\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\"><span style=\"color: #008000;font-style: italic;font-weight: bold\">&nbsp; &nbsp; &nbsp;* @param itemsToSelect the number of items to select<\/span><\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\"><span style=\"color: #008000;font-style: italic;font-weight: bold\">&nbsp; &nbsp; &nbsp;* @param random &nbsp; &nbsp; &nbsp; &nbsp;the random number generator to use<\/span><\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\"><span style=\"color: #008000;font-style: italic;font-weight: bold\">&nbsp; &nbsp; &nbsp;* @return a new list &nbsp; containg the randomly selected elements<\/span><\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\"><span style=\"color: #008000;font-style: italic;font-weight: bold\">&nbsp; &nbsp; &nbsp;*\/<\/span><\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; <span style=\"color: #000000;font-weight: bold\">public<\/span> <span style=\"color: #000000;font-weight: bold\">static<\/span> <span style=\"color: #339933\">&lt;<\/span>T<span style=\"color: #339933\">&gt;<\/span> List<span style=\"color: #339933\">&lt;<\/span>T<span style=\"color: #339933\">&gt;<\/span> chooseRandomly<span style=\"color: #009900\">&#040;<\/span>List<span style=\"color: #339933\">&lt;<\/span>T<span style=\"color: #339933\">&gt;<\/span> sourceList, <span style=\"color: #000066;font-weight: bold\">int<\/span> itemsToSelect, <span style=\"color: #003399\">Random<\/span> random<span style=\"color: #009900\">&#041;<\/span> <span style=\"color: #009900\">&#123;<\/span><\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; &nbsp; &nbsp; <span style=\"color: #000066;font-weight: bold\">int<\/span> sourceSize <span style=\"color: #339933\">=<\/span> sourceList.<span style=\"color: #006633\">size<\/span><span style=\"color: #009900\">&#040;<\/span><span style=\"color: #009900\">&#041;<\/span><span style=\"color: #339933\">;<\/span><\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp;<\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; &nbsp; &nbsp; <span style=\"color: #666666;font-style: italic\">\/\/ Generate an array representing the element to select from 0... number of available<\/span><\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; &nbsp; &nbsp; <span style=\"color: #666666;font-style: italic\">\/\/ elements after previous elements have been selected.<\/span><\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; &nbsp; &nbsp; <span style=\"color: #000066;font-weight: bold\">int<\/span><span style=\"color: #009900\">&#091;<\/span><span style=\"color: #009900\">&#093;<\/span> selections <span style=\"color: #339933\">=<\/span> <span style=\"color: #000000;font-weight: bold\">new<\/span> <span style=\"color: #000066;font-weight: bold\">int<\/span><span style=\"color: #009900\">&#091;<\/span>itemsToSelect<span style=\"color: #009900\">&#093;<\/span><span style=\"color: #339933\">;<\/span><\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp;<\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; &nbsp; &nbsp; <span style=\"color: #666666;font-style: italic\">\/\/ Simultaneously use the select indices table to generate the new result array<\/span><\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; &nbsp; &nbsp; ArrayList<span style=\"color: #339933\">&lt;<\/span>T<span style=\"color: #339933\">&gt;<\/span> resultArray <span style=\"color: #339933\">=<\/span> <span style=\"color: #000000;font-weight: bold\">new<\/span> ArrayList<span style=\"color: #339933\">&lt;<\/span>T<span style=\"color: #339933\">&gt;<\/span><span style=\"color: #009900\">&#040;<\/span><span style=\"color: #009900\">&#041;<\/span><span style=\"color: #339933\">;<\/span><\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp;<\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; &nbsp; &nbsp; <span style=\"color: #000000;font-weight: bold\">for<\/span> <span style=\"color: #009900\">&#040;<\/span><span style=\"color: #000066;font-weight: bold\">int<\/span> count <span style=\"color: #339933\">=<\/span> <span style=\"color: #cc66cc\">0<\/span><span style=\"color: #339933\">;<\/span> count <span style=\"color: #339933\">&lt;<\/span> itemsToSelect<span style=\"color: #339933\">;<\/span> count<span style=\"color: #339933\">++<\/span><span style=\"color: #009900\">&#041;<\/span> <span style=\"color: #009900\">&#123;<\/span><\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp;<\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style=\"color: #666666;font-style: italic\">\/\/ An element from the elements *not yet chosen* is selected<\/span><\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style=\"color: #000066;font-weight: bold\">int<\/span> selection <span style=\"color: #339933\">=<\/span> random.<span style=\"color: #006633\">nextInt<\/span><span style=\"color: #009900\">&#040;<\/span>sourceSize <span style=\"color: #339933\">-<\/span> count<span style=\"color: #009900\">&#041;<\/span><span style=\"color: #339933\">;<\/span><\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; selections<span style=\"color: #339933\">&amp;<\/span>#<span style=\"color: #cc66cc\">91<\/span><span style=\"color: #339933\">;<\/span>count<span style=\"color: #339933\">&amp;<\/span>#<span style=\"color: #cc66cc\">93<\/span><span style=\"color: #339933\">;<\/span> <span style=\"color: #339933\">=<\/span> selection<span style=\"color: #339933\">;<\/span><\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style=\"color: #666666;font-style: italic\">\/\/ Store original selection in the original range 0.. number of available elements<\/span><\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp;<\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style=\"color: #666666;font-style: italic\">\/\/ This selection is converted into actual array space by iterating through the elements<\/span><\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style=\"color: #666666;font-style: italic\">\/\/ already chosen.<\/span><\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style=\"color: #000000;font-weight: bold\">for<\/span> <span style=\"color: #009900\">&#040;<\/span><span style=\"color: #000066;font-weight: bold\">int<\/span> scanIdx <span style=\"color: #339933\">=<\/span> count <span style=\"color: #339933\">-<\/span> <span style=\"color: #cc66cc\">1<\/span><span style=\"color: #339933\">;<\/span> scanIdx <span style=\"color: #339933\">&gt;=<\/span> <span style=\"color: #cc66cc\">0<\/span><span style=\"color: #339933\">;<\/span> scanIdx<span style=\"color: #339933\">--<\/span><span style=\"color: #009900\">&#041;<\/span> <span style=\"color: #009900\">&#123;<\/span><\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style=\"color: #000000;font-weight: bold\">if<\/span> <span style=\"color: #009900\">&#040;<\/span>selection <span style=\"color: #339933\">&gt;=<\/span> selections<span style=\"color: #009900\">&#091;<\/span>scanIdx<span style=\"color: #009900\">&#093;<\/span><span style=\"color: #009900\">&#041;<\/span> <span style=\"color: #009900\">&#123;<\/span><\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; selection<span style=\"color: #339933\">++;<\/span><\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style=\"color: #009900\">&#125;<\/span><\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style=\"color: #009900\">&#125;<\/span><\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style=\"color: #666666;font-style: italic\">\/\/ When the first selected element record is reached all selections are in the range<\/span><\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style=\"color: #666666;font-style: italic\">\/\/ 0.. number of available elements, and free of collisions with previous entries.<\/span><\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp;<\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style=\"color: #666666;font-style: italic\">\/\/ Write the actual array entry to the results<\/span><\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; resultArray.<span style=\"color: #006633\">add<\/span><span style=\"color: #009900\">&#040;<\/span>sourceList.<span style=\"color: #006633\">get<\/span><span style=\"color: #009900\">&#040;<\/span>selection<span style=\"color: #009900\">&#041;<\/span><span style=\"color: #009900\">&#041;<\/span><span style=\"color: #339933\">;<\/span><\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; &nbsp; &nbsp; <span style=\"color: #009900\">&#125;<\/span><\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; &nbsp; &nbsp; <span style=\"color: #000000;font-weight: bold\">return<\/span> resultArray<span style=\"color: #339933\">;<\/span><\/div><\/li>\n<li><div style=\"font: normal normal 1em\/1.2em monospace;margin:0;padding:0;background:none;vertical-align:top\">&nbsp; &nbsp; <span style=\"color: #009900\">&#125;<\/span><\/div><\/li>\n<\/ol>\t<\/div>\n\n<\/div>\n\n","protected":false},"excerpt":{"rendered":"<p>A project I&#8217;ve been working on required a random selection of postings to be shown to users. The entries were to be drawn from a larger list : a cache of references to posts held in memory. My first effort at the selection code simply copied the original list, shuffled it, then selected the first [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[],"class_list":["post-59","post","type-post","status-publish","format-standard","hentry","category-general-programming"],"_links":{"self":[{"href":"https:\/\/jimblackler.com\/index.php?rest_route=\/wp\/v2\/posts\/59","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/jimblackler.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/jimblackler.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/jimblackler.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/jimblackler.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=59"}],"version-history":[{"count":0,"href":"https:\/\/jimblackler.com\/index.php?rest_route=\/wp\/v2\/posts\/59\/revisions"}],"wp:attachment":[{"href":"https:\/\/jimblackler.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=59"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/jimblackler.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=59"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/jimblackler.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=59"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}