{"id":36362,"date":"2008-04-02T18:26:00","date_gmt":"2008-04-02T18:26:00","guid":{"rendered":"\/2008\/04\/02\/ruby-include-versus-intersection"},"modified":"2009-08-30T04:45:35","modified_gmt":"2009-08-30T04:45:35","slug":"ruby-include-versus-intersection","status":"publish","type":"post","link":"http:\/\/pullmonkey.com\/2008\/04\/02\/ruby-include-versus-intersection\/","title":{"rendered":"Ruby – include? versus intersection"},"content":{"rendered":"

After my last article about intersection and arrays<\/a>, I got to thinking how this could apply elsewhere.
\nI decided to see if I could write a faster include?() method for Array using the intersection operator. So here is what I did:<\/p>\n

1) Tested via irb (of course)<\/h3>\n\n\n
\n
1\n<\/tt>2\n<\/tt>3\n<\/tt>4\n<\/tt>5<\/strong>\n<\/tt>6\n<\/tt>7\n<\/tt>8\n<\/tt>9\n<\/tt>10<\/strong>\n<\/tt>11\n<\/tt>12\n<\/tt>13\n<\/tt>14\n<\/tt>15<\/strong>\n<\/tt>16\n<\/tt>17\n<\/tt>18\n<\/tt>19\n<\/tt>20<\/strong>\n<\/tt><\/pre>\n<\/td>\n
\n
\n<\/tt>[pullmonkey]$<\/span> irb\n<\/tt># build the array for conceptual testing<\/span>\n<\/tt>irb(main):003<\/span>:0<\/span>> a = [1<\/span>,2<\/span>,3<\/span>]\n<\/tt>=> [1<\/span>, 2<\/span>, 3<\/span>]\n<\/tt># see how include?() does it<\/span>\n<\/tt>irb(main):004<\/span>:0<\/span>> a.include?(2<\/span>)\n<\/tt>=> true<\/span>\n<\/tt>irb(main):005<\/span>:0<\/span>> a.include?(4<\/span>)\n<\/tt>=> false<\/span>\n<\/tt># see what could be done with intersect<\/span>\n<\/tt>irb(main):006<\/span>:0<\/span>> a & [2<\/span>]\n<\/tt>=> [2<\/span>]\n<\/tt>irb(main):007<\/span>:0<\/span>> (a & [2<\/span>]).empty?\n<\/tt>=> false<\/span>\n<\/tt># this should return true, so negate it<\/span>\n<\/tt>irb(main):00<\/span>8<\/span>:0<\/span>> !(a & [2<\/span>]).empty?\n<\/tt>=> true<\/span>\n<\/tt>irb(main):00<\/span>9<\/span>:0<\/span>> !(a & [4<\/span>]).empty?\n<\/tt>=> false<\/span>\n<\/tt><\/pre>\n<\/td>\n<\/tr>\n<\/table>\n

2) Extended the Array class with the faster_include?() method:<\/h3>\n\n\n
\n
1\n<\/tt>2\n<\/tt>3\n<\/tt>4\n<\/tt>5<\/strong>\n<\/tt>6\n<\/tt><\/pre>\n<\/td>\n
\n
\n<\/tt>class<\/span> Array<\/span>\n<\/tt>  def<\/span> faster_include?<\/span>(n)\n<\/tt>    !(self<\/span> & [n]).empty?\n<\/tt>  end<\/span>\n<\/tt>end<\/span>\n<\/tt><\/pre>\n<\/td>\n<\/tr>\n<\/table>\n

3) Wrote a performance test comparing elapsed time for true and false results:<\/h3>\n\n\n
\n
1\n<\/tt>2\n<\/tt>3\n<\/tt>4\n<\/tt>5<\/strong>\n<\/tt>6\n<\/tt>7\n<\/tt>8\n<\/tt>9\n<\/tt>10<\/strong>\n<\/tt>11\n<\/tt>12\n<\/tt>13\n<\/tt>14\n<\/tt>15<\/strong>\n<\/tt>16\n<\/tt>17\n<\/tt>18\n<\/tt>19\n<\/tt><\/pre>\n<\/td>\n
\n
\n<\/tt># size of the array<\/span>\n<\/tt>x = 4000000<\/span>\n<\/tt># build the array<\/span>\n<\/tt>a = (1<\/span>..x).to_a\n<\/tt>p "<\/span>Finding <\/span>#{<\/span>x\/2<\/span>}<\/span><\/span> ...<\/span>"<\/span><\/span>\n<\/tt>t1 = Time<\/span>.now\n<\/tt>p "<\/span>include?() - returns true: <\/span>#{<\/span>a.include?(x\/2<\/span>)}<\/span><\/span>"<\/span><\/span>\n<\/tt>t2 = Time<\/span>.now\n<\/tt>p "<\/span>faster_include?() - returns true: <\/span>#{<\/span>a.faster_include?(x\/2<\/span>)}<\/span><\/span>"<\/span><\/span>\n<\/tt>t3 = Time<\/span>.now\n<\/tt>p "<\/span>include?() - returns false: <\/span>#{<\/span>a.include?(-50<\/span>)}<\/span><\/span>"<\/span><\/span>\n<\/tt>t4 = Time<\/span>.now\n<\/tt>p "<\/span>faster_include?() - returns false: <\/span>#{<\/span>a.faster_include?(-50<\/span>)}<\/span><\/span>"<\/span><\/span>\n<\/tt>t5 = Time<\/span>.now\n<\/tt>p "<\/span>Elapsed time for include?() returning true: <\/span>#{<\/span>t2 - t1}<\/span><\/span>.<\/span>"<\/span><\/span>\n<\/tt>p "<\/span>Elapsed time for faster_include?() returning true: <\/span>#{<\/span>t3 - t2}<\/span><\/span>.<\/span>"<\/span><\/span>\n<\/tt>p "<\/span>Elapsed time for include?() returning false: <\/span>#{<\/span>t4 - t3}<\/span><\/span>.<\/span>"<\/span><\/span>\n<\/tt>p "<\/span>Elapsed time for faster_include?() returning false: <\/span>#{<\/span>t5 - t4}<\/span><\/span>.<\/span>"<\/span><\/span>\n<\/tt><\/pre>\n<\/td>\n<\/tr>\n<\/table>\n

4) Just to see if there was a difference I got results and here they are:<\/h3>\n

Attempt 1:<\/p>\n\n\n
\n
1\n<\/tt>2\n<\/tt>3\n<\/tt>4\n<\/tt>5<\/strong>\n<\/tt>6\n<\/tt>7\n<\/tt>8\n<\/tt>9\n<\/tt>10<\/strong>\n<\/tt><\/pre>\n<\/td>\n
\n
\n<\/tt>"<\/span>Finding 2000000 ...<\/span>"<\/span><\/span>\n<\/tt>"<\/span>include?() - returns true: true<\/span>"<\/span><\/span>\n<\/tt>"<\/span>faster_include?() - returns true: true<\/span>"<\/span><\/span>\n<\/tt>"<\/span>include?() - returns false: false<\/span>"<\/span><\/span>\n<\/tt>"<\/span>faster_include?() - returns false: false<\/span>"<\/span><\/span>\n<\/tt>"<\/span>Elapsed time for include?() returning true: 0.308243.<\/span>"<\/span><\/span>\n<\/tt>"<\/span>Elapsed time for faster_include?() returning true: 0.271465.<\/span>"<\/span><\/span>\n<\/tt>"<\/span>Elapsed time for include?() returning false: 0.629375.<\/span>"<\/span><\/span>\n<\/tt>"<\/span>Elapsed time for faster_include?() returning false: 0.242673.<\/span>"<\/span><\/span>\n<\/tt><\/pre>\n<\/td>\n<\/tr>\n<\/table>\n

Attempt 2:<\/p>\n\n\n
\n
1\n<\/tt>2\n<\/tt>3\n<\/tt>4\n<\/tt>5<\/strong>\n<\/tt>6\n<\/tt>7\n<\/tt>8\n<\/tt>9\n<\/tt>10<\/strong>\n<\/tt><\/pre>\n<\/td>\n
\n
\n<\/tt>"<\/span>Finding 2000000 ...<\/span>"<\/span><\/span>\n<\/tt>"<\/span>include?() - returns true: true<\/span>"<\/span><\/span>\n<\/tt>"<\/span>faster_include?() - returns true: true<\/span>"<\/span><\/span>\n<\/tt>"<\/span>include?() - returns false: false<\/span>"<\/span><\/span>\n<\/tt>"<\/span>faster_include?() - returns false: false<\/span>"<\/span><\/span>\n<\/tt>"<\/span>Elapsed time for include?() returning true: 0.323652.<\/span>"<\/span><\/span>\n<\/tt>"<\/span>Elapsed time for faster_include?() returning true: 0.272212.<\/span>"<\/span><\/span>\n<\/tt>"<\/span>Elapsed time for include?() returning false: 0.594751.<\/span>"<\/span><\/span>\n<\/tt>"<\/span>Elapsed time for faster_include?() returning false: 0.277882.<\/span>"<\/span><\/span>\n<\/tt><\/pre>\n<\/td>\n<\/tr>\n<\/table>\n

Note that faster_include?() is consistent in elapsed time for a result of true and a result of false. However, include?() takes much longer in both attempts to return false than it does to return true, well actually about twice as long. include?() goes through each element, and I am using the midway point of the array in all tests, so it makes sense that include?() would return true in half the time it would take to return false. Very interesting, so now that I see performance improvement with faster_include?(), I need to do a wider spread of tests and for more attempts:
\n<\/p>\n

5) Better testing:<\/h3>\n\n\n
\n
1\n<\/tt>2\n<\/tt>3\n<\/tt>4\n<\/tt>5<\/strong>\n<\/tt>6\n<\/tt>7\n<\/tt>8\n<\/tt>9\n<\/tt>10<\/strong>\n<\/tt>11\n<\/tt>12\n<\/tt>13\n<\/tt>14\n<\/tt>15<\/strong>\n<\/tt>16\n<\/tt>17\n<\/tt>18\n<\/tt>19\n<\/tt>20<\/strong>\n<\/tt>21\n<\/tt>22\n<\/tt>23\n<\/tt>24\n<\/tt>25<\/strong>\n<\/tt>26\n<\/tt>27\n<\/tt><\/pre>\n<\/td>\n
\n
\n<\/tt>results = {}\n<\/tt>n = 5<\/span>\n<\/tt># size of the array<\/span>\n<\/tt>[1<\/span>,50<\/span>,500<\/span>,5000<\/span>,5000000<\/span>].each do<\/span> |x|\n<\/tt>  a = (1<\/span>..x).to_a\n<\/tt>  n.times do<\/span> |t|\n<\/tt>    # build the array<\/span>\n<\/tt>    t1 = Time<\/span>.now\n<\/tt>    a.include?(x\/2<\/span>)\n<\/tt>    t2 = Time<\/span>.now\n<\/tt>    a.faster_include?(x\/2<\/span>)\n<\/tt>    t3 = Time<\/span>.now\n<\/tt>    a.include?(-50<\/span>)\n<\/tt>    t4 = Time<\/span>.now\n<\/tt>    a.faster_include?(-50<\/span>)\n<\/tt>    t5 = Time<\/span>.now\n<\/tt>    results[x]          ||= {}\n<\/tt>    results[x]["<\/span>true<\/span>"<\/span><\/span>]  ||= []\n<\/tt>    results[x]["<\/span>false<\/span>"<\/span><\/span>] ||= []\n<\/tt>    results[x]["<\/span>true<\/span>"<\/span><\/span>]  << (t2 - t1) - (t3 - t2)\n<\/tt>    results[x]["<\/span>false<\/span>"<\/span><\/span>] << (t4 - t3) - (t5 - t4)\n<\/tt>  end<\/span>\n<\/tt>  detailed_results = results\n<\/tt>  results[x] = {"<\/span>true<\/span>"<\/span><\/span> => detailed_results[x]["<\/span>true<\/span>"<\/span><\/span>].sum \/ n, "<\/span>false<\/span>"<\/span><\/span> => detailed_results[x]["<\/span>false<\/span>"<\/span><\/span>].sum \/ n}\n<\/tt>end<\/span>\n<\/tt>pp results\n<\/tt><\/pre>\n<\/td>\n<\/tr>\n<\/table>\n

My new results:<\/h3>\n

The results are based on 5 attempts for each array for both true and false for a total of 50 attempts.<\/p>\n\n\n
\n
1\n<\/tt>2\n<\/tt>3\n<\/tt>4\n<\/tt>5<\/strong>\n<\/tt>6\n<\/tt><\/pre>\n<\/td>\n
\n
\n<\/tt>{5000000<\/span>=>{"<\/span>true<\/span>"<\/span><\/span>=>0.0921126<\/span>, "<\/span>false<\/span>"<\/span><\/span>=>0.4374644<\/span>},\n<\/tt> 5000<\/span>=>{"<\/span>true<\/span>"<\/span><\/span>=>5.84e-05<\/span>, "<\/span>false<\/span>"<\/span><\/span>=>0.0004404<\/span>},\n<\/tt> 500<\/span>=>{"<\/span>true<\/span>"<\/span><\/span>=>1.4e-06<\/span>, "<\/span>false<\/span>"<\/span><\/span>=>4.16e-05<\/span>},\n<\/tt> 50<\/span>=>{"<\/span>true<\/span>"<\/span><\/span>=>-3.4e-06<\/span>, "<\/span>false<\/span>"<\/span><\/span>=>1.0e-06<\/span>},\n<\/tt> 1<\/span>=>{"<\/span>true<\/span>"<\/span><\/span>=>-4.8e-06<\/span>, "<\/span>false<\/span>"<\/span><\/span>=>-3.8e-06<\/span>}}\n<\/tt><\/pre>\n<\/td>\n<\/tr>\n<\/table>\n

Note that arrays with 50 items or less seem to perform better using include?(), but barely. On the other hand with an array of 5M items, you shave off half a second using faster_include?().<\/p>\n

So there you have it, faster_include?() can keep its name \ud83d\ude42<\/p>\n","protected":false},"excerpt":{"rendered":"

After my last article about intersection and arrays, I got to thinking how this could apply elsewhere. I decided to see if I could write a faster include?() method for Array using the intersection operator. So here is what I did: 1) Tested via irb (of course) 1 2 3 4 5 6 7 8 […]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[6,3,9],"tags":[211],"_links":{"self":[{"href":"http:\/\/pullmonkey.com\/wp-json\/wp\/v2\/posts\/36362"}],"collection":[{"href":"http:\/\/pullmonkey.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/pullmonkey.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/pullmonkey.com\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/pullmonkey.com\/wp-json\/wp\/v2\/comments?post=36362"}],"version-history":[{"count":1,"href":"http:\/\/pullmonkey.com\/wp-json\/wp\/v2\/posts\/36362\/revisions"}],"predecessor-version":[{"id":57362,"href":"http:\/\/pullmonkey.com\/wp-json\/wp\/v2\/posts\/36362\/revisions\/57362"}],"wp:attachment":[{"href":"http:\/\/pullmonkey.com\/wp-json\/wp\/v2\/media?parent=36362"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/pullmonkey.com\/wp-json\/wp\/v2\/categories?post=36362"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/pullmonkey.com\/wp-json\/wp\/v2\/tags?post=36362"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}