I felt good when I simplified one of my algorithms and sped it up 10 times. I felt so good that I even wrote an entire blog post about it patting myself on the back. Then last week I got an email from Kevin of Keming Labs suggesting a few alternatives.
;; Three ways to count the number of occurrences in a collection
;; ("orange" "bottle" "coke" "bottle") => [("bottle" "coke" "orange") (2 1 1)]
(let [c '("orange" "bottle" "coke" "bottle")]
  (let [counts (reduce #(conj %1 (hash-map %2 (inc (get %1 %2 0))))
                       {} c)]
    [(keys counts)
     (vals counts)])
  (let [l (for [[k v] (group-by identity c)]
            [k (count v)])]
    [(map first l)
     (map second l)])
  (let [counts (apply merge-with +
                      (map #(apply hash-map %)
                           (partition 2 (interleave c (repeat 1)))))]
    [(keys counts)
     (vals counts)]))
First of all, his solutions looked much cleaner than mine. Then over the weekend I was able to incorporate his three algorithms into my program. I ran a few benchmarks, and here are the averages of two tests using a dataset of 28,760 items.
- My algorithm. Elapsed time: 68372.026532 msecs.
- Kevin's solution #1. Elapsed time: 156.940976 msecs.
- Kevin's solution #2. Elapsed time: 60.165483 msecs.
- Kevin's solution #3. Elapsed time: 296.162042 msecs.
Total ownage. That's what I like about sharing my work; once in a blue moon, a reader drops by and generously shows me how I can improve a solution 1,000 times! Now the ball is in my court to understand what he has done and improve myself. Collaborating and learning, that's why I open source.
Update: I've done some more digging, and it seems that one reason for the drastic performance improvement is the use of transients in the built-in functions. Lesson of the day: leverage the language's inherent optimizations by staying with core data structures and functions as much as possible.
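To make the transients point concrete, here is a sketch of how a counting function can use a transient map internally: it mutates a private transient accumulator for speed, then freezes it back into a persistent map before returning. Clojure's built-in frequencies follows essentially this pattern, which is why staying with core functions pays off.

;; Counting with a transient map: mutate privately, return a persistent map.
(defn count-occurrences [coll]
  (persistent!
    (reduce (fn [counts x]
              ;; assoc! updates the transient in place instead of
              ;; building a new persistent map on every step
              (assoc! counts x (inc (get counts x 0))))
            (transient {})
            coll)))

;; (count-occurrences '("orange" "bottle" "coke" "bottle"))
;; => {"orange" 1, "bottle" 2, "coke" 1}

The callers never see the mutation: transient and persistent! are O(1) conversions, so you get mutable-speed accumulation with a purely functional interface.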