I felt good when I simplified one of my algorithms and sped it up 10 times. I felt so good that I even wrote an entire blog post about it patting myself on the back. Then last week I got an email from Kevin of Keming Labs suggesting a few alternatives.
;; Three ways to count the number of occurrences in a collection
;; ("orange" "bottle" "coke" "bottle") => [("bottle" "coke" "orange") (2 1 1)]
(let [c '("orange" "bottle" "coke" "bottle")]
  (let [counts (reduce #(conj %1 (hash-map %2 (inc (get %1 %2 0))))
                       {} c)]
    [(keys counts)
     (vals counts)])
  (let [l (for [[k v] (group-by identity c)]
            [k (count v)])]
    [(map first l)
     (map second l)])
  (let [counts (apply merge-with +
                      (map #(apply hash-map %)
                           (partition 2 (interleave c (repeat 1)))))]
    [(keys counts)
     (vals counts)]))
First of all, his solutions looked much cleaner than mine. Then over the weekend I was able to incorporate his three algorithms into my program. I ran a few benchmarks, and here are the averages of two tests using a dataset of 28,760 items.
- My algorithm. Elapsed time: 68372.026532 msecs.
- Kevin's solution #1. Elapsed time: 156.940976 msecs.
- Kevin's solution #2. Elapsed time: 60.165483 msecs.
- Kevin's solution #3. Elapsed time: 296.162042 msecs.
Total ownage. That's what I like about sharing my work; once in a blue moon, a reader drops by and generously shows me how I can improve a solution 1,000 times! Now the ball is in my court to understand what he has done and improve myself. Collaborating and learning, that's why I open source.
Update: I've done some more digging, and it seems that one reason for the drastic performance improvement is the use of transients in the built-in functions. Lesson of the day: leverage the language's inherent optimizations by staying with core data structures and functions as much as possible.
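To make the transients point concrete, here is a sketch of how a counting function can use a transient map internally: it mutates a private transient accumulator for speed, then freezes it back into a persistent map before returning. Clojure's built-in frequencies follows essentially this pattern, which is why staying with core functions pays off.

;; Counting with a transient map: mutate privately, return a persistent map.
(defn count-occurrences [coll]
  (persistent!
    (reduce (fn [counts x]
              ;; assoc! updates the transient in place instead of
              ;; building a new persistent map on every step
              (assoc! counts x (inc (get counts x 0))))
            (transient {})
            coll)))

;; (count-occurrences '("orange" "bottle" "coke" "bottle"))
;; => {"orange" 1, "bottle" 2, "coke" 1}

The callers never see the mutation: transient and persistent! are O(1) conversions, so you get mutable-speed accumulation with a purely functional interface.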