Less locks more message queues!

I was reading a blog post RailGuard  about a creative interface for avoiding having to code explicit exception handling around locks. I left a tip.

A couple decades ago I would have agreed that was the solution. I changed my mind after realizing that you very rarely need locks and the typical example scenario is a paper tiger. If you assume you have simultaneous writers (readers don’t need locks just consistency, i.e. what is the state of the world now not what will it be after the current write).
A simple generic non-locking replacement implementation is to make the access to a shared resource (memory) a thread with a thread safe queue (hopefully non-blocking for reads and writes). Start the thread, add the access to the shared resource(memory) to the queue (a reference to the function or method or an object with a standard method or even DSL statement/function). Threaded shared object processes all requests priority FIFO or even writes first or reads first depending on the use case in a loop that sleeps and checks or use a function wrapper that starts the thread on add to queue if not already started and maybe even have the thread stop on empty queue.
Just my horse sense.

Posted in Uncategorized | Leave a comment

Numba

http://numba.continuim.io
Numba is dead simple to get the full close to c/c++ performance improvements from dynamic compilation and works great after you have the prerequisites installed.

Numba required python packages:
argparse
numpy
llvm

Now you can import numba and use numba.autojit as a decorator or function wrapper and get the benefits of dynamic compilation without any head scratching.

I tried an example using an old version of numba and a numpy array parameter that was at the center of the calculation and it produced an over 30 times speed up. The fasted speed up for numba is supposed to be when it is compiling code that uses numpy arrays or when it can figure out how to substitute numpy arrays in. Numba should still compile the rest of the code as well giving it c/c++ like performance too.

If numba can improve the performance of my existing code by a factor of 2 to 10 times where there is no chance for a numpy speed up it will still be a significant performance boost. Jobs that take 10 minutes could go down to 5 to 1 minutes and 1 minute jobs down to 30 to 6 seconds. Maybe there is some hope for my using Python as a way to solve some of the more computationally intensive http://projecteuler.net problems without throwing cloud computing at the problems.

Posted in Uncategorized | Leave a comment

A Grain Of Salt

Looks like it time to start using Numba!

Bulldozer00's Blog

Somehow, I stumbled upon an academic paper that compares programming language performance in the context of computing the results for a well-known, computationally dense, macro-economics problem: “the stochastic neoclassical growth model“. Since the results hoist C++ on top of the other languages, I felt the need to publish the researchers’ findings in this blog post :). As with all benchmarks, take it with a grain of salt because… context is everything.

Qualitative Findings

The Findings

Quantitative Findings

The Numbers

The irony of this post is that I’m a big fan of Nassim Taleb, whose lofty goal is to destroy the economics profession as we know it. He thinks all the fancy, schmancy mathematical models and metrics used by economists (including famous Nobel laureates) to predict the future are predicated on voodoo science. They cause more harm than good by grossly misrepresenting and underestimating the role of risk in their assumptions and…

View original post 2 more words

Posted in Uncategorized | Leave a comment

How to determine the performance hit of regular expressions

This coding example is really about how to figure out what is faster. In some cases regular expressions may be the optimal solution but as a rule of thumb they are suboptimal for the simpler cases. e.g. starts with, ends with, count of occurrences, fixed string search, etc.

I wrote this in response to a post on stackoverflow.com asking for help in crafting a regex to eliminate words from an input file by only printing words that are each made up of only unique letters. i.e. with one or less of each letter. This code is probably more interesting to the Pythonista than to developers of other languages as there are many terser solutions in other languages including a simple egrep.

The input file contains one word per a line.

Contents of example wordlist.txt
abducts
abe
abeam
abel
abele

Example output:
abducts
abe
abel

Code that produces that output written in python using a regex:

python -c 'import re, sys; print "".join(s for s in open(sys.argv[1]) if not re.match(r".*(\w).*\1", s))' wordlist.txt

In python without a regex:

python -c 'import sys; print "".join(s for s in open(sys.argv[1]) if len(s) == len(frozenset(s)))' wordlist.txt

I performed some timing tests with a hardcoded file name and output redirected to /dev/null to avoid including screen output in the timing:

Timings without the regex:

python -m timeit 'import sys' 'print >> sys.stderr, "".join(s for s in open("wordlist.txt") if len(s) == len(frozenset(s)))' 2>/dev/null

10000 loops, best of 3: 91.3 usec per loop

Timings with the regex:

python -m timeit 'import re, sys' 'print >> sys.stderr, "".join(s for s in open("wordlist.txt") if re.match(r".*(\w).*\1", s))' 2>/dev/null

10000 loops, best of 3: 105 usec per loop

The regex is a bit slower than a simple frozenset creation and len comparison in python. The timeit module can be used in this way to test code in modules by importing the module or script and calling the functions you wish to test.

And here is the obligatory long hand versions for those that have a need for a more flexible timing test with the added benefit of not having the compilation time considered part of the timing.

filter_words.py:

#!/usr/bin/env python
"""
Filter out words with more than one of a particular letter.
"""

import re
import sys

def filter_with_regex(input_generator):
  for w in input_generator:
    if not re.match(r".*?(\w).*?\1", w):
      yield w

def filter_without_regex(input_generator):
  for w in input_generator:
    if len(w) == len(frozenset(w)):
      yield w

def filter_words(filter_generator, input_generator):
  return filter_generator(input_generator)

def main(filter_generator=filter_without_regex, file_name=sys.argv[1]):
  ig = open(file_name)
  fg = filter_generator
  return filter_words(fg, ig)

if __name__ == '__main__':
  from timeit import Timer
  t_without = Timer("main(filter_without_regex)", "from __main__ import main, filter_without_regex")
  t_with = Timer("main(filter_with_regex)", "from __main__ import main, filter_with_regex")
  t = (1000000*t_without.timeit(100000)/100000, 1000000*t_with.timeit(100000)/100000)
  print "{0:0.2f} usec/pass without regex\n{1:0.2f} usec/pass with regex".format(*t)

Run the timing tests as ‘python filter_words.py wordlist.txt’ or at the ipython prompt as ‘%run filter_words.py wordlist.txt’.

Posted in programming, python | Leave a comment

Smarter last name first function

def last_first(name):
  """
  Return last_name, first_name.
  Checks for already reversed names.
  Looks for some suffixes after the last name.
  Only works with one part last names.
  Leaves names like 'Charles III' as is.
  """
  SUFFIXES = ('jr', 'jnr', 'sr', 'snr', 'ii', 'iii', 'dr', 'ddr', 'ddrs', 'pc', 'esq')
  retval = name
  if name and ',' not in name:
    parts = name.split()
    if len(parts) > 1:
      if len(parts) > 2:
        if parts[-1].lower() in SUFFIXES:
          retval = ' '.join(parts[-2:]) + ', ' + ' '.join(parts[:-1])
        else:
          retval = parts[-1] + ', ' + ' '.join(parts[:-1])
      elif parts[-1].lower() not in SUFFIXES:
        retval = parts[-1] + ', ' + ' '.join(parts[:-1])
  return retval

Posted in python | Tagged , , , | 1 Comment

Naive last name first function


def last_first(name):
  """
  Return last_name, first_name.
  Checks for already reversed names.
  Assumes no suffix and one part last names.
  """
  retval = name
  if name and ',' not in name:
    parts = name.split()
    retval = parts[-1] + ', ' + ' '.join(parts[:-1])
  return retval

Posted in python | Tagged , , , | Leave a comment

The digits of a number in reverse as a generator

def digits(n):
  q, m = divmod(n, 10)
  while q or m:
    yield m
    q, m = divmod(q, 10)

Posted in python | Tagged , , , | 1 Comment