Since the last post I have been working on vectorizing "user code". The primary goal of this project is to speed up trace loops that iterate over NumPy arrays. But as I said in the last post, it might (or might not) make sense to also optimize traces found in the user program.
Vec, the little brother of NumPy array
Using the following snippet one can build a class for vectors in Python and let the optimization speed up the computation.
import array

class Vec(object):
    def __init__(self, content, type='d'):
        self.size = len(content)
        self.type = type
        self.array = array.array(type, content)

    def __add__(self, other):
        return self.add(other)

    def add(self, other, out=None):
        # Ensure that other is the right type and size,
        # out must be allocated if it is None
        assert isinstance(other, Vec) and other.size == self.size
        if out is None:
            out = Vec([0] * self.size, self.type)
        i = 0
        # execute pypy with --jit vectorize_user=1 to
        # enable the optimization
        while i < self.size:
            out.array[i] = self.array[i] + other.array[i]
            i += 1
        return out
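To make the idea of a "vector loop" more concrete, here is a conceptual model written in plain Python. It is only a sketch and not what the JIT actually emits: the names `add_scalar`, `add_vector_model` and `WIDTH` are made up for this illustration, and the width of two doubles is an assumption (that is how many 64-bit values fit into a 128-bit SIMD register).

```python
import array

WIDTH = 2  # assumed vector width: two 64-bit doubles per 128-bit register


def add_scalar(a, b, out):
    # The loop as written in the user program: one element per iteration.
    i = 0
    while i < len(a):
        out[i] = a[i] + b[i]
        i += 1
    return out


def add_vector_model(a, b, out):
    # Model of a vectorized loop: WIDTH elements per iteration.
    # The slices stand in for a packed load, a packed add and a packed store.
    n = len(a)
    i = 0
    while i + WIDTH <= n:
        x = a[i:i + WIDTH]
        y = b[i:i + WIDTH]
        out[i:i + WIDTH] = array.array(out.typecode,
                                       [p + q for p, q in zip(x, y)])
        i += WIDTH
    # Elements that do not fill a whole vector are handled one at a time.
    while i < n:
        out[i] = a[i] + b[i]
        i += 1
    return out


a = array.array('d', [1.0, 2.0, 3.0, 4.0, 5.0])
b = array.array('d', [10.0, 20.0, 30.0, 40.0, 50.0])
scalar = add_scalar(a, b, array.array('d', [0.0] * 5))
vector = add_vector_model(a, b, array.array('d', [0.0] * 5))
```

Both functions compute the same result. The second one simply needs roughly half as many loop iterations, which is where a speedup could come from once each step maps to a single SIMD instruction.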
After tracing the loop in the add function a slightly better vector loop is generated. Let's run the program:
# jit warmup
a, b, c = Vec(...), Vec(...), Vec(...)
# start time.time()
for i in range(500):
    c = a * b
    a = c - b
# stop time.time()
The following results can be obtained (after the JIT has warmed up):
PyPy (vecopt): ~0.005
PyPy (no vecopt): ~0.008
PyPy has higher variance in the execution time. The garbage collector might be the reason for that. The program was run 5 times and the mean value is shown above.
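For anyone who wants to repeat the measurement, a minimal self-contained sketch of such a timing script could look like the one below. Several things in it are assumptions and not taken from the original benchmark: the vector size of 256, the initial values, the number of warmup runs, the helper names `run_once` and `benchmark`, and the `__mul__` and `__sub__` methods, which I assume mirror the add loop shown earlier.

```python
import array
import time


class Vec(object):
    # Minimal stand-in for the class from this post. __mul__ and __sub__
    # are assumed to follow the same pattern as the add loop.
    def __init__(self, content, type='d'):
        self.size = len(content)
        self.type = type
        self.array = array.array(type, content)

    def __mul__(self, other):
        out = Vec([0] * self.size, self.type)
        i = 0
        while i < self.size:
            out.array[i] = self.array[i] * other.array[i]
            i += 1
        return out

    def __sub__(self, other):
        out = Vec([0] * self.size, self.type)
        i = 0
        while i < self.size:
            out.array[i] = self.array[i] - other.array[i]
            i += 1
        return out


def run_once(size=256, iterations=500):
    # size and the initial values are arbitrary choices for this sketch
    a = Vec([1.0] * size)
    b = Vec([0.5] * size)
    c = Vec([0.0] * size)
    start = time.time()
    for i in range(iterations):
        c = a * b
        a = c - b
    return time.time() - start, a


def benchmark(runs=5, warmup=2):
    # the warmup runs give the JIT a chance to trace and compile the loops
    for _ in range(warmup):
        run_once()
    times = [run_once()[0] for _ in range(runs)]
    return sum(times) / len(times), times


mean, times = benchmark()
print("mean: %.4f s over %d runs" % (mean, len(times)))
```

Running it once with `pypy --jit vectorize_user=1` and once without the flag should give a comparison along the lines of the one above, although the absolute numbers will depend on the machine and on the vector size.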
What about Python lists?
Honestly, I'm unsure if there is a real benefit. Since PyPy stores integer/float arrays (that are fully homogeneous) without the overhead of embedding each element in a PyObject, SIMD operations could be used for normal Python lists.
The problem with this optimization is that the program must run for a very long time and spend a significant fraction of time in the trace loop that has been optimized. The short evaluation above shows that there might be potential. I will further investigate, because this is a great way to find bugs in the implementation as well.
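For illustration, the kind of list loop discussed above might look like the following. This is just a sketch of a candidate loop: the function name `add_lists` is made up, and I am not claiming that the optimizer already handles this case.

```python
def add_lists(a, b, out=None):
    # a and b contain only floats, i.e. they are fully homogeneous
    if len(a) != len(b):
        raise ValueError("lists must have the same length")
    if out is None:
        out = [0.0] * len(a)
    i = 0
    while i < len(a):
        out[i] = a[i] + b[i]
        i += 1
    return out


a = [1.0, 2.0, 3.0]
b = [0.5, 0.5, 0.5]
result = add_lists(a, b)
```

The loop body has the same shape as the one in the Vec class. The difference is that a list can stop being homogeneous at any time, for example when a string is appended to it, so such a loop could only be vectorized as long as that does not happen.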
Traces emitted by the user program are much more complex than the ones in the NumPy library. While working on this over the last week I found many edge cases, and it even reminded me that I have left some TODOs in the source code.