{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "In lecture 10 of MCS 320, we time Python functions and make them more efficient through vectorization and Cython." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 1. Timing Python functions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This lecture follows Chapter 3 of *Sage for Power Users* by William Stein.\n", "We consider the computation of a floating-point approximation\n", "of a sum of square roots. Such sums occur in numerical integration.\n", "The function below approximates $\\pi/4$ by a Riemann sum for $\\int_0^1 \\sqrt{1-x^2}\\,dx$;\n", "multiplying the result by 4 gives an approximation of $\\pi$." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3.1395554669110264" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def python_sum_symbolic(n):\n", "    return float( sum(sqrt(1-(k/n)^2) for k in range(1, n+1)) )/n\n", "4*python_sum_symbolic(1000)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To benchmark the function, we can use ``timeit``.\n", "The ``timeit`` command is well suited to measuring\n", "the execution time of small code snippets." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5 loops, best of 3: 58.8 ms per loop" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "timeit('python_sum_symbolic(1000)')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For longer execution times, it is better to use ``cputime()``." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "time for python_sum_symbolic : 1.859\n" ] } ], "source": [ "t1 = cputime() # cpu time since Sage started\n", "python_sum_symbolic(10^4)\n", "ct1 = cputime(t1) # cpu time since t1\n", "print('time for python_sum_symbolic :', ct1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 2. 
Use numerical functions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first reason the function is so slow is that we use\n", "the symbolic sqrt function.\n", "We will use the sqrt function of the math module in Python instead.\n", "The explicit conversion to float is no longer needed because\n", "this sqrt returns a floating-point number." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3.139555466911023" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def python_sum(n):\n", "    from math import sqrt\n", "    return sum( sqrt(1-(k/n)^2) for k in range(1, n+1) )/n\n", "4*python_sum(1000)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "time for python_sum : 0.016000000000000014\n" ] } ], "source": [ "t2 = cputime()\n", "python_sum(10^4)\n", "ct2 = cputime(t2)\n", "print('time for python_sum :', ct2)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "25 loops, best of 3: 21.3 ms per loop" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "timeit('python_sum(10^4)')" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "speedup : 116.1874999999999\n" ] } ], "source": [ "print('speedup :', ct1/ct2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 3. Vectorize" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Instead of summing term by term in a Python loop, we can vectorize the computation with NumPy." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3.1435554669110277" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def numpy_sum(n):\n", "    from numpy import sqrt, sum, arange\n", "    # arange(n) gives k = 0, ..., n-1, whereas python_sum uses k = 1, ..., n,\n", "    # so numpy_sum(n) exceeds python_sum(n) by 1/n\n", "    x = arange(n)/float(n)\n", "    return sum(sqrt(1-x**2))/n\n", "4*numpy_sum(1000)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "625 loops, best of 3: 101 μs per loop" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "timeit('numpy_sum(10^4)')" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "time for python_sum : 2.1249999999999982\n", "time for numpy_sum : 0.031000000000000583\n" ] } ], "source": [ "# To compare against pure Python, we sum a million terms.\n", "t3 = cputime()\n", "python_sum(10^6)\n", "ct3 = cputime(t3)\n", "print('time for python_sum :', ct3)\n", "t4 = cputime()\n", "numpy_sum(10^6)\n", "ct4 = cputime(t4)\n", "print('time for numpy_sum :', ct4)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "speedup : 68.54838709677284\n" ] } ], "source": [ "print('speedup :', ct3/ct4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 4. 
Cythonize" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "%%cython\n", "def cython_sum(n):\n", "    from math import sqrt\n", "    return sum( sqrt(1-(k/n)**2) for k in range(1, n+1) )/n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An advantage of Cython is that the code is almost identical to Python.\n", "Evaluating a cell with Cython code returns two links: the generated C code\n", "and an HTML file with the annotated version of the Cython program.\n", "In this example we had to replace the ``^`` in ``python_sum`` with ``**``,\n", "because ``^`` is the exclusive-or operator in Cython, as in plain Python." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3.139555466911023\n" ] } ], "source": [ "print(4*cython_sum(1000))" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "5 loops, best of 3: 32.8 ms per loop" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "timeit('cython_sum(10^4)')" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "time for the cython_sum : 3.2029999999999976\n" ] } ], "source": [ "t5 = cputime()\n", "cython_sum(10^6)\n", "ct5 = cputime(t5)\n", "print('time for the cython_sum :', ct5)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "%%cython\n", "# declare the C sqrt from math.h, to be used instead of Python's math.sqrt\n", "cdef extern from \"math.h\":\n", "    double sqrt(double)\n", "\n", "def cython_sum_typed(long n):\n", "    cdef long k  # typed loop variable\n", "    return sum( sqrt(1-(k/float(n))**2) for k in range(1, n+1) )/n" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3.139555466911023\n" ] } ], "source": [ "print(4*cython_sum_typed(1000))" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { 
"text/plain": [ "625 loops, best of 3: 1.11 ms per loop" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "timeit('cython_sum_typed(10^4)')" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "time for cython_sum_typed : 0.10999999999999943\n" ] } ], "source": [ "t6 = cputime()\n", "cython_sum_typed(10^6)\n", "ct6 = cputime(t6)\n", "print('time for cython_sum_typed :', ct6)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "speedup : 29.11818181818195\n" ] } ], "source": [ "print('speedup :', ct5/ct6)" ] } ], "metadata": { "kernelspec": { "display_name": "SageMath 10.3", "language": "sage", "name": "sagemath" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10" } }, "nbformat": 4, "nbformat_minor": 2 }