Searching, Sorting, and Timing

Agenda

  1. Timing
  2. Prelude: Timing list indexing
  3. Linear search
  4. Binary search
  5. Insertion sort

1. Timing

In [2]:
import time
time.time()
Out[2]:
1568209970.4445367
In [4]:
from time import time, sleep
start = time()
sleep(1)
end = time()
print(end - start)
1.0000569820404053

2. Prelude: Timing list indexing

In [5]:
lst = [0] * 10**5
In [6]:
import timeit
timeit.timeit(stmt='lst[0]', globals=globals())
Out[6]:
0.28391270022756465
In [8]:
timeit.timeit(stmt='lst[10**5-1]', globals=globals())
Out[8]:
0.28254640894888006
In [9]:
times = [timeit.timeit(stmt='lst[{}]'.format(i), 
                       globals=globals(), 
                       number=100) 
         for i in range(10**5)]
In [10]:
times[:10]
Out[10]:
[7.225578877978478e-05,
 4.35176909547863e-05,
 4.433877947462861e-05,
 3.9822792700761056e-05,
 4.474932373454976e-05,
 4.433877947462861e-05,
 4.3928235214707456e-05,
 4.269660246336571e-05,
 4.269660246336571e-05,
 3.079081909618253e-05]
In [11]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot(times, 'ro')
plt.show()

Observation: accessing an element in a list by index takes a constant amount of time, regardless of position.

How? A Python list uses an array as its underlying data storage mechanism. To access an element in an array, the interpreter:

  1. Computes an offset into the array by multiplying the element's index by the size of each array entry (which are uniformly sized, since they are merely references to the actual elements)
  2. Adds the offset to the base address of the array
In [7]:
import timeit
[timeit.timeit(stmt='sorted(lst)',                    
        setup='import random ;  lst =[random.random() for i in range({})]'.format(j),
          number=100) 
 for j in (100, 200, 400, 800, 1600)]
Out[7]:
[0.0009114011355677576,
 0.002353631851406135,
 0.0059282127916446825,
 0.014969968922031285,
 0.05584713093386995]

Task: to locate an element with a given value in a list (array).

In [15]:
def index(lst, x):
    for i in range(len(lst)-1,-1,-1):  # range(len(lst))
        if lst[i]==x:
            return i
    return None    
In [21]:
lst = list(range(100))
index(lst, 10)
Out[21]:
10
In [22]:
lst[101]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-22-d21792a66db8> in <module>()
----> 1 lst[101]

IndexError: list index out of range
In [17]:
index(lst, 99)
Out[17]:
99
In [11]:
index(lst, -1)
In [18]:
import timeit
lst = list(range(1000))
times = [timeit.timeit(stmt='index(lst, {})'.format(x), 
                       globals=globals(), 
                       number=100)
         for x in range(1000)]
In [19]:
import matplotlib.pyplot as plt
plt.plot(times, 'ro')
plt.show()

Task: to locate an element with a given value in a list (array) whose contents are sorted in ascending order.

In [32]:
def index(lst, x):
    # assume that lst is sorted!!!
    low=0
    high=len(lst)-1
    while low<=high:
       # print('here')
        mid = low+(high-low)//2   # (low+high)//
        if lst[mid]==x:
            return mid
        elif lst[mid]<x:
            low=mid+1
        else:
            high=mid-1
    return None
            
In [37]:
def indexRec(lst, x,low=0, high=len(lst)-1):
    # assume that lst is sorted!!!
    if low>high:
        return None
    else:
        print('here')
        mid = low+(high-low)//2   # (low+high)//
        if lst[mid]==x:
            return mid
        elif lst[mid]<x:
            low=mid+1
        else:
            high=mid-1
        return indexRec(lst, x,low, high)
In [38]:
lst = list(range(1000))
indexRec(lst, 10)
here
here
here
here
here
here
here
here
Out[38]:
10
In [27]:
lst = list(range(1000))
index(lst, 10)
here
here
here
here
here
here
here
here
Out[27]:
10
In [28]:
index(lst, 999)
here
here
here
here
here
here
here
here
here
here
Out[28]:
999
In [29]:
index(lst, -1)
here
here
here
here
here
here
here
here
here
In [33]:
import timeit
lst = list(range(1000))
times = [timeit.timeit(stmt='index(lst, {})'.format(x), 
                       globals=globals(), 
                       number=1000)
         for x in range(1000)]
In [34]:
import matplotlib.pyplot as plt
plt.plot(times, 'ro')
plt.show()
In [35]:
import timeit
times = []
for size in range(1000, 100000, 100):
    lst = list(range(size))
    times.append(timeit.timeit(stmt='index(lst, -1)',
                               globals=globals(),
                               number=1000))
In [3]:
import matplotlib.pyplot as plt
plt.plot(times, 'ro')
plt.show()
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-3-a8270049b7fc> in <module>()
      1 import matplotlib.pyplot as plt
----> 2 plt.plot(times, 'ro')
      3 plt.show()

NameError: name 'times' is not defined
In [ ]:
import timeit
times = []
for e in range(5, 20):
    lst = list(range(2**e))
    times.append(timeit.timeit(stmt='index(lst, -1)',
                               globals=globals(),
                               number=100000))
In [ ]:
import matplotlib.pyplot as plt
plt.plot(times, 'ro')
plt.show()

5. Insertion sort

Task: to sort the values in a given list (array) in ascending order.

In [1]:
import random
lst = list(range(1000))
random.shuffle(lst)
In [4]:
import matplotlib.pyplot as plt
plt.plot(lst, 'ro')
plt.show()
In [12]:
def bubble_sort(lst):
    # inner loop is bubble, pairwise compares and swap if out of order
    # if bubble up, end up the largest at rightmost end
    # outer loop - do it again (bubble shorter)
    for j in range(len(lst)-1):
        for i in range(len(lst)-j-1):
            if lst[i]>lst[i+1]:
                lst[i],lst[i+1]=  lst[i+1],lst[i]
    
    
In [13]:
bubble_sort(lst)
In [16]:
plt.plot(lst, 'ro')
plt.show()
In [15]:
import timeit
import random
times = [timeit.timeit(stmt='bubble_sort(lst)',
                       setup='lst=list(range({})); random.shuffle(lst)'.format(size),
                       globals=globals(),
                       number=1)
         for size in range(100, 5000, 250)]
In [17]:
plt.plot(times, 'ro')
plt.show()
In [19]:
def createListAppend(n):
    lst=[]
    for j in range(n):
        lst.append((j,j**2))
    return lst
In [20]:
createListAppend(10)
Out[20]:
[(0, 0),
 (1, 1),
 (2, 4),
 (3, 9),
 (4, 16),
 (5, 25),
 (6, 36),
 (7, 49),
 (8, 64),
 (9, 81)]
In [22]:
def createListComprehension(n):
    return [(i,i**2) for i in range(n)]
In [23]:
createListComprehension(10)
Out[23]:
[(0, 0),
 (1, 1),
 (2, 4),
 (3, 9),
 (4, 16),
 (5, 25),
 (6, 36),
 (7, 49),
 (8, 64),
 (9, 81)]
In [26]:
import timeit
import random
times = [timeit.timeit(stmt='createListAppend({})'.format(size),
                       globals=globals(),
                       number=1)
         for size in range(100, 5000, 250)]
In [27]:
plt.plot(times, 'ro')
plt.show()
In [30]:
import timeit
import random
times = [timeit.timeit(stmt='createListComprehension({})'.format(size),
                       globals=globals(),
                       number=1)
         for size in range(100, 5000, 250)]
In [31]:
plt.plot(times, 'ro')
plt.show()