# Searching, Sorting, and Timing¶

## Agenda¶

1. Timing
2. Prelude: Timing list indexing
3. Linear search
4. Binary search
5. Insertion sort

## 1. Timing¶

In [2]:
import time
time.time()

Out[2]:
1568209970.4445367
In [4]:
from time import time, sleep
start = time()
sleep(1)
end = time()
print(end - start)

1.0000569820404053


## 2. Prelude: Timing list indexing¶

In [5]:
lst = [0] * 10**5

In [6]:
import timeit
timeit.timeit(stmt='lst[0]', globals=globals())

Out[6]:
0.28391270022756465
In [8]:
timeit.timeit(stmt='lst[10**5-1]', globals=globals())

Out[8]:
0.28254640894888006
In [9]:
times = [timeit.timeit(stmt='lst[{}]'.format(i),
globals=globals(),
number=100)
for i in range(10**5)]

In [10]:
times[:10]

Out[10]:
[7.225578877978478e-05,
4.35176909547863e-05,
4.433877947462861e-05,
3.9822792700761056e-05,
4.474932373454976e-05,
4.433877947462861e-05,
4.3928235214707456e-05,
4.269660246336571e-05,
4.269660246336571e-05,
3.079081909618253e-05]
In [11]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot(times, 'ro')
plt.show()


Observation: accessing an element in a list by index takes a constant amount of time, regardless of position.

How? A Python list uses an array as its underlying data storage mechanism. To access an element in an array, the interpreter:

1. Computes an offset into the array by multiplying the element's index by the size of each array entry (which are uniformly sized, since they are merely references to the actual elements)
In [7]:
import timeit
[timeit.timeit(stmt='sorted(lst)',
setup='import random ;  lst =[random.random() for i in range({})]'.format(j),
number=100)
for j in (100, 200, 400, 800, 1600)]

Out[7]:
[0.0009114011355677576,
0.002353631851406135,
0.0059282127916446825,
0.014969968922031285,
0.05584713093386995]

Task: to locate an element with a given value in a list (array).

In [15]:
def index(lst, x):
for i in range(len(lst)-1,-1,-1):  # range(len(lst))
if lst[i]==x:
return i
return None

In [21]:
lst = list(range(100))
index(lst, 10)

Out[21]:
10
In [22]:
lst[101]

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-22-d21792a66db8> in <module>()
----> 1 lst[101]

IndexError: list index out of range
In [17]:
index(lst, 99)

Out[17]:
99
In [11]:
index(lst, -1)

In [18]:
import timeit
lst = list(range(1000))
times = [timeit.timeit(stmt='index(lst, {})'.format(x),
globals=globals(),
number=100)
for x in range(1000)]

In [19]:
import matplotlib.pyplot as plt
plt.plot(times, 'ro')
plt.show()


Task: to locate an element with a given value in a list (array) whose contents are sorted in ascending order.

In [32]:
def index(lst, x):
# assume that lst is sorted!!!
low=0
high=len(lst)-1
while low<=high:
# print('here')
mid = low+(high-low)//2   # (low+high)//
if lst[mid]==x:
return mid
elif lst[mid]<x:
low=mid+1
else:
high=mid-1
return None


In [37]:
def indexRec(lst, x,low=0, high=len(lst)-1):
# assume that lst is sorted!!!
if low>high:
return None
else:
print('here')
mid = low+(high-low)//2   # (low+high)//
if lst[mid]==x:
return mid
elif lst[mid]<x:
low=mid+1
else:
high=mid-1
return indexRec(lst, x,low, high)

In [38]:
lst = list(range(1000))
indexRec(lst, 10)

here
here
here
here
here
here
here
here

Out[38]:
10
In [27]:
lst = list(range(1000))
index(lst, 10)

here
here
here
here
here
here
here
here

Out[27]:
10
In [28]:
index(lst, 999)

here
here
here
here
here
here
here
here
here
here

Out[28]:
999
In [29]:
index(lst, -1)

here
here
here
here
here
here
here
here
here

In [33]:
import timeit
lst = list(range(1000))
times = [timeit.timeit(stmt='index(lst, {})'.format(x),
globals=globals(),
number=1000)
for x in range(1000)]

In [34]:
import matplotlib.pyplot as plt
plt.plot(times, 'ro')
plt.show()

In [35]:
import timeit
times = []
for size in range(1000, 100000, 100):
lst = list(range(size))
times.append(timeit.timeit(stmt='index(lst, -1)',
globals=globals(),
number=1000))

In [3]:
import matplotlib.pyplot as plt
plt.plot(times, 'ro')
plt.show()

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-3-a8270049b7fc> in <module>()
1 import matplotlib.pyplot as plt
----> 2 plt.plot(times, 'ro')
3 plt.show()

NameError: name 'times' is not defined
In [ ]:
import timeit
times = []
for e in range(5, 20):
lst = list(range(2**e))
times.append(timeit.timeit(stmt='index(lst, -1)',
globals=globals(),
number=100000))

In [ ]:
import matplotlib.pyplot as plt
plt.plot(times, 'ro')
plt.show()


## 5. Insertion sort¶

Task: to sort the values in a given list (array) in ascending order.

In [1]:
import random
lst = list(range(1000))
random.shuffle(lst)

In [4]:
import matplotlib.pyplot as plt
plt.plot(lst, 'ro')
plt.show()

In [12]:
def bubble_sort(lst):
# inner loop is bubble, pairwise compares and swap if out of order
# if bubble up, end up the largest at rightmost end
# outer loop - do it again (bubble shorter)
for j in range(len(lst)-1):
for i in range(len(lst)-j-1):
if lst[i]>lst[i+1]:
lst[i],lst[i+1]=  lst[i+1],lst[i]


In [13]:
bubble_sort(lst)

In [16]:
plt.plot(lst, 'ro')
plt.show()

In [15]:
import timeit
import random
times = [timeit.timeit(stmt='bubble_sort(lst)',
setup='lst=list(range({})); random.shuffle(lst)'.format(size),
globals=globals(),
number=1)
for size in range(100, 5000, 250)]

In [17]:
plt.plot(times, 'ro')
plt.show()

In [19]:
def createListAppend(n):
lst=[]
for j in range(n):
lst.append((j,j**2))
return lst

In [20]:
createListAppend(10)

Out[20]:
[(0, 0),
(1, 1),
(2, 4),
(3, 9),
(4, 16),
(5, 25),
(6, 36),
(7, 49),
(8, 64),
(9, 81)]
In [22]:
def createListComprehension(n):
return [(i,i**2) for i in range(n)]

In [23]:
createListComprehension(10)

Out[23]:
[(0, 0),
(1, 1),
(2, 4),
(3, 9),
(4, 16),
(5, 25),
(6, 36),
(7, 49),
(8, 64),
(9, 81)]
In [26]:
import timeit
import random
times = [timeit.timeit(stmt='createListAppend({})'.format(size),
globals=globals(),
number=1)
for size in range(100, 5000, 250)]

In [27]:
plt.plot(times, 'ro')
plt.show()

In [30]:
import timeit
import random
times = [timeit.timeit(stmt='createListComprehension({})'.format(size),
globals=globals(),
number=1)
for size in range(100, 5000, 250)]

In [31]:
plt.plot(times, 'ro')
plt.show()