
Insertion Sort Algorithm

Quick reference

Complexity
Worst case time: O(n^2)
Best case time: O(n)
Average case time: O(n^2)
Space: O(1)

Strengths:

  • Intuitive. Ever arranged your cards while playing poker or go-fish? Chances are, you used something like insertion sort.
  • Space efficient. Insertion sort can be done in-place

    An in-place function modifies data structures or objects outside of its own stack frame

    Overview

    The call stack is what a program uses to keep track of function calls. The call stack is made up of stack frames—one for each function call.

    For instance, say we called a function that rolled two dice and printed the sum.

      import random

      def roll_die():
          return random.randint(1, 6)

      def roll_two_and_sum():
          total = 0
          total += roll_die()
          total += roll_die()
          print(total)

      roll_two_and_sum()
    

    First, our program calls roll_two_and_sum(). It goes on the call stack:

    roll_two_and_sum()

    That function calls roll_die(), which gets pushed on to the top of the call stack:

    roll_die()
    roll_two_and_sum()

    Inside of roll_die(), we call random.randint(). Here's what our call stack looks like then:

    random.randint()
    roll_die()
    roll_two_and_sum()

    When random.randint() finishes, we return back to roll_die() by removing ("popping") random.randint()'s stack frame.

    roll_die()
    roll_two_and_sum()

    Same thing when roll_die() returns:

    roll_two_and_sum()

    We're not done yet! roll_two_and_sum() calls roll_die() again:

    roll_die()
    roll_two_and_sum()

    Which calls random.randint() again:

    random.randint()
    roll_die()
    roll_two_and_sum()

    random.randint() returns, then roll_die() returns, putting us back in roll_two_and_sum():

    roll_two_and_sum()

    Which calls print():

    print()
    roll_two_and_sum()

    What's stored in a stack frame?

    What actually goes in a function's stack frame?

    A stack frame usually stores:

    • Local variables
    • Arguments passed into the function
    • Information about the caller's stack frame
    • The return address—what the program should do after the function returns (i.e.: where it should "return to"). This is usually somewhere in the middle of the caller's code.

    Some of the specifics vary between processor architectures. For instance, AMD64 (64-bit x86) processors pass some arguments in registers and some on the call stack. And, ARM processors (common in phones) store the return address in a special register instead of putting it on the call stack.
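    In Python, we can actually peek at live stack frames with the standard inspect module. Here's a minimal sketch (the function names are just for illustration):

      import inspect

      def outer():
          inner(answer=42)

      def inner(answer):
          doubled = answer * 2
          frame = inspect.currentframe()
          # Arguments and local variables live in the frame
          print(frame.f_locals)  # includes 'answer' and 'doubled'
          # The caller's frame tells us where we'll return to
          print(frame.f_back.f_code.co_name, frame.f_back.f_lineno)

      outer()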

    The Space Cost of Stack Frames

    Each function call creates its own stack frame, taking up space on the call stack. That's important because it can impact the space complexity of an algorithm. Especially when we use recursion.

    For example, if we wanted to multiply all the numbers between 1 and n, we could use this recursive approach:

      def product_1_to_n(n):
        return 1 if n <= 1 else n * product_1_to_n(n - 1)
    

    What would the call stack look like when n = 10?

    First, product_1_to_n() gets called with n = 10:

        product_1_to_n()    n = 10
    

    This calls product_1_to_n() with n = 9.

        product_1_to_n()    n = 9
    
        product_1_to_n()    n = 10
    

    Which calls product_1_to_n() with n = 8.

        product_1_to_n()    n = 8
    
        product_1_to_n()    n = 9
    
        product_1_to_n()    n = 10
    

    And so on until we get to n = 1.

        product_1_to_n()    n = 1
    
        product_1_to_n()    n = 2
    
        product_1_to_n()    n = 3
    
        product_1_to_n()    n = 4
    
        product_1_to_n()    n = 5
    
        product_1_to_n()    n = 6
    
        product_1_to_n()    n = 7
    
        product_1_to_n()    n = 8
    
        product_1_to_n()    n = 9
    
        product_1_to_n()    n = 10
    

    Look at the size of all those stack frames! The entire call stack takes up O(n) space. That's right—we have an O(n) space cost even though our function itself doesn't create any data structures!

    What if we'd used an iterative approach instead of a recursive one?

      def product_1_to_n(n):
        # We assume n >= 1
        result = 1
        for num in range(1, n + 1):
            result *= num
    
        return result
    

    This version takes a constant amount of space. At the beginning of the loop, the call stack looks like this:

        product_1_to_n()    n = 10, result = 1, num = 1
    

    As we iterate through the loop, the local variables change, but we stay in the same stack frame because we don't call any other functions.

        product_1_to_n()    n = 10, result = 2, num = 2
    

        product_1_to_n()    n = 10, result = 6, num = 3
    

        product_1_to_n()    n = 10, result = 24, num = 4
    

    In general, even though the compiler or interpreter will take care of managing the call stack for you, it's important to consider the depth of the call stack when analyzing the space complexity of an algorithm.

    Be especially careful with recursive functions! They can end up building huge call stacks.

    What happens if we run out of space? It's a stack overflow! In Python 3, you'll get a RecursionError.
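    Here's a minimal sketch that triggers one on purpose (the function name is our own, just for illustration):

      import sys

      def recurse_forever():
          # No base case, so every call adds another stack frame
          return recurse_forever()

      try:
          recurse_forever()
      except RecursionError:
          # Python refuses to grow the call stack past a set limit
          print("stack overflow! the limit is %d frames" % sys.getrecursionlimit())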

    If the very last thing a function does is call another function, then its stack frame might not be needed any more. The function could free up its stack frame before doing its final call, saving space.

    This is called tail call optimization (TCO). If a recursive function is optimized with TCO, then it may not end up with a big call stack.

    In general, most languages don't provide TCO. Scheme is one of the few languages that guarantee tail call optimization. Some Ruby, C, and JavaScript implementations may do it. Python and Java decidedly don't.
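    For the curious, here's what a tail-call version of product_1_to_n from above could look like: the recursive call is the very last thing the function does (the accumulator argument is our own addition for illustration). With TCO this would run in O(1) space; in Python it still builds an O(n) call stack.

      def product_1_to_n_tail(n, accumulator=1):
          # The recursive call is in "tail position": nothing is left
          # to do after it returns, so a TCO'd language could reuse
          # this stack frame. Python doesn't.
          if n <= 1:
              return accumulator
          return product_1_to_n_tail(n - 1, accumulator * n)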

    (i.e.: stored on the process heap or in the stack frame of a calling function). Because of this, the changes made by the function remain after the call completes.

    In-place algorithms are sometimes called destructive, since the original input is "destroyed" (or modified) during the function call.

    Careful: "In-place" does not mean "without creating any additional variables!" Rather, it means "without creating a new copy of the input." In general, an in-place function will only create additional variables that are O(1) space.

    An out-of-place function doesn't make any changes that are visible to other functions. Usually, those functions copy any data structures or objects before manipulating and changing them.

    In many languages, primitive values (integers, floating point numbers, or characters) are copied when passed as arguments, while more complex data structures (lists, heaps, or hash tables) are passed by reference. Python behaves much the same way: a function can't change a number or string for its caller, but it can mutate a list the caller passed in.
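    Here's a quick sketch of what that means in practice (the names are just for illustration):

      def try_to_modify(num, lst):
          num += 1       # rebinds the local name only; the caller's int is unchanged
          lst.append(4)  # mutates the shared list; the caller sees this

      n = 10
      my_list = [1, 2, 3]
      try_to_modify(n, my_list)
      print(n)        # 10
      print(my_list)  # [1, 2, 3, 4]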

    Here are two functions that do the same operation on a list, except one is in-place and the other is out-of-place:

      def square_list_in_place(int_list):
          for index, element in enumerate(int_list):
              int_list[index] *= element

          # NOTE: no need to return anything - we modified
          # int_list in place

      def square_list_out_of_place(int_list):
          # We allocate a new list with the length of the input list
          squared_list = [None] * len(int_list)

          for index, element in enumerate(int_list):
              squared_list[index] = element ** 2

          return squared_list
    

    Working in-place is a good way to save time and space. An in-place algorithm avoids the cost of initializing or copying data structures, and it usually has an O(1) space cost.

    But be careful: an in-place algorithm can cause side effects. Your input is "destroyed" or "altered," which can affect code outside of your function. For example:

      original_list = [2, 3, 4, 5]
      square_list_in_place(original_list)

      print("original list: %s" % original_list)
      # Prints: original list: [4, 9, 16, 25], confusingly!
    

    Generally, out-of-place algorithms are considered safer because they avoid side effects. You should only use an in-place algorithm if you're space constrained or you're positive you don't need the original input anymore, even for debugging.
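    If you do need to keep the original around, one option is to copy it first and work on the copy. A small sketch using square_list_in_place from above:

      original_list = [2, 3, 4, 5]
      copied_list = list(original_list)  # a shallow copy is enough for a flat list of ints
      square_list_in_place(copied_list)

      print(original_list)  # [2, 3, 4, 5] - untouched
      print(copied_list)    # [4, 9, 16, 25]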

    , requiring O(1) additional space.
  • Fast on a sorted list. If the input list is already sorted, then insertion sort runs in O(n) time.

Weaknesses:

  • Slow. Insertion sort usually takes O(n^2) time—too slow to be used on super-big data sets.

How It Works

Insertion sort works by inserting elements from an unsorted list into a sorted subsection of the list, one item at a time.

Here's how it'd work on this list:

Unsorted input: [8, 3, 2, 7, 9, 1, 4].

We'll break the list into two chunks: a sorted portion and an unsorted portion.

[8, 3, 2, 7, 9, 1, 4] can be broken into a sorted portion [8] and an unsorted portion [3, 2, 7, 9, 1, 4].

(The 0th item is "sorted" to start.)

Now we add the next item, 3, to our sorted portion.

(3) is the first element in the unsorted portion [3, 2, 7, 9, 1, 4].

It goes before 8, so we need to swap them.

(3) is smaller than its left neighbor (8). Swapping them, we have [3, 8, 2, 7, 9, 1, 4]. Now the first two elements in the list are sorted.

Next up we've got a 2.

(2) is the first element in the unsorted portion [2, 7, 9, 1, 4].

It goes before the 8, so we'll swap them.

(2) is smaller than its left neighbor (8). Swapping them, we have [3, 2, 8, 7, 9, 1, 4].

2 is also smaller than 3, so we need to swap those as well.

(2) is still smaller than its left neighbor (3). Swapping them, we have [2, 3, 8, 7, 9, 1, 4]. At this point, the first three elements are sorted.

As you can see, the idea is to "swap" each new item to the left until it's in the right spot.

The next unsorted item is a 7. It'll need to be swapped with the 8.

(7) is the first element in the unsorted portion [7, 9, 1, 4]. Since it's smaller than its left neighbor (8), they swap, creating [2, 3, 7, 8, 9, 1, 4].

Once that's done, we can compare 7 and 3. Since 7 is bigger, we don't need to swap them, so we're done moving the 7 down. We have [2, 3, 7, 8, 9, 1, 4], and the first four elements are sorted.

9 is next, and it's already in the right spot: (9) isn't smaller than its left neighbor (8), so it doesn't move at all. We still have [2, 3, 7, 8, 9, 1, 4], and the first five elements are sorted.

We'll have to swap a bunch of elements to get the 1 down to the start of the list.

(1) moves leftward, trading places with (9) [2, 3, 7, 8, 1, 9, 4], (8) [2, 3, 7, 1, 8, 9, 4], (7) [2, 3, 1, 7, 8, 9, 4], (3) [2, 1, 3, 7, 8, 9, 4], and (2) [1, 2, 3, 7, 8, 9, 4]. Finally, we have [1, 2, 3, 7, 8, 9, 4], and the first six elements are sorted.

Finally, we'll swap the 4 over to the right spot.

(4) moves leftward, trading places with (9) [1, 2, 3, 7, 8, 4, 9], (8) [1, 2, 3, 7, 4, 8, 9], and (7) [1, 2, 3, 4, 7, 8, 9] until it's larger than its left neighbor (3). We're left with [1, 2, 3, 4, 7, 8, 9], which is sorted.

Implementation

Let's code it up:

  def insertion_sort(the_list):

      # For each item in the input list
      for index in range(len(the_list)):

          # Shift it to the left until it's in the right spot
          # (a strict > comparison skips needless swaps of equal
          # items, which also keeps the sort stable)
          while index > 0 and the_list[index - 1] > the_list[index]:
              the_list[index], the_list[index - 1] = \
                  the_list[index - 1], the_list[index]
              index -= 1
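
For example, running it on the list from the walkthrough:

  unsorted_list = [8, 3, 2, 7, 9, 1, 4]
  insertion_sort(unsorted_list)
  print(unsorted_list)  # [1, 2, 3, 4, 7, 8, 9] - sorted in place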

Time Complexity

Worst Case

In the worst case, the input list is in descending order (reverse-sorted order). So each time we insert an element into the sorted portion, we'll need to swap it with each of the elements already in the sorted list to get it all the way to the start. That's 1 swap the first time, 2 swaps the second time, 3 swaps the third time, and so on, up to n - 1 swaps for the last item.
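To see those counts concretely, here's a quick sketch that tallies the swaps (the counting helper is our own addition, not part of the algorithm):

  def count_insertion_sort_swaps(the_list):
      # Same logic as insertion_sort above, but counts each swap
      swaps = 0
      for index in range(len(the_list)):
          while index > 0 and the_list[index - 1] > the_list[index]:
              the_list[index], the_list[index - 1] = \
                  the_list[index - 1], the_list[index]
              index -= 1
              swaps += 1
      return swaps

  print(count_insertion_sort_swaps([5, 4, 3, 2, 1]))  # 10, which is 1 + 2 + 3 + 4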

Adding them all up:

1 + 2 + 3 + ... + (n - 2) + (n - 1)

That's the triangular series, whose sum is O(n^2)

A triangular series is a series of numbers where each number could be the row of an equilateral triangle.

So 1, 2, 3, 4, 5 is a triangular series, because you could stack the numbers like this:

(Picture rows of 1, 2, 3, 4, and 5 circles stacked on top of each other to form a triangle.)

Their sum is 15, which makes 15 a triangular number.

A triangular series always starts with 1 and increases by 1 with each number.

Since the only thing that changes in a triangular series is the value of the highest number, it's helpful to give that a name. Let's call it n.

  # n is 8
1, 2, 3, 4, 5, 6, 7, 8

Triangular series are nice because no matter how large n is, it's always easy to find the total sum of all the numbers.

Take the example above. Notice that if we add the first and last numbers together, and then add the second and second-to-last numbers together, they have the same sum! This happens with every pair of numbers until we reach the middle. If we add up all the pairs of numbers, we get:

  1 + 8 = 9
2 + 7 = 9
3 + 6 = 9
4 + 5 = 9

This is true for every triangular series:

  1. Pairs of numbers on each side will always add up to the same value.
  2. That value will always be 1 more than the series' n.

This gives us a pattern. Each pair's sum is n + 1, and there are n / 2 pairs. So our total sum is: (n + 1) * n / 2

Or:

(n^2 + n) / 2

Ok, but does this work with triangular series with an odd number of elements? Yes. Let's say n = 5. So if we calculate the sum by hand:

1 + 2 + 3 + 4 + 5 = 15

And if we use the formula, we get the same answer:

(5^2 + 5) / 2 = 15
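The formula is also easy to sanity-check with a quick brute-force comparison:

  # Compare the formula against summing the numbers directly
  for n in (5, 8, 100):
      assert sum(range(1, n + 1)) == (n * n + n) // 2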

One more thing:

What if we know the total sum, but we don't know the value of n?

Let’s say we have:

1 + 2 + 3 + ... + (n - 2) + (n - 1) + n = 78

No problem. We just use our formula and set it equal to the sum!

(n^2 + n) / 2 = 78

Now, we can rearrange our equation to get a quadratic equation (remember those?):

n^2 + n = 156

n^2 + n - 156 = 0

Here's the quadratic formula:

n = (-b ± √(b^2 - 4ac)) / 2a

If you don't really remember how to use it, that's cool. You can just use an online calculator. We don't judge.

Plugging in a = 1, b = 1, and c = -156, we get (-1 ± √625) / 2, and √625 is 25. Taking the positive solution, we get n = (-1 + 25) / 2 = 12.

So for a triangular series, remember—the total sum is:

(n^2 + n) / 2

. Each swap costs constant time, so that's O(n^2) time overall.

Best Case

In the best case, the input list is already sorted. We'll still need to compare every element to the one before it, to check if it needs to be swapped. But the answer will always be "no." So we sidestep the swapping entirely and just do those O(n) comparisons, costing O(n) time overall.

Space Complexity

Insertion sort doesn't rely on any extra lists, so it's O(1) space.

What's next?

If you're ready to start applying these concepts to some problems, check out our mock coding interview questions.

They mimic a real interview by offering hints when you're stuck or you're missing an optimization.
