You only have 3 free questions left (including this one).

But it doesn't have to end here! Sign up for the 7-day coding interview crash course and you'll get a free Interview Cake problem every week.

Find a duplicate, Space Edition™ BEAST MODE

In Find a duplicate, Space Edition™, we were given an array of integers where:

  1. the integers are in the range 1..n1..n
  2. the array has a length of n+1n+1

These properties mean the array must have at least 1 duplicate. Our challenge was to find a duplicate number without modifying the input and optimizing for space. We used a divide and conquer approach, iteratively cutting the array in half to find a duplicate integer in O(nlgn)O(n\lg{n}) time and O(1)O(1) space (sort of a modified binary search).

But we can actually do better. We can find a duplicate integer in O(n)O(n) time while keeping our space cost at O(1)O(1).

This is a tricky one to derive (unless you have a strong background in graph theory), so we'll get you started:

Imagine each item in the array as a node in a linked list. In any linked list,

Quick reference

Worst Case
space O(n)O(n)
prepend O(1)O(1)
append O(1)O(1)
lookup O(n)O(n)
insert O(n)O(n)
delete O(n)O(n)

A linked list organizes items sequentially, with each item storing a pointer to the next one.

Picture a linked list like a chain of paperclips linked together. It's quick to add another paperclip to the top or bottom. It's even quick to insert one in the middle—just disconnect the chain at the middle link, add the new paperclip, then reconnect the other half.

An item in a linked list is called a node. The first node is called the head. The last node is called the tail.

Confusingly, sometimes people use the word tail to refer to "the whole rest of the list after the head."

A linked list with 3 nodes. The first node is labelled "head" and the last node is labelled "tail."

Unlike an array, consecutive items in a linked list are not necessarily next to each other in memory.

Strengths:

  • Fast operations on the ends. Adding elements at either end of a linked list is O(1)O(1). Removing the first element is also O(1)O(1).
  • Flexible size. There's no need to specify how many elements you're going to store ahead of time. You can keep adding elements as long as there's enough space on the machine.

Weaknesses:

  • Costly lookups. To access or edit an item in a linked list, you have to take O(i)O(i) time to walk from the head of the list to the iith item.

Uses:

  • Stacks and queues only need fast operations on the ends, so linked lists are ideal.

In PHP

Most languages (including PHP) don't provide a linked list implementation. Assuming we've already implemented our own, here's how we'd construct the linked list above:

  $a = new LinkedListNode(5);
$b = new LinkedListNode(1);
$c = new LinkedListNode(9);

$a->next = $b;
$b->next = $c;

Doubly Linked Lists

In a basic linked list, each item stores a single pointer to the next element.

In a doubly linked list, items have pointers to the next and the previous nodes.

A doubly-linked list with 3 nodes. The first node has value 5 with a "next" arrow pointing ahead to the second node and a "previous" arrow pointing back to "None." The second node has value 1 with a "next" arrow pointing ahead to the third node and a "previous" arrow pointing back to the first node. The third node has value 9 with a "next" arrow pointing ahead to "None" and a "previous" arrow pointing back to the second node.

Doubly linked lists allow us to traverse our list backwards. In a singly linked list, if you just had a pointer to a node in the middle of a list, there would be no way to know what nodes came before it. Not a problem in a doubly linked list.

Not cache-friendly

Most computers have caching systems that make reading from sequential addresses in memory faster than reading from scattered addresses.

Array items are always located right next to each other in computer memory, but linked list nodes can be scattered all over.

So iterating through a linked list is usually quite a bit slower than iterating through the items in an array, even though they're both theoretically O(n)O(n) time.

each node has a value and a "next" pointer. In this case:

  • The value is the integer from the array.
  • The "next" pointer points to the value-eth node in the list (numbered starting from 1). For example, if our value was 3, the "next" node would be the third node.

Here’s a full example:

An array [2, 3, 1, 3], so 2 is in the first position and points to 3 in the second position.

Notice we're using "positions" and not "indices." For this problem, we'll use the word "position" to mean something like "index," but different: indices start at 0, while positions start at 1. More rigorously: position = index + 1.

Using this, find a duplicate integer in O(n)O(n) time while keeping our space cost at O(1)O(1). Just like before, don't modify the input.

Drawing pictures will help a lot with this one. Grab some paper and pencil (or a whiteboard, if you have one).

Gotchas

We don’t need any new data structures. Your final space cost must be O(1)O(1).

We can do this without destroying the input.

Breakdown

Start your free trial!

Log in or sign up with one click to get immediate access to 3 free mock interview questions

Where do I enter my password?

Actually, we don't support password-based login. Never have. Just the OAuth methods above. Why?

  1. It's easy and quick. No "reset password" flow. No password to forget.
  2. It lets us avoid storing passwords that hackers could access and use to try to log into our users' email or bank accounts.
  3. It makes it harder for one person to share a paid Interview Cake account with multiple people.

Solution

Start your free trial!

Log in or sign up with one click to get immediate access to 3 free mock interview questions

Where do I enter my password?

Actually, we don't support password-based login. Never have. Just the OAuth methods above. Why?

  1. It's easy and quick. No "reset password" flow. No password to forget.
  2. It lets us avoid storing passwords that hackers could access and use to try to log into our users' email or bank accounts.
  3. It makes it harder for one person to share a paid Interview Cake account with multiple people.

Complexity

O(n)O(n) time and O(1)O(1) space.

Our space cost is O(1)O(1) because all of our additional variables are integers, which each take constant space.

For our runtime, we iterate over the array a constant number of times, and each iteration takes O(n)O(n) time in its worst case. So we traverse the linked list more than once, but it's still a constant number of times—about 3.

Bonus

There another approach using randomized algorithms that is O(n)O(n) time and O(1)O(1) space. Can you come up with that one? (Hint: You'll want to focus on the median.)

What We Learned

Start your free trial!

Log in or sign up with one click to get immediate access to 3 free mock interview questions

Where do I enter my password?

Actually, we don't support password-based login. Never have. Just the OAuth methods above. Why?

  1. It's easy and quick. No "reset password" flow. No password to forget.
  2. It lets us avoid storing passwords that hackers could access and use to try to log into our users' email or bank accounts.
  3. It makes it harder for one person to share a paid Interview Cake account with multiple people.

Do you have an answer?

6
7
8
# Find a number that appears more than once ... in O(n) time
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Reset editor

Powered by qualified.io

. . .