Design a texting app.
Kind of like Snapchat. But with text messages, not pictures.
First step: figure out what's in scope for our app.
What questions should we ask?
Don't worry if you've never built a mobile app before—we'll just be talking about the app's overall design. No iOS or Android specifics.
What are we building? A web API? Website? Mobile app? Focus on the mobile app portion for now.
What phones do we support? Android? iOS? Don't focus on specific phones yet; that's a bit more detail than we want at this stage.
Who can we chat with? Assume you'll be chatting with other Tapchat users. Each user has a unique cell phone number.
Is there a length limit for texts sent over Tapchat? Yes: 280 characters. Just like tweets.
What features do we need to support? Let's focus on the basics: sending and receiving messages. If we have time, we'll build up more functionality in our app at the end.
How many people do we want to support? We'll come up with a specific estimate in a bit. Think big—there are seven billion phones in the world.
Alright, this should be enough to get us started. We're building a texting app where users can exchange messages with each other.
Next up: let's crunch some numbers to figure out how big our system will need to be.
How many users do we have? And how many chats does each person send per day?
All of our users have to have mobile phones (since we're building a mobile app). And there are roughly 7 billion phones in the world.
Of course, not all mobile users will use our app. Like Snapchat, we'll probably be most popular with people in their teens and 20's. If that's roughly a sixth of the total mobile population, then we're looking at a little over 1 billion target users.
We'd love to have all of those people use our app. But we also want to be realistic when building out our system.
Facebook has spread widely around the globe, and with over 2 billion users, they're reaching about a third of the population. So, let's say we'll get a third of our target audience. That puts us around 400 million users.
As a reference point, Snapchat has around 160 million users. So we're definitely in the ballpark for how many users we could have when we go viral.
This is kind of tricky to estimate. Some users will probably send chats constantly, while others might only open the app every few days.
Let's say the most any user would send is 100 chats per day. That means we should be fine if our system can handle 40 billion chats per day in total. Call it 50 billion just to be on the safe side.
If our users send messages at a consistent rate throughout the day, that's about 2 billion chats per hour, 35 million chats per minute, or roughly 600,000 chats per second.
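Spelling that arithmetic out as a quick sanity check:

```python
chats_per_day = 50_000_000_000                     # our padded daily estimate

print(f"{chats_per_day / 24:,.0f}")                # ~2.1 billion chats per hour
print(f"{chats_per_day / (24 * 60):,.0f}")         # ~35 million chats per minute
print(f"{chats_per_day / (24 * 60 * 60):,.0f}")    # ~580,000 chats per second
```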
Will users really send messages at a consistent rate?
It's tempting to say yes. After all, with a global user base, any variation by time (e.g.: more chats sent in the evenings than overnight) should sort of balance out.
But what about events that cause a spike in the number of chats sent? We'll probably see lots of chats on holidays like Christmas. Or, if there's a natural disaster.
Since we're basing our numbers on the assumption that every user sends 100 chats per day (a high estimate), we have a good amount of extra capacity built in. That should cover any spikes we might see.
So, we'll keep working with our assumption that the most we'll ever see is roughly 600,000 chats per second. Most of the time we'll see fewer, and we really shouldn't ever see more, even if users start sending a lot of messages at once.
Do we even need storage? Can't we just deliver each message to its recipient immediately?
Not if their phone is turned off!
So, when a user sends a message, we'll need to store it briefly—just until we can deliver it to the recipient. In most cases, that shouldn't take long. But in rare cases (say, a user loses their phone and has to save up for a new one), it might be a while before we can deliver the message. To keep old messages from cluttering up our database, we'll delete any undelivered messages after they've been there for a month.
Okay, so how many chats will we have to store in our database?
As a rough upper limit, let's say that we have to be able to store about 50 billion chats—roughly the number of chats we'd get every day.
Earlier, we decided that each chat would be 280 characters. So, that's 280 bytes, plus some extra information (like its sender). As an estimate, let's say 300 bytes per chat.
Is 280 characters always 280 bytes?
It depends how we're storing our characters. If we only need to support English, then we can use an encoding like ASCII, where one character takes up one byte. But if we want to support other languages or emojis, we'll need a different encoding, like UTF-8. Then, 280 characters could take up more than 280 bytes.
For now, we'll stick with ASCII, since it'll make our calculations a bit simpler.
If we're storing 50 billion chats, at 300 bytes per chat, we're looking at about 15 terabytes of data.
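The same kind of quick check works for storage:

```python
chats_stored = 50_000_000_000      # about a day's worth of undelivered chats
bytes_per_chat = 300               # 280 ASCII characters plus sender/recipient metadata

total_bytes = chats_stored * bytes_per_chat
print(total_bytes / 10**12)        # 15.0 -> roughly 15 terabytes
```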
Now that we've got some numbers, let's come up with a minimum viable product that'll handle a small number of users. Then, we'll scale that up to our global audience of texters.
Okay, so to get started, what's in our minimum viable product?
The main building blocks of Tapchat are the user's phone, servers for handling requests (to either send a new message or request received messages), and some storage for tracking chats that haven't been delivered yet.
Drawing out all these components on a whiteboard, here's what we've got:
This isn't going to scale too well, but it gets the big pieces of our system on the drawing board and gives us something to work with.
Let's keep fleshing this out. What's our data model?
We're mainly keeping track of one thing: chats from one user to another.
Each chat has a sender, a recipient, and contents.
How should we store users, like the sender and recipient, in our database?
There are a few different ways we could do it.
We could have our users pick a unique user ID when they sign up or assign them one automatically. Do we need that though?
Since each user already has a distinct phone number, we could use that to identify them. International phone numbers are at most 15 digits, so stored as strings, that's 15 characters—15 bytes—per user.
We don't have to use strings, though. We could convert each phone number string into an actual number and store that. We'd need to handle values up to 10^15, which fits comfortably in 64 bits (8 bytes).
We'll go with that, since it's the most space efficient.
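Here's a minimal sketch of that conversion, assuming phone numbers arrive with ordinary formatting (digits plus separators):

```python
def phone_to_id(phone: str) -> int:
    """Turn a phone number string into a compact integer ID."""
    digits = "".join(ch for ch in phone if ch.isdigit())
    number = int(digits)
    assert number < 2**64       # at most 15 digits, so it fits easily in 64 bits
    return number

print(phone_to_id("+1 415-555-0123"))   # 14155550123
```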
So, here's what our database looks like:
Chats

| Field | Size |
| --- | --- |
| Sender (phone number) | 8 bytes |
| Recipient (phone number) | 8 bytes |
| Message contents (280 ASCII characters) | 280 bytes |
| **Total** | **296 bytes** |
Now that we've got a basic data model in place, let's run through the steps involved for sending and receiving messages.
Say one user, Klarissa, wants to send a chat to Ajit. What does that look like under the hood?
Klarissa types out her message in the Tapchat mobile app. Once she's finished, she'll tap "Send."
This sends Klarissa's message up to our app servers.
If we can deliver the message to Ajit, we'll do that. Otherwise, we'll add Klarissa's message to our database.
At some point later on, Ajit will open up the Tapchat app, which will connect to our app servers and request any new messages.
When we query the database for messages pending for Ajit, we'll find Klarissa's message, send it to Ajit, and delete it from our database.
What if Ajit turns off his phone after we send the message but before the message gets processed by the app? If we're not careful, we might lose the message.
To prevent this, we shouldn't delete a message from our database until we confirm the message has been successfully delivered to the user.
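Here's a rough sketch of that delivery rule, with a plain in-memory dictionary standing in for our database:

```python
# Pending chats, keyed by recipient phone number. A real system would use a
# proper datastore; this just illustrates the "delete only after an ack" rule.
pending_messages = {14155550123: ["Hey!", "You around?"]}

def send_to_device(recipient: int, message: str) -> bool:
    """Pretend to deliver a message; return True only if the device acknowledges it."""
    print(f"delivering to {recipient}: {message}")
    return True

def deliver_pending(recipient: int) -> None:
    for message in list(pending_messages.get(recipient, [])):
        if send_to_device(recipient, message):
            pending_messages[recipient].remove(message)   # delete only after the ack
        # If delivery isn't confirmed, the message stays put for a later attempt.

deliver_pending(14155550123)
```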
Great, looks like we've got the basic logic in place here.
Interviewer: "Let's dig into communication between the app servers and the recipient a bit more. How does the server get messages to the recipient?"
Since we're working with mobile phones, we can send push notifications.
How do push notifications work under the hood?
Each device manufacturer maintains a push notification service. Apple's is the Apple Push Notification Service (APNS). Google's is called Firebase Cloud Messaging. When a phone starts up, it sets up a secure connection with its appropriate Push Notification Service.
Each app that is allowed to send push notifications must register with the phone's operating system (iOS or Android). The phone OS requests a device-specific token from the Push Notification Service and gives this token to the app.
Application servers (e.g.: our Tapchat server) can use this token to send notifications to the Push Notification Service, which forwards the data to the user's device on behalf of the app.
So, when our servers get a message to be delivered to another user, they can turn around and send that message to the recipient through a push notification.
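In code, that hand-off might look roughly like this. The endpoint, credentials, and payload are simplified placeholders—each real push service (APNS, FCM) has its own API:

```python
import requests

def send_push(device_token: str, sender: int, text: str) -> bool:
    """Hand a chat to the push notification service for delivery.

    The URL, auth header, and payload here are stand-ins; the real services
    have their own endpoints, credentials, and message formats.
    """
    response = requests.post(
        "https://push.example.com/v1/send",                  # placeholder endpoint
        json={"token": device_token, "data": {"from": sender, "text": text}},
        headers={"Authorization": "Bearer <our-server-key>"},
        timeout=5,
    )
    return response.ok    # anything else means "fall back to letting the user pull"
```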
Of course, we'll need some way to handle cases where delivery of the push notification fails for some reason. (Maybe the user disabled push notifications. Or the push notification service is down. Or something else went wrong.)
In those cases, we'll need to wait for the user to manually request messages from us. We could build this into our app's user interface (UI), adding a button for requesting new messages or mimicking other apps by having users swipe down to fetch new messages.
We could also periodically check for messages in the background, assuming our app has the right permissions. But we need to be careful—if the app checks too often, it'll drain the battery!
Putting those together, here's what we have:
Interviewer: Okay, nice. Just brainstorming for a bit, how do you think the Push Notification Service delivers messages to the user's phone?
One way the communication could work is with polling. Periodically, the phone would reach out to the Push Notification Service and request any new notifications.
One issue: polling doesn't scale! Apple has hundreds of millions of iPhones, and they'd all be trying to connect to the Push Notification Service every minute. Yikes—that would take a lot of servers and network bandwidth.
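A quick back-of-envelope calculation shows why (the device count is a rough assumption):

```python
iphones = 1_000_000_000            # rough order of magnitude of active devices
poll_interval_seconds = 60         # every phone checks in once a minute

print(f"{iphones / poll_interval_seconds:,.0f} requests per second")
# ~16,666,667 requests per second—most of them returning nothing new
```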
Other connection types, like long polling or server-sent events, are a bit more complex to set up and administer, but they generate much less network traffic and scale more gracefully.
It's hard to know exactly what's happening under the hood of the Push Notification Service, but it probably looks a lot like server-sent events.
Interviewer: "Okay, great. Let's get back on track and move on to scaling up the system."
Right. We want to handle 50 billion chats per day. What changes should we make to scale up to this load?
With billions of chats per day, we're going to need a lot more than one web server.
We'll go with a tried-and-true technique here: an autoscaler (like Amazon's EC2 Auto Scaling) that spins up new servers on the fly as they're needed. To distribute load between them, we'll add in a load balancer.
Users (or rather, their apps) interact with our servers in two ways:

- Sending a new chat to another user.
- Requesting any chats that are waiting to be delivered to them.
We could design our system with two sets of servers: one set for handling incoming chats and another set for delivering chats to users. That might seem like a good idea, since it'll cleanly distinguish between these two actions.
Alternatively, we could just have one big group of servers, and we'd make each server capable of handling both actions.
Both options are fine. Let's go with the second option here. Being able to treat each server as interchangeable will make our system easier to maintain and administer, avoiding two separate sets of infrastructure.
Let's shift gears and focus on our database. How can we make it handle billions of messages per day?
Does every message make it to our database?
Maybe not. If we can quickly deliver a message using push notifications, we might not need to store it at all.
But we'll definitely need to store messages that we can't deliver via push notifications.
Just to be on the safe side, we'll make sure our system can handle the case where no messages are delivered quickly and all of them end up being written to the database.
With 50 billion chats per day, we're writing and reading about 15 terabytes of data every 24 hours. That's a lot of data, reads, and writes—probably more than a single machine could handle.
Once we break our database across multiple machines, our database management becomes much more complicated. Traditional SQL databases offer nice guarantees about data consistency (everyone sees the same data in the database) and are flexible to query. But they're trickier to scale across multiple machines. NoSQL databases scale well in a distributed system, but offer fewer features and guarantees.
Which type of database should we use?
The most important features of our database are scalability (since we want to be able to handle millions of phones) and speed (since we want our app to be reasonably fast). And, we don't need strong consistency guarantees or database transactions.
Given these priorities, a NoSQL database seems like a better choice.
Interviewer: "Can you talk a bit about the different types of NoSQL databases? Any ideas on what type you'd use here?"
There are four main types of NoSQL databases:

- Key-value stores, which map a single key to a single value.
- Document databases, which store semi-structured documents (think JSON blobs) under a key.
- Column databases (wide-column stores), which group data into rows with flexible columns, often sorted by key.
- Graph databases, which store nodes and the edges connecting them.
"There are so many NoSQL databases! How do I know which one to use?"
Even if you're not familiar with the specifics of a particular NoSQL database, you should know the different types of NoSQL databases (i.e.: the ones we listed above) and pick one to use.
There might be more than one option that makes sense—in some cases you can use a column database or a document database. Pick whichever one seems to best fit the problem.
We want a NoSQL database that pairs a user's phone number with a list of messages to deliver to them.
All of these types could work. Let's go with a document database here, since it cleanly corresponds to our model where we're storing a list of chats for each user, keyed by their phone number.
Our database entries might look something like this (the field names are just illustrative):
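```python
# One document per recipient, keyed by their phone number (illustrative sketch).
entry = {
    "_id": 14155550123,                               # recipient's phone number
    "pending_messages": [
        {"sender": 14155550199, "text": "Hey, you up?"},
        {"sender": 14155550142, "text": "Lunch tomorrow?"},
    ],
}
```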
What if we chose graph databases instead? That could work!
If you have experience working with NoSQL graph databases, you could suggest using a graph database like Neo4J. Each user could be a node in the database, and we'd add edges to represent messages from one user to another. When we receive a message, we'd add a new edge to our graph. When asked to deliver messages, we'd iterate through all the edges adjacent to the recipient's node.
Interviewer: "Okay, good. Taking a step back to our earlier discussion on SQL and NoSQL, what are common challenges with NoSQL, and how might they complicate our app's design?"
Broadly, NoSQL databases take away three "nice" features that SQL databases have:

- Database transactions
- Flexible queries (joins and other complex lookups)
- Strong consistency (every read reflecting the latest write)
Which of these could complicate the design of our app?
Let's run through that list with our app in mind.
We don't need database transactions, so that's not a problem.
We're not doing any fancy lookups—in fact, our data model is really quite simple. So that's OK too.
We do make updates to our databases though, so we need to figure out how we should handle concurrent updates and if we're OK working with data that might not be completely up to date.
What are the specific scenarios in Tapchat that we need to consider?
Here are two big ones:

- Two people send chats to the same user at about the same time. We can't afford to drop either message.
- A user requests their messages twice in quick succession. We shouldn't deliver the same message to them twice.
Let's start with the first one. What do we need to do to make sure we don't drop any messages?
If a user gets a message from two people at the same time, then it's possible that our database will have two distinct values (messages) associated with one key (the recipient's phone number).
That's an inconsistency because looking up the one key can produce different results depending on which database server ends up handling the request.
Here, we'd want to make sure that our database handles this scenario by combining all of the pending messages for a user and setting that as the value for that user's entry.
Of course, the specifics of how we'd implement this will depend on what database we're specifically using. Something to keep in mind when we're choosing between NoSQL options.
Some column databases, like Bigtable or HBase, store their data sorted by key. These databases allow you to efficiently look up all entries whose keys begin with a certain value.
If we had chosen one of these databases for Tapchat, then we could assign each message a unique key, sidestepping the issue of concurrent key updates entirely.
We might construct keys by concatenating the recipient, sender, and a timestamp together. Since these databases are sorted by key, as long as we start our key with the recipient's phone number, all messages sent to the recipient will appear next to each other in our database. That makes it easy to find all the messages that we need to deliver when they're requested.
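For example, here's a sketch of that key scheme (the zero-padding just keeps the string keys sortable, and 15 digits covers any phone number):

```python
import time

def make_chat_key(recipient: int, sender: int) -> str:
    """Build a sortable key: recipient first, so all of one user's pending
    chats sit next to each other in a key-sorted store."""
    return f"{recipient:015d}:{sender:015d}:{time.time():.3f}"

print(make_chat_key(14155550123, 14155550199))
# e.g. "000014155550123:000014155550199:1488645600.000"
```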
What about the second scenario? Where a user's request for messages sends back old messages that have already been delivered?
How might this scenario occur? Let's walk through the steps:
Suppose Sofia checks for new messages—sending a request to our app servers and getting back a chat from Min:
Once the message has been delivered, we can delete it from the database. But, while the delete is being processed, Sofia issues another request for new messages.
This request goes to a different server—one that hasn't processed the delete yet. This server delivers the same message from Min to Sofia:
Whoops. Sofia just got Min's message twice.
Where should we try to fix this? In our databases? App servers? Or on the mobile devices?
It might be tempting to try to tweak our databases to prevent this scenario from happening. After all, the whole problem is that our database is sending us back stale data.
Hang on a second though. We're using a NoSQL database, which means that we might not be guaranteed to always read up-to-date data. So, unless we are sure to use a database that does have those guarantees, we can't prevent stale reads in our database.
What about in our app servers?
That's tempting. We could give each message a unique identifier and have our app servers track which identifiers they've already sent down to mobile devices.
Why do we need a separate identifier? Couldn't we just use a hash of the message content?
Hashing the message would only work if each message were unique. If someone sent the same message twice, it would have the same hash, and we'd erroneously only deliver one copy!
Imagine someone sending "Wake up!" and then "Wake up!" again—both copies would hash to the same value.
How do messages get their unique identifiers? Given all the difficulty we have syncing up information between our database and app servers, it'll be hard to ensure that the IDs are truly unique.
One common solution to this issue is to just randomly generate a 128-bit number and use that as the ID. Since there are so many possibilities, it's exceedingly unlikely that the same number will be generated twice.
To be thorough, we should make sure nothing bad happens if we do have the same ID generated twice. In this case, we might end up erroneously deleting the message with the duplicate ID. If we're worried, we could combine the random ID with other message metadata, like the sender and a timestamp, to make it even less likely we'd have two messages that might be mistaken as duplicates.
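A sketch of that combined ID scheme (uuid4 is just a standard way to get 128 random bits):

```python
import time
import uuid

def make_message_id(sender: int) -> str:
    """A random 128-bit ID, plus sender and timestamp to make it even less
    likely that two different messages ever look like duplicates."""
    return f"{uuid.uuid4().hex}:{sender}:{int(time.time())}"

print(make_message_id(14155550199))
# e.g. "3f1c9b2a5e8d4c7fa0b1d2e3f4a5b6c7:14155550199:1488645600"
```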
One issue with that logic though. We have multiple app servers, and there's no guarantee that requests from one device will always go to the same app server. We'd need some way for all of the app servers to agree on which chats have already been sent out ... but that could be tricky to scale.
Another valid solution would be to use load balancers with sticky sessions. Those load balancers will send traffic from the same device to the same app server every time, which fixes the issue we had above where a device requested data from two different app servers.
But there's a tradeoff: we're giving up some flexibility with load balancing. If one server is under a heavy load and another is free, we can't send traffic from any of our users with existing sessions on the busy server over to the free one. All we can do is direct traffic from new users to the free server.
The logic of associating each message with a unique ID and using that to filter out duplicates read from our database seems promising. Can we implement it in our mobile app so that it runs on the user's phone?
That should work! We'll have our app track the IDs of messages it's delivered recently. And, if it gets a new message with an ID it's seen before, it'll treat it as a duplicate and won't pass it up to the user.
Let's run through our example again and make sure this fixes it.
Like last time, Sofia requests any new chats from our app servers. We send back a message from Min, this time tagged with an ID of 101:
What happens if Sofia requests her messages again, before the message from Min is completely deleted from our database?
Our database could send the message again, and it'll arrive at Sofia's phone. But this time, the app sees that message ID 101 has already been delivered, so it discards the duplicate.
This way, Sofia only sees the message once. Nice!
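That check is small enough to sketch out here; a real app would also persist the seen IDs so they survive an app restart:

```python
# Runs on the phone: remember recently delivered message IDs and drop repeats.
delivered_ids: set = set()

def show_to_user(text: str) -> None:
    print(f"New chat: {text}")

def handle_incoming(message_id: str, text: str) -> None:
    if message_id in delivered_ids:
        return                      # duplicate—already shown to the user
    delivered_ids.add(message_id)
    show_to_user(text)

handle_incoming("101", "Hey Sofia!")   # shown
handle_incoming("101", "Hey Sofia!")   # silently dropped
```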
Here's another approach to the duplicate message problem. Some document databases, like CouchDB, associate data with a revision number that gets updated each time the data changes.
If we were using CouchDB, we could have each phone track the latest revision it's seen so far and include that revision in its requests to the app servers. If our app server does a lookup and gets back a revision that's older than the one sent by the user, those messages have already been delivered and can be deleted.
Interviewer: "We're almost out of time. Just to make sure we're on the same page on what you've built, walk me through the steps of sending and receiving messages in Tapchat."
Here's a final picture of our system's architecture:
Here's what happens on a send:

1. The sender types out a message in the Tapchat app and taps "Send."
2. The app sends the message up to our app servers, through the load balancer.
3. If we can deliver the message right away with a push notification, we do.
4. Otherwise, we store the message in our NoSQL database, keyed by the recipient's phone number.
There could be other users sending chats to the same person. That's OK—our NoSQL database can reconcile concurrent updates.
And, here's what happens when a user requests any new messages manually:

1. The app sends a request up to our app servers, again through the load balancer.
2. A server looks up the user's pending messages in our database and sends them down to the phone, each tagged with its unique ID.
3. Once delivery is confirmed, the server deletes those messages from the database.
4. The app checks each incoming ID against the IDs it's seen recently and only shows new messages to the user.
If the same user immediately requests their messages again, and their request goes to a different server, it's possible our app servers will send back the same message twice. But, thanks to our ID checks, it'll only make it up to the user once.
Having fun? There are plenty of additional pieces of the system left to think about.