You left your computer unlocked and your friend decided to troll you by copying a lot of your files to random spots all over your file system.
Even worse, she saved the duplicate files with random, embarrassing names ("this_is_like_a_digital_wedgie.txt" was clever, I'll give her that).
Write a function that returns an array of all the duplicate files. We'll check them by hand before actually deleting them, since programmatically deleting files is really scary. To help us confirm that two files are actually duplicates, return an array of FilePaths objects with variables for the original and duplicate paths:
class FilePaths: CustomStringConvertible {
    let duplicatePath: String
    let originalPath: String

    init(duplicatePath: String, originalPath: String) {
        self.duplicatePath = duplicatePath
        self.originalPath = originalPath
    }

    var description: String {
        return "(original: \(originalPath), duplicate: \(duplicatePath))"
    }
}
For example:
(duplicatePath: "/tmp/parker_is_dumb.mpg", originalPath: "/home/parker/secret_puppy_dance.mpg")
(duplicatePath: "/home/trololol.mov", originalPath: "/etc/apache2/httpd.conf")
You can assume each file was only duplicated once.
Each "fingerprint" takes O(1) time and space, so our total time and space costs are O(n), where n is the number of files on the file system.
If we add a last-minute check that two files with the same fingerprint really are the same file (which we probably should), then in the worst case every file is identical and we have to read each one's full contents to confirm it. That gives us a runtime on the order of the total size of our files on disk.
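To make those two costs concrete, here is a minimal sketch of a constant-size fingerprint plus the last-minute full comparison. The fingerprinting scheme is an assumption for illustration (file size plus three fixed-size samples), not necessarily the one the solution uses, and the names sampleFingerprint, haveSameContents, and sampleSize are hypothetical.

```swift
import Foundation

// Hypothetical helper: builds a fingerprint from the file's size plus up to
// three fixed-size samples (start, middle, end). Because the amount of data
// read is capped, each fingerprint costs O(1) time and space.
func sampleFingerprint(atPath path: String, sampleSize: Int = 4000) -> Data? {
    guard let handle = FileHandle(forReadingAtPath: path) else { return nil }
    defer { handle.closeFile() }

    let fileSize = handle.seekToEndOfFile()
    var fingerprint = Data("\(fileSize):".utf8)
    let chunk = UInt64(sampleSize)

    if fileSize <= 3 * chunk {
        // Small file: the samples would overlap, so just read all of it.
        handle.seek(toFileOffset: 0)
        fingerprint.append(handle.readDataToEndOfFile())
    } else {
        let offsets: [UInt64] = [0, (fileSize - chunk) / 2, fileSize - chunk]
        for offset in offsets {
            handle.seek(toFileOffset: offset)
            fingerprint.append(handle.readData(ofLength: sampleSize))
        }
    }
    return fingerprint
}

// The last-minute check: compares full contents, so its cost grows with
// the size of the files rather than staying constant.
func haveSameContents(_ firstPath: String, _ secondPath: String) -> Bool {
    return FileManager.default.contentsEqual(atPath: firstPath, andPath: secondPath)
}
```

Two files that match in size and in all three samples but differ somewhere else would share a fingerprint, which is exactly why we'd want haveSameContents to run before anything gets reported as a duplicate.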
If we wanted to get this code ready for a production system, we might want to make it a bit more modular. Try separating the file traversal code from the duplicate detection code.
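One way that separation might look is sketched below. The names regularFiles, FileRecord, and findDuplicates are hypothetical, and treating the more recently edited file as the duplicate is an assumption made for the example. FilePaths is repeated from the problem statement only so the sketch compiles on its own.

```swift
import Foundation

// Repeated from the problem statement so this sketch is self-contained.
class FilePaths: CustomStringConvertible {
    let duplicatePath: String
    let originalPath: String
    init(duplicatePath: String, originalPath: String) {
        self.duplicatePath = duplicatePath
        self.originalPath = originalPath
    }
    var description: String {
        return "(original: \(originalPath), duplicate: \(duplicatePath))"
    }
}

// Hypothetical record: everything duplicate detection needs to know about a file.
struct FileRecord {
    let path: String
    let fingerprint: String
    let lastEdited: Date
}

// Traversal only: walks the tree with an explicit stack and lists files.
// It does not guard against link cycles.
func regularFiles(under root: String) -> [String] {
    let fileManager = FileManager.default
    var files: [String] = []
    var stack = [root]
    while let current = stack.popLast() {
        var isDirectory: ObjCBool = false
        guard fileManager.fileExists(atPath: current, isDirectory: &isDirectory) else { continue }
        if isDirectory.boolValue {
            let children = (try? fileManager.contentsOfDirectory(atPath: current)) ?? []
            for child in children {
                stack.append(URL(fileURLWithPath: current).appendingPathComponent(child).path)
            }
        } else {
            files.append(current)
        }
    }
    return files
}

// Detection only: never touches the disk, so it can be tested with made-up records.
func findDuplicates(in records: [FileRecord]) -> [FilePaths] {
    var seen: [String: FileRecord] = [:]
    var duplicates: [FilePaths] = []
    for record in records {
        guard let earlier = seen[record.fingerprint] else {
            seen[record.fingerprint] = record
            continue
        }
        // Assumption: the more recently edited file is the duplicate.
        if record.lastEdited > earlier.lastEdited {
            duplicates.append(FilePaths(duplicatePath: record.path, originalPath: earlier.path))
        } else {
            duplicates.append(FilePaths(duplicatePath: earlier.path, originalPath: record.path))
            seen[record.fingerprint] = record
        }
    }
    return duplicates
}
```

A caller would map each path from regularFiles to a FileRecord using whatever fingerprint it prefers, then hand the records to findDuplicates. Either half can then be swapped out or tested without the other.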
What about concurrency? Can we go faster by splitting this procedure into multiple threads? Also, what if a background process edits a file while our script is running? Will this cause problems?
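As a starting point for the threading question, here is a rough sketch that fingerprints many files in parallel with Grand Central Dispatch. The helper name fingerprintsInParallel is hypothetical, and the fingerprint function is passed in as a closure so no particular scheme is assumed. Whether this is actually faster depends on the disk, since the work is mostly waiting on reads. It also does nothing about files that get edited mid-run.

```swift
import Foundation
import Dispatch

// Hypothetical helper: runs `fingerprint` over every path on multiple threads
// and returns the results in the same order as `paths`.
func fingerprintsInParallel(for paths: [String],
                            using fingerprint: (String) -> String?) -> [String?] {
    var results = [String?](repeating: nil, count: paths.count)
    let lock = NSLock()

    // concurrentPerform blocks until every iteration has finished.
    DispatchQueue.concurrentPerform(iterations: paths.count) { index in
        let value = fingerprint(paths[index])  // the slow part, done in parallel

        // Writes to the shared array are serialized with a lock.
        lock.lock()
        results[index] = value
        lock.unlock()
    }
    return results
}
```

Only the fingerprinting runs concurrently here. Grouping the results into duplicates can stay single-threaded, which keeps the shared dictionary out of the picture entirely.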
What about link files (files that point to other files or folders)? One gotcha here is that a link file can point back up the file tree. How do we keep our file traversal from going in circles?
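One possible answer, sketched under the assumption that we want to follow links but visit each real location only once: resolve every path to its link-free form and keep a set of the ones we've already seen. The function name regularFilesFollowingLinksOnce is hypothetical.

```swift
import Foundation

// Hypothetical traversal that follows links but never visits the same
// real file or folder twice, so a link pointing back up the tree
// cannot send us in circles.
func regularFilesFollowingLinksOnce(under root: String) -> [String] {
    let fileManager = FileManager.default
    var visited = Set<String>()
    var files: [String] = []
    var stack = [root]

    while let current = stack.popLast() {
        // Resolve any links so two routes to the same place compare equal.
        let canonical = URL(fileURLWithPath: current).resolvingSymlinksInPath().path
        guard visited.insert(canonical).inserted else { continue }

        var isDirectory: ObjCBool = false
        // A link whose target no longer exists fails this check and is skipped.
        guard fileManager.fileExists(atPath: canonical, isDirectory: &isDirectory) else { continue }

        if isDirectory.boolValue {
            let children = (try? fileManager.contentsOfDirectory(atPath: canonical)) ?? []
            for child in children {
                stack.append(URL(fileURLWithPath: canonical).appendingPathComponent(child).path)
            }
        } else {
            files.append(canonical)
        }
    }
    return files
}
```

A side effect of this design is that a link to a file is reported under the target's real path and only once, so a link and its target never show up as a "duplicate" pair. Note that links pointing outside the root are still followed, which may or may not be what we want.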