Trying to pull a one-liner out of thin air can be daunting and error-prone. Instead of trying to tackle that head-on, let's work on understanding the framework of our answer, and only afterwards try to convert it into a one-liner. Our first thought might be to keep a running count as we look through the file:
count = 0
text = fh.read()
for character in text:
    if character.isupper():
        count += 1
This is a great start—the code isn't very long, but it is clear. That makes it easier to iterate over as we build up our solution. There are two main issues with what we have so far if we want to turn it into a one-liner:
- Variable initialization: In a one-liner, we won't be able to use something like count = 0.
- Memory: The question says this needs to work even on files that won't fit into memory, so we can't just read the whole file into a string.
Let's try to deal with the memory issue first: we can't use the read method since that reads the whole file at once. There are some other common file methods, readlines and readline, that might help, so let's look at them.
- readlines is a method that reads a file into a list, where each line is a different item in the list. That doesn't really help us: it's still reading the entire file at once, so we still won't have room.
- readline only reads a single line at a time, so it seems more promising. This is great from a memory perspective (let's assume each line fits in memory, at least for now).
But, we can't just replace read with readline because that only gives us the first line. We need to call readline over and over until we read the entire file.
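Spelled out by hand, calling readline over and over might look like the sketch below. It uses io.StringIO as a stand-in for a real file handle (an assumption made only so the example is self-contained) and collects the lines into a list so the result is easy to inspect.

```python
import io

# io.StringIO stands in for a real file handle here (an assumption for
# illustration); it supports readline the same way an open text file does.
fh = io.StringIO("First Line\n\nThird Line\n")

lines = []
while True:
    line = fh.readline()   # reads one line, keeping its trailing newline
    if line == "":         # an empty string means there is nothing left to read
        break
    lines.append(line)
```

Only one line is held in the variable line at any moment; the list is there purely for the demonstration and would be dropped when counting characters in a large file.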
The idea of repeatedly calling a function, such as readline, until we hit some value (the end of the file) is so common, there's a standard library function for it: iter. We'll need the two-argument form of iter, where the first argument is our function to call repeatedly, and the second argument is the value that tells us when to stop (also called the sentinel).
What value do we need as our sentinel? According to the documentation, readline keeps the trailing newline character, so even a blank line comes back as a string with at least one character. It returns an empty string only when it hits the end of the file, so our sentinel is ''.
count = 0
for line in iter(fh.readline, ''):
    for character in line:
        if character.isupper():
            count += 1
And this works! But...it's not as clear as it could be. Understanding this code requires knowing about the two-argument iter and that readline returns '' at the end of the file. Trying to condense all this into a one-liner seems like it might be confusing to follow.
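The two facts this code depends on can be checked directly. The sketch below again uses io.StringIO in place of a real file (an assumption for illustration), with sample text chosen so that it contains a blank line and a last line without a trailing newline.

```python
import io

# Stand-in for a real file handle; the last line has no trailing newline.
fh = io.StringIO("Ab\n\nCD")

first = fh.readline()    # a normal line, newline included
blank = fh.readline()    # a blank line is still one character long
last = fh.readline()     # the final line, without a newline
at_end = fh.readline()   # nothing left, so readline returns ''

# Rewind and let two-argument iter make the same calls for us.
# It stops as soon as readline returns the sentinel ''.
fh.seek(0)
lines = list(iter(fh.readline, ""))
```

Because a blank line is returned as a newline character rather than as an empty string, it never matches the sentinel, and iteration only ends at the true end of the file.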
Is there a simpler way to iterate over the lines in the file? If you're using Python 3, there aren't any methods for that on your file handle. If you're using Python 2.7, there is something that sounds interesting—xreadlines. It iterates over the lines in a file, yielding each one to let us process it before reading the next line. In our code, it might be used like:
count = 0
for line in fh.xreadlines():
    for character in line:
        if character.isupper():
            count += 1
It's exactly like our code with readline and iter, but even clearer! It's a shame we can't use this in Python 3.x, though; it seems like it would be great. Let's look at the documentation for this method to see if we can learn what alternatives Python 3.x might have:
>>> help(fh.xreadlines)
xreadlines() -> returns self.
For backwards compatibility. File objects now include the performance
optimizations previously implemented in the xreadlines module.
Huh? "returns self"—how does that even do anything?
What's happening here is that iterating over the lines of a file is so common that it was built right into the file object itself. If we use our file object as an iterator (for example, directly in a for loop), it starts yielding us lines, just like xreadlines! So we can clean up our code, and make it Python 3.x compatible, by just removing xreadlines.
count = 0
for line in fh:
    for character in line:
        if character.isupper():
            count += 1
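To see that a file object really is its own iterator, here is a small Python 3 check. The temporary file and its contents are made up for this sketch; any text file would do.

```python
import os
import tempfile

# Write a small throwaway file so there is a real file handle to inspect.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Hello World\nfoo BAR\n")
    path = tmp.name

with open(path) as fh:
    is_own_iterator = iter(fh) is fh   # asking for an iterator hands back fh itself
    first_line = next(fh)              # pulls a single line, much like readline
    remaining = list(fh)               # picks up where next() left off

os.remove(path)
```

Note that the file is consumed as it is iterated: once the loop has reached the end, iterating over the same handle again yields nothing unless it is reopened or rewound with seek(0).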
Alright, we've finally solved the issue of efficiently reading the file and iterating over it, but we haven't made any progress on making it a one-liner. As we said in the beginning, we can't initialize variables, so what we need is a function that will just return the count of all capitalized letters. There isn't a count function in Python (at least, not one that would help us here), but we can rephrase the question just enough to find a function that gets the job done.
Instead of thinking about a "count of capitalized letters", let's think about mapping every letter (every character, even) to a number, since our answer is a number. All we care about are capital letters, and each one adds exactly 1 to our final count. Every other character should be ignored, or add 0 to our final count. We can get this mapping into a single line using Python's inline if-else:
count = 0
for line in fh:
    for character in line:
        count += (1 if character.isupper() else 0)
What did this mapping get us? Well, Python didn't have a function to count capital letters, but it does have a function to add up a bunch of 1s and 0s: sum.
sum takes any iterable, such as a generator expression, and our latest solution—nested for loops and a single if-else—can easily be rewritten as a generator expression:
count = sum(1 if character.isupper() else 0 for line in fh for character in line)
And now we've got a one-liner! It's not quite as clear as it could be, though: it seems unnecessary to explicitly add 0 whenever we have a character that isn't a capital letter. We can filter those out:
count = sum(1 for line in fh for character in line if character.isupper())
or we can even take advantage of the fact that Python treats True as 1 and False as 0 in arithmetic (bool is a subclass of int):
count = sum(character.isupper() for line in fh for character in line)
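To try the one-liners end to end, they can be wrapped in a small self-contained script. This is only a sketch: the helper name count_capitals and the sample text are made up for illustration, and the file is opened separately for each variant because a file handle can only be iterated through once.

```python
import os
import tempfile

def count_capitals(path):
    """Count uppercase characters in the file at `path` (hypothetical helper)."""
    with open(path) as fh:
        # Only one line is held in memory at a time.
        return sum(1 for line in fh for character in line if character.isupper())

# Create a small sample file to count in.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Hello World\nfoo BAR\n\nno capitals here\n")
    path = tmp.name

total = count_capitals(path)

# The boolean-summing variant gives the same answer on a fresh handle.
with open(path) as fh:
    via_bools = sum(character.isupper() for line in fh for character in line)

os.remove(path)
```

Here the sample text contains five capital letters (H, W, B, A, R), so both total and via_bools come out as 5.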