Today's Linux one-liner answers a simple-sounding question:
What is the largest file within an entire directory tree?
Answering this is not as straightforward as it may sound at first. But with a clever combination of easy-to-understand command-line tools (a.k.a. a “one-liner”), it's solvable in seconds.
Are you ready? Then let's go.
Imagine a filled-up disk, and imagine you want to free up some space. For this, you want to start with the largest file on this disk.
The first tool that comes to mind for showing and handling the size of files is “ls”: if you call "ls -l" within a directory, you see the disk space occupied by each individual file.
I bet you know an output like this …
… where the 5th column shows you the size of every single file. (There are a few more files within this directory; I'm showing only the first seven here.)
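In case you don't have a terminal at hand, a listing of this kind could look roughly like the following (owner, dates and most of the names and sizes are made up purely for illustration):

    $ ls -l
    -rw-r--r-- 1 robert users 1258291 Mar  3 10:12 demolog
    -rw-r--r-- 1 robert users   14022 Mar  3 10:12 notes.txt
    drwxr-xr-x 2 robert users    4096 Mar  3 10:12 bin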
Now let's add two command-line options to ls - just to see the largest file from this list in a convenient way: the two switches "-h" and "-S".
The first of the two, the "-h" command-line switch, gives you the file sizes in a “human-readable” format. That means you no longer have to count the digits of a raw byte count. Instead, you'll see the sizes in an easily readable form like “1.2M”, for instance.
The second command-line switch, "-S" (that's an uppercase “S”), does all the “heavy lifting” of sorting the files by their size.
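Put together, the call looks like this:

    $ ls -lhS    # long listing, human-readable sizes, sorted by size (largest first)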
As you see, the largest file in the current directory here is the file “demolog” with a size of 1.2MB. (Again - there are a few more files within this directory; I'm showing only the first lines of output here.)
But this is only the largest file within the current directory itself.
As you can see in the screenshot, there are a few other directories contained within the current directory (“bin” and “project” are shown here). And these directories could well contain files larger than the largest one we've found so far …
So, how can we include the files from these subdirectories in our search, too?
How can we find the largest file recursively from an entire directory tree?
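To spoil the final destination a little, one possible answer is a pipeline like the one below. Take it as a sketch only - the "-printf" action used here is specific to GNU find:

    # print the size in bytes and the path of every file below the current
    # directory, sort numerically in descending order, keep the top entry
    find . -type f -printf '%s %p\n' | sort -rn | head -n 1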
If you want to talk about the load of a Linux system, or if you want to measure it for monitoring purposes, you are always talking about a sequence of three decimal values:
The screenshot above shows the output of the command “w” on a reasonably busy Linux system; at the end of its first line you see the three decimal numbers describing the load average of the system.
But what are these numbers telling you?
In contrast to Windows-based systems, where you typically talk about CPU utilization in percent, the three values of the load average give you great insight into whether your system is idle or loaded.
And with the addition of a few details about your system (the number of CPU cores, for instance), it is really easy for you to judge the current load.
Is your system overloaded? And if so, how many more resources are needed to serve the current requirements?
… or maybe your system isn't overloaded at all, although the numbers suggest so at first glance.
A second useful piece of information you'll get from these numbers is whether you are observing your system during a short-lived load peak or whether the load your system is dealing with is a constant one.
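By the way: the three values are simply the system load averaged over the last 1, 5 and 15 minutes, and you can read them on any Linux box without a screenshot:

    $ uptime               # the three numbers at the end are the 1-, 5- and 15-minute load averages
    $ cat /proc/loadavg    # the same values, straight from the kernel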
Let's start by talking about the one that calculates these three numbers for us: the process scheduler.
Sometimes you are faced with a problem at the Linux command line that tries hard to force you to write a script - with Perl, with Python, or just a quick-and-dirty shell script.
But most of the time, you do not even need to leave the command line to solve the task.
The trick is just to know the right tools and combine them cleverly.
Just as with the example I want to show you here.
And you'll hear me saying this once again: there are plenty of other ways at the Linux command line to solve this or similar tasks. So take this example as inspiration for how to tackle tasks like this. And take it as a demonstration of the power that the Linux command line gives you.
That said - let's jump in and have a look at the task we want to solve today:
Calculate the top 10 IP addresses hitting a website, based on the web server's log file
… without scripting! Just by using a clever combination of Linux command line tools.
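To give you the destination up front, here is a sketch of such a combination. It assumes a log file named "access.log" in the common/combined log format, where the client IP address is the first field of every line:

    # extract the first field (the client IP) of every log line,
    # count how often each distinct address occurs,
    # and print the ten most frequent ones
    awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -n 10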
As part of the “Ask Robert” campaign, I recently got the following question:
“Assuming I have a huge directory of files of different types. How can I - with the help of the shell and awk - get a list of all *.tex and *.txt files within this directory, that contain a certain string within the first N lines?”
(For the sake of readability I slightly edited the wording. Ulrike - I hope you’re ok with this.)
This is a great task - first, to demonstrate the power of the Linux command line, and second, to give you an understanding of how to tackle these types of problems.
And as always - there are many more ways to get this problem solved.
So take this as an inspiration for similar tasks.
Ok - so let’s start.
This is what we have:
many, many files within a directory
some of them are *.txt and *.tex files
And that’s what we want:
The name of every *.txt or *.tex file that contains a search phrase within its first N lines
Let's start with the part that searches the files for a search phrase.
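As a sketch of this building block, awk can do the line-limited search all by itself. The phrase "foobar", the limit of 5 lines and the file name are just placeholders here, and "nextfile" (which skips the rest of a file after the first hit) is a GNU awk feature:

    # print the file name if the phrase occurs within the first 5 lines;
    # FNR is the line number within the currently processed file
    awk 'FNR<=5 && /foobar/ {print FILENAME; nextfile}' somefile.txt

Feeding this command with the right files - all the *.tex and *.txt files - is then the second half of the job.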
If you are working at the Linux shell, then every now and then the output a command gives you will be more extensive than what your terminal window can display.
If this happens, the output that doesn't fit into the terminal simply scrolls out at the top and is not visible anymore.
But what if this is the output you’re interested in?
Of course you can have a look at the scrollbars your terminal window may give you (perhaps you've opened a graphical terminal window, or you are connected via PuTTY).
Then take the mouse and simply scroll up.
But wait! What if the terminal doesn’t give you a scrollbar?
What if you are accessing the Linux system from the “real” console, for instance within a virtualization environment such as VMware vSphere or Hyper-V?
Aaand … don’t the cool guys always try hard to avoid using the mouse? ;-)
Well - because I know you are one of these cool guys, I have three ways for you to handle massive text output at the command line.
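Let me give you a first taste with the most classic of them: piping the output into a pager such as "less" (the "dmesg" command here is just an example producer of long output):

    dmesg | less    # page through the output; arrow keys scroll, "/" searches, "q" quits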
The statement “everything is a file” is a piece of core Unix/Linux philosophy.
The idea here is the following:
No matter what you want to accomplish on a system - all you need to get the job done are tools that somehow work with files.
Print them out, modify them, copy them, move them … you name it.
Well - that doesn't really apply to everything. But as you work more and more on Linux systems, you will recognize this pattern again and again.
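Two small examples of this pattern (the device name "sda" varies from system to system):

    cat /proc/cpuinfo    # CPU details, readable like any ordinary text file
    ls -l /dev/sda       # a whole disk, represented as a (block device) file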
No - I don’t have any empirical data for this. But …
if I had to name the top 5 frustrations of Linux beginners, then “command not found” errors would most certainly be on this list.
Especially if you are new to the Linux command line, it’s not always obvious what is causing this error.
First: You simply don’t know all the tools
As you start exploring the Linux command line (aka “the Linux shell”), you simply don’t yet know the tools that are available there.
There are loads of them, and unfortunately not every tool is available on every Linux distribution. But don't worry - you will get to know them step by step as you proceed on your Linux journey.
(I’ll show you a shortcut for getting all the really essential tools later on - but that’s _not_ the point of this post at all)
The point here is …
The shell searches in its own way …
To understand (and really overcome) these “command not found” errors, we need to understand what causes them, don’t we?
So let's face the challenge and have a look at how the shell searches for a command to run.
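As a small sketch to set the stage: for external commands, the shell walks through the directories listed in the PATH variable, and you can ask it what it would actually run ("type -a" is a bash builtin):

    echo "$PATH"    # the colon-separated list of directories the shell searches
    type -a ls      # every alias, builtin or binary the shell would use for "ls"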