Today's Linux one-liner answers a simple-sounding question:
What is the largest file within an entire directory tree?
Answering this is not as straightforward as it may sound at first. But with a clever combination of easy-to-understand command-line tools (a.k.a. a “one-liner”), it's solvable in seconds.
Are you ready? Then let's go.
Imagine a filled-up disk, and imagine you want to free up some space. For this, you want to start with the largest file on this disk.
The first tool that comes to mind for showing and handling the size of files is “ls”: if you call "ls -l" within a directory, you see the disk space occupied by each individual file.
I bet you know an output like this …
… where the 5th column shows you the size of every single file. (There are a few more files within this directory; I'm showing only the first seven here.)
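In case you don't have a terminal at hand, a listing of this kind could look roughly like the following (owner, dates and most of the names and sizes are made up purely for illustration):

    $ ls -l
    -rw-r--r-- 1 robert users 1258291 Mar  3 10:12 demolog
    -rw-r--r-- 1 robert users   14022 Mar  3 10:12 notes.txt
    drwxr-xr-x 2 robert users    4096 Mar  3 10:12 bin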
Now let's add two command-line options to ls - just to see the largest file from this list in a convenient way: the two switches "-h" and "-S".
The first of the two, the "-h" command-line switch, gives you the file sizes in a “human-readable” format. That means you no longer have to count the digits of a raw byte count. Instead, you'll see the sizes in an easily readable form like “1.2M”, for instance.
The second command-line switch, "-S" (that's an uppercase “S”), does all the “heavy lifting” of sorting the files by their size.
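Put together, the call looks like this:

    $ ls -lhS    # long listing, human-readable sizes, sorted by size (largest first)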
As you see, the largest file in the current directory here is the file “demolog” with a size of 1.2MB. (Again - there are a few more files within this directory; I'm showing only the first lines of output here.)
But this is only the largest file within the current directory itself.
As you can see in the screenshot, there are a few other directories contained within the current directory (“bin” and “project” are shown here). And these directories could well contain files larger than the largest one we've found so far …
So, how can we include the files from these subdirectories in our search, too?
How can we find the largest file recursively from an entire directory tree?
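To spoil the final destination a little, one possible answer is a pipeline like the one below. Take it as a sketch only - the "-printf" action used here is specific to GNU find:

    # print the size in bytes and the path of every file below the current
    # directory, sort numerically in descending order, keep the top entry
    find . -type f -printf '%s %p\n' | sort -rn | head -n 1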
If you want to talk about the load of a Linux system, or if you want to measure it for monitoring purposes, you are always talking about a sequence of three decimal values:
The screenshot above shows the output of the command “w” on a reasonably busy Linux system; at the end of its first line you see the three decimal numbers describing the load average of the system.
But what are these numbers telling you?
In contrast to Windows-based systems, where you typically talk about CPU utilization in percent, the three values of the load average give you great insight into whether your system is idle or loaded.
And with the addition of a few details about your system (the number of CPU cores, for instance), it is really easy for you to judge the current load.
Is your system overloaded? And if so, how many more resources are needed to serve the current requirements?
… or maybe your system isn't overloaded at all, although the numbers suggest so at first glance.
A second useful piece of information you'll get from these numbers is whether you are observing your system during a short-lived load peak or whether the load your system is dealing with is a constant one.
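By the way: the three values are simply the system load averaged over the last 1, 5 and 15 minutes, and you can read them on any Linux box without a screenshot:

    $ uptime               # the three numbers at the end are the 1-, 5- and 15-minute load averages
    $ cat /proc/loadavg    # the same values, straight from the kernel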
Let's start by talking about the one that calculates these three numbers for us: the process scheduler.
Sometimes you are faced with a problem at the Linux command line that tries hard to force you to write a script - with Perl, with Python, or just a quick-and-dirty shell script.
But most of the time, you do not even need to leave the command line to solve the task.
The trick is just to know the right tools and combine them cleverly.
Just as with the example I want to show you here.
And you'll hear me saying this once again: there are plenty of other ways at the Linux command line to solve this or similar tasks. So take this example as inspiration for how to tackle tasks like this. And take it as a demonstration of the power that the Linux command line gives you.
That said - let's jump in and have a look at the task we want to solve today:
Calculate the top 10 IP addresses hitting a website, based on the web server's log file
… without scripting! Just by using a clever combination of Linux command line tools.
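To give you the destination up front, here is a sketch of such a combination. It assumes a log file named "access.log" in the common/combined log format, where the client IP address is the first field of every line:

    # extract the first field (the client IP) of every log line,
    # count how often each distinct address occurs,
    # and print the ten most frequent ones
    awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -n 10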
As part of the “Ask Robert” campaign, I recently got the following question:
“Assuming I have a huge directory of files of different types. How can I - with the help of the shell and awk - get a list of all *.tex and *.txt files within this directory, that contain a certain string within the first N lines?”
(For the sake of readability I slightly edited the wording. Ulrike - I hope you’re ok with this.)
This is a great task - first, to demonstrate the power of the Linux command line, and second, to give you an understanding of how to tackle these types of problems.
And as always - there are many more ways to get this problem solved.
So take this as an inspiration for similar tasks.
Ok - so let’s start.
This is what we have:
many, many files within a directory
some of them are *.txt and *.tex files
And that’s what we want:
The name of every *.txt or *.tex file that contains a search phrase within its first N lines
Let's start with the part that searches the files for a search phrase.
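As a sketch of this building block, awk can do the line-limited search all by itself. The phrase "foobar", the limit of 5 lines and the file name are just placeholders here, and "nextfile" (which skips the rest of a file after the first hit) is a GNU awk feature:

    # print the file name if the phrase occurs within the first 5 lines;
    # FNR is the line number within the currently processed file
    awk 'FNR<=5 && /foobar/ {print FILENAME; nextfile}' somefile.txt

Feeding this command with the right files - all the *.tex and *.txt files - is then the second half of the job.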
If you are working at the Linux shell, then every now and then the output a command gives you will be more extensive than what your terminal window can display.
If this happens, the output that doesn't fit into the terminal simply scrolls out at the top and is not visible anymore.
But what if this is the output you’re interested in?
Of course you can have a look at the scrollbars your terminal window may give you (perhaps you've opened a graphical terminal window, or you are connected via PuTTY).
Then take the mouse and simply scroll up.
But wait! What if the terminal doesn’t give you a scrollbar?
What if you are accessing the Linux system from the “real” console, for instance within a virtualization environment such as VMware vSphere or Hyper-V?
Aaand … don’t the cool guys always try hard to avoid using the mouse? ;-)
Well - because I know you are one of these cool guys, I have three ways for you to handle massive text output at the command line.
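Let me give you a first taste with the most classic of them: piping the output into a pager such as "less" (the "dmesg" command here is just an example producer of long output):

    dmesg | less    # page through the output; arrow keys scroll, "/" searches, "q" quits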
The statement “everything is a file” is a piece of core Unix/Linux philosophy.
The idea here is the following:
No matter what you want to accomplish on a system - all you need to get the job done are tools that somehow work with files.
Print them out, modify them, copy them, move them … you name it.
Well - that doesn't really apply to everything. But as you work more and more on Linux systems, you will recognize this pattern again and again.
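Two small examples of this pattern (the device name "sda" varies from system to system):

    cat /proc/cpuinfo    # CPU details, readable like any ordinary text file
    ls -l /dev/sda       # a whole disk, represented as a (block device) file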
No - I don’t have any empirical data for this. But …
if I had to name the top 5 frustrations of Linux beginners, then “command not found” errors would most certainly be on this list.
Especially if you are new to the Linux command line, it’s not always obvious what is causing this error.
First: You simply don’t know all the tools
As you start exploring the Linux command line (aka “the Linux shell”), you simply don’t yet know the tools that are available there.
There are loads of them, and unfortunately not every tool is available on every Linux distribution. But don't worry - you will get to know them step by step as you proceed on your Linux journey.
(I’ll show you a shortcut for getting all the really essential tools later on - but that’s _not_ the point of this post at all)
The point here is …
The shell searches in its own way …
To understand (and really overcome) these “command not found” errors, we need to understand what causes them, don’t we?
So let's face the challenge and have a look at how the shell searches for a command to run.
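As a small sketch to set the stage: for external commands, the shell walks through the directories listed in the PATH variable, and you can ask it what it would actually run ("type -a" is a bash builtin):

    echo "$PATH"    # the colon-separated list of directories the shell searches
    type -a ls      # every alias, builtin or binary the shell would use for "ls"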