Your first steps with regular expressions - the essentials

When you search for something at the Linux command line, you’ll typically come across two different types of search patterns.

First, there are the ones you probably already use intuitively for file operations, like in cp *.pdf /tmp - which means "copy all files ending with .pdf to /tmp". These patterns are known as filename globbing. You typically use them when you want to address multiple files at once on the command line.

The second type of pattern you'll come across is the regular expression. At first glance, regular expressions might look similar to filename globbing, but they operate very differently. And more importantly, they're incredibly powerful for searching.

For example, let’s look at a regular expression for matching email addresses:

\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b

And here’s one for matching any possible IP address:

\b((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\b

A very prominent tool you will typically use in combination with regular expressions is grep, so I'll use it here for the illustrations. (For an introduction to using grep, see Searching with grep: The Essentials.)

But over time you'll find that many other Linux command-line tools - find, sed and awk, to name just a few - can also leverage the power of regular expressions.

From the examples above you can guess that regular expressions can quickly become quite complex. But don’t worry - as a starting point you can achieve quite a lot with just a few basics. And these basics are typically referred to as “Basic Regular Expressions (BRE)” or simply standard regular expressions.

I typically use the phrase "standard regular expressions" in courses and conversations as a contrast to the much more complex "Extended Regular Expressions (ERE)".

The most important patterns of standard regular expressions

Let’s dive into the use of standard regular expressions with an example:

Let’s say you want to search through the file “/etc/passwd” (this file contains the locally defined users of a system) for the user named “max”.

To do this search with the command grep, you first need to know about the layout of the file, which is really straightforward: every single line describes the properties of a user in seven fields, and these fields are separated by colons (":").

Here are a few possible lines from “/etc/passwd” for illustration.

...
tux:x:1099:1099:Tux:/home/tux:/bin/bash
max:x:1100:1100:Max:/home/max:/bin/bash
test1:x:1101:1101:Testuser 1:/home/max:/bin/false
...

Search for something at the beginning of a line

If we are now searching for the defined user “max”, we could simply try the following:

grep max /etc/passwd

But because the user "test1" has a home directory that also contains the phrase "max" (have a look at the 6th field), we need to specify what exactly we are looking for: the phrase "max" exactly at the beginning of a line (the first field contains the username), followed by a colon.
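To make this concrete, here is how such an anchored search could look - a self-contained sketch that pipes the sample lines from above into grep instead of reading the real file:

```shell
# "^" anchors the match at the beginning of the line,
# the trailing ":" makes sure we match the whole username field
printf 'tux:x:1099:1099:Tux:/home/tux:/bin/bash\nmax:x:1100:1100:Max:/home/max:/bin/bash\ntest1:x:1101:1101:Testuser 1:/home/max:/bin/false\n' \
  | grep '^max:'
```

Against the real file you would simply run: grep '^max:' /etc/passwd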

Continue reading »

cat - This multitool can help you more than you think

When you think about essential Linux command-line tools, cat is probably one of the first that comes to mind. Short for "concatenate", this seemingly simple command is part of the basic equipment of every Linux user's toolkit. Most commonly, it's used to display the contents of a file in the terminal - a quick and easy way to peek inside a document.

But if you think that’s all cat is good for - you’re in for a surprise :-)

cat has a few tricks up its sleeve that can help you streamline your workflow: from transforming data to merging files or creating new ones - cat definitely deserves a closer look.

And along the way, I promise, we will stumble upon one or the other interesting tool or concept too …

Let’s start simple and understand the basic way cat is working

If you start cat on the command line without any additional parameters, you will "lose your prompt": you'll have only a cursor on a blank line as a sign that you can enter some text:

Now if you enter some text and finish your input line by hitting "<enter>", cat will immediately repeat the line you just typed in:

After that - again an empty line with a lonely cursor. Now you can enter the next line, which will also be repeated, and so on. (You can stop the cat command in this state at any time by simply hitting <ctrl>+c.)

What we just observed is exactly how cat works:

  • It reads in data line by line from its input datastream, which is by default bound to the terminal - and therefore to your keyboard.
  • The output of cat then goes to its output datastream, which in this simple example is also bound to the terminal.
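You can observe the same copy-through behavior without typing interactively, by feeding cat from a pipe - a minimal sketch:

```shell
# cat simply copies its input datastream to its output datastream -
# here the input comes from a pipe instead of the keyboard
printf 'first line\nsecond line\n' | cat
```

The output is exactly the two lines that went in.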

For illustration: This is part of a screenshot taken from the video I linked below: On the left-hand side I’ve tried to draw a keyboard, on the right-hand side a terminal (such an artist I am … :))

Continue reading »

Searching with grep: The Essentials

If I had to name the most important and useful tools for the Linux command line, then grep would definitely be within the top 5.

The tool “grep” is the powerful workhorse every time you need to search for content - whether in a file or a datastream.

  • You are searching for a phrase within a file? use grep!
  • You are searching for all the files containing a phrase? use grep!
  • You want to filter the looong output of a command for relevant information? use grep!
  • You want to extract strings from a file based on a given pattern? again: use grep!

For the last one I already created an article here: How to extract strings by a given search-pattern

In this post I wanna focus on the essentials of using grep and how it can become an invaluable part of your daily toolkit.

The basics of using grep

There are typically two ways to use grep at the command line: with or without a filename as a parameter. When you provide a filename, grep obviously searches within that file. When you don’t, grep searches within its input data stream, which will typically be the output of another command.

example 1: search within a file

robert@demo:~$ grep robert /etc/passwd
robert:x:1003:1003::/home/robert:/bin/bash

In this example, I search for the phrase “robert” in the file “/etc/passwd”.

example 2: search within a data stream

robert@demo:~$ ps aux | grep "^postfix"
postfix   2268  0.0  0.5  90516  5284 ?        S     2022   0:32 qmgr -l -t unix -u
postfix  22675  0.0  0.6  90440  6764 ?        S    07:50   0:00 pickup -l -t unix -u

Here I take the output of the command “ps aux” and filter only for lines starting with the phrase “postfix”.

As you can see, the typical behavior of grep is to search the data line by line and to print out the entire line if it is matched by the given search pattern.

Hint: Do you wonder about the leading “^” in the search-phrase “^postfix”? This is a special character in “regular expressions” to mark the start of a line. This tells grep to only match lines that begin with “postfix”. If you want to learn more, see Your first steps with regular expressions - the essentials

The two ways of using grep

So if we want to have a simple formula, the typical usage of grep is one of these two:

first: To search within a file, give the filename as a parameter:

grep [<options>] <searchpattern> <filename>

or second: To search in the output of a command (aka a "datastream"), push this datastream via the pipe sign ("|") to the grep command:

<command> | grep [<options>] <searchpattern>

Where the thing I wrote within square brackets ("[<options>]") is - uhm - optional. So - besides the search pattern and perhaps the filename - there is no need to give any additional parameters to grep.

But a few parameters are really useful to know, so I will cover them here:

Continue reading »

Book Launch 2024, my try to boost it, and giveaways

Sometimes more is better. And now it is one of those times.

… and I would like to ask you a favor.

Today is August 13th, and the paperback-version of the new ShellToolBox 3.0 is officially out and available on Amazon!

This new version is a complete overhaul of the previous version: Even better explanations and examples, added tools and - last but not least - an added tool index at the end (an option that was often asked for).

But to get some grip on Amazon within the crowded book space (with estimated thousands of new books launched every single day), my plan is to boost the sales in the first week after the launch, which involves giving away great goodies.

And this is where the favor I am asking of you comes into play …

Continue reading »

Linux one-liner of the day: Normalize the load average

During a Linux class, the following question came up:

“How to normalize the load average for easier monitoring”

A student, utilizing check_mk for system monitoring, wanted to apply what he had just learned:

To properly evaluate the load average of a Linux system, one must consider the number of available CPUs.

Or simply put: you need to divide the reported load average by the number of available CPUs to determine whether your system is currently overloaded or not.

I have discussed this for instance here: The Linux Load Average - and what these numbers tell you

The student in question monitored a bunch of Linux servers with various CPU configurations and wanted to apply the same logic and alarm thresholds for each system.

The solution to this sounds straightforward:

  • Step #1: Obtain the load average
  • Step #2: Determine the number of CPUs
  • Step #3: Divide the first number by the second

Looks like an ideal opportunity for a nerdy-looking one-liner. So let’s go …
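One possible way to sketch these three steps in a single line - just one variant among many, assuming a Linux system where /proc/loadavg and the nproc tool are available:

```shell
# field 1 of /proc/loadavg is the 1-minute load average;
# nproc reports the number of available CPUs
awk -v cpus="$(nproc)" '{ printf "%.2f\n", $1 / cpus }' /proc/loadavg
```

With this normalization, a value clearly above 1.00 means more runnable processes than CPUs - regardless of the machine's CPU count.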

Continue reading »

How to extract strings by a given search-pattern

The other day I was asked how to extract strings matching a given search-pattern from a file or datastream.

The one who asked had to implement a simple broken-link checker for a website. He therefore wanted to extract all the URLs referenced on this website and then check them for availability.

Another use case could be to extract all IP-addresses from a given file or all timestamps or dates - and only them - from a server’s logfile.

I think you got the point.

As long as we are able to describe the string we are looking for as a regular expression, we can simply extract it with grep.

Oh - yes. You are absolutely right: If we simply search with grep in a file or datastream, we usually get the entire line containing the matching string. (as “grep root /etc/passwd” gives us all lines from /etc/passwd containing the string “root”)

BUT ... did you know about the option "-o" of grep, which prints out only the matching strings and not the whole lines?

And exactly this is the little trick I want to point out in this post:

If you use grep to find strings matching a regular expression, you can use the “-o” command-line switch to get only the matching strings instead of the whole lines.
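As a small self-contained sketch - extracting everything that looks like an IP address from a line of text (the sample text and the simplified pattern are just for illustration):

```shell
# -o prints only the matching parts, -E enables extended regular expressions
printf 'host 10.0.0.1 talks to 192.168.1.10 on port 22\n' \
  | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}'
```

Instead of the whole line, you get one match per output line: the two IP addresses, nothing else.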

So - that’s all for today - really.

But if - and only if - you are curious and want some kind of examples - read on.

Continue reading »

How to combine tools at the Linux command line to unleash its power

The massive power of the Linux command line comes mainly from two things:

First: The powerful command line tools available

There are a ton of command line tools you can find on any Linux system. Tools for every aspect of controlling the Linux system itself as well as tools for managing and processing your data and files in every single way you can think of.

And most of these tools follow a concept called DOTADIW:

DOTADIW: Do one thing and do it well.

So as a simple example, you have for instance the tool "wc" for counting lines, words and characters. Or take the tool "sort" that is only used for different ways of sorting data.

But our real-world problems are typically a bit more complex than just counting or sorting something. This leads us to the second reason that makes the Linux command line so powerful …

Second: Flexible ways to combine multiple tools

The Linux command line gives you different ways to connect multiple tools with each other to solve a “bigger” problem. And this is what this article is all about - combining multiple tools.

The two reasons for combining tools at the Linux command line

There are typically two reasons why you need to combine tools at the Linux command line.

The first one is to do some sort of data processing …

such as for instance

  • you get some data from a file
  • you extract only the data that is useful for you
  • you sort the data according to your needs
  • then you reformat the data

I think you’ve got the point: you have a command that gives you some data, and then you wanna give this data to the next command to process it further:
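A hypothetical sketch of such a processing chain, using sample passwd-style lines so it's self-contained: take some data, keep only the lines of bash users, extract the usernames, and sort them:

```shell
# each tool does one small job; the pipe hands the data on to the next one
printf 'tux:x:1099:1099:Tux:/home/tux:/bin/bash\nmax:x:1100:1100:Max:/home/max:/bin/bash\ntest1:x:1101:1101:Testuser 1:/home/max:/bin/false\n' \
  | grep ':/bin/bash$' \
  | cut -d: -f1 \
  | sort
```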

The second reason for combining tools is to implement some logic …

as for instance to …

  • restart a process if it isn’t running anymore
  • send an email if a disk is full
  • generate a warning if a system isn’t “pingable” anymore
  • highlight data if it matches a pattern
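A minimal sketch of this logic idea, using the shell's "&&" and "||" operators - the exit status of the first command decides whether the second one runs:

```shell
# "&&" runs the right-hand command only if the left one succeeded,
# "||" runs it only if the left one failed
grep -q '^root:' /etc/passwd && echo "root account found"
grep -q '^nosuchuser:' /etc/passwd || echo "user not defined"
```

The same pattern works with ping, disk-usage checks, or any other command whose exit status carries the information you need.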

Let’s start our journey with the first reason mentioned. Let’s start with combining Linux command line tools for processing data …

Continue reading »