The Linux Thread - The Autist's OS of Choice

As someone who has written thousands of lines of Perl, I have two major dislikes about Python: 1. significant whitespace, although modern editors make that slightly less annoying; 2. regex not being a first-class operator.
The whitespace is a bit of a quirk but not something too bad.
I see no point in regex being first class, since you really just pass it as a string to a regex matcher. Python's re module is confusing for me to use, though. I don't even think re.compile optimizes the finite automaton used for matching the regex? At least, that's what I assumed regex matching was built on, from what my CS theory class briefly covered.
I'm also not a fan of there being different regex "flavors" (listed at https://regex101.com/) with different matching characters.
 
Probably because I am retarded, but in the little I dabbled with Awk I couldn't quite wrap my head around it. What makes it so great?
I was going to write my own reasons, but this blog post summarizes it better than I can.

Why Learn AWK? | Jonathan Palardy's Blog
(archived at https://web.archive.org/web/20220609023812/https://blog.jpalardy.com/posts/why-learn-awk/)

Because of the arcane syntax?
Because other languages can’t do the job?
No.
I resisted AWK for a long time. Couldn’t I already do everything I needed with sed and grep? I felt that anything more complex should be done with a “real” language. AWK seemed like yet-another-thing to learn, with marginal benefits.

Why Learn AWK?

Let me count the ways.

You are working TOO HARD

Too many times, I’ve seen people working way too hard at the command-line trying to solve simple tasks.
Imagine programming without regular expressions.
Can you even imagine the alternative? Would it entail building FSMs from scratch? Would it be easy to program? Would it be fun? Would it work the way you want?
That’s life without AWK.
For simple tasks (“only print column 3” or “sum the numbers from column 2”), almost falling in the “grep-and-sed” category but where you feel you might need to open a man page, AWK is usually the solution.
And if you think that creating a new script file (column_3.py or sum_col_2.rb), putting it somewhere on disk, and invoking it on your data isn’t bad – I’m telling you that you’re working too hard.
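(Not from the post, but to make those two tasks concrete: they really are one-liners, with data.txt a stand-in file name.)
Bash:
# print only column 3 of each line
awk '{ print $3 }' data.txt

# sum the numbers in column 2
awk '{ sum += $2 } END { print sum }' data.txt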

Available EVERYWHERE

On Linux, BSD, or Mac OS, AWK is already available. It is required by any POSIX-compliant OS.
More importantly, it will be the AWK you know. It has been around for a long time, and the way it works is stable. Any upgrade would not (could not) break your scripts – it’s the closest thing to “it just works”.
Contrast with BASH or Python … do you have the right version? Does it have all the language features you need? Is it backward and forward compatible?
When I write a script in AWK, I know 2 things:
  • AWK is going to be anywhere I deploy
  • it’s going to work

Scope

You shouldn’t write anything complicated in AWK. That’s a feature – it limits what you’re going to attempt with the language. You are not going to write a web server in AWK, and you know it wouldn’t be a good idea.
There’s something refreshing about knowing that you’re not going to import a library (let alone a framework), and worry about dependencies.
You’re writing an AWK script, and you’re going to focus on what AWK is good at.

Language Features

Do you want the following? (especially compared to BASH)
  • hashes
  • floating-point numbers
  • modern (i.e. Perl) regular expressions
It’s all there, ready to go. Don’t worry about the version number, the bolted-on syntax, or the dependence on other tools.
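(Again not from the post, but all three features fit in one illustrative line; data.txt is a stand-in:)
Bash:
# regex match on field 2, hash keyed by field 1, floating-point accumulation
awk '$2 ~ /^[0-9.]+$/ { totals[$1] += $2 } END { for (k in totals) print k, totals[k] }' data.txt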

Convenience: minimized bureaucracy

In a script sandwich, your logic is the “meat”, and the surrounding bureaucracy is the “bread”. In practice, bureaucracy means:
  • opening and closing files
  • iterating over each line of each file
  • parsing or breaking a line into fields
These things are needed, but they aren't what your script is about. AWK takes care of all that: your code is implicitly surrounded by a loop that iterates over every input line.
DISCLAIMER: This isn’t AWK, it’s JavaScript. It might as well be pseudocode. All code simplified and for illustrative purposes only.
// open each file, assign content to "lines"
lines.forEach(function (line) {
  // the code you write goes here
});
// close all the files
AWK is going to break each line into “fields” or “columns” – for many people, that feature is the main reason to use AWK. By default, AWK breaks a line into fields based on whitespace (i.e. /\s+/) and ignores leading or trailing whitespace.
Also, AWK is automatically going to set a bunch of useful variables for you:
  • NF – how many fields in the current line
  • NR – what the current line number is
  • $1, $2, $3 .. $9 .. – the value of each field on the current line
  • $0 – the content of the current line
  • and more
// open each file, assign content to "lines"
var NR = 0;
lines.forEach(function (line) {
  NR = NR + 1;
  var fields = line.trim().split(/\s+/);
  var NF = fields.length;
  // the code you write goes here
});
// close all the files
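(In actual AWK all of that comes for free; an illustrative one-liner printing those variables directly, data.txt again a stand-in:)
Bash:
# NR, NF, $1 and $0 are pre-populated for every input line
awk '{ print NR, NF, $1 }' data.txt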

Convenience: automatic conversions

AWK does automatic string-to-number conversions. That’s something terrible in “real” programming languages, but very convenient within the scope of the things you should attempt with AWK.
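(A quick illustration, mine rather than the post's:)
Bash:
# the string "3" becomes the number 3 as soon as arithmetic touches it
echo "3 apples" | awk '{ print $1 * 2 }'    # prints 6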

Convenience: automatic variables

Variables are automatically created when first used; you don’t need to declare variables.
a++
Let’s unpack it:
  • the variable a is created
  • using ++ treats it as a number – a is initialized to 0
  • the ++ operator increments it
It’s even more useful with hashes:
things[$1]++
  • things is created, as a hash
  • using dynamic key $1, a value is initialized to 0 (implicit in ++ use)
  • the ++ operator increments it
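(Put together, that gives the classic tally idiom; my example, with access.log a stand-in:)
Bash:
# count occurrences of each distinct value in column 1
awk '{ things[$1]++ } END { for (k in things) print k, things[k] }' access.log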

Convenience: built-in functions

AWK has a bunch of numeric and string functions at your disposal.
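(For a taste, an illustrative line using four of the standard POSIX awk built-ins:)
Bash:
# substr, toupper, length, sqrt
echo "hello 16" | awk '{ print toupper(substr($1, 1, 4)), length($1), sqrt($2) }'    # HELL 5 4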

AWK is PERFECT*

AWK is PERFECT when you use it for what it’s meant to do:
  • very powerful one-liners
  • (or) short AND portable scripts

Now What?

Maybe I’ve convinced you to reconsider AWK: good.
How do you learn AWK?
There are many possibilities.
In my next post, I'll explain everything you need to get started with AWK.
 
Probably because I am retarded, but in the little I dabbled with Awk I couldn't quite wrap my head around it. What makes it so great?
Because it's an amazing tool for gluing the output of one program into another, so you can automate the hell out of things. Example: run nmap, grep for discovered open ports, and automatically spit the hosts out just as a list of IPs (fuck knows why that isn't already a feature in nmap).

nmap -iR 100 -p 80 -vv | grep Discovered | awk '{print $6}'

Cuts out all the other output from the program and just leaves you with what you want to use. A list of IPs is then way easier to feed into...whatever you were going to use it for. If two command line programs don't know how to talk to each other, you can likely get them doing it with awk.
 
Because it's an amazing tool for gluing the output of one program into another, so you can automate the hell out of things. Example: run nmap, grep for discovered open ports, and automatically spit the hosts out just as a list of IPs (fuck knows why that isn't already a feature in nmap).

nmap -iR 100 -p 80 -vv | grep Discovered | awk '{print $6}'

Cuts out all the other output from the program and just leaves you with what you want to use. A list of IPs is then way easier to feed into...whatever you were going to use it for. If two command line programs don't know how to talk to each other, you can likely get them doing it with awk.
I use sed in a similar manner with nmap, but to get all of the open ports of a single target IP output to stdout.

(assuming 10.10.10.10 being the IP)
nmap -sT -p- --min-rate 5000 -oG all 10.10.10.10
grep -oP ' [\d]{1,5}/' all | sed 's/[ /]//g' | tr '\n' ','

Then I just copy the ports separated by commas and drop them in the more details, scripted nmap scan (after the -p flag).

nmap -sV -sC -p 21,23,25,80,135,139,443,445,3289,8080 -oN TCP 10.10.10.10

Full 65535 TCP port scan with safe and standard scripts in under 5 minutes.
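You can also do it in a single awk pass if you'd rather skip the sed/tr dance (a sketch, not battle-tested against every -oG layout):
Bash:
# pull NNN from "NNN/open/..." tokens in the greppable output and join with commas
awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^[0-9]+\/open/) { split($i, a, "/"); printf "%s,", a[1] } }' all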
 
Did anyone notice a decrease in 3D performance in Debian a while back? Idk if it was when I switched to Bullseye but I'm definitely getting worse performance than I used to. It could be aging hardware to be fair, I just wanted to see if anyone else was experiencing the same thing or if I was going crazy.

(I have to come clean and admit I'm using an NVidia card)
 
Did anyone notice a decrease in 3D performance in Debian a while back? Idk if it was when I switched to Bullseye but I'm definitely getting worse performance than I used to. It could be aging hardware to be fair, I just wanted to see if anyone else was experiencing the same thing or if I was going crazy.

(I have to come clean and admit I'm using an NVidia card)
I'm thinking power management, because with Linux it's always power management. Could be NVIDIA, the CPU, or even PCIe.
 
Probably because I am retarded, but in the little I dabbled with Awk I couldn't quite wrap my head around it. What makes it so great?
Adding to what @no-exit said, you can just lazily memorize(ish) a few one-liners initially. Awk can swap or select columns more easily than other tools, which is extremely useful; cut isn't as reliable. You can save time using a googled or hand-written awk one-liner rather than piping grep, cut, sed, and others together. For instance:
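(data.txt standing in for whatever you're chewing on:)
Bash:
# select: print columns 3 and 1, in that order
awk '{ print $3, $1 }' data.txt

# swap the first two columns
awk '{ t = $1; $1 = $2; $2 = t; print }' data.txt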
but real chads use perl one-liners.
 
Python: 1. whitespace
I don't think that's a "coming from perl" thing but just a "normal person" thing.

Much has been said already, but all these "old" tools like awk etc. can replace 99% of the rat's nests of Python scripts (pulling in about a million libraries) that people like to write. Most of everything my computer does is just sh scripts gluing a variation of these tools together, sometimes doing simple flow logic or holding variables, doing everything from regular system maintenance to checking websites for me to see if a product is back in stock and messaging me if it is. You don't really need more most of the time. The day you understand that, you've reached enlightenment.

Wonder what that organization he left to is. Dude's a snarky little shit and has a talent for making people dislike him personally for things he didn't even 100% do by himself.
 
Much has been said already, but all these "old" tools like awk etc. can replace 99% of the rat's nests of Python scripts (pulling in about a million libraries) that people like to write. Most of everything my computer does is just sh scripts gluing a variation of these tools together, sometimes doing simple flow logic or holding variables, doing everything from regular system maintenance to checking websites for me to see if a product is back in stock and messaging me if it is. You don't really need more most of the time. The day you understand that, you've reached enlightenment.
The one other tool, not "old", that hasn't been mentioned here recently is "jq", which is just awk/grep/sed for JSON. It's quite handy for dealing with stupid (modern) web stuff when you don't feel like Python.

There's also an "xq" and "yq" for XML and YAML, although they're not typically included by default.
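(A flavor of jq; the JSON is made up but the flags are real:)
Bash:
# -r prints raw strings instead of JSON-quoted ones
echo '[{"name":"foo"},{"name":"bar"}]' | jq -r '.[].name'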
 
"More importantly, it will be the AWK you know."

Is that even true? Linux will be using GNU AWK (gawk) while the BSDs ship their own awk. https://unix.stackexchange.com/questions/29576/difference-between-gawk-vs-awk
This isn't the first time I've come across GNU or Linux shit breaking POSIX compatibility, but I don't have the mental fortitude to keep track of it anyway. Maybe there's a shell script linter for it somewhere out there. Bashisms are another can of worms, with regular POSIX shell being a pain in the ass, void of features.

The one other tool which is not "old" that hasn't been mentioned here recently is "jq" which is just awk/grep/sed for JSON. Which is quite handy for dealing with stupid (modern) web stuff when you don't feel like Python.

There's also an "xq" and "yq" for XML and YAML, although they're not typically included by default.
jq is great, and the more data formats that have these useful little CLI programs to parse them, the better. Made a script not too long ago that pipes downloaded JSON into jq, runs it through some simple filters, outputs the result as CSV, and puts that into a SQLite table via sqlite3 in CSV mode. sqlite's CLI tool has some strange automation quirks, but nothing too bad.
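(Roughly this shape; the URL, jq filter, and table name here are all made up:)
Bash:
# JSON -> filtered CSV -> SQLite, all from the shell
curl -s https://example.com/data.json \
  | jq -r '.items[] | [.id, .name, .price] | @csv' > items.csv
sqlite3 mydb.sqlite <<'EOF'
.mode csv
.import items.csv items
EOF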
 
but real chads use perl one-liners.
Whenever sed falls short for the task I'm doing, Perl's always there to save the day. If I want to search and replace a paragraph or a block, Perl can do the job.
Bash:
# no -i here: the whole file is slurped (undef $/) so the regex can span lines, and the rewritten copy is redirected to sneed.cpp
perl -pe 'BEGIN{undef $/;} s/( +\{\n.*)Chuck((.*\n)* +})/$1Sneed$2/g;' chuck.cpp > sneed.cpp

Turns
C++:
int main() {
  {
    Chuck d;
    d.sitAndChill();
  }
  return 0;
}
into
C++:
int main() {
  {
    Sneed d;
    d.sitAndChill();
  }
  return 0;
}

There's also an "xq" and "yq" for XML and YAML, although they're not typically included by default.
My recommendation for XML is xmllint, as it supports XPath 1.0 (which covers many of the things jq does, with its own syntax) and is included in many distros alongside libxml2. As for YAML, the Go version of yq is my go-to; almost everything jq does is supported and it runs really quick. There's another yq done in Python, but it converts all input to JSON and runs it through jq, so it's gonna be slow as heck.
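(Quick xmllint example; books.xml is hypothetical:)
Bash:
# pull the text of every <title> element
xmllint --xpath '//title/text()' books.xml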
 