Use python in shell, as if it were awk

Say I want to print 1 + 1 on stdout (i.e. one-liner coding).
With awk I could do it simply with:
$ echo | awk '{print 1+1}'
2
How to do this with python?

You are looking for -c:
$ python -c 'print(1 + 1)'
2

As ifLoop pointed out, what you're looking for here is -c.
But, as you've discovered, python -c often isn't as useful as the corresponding awk (or sed or bash or perl or even ruby) for one-liners.
Python is explicitly designed to value readability over brevity and explicitness over implicitness (along with some correlated tradeoffs, like vocabulary over syntax, as little magic as possible, etc.). See the Zen of Python. There are intentional limits to what you can cram onto one line, and things like looping over stdin and/or command-line args have to be done explicitly with, e.g., sys.stdin and sys.argv, or fileinput.input().
That means that some very trivial scripts become less trivial to write in Python, but that's considered a good tradeoff for making even moderately non-trivial scripts easier to write, maintain, and understand.
The core developers understand that this means you can't rewrite a lot of one-liners in Python. If you asked them, most would ask why that's a problem at all.
If you know how to write something as a one-liner in a language like sed or awk, then you should be writing it as a one-liner in sed or awk. Those are perfectly good languages that are handy for all kinds of simple tasks, and there's no reason to avoid them just because Python is also a good language.
If you can't figure your way through the syntax to write that one-liner… well, it probably shouldn't be a one-liner. The only reason you want to try it in Python is that Python is generally easier to write and read, and the reasons that's true are the same reasons Python won't let you write what you want in fewer than 3 lines. So just write the 3 lines. Python is great for that.
So, what you often really want is not -c, but a heredoc, or just a separate script that you run like any other program, or awk or perl instead of python.
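As a rough illustration of such a separate script, here is a minimal Python 3 sketch of awk '{print $1}' that loops explicitly with fileinput.input(), as mentioned above. The function names are hypothetical, and this is a sketch rather than a drop-in replacement for awk:

```python
import fileinput


def first_fields(lines):
    """Yield awk's $1 for each line: the first whitespace-separated field, or "" for a blank line."""
    for line in lines:
        fields = line.split()  # no argument: split on runs of whitespace, like awk's default FS
        yield fields[0] if fields else ""


def main():
    # fileinput.input() reads the files named in sys.argv[1:], or stdin when none are given
    for field in first_fields(fileinput.input()):
        print(field)
```

With the usual if __name__ == "__main__": main() guard added at the bottom, it runs like any other program, e.g. python3 first_field.py access.log (both file names are made up).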

Inspired by the answer by IfLoop, I wondered about the handy BEGIN and END blocks in awk, and found the pawk module. For example, to sum the size column (field 5) of ls -l:
ls -l | awk 'BEGIN {c = 0} {c += $5} END {print c}'
ls -l | pawk -s -B 'c = 0' -E 'c' 'c += int(f[4])'
It looks promising, but I haven't tried it (yet).
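For comparison, the same BEGIN / per-line / END structure can be written in plain Python without pawk. A minimal sketch, assuming the usual ls -l layout in which the fifth field is a numeric size (device files and other unusual entries are not handled); total_size is a hypothetical name:

```python
import sys


def total_size(lines):
    """Sum field 5 (awk's $5, Python's fields[4]) of `ls -l` output."""
    c = 0  # awk's BEGIN block: initialise the accumulator
    for line in lines:  # awk's main block: runs once per input line
        fields = line.split()
        if len(fields) >= 5:  # skip short lines such as the leading "total N"
            c += int(fields[4])
    return c  # awk's END block: the caller prints the result


def main():
    print(total_size(sys.stdin))
```

Called from a __main__ guard, it would be used as ls -l | python3 sum_sizes.py (hypothetical file name).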

If you're in the shell, the following does basic integer math:
echo $((1+1))
echo $(( 100 / 5 ))
etc...
For floating point, yes, you should use awk, bc, dc, or any other language that knows floating-point math.
Also, read this: https://stackoverflow.com/a/450853/632407
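Python is one such language, but note that its integer division does not round the same way as the shell's. A small sketch comparing them, assuming bash's C-style $(( a / b )), which truncates toward zero; shell_div is a hypothetical helper:

```python
def shell_div(a: int, b: int) -> int:
    """Integer division as bash's $(( a / b )) does it: truncate toward zero."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


print(shell_div(100, 5))  # 20, like echo $(( 100 / 5 ))
print(shell_div(-7, 2))   # -3, like echo $(( -7 / 2 ))
print(-7 // 2)            # -4: Python's // rounds toward negative infinity
print(7 / 2)              # 3.5: Python's / is true (floating-point) division
```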

As SingleNegationElimination's answer recommends, the -c flag is the right tool for the job.
Here are some examples using awk, ruby, and python:
echo foo bar baz | awk '{ split($0, arr, " "); print arr[NF] }'
baz
echo foo bar baz | ruby -e 'puts STDIN.read.chomp.split(" ")[-1]'
baz
echo foo bar baz | python -c 'import sys; print(sys.stdin.read().rstrip().split(" ")[-1])'
baz
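Note that the Python one-liner above reads all of stdin as one string and splits on single spaces only, so unlike awk it returns one result for the whole input rather than one per line, and it does not treat tabs as separators. A closer sketch (last_field is a hypothetical name) that processes one line at a time:

```python
import sys


def last_field(line: str) -> str:
    """awk's $NF: the last whitespace-separated field, or "" if the line is blank."""
    fields = line.split()  # split() with no argument also handles tabs and repeated spaces
    return fields[-1] if fields else ""


def main():
    for line in sys.stdin:  # one line at a time, as awk does
        print(last_field(line))
```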


Passing path name with asterisk in bash as argument for python script [duplicate]

These work as advertised:
grep -ir 'hello world' .
grep -ir hello\ world .
These don't:
argumentString1="-ir 'hello world'"
argumentString2="-ir hello\\ world"
grep $argumentString1 .
grep $argumentString2 .
Despite 'hello world' being enclosed by quotes in the second example, grep interprets 'hello (and hello\) as one argument and world' (and world) as another, which means that, in this case, 'hello will be the search pattern and world' will be the search path.
Again, this only happens when the arguments are expanded from the argumentString variables. grep properly interprets 'hello world' (and hello\ world) as a single argument in the first example.
Can anyone explain why this is? Is there a proper way to expand a string variable that will preserve the syntax of each character such that it is correctly interpreted by shell commands?
Why
When the string is expanded, it is split into words, but it is not re-evaluated to find special characters such as quotes or dollar signs or ... This is the way the shell has 'always' behaved, since the Bourne shell back in 1978 or thereabouts.
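Python's shlex module can make this difference visible. In the small sketch below, str.split() approximates word splitting with the default IFS (pathname expansion is ignored here), while shlex.split() applies shell-like quote handling, which is closer to what eval would do:

```python
import shlex

# The value stored in argumentString1 in the question
argument_string = "-ir 'hello world'"

# Roughly what an unquoted expansion does: split on whitespace,
# with the quote characters left in as ordinary characters.
print(argument_string.split())           # ['-ir', "'hello", "world'"]

# shlex.split honours the quotes, so "hello world" stays one argument.
print(shlex.split(argument_string))      # ['-ir', 'hello world']

# argumentString2 holds -ir hello\ world (the \\ became \ inside double quotes)
print(r"-ir hello\ world".split())       # ['-ir', 'hello\\', 'world']
print(shlex.split(r"-ir hello\ world"))  # ['-ir', 'hello world']
```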
Fix
In bash, use an array to hold the arguments:
argumentArray=(-ir 'hello world')
grep "${argumentArray[@]}" .
Or, if brave/foolhardy, use eval:
argumentString="-ir 'hello world'"
eval "grep $argumentString ."
On the other hand, discretion is often the better part of valour, and working with eval is a place where discretion is better than bravery. If you are not completely in control of the string that is eval'd (if there's any user input in the command string that has not been rigorously validated), then you are opening yourself to potentially serious problems.
Note that the sequence of expansions for Bash is described in Shell Expansions in the GNU Bash manual; see in particular sections 3.5.3 Shell Parameter Expansion, 3.5.7 Word Splitting, and 3.5.9 Quote Removal.
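The same lesson carries over when the command is started from Python instead of bash: keep the arguments in a list, like the array above, rather than in a string that something has to re-parse. A minimal sketch that uses a child Python process as a stand-in for grep, so it only prints the arguments it received:

```python
import subprocess
import sys

# Arguments kept as a list, like the bash array: "hello world" stays one argument
argv = ["-ir", "hello world", "."]

# Stand-in for grep: a child Python process that prints the arguments it was given.
# No shell is involved, so nothing gets re-parsed the way eval would.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1:])", *argv],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())  # ['-ir', 'hello world', '.']
```

With the real tool it would be subprocess.run(["grep", *argv]).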
When you put quote characters into variables, they just become plain literals (see http://mywiki.wooledge.org/BashFAQ/050; thanks @tripleee for pointing out this link)
Instead, try using an array to pass your arguments:
argumentString=(-ir 'hello world')
grep "${argumentString[@]}" .
In looking at this and related questions, I'm surprised that no one brought up using an explicit subshell. For bash, and other modern shells, you can execute a command line explicitly. In bash, it requires the -c option.
argumentString="-ir 'hello world'"
bash -c "grep $argumentString ."
This works exactly as the original questioner desired. There are two restrictions to this technique:
You can only use single quotes within the command or argument strings.
Only exported environment variables will be available to the command.
This technique also handles redirection, piping, and other shellisms. You can use bash builtins as well as any other command that works at the command line, because you are essentially asking a bash subshell to interpret the string directly as a command line. Here's a more complex example, a somewhat gratuitously complex ls -l variant (the single quotes around the assignment keep the outer shell from expanding $(pwd) and $prefix before bash -c sees them):
cmd='prefix=$(pwd) && ls | xargs -n 1 echo "In $prefix:"'
bash -c "$cmd"
I have built command processors both this way and with parameter arrays. Generally, this way is much easier to write and debug, and it's trivial to echo the command you are executing. OTOH, param arrays work nicely when you really do have abstract arrays of parameters, as opposed to just wanting a simple command variant.
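If the command string is built in Python, shlex.join (Python 3.8+) can quote each argument so that bash -c splits the string back into exactly those words, even when they contain spaces or quotes. A sketch, assuming bash and tr are on PATH; the printf arguments are just illustrative:

```python
import shlex
import subprocess

# Arguments containing a space and a single quote: awkward to embed by hand
args = ["printf", "%s|", "hello world", "it's"]

# shlex.join quotes each argument for a POSIX shell
cmd = shlex.join(args)
print(cmd)

# Pipes and other shellisms can still be appended around the quoted part
result = subprocess.run(["bash", "-c", cmd + " | tr a-z A-Z"],
                        capture_output=True, text=True, check=True)
print(result.stdout)  # HELLO WORLD|IT'S|
```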

How to replace Perl one-liner regex with Python one-liner?

I work on a project where Perl is not used and I would like to maintain consistency. That's why I'm wondering if I can easily replace this handy Perl one-liner with Python one-liner:
perl -pe 's/pattern/replacement/g' <<< 'expression'
This program reads from STDIN a line at a time, replaces all matches of regular expression pattern with the string replacement, and outputs the (possibly) modified line to STDOUT.
You can run re.sub with the -c command line option, but it won't be as pretty as the perl one:
python -c 'import re;print(re.sub(r"<pattern>", "<replacement>", "<string>"))'
If you want to get input from STDIN as well, you need sys.stdin, and that also means importing sys:
python -c 'import re,sys;print(re.sub(r"<pattern>", "<replacement>", sys.stdin.read()))' <<< '<string>'
So, for example:
% python -c 'import re;print(re.sub(r"foo", "bar", "foobar"))'
barbar
% python -c 'import re,sys;print(re.sub(r"foo", "bar", sys.stdin.read()))' <<< 'foobar'
barbar
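The one-liners above read all of stdin at once. To keep perl -p's line-at-a-time behaviour, a short script may fit better. A minimal sketch with hypothetical names; main would be called from a __main__ guard with the pattern and replacement taken from sys.argv:

```python
import re
import sys


def substitute_lines(lines, pattern, replacement):
    """Apply re.sub to each line, roughly like perl -pe 's/pattern/replacement/g'."""
    regex = re.compile(pattern)  # compile once instead of once per line
    for line in lines:
        # re.sub replaces every match by default, like Perl's /g flag.
        # Python writes backreferences as \1 or \g<1> rather than Perl's $1.
        yield regex.sub(replacement, line)


def main(pattern, replacement):
    # Stream stdin line by line, writing each (possibly) modified line to stdout
    sys.stdout.writelines(substitute_lines(sys.stdin, pattern, replacement))
```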
A more modern replacement for Perl one-liners would be Ruby, which is said by some to be Perl 6. For example:
ruby -pe 'gsub /[es]/, "X"' <<< 'expression'
Python, due to its reliance on indentation, is not suitable for one-liners, especially when you need to use "if ... else ..." and such in the code.

Fastest way to replace space with an unused character and add space in between all characters

What is a fast way to:
Replace space with an unused unicode character.
Add spaces in between all characters
I've tried:
$ python3 -c "print (open('test.txt').read().replace(' ', u'\uE000').replace('', ' '))" > test.spaced.txt
But when I try it on a 6 GB text file with 90 million lines, it's really slow.
Simply reading the file after opening it takes a really long time:
$ time python3 -c "print (open('test.txt').read())"
Assume that my machine has more than enough RAM to handle the inflated file.
Is there a way to do it with sed / awk / bash tools?
Is there a faster way to do the replacement and addition in Python?
I believe that using tools specially designed for text processing is faster than invoking a script written in a general-purpose interpreted language such as Python.
sed doesn't support Unicode escape sequences, but it is possible to pass the actual character using command substitution:
sed -i -e "s/ /$(printf '\uE000')/g; s/\(.\)/ \1 /g" file
Perl is my favorite, because it is very flexible. It is also much better for text processing than Python:
The Perl languages borrow features from other programming languages
including C, shell script (sh), AWK, and sed... They provide
powerful text processing facilities without the arbitrary data-length
limits of many contemporary Unix commandline tools,... facilitating
easy manipulation of text files.
(from Wikipedia)
Example:
perl -CSDL -p -i -e 's/ /\x{E000}/g ; s/(.)/ $1 /g' file
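Note that the -CSDL option enables UTF-8 for the standard streams (S) and as the default layer for other input and output streams (D), but only when the locale environment variables (LC_ALL, LC_CTYPE, LANG) specify UTF-8 (L).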
There is also an AWKward way of doing this using GNU AWK version 4.1.0 or newer:
gawk -i inplace '{
gsub(/ /, "\xee\x80\x80");
print gensub(/(.)/, " \\1 ", "g"); }' file
But I wouldn't recommend it, for obvious reasons.
I doubt that anyone would claim that a specific tool or algorithm is the fastest one, because plenty of factors may affect performance: hardware, the way the tools are compiled, tool versions, the kernel version, etc. Perhaps the best way to find the right tool or algorithm is to benchmark. I don't think it's necessary to explain the time command.
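If you want to benchmark a Python candidate too, one thing that may help is streaming the file line by line instead of reading all 6 GB with read(). A minimal sketch, untested at that scale and assuming UTF-8 input and output; space_line is a hypothetical name. Unlike .replace('', ' '), it does not pad the start and end of each line:

```python
PUA = "\uE000"  # the unused private-use character; its UTF-8 bytes are EE 80 80


def space_line(line: str) -> str:
    """Replace spaces with U+E000, then put one space between the remaining characters."""
    body = line.rstrip("\n")
    newline = line[len(body):]  # keep the original line ending ("" or "\n")
    return " ".join(body.replace(" ", PUA)) + newline


def main(src_path, dst_path):
    # Stream line by line so the whole file is never held in memory at once
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(space_line(line) for line in src)
```

For the question's files that would be main("test.txt", "test.spaced.txt"); whether it beats the sed or perl versions would have to be measured.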

Regex and grep exception matching

I tested my regex for matching exceptions in a log file using:
http://gskinner.com/RegExr/
The regex is:
.+Exception[^\n]+(\s+at.++)+
It works for the couple of cases I pasted there, but not when I use it with grep:
grep '.+Exception[^\n]+(\s+at.++)+' server.log
Does grep need some extra flags to make it work with this regex?
Update:
It doesn't have to be a regex; I'm looking for anything that will print exceptions.
Not all versions of grep understand the same syntax.
Your pattern contains a + for 1 or more repeats, which means it is in egrep territory.
But it also has \s for white space, which most versions of grep are ignorant of.
Finally, you have ++ to mean a possessive match of the preceding atom, which only fairly sophisticated regex engines understand. You might try a non-possessive match.
However, you don’t need a leading .+, so you can jump right to the string you want. Also, I don’t see why you would use [^\n] since that’s what . normally means, and because you’re operating in line mode already anyways.
If you have grep -P, you might try that. I’m using a simpler but equivalent version of your pattern; you aren’t using an option to grep that gives only the exact match, so I assume you want the whole record:
$ grep -P 'Exception.+\sat' server.log
But if that doesn’t work, you can always bring out the big guns:
$ perl -ne 'print if /Exception.+\sat/' server.log
And if you want just the exact match, you could use
$ perl -nle 'print $& if /Exception.*\bat\b.*/' server.log
That should give you enough variations to play with.
I don’t understand why people use web-based “regex” builders when they can just do the same on the command line with existing tools, since that way they can be absolutely certain the patterns they devise will work with those tools.
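For a Python version of the perl -ne filter above, a minimal sketch (matching_lines is a hypothetical name) using the same simplified pattern; re.search plays the role of Perl's match operator:

```python
import re
import sys

# Same simplified pattern as the grep -P / perl examples above
PATTERN = re.compile(r"Exception.+\sat")


def matching_lines(lines, pattern=PATTERN):
    """Yield the lines where pattern matches anywhere, like perl -ne 'print if /.../'."""
    for line in lines:
        if pattern.search(line):  # search, not match: the hit may start mid-line
            yield line


def main(path="server.log"):
    # errors="replace" is an assumption: logs sometimes contain stray non-UTF-8 bytes
    with open(path, encoding="utf-8", errors="replace") as log:
        sys.stdout.writelines(matching_lines(log))
```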
Use -E (or -E -e <regex>) for extended regular expressions, where + means one or more; -e by itself only marks <regex> as the pattern and does not change the syntax. Take a look at the man page: man grep
It looks like you're trying to find lines that look something like:
... Exception foobar at line 7 ...
So first, to use extended regular expressions (where + means one or more), run grep -E, or just egrep.
Next, you don't really have to specify the .+ at the start of the expression. It's usually best to minimize what you're searching for. If it's imperative that there is at least one character before "Exception", then just use a single . before it.
Also, \s is a perl-ish way of asking for whitespace. grep uses POSIX regex, so the equivalent is [[:space:]].
So, I would use:
grep -e 'Exception.*[[:space:]]at' server.log
This would get what you want with the least amount of muss and fuss.
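Since the update says any way of printing exceptions will do: Java-style traces put the "at" frames on the lines after the exception message, so a plain line-by-line match only sees the exception line itself. A Python sketch that reads the log as one string and matches across lines, assuming Java-style formatting (the sample log lines are invented for illustration):

```python
import re

# An exception line followed by one or more indented "at ..." frame lines.
# Assumes Java-style traces; "Caused by:" and "... N more" lines are not matched.
TRACE = re.compile(r"^.*Exception.*(?:\n[ \t]+at .*)+", re.MULTILINE)


def find_traces(text: str):
    """Return every exception line together with the stack frames that follow it."""
    return [m.group(0) for m in TRACE.finditer(text)]


sample = (
    "INFO starting\n"
    "java.lang.IllegalStateException: bad state\n"
    "\tat com.example.A.run(A.java:10)\n"
    "\tat com.example.Main.main(Main.java:5)\n"
    "INFO done\n"
)
for trace in find_traces(sample):
    print(trace)
```

For a real file this would be find_traces(open("server.log").read()), which holds the whole log in memory.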
