Convert Perl syntax to Python [duplicate]

This question already has an answer here:
How to create a dict equivalent in Python from Perl hash?
(1 answer)
Closed 5 years ago.
I have a file with fields separated by whitespace.
I have written the program below in Perl and it works.
Now I must rewrite it in Python, which is not my language, but I have solved it more or less.
I am currently struggling with this Perl expression, which I can't convert to Python:
$hash{$prefix}++;
I have found some solutions, but I'm not experienced enough with Python to apply them. All the solutions look complicated to me compared to the Perl one.
These Stack Overflow questions seem to be relevant.
Python variables as keys to dict
Python: How to pass key to a dictionary from the variable of a function?
Perl
#!perl -w
use strict;
use warnings FATAL => 'all';
our $line = "";
our @line;
our $prefix = "";
our %hash;
our $key;
while ( $line = <STDIN> ) {
    # NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
    next if $line =~ /^NAMESPACE/;
    #aleks-event-test redis-1-m06k0 1/1 Running 0 1d 172.26.0.138 The_Server_name
    @line = split ' ', $line;
    $line[1] =~ /(.*)-\d+-\w+$/ ;
    $prefix = $1;
    #print "$prefix $line[7]\n";
    print "$prefix $line[7]\n";
    $hash{$prefix}++;
}
foreach $key ( keys %hash ) {
    if ( $hash{$key} % 2 ){
        print "$key : $hash{$key} mod 2 \n"
    }
    else {
        print "$key : $hash{$key} not mod 2 \n"
    }
}
Python
#!python
import sys
import re
myhash = {}
for line in sys.stdin:
    """
    These projects are ignored
    """
    if re.match('^NAMESPACE|logging|default',line):
        continue
    linesplited = line.split()
    prefix = re.split('(.*)(-\d+)?-\w+$',linesplited[1])
    #print linesplited[1]
    print prefix[1]
    myhash[prefix[1]] += 1

Your problem is using this line:
myhash = {}
# ... code ...
myhash[prefix[1]] += 1
You likely are getting a KeyError. This is because you start off with an empty dictionary (or hash), and if you attempt to reference a key that doesn't exist yet, Python will raise an exception.
A simple solution that will let your script work is to use a defaultdict, which will auto-initialize any key-value pair you attempt to access.
#!python
import sys
import re
from collections import defaultdict
# Since you're keeping counts, we'll initialize this so that the values
# of the dictionary are `int` and will default to 0
myhash = defaultdict(int)
for line in sys.stdin:
    """
    These projects are ignored
    """
    if re.match('^NAMESPACE|logging|default',line):
        continue
    linesplited = line.split()
    prefix = re.split('(.*)(-\d+)?-\w+$',linesplited[1])
    #print linesplited[1]
    print prefix[1]
    myhash[prefix[1]] += 1
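Alternatively, since the dictionary is only used for counting, collections.Counter is arguably the closest analogue to Perl's autovivifying $hash{$prefix}++. A minimal sketch (the sample prefixes are illustrative, not from your data):

```python
from collections import Counter

# Counter behaves like a dict whose missing keys default to 0,
# so counts[prefix] += 1 mirrors Perl's $hash{$prefix}++.
counts = Counter()
for prefix in ["redis", "redis", "web"]:
    counts[prefix] += 1

print(counts["redis"])    # 2
print(counts["missing"])  # 0 -- no KeyError, just like Perl
```

Counter also gives you counts.most_common() for free, which is handy when summarizing.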

Related

Python regex pattern in order to find if a code line is finishing with a space or tab character

Sorry for asking such a low-level question, but I really tried to find the answer before coming here...
Basically I have a script which searches inside .py files and reads their code line by line -> the goal of the script is to find whether a line ends with a space or a tab, as in the example below
i = 5
z = 25
Basically, after the i variable we should have a \s and after the z variable a \t. (I hope the code formatting will not erase them.)
def custom_checks(file, rule):
    """
    :param file: the file in which you search for a specific character
    :param rule: the specific character you search for
    :return: dict obj with the form { line number : character }
    """
    rule=re.escape(rule)
    logging.info(f" File {os.path.abspath(file)} checked for {repr(rule)} inside it ")
    result_dict = {}
    file = fileinput.input([file])
    for idx, line in enumerate(file):
        if re.search(rule, line):
            result_dict[idx + 1] = str(rule)
    file.close()
    if not len(result_dict):
        logging.info("Zero non-compliance found based on the rule:2 consecutive empty rows")
    else:
        logging.warning(f'Found the next errors:{result_dict}')
After that, if I check the logging output, I see this:
checked for '\+s\\s\$' inside it (I don't know why the backslashes are doubled)
Also, I basically get all the regexes from a config.json, which is this one:
{
    "ends with tab": "+\\t$",
    "ends with space": "+s\\s$"
}
Could someone help me in this direction? I basically know that I could do it in other ways, such as reversing the line with [::-1], getting the first character and seeing if it's \s, etc., but I really want to do it with regex.
Thanks!
Try:
rules = {
    'ends with tab': re.compile(r'\t$'),
    'ends with space': re.compile(r' $'),
}
Note: while iterating over the file leaves the newline ('\n') at the end of each string, $ in a regex matches the position before the first newline in the string. Thus, if using regex, you don't need to explicitly strip newlines.
if rule.search(line):
    ...
Personally, however, I would use line.rstrip() != line.rstrip('\n') to flag trailing spaces of any kind in one shot.
If you want to directly check for specific characters at the end of the line, you then need to strip any newline, and you need to check if the line isn't empty. For example:
char = '\t'
s = line.strip('\n')
if s and s[-1] == char:
    ...
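Putting those pieces together, here is a sketch of how the asker's custom_checks could be rebuilt on top of the compiled rules (the function name check_lines and the sample lines are illustrative):

```python
import re

rules = {
    'ends with tab': re.compile(r'\t$'),
    'ends with space': re.compile(r' $'),
}

def check_lines(lines, rule):
    # Return {line_number: rule_name} for every line the rule matches.
    result = {}
    for idx, line in enumerate(lines, start=1):
        if rules[rule].search(line):
            result[idx] = rule
    return result

sample = ["i = 5 \n", "z = 25\t\n", "ok = 1\n"]
print(check_lines(sample, 'ends with space'))  # {1: 'ends with space'}
print(check_lines(sample, 'ends with tab'))    # {2: 'ends with tab'}
```

Note that the trailing '\n' on each line doesn't get in the way, because $ matches just before it.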
Addendum 1: read rules from JSON config
# here from a string, but could be in a file, of course
json_config = """
{
    "ends with tab": "\\t$",
    "ends with space": " $"
}
"""
rules = {k: re.compile(v) for k, v in json.loads(json_config).items()}
Addendum 2: comments
The following shows how to comment out a rule, as well as a rule to detect comments in the file to process. Since JSON doesn't support comments, we can consider yaml instead:
yaml_config = """
ends with space: ' $'
ends with tab: \\t$
is comment: ^\\s*#
# ignore: 'foo'
"""
import yaml
rules = {k: re.compile(v) for k, v in yaml.safe_load(yaml_config).items()}
Note: 'is comment' is easy. A hypothetical 'has comment' is much harder to define -- why? I'll leave that as an exercise for the reader ;-)
Note 2: in a file, the yaml config would be without double backslash, e.g.:
cat > config.yml << EOF
ends with space: ' $'
ends with tab: \t$
is comment: ^\s*#
# ignore: 'foo'
EOF
Additional thought
You may want to give autopep8 a try.
Example:
cat > foo.py << EOF
# this is a comment
text = """
# xyz
bar
"""
def foo():
    # to be continued
    pass
def bar():
    pass
EOF
Note: to reveal the extra spaces:
cat foo.py | perl -pe 's/$/|/'
# this is a comment |
|
text = """|
# xyz |
bar |
"""|
def foo(): |
    # to be continued |
    pass |
|
def bar():|
    pass |
|
|
|
There are several PEP8 issues with the above (extra spaces at end of lines, only 1 line between the functions, etc.). Autopep8 fixes them all (but correctly leaves the text variable unchanged):
autopep8 foo.py | perl -pe 's/$/|/'
# this is a comment|
|
text = """|
# xyz |
bar |
"""|
|
|
def foo():|
    # to be continued|
    pass|
|
|
def bar():|
pass|

Python print -> Perl STDIN line skip problem

I'm a newbie at Perl and Python.
I need to do some file handling in Python (a dataframe), and that file needs to be processed in Perl.
At first I tried to use Python's subprocess, and it was not working (broken pipe).
I need to print multiple lines from Python, and the Perl code needs to read and process them.
I just used | on the command line, and it worked, but Perl skips the odd-numbered lines and only reads the even-numbered ones.
How can I fix it?
My Python code is:
import pandas as pd
data = pd.read_csv('./data.txt', sep = '\t', header = None)
datalist = list(data[0] + '_' + data[1])
for line in datalist:
    print(line)
and my Perl code is:
use strict;
my %new_list = ();
while (<STDIN>){
    my $line = <STDIN>;
    # print STDERR $line;
    # chomp $line;
    my ($name, $title) = split('_', <STDIN>);
    $new_list{$title} = $name;
    print STDERR $name, "\t", $title, "\n";
}
print STDERR scalar(keys %new_list);
My Python code outputs 657 lines, but the Perl code only prints 329.
How can I fix it?
The expression <STDIN> reads a new line from standard input every time it is evaluated. In your loop it appears in the loop condition, in the assignment to $line, and in the split, so each iteration of the while loop consumes several lines of input.
It is sufficient to say
while (<STDIN>) {
    my $line = $_;
    ...
or just
while (my $line = <STDIN>) {
    ...

splitting string by multiple delimiters in python 3

I am trying to split a Java for statement by its delimiters in Python 3. For example, for ( int i = 0;) is split by the () and the ;. Then I would like to have another loop that goes through the string, checks if it has the word "for", and then says "You have the beginning of the for loop". Basically I will do this for each part of the for loop. I plan on running this against a Java file, which it will break down into pseudocode. I will post what I have done so far below. I am new to Python, so I am having trouble with this.
code
import re
var = ("""for ( int i =0; i < 5; i++) { \
System.out.println("Hello World") \
""")
var2 =(re.split(';|(|)|{|}' , var))
for myvars in var2:
    print(myvars)
for v in var:
    if 'for' in v:
        print("For loop starts here")
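For reference, the pattern ';|(|)|{|}' fails because ( and ) are regex metacharacters, making the pattern invalid (unbalanced parenthesis). Putting the delimiters in a character class treats them literally. A sketch, with the statement condensed to one illustrative line:

```python
import re

var = 'for ( int i = 0; i < 5; i++) { System.out.println("Hello World") }'

# A character class treats ( ) { } ; literally, so no escaping headaches.
# Empty/whitespace-only pieces are dropped after splitting.
parts = [p.strip() for p in re.split(r'[;(){}]', var) if p.strip()]
for part in parts:
    print(part)
if any('for' in p for p in parts):
    print("For loop starts here")
```

Note that iterating `for v in var:` walks the string character by character; iterating the split parts, as above, is what lets the 'for' check see whole tokens.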

Summarizing log file to unique entries only

I have been using this script for years at work to summarize log files.
#!/usr/bin/perl
$logf = '/var/log/messages.log';
@logf = ( `cat $logf` );
foreach $line ( @logf ) {
    $line =~ s/\d+/#/g;
    $count{$line}++;
}
@alpha = sort @logf;
$prev = 'null';
@uniq = grep($_ ne $prev && ($prev = $_), @alpha);
foreach $line (@uniq) {
    print "$count{$line}: ";
    print "$line";
}
I have wanted to rewrite it in Python but I do not fully understand certain portions of it, such as:
@alpha = sort @logf;
$prev = 'null';
@uniq = grep($_ ne $prev && ($prev = $_), @alpha);
Does anyone know of a Python module that would remove the need to rewrite this? I haven't had any luck finding something similar. Thanks in advance!
As the name of the var implies,
@alpha = sort @logf;
$prev = 'null';
@uniq = grep($_ ne $prev && ($prev = $_), @alpha);
is finding unique elements (i.e. removing duplicate lines), ignoring numbers in the line since they were previously replaced with #. Those three lines could have been written
@uniq = sort keys(%count);
or maybe even
@uniq = keys(%count);
Another way of writing the program in Perl:
my $log_qfn = '/var/log/messages.log';
open(my $fh, '<', $log_qfn)
    or die("Can't open $log_qfn: $!\n");
my %counts;
while (<$fh>) {
    s/\d+/#/g;
    ++$counts{$_};
}
#for (sort keys(%counts)) {
for (keys(%counts)) {
    print "$counts{$_}: $_";
}
This should be easier to translate into Python.
@alpha = sort @logf;
$prev = 'null';
@uniq = grep($_ ne $prev && ($prev = $_), @alpha);
would be equivalent to
uniq = sorted(set(logf))
if logf were a list of lines.
However, since you are counting the frequency of lines,
you could use a collections.Counter to both count the lines and collect the unique lines (as keys) (thus removing the need to compute uniq at all):
count = collections.Counter()
for line in f:
    count[line] += 1
import sys
import re
import collections
logf = '/var/log/messages.log'
count = collections.Counter()
write = sys.stdout.write
with open(logf, 'r') as f:
    for line in f:
        line = re.sub(r'\d+','#',line)
        count[line] += 1
for line in sorted(count):
    write("{c}: {l}".format(c = count[line], l = line))
I have to say I have often encountered people trying to do things in Python that can be done in one line in the shell or bash.
I don't care about downvotes, since people should know there is no reason to do in 20 lines of Python what can be done in the shell:
sort my_file.txt | uniq > uniq_my_file.txt

Bash script to select a single Python function from a file

For a git alias problem, I'd like to be able to select a single Python function from a file, by name, e.g.:
...
def notyet():
    wait for it
def ok_start(x):
    stuff
    stuff
    def dontgettrickednow():
        keep going
    #stuff
    more stuff
def ok_stop_now():
In algorithmic terms, the following would be close enough:
Start filtering when you find a line that matches /^(\s*)def $1[^a-zA-Z0-9]/
Keep matching until you find a line that matches neither ^\s*# nor ^\1\s (that is, a line that is neither a possibly-indented comment nor indented more deeply than the original def)
(I don't really care if decorators before the following function are picked up. The result is for human reading.)
I was trying to do this with Awk (which I barely know) but it's a bit harder than I thought. For starters, I'd need a way of storing the length of the indent before the original def.
One way using awk. Code is well commented, so I hope it's easy to understand.
Content of infile:
...
def notyet():
    wait for it
def ok_start(x):
    stuff
    stuff
    def dontgettrickednow():
        keep going
    #stuff
    more stuff
def ok_stop_now():
Content of script.awk:
BEGIN {
    ## 'f' variable is the function to search for; build a regexp with it.
    f_regex = "^" f "[^a-zA-Z0-9]"
    ## When set, print the line. Otherwise omit it.
    ## It is set when the searched-for function is found.
    ## It is unset when any character other than '#' is found with fewer
    ## spaces before it.
    in_func = 0
}
## Found function.
$1 == "def" && $2 ~ f_regex {
    ## Get position of first 'd' in the line.
    i = index( $0, "d" )
    ## Sanity check. Should never fail because the condition was
    ## checked before.
    if ( i == 0 ) {
        next
    }
    ## Get the characters before the matched index, check that all of
    ## them are spaces, and save their length.
    indent = substr( $0, 1, i - 1 )
    if ( indent ~ /^[[:space:]]*$/ ) {
        num_spaces = length( indent )
    }
    ## Set variable, print line and read next one.
    in_func = 1
    print
    next
}
## When we are inside the function, the line doesn't begin with '#' and
## it's not a blank line (only spaces).
in_func == 1 && $1 ~ /^[^#]/ && $0 ~ /[^[:space:]]/ {
    ## Get how many characters there are until the first non-space. The result
    ## is the position of the first non-blank, so subtract one to get the number
    ## of spaces.
    spaces = match( $0, /[^[:space:]]/ )
    spaces -= 1
    ## If the current indent is less than or equal to the indent of the function
    ## definition, then the end of the function was found, so stop printing.
    if ( spaces <= num_spaces ) {
        in_func = 0
    }
}
## Self-explanatory.
in_func == 1 {
    print
}
Run it like:
awk -f script.awk -v f="ok_start" infile
With following output:
def ok_start(x):
    stuff
    stuff
    def dontgettrickednow():
        keep going
    #stuff
    more stuff
Why not just let Python do it? I think the inspect module can print out the source of a function, so you could just import the module, select the function and inspect it. Hang on. Banging away at a solution for you...
OK. It turns out the inspect.getsource function doesn't work for stuff defined interactively:
>>> def test(f):
...     print 'arg:', f
...
>>> test(1)
arg: 1
>>> inspect.getsource(test)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\inspect.py", line 699, in getsource
    lines, lnum = getsourcelines(object)
  File "C:\Python27\lib\inspect.py", line 688, in getsourcelines
    lines, lnum = findsource(object)
  File "C:\Python27\lib\inspect.py", line 529, in findsource
    raise IOError('source code not available')
IOError: source code not available
>>>
But for your use case, it will work: For modules that are saved to disk. Take for instance my test.py file:
def test(f):
    print 'arg:', f

def other(f):
    print 'other:', f
And compare with this interactive session:
>>> import inspect
>>> import test
>>> inspect.getsource(test.test)
"def test(f):\n print 'arg:', f\n"
>>> inspect.getsource(test.other)
"def other(f):\n print 'other:', f\n"
>>>
So... You need to write a simple python script that accepts the name of a python source file and a function/object name as arguments. It should then import the module and inspect the function and print that to STDOUT.
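A minimal sketch of such a script. The helper name function_source and the CLI shape are my own choices, not from the answer; also note that importing a module this way executes its top-level code, which may matter for arbitrary files:

```python
import importlib.util
import inspect
import sys

def function_source(path, func_name):
    # Load the module directly from its file path, then let inspect
    # recover the exact source lines of the requested function.
    spec = importlib.util.spec_from_file_location("_target", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the module's top-level code
    return inspect.getsource(getattr(module, func_name))

if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python getfunc.py somefile.py function_name
    sys.stdout.write(function_source(sys.argv[1], sys.argv[2]))
```

Unlike the awk approach, this delegates all the indentation bookkeeping to Python's own parser, so nested defs and comments are handled for free.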
