I want to put \n after every 20 characters.
My_string = "aaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbccccccccccccccccccccddddddddddddddddddddeeeeeeeeeeeeeeeeeeeeffffffffffffffffffff"
I tried this: a = "\n".join(re.findall("(?s).{,20}", My_string))[0:-1]
Whenever I print it like this:
print '''
---------------------------------------------------------------
Value of a is
%s
---------------------------------------------------------------
''' % a
OUTPUT:
---------------------------------------------------------------
Value of a is
aaaaaaaaaaaaaaaaaaab
bbbbbbbbbbbbbbbbbbbc
cccccccccccccccccccd
ddddddddddddddddddde
eeeeeeeeeeeeeeeeeeef
fffffffffffffffffff
---------------------------------------------------------------
I want the output like this instead (each line indented with a leading space):
 ---------------------------------------------------------------
 Value of a is
 aaaaaaaaaaaaaaaaaaab
 bbbbbbbbbbbbbbbbbbbc
 cccccccccccccccccccd
 ddddddddddddddddddde
 eeeeeeeeeeeeeeeeeeef
 fffffffffffffffffff
 ---------------------------------------------------------------
You want to create a list of all lines, both predefined and wrapped, then add space indentation in front of each one (preferably in a single step, to avoid duplicated code), and then join everything into a single string.
While regular expressions do the trick, have a look at the handy standard-library module textwrap, which wraps lines for you:
import textwrap
My_string = "aaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbccccccccccccccccccccddddddddddddddddddddeeeeeeeeeeeeeeeeeeeeffffffffffffffffffff"
print '\n'.join(' {0}'.format(line) for line in [
    '---------------------------------------------------------------',
    'Value of a is'] + textwrap.fill(My_string, 20).split('\n') +
    ['---------------------------------------------------------------'])
prints
---------------------------------------------------------------
Value of a is
aaaaaaaaaaaaaaaaaaab
bbbbbbbbbbbbbbbbbbbc
cccccccccccccccccccd
ddddddddddddddddddde
eeeeeeeeeeeeeeeeeeef
fffffffffffffffffff
---------------------------------------------------------------
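As a side note, textwrap.wrap returns the wrapped lines as a list directly, so the fill(...).split('\n') round trip can be skipped. A minimal sketch (using a shorter, hypothetical test string):

```python
import textwrap

# hypothetical shorter string, just for illustration
My_string = "a" * 19 + "b" * 20 + "c" * 20

# wrap() returns a list of lines, each at most 20 characters long
lines = textwrap.wrap(My_string, 20)
print('\n'.join(lines))
```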
Try this:
# -*- encoding: utf-8 -*-
import re
My_string = "aaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbccccccccccccccccccccddddddddddddddddddddeeeeeeeeeeeeeeeeeeeeffffffffffffffffffff"
split = "\n "
a = split.join(re.findall("(?s).{,20}", My_string))[0:-1]
print ''' ---------------------------------------------------------------
 Value of a is
 %s
 ---------------------------------------------------------------''' % a
It looks like this can meet your requirements.
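For what it's worth, the same chunking can be done without regular expressions at all, using plain slicing (a sketch with a hypothetical short string and width 5):

```python
# hypothetical short string and width, just for illustration
My_string = "aaaaabbbbbccccc"
width = 5

# step through the string in width-sized strides
chunks = [My_string[i:i + width] for i in range(0, len(My_string), width)]
print('\n'.join(chunks))
```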
Sorry for asking such a low-level question, but I really tried to find the answer before coming here...
Basically I have a script which searches inside .py files and reads their code line by line. The goal of the script is to find whether a line ends with a space or a tab, as in the example below:
i = 5
z = 25
Basically, after the i variable we should have a \s and after the z variable a \t (I hope the code formatting will not erase them).
import fileinput
import logging
import os
import re

def custom_checks(file, rule):
    """
    :param file: file in which you search for a specific character
    :param rule: the specific character you search for
    :return: dict obj with the form {line number: character}
    """
    rule = re.escape(rule)
    logging.info(f"File {os.path.abspath(file)} checked for {repr(rule)} inside it")
    result_dict = {}
    file = fileinput.input([file])
    for idx, line in enumerate(file):
        if re.search(rule, line):
            result_dict[idx + 1] = str(rule)
    file.close()
    if not len(result_dict):
        logging.info("Zero non-compliance found based on the rule: 2 consecutive empty rows")
    else:
        logging.warning(f'Found the next errors: {result_dict}')
After that, if I check the logging output, I see this:
checked for '\+s\\s\$' inside it
I don't know why the backslashes are doubled.
Also, I basically get all the regexes from a config.json, which is this one:
{
    "ends with tab": "+\\t$",
    "ends with space": "+s\\s$"
}
Could someone please help me in this direction? I basically know that I could do it in other ways, such as reversing the line with [::-1], getting the first character and checking whether it's \s, etc., but I really want to do it with regex.
Thanks!
Try:
rules = {
    'ends with tab': re.compile(r'\t$'),
    'ends with space': re.compile(r' $'),
}
Note: while getting lines by iterating over the file leaves a newline ('\n') at the end of each string, $ in a regex matches the position before the first newline in the string. Thus, if you use a regex, you don't need to explicitly strip newlines.
if rule.search(line):
    ...
Personally, however, I would use line.rstrip() != line.rstrip('\n') to flag trailing spaces of any kind in one shot.
If you want to directly check for specific characters at the end of the line, you then need to strip any newline, and you need to check if the line isn't empty. For example:
char = '\t'
s = line.strip('\n')
if s and s[-1] == char:
    ...
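Putting the two rule patterns together with some hypothetical sample lines (a sketch; real code would read the lines from the file):

```python
import re

# hypothetical sample lines standing in for a file's contents
lines = ["i = 5 \n", "z = 25\t\n", "clean\n"]

rules = {
    'ends with tab': re.compile(r'\t$'),
    'ends with space': re.compile(r' $'),
}

for idx, line in enumerate(lines, 1):
    for name, rule in rules.items():
        if rule.search(line):
            print(idx, name)  # line 1 ends with a space, line 2 with a tab
```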
Addendum 1: read rules from JSON config
import json

# here from a string, but it could be in a file, of course
json_config = """
{
    "ends with tab": "\\t$",
    "ends with space": " $"
}
"""
rules = {k: re.compile(v) for k, v in json.loads(json_config).items()}
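Incidentally, this is also why the backslashes look doubled: in JSON source, \\t encodes a literal backslash followed by t, which json.loads decodes to the two-character sequence \t; re.compile then reads that as a tab anchor. A small sketch:

```python
import json
import re

# r'...' keeps the backslashes exactly as they appear in the JSON file
json_config = r'{"ends with tab": "\\t$"}'

pattern_text = json.loads(json_config)["ends with tab"]
print(repr(pattern_text))  # a backslash, a 't' and a '$': the regex \t$

rule = re.compile(pattern_text)
print(bool(rule.search("z = 25\t\n")))  # a tab right before the newline
```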
Addendum 2: comments
The following shows how to comment out a rule, as well as a rule to detect comments in the file being processed. Since JSON doesn't support comments, we can use yaml instead:
import yaml

yaml_config = """
ends with space: ' $'
ends with tab: \\t$
is comment: ^\\s*#
# ignore: 'foo'
"""
rules = {k: re.compile(v) for k, v in yaml.safe_load(yaml_config).items()}
Note: 'is comment' is easy. A hypothetical 'has comment' is much harder to define (why? I'll leave that as an exercise for the reader ;-)
Note 2: in a file, the yaml config would use single backslashes, e.g.:
cat > config.yml << EOF
ends with space: ' $'
ends with tab: \t$
is comment: ^\s*#
# ignore: 'foo'
EOF
Additional thought
You may want to give autopep8 a try.
Example:
cat > foo.py << EOF
# this is a comment
text = """
# xyz
bar
"""
def foo():
    # to be continued
    pass
def bar():
    pass
EOF
Note: to reveal the trailing spaces:
cat foo.py | perl -pe 's/$/|/'
# this is a comment |
|
text = """|
# xyz |
bar |
"""|
def foo(): |
    # to be continued |
    pass |
|
def bar():|
    pass |
|
|
|
There are several PEP8 issues with the above (trailing spaces at the ends of lines, only one blank line between the functions, etc.). autopep8 fixes them all (but correctly leaves the text variable unchanged):
autopep8 foo.py | perl -pe 's/$/|/'
# this is a comment|
|
text = """|
# xyz |
bar |
"""|
|
|
def foo():|
    # to be continued|
    pass|
|
|
def bar():|
    pass|
I have a string that I want to pass to a python script, e.g.
$ printf "tas\nty\n"
yields
tas
ty
However, when I pipe it (e.g. printf "tas\nty\n" | ./pumpkin.py), where pumpkin.py is:
#!/usr/bin/python
import sys
data = sys.stdin.readlines()
print data
I get the output
['tas\n', 'ty\n']
How do I prevent the newline character from being read by python?
You can strip all whitespace (at the beginning and at the end) using strip:
data = [s.strip() for s in sys.stdin.readlines()]
If you need to strip only the \n at the end, you can do:
data = [s.rstrip('\n') for s in sys.stdin.readlines()]
Or use the splitlines method:
data = sys.stdin.read().splitlines()
http://www.tutorialspoint.com/python/string_splitlines.htm
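A quick sketch of the difference, with the sample input inlined instead of read from stdin:

```python
# the same data that printf would pipe in
text = "tas\nty\n"

print(text.splitlines())  # newlines are dropped: ['tas', 'ty']
print(text.split('\n'))   # note the trailing empty string: ['tas', 'ty', '']
```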
Python newbie here. I've been working my way through this code to build a string which includes a date. I have bits of the code working to get the data I want, but I need help formatting the string to tie the data together.
This is what I have so far:
def get_rectype_count(filename, rectype):
    return int(subprocess.check_output('''zcat %s | '''
                                       '''awk 'BEGIN {FS=";"};{print $6}' | '''
                                       '''grep -i %r | wc -l''' %
                                       (filename, rectype), shell=True))

str = "MY VALUES ("
rectypes = 'click', 'bounce'
for myfilename in glob.iglob('*.gz'):
    #print (rectypes)
    print str.join(rectypes)
    print (timestr)
    print([get_rectype_count(myfilename, rectype)
           for rectype in rectypes])
My output looks like this:
clickMY VALUES (bounce
'2015-07-01'
[222, 0]
I'm trying to create this output file:
MY VALUES ('2015-07-01', click, 222)
MY VALUES ('2015-07-01', bounce, 0)
When you call join on a string, it joins together everything in the sequence passed to it, using itself as the separator.
>>> '123'.join(['click', 'bounce'])
click123bounce
Python supports formatting strings using replacement fields:
>>> values = "MY VALUES ('{date}', {rec}, {rec_count})"
>>> values.format(date='2015-07-01', rec='click', rec_count=222)
"MY VALUES ('2015-07-01', click, 222)"
With your code:
for myfilename in glob.iglob('*.gz'):
    for rec in rectypes:
        rec_count = get_rectype_count(myfilename, rec)
        print values.format(date=timestr, rec=rec, rec_count=rec_count)
Edit:
If you want to use join, you can join on a newline, \n:
>>> print '\n'.join(['line1', 'line2'])
line1
line2
Putting it together:
print '\n'.join(values.format(date=timestr,
                              rec=rec,
                              rec_count=get_rectype_count(filename, rec))
                for filename in glob.iglob('*.gz')
                for rec in rectypes)
Try this:
str1 = "MY VALUES ("
rectypes = ['click', 'bounce']
for myfilename in glob.iglob('*.gz'):
    k = [get_rectype_count(myfilename, rectype)
         for rectype in rectypes]
    for i in range(0, len(rectypes)):
        print str1 + str(timestr) + ", " + rectypes[i] + ", " + str(k[i]) + ")"
My input file (i.txt) is given below:
പ്രധാനമന്ത്രി മന്മോഹന്സിംഗ് നാട്ടില് എത്തി .
അദ്ദേഹം മലയാളി അല്ല കാരണം അദ്ദേഹത്തെ പറ്റി പറയാന് വാക്കുകല്ളില്ല .
and my connectives are in the list:
connectives = ['കാരണം', 'അതുകൊണ്ട് ', 'പക്ഷേ', 'അതിനാല്', 'എങ്കിലും', 'എന്നാലും', 'എങ്കില്', 'എങ്കില്പോലും',
               'എന്നതുകൊണ്ട് ', 'എന്ന']
My desired output is(outputfile.txt):
പ്രധാനമന്ത്രി മന്മോഹന്സിംഗ് നാട്ടില് എത്തി .
അദ്ദേഹം മലയാളി അല്ല .
അദ്ദേഹത്തെ പറ്റി പറയാന് വാക്കുകല്ളില്ല .
If there are 2 connectives, split according to both. My code is:
fr = codecs.open('i.txt', encoding='utf-8')
fw = codecs.open('outputfile.txt', 'w')
for line in fr:
    line_data = line.split()
    for x, e in list(enumerate(line_data)):
        if e in connectives:
            line_data[x] = '.'
The code is not complete.
I think you just have some indentation problems. I also added the u'' prefix to the connectives to mark them as unicode, since I am using Python 2.7.
You may also need to add a newline along with the . if you want to split an existing line into two lines...
Here is a start (but not final):
import codecs

connectives = [u'കാരണം', u'അതുകൊണ്ട് ', u'പക്ഷേ', u'അതിനാല്', u'എങ്കിലും', u'എന്നാലും',
               u'എങ്കില്', u'എങ്കില്പോലും', u'എന്നതുകൊണ്ട് ', u'എന്ന']

fr = codecs.open('i.txt', encoding='utf-8')
# fw = codecs.open('outputfile.txt', 'w')
for line in fr:
    line_data = line.split()
    for x, e in list(enumerate(line_data)):
        if e in connectives:
            line_data[x] = '.\n'
    print " ".join(line_data).lstrip()
This generates the output below (with an extra space where the split comes in the middle of a line):
പ്രധാനമന്ത്രി മന്മോഹന്സിംഗ് നാട്ടില് എത്തി .
അദ്ദേഹം മലയാളി അല്ല .
അദ്ദേഹത്തെ പറ്റി പറയാന് വാക്കുകല്ളില്ല .
Here's one way you could do it, building up a string word by word and adding .\n where appropriate:
#!/usr/bin/python
# -*- coding: utf-8 -*-
connectives = set(['കാരണം', 'അതുകൊണ്ട് ', 'പക്ഷേ', 'അതിനാല്', 'എങ്കിലും', 'എന്നാലും',
                   'എങ്കില്', 'എങ്കില്പോലും', 'എന്നതുകൊണ്ട് ', 'എന്ന', '.'])

s = ""
with open('i.txt') as file:
    for line in file:
        for word in line.split():
            if word in connectives:
                s += '.\n'
            else:
                s += '{} '.format(word)
print s
Note that I added '.' to the end of the connectives list and made it into a set. Sets are a type of collection that is useful for fast membership testing, such as the if word in connectives: check in the code. I also decided to use str.format to put the word into the string; this could be changed to word + ' ' if preferred.
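To illustrate the set-membership point on its own (with hypothetical English stand-ins for the connectives):

```python
# English stand-ins for the Malayalam connectives, purely for illustration
connectives = set(['because', 'but', '.'])

print('but' in connectives)   # membership tests on a set are fast
print('and' in connectives)
```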
Output:
പ്രധാനമന്ത്രി മന്മോഹന്സിംഗ് നാട്ടില് എത്തി .
അദ്ദേഹം മലയാളി അല്ല .
അദ്ദേഹത്തെ പറ്റി പറയാന് വാക്കുകല്ളില്ല .
Unlike the other answer, there's no problem with the leading whitespace at the start of each line after the first one.
By the way, if you are comfortable using list comprehensions, you could condense the code down to this:
#!/usr/bin/python
# -*- coding: utf-8 -*-
connectives = set(['കാരണം', 'അതുകൊണ്ട് ', 'പക്ഷേ', 'അതിനാല്', 'എങ്കിലും', 'എന്നാലും',
                   'എങ്കില്', 'എങ്കില്പോലും', 'എന്നതുകൊണ്ട് ', 'എന്ന', '.'])

with open('i.txt') as file:
    s = ''.join(['.\n' if word in connectives else '{} '.format(word)
                 for line in file
                 for word in line.split()])
print s
I have a tab delimited text file with the following data:
ahi1
b/se
ahi
test -2.435953
1.218364
ahi2
b/se
ahi
test -2.001858
1.303935
I want to extract the two floating-point numbers to a separate csv file with two columns, i.e.
-2.435953 1.218364
-2.001858 1.303935
Currently my hacky attempt is:
import csv
from itertools import islice
results = csv.reader(open('test', 'r'), delimiter="\n")
list(islice(results,3))
print results.next()
print results.next()
list(islice(results,3))
print results.next()
print results.next()
This is not ideal. I am a noob to Python, so I apologise in advance and thank you for your time.
Here is the code to do the job:
import re

# this is the same data just copy/pasted from your question
data = """ ahi1
b/se
ahi
test -2.435953
1.218364
ahi2
b/se
ahi
test -2.001858
1.303935"""

# What we're going to do is search through it line by line
# and parse out the numbers using regular expressions.
# What this basically does is: look for any number of characters
# that aren't digits or '-' ([^-\d]; ^ means NOT),
# then look for 0 or 1 dashes ('-') followed by one or more digits,
# a dot, and digits again: [\-]{0,1}\d+\.\d+,
# and then the same as the first part.
pattern = re.compile(r"[^-\d]*([\-]{0,1}\d+\.\d+)[^-\d]*")

results = []
for line in data.split("\n"):
    match = pattern.match(line)
    if match:
        results.append(match.groups()[0])

pairs = []
i = 0
end = len(results)
while i < end - 1:
    pairs.append((results[i], results[i+1]))
    i += 2

for p in pairs:
    print "%s, %s" % (p[0], p[1])
The output:
>>>
-2.435953, 1.218364
-2.001858, 1.303935
Instead of printing out the numbers, you could save them in a list and zip them together afterwards..
I'm using Python's regular expression module to parse the text. I can only recommend picking up regular expressions if you don't know them already; I find them very useful for parsing text and all sorts of machine-generated output files.
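The zip idea mentioned above could look like this (a sketch using the numbers from the example output):

```python
# the extracted numbers, as produced by the parsing loop above
results = ['-2.435953', '1.218364', '-2.001858', '1.303935']

# pair even-indexed items with odd-indexed ones
pairs = list(zip(results[0::2], results[1::2]))
for a, b in pairs:
    print("%s, %s" % (a, b))
```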
EDIT:
Oh, and BTW, if you're worried about performance: I tested on my slow old 2GHz IBM T60 laptop, and I can parse a megabyte in about 200 ms using the regex.
UPDATE:
I felt kind, so I did the last step for you :P
Maybe this can help:
zip(*[results]*5)
e.g.
import csv
from itertools import izip

results = csv.reader(open('test', 'r'), delimiter="\t")
for result1, result2 in (x[3:5] for x in izip(*[results]*5)):
    ...  # do something with the result
A tricky but more eloquent and sequential solution:
$ grep -v "ahi" myFileName | grep -v se | tr -d "test\" " | awk 'NR%2{printf $0", ";next;}1'
-2.435953, 1.218364
-2.001858, 1.303935
How it works: basically, remove the specific text lines, then remove unwanted text within lines, then join every second line with formatting. I just added the comma for beautification purposes; leave the comma out of awk's printf ", " if you don't need it.