Python replacement for Ruby's grep?

abc=123
dabc=123
abc=456
dabc=789
aabd=123
From the above file I need to find the lines beginning with abc= (leading whitespace doesn't matter).
In Ruby I would put this in an array and do:
matches = input.grep(/^\s*abc=.*/).map(&:strip)
I'm a total newcomer to Python; even calling myself a fresh Python developer would be too much.
Maybe there is a better "Python way" of doing this without even grepping?
The Python version available on the platform where I need to solve the problem is 2.6.
There is no way to use Ruby there at this time.

with open("myfile.txt") as myfile:
    matches = [line.rstrip() for line in myfile if line.lstrip().startswith("abc=")]

In Python you would typically use a list comprehension whose if clause does what you'd accomplish with Ruby's grep:
import sys, re
matches = [line.strip() for line in sys.stdin
           if re.match(r'^\s*abc=.*', line)]
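As a rough illustration of the list-comprehension approach, here is a minimal sketch that wraps it in a small helper. The name `grep` is hypothetical (it is not a Python builtin), and the sample lines are the ones from the question, with leading whitespace added to one of them to show that it is ignored.

```python
import re

# Sample lines from the question; leading whitespace added to one line
# purely for illustration.
lines = ["abc=123\n", "dabc=123\n", "  abc=456\n", "dabc=789\n", "aabd=123\n"]

def grep(pattern, iterable):
    """Hypothetical helper: stripped items whose start matches `pattern`."""
    regex = re.compile(pattern)
    # re.match only matches at the beginning of the string, so no ^ is needed.
    return [item.strip() for item in iterable if regex.match(item)]

matches = grep(r'\s*abc=', lines)
print(matches)
```

The same helper should work on an open file object, since iterating over a file yields its lines.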

Related

Using Mac Automator to run Python Find & Replace. Multiline (\n\r) not working

I'm using Mac Automator to run a 'Find & Replace' python script. (Below is a simplified version.) It's working really well, with the exception of line breaks...
I've tried multiple variants of \r \n \r\n but it's not removing the line breaks.
replacements = {
    'class="spec-container"': 'class="row"',
    '</span>\n\n': '</span>',
    '</div>\n\n\n': '</div>\n'
}
with open('/../TEMP/INPUT.txt') as infile, \
     open('/../TEMP/OUTPUT.txt', 'w') as outfile:
    for line in infile:
        for find, replace in replacements.iteritems():
            line = line.replace(find, replace)
        outfile.write(line)
I'm really only just finding my feet with Python, so apologies in advance. Any help gratefully received.
You can use the strip() method for removing whitespace around a string in Python. "hello\n".strip() will return "hello". You can also use lstrip() or rstrip() if you only want it to strip the string on one side.
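One likely reason the multi-line patterns never match is that the loop in the question handles one line at a time, so each chunk contains at most a single trailing "\n". A minimal sketch of an alternative, assuming the file is small enough to hold in memory, is to run the replacements over the whole content (what `infile.read()` would return). The sample `text` and the helper name `replace_all` are hypothetical, and a list of pairs is used here so the replacement order is fixed.

```python
# Hypothetical stand-in for the full contents of the input file.
text = '<span>a</span>\n\n<div class="spec-container">b</div>\n\n\nc\n'

# Pairs instead of a dict so the replacements run in a known order.
replacements = [
    ('class="spec-container"', 'class="row"'),
    ('</span>\n\n', '</span>'),
    ('</div>\n\n\n', '</div>\n'),
]

def replace_all(content, pairs):
    """Apply each (find, replace) pair to the whole string in turn."""
    for find, replace in pairs:
        content = content.replace(find, replace)
    return content

result = replace_all(text, replacements)
print(repr(result))
```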

"\n" not working in python while writing to files

I wrote python code to write to a file like this:
with codecs.open("wrtieToThisFile.txt",'w','utf-8') as outputFile:
    for k,v in list1:
        outputFile.write(k + "\n")
list1 is a list of (char, int) pairs.
The problem is that when I execute this, the lines in the file are not separated by "\n" as expected. Any idea what the problem is? I think it is because of the with statement.
Any help is appreciated. Thanks in advance.
(I am using Python 3.4 with "Python Tools for Visual Studio" version 2.2)
If you are on Windows, a bare \n doesn't terminate a line.
Honestly, I'm surprised you are having a problem; by default any file opened in text mode automatically converts \n to os.linesep. I have no idea what codecs.open() is, but it must be opening the file in binary mode.
Given that is the case you need to explicitly add os.linesep:
outputFile.write(k + os.linesep)
Obviously you have to import os somewhere.
Figured it out, from here:
How would I specify a new line in Python?
I had to use "\r\n" as in Windows, "\r\n" will work.
Per codecs.open's documentation, codecs.open opens the underlying file in binary mode, without line ending conversion. Frankly, codecs.open is semi-deprecated; in Python 2.7 and onwards, io.open (which is the same thing as the builtin open function in Python 3.x) handles 99% of the cases people used to use codecs.open, but better (faster, and without stupid issues like line endings). If you're reliably running on Python 3, just use plain open; if you need to run on Python 2.7 as well, import io and use io.open.
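A minimal sketch of the io.open approach, assuming list1 holds (char, int) pairs as in the question. The sample data and the temporary output path are hypothetical. In text mode, each "\n" written should come out as os.linesep on disk.

```python
import io
import os
import tempfile

# Hypothetical sample data shaped like the question's (char, int) pairs.
list1 = [(u"a", 1), (u"b", 2)]

# Temporary file so the sketch does not touch real data.
path = os.path.join(tempfile.mkdtemp(), "out.txt")

# Text mode with an explicit encoding: "\n" is translated on write.
with io.open(path, "w", encoding="utf-8") as output_file:
    for k, v in list1:
        output_file.write(k + u"\n")

# Read the raw bytes back to see which line terminator was written.
with io.open(path, "rb") as f:
    raw = f.read()
print(repr(raw))
```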
If you are on Windows, try '\r\n'. Or open the file with an editor that recognizes Unix-style newlines.

Replace all regex matches in a file

Consider a basic regex like a(.+?)a. How can one replace all occurrences of that regex in a file with the content of the first group?
You can use the re module for regular expressions in Python and the fileinput module to replace text in files in place.
Example:
import fileinput
import re
fn = "test.txt" # your filename
r = re.compile('a(.+?)a')
for line in fileinput.input(fn, inplace=True):
    match = r.match(line)
    print match.group() if match else line.replace('\n', '')
Before:
hello this
aShouldBeAMatch!!!!! and this should be gone
you know
After:
hello this
aShouldBeAMa
you know
Note: this works because the argument inplace=True causes input file to be moved to a backup file and standard output is directed to the input file, as documented under Optional in-place filtering.
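If the goal is to replace every match with the content of its first group, as the question asks, one option is re.sub with a \1 backreference. The following is a minimal in-memory sketch rather than the answer's own method; the helper name `keep_first_group` and the sample line are hypothetical.

```python
import re

pattern = re.compile(r'a(.+?)a')

def keep_first_group(line):
    """Replace each non-overlapping match with the text captured by group 1."""
    return pattern.sub(r'\1', line)

# Hypothetical sample line containing two matches.
sample = "x aHELLOa y, then aBa\n"
result = keep_first_group(sample)
print(repr(result))
```

Inside a fileinput loop with inplace=True, the result of such a helper could be written to standard output in place of the original line.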
You can use Notepad++ version 6.0 or later, which supports PCRE regular expressions.
You can then search for your regex a(.+?)a and replace with $1
sed
Are you limited to using Python tools? Because sed works very well. Note that sed has no lazy quantifiers, so [^a]+ roughly stands in for .+? here:
$ sed -E -i 's/a([^a]+)a/\1/g' <filename>
Vim
In a Vim window, give the following search-and-replace ex command:
:%s/\va(.{-1,})a/\1/g
Note that many regex characters must be escaped in Vim; \v sets "very magic" mode, which removes the need for escaping. Vim spells the lazy quantifier {-1,} rather than +?. The same command with "magic" (the default) is :%s/a\(.\{-1,}\)a/\1/g
Python
If you're looking to do this in Python, BigYellowCactus' answer is excellent (use the re module for regex, and fileinput to modify the file).

python system call

I'm having a difficult time understanding how to get Python to call a system function...
the_file = ('logs/consolidated.log.gz')
webstuff = subprocess.Popen(['/usr/bin/zgrep', '/meatsauce/', the_file ],stdout=subprocess.PIPE) % dpt_search
for line in webstuff.stdout:
    print line
Trying to get python to build another file with my search string.
Thanks!
I recommend the PyMotW Subprocess page from Doug Hellmann who (quoted) "Reads the docs so you don't have to"
Apart from that:
f = file('sourcefile')
for line in f:
    if 'pattern' in line:
        # mind the , at the end,
        # since there's no stripping involved
        # and print adds a newline without it
        print line,
If you need to match regular expressions, refer to the documentation for the re module in the Python Standard Library, and also to the PyMotW Regular Expression page.
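Since the file in the question ends in .gz, the pure-Python route also needs to decompress it. A minimal sketch using the standard gzip module follows; the helper name `zgrep` is hypothetical (borrowed from the command-line tool), and the small log file is built on the fly so the example runs on its own.

```python
import gzip
import os
import re
import tempfile

def zgrep(pattern, path):
    """Yield lines of a gzip-compressed text file that match `pattern`."""
    regex = re.compile(pattern)
    # "rt" opens the compressed file as decoded text, line by line.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if regex.search(line):
                yield line.rstrip("\n")

# Build a small hypothetical log so the sketch is self-contained.
log_path = os.path.join(tempfile.mkdtemp(), "consolidated.log.gz")
with gzip.open(log_path, "wb") as f:
    f.write(b"GET /meatsauce/index.html\nGET /other/page.html\n")

hits = list(zgrep("/meatsauce/", log_path))
print(hits)
```

The search string is passed as an ordinary argument here, which avoids applying % formatting to the Popen object.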

Replace part of string using python regular expression

I have the following lines (many, many):
...
gfnfgnfgnf: 5656756734
arvervfdsa: 1343453563
particular: 4685685685
erveveersd: 3453454545
verveversf: 7896789567
..
What I'd like to do is find the line 'particular' (whatever number is after ':')
and replace this number with '111222333'. How can I do that using Python regular expressions?
for line in input:
    key, val = line.split(':')
    if key == 'particular':
        val = '111222333'
I'm not sure regex would be of any value in this specific case. My guess is they'd be slower. That said, it can be done. Here's one way:
for line in input:
    line = re.sub(r'^particular: .*', 'particular: 111222333', line)
There are subtleties involved in this, and this is almost certainly not what you'd want in production code. You need to check all of the re module constants to make sure the regex is acting the way you expect, etc. You might be surprised at the flexibility you find in dealing with problems like this in Python if you try not to use re (of course, this isn't to say re isn't useful) ;-)
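Two of those subtleties can be shown in a short sketch, assuming the whole text is held in one string. re.MULTILINE is needed for ^ to match at the start of each line, and \g<1> is used instead of \1 because the replacement text starts with a digit, which would otherwise be read as part of the group number. The sample text is a shortened copy of the question's data.

```python
import re

# Shortened copy of the question's sample lines.
text = """gfnfgnfgnf: 5656756734
particular: 4685685685
verveversf: 7896789567"""

# ^ matches at every line start because of re.MULTILINE.
# \g<1> keeps the captured prefix; a bare \1 followed by digits is ambiguous.
new_text = re.sub(r'^(particular:\s*)\d+', r'\g<1>111222333', text,
                  flags=re.MULTILINE)
print(new_text)
```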
Sure you need a regular expression?
other_number = '111222333'
some_text, some_number = line.split(': ')
new_line = ': '.join([some_text, other_number])
#!/usr/bin/env python
import re
text = '''gfnfgnfgnf: 5656756734
arvervfdsa: 1343453563
particular: 4685685685
erveveersd: 3453454545
verveversf: 7896789567'''
print(re.sub('[0-9]+', '111222333', text))
input = """gfnfgnfgnf: 5656756734
arvervfdsa: 1343453563
particular: 4685685685
erveveersd: 3453454545
verveversf: 7896789567"""
entries = re.split("\n+", input)
for entry in entries:
    if entry.startswith("particular"):
        entry = re.sub(r'[0-9]+', r'111222333', entry)
or with sed:
sed -e 's/^particular: [0-9].*$/particular: 111222333/g' file
An important point here is that if you have a lot of lines, you want to process them one by one. That is, instead of reading all the lines in, replacing them, and writing them out again, you should read in a line at a time and write out a line at a time. (This would be inefficient if you were actually reading a line at a time from the disk; however, Python's IO is competent and will buffer the file for you.)
with open(...) as infile, open(...) as outfile:
    for line in infile:
        if line.startswith("particular"):
            # keep the newline, otherwise the next line is glued to this one
            outfile.write("particular: 111222333\n")
        else:
            outfile.write(line)
This will be speed- and memory-efficient.
Your sed example forces me to say neat!
python -c "import re, sys; print ''.join(re.sub(r'^(particular:) \d+', r'\1 111222333', l) for l in open(sys.argv[1]))" file
