Python to search keyword starts with and Replace in file

I have a file, file1.txt, with the following contents:
if [ "x${GRUB_DEVICE_UUID}" = "x" ] || [ "x${GRUB_DISABLE_LINUX_UUID}" = "xtrue" ] \
    || ! test -e "/dev/disk/by-uuid/${GRUB_DEVICE_UUID}" \
    || uses_abstraction "${GRUB_DEVICE}" lvm; then
    LINUX_ROOT_DEVICE=${GRUB_DEVICE}
else
    LINUX_ROOT_DEVICE=UUID=${GRUB_DEVICE_UUID}
fi
GRUBFS="`${grub_probe} --device ${GRUB_DEVICE} --target=fs 2>/dev/null || true`"
Linux_CMDLINE="nowatchdog rcupdate.rcu_cpu_stall_suppress=1"
I want to find the line that starts with Linux_CMDLINE=" and replace that whole line with Linux_CMDLINE=""
I tried the code below, but it is not working. I also suspect it is not the best way to implement this. Is there an easier method to achieve it?
with open ('/etc/grub.d/42_sgi', 'r') as f:
    newlines = []
    for line in f.readlines():
        if line.startswith('Linux_CMDLINE=\"'):
            newlines.append("Linux_CMDLINE=\"\"")
        else:
            newlines.append(line)
with open ('/etc/grub.d/42_sgi', 'w') as f:
    for line in newlines:
        f.write(line)
Expected output:
if [ "x${GRUB_DEVICE_UUID}" = "x" ] || [ "x${GRUB_DISABLE_LINUX_UUID}" = "xtrue" ] \
    || ! test -e "/dev/disk/by-uuid/${GRUB_DEVICE_UUID}" \
    || uses_abstraction "${GRUB_DEVICE}" lvm; then
    LINUX_ROOT_DEVICE=${GRUB_DEVICE}
else
    LINUX_ROOT_DEVICE=UUID=${GRUB_DEVICE_UUID}
fi
GRUBFS="`${grub_probe} --device ${GRUB_DEVICE} --target=fs 2>/dev/null || true`"
Linux_CMDLINE=""

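# the replacement needs its own newline, or the next line gets glued onto it
repl = 'Linux_CMDLINE=""\n'
with open('/etc/grub.d/42_sgi', 'r') as f:
    newlines = []
    for line in f.readlines():
        if line.startswith('Linux_CMDLINE='):
            line = repl
        newlines.append(line)
# write the modified lines back to the same file
with open('/etc/grub.d/42_sgi', 'w') as f:
    f.writelines(newlines)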

Minimal code, made possible by opening the file for both reading and writing:
# Read and write (r+)
with open("file.txt", "r+") as f:
    find = 'Linux_CMDLINE="'
    # include the newline so the following line is not glued onto this one
    changeto = 'Linux_CMDLINE=""\n'
    # iterate over the lines, swap the matching ones, and glue them back with join
    newstring = ''.join([i if not i.startswith(find) else changeto for i in f])
    f.seek(0)
    f.write(newstring)
    f.truncate()
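
The same idea can be wrapped in a small helper that works on text, which makes the replacement easy to try out before touching the real file. This is only a sketch: the names replace_prefixed_lines and rewrite_file are made up for illustration, and it assumes the file is small enough to read into memory.

```python
def replace_prefixed_lines(text, prefix, replacement):
    """Return text with every line that starts with prefix replaced."""
    out = []
    for line in text.splitlines(keepends=True):
        if line.startswith(prefix):
            # keep a line ending so the following line is not glued on
            ending = "\n" if line.endswith("\n") else ""
            out.append(replacement + ending)
        else:
            out.append(line)
    return "".join(out)


def rewrite_file(path, prefix, replacement):
    """Apply replace_prefixed_lines to a file in place (hypothetical helper)."""
    with open(path, "r") as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(replace_prefixed_lines(text, prefix, replacement))


sample = 'fi\nLinux_CMDLINE="nowatchdog rcupdate.rcu_cpu_stall_suppress=1"\n'
print(replace_prefixed_lines(sample, 'Linux_CMDLINE="', 'Linux_CMDLINE=""'))
```

Keeping the text transformation separate from the file I/O means the replacement logic can be checked on a string first; rewrite_file would then be called with the real path, which usually needs root privileges for files under /etc.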

Related

Python append text between lines

The task:
I have a list of IPs that need to be added to .htaccess files in this format:
##ip_access1
Require ip 127.0.0.1
Require ip 127.0.0.2
Require all denied
##ip_access2
The problem:
How do I append text to a .htaccess file with Python? I know how to do this with bash, but I need Python specifically for now.
Cases:
If the tuple of IPs is empty, find the markers ##ip_access1 and ##ip_access2 and delete everything between them, including the markers themselves;
If the .htaccess file is not empty, append ##ip_access1 <...> ##ip_access2 with all the IPs to the bottom of the file;
P.S. Bash implementation:
ip_access() {
    local user=$1
    local htaccess="/var/www/${user}/site/.htaccess"
    local ips="$2"
    # manipulate records in .htaccess
    [ ! -f "${htaccess}" ] && touch "${htaccess}"
    if [ -z "${ips}" ]; then
        sed -i '/##ip_access1/,/##ip_access2/{d}' "${htaccess}"
        chown "${user}":"${user}" "${htaccess}"
        echo "IP access successfully reset!"
        exit 0
    fi
    arrip=()
    for ip in ${ips//,/ }; do
        arrip+=("Require ip $ip\n")
    done
    # always inject fresh batch of ips
    sed -i '/##ip_access1/,/##ip_access2/{d}' "${htaccess}"
    { echo -e "##ip_access1";\
      echo -e "${arrip:?}" | head -c -1;\
      echo -e "Require all denied";\
      echo -e "##ip_access2"; } >> "${htaccess}"
    chown "${user}":"${user}" "${htaccess}"
    echo "IP access successfully set!"
}

This function is the bare bones of a possible solution. It performs almost no sanity checks, so use it with caution.
import os

def ips_to_file(ips, file_path):
    if len(ips) > 0:
        ip_lines = ['##ip_access1'] + [f'Require ip {ip}' for ip in ips] + ['Require all denied', '##ip_access2']
    else:
        ip_lines = []
    if os.path.isfile(file_path):
        with open(file_path, 'r+') as fp:
            lines = [line.rstrip('\n') for line in fp]
            if '##ip_access1' in lines and '##ip_access2' in lines:
                # replace the existing block, markers included
                start = lines.index('##ip_access1')
                end = lines.index('##ip_access2')
                lines = lines[:start] + ip_lines + lines[end + 1:]
            else:
                # no block yet: append to the bottom of the file
                lines = lines + ip_lines
            fp.seek(0)
            fp.truncate()
            # writelines() adds no line endings, so put them back here
            fp.writelines(line + '\n' for line in lines)

I found a solution, with help of course:
from typing import Optional

ip_lst = ["1.1.1.1", "2.2.2.2", "3.3.3.3"]
with open("test.txt", "r") as f:
    htaccess_file_contents = f.read()

def _generate_htaccess_compat_lst(lst) -> str:
    to_return = []
    for addr in lst:
        to_return.append("Require ip " + addr)
    return "\n{}\n".format("\n".join(to_return))

def _inject_between(start, end, to_manipulate, to_replace) -> Optional[str]:
    lines = to_manipulate.splitlines()
    counter = 0
    pos1, pos2 = -1, -1
    # find the marker lines that delimit the block we need to replace
    for line in lines:
        if start == line:
            pos1 = counter
        elif end == line:
            pos2 = counter
        counter += 1
    # return None if either marker is missing
    if pos1 == -1 or pos2 == -1:
        return None
    # keep everything before the start marker and after the end marker;
    # pos2 + 1 skips the end marker itself
    before = "".join(l + "\n" for l in lines[:pos1])
    after = "".join("\n" + l for l in lines[pos2 + 1:])
    return before + start + to_replace + end + after

tmp = _inject_between("##ip_access1", "##ip_access2",
                      htaccess_file_contents,
                      _generate_htaccess_compat_lst(ip_lst))
print(tmp)
# feel free to write tmp back to .htaccess
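
Both cases from the question (remove the block when there are no IPs, otherwise refresh it at the bottom of the file) can be sketched as a single text-to-text function, mirroring what the bash version does with sed followed by an append. The name update_ip_block is hypothetical, and the sketch assumes the markers sit on lines of their own; an unterminated ##ip_access1 would cause everything after it to be dropped.

```python
START, END = "##ip_access1", "##ip_access2"


def update_ip_block(text, ips):
    """Drop any existing marker block, then append a fresh one if ips is non-empty."""
    kept, inside = [], False
    for line in text.splitlines():
        if line.strip() == START:
            inside = True
        elif line.strip() == END and inside:
            inside = False
        elif not inside:
            kept.append(line)
    if ips:
        kept.append(START)
        kept.extend("Require ip " + ip for ip in ips)
        kept.append("Require all denied")
        kept.append(END)
    return "".join(line + "\n" for line in kept)


before = "RewriteEngine On\n##ip_access1\nRequire ip 10.0.0.1\nRequire all denied\n##ip_access2\n"
print(update_ip_block(before, ("127.0.0.1", "127.0.0.2")))
```

To apply it, read the .htaccess contents, pass them through update_ip_block, and write the result back; an empty tuple simply removes the block.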

Get a string in Shell/Python with subprocess

Following up on my earlier question, "Get a string in Shell/Python using sys.argv", I need to change my code: main.py now has to start a subprocess from this function:
def download_several_apps(self):
    subproc_two = subprocess.Popen(["./readtext.sh", self.inputFileName_download], stdout=subprocess.PIPE)
Here is my readtext.sh file:
#!/bin/bash
filename="$1"
counter=1
while IFS=: true; do
    line=''
    read -r line
    if [ -z "$line" ]; then
        break
    fi
    python3 ./download.py \
        -c ./credentials.json \
        --blobs \
        "$line"
done < "$filename"
And my download.py file:
if (len(sys.argv) == 2):
    downloaded_apk_default_location = 'Downloads/'
else:
    readtextarg = os.popen("ps " + str(os.getppid()) + " | awk ' { out = \"\"; for(i = 6; i <= NF; i++) out = out$i\" \" } END { print out } ' ").read()
    textarg = readtextarg.split(" ")[1 : -1][0]
    downloaded_apk_default_location = 'Downloads/'+textarg[1:]
How can I get and print self.inputFileName_download in my download.py file?
I used sys.argv as suggested by @tripleee in my previous post, but it doesn't work the way I need.

OK, I changed the last line to:
downloaded_apk_default_location = 'Downloads/'+textarg.split("/")[-1]
to get the name of the text file.

The shell indirection seems completely superfluous here.
import download
with open(self.inputFileName_download) as apks:
    for line in apks:
        if line == '\n':
            break
        blob = line.rstrip('\n')
        download.something(blob=blob, credentials='./credentials.json')
... where obviously I had to speculate about what the relevant function from download.py might be called.
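
If download.py has to stay a separate process, one way for it to learn the list file name is to receive it as an explicit argument, rather than inspecting the parent's command line with ps. A minimal sketch, assuming Python 3: the child below is an inline stand-in for download.py, and the extra positional argument is hypothetical, not an option download.py is known to accept.

```python
import subprocess
import sys

# Stand-in for download.py: it prints the last command-line argument it received.
CHILD_CODE = "import sys; print(sys.argv[-1])"


def run_child(list_file_name, blob):
    """Start the child and hand it the list file name as a plain argument."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE, blob, list_file_name],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


print(run_child("apps.txt", "com.example.app"))
```

In the real setup, readtext.sh would forward "$filename" to download.py in the same way, and download.py would read it from sys.argv or its own argument parser.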

Compare a regex match from two separate files and replace with values from one of them

I'm not really sure what the best way to do this is... I was thinking I might need to do it in Python?
filea.html contains data-tx-text="9817db21ccc2d9acc021c4536690b90a_se"
fileb.html contains data-tx-text="0850235fcb0e503150c224dad3156312_se"
filea.html and fileb.html contain exactly the same number of data-tx-text values (171).
I want to be able to use a regex pattern or a simple Python program to:
Find data-tx-text="(.*?)" in filea.html
Find data-tx-text="(.*?)" in fileb.html
Replace the value from filea.html with the value found in fileb.html
Move to the next occurrence.
Continue until the end of the file, or until all values in filea.html match those in fileb.html
I have the basics. For instance, I know the regex pattern that I need, and I am guessing I need to loop this in Python or something similar?
Maybe I can do it with sed, but I'm not that good with that, so any help is greatly appreciated.

In awk, you could use something like this:
NR == FNR {
    match($0, /data-tx-text="[^"]+"/);
    if (RSTART > 0) {
        data[++a] = substr($0, RSTART + 14, RLENGTH - 15);
    }
    next;
}
/data-tx-text/ {
    sub(/data-tx-text="[^"]+"/, "data-tx-text=\"" data[++b] "\"");
    print;
}

With GNU awk for the 3rd arg to match():
$ cat tst.awk
match($0,/(.*)(data-tx-text="[^"]+")(.*)/,a) {
    if (NR==FNR) {
        fileb[++bcnt] = a[2]
    }
    else {
        $0 = a[1] fileb[++acnt] a[3]
    }
}
NR>FNR
$ awk -f tst.awk fileb filea
data-tx-text="0850235fcb0e503150c224dad3156312_se"
With other awks you'd use 3 calls to substr() after the match():
$ cat tst.awk
match($0,/data-tx-text="[^"]+"/) {
    if (NR==FNR) {
        fileb[++bcnt] = substr($0,RSTART,RLENGTH)
    }
    else {
        $0 = substr($0,1,RSTART-1) fileb[++acnt] substr($0,RSTART+RLENGTH)
    }
}
NR>FNR
$ awk -f tst.awk fileb filea
data-tx-text="0850235fcb0e503150c224dad3156312_se"

Open filea, find stringa
Open fileb, find stringb
Replace stringa with stringb
Replace stringb with stringa
Write the files back
In code:
import re
pattern = 'data-tx-text="(.*?)"'
with open('filea.html', 'r') as f:
    filea = f.read()
with open('fileb.html', 'r') as f:
    fileb = f.read()
# re.search scans the whole text; re.match would only match at the very start
stringa = re.search(pattern, filea).group()
stringb = re.search(pattern, fileb).group()
filea = filea.replace(stringa, stringb)
fileb = fileb.replace(stringb, stringa)
with open('filea.html', 'w') as f:
    f.write(filea)
with open('fileb.html', 'w') as f:
    f.write(fileb)

So this is how I solved it using Python. It's a bit manual, in that I have to change the names of filea and fileb each time, but it works.
I think I can improve the regex with escapes?
import re
with open('filea.html') as originalFile:
    originalFileContents = originalFile.read()
pattern = re.compile(r'[0-9a-f]{32}_se')
originalMatches = pattern.findall(originalFileContents)
counter = 0
def replaceId(match):
    # called once per match in the target file, in document order
    global counter
    value = match.group()
    newValue = originalMatches[counter]
    print(counter, '=> replacing', value, 'with', newValue)
    counter = counter + 1
    return newValue
with open('fileb.html') as targetFile:
    targetFileContents = targetFile.read()
changedTargetFileContents = pattern.sub(replaceId, targetFileContents)
print(changedTargetFileContents)
with open("Output.html", "w") as new_file:
    new_file.write(changedTargetFileContents)
Available on GitHub: https://github.com/timm088/rehjex-py

Here's how I'd do it using Beautiful Soup:
from bs4 import BeautifulSoup as bs
replacements, replaced_html = [], ''
with open('fileb.html') as fileb:
    # Extract replacements
    soup = bs(fileb, 'html.parser')
    tags = soup.find_all(lambda tag: tag.get('data-tx-text'))
    replacements = [tag.get('data-tx-text') for tag in tags]
with open('filea.html') as filea:
    # Replace values
    soup = bs(filea, 'html.parser')
    tags = soup.find_all(lambda tag: tag.get('data-tx-text'))
    for tag in tags:
        tag['data-tx-text'] = replacements.pop(0)
    replaced_html = str(soup)
with open('filea.html', 'w') as new_filea:
    # Update file
    new_filea.write(replaced_html)
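
For comparison, the positional replacement can also be sketched with the standard library alone, using re.sub with a callback that pulls the next value from the other file. The function name transplant_values is made up for illustration, and the sketch assumes attribute values never contain a double quote.

```python
import re

ATTR = re.compile(r'data-tx-text="([^"]*)"')


def transplant_values(target_html, source_html):
    """Replace the n-th data-tx-text value in target_html with the n-th one from source_html."""
    new_values = ATTR.findall(source_html)
    if len(new_values) != len(ATTR.findall(target_html)):
        raise ValueError("the two documents have a different number of data-tx-text values")
    values = iter(new_values)
    # a callback avoids backslash-escape processing of the replacement text
    return ATTR.sub(lambda m: 'data-tx-text="{}"'.format(next(values)), target_html)


filea = '<p data-tx-text="aaa_se">one</p><p data-tx-text="bbb_se">two</p>'
fileb = '<p data-tx-text="ccc_se">uno</p><p data-tx-text="ddd_se">dos</p>'
print(transplant_values(filea, fileb))
```

Checking the counts up front makes a mismatch between the two files fail loudly instead of silently misaligning the values.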

converting bash/shell line into python 2.6

I am relatively new to programming, especially in Bash and Python, and to this site. Sorry for the multiple posts!
I am trying to translate this line into Python. I have tried os.popen.
Can you think of any other ways to do it? I am limited to Python 2.6 and cannot upgrade to a newer version; otherwise I would know how to do it in 3.x.
Thanks!
sample1=($(/bin/cat /proc/meminfo | egrep 'MemTotal|MemFree|Cached|SwapTotal|SwapFree|AnonPages|Dirty|Writeback|PageTables|HugePages_' | awk ' { print $2} ' | pr -t -T --columns=15 --width=240))
This is what I have in Python, but it isn't working. Does anyone have an idea how to rearrange it so that it does the same as the Bash line?
I know these shouldn't be elif. Honestly, I'm stumped and don't know where to go from here.
lst = [] #
inFile = open('/proc/meminfo') # open file
line = inFile.readline()
sample1 = {} #
while(line): #
    if line.find('MemTotal'):
        line = line.split()
        sample1['MemTotal'] = line[1]
    elif line.find('MemFree'):
        line = line.split()
        sample1['MemFree'] = line[1]
    elif line.find(line, 'Cached'):
        line = line.split()
        sample1['Cached'] = line[1]
    elif line.find(line, 'SwapTotal'):
        line = line.split()
        sample1['SwapTotal'] = line[1]
    elif line.find(line, 'SwapFree'):
        line = line.split()
        sample1['SwapFree'] = line[1]
    elif line.find(line, 'AnonPages'):
        line = line.split()
        sample1['AnonPages'] = line[1]
    elif line.find(line, 'Dirty'):
        line = line.split()
        sample1['Dirty'] = line[1]
    elif line.find(line, 'Writeback'):
        line = line.split()
        sample1['WriteBack'] = line[1]
    elif line.find(line, 'PageTables'):
        line = line.split()
        sample1['PageTables'] = line[1]
    elif line.find(line, 'HugePages_'):
        line = line.split()
        sample1['HugePages'] = line[1]

This should run the bash command from python by piping the output through subprocess.Popen and work for python2.6:
from subprocess import Popen, PIPE
p1 = Popen(["cat","/proc/meminfo"], stdout=PIPE)
p2 = Popen(["egrep", 'MemTotal|MemFree|Cached|SwapTotal|SwapFree|AnonPages|Dirty|Writeback|PageTables|HugePages_' ], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close()
p3 = Popen(["awk","{ print $2}"],stdin=p2.stdout,stdout=PIPE)
p2.stdout.close()
p4 = Popen(["pr", "-t", "-T", "--columns=15", "--width=240"],stdin=p3.stdout,stdout=PIPE)
p3.stdout.close()
output = p4.communicate()
print(output[0])
The output from my system is:
16341932 4484840 5105220 0 8388604 8388604 108 0 5106832 78100 0 0 0 0 0
You can also open the file with python and pass the file object to the first process:
from subprocess import Popen,PIPE,STDOUT
with open("/proc/meminfo") as f:
    p1 = Popen(["egrep", 'MemTotal|MemFree|Cached|SwapTotal|SwapFree|AnonPages|Dirty|Writeback|PageTables|HugePages_' ], stdin=f, stdout=PIPE)
    p2 = Popen(["awk","{ print $2}"],stdin=p1.stdout,stdout=PIPE)
    p1.stdout.close()
    p3 = Popen(["pr", "-t", "-T", "--columns=15", "--width=240"],stdin=p2.stdout,stdout=PIPE)
    p2.stdout.close()
    output = p3.communicate()
print(output[0])
A pure Python solution that uses str.find to mimic egrep, matching lines that contain any of the substrings in pre, and str.rsplit to get the second column, i.e. the digits:
pre = ('MemTotal', 'MemFree', 'Cached', 'SwapTotal', 'SwapFree', 'AnonPages', 'Dirty', 'Writeback', 'PageTables', 'HugePages_')
with open("/proc/meminfo") as f:
    out = []
    for line in f:
        # if line.find(p) is not -1 we have a match
        if any(line.find(p) != -1 for p in pre):
            # split twice from the end on whitespace and get the second column
            v = line.rsplit(None, 2)[1]
            out.append(v)
print(" ".join(out))
Output:
16341932 4507652 5128624 0 8388604 8388604 48 0 5059044 78068 0 0 0 0 0
Using any in the above code evaluates lazily and short-circuits on the first match; if nothing matches it evaluates to False, so nothing gets added.
Staying truer to egrep, we can compile the patterns/substrings to check for and use re.search:
import re
r = re.compile(r"MemTotal|MemFree|Cached|SwapTotal|SwapFree|AnonPages|Dirty|Writeback|PageTables|HugePages_")
with open("/proc/meminfo") as f:
    out = []
    for line in f:
        if r.search(line):
            v = line.rsplit(None, 2)[1]
            out.append(v)
print(" ".join(out))
Output:
16341932 4507596 5128952 0 8388604 8388604 0 16788 5058092 78464 0 0 0 0 0
And Python being Python, we can put all the logic in a single list comprehension to get the data:
pre = ('MemTotal', 'MemFree', 'Cached', 'SwapTotal', 'SwapFree', 'AnonPages', 'Dirty', 'Writeback', 'PageTables', 'HugePages_')
with open("/proc/meminfo") as f:
    out = [line.rsplit(None, 2)[1] for line in f if any(p in line for p in pre)]
print(" ".join(out))
Output:
16341932 4443796 5133420 0 8388604 8388604 120 0 5118004 78572 0 0 0 0 0

This gives the same output, but using built-in Python features instead of shelling out for everything:
columns = [
    'MemTotal', 'MemFree', 'Cached', 'SwapTotal', 'SwapFree', 'AnonPages',
    'Dirty', 'Writeback', 'WritebackTmp', 'PageTables', 'HugePages_Free',
    'HugePages_Rsvd', 'HugePages_Surp', 'HugePages_Total'
]
stats = {}
with open('/proc/meminfo') as infile:
    for line in infile:
        line = line.split()
        # line[0] is the field name with a trailing colon, line[1] its value
        stats[line[0][:-1]] = line[1]
values = [stats[key] for key in columns]
print('\t'.join(values))

Something along these lines, perhaps:
desiredTags = [ 'MemTotal', 'MemFree', 'Cached', 'SwapCached', 'SwapTotal',
                'SwapFree', 'AnonPages', 'Dirty', 'Writeback', 'PageTables',
                'HugePages_Total', 'HugePages_Free', 'HugePages_Rsvd',
                'HugePages_Surp' ]
stats = []
with open('/proc/meminfo') as fd:
    for line in fd:
        fields = line.strip().split()
        # strip off the colon from the first field
        if fields[0][:-1] in desiredTags:
            stats.append(fields[1])
print(' '.join(stats))
Not sure I got the list of desired tags exactly right; feel free to amend those as necessary.
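
The dictionary-based answers can be condensed into a parser that takes the file contents as a string, which makes it easy to test away from a Linux box. This is a sketch under a few assumptions: parse_meminfo and WANTED are illustrative names, the sample text is made up, and only calls that also exist in Python 2.6 are used. Note that an exact key lookup differs from egrep: the pattern Cached also matches the SwapCached line, whereas the key 'Cached' does not.

```python
WANTED = ("MemTotal", "MemFree", "Cached", "SwapTotal", "SwapFree")


def parse_meminfo(text):
    """Map each field name in /proc/meminfo-style text to its value (as a string)."""
    stats = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and rest.split():
            # the first token after the colon is the number; a unit such as kB may follow
            stats[name.strip()] = rest.split()[0]
    return stats


sample = "MemTotal:       16341932 kB\nMemFree:         4484840 kB\nHugePages_Total:       0\n"
stats = parse_meminfo(sample)
# .get() avoids a KeyError for fields that this kernel does not report
print(" ".join(stats.get(key, "-") for key in WANTED))
```

On a real system the text would come from open('/proc/meminfo').read().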

merge multiple lines into single line by value of column

I have a very large tab-delimited text file. Many lines in the file share the same value in one of the columns, and I want to merge those lines into a single line. For example:
a foo
a bar
a foo2
b bar
c bar2
After running the script, it should become:
a foo;bar;foo2
b bar
c bar2
How can I do this in either a shell script or in Python?
Thanks.

With awk you can try this
{ a[$1] = a[$1] ";" $2 }
END { for (item in a ) print item, a[item] }
So if you save this awk script in a file called awkf.awk and if your input file is ifile.txt, run the script
awk -f awkf.awk ifile.txt | sed 's/ ;/ /'
The sed command removes the leading ; from each merged value.
Hope this helps

from collections import defaultdict
items = defaultdict(list)
with open('sourcefile') as source:
    for line in source:
        # strip the newline so it does not end up inside the joined values
        key, val = line.rstrip('\n').split('\t')
        items[key].append(val)
with open('result', 'w') as result:
    for k in sorted(items):
        result.write('%s\t%s\n' % (k, ';'.join(items[k])))
Not tested.

Tested with Python 2.7:
import csv
data = {}
reader = csv.DictReader(open('infile','r'),fieldnames=['key','value'],delimiter='\t')
for row in reader:
    if row['key'] in data:
        data[row['key']].append(row['value'])
    else:
        data[row['key']] = [row['value']]
writer = open('outfile','w')
for key in data:
    writer.write(key + '\t' + ';'.join(data[key]) + '\n')
writer.close()

A Perl way to do it:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
open my $fh, '<', 'path/to/file' or die "unable to open file:$!";
my %res;
while(<$fh>) {
    my ($k, $v) = split;
    push @{$res{$k}}, $v;
}
print Dumper \%res;
Output:
$VAR1 = {
    'c' => [
        'bar2'
    ],
    'a' => [
        'foo',
        'bar',
        'foo2'
    ],
    'b' => [
        'bar'
    ]
};

#! /usr/bin/env perl
use strict;
use warnings;
# for demo only
*ARGV = *DATA;
my %record;
my @order;
while (<>) {
    chomp;
    my($key,$combine) = split;
    push @order, $key unless exists $record{$key};
    push @{ $record{$key} }, $combine;
}
print $_, "\t", join(";", @{ $record{$_} }), "\n" for @order;
__DATA__
a foo
a bar
a foo2
b bar
c bar2
Output (with tabs converted to spaces because Stack Overflow breaks the output):
a foo;bar;foo2
b bar
c bar2

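def compress(infilepath, outfilepath):
    # assumes lines with the same key are adjacent (e.g. the input is sorted)
    with open(infilepath, 'r') as infile, open(outfilepath, 'w') as outfile:
        prev_index = None
        for line in infile:
            # drop the newline so it does not end up in the middle of a merged line
            index, val = line.rstrip('\n').split('\t')
            if index == prev_index:
                outfile.write(";%s" % val)
            else:
                if prev_index is not None:
                    outfile.write("\n")
                outfile.write("%s\t%s" % (index, val))
                prev_index = index
        if prev_index is not None:
            outfile.write("\n")
Untested, but should work as long as lines with the same key are adjacent. Please leave a comment if there are any concerns.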
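
When equal keys are not guaranteed to be adjacent, the values can be collected per key while remembering the order in which keys first appear, much like the second Perl answer does. A minimal sketch, with merge_by_key as an illustrative name; it holds all keys and values in memory, which may matter for a very large file.

```python
def merge_by_key(lines, sep=";"):
    """Merge tab-separated key/value lines that share a key, keeping first-seen key order."""
    merged = {}
    order = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        key, value = line.split("\t", 1)
        if key not in merged:
            merged[key] = []
            order.append(key)
        merged[key].append(value)
    return ["{}\t{}".format(key, sep.join(merged[key])) for key in order]


rows = ["a\tfoo\n", "a\tbar\n", "a\tfoo2\n", "b\tbar\n", "c\tbar2\n"]
print("\n".join(merge_by_key(rows)))
```

Because the function accepts any iterable of lines, an open file object can be passed to it directly.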
