Better way to write multiple replaces in Python - python

I currently have the following replace calls in Python:
f = f.replace("</ ", "</")
f = f.replace("<? Xml", "<?xml")
f = f.replace("<String", "<string")
f = f.replace("</String", "</string")
f = f.replace("<Resources", "<resources")
f = f.replace("</Resources", "</resources")
This is because some parts of my XML file are malformed.
Is there a better, cleaner way to write this part?

For a list of tuples (a, b) where you want to replace a with b:
def rep_many(s, l):
    for a, b in l:
        s = s.replace(a, b)
    return s
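As a rough usage sketch, the helper can be fed the six replacement pairs from the question. The `FIXES` name and the sample `broken` string are made up for illustration; only the pairs themselves come from the question.

```python
def rep_many(s, pairs):
    """Apply each (old, new) replacement to s, in list order."""
    for old, new in pairs:
        s = s.replace(old, new)
    return s

# Replacement pairs copied from the question.
FIXES = [
    ("</ ", "</"),
    ("<? Xml", "<?xml"),
    ("<String", "<string"),
    ("</String", "</string"),
    ("<Resources", "<resources"),
    ("</Resources", "</resources"),
]

# Hypothetical malformed input, invented for this example.
broken = '<? Xml version="1.0"?><Resources><String name="a">x</ String></Resources>'
fixed = rep_many(broken, FIXES)
print(fixed)
```

The pairs are applied in order, so the "</ " fix has to come before the "</String" fix for an input like "</ String" to end up as "</string".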

Related

Counting the command line arguments and removing the not needed one in python

I want to write Python code that will be run with the following command:
python3 myProgram.py 4 A B C D stemfile
Here 4 is the number of files and A, B, C, D are the 4 files. I want to generate all combinations of A, B, C, D except the empty one (A, B, C, D, AB, AC, AD, BC, BD, CD, ABC, ABD, ACD, BCD, ABCD). Before that, the program reads stemfile.names. Only if stemfile.names contains the line | Final Pseudo Deletion Count is 0. does it generate the 15 combinations above. Otherwise it reports noisy data, leaves D out, and prints only the combinations of the remaining 3 files. The output is then: (A, B, C, AB, AC, BC, ABC)
In my code I always assumed that D is the last file argument and ran the loop one time fewer. But D is not always the last argument. The command can also look like this: python3 myProgram.py 4 B D C A stemfile
In that case my code leaves out A when making the combinations. But whenever that line is not found in stemfile.names, I want to remove the D file specifically. How should I do that?
Later in the code, when the combination is A only, it stores A in a separate output file; when it is AB, it stores the union of files A and B in a separate file, and so on for all the combinations. Here too, if the data is noisy, the D file should not appear in any output file.
One more example: if I run python3 myProgram.py 3 A D B stemfile
and stemfile.names doesn't have the line | Final Pseudo Deletion Count is 0. then the output combinations are A, B, AB and it will create 2 output files only.
My code is below:
import sys
import itertools
from itertools import combinations
def union(files):
    lines = set()
    for file in files:
        with open(file) as fin:
            lines.update(fin.readlines())
    return lines
def main():
    number = int(sys.argv[1])
    dataset = sys.argv[number+2]
    with open(dataset+'.names') as myfile:
        if '| Final Pseudo Deletion Count is 0.' in myfile.read():
            a_list = sys.argv[2:number+2]
            print("All possible combinations:\n")
            for L in range(1, len(a_list)+1):
                for subset in itertools.combinations(a_list, L):
                    print(*list(subset), sep=',')
            print("...............................")
            matrix = [itertools.combinations(a_list, r)
                      for r in range(1, len(a_list) + 1)]
            combinations = [c for combinations in matrix for c in combinations]
            for combination in combinations:
                filenames = [f'{name}' for name in combination]
                output = f'{"".join(combination)}_output'
                print(f'Writing union of {filenames} to {output}')
                with open(output, 'w') as fout:
                    fout.writelines(union(filenames))
        else:
            a_list = sys.argv[2:number+1]
            # Here I am reducing a number only
            print("Noisy data.\n")
            print("So all possible combinations:\n")
            for L in range(1, len(a_list)+1):
                for subset in itertools.combinations(a_list, L):
                    print(*list(subset), sep=',')
            print("................................")
            matrix = [itertools.combinations(a_list, r)
                      for r in range(1, len(a_list) + 1)]
            combinations = [c for combinations in matrix for c in combinations]
            for combination in combinations:
                filenames = [f'{name}' for name in combination]
                output = f'{"".join(combination)}_output'
                print(f'Writing union of {filenames} to {output}')
                with open(output, 'w') as fout:
                    fout.writelines(union(filenames))
if __name__ == '__main__':
    main()
Please help me out.
I think you should probably break this down into smaller, more specific questions. It seems like there is a lot of detail here that's not focused on the specific problem you're facing. I took a shot at what I think you're asking, however.
I think you're trying to figure out how to remove an item from the command line arguments. If that's the case, there's nothing you can do about what's passed to the program, but you can modify the list of inputs after you parse. I really think you should try reading about the argparse library, as I stated in my comment. I'm not sure if it's exactly what you're looking for, but here's some code using argparse that expects full filenames for each input file. The last argument must be the stemfile.
Once the arguments are parsed, you have a list of pathlib.Path objects. You can simply remove the D file from that list, wherever it appears.
import argparse
import itertools
import pathlib
# Per the question, this line in the stemfile means the data is clean.
CLEAN_DATA_LINE = '| Final Pseudo Deletion Count is 0.'
def get_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('filenames', type=pathlib.Path, nargs='+')
    parser.add_argument('stemfile', type=pathlib.Path)
    return parser
def union(files):
    lines = set()
    for file in files:
        with open(file) as fin:
            lines.update(fin.readlines())
    return lines
def main():
    parser = get_parser()
    args = parser.parse_args()
    if CLEAN_DATA_LINE in args.stemfile.read_text():
        filenames = args.filenames
    else:
        # Noisy data: drop the D file, whatever its position in the arguments.
        filenames = [p for p in args.filenames if p.stem != 'D']
    matrix = [itertools.combinations(filenames, r) for r in range(1, len(filenames) + 1)]
    combinations = [c for combinations in matrix for c in combinations]
    print(' '.join([str([p.stem for p in c]) for c in combinations]))
    for combination in combinations:
        output = f'{"".join([p.stem for p in combination])}_output.txt'
        print(f'Writing union of {[p.stem for p in combination]} to {output}')
        with open(output, 'w') as fout:
            # Write the union of this combination only, not of every input file.
            fout.writelines(union(combination))
if __name__ == '__main__':
    main()
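The filtering step can also be tried without any files or command-line parsing. The sketch below is only an illustration: `nonempty_combinations` and its `drop` parameter are hypothetical names, and the argument order B, D, C, A is taken from the question's second example.

```python
import itertools

def nonempty_combinations(names, drop=None):
    """Return every non-empty combination of names, skipping `drop` if given."""
    kept = [n for n in names if n != drop]
    return [
        combo
        for r in range(1, len(kept) + 1)
        for combo in itertools.combinations(kept, r)
    ]

args = ["B", "D", "C", "A"]                      # D is not the last argument here
clean = nonempty_combinations(args)              # clean data: keep all four files
noisy = nonempty_combinations(args, drop="D")    # noisy data: leave D out
print(len(clean), len(noisy))                    # 15 7
print(["".join(c) for c in noisy])
```

Because the file is removed by name rather than by position, the result does not depend on where D appears among the arguments.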

Error in concatenation and one more error

I'm trying to import a CSV file and then output the continuous series from the file into a new CSV file.
The contents of the file look like this:
1
5
6
7
8
and so on.
For this example, the output would be ['1,1','5,5','6,8'].
The error I'm getting is:
>>> gaps = [[s, e] for s, e in zip(nums, nums[1:]) if s+1 < e]
TypeError: can only concatenate str (not "int") to str
Also, for some reason, after I do str1 = str1.replace(i, '')
it turns str1 into
['2855']'2856']'3250']'3251']'3252']'3253']'3254']'3255']'3256']'3257']'3258']'3259']'3260']'3261']'3262']'3263']'3264']'3265']'3278']'3279']'3280']'3281']'3299']'3312']'3314']'3331']'3332']'3333']'3334']'3405']'3406']'3407']'3408']'3500']'4849']'4850']'5567']'5568']'5569']'6000']
2856]3250]3251]3252]3253]3254]3255]3256]3257]3258]3259]3260]3261]3262]3263]3264]3265]3278]3279]3280]3281]3299]3312]3314]3331]3332]3333]3334]3405]3406]3407]3408]3500]4849]4850]5567]5568]5569]6000]
instead of giving just
2856]3250]3251]3252]3253]3254]3255]3256]3257]3258]3259]3260]3261]3262]3263]3264]3265]3278]3279]3280]3281]3299]3312]3314]3331]3332]3333]3334]3405]3406]3407]3408]3500]4849]4850]5567]5568]5569]6000]
The code:
with open('Book1.csv', newline='') as f:
    reader = csv.reader(f)
    data = list(reader)
str1 = ''.join(str(e) for e in data)
bad_chars = ["[","'"]
for i in bad_chars :
    str1 = str1.replace(i, '')
str1.split("]",-1)
x = list((str1.split("]")))
def ranges(nums):
    nums = sorted(set(nums))
    gaps = [[s, e] for s, e in zip(nums, nums[1:]) if s+1 < e]
    edges = iter(nums[:1] + sum(gaps, []) + nums[-1:])
    return list(zip(edges, edges))
print(ranges(x))
Try this:
import csv
def ranges(nums):
    nums = sorted(set(nums))
    gaps = [[s, e] for s, e in zip(nums, nums[1:]) if s+1 < e]
    edges = iter(nums[:1] + sum(gaps, []) + nums[-1:])
    return list(zip(edges, edges))
data = []
with open('Book1.csv', newline='') as f:
    reader = csv.reader(f)
    for i in reader:
        # Each row is a list of strings; convert the first field to int.
        data.append(int(i[0]))
print(ranges(data))
The problem with your code is that it makes the task more complex than it needs to be: it joins the lists of strings into one big string and then strips the bad characters from it. Instead, you can convert the first field of each row to an integer as you read it, like I did, and skip the string cleanup entirely.
Also, the ranges function raised the error because s+1 tries to add the string s to the integer 1. The x list still contained strings, not integers.
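The question also asks for the result to go into a new CSV file. Here is a minimal sketch of the whole round trip, assuming one integer per row; `io.StringIO` objects stand in for Book1.csv and for the output file, so nothing is read from or written to disk.

```python
import csv
import io

def ranges(nums):
    """Collapse integers into (start, end) pairs of consecutive runs."""
    nums = sorted(set(nums))
    gaps = [[s, e] for s, e in zip(nums, nums[1:]) if s + 1 < e]
    edges = iter(nums[:1] + sum(gaps, []) + nums[-1:])
    return list(zip(edges, edges))

# In-memory stand-in for Book1.csv, using the sample values from the question.
source = io.StringIO("1\n5\n6\n7\n8\n")
values = [int(row[0]) for row in csv.reader(source) if row]

result = ranges(values)
print(result)  # [(1, 1), (5, 8)]

# Write one "start,end" row per run; StringIO stands in for the output file.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(result)
print(out.getvalue())
```

With a real file, `out` would be replaced by a file opened with `open(..., 'w', newline='')`, as the csv module documentation recommends.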

Get certain files from list by pattern

I have this list. I want to write a for loop that passes combinations of the files in the list to a function.
I am not sure how to build these combinations so that each 'check' gets the correct combination.
Without the loop, the function call would look like this:
erase('check3_dwg_Polyline','check3_dwg_Polyline_feat_to_polyg_feat_to_line','output_name')
What I've tried:
Here's the list.
li=['check3_dwg_Polyline', 'check2_dwg_Polyline',
    'check3_dwg_Polyline_feat_to_polyg',  # this one does not need to be extracted
    'check2_dwg_Polyline_feat_to_polyg',  # same here
    'check3_dwg_Polyline_feat_to_polyg_feat_to_line',
    'check2_dwg_Polyline_feat_to_polyg_feat_to_line']
I start with this:
a=[li[i:i+3] for i in range(0, len(li), 3)]
which returns:
[['check3_dwg_Polyline',
'check2_dwg_Polyline',
'check3_dwg_Polyline_feat_to_polyg'],
['check2_dwg_Polyline_feat_to_polyg',
'check3_dwg_Polyline_feat_to_polyg_feat_to_line',
'check2_dwg_Polyline_feat_to_polyg_feat_to_line']]
Finally:
for base, base_f, base_line in a:
    print(base, base_line, base + "_output")
gives:
check3_dwg_Polyline check3_dwg_Polyline_feat_to_polyg check3_dwg_Polyline_output
check2_dwg_Polyline_feat_to_polyg check2_dwg_Polyline_feat_to_polyg_feat_to_line check2_dwg_Polyline_feat_to_polyg_output
Other method:
base = [f for f in li if not f.endswith(("_polyg", "_to_line"))]
base_f = {f.strip("_feat_to_polyg"): f for f in li if f.endswith("_polyg")}
base_line = {f.strip("_feat_to_polyg_feat_to_line"): f for f in li if f.endswith("_to_line")}
[(b, base_f[b], base_line[b]) for b in base]
gives:
KeyError: 'check3_dwg_Polyline'
I have tried sorting the list but it just ruins it in a different way when put through the processes mentioned above.
Ideally, this loop:
for base, base_f, base_line in a:
    print(base, base_line, base + "_output")
would give this:
check3_dwg_Polyline check3_dwg_Polyline_feat_to_polyg_feat_to_line check3_dwg_Polyline_output
check2_dwg_Polyline check2_dwg_Polyline_feat_to_polyg_feat_to_line check2_dwg_Polyline_output
and the values would then be passed to the function like this:
erase('check3_dwg_Polyline','check3_dwg_Polyline_feat_to_polyg_feat_to_line','output_name')
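Split the list into three chunks (base names, _polyg names, _to_line names) and zip them, so that each tuple groups the entries for one check (check3, check2, …). Then you can do your for loop.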
n = len(li) // 3
a = zip(*[li[i:i+n] for i in range(0, len(li), n)])
pprint(list(a)) would output:
[('check3_dwg_Polyline',
'check3_dwg_Polyline_feat_to_polyg',
'check3_dwg_Polyline_feat_to_polyg_feat_to_line'),
('check2_dwg_Polyline',
'check2_dwg_Polyline_feat_to_polyg',
'check2_dwg_Polyline_feat_to_polyg_feat_to_line')]
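The zip approach relies on the list being ordered the same way in every chunk. As an alternative sketch, the pairs can be built from the suffix alone. The KeyError in the question comes from str.strip, which removes a set of characters from both ends rather than a suffix, so the key also loses the trailing "e" of "Polyline". The `erase` function below is a hypothetical stand-in for the real one, and `LINE_SUFFIX` and `line_for` are illustrative names.

```python
li = [
    'check3_dwg_Polyline', 'check2_dwg_Polyline',
    'check3_dwg_Polyline_feat_to_polyg',
    'check2_dwg_Polyline_feat_to_polyg',
    'check3_dwg_Polyline_feat_to_polyg_feat_to_line',
    'check2_dwg_Polyline_feat_to_polyg_feat_to_line',
]

LINE_SUFFIX = '_feat_to_polyg_feat_to_line'

def erase(in_features, erase_features, out_name):
    """Hypothetical stand-in for the real erase function in the question."""
    return (in_features, erase_features, out_name)

# Map each base name to its "_to_line" entry by cutting off the exact suffix.
# str.strip() is avoided: it removes a set of characters, not a suffix.
line_for = {f[:-len(LINE_SUFFIX)]: f for f in li if f.endswith(LINE_SUFFIX)}

calls = [erase(base, line, base + '_output') for base, line in line_for.items()]
for call in calls:
    print(call)
```

On Python 3.9 or later, `f.removesuffix(LINE_SUFFIX)` does the same job as the slice.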

How to separate an input on python into different lists?

I have some code in which I need 3 different inputs to be put into separate lists. Currently I have 3 lists set up:
A = []
B = []
C = []
I also currently have 3 different inputs, one for each list, and I wish to combine these inputs into one input, separating each factor of this by a comma or semicolon.
For example:
Apple,365,rope
Using python, how would I separate each factor in the input so they can be put into different lists?
I have tried searching for how to separate using an input but this has not worked as I do not know exactly what the input will be.
Assuming that you read the input from the console using the input() function, you can do the following:
A = []
B = []
C = []
# let's say you input "Apple,365,rope"
my_input = input()
# we split it on each comma into a list -> ["Apple", "365", "rope"]
split_input_list = my_input.split(',')
# finally we put each input into the respective list
A.append(split_input_list[0])
B.append(split_input_list[1])
C.append(split_input_list[2])
A = []
B = []
C = []
# if string
your_input = "Apple,365,rope"
your_input = your_input.split(",")
A = [your_input[0]]
B = [your_input[1]]
C = [your_input[2]]
print(A, B, C)
# if tuple
your_input = ("Apple", "365" , "rope")
A = [your_input[0]]
B = [your_input[1]]
C = [your_input[2]]
print(A, B, C)
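The question mentions that the separator may be a comma or a semicolon. A possible sketch for that case splits on either character with `re.split` and checks the field count before appending. The `add_record` helper and the second sample record are made up for illustration.

```python
import re

A, B, C = [], [], []

def add_record(text):
    """Split text on commas or semicolons and append one field to each list."""
    fields = [part.strip() for part in re.split(r"[,;]", text)]
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields, got {len(fields)}: {text!r}")
    a, b, c = fields
    A.append(a)
    B.append(b)
    C.append(c)

add_record("Apple,365,rope")
add_record("Pear; 12; chain")  # made-up second record using semicolons
print(A, B, C)
```

In an interactive program, `add_record(input())` would take the place of the hard-coded strings. All fields stay strings; converting "365" to a number would need an explicit `int()` call.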

How to compress by removing duplicates in python?

I have strings containing blocks of the same character, e.g. '1254,,,,,,,,,,,,,,,,982'. What I'm aiming to do is replace that with something along the lines of '1254(,16)982' so that the original string can be reconstructed. If anyone could point me in the right direction, that would be greatly appreciated.
You're looking for run-length encoding: here is a Python implementation based loosely on this one.
import itertools
def runlength_enc(s):
    '''Return a run-length encoded version of the string'''
    enc = ((x, sum(1 for _ in gp)) for x, gp in itertools.groupby(s))
    removed_1s = [((c, n) if n > 1 else c) for c, n in enc]
    joined = [["".join(g)] if n == 1 else list(g)
              for n, g in itertools.groupby(removed_1s, key=len)]
    return list(itertools.chain(*joined))
def runlength_decode(enc):
    # Only tuples are (char, count) pairs; a plain 2-character string
    # such as '12' must pass through unchanged.
    return "".join((c[0] * c[1] if isinstance(c, tuple) else c) for c in enc)
For your example:
print(runlength_enc("1254,,,,,,,,,,,,,,,,982"))
# ['1254', (',', 16), '982']
print(runlength_decode(runlength_enc("1254,,,,,,,,,,,,,,,,982")))
# 1254,,,,,,,,,,,,,,,,982
(Note that this will be efficient only if there are very long runs in your string).
If you don't care about the exact compressed form, you may want to look at zlib.compress and zlib.decompress. zlib is part of the Python standard library. It works on bytes, so in Python 3 the string has to be encoded first, and it will probably get better compression than a self-implemented compression algorithm.
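A minimal round trip with zlib might look like the sketch below. The UTF-8 encoding is an assumption; any encoding works as long as the same one is used in both directions.

```python
import zlib

s = "1254" + "," * 16 + "982"

# zlib operates on bytes, so encode the string before compressing.
packed = zlib.compress(s.encode("utf-8"))
restored = zlib.decompress(packed).decode("utf-8")

print(len(s), len(packed))   # compare original and compressed sizes
print(restored == s)         # True
```

For very short strings the fixed overhead of the zlib format can outweigh the savings, so it is worth comparing the two lengths on realistic data.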
Using regular expressions:
import re
s = '1254,,,,,,,,,,,,,,,,982'
c = re.sub(r'(.)\1+', lambda m: '(%s%d)' % (m.group(1), len(m.group(0))), s)
print(c)  # 1254(,16)982
Using itertools:
import itertools
c = ''
for ch, g in itertools.groupby(s):
    k = len(list(g))
    c += ch if k == 1 else '(%s%d)' % (ch, k)
print(c)  # 1254(,16)982
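The question also asks that the original string can be reconstructed. A possible decoder for the '(,16)' form is sketched below. It assumes the original text never contains something that already looks like such a token (an opening parenthesis, one character, digits, a closing parenthesis). The names `encode_runs` and `decode_runs` are illustrative only.

```python
import re

def encode_runs(s):
    """Replace each run of 2+ identical characters with '(<char><count>)'."""
    return re.sub(r'(.)\1+', lambda m: '(%s%d)' % (m.group(1), len(m.group(0))), s)

def decode_runs(encoded):
    """Expand every '(<char><count>)' token back into the repeated character."""
    return re.sub(r'\((.)(\d+)\)', lambda m: m.group(1) * int(m.group(2)), encoded)

original = '1254' + ',' * 16 + '982'
packed = encode_runs(original)
print(packed)                           # 1254(,16)982
print(decode_runs(packed) == original)  # True
```

Runs of digits still decode correctly because the token always holds exactly one character before the count, so '(13)' is read as the character '1' repeated 3 times.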
