python pattern cutting of strings in a list - python

I have a dictionary variable "d" whose keys are integers and whose values are lists of strings:
368501900 ['GH131.hmm ', 'CBM1.hmm ']
368499531 ['AA8.hmm ']
368500556 ['AA7.hmm ']
368500559 ['GT2.hmm ']
368507728 ['GH16.hmm ']
368496466 ['AA2.hmm ']
368504803 ['GT21.hmm ']
368503093 ['GT1.hmm ', 'GT4.hmm ']
The code is like this:
d = dict()
for key in d:
    dictValue = d[key]
    dictMerged = list(sorted(set(dictValue), key=dictValue.index))
    print (key, dictMerged)
However, I want to remove everything from the first digit onward in each string, so that I get a result like this:
368501900 ['GH', 'CBM']
368499531 ['AA']
368500556 ['AA']
368500559 ['GT']
368507728 ['GH']
368496466 ['AA']
368504803 ['GT']
368503093 ['GT']
I think the code should be inserted between dictValue and dictMerged, but I cannot work out the logic.
Any ideas?

Import re at the beginning:
import re
Then add this line between dictValue and dictMerged:
new_dict_value = [re.sub(r'\d.*', '', x) for x in dictValue]
and use new_dict_value in place of dictValue (both occurrences) in the dictMerged line.
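Putting the pieces together, a minimal sketch might look like the following. The two sample entries are copied from the question; the helper name family_names is made up for illustration, and dict.fromkeys is swapped in for the sorted(set(...)) idiom as another order-preserving way to drop duplicates (it relies on dicts keeping insertion order, Python 3.7+).

```python
import re

# Two sample entries taken from the question's data.
d = {
    368501900: ['GH131.hmm ', 'CBM1.hmm '],
    368503093: ['GT1.hmm ', 'GT4.hmm '],
}

def family_names(values):
    """Strip everything from the first digit onward, then drop duplicates.

    family_names is a hypothetical helper name, not part of the original code.
    """
    stripped = [re.sub(r'\d.*', '', value) for value in values]
    # dict.fromkeys keeps the first occurrence of each name, in order.
    return list(dict.fromkeys(stripped))

result = {key: family_names(values) for key, values in d.items()}
for key, names in result.items():
    print(key, names)
```

Stripping before deduplicating matters here: 'GT1.hmm ' and 'GT4.hmm ' only become equal once the digits and suffix are gone.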

String objects have a nice .isdigit() method. Here are some non-re solutions for cleaning your data.
Plain old loop:
values = ['GT1.hmm ', 'GT4.hmm ']
clean_values = []
for item in values:
    clean_item = []
    for c in item:
        if c.isdigit():
            break
        clean_item.append(c)
    clean_values.append("".join(clean_item))
List comprehension using a StopIteration exception to act as a break inside a generator expression. (Note that calling stop() directly in a list comprehension doesn't work; it requires a generator expression, normally written with (), although the parentheses are optional inside a .join() call. Also note that this trick only works on Python versions before 3.7: since PEP 479, a StopIteration raised inside a generator is converted to a RuntimeError.)
def stop():
    raise StopIteration
values = ['GT1.hmm ', 'GT4.hmm ']
clean_values = ["".join(c if not c.isdigit() else stop() for c in item) for item in values]
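For readers on a current interpreter, here is a small sketch of what happens to the stop() trick and one way to get the same early exit. It assumes Python 3.7 or later, where PEP 479 turns a StopIteration raised inside a generator into a RuntimeError; chars_before_digit is a hypothetical helper name that uses a plain return to end the generator instead.

```python
def stop():
    raise StopIteration

def chars_before_digit(item):
    """Yield characters up to (not including) the first digit.

    Hypothetical helper: a bare return ends the generator cleanly.
    """
    for c in item:
        if c.isdigit():
            return
        yield c

values = ['GT1.hmm ', 'GT4.hmm ']

try:
    # On Python 3.7+ this raises RuntimeError instead of stopping quietly.
    old_style = ["".join(c if not c.isdigit() else stop() for c in item)
                 for item in values]
except RuntimeError as exc:
    old_style = None
    print("stop() trick failed:", exc)

clean_values = ["".join(chars_before_digit(item)) for item in values]
print(clean_values)
```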
List comprehension using itertools.takewhile:
from itertools import takewhile
values = ['GT1.hmm ', 'GT4.hmm ']
clean_values = ["".join(takewhile(lambda c: not c.isdigit(), item)) for item in values]
Examples derived from:
http://tech.pro/tutorial/1554/four-tricks-for-comprehensions-in-python#breaking_the_loop

Related

python translating strings, and changing them back using dictionary

I've got this code that translates a string (that starts out as a list) using a dictionary. I wanted the code to translate the string, then un-translate it back to the original.
This is the code that I've got so far:
words = ['Abra', ' ', 'cadabra', '!']
clues = {'A':'Z', 'a':'z', 'b':'y', 'c':'x'}
def converter(words, clues):
    words = ''.join(words)
    for item in words:
        if item in clues.keys():
            words = words.replace(item, clues[item])
    return words
def reversal(clues):
    clues = {v: k for k, v in clues.items()}
    print(clues)
x = converter(words, clues)
print(x)
reversal(clues)
x = converter(words, clues)
print(x)
Only, this will print
"Zyrz xzdzyrz!"
"Zyrz xdzyrz!"
I'm not sure why it's not printing:
"Zyrz xzdzyrz!"
"Abra cadabra!"
Is there an error in my code that is causing it to act this way? I checked clues and it IS reversed properly after it goes through the function. What am I doing wrong?
Python already has the translate method on all strings, just call it!
def converter(text, clues, reverse=False):
    if reverse:
        clues = {v: k for k, v in clues.items()}
    table = str.maketrans(clues)
    return text.translate(table)
Usage:
words = ['Abra', ' ', 'cadabra', '!']
clues = {'A':'Z', 'a':'z', 'b':'y', 'c':'x'}
# join the text into a single string:
x = ''.join(words)
# convert first
x = converter(x, clues)
print(x) # -> you get `Zyrz xzdzyrz!`
#back to original
x = converter(x, clues, reverse=True)
print(x) # -> you get `Abra cadabra!`
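One reason a single translate pass can be safer than a chain of replace calls is that each character is mapped exactly once, so an earlier substitution is never re-substituted by a later one. The sketch below illustrates this with a made-up mapping that swaps 'a' and 'b'; the mapping and both function names are hypothetical and not taken from the question.

```python
# Hypothetical mapping in which the replacements overlap: a <-> b.
swap = {'a': 'b', 'b': 'a'}

def replace_sequentially(text, mapping):
    """Apply str.replace once per mapping entry, one after another."""
    for old, new in mapping.items():
        text = text.replace(old, new)
    return text

def translate_once(text, mapping):
    """Map every character in a single pass with str.translate."""
    return text.translate(str.maketrans(mapping))

print(replace_sequentially('abba', swap))  # the 'b's created first are turned back into 'a'
print(translate_once('abba', swap))        # each character is swapped exactly once
```

Reversing the dictionary with {v: k for k, v in clues.items()} only round-trips cleanly when all the values are distinct, as they are in the question's clues.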
Looks like you're trying to modify the dictionary in place inside a function. Rebinding clues inside reversal only changes the local name, so the function needs to return the reversed dictionary, which you then pick up in your main code:
# Your stuff here
def reversal(clues):
    return {v: k for k, v in clues.items()}
x = converter(words, clues)
print(x)
clues_reversed = reversal(clues)
# pass the translated string x, not the original words
x = converter(x, clues_reversed)
print(x)

How to replace text between parentheses in Python?

I have a dictionary containing the following key-value pairs: d={'Alice':'x','Bob':'y','Chloe':'z'}
I want to replace the lower case variables(values) by the constants(keys) in any given string.
For example, if my string is:
A(x)B(y)C(x,z)
how do I replace the characters in order to get a resultant string of :
A(Alice)B(Bob)C(Alice,Chloe)
Should I use regular expressions?
re.sub() solution with replacement function:
import re
d = {'Alice':'x','Bob':'y','Chloe':'z'}
flipped = dict(zip(d.values(), d.keys()))
s = 'A(x)B(y)C(x,z)'
result = re.sub(r'\([^()]+\)', lambda m: '({})'.format(','.join(flipped.get(k,'')
for k in m.group().strip('()').split(','))), s)
print(result)
The output:
A(Alice)B(Bob)C(Alice,Chloe)
Extended version:
import re
def repl(m):
    val = m.group().strip('()')
    d = {'Alice':'x','Bob':'y','Chloe':'z'}
    flipped = dict(zip(d.values(), d.keys()))
    if ',' in val:
        return '({})'.format(','.join(flipped.get(k,'') for k in val.split(',')))
    else:
        return '({})'.format(flipped.get(val,''))
s = 'A(x)B(y)C(x,z)'
result = re.sub(r'\([^()]+\)', repl, s)
print(result)
Bonus approach for particular input case A(x)B(y)C(Alice,z):
...
s = 'A(x)B(y)C(Alice,z)'
result = re.sub(r'\([^()]+\)', lambda m: '({})'.format(','.join(flipped.get(k,'') or k
for k in m.group().strip('()').split(','))), s)
print(result)
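The variants above can be folded into one small function. The following is only a sketch of that idea: replace_args and substitute are hypothetical names, the pattern uses a capture group instead of strip('()'), and unknown arguments are left unchanged (as in the bonus approach) rather than replaced with an empty string.

```python
import re

d = {'Alice': 'x', 'Bob': 'y', 'Chloe': 'z'}
flipped = {value: key for key, value in d.items()}

def replace_args(match):
    """Replace each comma-separated argument found inside one pair of parentheses."""
    args = [arg.strip() for arg in match.group(1).split(',')]
    # Arguments that are not in the mapping are kept as they are.
    return '({})'.format(','.join(flipped.get(arg, arg) for arg in args))

def substitute(s):
    # [^()]* matches the contents of a non-nested pair of parentheses.
    return re.sub(r'\(([^()]*)\)', replace_args, s)

print(substitute('A(x)B(y)C(x,z)'))
print(substitute('A(x)B(y)C(Alice,z)'))
```

Because only the text inside parentheses is touched, a name such as x in front of a parenthesis is left alone.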
I assume you want to replace the values in a string with the respective keys of the dictionary. If my assumption is correct, you can try this without using regex.
First, swap the keys and values using a dictionary comprehension.
my_dict = {'Alice':'x','Bob':'y','Chloe':'z'}
my_dict = {y: x for x, y in my_dict.items()}
Then, using a list comprehension, replace the values character by character:
str_ = 'A(x)B(y)C(x,z)'
output = ''.join([i if i not in my_dict.keys() else my_dict[i] for i in str_])
Hope this is what you need ;)
Code
import re
d = {'Alice': 'x', 'Bob': 'y', 'Chloe': 'z'}
s = "A(x)B(y)C(x,z)"
# replace each value (e.g. 'x') with its key (e.g. 'Alice')
for key, value in d.items():
    s = re.sub(re.escape(value), key, s)
print(s)
Output
A(Alice)B(Bob)C(Alice,Chloe)
You could also use the str.replace method, like this:
d={'x':'Alice','y':'Bob','z':'Chloe'}
s = "A(x)B(y)C(x,z)"
for key in d:
    s = s.replace(key, d[key])
print(s)
But yes, you should swap your dictionary's keys and values first, like Kishore suggested.
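One caveat with plain replace is that it rewrites every occurrence of a key, not only those inside parentheses. The short sketch below shows this; the function name replace_everywhere and the second input 'max(x)' are made up for illustration.

```python
d = {'x': 'Alice', 'y': 'Bob', 'z': 'Chloe'}

def replace_everywhere(s, mapping):
    """Replace every occurrence of each key, wherever it appears in the string."""
    for key, value in mapping.items():
        s = s.replace(key, value)
    return s

print(replace_everywhere('A(x)B(y)C(x,z)', d))  # fine for the question's input
print(replace_everywhere('max(x)', d))          # the x in 'max' is replaced as well
```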
This is the way that I would do it:
import re
def sub_args(text, tosub):
    ops = '|'.join(tosub.keys())
    for argstr, _ in re.findall(r'(\(([%s]+?,?)+\))' % ops, text):
        args = argstr[1:-1].split(',')
        args = [tosub[a] for a in args]
        subbed = '(%s)' % ','.join(map(str, args))
        text = re.sub(re.escape(argstr), subbed, text)
    return text
text = 'A(x)B(y)C(x,z)'
tosub = {
    'x': 'Alice',
    'y': 'Bob',
    'z': 'Chloe'
}
print(sub_args(text, tosub))
Basically you just use the regex pattern to find all of the argument groups and substitute in the proper values--the nice thing about this approach is that you don't have to worry about subbing where you don't want to (for example, if you had a string like 'Fn(F,n)'). You can also have multi-character keys, like 'F(arg1,arg2)'.

Converting strings within a list into floats

I have a list of numerical values that are of type "string" right now. Some of the elements in this list have more than one value, e.g.:
AF=['0.056', '0.024, 0.0235', '0.724', '0.932, 0.226, 0.634']
The other thing is that some of the elements might be a single "." character.
With that being said, I've been trying to convert the elements of this list into floats (while still conserving the tuple if there's more than one value), but I keep getting the following error:
ValueError: could not convert string to float: .
I've tried a LOT of things to solve this, with the latest one being:
for x in AF:
    if "," in x: #if there are multiple values for one AF
        elements= x.split(",")
        for k in elements: #each element of the sub-list
            if k != '.':
                k= map(float, k)
                print(k) #check to see if there are still "."
            else:
                pass
But when I run that, I still get the same error. So I printed k from the above loop and sure enough, there were still . in the list, despite me stating NOT to include those in the string-to-float conversion.
This is my desired output:
AF=[0.056, [0.024, 0.0235], 0.724, [0.932, 0.226, 0.634]]
def convert(l):
    new = []
    for line in l:
        if ',' in line:
            new.append([float(j) for j in line.split(',')])
        else:
            try:
                new.append(float(line))
            except ValueError:
                pass
    return new
>>> convert(AF)
[0.056, [0.024, 0.0235], 0.724, [0.932, 0.226, 0.634]]
If you try this:
result = []
for item in AF:
    if item != '.':
        values = list(map(float, item.split(', ')))
        result.append(values)
You get:
[[0.056], [0.024, 0.0235], [0.724], [0.932, 0.226, 0.634]]
You can simplify this with a list comprehension:
result = [list(map(float, item.split(', ')))
          for item in AF
          if item != '.']
With re.findall() function (on extended input list):
import re
AF = ['0.056', '0.024, 0.0235, .', '.', '0.724', '0.932, 0.226, 0.634', '.']
result = []
for s in AF:
    items = re.findall(r'\b\d+\.\d+\b', s)
    if items:
        result.append(float(items[0]) if len(items) == 1 else list(map(float, items)))
print(result)
The output:
[0.056, [0.024, 0.0235], 0.724, [0.932, 0.226, 0.634]]
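A regex-free variant is also possible. The sketch below is one way to do it under a few assumptions: the helper name to_floats and its missing parameter are hypothetical, entries made up only of "." are dropped, and single values stay bare floats while multiple values become lists, matching the desired output in the question. Since it relies on float() rather than a digit pattern, it also accepts integers and scientific notation.

```python
def to_floats(entries, missing='.'):
    """Convert comma-separated numeric strings to floats, skipping placeholders.

    to_floats and the missing parameter are illustrative names only.
    """
    result = []
    for entry in entries:
        parts = [part.strip() for part in entry.split(',')]
        values = [float(part) for part in parts if part and part != missing]
        if not values:
            continue  # the entry held nothing but placeholders
        result.append(values[0] if len(values) == 1 else values)
    return result

AF = ['0.056', '0.024, 0.0235, .', '.', '0.724', '0.932, 0.226, 0.634', '.']
print(to_floats(AF))
```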

how to get repeated integer value from string specific operators in python

I have a string value that contains numbers, symbols and characters.
The string is:
"1=value.2=value.4=value + 3=value.4=value+5=value"
How can I count the repeated key=value pairs in the whole string, and also find the repeated keys in the parts separated by the + operator?
Is there a better way than splitting the string?
Is there a better way than splitting the string?
Not really. You can use re.split() to split the string on every . and + character. And you can use len() to then get the length of the resulting list which corresponds to the number of "key, value" pairs:
>>> import re
>>> string = "1=value.2=value.4=value + 3=value.4=value+5=value"
>>> kv_pairs = re.split(r'\.|\+', string)
>>> kv_pairs
['1=value', '2=value', '4=value ', ' 3=value', '4=value', '5=value']
>>> len(kv_pairs) # number of key, value pairs
6
>>>
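If you want to count repeated key, value pairs, you can use collections.Counter(). Note, however, that with this split no pair in your example is counted as repeated: '4=value ' (with a trailing space) and '4=value' are different strings, so strip the whitespace first if they should count as the same pair: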
>>> from collections import Counter
>>> {k: v for k, v in dict(Counter(kv_pairs)).items() if v > 1}
{}
>>>
If you also want to keep the separators, you can wrap the regex in parentheses to form a capture group:
>>> kv_pairs = re.split(r'(\.|\+)', string)
>>> kv_pairs
['1=value', '.', '2=value', '.', '4=value ', '+', ' 3=value', '.', '4=value', '+', '5=value']
>>>
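If the goal is to count repeated keys rather than whole pairs, one possible sketch is to split on + first and then on . within each part. This is only one reading of the question; the helper name keys_in and the variable names are made up for illustration.

```python
from collections import Counter

string = "1=value.2=value.4=value + 3=value.4=value+5=value"

def keys_in(segment):
    """Return the keys of all key=value pairs in one '+'-separated segment."""
    return [pair.split('=', 1)[0].strip()
            for pair in segment.split('.') if pair.strip()]

# One list of keys per '+'-separated part of the string.
groups = [keys_in(segment) for segment in string.split('+')]

# Keys that occur more than once in the whole string.
key_counts = Counter(key for group in groups for key in group)
repeated = {key: count for key, count in key_counts.items() if count > 1}

# Keys that show up in more than one '+'-separated part.
in_several_groups = sorted(key for key in key_counts
                           if sum(key in group for group in groups) > 1)

print(groups)
print(repeated)
print(in_several_groups)
```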
Here is another way, if my understanding is right.
import re
s = "1=value.2=value.4=value + 3=value.4=value+5=value"
# find all pairs like '1=value' as a list
new_dict_list = re.findall(r'\d+=?\w+', s)
# dict that maps each key to its value
new_dict={}
# dict that counts how often each key occurs
count_dict_key = {}
for ls in new_dict_list:
    # split the list element to get the key and the value
    split_value = ls.split('=')
    key = split_value[0]
    value = split_value[1]
    # check whether the key has been seen before
    if key in new_dict:
        # count the repeat directly instead of searching the joined string
        count_dict_key[key] += 1
    else:
        new_dict[key]=value
        count_dict_key[key]=1
print("printing new dictionary....\n",new_dict)
print("counting key...\n",count_dict_key)

Changing string in 'list format' to a string

I have the following list;
lst = ["['atama', 'karada', 'kami', 'kao', 'hitai', 'me', 'mayu', 'mabuta', 'matsuge', 'hana']",
"['head', 'body', 'hair', 'face', 'forehead', 'eye', 'eyebrow', 'eyelid', 'eyelash', 'nose']"]
I need to get the contents of each item set as a list, so that I can print the items individually. Eg.
for item in lst:
    for word in list(item):
        print word
>>
atama
karada
kami
kao
etc.
Any ideas how I could convert each str(item) back into a list?
>>> import ast
>>> L = ["['atama', 'karada', 'kami', 'kao', 'hitai', 'me', 'mayu', 'mabuta', 'matsuge', 'hana']",
...      "['head', 'body', 'hair', 'face', 'forehead', 'eye', 'eyebrow', 'eyelid', 'eyelash', 'nose']"]
>>> for item in L:
...     for word in ast.literal_eval(item):
...         print(word)
...
atama
karada
kami
kao
hitai
me
mayu
mabuta
matsuge
hana
head
body
hair
face
forehead
eye
eyebrow
eyelid
eyelash
nose
I can think of several methods:
1) Manually extract each list item:
lst = [[item.strip()[1:-1] for item in element[1:-1].split(',')] for element in lst]
2) Use eval:
lst[:] = eval(lst[0]), eval(lst[1])
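3) Use json (JSON only accepts double-quoted strings, so the single quotes have to be swapped first; this assumes none of the words contains a quote character):
import json
lst = [json.loads(i.replace("'", '"')) for i in lst]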
Methods 1 or 3 are preferred. eval is unsafe, as any string passed to eval will be (surprise, surprise) evaluated. Only use eval if you have complete control over what is being passed to it.
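To illustrate the safety point, here is a sketch using ast.literal_eval from the standard library, which only accepts Python literals and raises an error for anything else instead of executing it. The wrapper name parse_list and its choice to return None on failure are assumptions made for this example.

```python
import ast

def parse_list(text):
    """Parse a string such as "['a', 'b']" into a list, or return None.

    parse_list is a hypothetical wrapper; ast.literal_eval does the real work.
    """
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # Raised for anything that is not a plain Python literal.
        return None
    return value if isinstance(value, list) else None

print(parse_list("['atama', 'karada', 'kami']"))
print(parse_list("__import__('os').getcwd()"))  # rejected, never executed
```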
4) Another solution that occurred to me is to use regular expressions:
import re
lst = [re.findall(r"['\"](\w+)['\"]", item) for item in lst]
(one, two) = (list(lst[0]), list(lst[1]))
