String Comparison : Python [closed] - python

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I have a group of strings which look like this:
M.HpyFIX.dna|GTNAAC
M1.HpyFXIII.dna|CCATC
M.HpyFI.dna|CAGT
M2.HpyFXIII.dna|CCATC
M.HpyFVI.dna|TGCA
M.HpyFVIII.dna|TCNNGA
M.HpyFORFX.dna|CCNNGG
M.HpyFII.dna|TCGA
M.HpyFVII.dna|ATTAAT
M.HpyFXII.dna|GTCA
M.HpyFV.dna|CCGG
M.HpyFXI.dna|CTNAG
M.HpyFIII.dna|GATC
M.HpyFIV.dna|GANTC
I want to compare them based only on the part of the string after the | (pipe). I don't want to use string.strip('|'). In the above case I would like to take each string one by one and apply the functions I have, except for M1.HpyFXIII.dna|CCATC and M2.HpyFXIII.dna|CCATC, which I would like to collect in a temporary list before applying the functions.
The reason I want to use string comparisons is that I am using ETE to build phylogenetic trees, and it's much simpler with string comparisons.

If not s.split('|')[1] to get the part of the string after the |, then perhaps
s[s.index('|')+1:]
Which grabs the substring from all characters past the | to the end of the string.
I wouldn't call using split as above a "massive headache", however, and it's arguably easier to read.
To transform the entire list, you can create a function that does what you want it to do, then use a list comprehension or map.
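As a rough sketch of that idea, the function below extracts the part after the pipe and a second one groups the records that share it, which gives the "temporary list" the question asks for. The names sequence_of and group_by_sequence are made up for this illustration, and the input is a shortened sample of the question's data.

```python
from collections import defaultdict

def sequence_of(record):
    """Return the part of the record after the first '|' (hypothetical helper)."""
    return record[record.index('|') + 1:]

def group_by_sequence(records):
    """Map each sequence to the list of records that share it."""
    groups = defaultdict(list)
    for record in records:
        groups[sequence_of(record)].append(record)
    return dict(groups)

# Shortened sample taken from the question's data
records = [
    'M.HpyFIX.dna|GTNAAC',
    'M1.HpyFXIII.dna|CCATC',
    'M.HpyFI.dna|CAGT',
    'M2.HpyFXIII.dna|CCATC',
]
groups = group_by_sequence(records)

# Records whose sequence occurs more than once end up in the same list
shared = [group for group in groups.values() if len(group) > 1]
print(shared)  # [['M1.HpyFXIII.dna|CCATC', 'M2.HpyFXIII.dna|CCATC']]
```

Groups of length one can then be processed individually, and the longer groups handed to the functions as lists.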

You could use the split() method, and then take the second string in the returned list.
_junk, myString = 'M.HpyFIX.dna|GTNAAC'.split('|')
Or if you don't want to store it in a string:
'M.HpyFIX.dna|GTNAAC'.split('|')[1]

Treat it as a CSV file with a custom delimiter:
>>> import csv
>>> import collections
>>> with open('in.txt') as in_file:
...     reader = csv.reader(in_file, delimiter='|')
...     data = list(reader)  # exhaust the reader and convert it to a list
...     # the data is now loaded as a list of rows; let's find the duplicates
...     dup_values = [x for x, y in collections.Counter([r[1] for r in data]).items() if y > 1]
...     for r in data:
...         if r[1] in dup_values:
...             print(r)
...
['M1.HpyFXIII.dna', 'CCATC']
['M2.HpyFXIII.dna', 'CCATC']

Another option is str.partition:
x = "M.HpyFIX.dna|GTNAAC"
name, _, sequence = x.partition("|")
print(sequence)
# or grab the third element
print(x.partition("|")[2])
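One practical difference from split('|')[1] is that str.partition never raises when the separator is missing; it returns the whole string followed by two empty strings. A minimal sketch of using that, with split_record as a hypothetical helper name:

```python
def split_record(record, sep='|'):
    """Split a record into (name, sequence); sequence is None if sep is absent."""
    name, found, sequence = record.partition(sep)
    if not found:
        # partition returns an empty separator when sep does not occur
        return name, None
    return name, sequence

print(split_record('M.HpyFIX.dna|GTNAAC'))  # ('M.HpyFIX.dna', 'GTNAAC')
print(split_record('no separator here'))    # ('no separator here', None)
```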

ls = ['M.HpyFIX.dna|GTNAAC', 'M1.HpyFXIII.dna|CCATC', 'M.HpyFVII.dna|ATTAAT']
nls = [ l.split('|')[1] for l in ls ]

Related

I need to convert the given list of String format to a single list [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 1 year ago.
need to convert this list :
a = ["['0221', '02194', '02211']"]
type = list
to this list :
a = ['0221', '02194', '02211']
type = list
If you're new to Python this code may look complicated, but I will explain what it does:
a=["['0221', '02194', '02211']"]
a1=[]
nums_str=""
for i in a:
    for j in i:
        try:
            if j=="," or j=="]":
                a1.append(nums_str)
                nums_str=""
            nums=int(j)
            nums_str+=str(nums)
        except Exception as e:
            pass
else:
    a=a1.copy()
    print(a)
    print(type(a))
Steps:
A for loop reads each item i in the list a.
A nested for loop reads each character j in the string i.
A try block attempts to cast j to int, so that only the digits are added to nums_str.
nums_str is appended to the list a1 whenever j is "," or "]".
This process repeats for each item in a.
After the for loop finishes, a is replaced with a copy of a1.
You can use ast.literal_eval to convert strings to Python data structures. In this case, we want to convert the first element in the list a to a list.
import ast
a = ast.literal_eval(a[0])
print(a)
# ['0221', '02194', '02211']
Note: Python's built-in function eval also works, but it is considered unsafe on arbitrary strings. With eval:
a = eval(a[0]) # same desired output
You can also use slicing and str.split:
a = a[0][2:-2].split("', '")
output:
a = ['0221', '02194', '02211']
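If the outer list might hold more than one such string, the ast.literal_eval approach can be extended to flatten them all into a single list. This is only a sketch: parse_list_strings is a hypothetical helper, and it assumes every element is the repr of a Python list.

```python
import ast

def parse_list_strings(items):
    """Parse each string as a Python list literal and flatten the results."""
    result = []
    for text in items:
        value = ast.literal_eval(text)  # only evaluates literals, not arbitrary code
        if not isinstance(value, list):
            raise ValueError("expected a list literal, got %r" % (text,))
        result.extend(value)
    return result

a = ["['0221', '02194', '02211']"]
a = parse_list_strings(a)
print(a)  # ['0221', '02194', '02211']
```

Because the values stay strings, leading zeros such as '0221' are preserved.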

Regex in Python to extract and sum all the numbers following strings that match certain format [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
I have strings like this one:
DL21032953:200,SWUS202106150117:72,SWUS202106150052:120,SWUS202106150055:108,SWUS202106150047:60,SWUS202106150045:72,SWUS202106150088:108,SWUS202106150085:120,SWUS202106150081:108,SWUS202106150075:108,SWUS202106150078:108,SWUS202106150165:96,SWUS202106150205:72,SWUS202106150168:84,SWUS202106150167:72,SWUS202106150227:48,DL21047822:240
I'd like to extract all the numbers after "DL...:" and sum them together. For example, in this case, the numbers in bold: 200 + 240 = 440. Is there a way to perform such an operation?
Use a pattern like DL\d+:(\d+), then convert the matches to int with map(int, ...) and sum them:
import re
s = "DL21032953:200,SWUS2... 50227:48,DL21047822:240"
numbers = sum(map(int, re.findall(r"DL\d+:(\d+)", s)))
print(numbers) # 440
Your best shot is something like this:
import re
mystr = "DL21032953:200,SWUS202106150117:72,SWUS202106150052:120,SWUS202106150055:108,SWUS202106150047:60,SWUS202106150045:72,SWUS202106150088:108,SWUS202106150085:120,SWUS202106150081:108,SWUS202106150075:108,SWUS202106150078:108,SWUS202106150165:96,SWUS202106150205:72,SWUS202106150168:84,SWUS202106150167:72,SWUS202106150227:48,DL21047822:240"
sum(int(n) for n in re.findall(r'DL\d+:(\d+)\b', mystr))
Whilst realising that the OP requested an answer using regex -- below is a non-regex approach.
import json
PATTERN = "DL"
input = "DL21032953:200,SWUS202106150117:72,SWUS202106150052:120,SWUS202106150055:108,SWUS202106150047:60,SWUS202106150045:72,SWUS202106150088:108,SWUS202106150085:120,SWUS202106150081:108,SWUS202106150075:108,SWUS202106150078:108,SWUS202106150165:96,SWUS202106150205:72,SWUS202106150168:84,SWUS202106150167:72,SWUS202106150227:48,DL21047822:240"
# Create a dict from input string
d = dict(map(lambda x: x.split(":"), input.split(",")))
# Type conversion for dict values
d_converted = {k: int(v) for k, v in d.items()}
# Filter converted dict by pattern
d_filtered = dict(
    filter(lambda x: x[0].startswith(PATTERN), d_converted.items())
)
# Sum the filtered dict values
sum(d_filtered.values()) # 440
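The same non-regex idea can be written without the intermediate dictionaries, which also avoids silently dropping values if a key happens to repeat. A minimal sketch, where sum_with_prefix is a hypothetical name and the sample string is a shortened version of the one in the question:

```python
def sum_with_prefix(text, prefix="DL"):
    """Sum the numbers after ':' for every comma-separated item whose key starts with prefix."""
    total = 0
    for item in text.split(","):
        key, _, value = item.partition(":")
        if key.startswith(prefix):
            total += int(value)
    return total

# Shortened sample: two DL entries and one SWUS entry from the question
sample = "DL21032953:200,SWUS202106150117:72,DL21047822:240"
print(sum_with_prefix(sample))  # 440
```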

Writing to files (Python) [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
Say I have a list formatted something like: a = [a,2,b,3,c,4,d,3]
and I want to write it to any file format that supports superscripts, like:
a^2
b^3
c^4
and so forth. What possible ways can this be done (The indices need to be formatted properly, like actual indices)?
As simple as this:
a = ['a','2','b','3','c','4','d','3']
# the with block closes the file once writing is done
with open('write.txt','a') as files:
    count=0
    while count<len(a):
        files.write(a[count]+'^'+a[count+1]+'\n')
        count=count+2
Here is a simple way to accomplish that. Replace the print statement with your write and you'll be in good shape.
First prep your list by dividing it into 2 pieces:
a = ['a',2,'b',3,'c',4,'d',3]
first = a[0::2]
second = a[1::2]
Next, loop the first list with enumeration and add the second value:
for i, f in enumerate(first):
    line = '%s^%s' % (f, second[i])  # named line to avoid shadowing the built-in super
    print(line) # replace with write function
Output looks like this:
a^2
b^3
c^4
d^3
This should keep it simple!
It's basically just opening a file and then joining successive elements with a ^ and then joining all of these with a line-break. Finally this is written to a file and the file is closed:
with open('filename.txt', 'w') as file:
    it = iter(a)
    file.write('\n'.join('^'.join([first, str(second)]) for first, second in zip(it, it)))
If you don't want to use any joins and comprehensions you can also use formatting:
with open('filename.txt', 'w') as file:
    template = '{}^{}\n' * (len(a) // 2)
    formatted = template.format(*a)
    file.write(formatted)
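The answers above write a caret, while the question asks for real superscripts. One possible way to get them in a plain text file, assuming the exponents are non-negative integers and the file is written as UTF-8, is to translate the digits into Unicode superscript characters. The names SUPERSCRIPT_DIGITS and to_superscript_lines are made up for this sketch.

```python
# Translation table from ASCII digits to Unicode superscript digits
SUPERSCRIPT_DIGITS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")

def to_superscript_lines(items):
    """Pair up successive elements and render the second of each pair as a superscript."""
    it = iter(items)
    return [
        "{}{}".format(base, str(exponent).translate(SUPERSCRIPT_DIGITS))
        for base, exponent in zip(it, it)
    ]

a = ['a', 2, 'b', 3, 'c', 4, 'd', 3]
lines = to_superscript_lines(a)
print(lines)  # ['a²', 'b³', 'c⁴', 'd³']
```

The lines can then be written with open('filename.txt', 'w', encoding='utf-8') and '\n'.join(lines). Whether the characters display well depends on the font of whatever program opens the file.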

How to autodefine elements from a list in Python 2.7? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 6 years ago.
I want to create a list of both strings and integers, and make the strings work as variables whose value is the integers number.
list = ("y","1","x","3")
str(list[0]) == int(list[1])
str(list[2]) == int(list[3])
z = x + y
print(z)
I tried this, but it doesn't work. Does anybody know a possible solution?
Use a dictionary:
data = {"y": 1, "x": 3}
z = data["y"] + data["x"]
print(z) # 4
Also:
list = ("x", "1", "y", "3")
does not create a list; it creates a tuple. Also, don't use names like list, since that shadows the built-in list.
In [1]: exec 'y=1+1'
In [2]: y
Out[2]: 2
Needless to say, a dictionary is far better, and you should not trust user-provided input. Personally, I strongly discourage you from pursuing this path.
You can use zip() function to get the pairs and then use exec() to assign the integers to names:
>>> items = ("y","1","x","3")
>>> for i,j in zip(items[0::2], items[1::2]):
...     exec("{}={}".format(i,j))
...
>>> x+y
4
But note that you need to be sure about the contents of your items, because exec() runs whatever code they contain and might harm your machine.
Or as a more pythonic approach you can use a dict:
>>> items = ("y","1","x","3")
>>> the_dict = {i:int(j) for i, j in zip(items[0::2], items[1::2])}
>>>
>>> the_dict['x'] + the_dict['y']
4
Again, in this case you need to be sure of the type of the items, because converting the digits might raise an exception if they are not valid digits.
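A small sketch of the dict approach with that validation made explicit; pairs_to_dict is a hypothetical helper, and it assumes the values are meant to be integers written as strings:

```python
def pairs_to_dict(items):
    """Build {name: int(value)} from a flat sequence of name, value, name, value, ..."""
    if len(items) % 2 != 0:
        raise ValueError("expected an even number of items")
    result = {}
    for name, value in zip(items[0::2], items[1::2]):
        try:
            result[name] = int(value)
        except ValueError:
            # Re-raise with a message that says which entry was invalid
            raise ValueError("value for %r is not an integer: %r" % (name, value))
    return result

items = ("y", "1", "x", "3")
values = pairs_to_dict(items)
print(values["x"] + values["y"])  # 4
```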

Elements that appear more than once in a list in Python [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 years ago.
Please help (I know that it's a silly question):
I have a list d = [' ABA', ' AAB', ' BAA', ' BAA', ' AAB', ' ABA']. How can I exclude elements that appear more than once?
To exclude items from the list that appear more than once:
d = [x for x in d if d.count(x) == 1]
For the example provided above, d will bind to an empty list.
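Calling d.count(x) for every element rescans the list each time, which gets slow for long lists. A sketch of the same filter that counts once with collections.Counter (only_unique is a hypothetical name):

```python
from collections import Counter

def only_unique(items):
    """Keep the elements that occur exactly once, preserving their order."""
    counts = Counter(items)  # one pass to count every element
    return [x for x in items if counts[x] == 1]

d = [' ABA', ' AAB', ' BAA', ' BAA', ' AAB', ' ABA']
print(only_unique(d))                     # []
print(only_unique(['a', 'b', 'a', 'c']))  # ['b', 'c']
```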
Others have posted good solutions to remove duplicates.
Convert to a set then back again:
list(set(d))
If order matters, you can pass the values through a dict that remembers the original indices. This approach, while expressible as a single expression, is considerably more complicated:
[x for (i, x) in sorted((i, x) for (x, i) in dict((x, i) for (i, x) in reversed(list(enumerate(d)))).items())]
Of course, you don't have to use comprehensions. For this problem, a fairly simple solution is available:
a = []
for x in d:
    if x not in a:
        a.append(x)
Note that both the order-preserving solutions assume that you want to keep the first occurrence of each duplicated element.
Let's say you have a list named Words and a list named UniqueWords. Loop over Words and, on each iteration, check whether UniqueWords already contains the current element: if it does, continue; if not, add it to UniqueWords. In the end you will have a list without duplicates. Another way is a nested loop that removes an element when it is found more than once, instead of adding it :)
I bet there are far more efficient ways though.
If you're not worried about the order, d = list(set(d)).
If order matters, check out the unique_everseen function in the itertools recipes documentation. It gives a relatively clean iterator-based solution.
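For hashable items, a shorter order-preserving alternative is dict.fromkeys, which keeps the first occurrence of each key. This sketch assumes Python 3.7 or later, where dictionaries are guaranteed to preserve insertion order; dedupe_keep_order is a hypothetical name.

```python
def dedupe_keep_order(items):
    """Remove duplicates, keeping the first occurrence of each element."""
    # dict keys are unique and, on Python 3.7+, keep insertion order
    return list(dict.fromkeys(items))

d = [' ABA', ' AAB', ' BAA', ' BAA', ' AAB', ' ABA']
print(dedupe_keep_order(d))  # [' ABA', ' AAB', ' BAA']
```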
If order doesn't matter, convert to a set.
If you haven't done so already, make sure to read the Python docs on itertools, especially product(), permutations() and combinations().
