My list looks like this:
['', 'CCCTTTCGCGACTAGCTAATCTGGCATTGTCAATACAGCGACGTTTCCGTTACCCGGGTGCTGACTTCATACTT
CGAAGA', 'ACCGGGCCGCGGCTACTGGACCCATATCATGAACCGCAGGTG', '', '', 'AGATAAGCGTATCACG
ACCTCGTGATTAGCTTCGTGGCTACGGAAGACCGCAACAGGCCGCTCTTCTGATAAGTGTGCGG', '', '', 'ATTG
TCTTACCTCTGGTGGCATTGCAACAATGCAAATGAGAGTCACAAGATTTTTCTCCGCCCGAGAATTTCAAAGCTGT', '
TGAAGAGAGGGTCGCTAATTCGCAATTTTTAACCAAAAGGCGTGAAGGAATGTTTGCAGCTACGTCCGAAGGGCCACATA
', 'TTTTTTTAGCACTATCCGTAAATGGAAGGTACGATCCAGTCGACTAT', '', '', 'CCATGGACGGTTGGGGG
CCACTAGCTCAATAACCAACCCACCCCGGCAATTTTAACGTATCGCGCGGATATGTTGGCCTC', 'GACAGAGACGAGT
TCCGGAACTTTCTGCCTTCACACGAGCGGTTGTCTGACGTCAACCACACAGTGTGTGTGCGTAAATT', 'GGCGGGTGT
CCAGGAGAACTTCCCTGAAAACGATCGATGACCTAATAGGTAA', '']
Those are sample DNA sequences read from a file. The list can have varying length, and a single sequence can have anywhere from 10 to 10,000 letters. In the source file, sequences are delimited by empty lines, hence the empty items in the list. How can I join all the items between the empty ones?
Try this, it's a quick and dirty solution that works fine, but won't be efficient if the input list is really big:
lst = ['GATTACA', 'etc']
[x for x in ''.join(',' if not e else e for e in lst).split(',') if x]
This is how it works, using generator expressions and list comprehensions from the inside-out:
',' if not e else e for e in lst : replace all '' strings in the list with ','
''.join(',' if not e else e for e in lst) : join all the strings together. Each sequence is now separated from the next by one or more , characters
''.join(',' if not e else e for e in lst).split(',') : split the string at the points where there are , characters, this produces a list
[x for x in ''.join(',' if not e else e for e in lst).split(',') if x] : finally, remove the empty strings, leaving a list of sequences
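To make the steps above concrete, here is a small sketch that prints each intermediate value. The short input list is made up for illustration, and the approach assumes that no sequence contains a comma (which holds for DNA letters).

```python
# Hypothetical short input; real sequences would be much longer.
lst = ['', 'AC', 'GT', '', '', 'TT', '']

# Step 1: replace every empty string with a comma.
replaced = [',' if not e else e for e in lst]
print(replaced)  # [',', 'AC', 'GT', ',', ',', 'TT', ',']

# Step 2: join everything into one string.
joined = ''.join(replaced)
print(joined)    # ,ACGT,,TT,

# Step 3: split at the commas.
pieces = joined.split(',')
print(pieces)    # ['', 'ACGT', '', 'TT', '']

# Step 4: drop the empty strings left over by the split.
result = [x for x in pieces if x]
print(result)    # ['ACGT', 'TT']
```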
Alternatively, the same functionality could be written in a longer way using explicit loops, like this:
answer = []   # final answer
partial = []  # partial answer
for e in lst:
    if e == '':                              # if current element is an empty string …
        if partial:                          # … and there's a partial answer
            answer.append(''.join(partial))  # join and append partial answer
            partial = []                     # reset partial answer
    else:                                    # otherwise it's a new element of partial answer
        partial.append(e)                    # add it to partial answer
else:                                        # this part executes after the loop exits
    if partial:                              # if one partial answer is left
        answer.append(''.join(partial))      # add it to final answer
The idea is the same: we keep track of the non-empty strings and accumulate them, and whenever an empty string is found, we join the accumulated values and add the result to the answer, taking care to add the last group after the loop ends. The result ends up in the answer variable, and this solution makes only a single pass across the input.
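The same single-pass grouping can also be sketched with itertools.groupby from the standard library. This is an alternative to the answer's code rather than part of it; the function name join_between_blanks and the short sample list are made up for illustration.

```python
from itertools import groupby

def join_between_blanks(items):
    """Join runs of consecutive non-empty strings; empty strings act as delimiters."""
    # groupby(key=bool) yields alternating runs of empty (False) and
    # non-empty (True) strings; only the non-empty runs are joined.
    return [''.join(group) for is_seq, group in groupby(items, key=bool) if is_seq]

# Hypothetical short input standing in for the DNA list.
lst = ['', 'CCCT', 'CGAAGA', 'ACCG', '', '', 'AGAT', 'ACCT', '', 'TTTT', '']
print(join_between_blanks(lst))  # ['CCCTCGAAGAACCG', 'AGATACCT', 'TTTT']
```

Unlike the comma-based one-liner, this version does not depend on a separator character that must never appear in the data.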
Related
I have a list whose elements are strings. Each item in the list contains unwanted characters that I want to remove. For example, I have the list = ["string1.", "string2."]. The unwanted character is ".", and I don't want it in any element of the list. My desired list should look like list = ["string1", "string2"]. Any help? I have to remove several special characters, so the code must be reusable.
hola = ["holamundoh","holah","holish"]
print(hola[0])
print(hola[0][0])
for i in range(0,len(hola),1):
    for j in range(0,len(hola[i]),1):
        if (hola[i][j] == "h"):
            hola[i] = hola[i].translate({ord('h'): None})
print(hola)
However, I get an error at the if condition: "string index out of range". Any help? Thanks.
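Strings are immutable in Python, so they cannot be modified in place; every "modification" builds a new string. In your loop, the range over the character indices is computed from the original length, but translate shortens the string, so the later indices fall out of range. Build the new strings with str.replace instead:
list_ = ["string1.", "string2."]
for i, s in enumerate(list_):
    list_[i] = s.replace('.', '')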
Or, without a loop:
list_ = ["string1.", "string2."]
list_ = list(map(lambda s: s.replace('.', ''), list_))
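To see why the loop in the question fails, the following sketch reproduces the problem on a single string. It is only an illustration of the cause: the length used by range() is fixed before the string is shortened.

```python
word = "holamundoh"
original_length = len(word)  # 10, the value range() was built from

# translate returns a new string with every "h" removed.
shortened = word.translate({ord("h"): None})
print(shortened, len(shortened))  # olamundo 8

# The loop would still try indices up to original_length - 1.
raised = False
try:
    shortened[original_length - 1]
except IndexError as exc:
    raised = True
    print("IndexError:", exc)
```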
You can define the function for removing an unwanted character.
def remove_unwanted(original, unwanted):
    return [x.replace(unwanted, "") for x in original]
Then you can call this function like the following to get the result.
print(remove_unwanted(hola, "h"))
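Since the question mentions several special characters, one possible extension is to remove them all in a single pass with str.maketrans and str.translate. This is a sketch, not part of the answer above; the name remove_chars and the sample strings are hypothetical.

```python
def remove_chars(strings, unwanted):
    """Return new strings with every character in `unwanted` removed."""
    # The third argument of str.maketrans lists characters to delete.
    table = str.maketrans("", "", unwanted)
    return [s.translate(table) for s in strings]

# Hypothetical sample data with three different unwanted characters.
print(remove_chars(["string1.", "str,ing2!"], ".,!"))  # ['string1', 'string2']
```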
Use str.replace for simple replacements:
lst = [s.replace('.', '') for s in lst]
Or use re.sub for more powerful and more complex regular expression-based replacements:
import re
lst = [re.sub(r'[.]', '', s) for s in lst]
Here are a few examples of more complex replacements that you may find useful, e.g., replace everything that is not a word character:
import re
lst = [re.sub(r'[\W]+', '', s) for s in lst]
I have elements in a list as:
temp_list = ["% Work\n"," Hard\n"," Or\n"," Go\n"," Home\n","%","% Happy Coding","%"]
I want to achieve this:
final_list = ["Work Hard Or Go Home","Happy Coding"]
The percentage sign in the elements is the separator between two comments of new lines.
Join the words and then split on %:
temp_list = ["% Work\n"," Hard\n"," Or\n"," Go\n"," Home\n","%","% Happy Coding","%"]
final_list = []
for line in map(str.strip, "".join(temp_list).split("%")):
    if not line:
        continue
    final_list.append(line.replace("\n", ""))
print(final_list)
Prints:
['Work Hard Or Go Home', 'Happy Coding']
You can accomplish this using map and filter together with the string methods lstrip and replace:
map takes a function and an iterable, applies the function to every element, and returns a new iterator
filter takes a function and an iterable, and drops the elements for which the function does not return a true value
lstrip removes whitespace from the left side of a string
replace(a, b) replaces every occurrence of a with b in a string
flat = ""
# Make a normal string from your array
for elem in temp_list:
    flat += elem
# First separate string by %
# Next filter out empty list elements
# Replace every \n with nothing and remove whitespace from left side.
your_groupings = list(
map(lambda el: el.replace("\n","").lstrip(),
filter(lambda el: len(el) != 0,
flat.split("%"))))
print(your_groupings)
> ['Work Hard Or Go Home', 'Happy Coding']
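For comparison, the same split, filter, and clean-up pipeline can be sketched as a single list comprehension. This is an equivalent formulation offered as an illustration; the variable name groupings is made up.

```python
temp_list = ["% Work\n", " Hard\n", " Or\n", " Go\n", " Home\n", "%", "% Happy Coding", "%"]

# Join the fragments, split on the separator, skip blank pieces,
# then remove the newlines and the surrounding whitespace.
flat = "".join(temp_list)
groupings = [part.replace("\n", "").strip() for part in flat.split("%") if part.strip()]
print(groupings)  # ['Work Hard Or Go Home', 'Happy Coding']
```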
ls = []
msg = ""
for i in temp_list:
    if i == "%":
        ls.append(msg[1:].strip().replace("\n", ""))
        msg = ""
    else:
        msg += i
print(ls)
Here, check each element for "%". If the element is "%", add msg to the list after dropping its leading "%", stripping the surrounding whitespace, and replacing "\n" with "". Otherwise, append the element to msg.
I'm trying to extract numbers that are mixed in sentences. I am doing this by splitting the sentence into elements of a list, and then I will iterate through each character of each element to find the numbers. For example:
String = "is2 Thi1s T4est 3a"
LP = String.split()
result = []
for e in LP:
    for i in e:
        if i in ('123456789'):
            result += i
This gives me the result I want, which is ['2', '1', '4', '3']. Now I want to write this as a list comprehension. After reading the post "List comprehension on a nested list?", I understood that the correct code should be:
[i for e in LP for i in e if i in ('123456789') ]
My original code for the list comprehension approach was wrong, but I'm trying to wrap my head around the result I get from it.
My original incorrect code, which reversed the order:
[i for i in e for e in LP if i in ('123456789') ]
The result I get from that is:
['3', '3', '3', '3']
Could anyone explain the process that leads to this result please?
Just reverse the same process you found in the other post. Nest the loops in the same order:
for i in e:
    for e in LP:
        if i in ('123456789'):
            print(i)
The code requires both e and LP to be set beforehand, so the outcome you see depends entirely on other code run before your list comprehension.
If we presume that e was set to '3a' (the last element in LP, left over from your code that ran the full loops), then for i in e will run twice, first with i set to '3'. We then get a nested loop, for e in LP, and given your output, LP is 4 elements long. So that iterates 4 times, and on each iteration i == '3', so the if test passes and '3' is added to the output. The next iteration of for i in e: sets i = 'a'; the inner loop runs 4 times again, but now the if test fails.
However, we can't know for certain, because we don't know what code was run last in your environment that set e and LP to begin with.
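The scenario described above can be reproduced with a short sketch. Setting e = '3a' by hand is an assumption standing in for whatever earlier code left that value behind in the asker's session.

```python
LP = "is2 Thi1s T4est 3a".split()

# Correct order: the for clauses read left to right, outermost loop first.
correct = [i for e in LP for i in e if i in "123456789"]
print(correct)  # ['2', '1', '4', '3']

# Reversed order only runs if e already exists. Here it is set by hand to
# the value a previous full loop over LP would have left behind (assumption).
e = "3a"
reversed_order = [i for i in e for e in LP if i in "123456789"]
print(reversed_order)  # ['3', '3', '3', '3']
```

The outermost iterable (e) is evaluated once, before the comprehension starts, which is why rebinding e in the inner clause does not change what i iterates over.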
I'm not sure why your original code uses str.split(), then iterates over all the characters of each word. Whitespace would never pass your if filter anyway, so you could just loop directly over the full String value. The if test can be replaced with a str.isdigit() test:
digits = [char for char in String if char.isdigit()]
or even a regular expression:
import re
digits = re.findall(r'\d', String)
and finally, if this is a reordering puzzle, you'd want to split out your strings into a number (for ordering) and the remainder (for joining); sort the words on the extracted number, and extract the remainder after sorting:
# to sort on numbers, extract the digits and turn to an integer
sortkey = lambda w: int(re.search(r'\d+', w).group())
# 'is2' -> 2, 'Thi1s' -> 1, etc.
# sort the words by sort key
reordered = sorted(String.split(), key=sortkey)
# -> ['Thi1s', 'is2', '3a', 'T4est']
# replace digits in the words and join again
rejoined = ' '.join(re.sub(r'\d+', '', w) for w in reordered)
# -> 'This is a Test'
From the question you asked in a comment ("how would you proceed to reorder the words using the list that we got as index?"):
We can use custom sorting to accomplish this. (Note that regex is not required, but makes it slightly simpler. Use any method to extract the number out of the string.)
import re
test_string = 'is2 Thi1s T4est 3a'
words = test_string.split()
words.sort(key=lambda s: int(re.search(r'\d+', s).group()))
print(words) # ['Thi1s', 'is2', '3a', 'T4est']
To remove the numbers:
words = [re.sub(r'\d', '', w) for w in words]
Final output is:
['This', 'is', 'a', 'Test']
I'm new to coding and I was trying to make a script that joins the tuples inside 'sl', which are sequences of letters, into a new tuple called 's' with the items as strings, and then prints the longest string inside s.
This is the code I came up with (a short version). When I try to print the max item of 's' in this code, it returns a "max() arg is empty" error.
sl = [['m','o','o','n'],['d','a','y'],['h','e','l','l','o']]
s = []
s = (''.join(i) for i in sl) # join the letters inside sl, put them into s
print(max(s, key=len)) # print longest string inside s
But I can still iterate through s with:
for i in s:
    print(i)
and it prints the joined words inside s.
I suppose that (''.join(i) for i in sl) isn't properly joining them as strings. Is there a way to make sure the words inside 's' are joined as strings?
It works; just replace the () with [] so that s is a list rather than a generator:
sl = [['m','o','o','n'],['d','a','y'],['h','e','l','l','o']]
s = []
s = [''.join(i) for i in sl]
print(s)
print(max(s, key=len))
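One likely explanation for the error, assuming the generator had already been iterated once before max() was called (the short version in the question does not show this), is that a generator can only be consumed a single time. A sketch of that behavior:

```python
sl = [['m', 'o', 'o', 'n'], ['d', 'a', 'y'], ['h', 'e', 'l', 'l', 'o']]

gen = (''.join(i) for i in sl)  # generator expression: single use
first_pass = list(gen)          # consumes every item
second_pass = list(gen)         # nothing is left
print(first_pass)   # ['moon', 'day', 'hello']
print(second_pass)  # []

# max() on the exhausted generator has nothing to compare.
failed = False
try:
    max(gen, key=len)
except ValueError:
    failed = True

# A list can be reused as often as needed.
words = [''.join(i) for i in sl]
print(max(words, key=len))  # hello
```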
I have a list that contain many elements.
I was able to find a way to remove duplicates, blank values, and white space.
The only thing left is to:
remove anything that contains the string (ae).
remove from the list anything that contains a period (.)
Order of the resulting list is not important.
The final list should only contain:
FinalList = ['eth-1/1/0', 'jh-3/0/1', 'eth-5/0/0','jh-5/9/9']
Code:
XYList = ['eth-1/1/0', 'ae1', 'eth-1/1/0', 'eth-1/1/0', 'ae1', 'jh-3/0/1','jh-5/9/9', 'jh-3/0/1.3321', 'jh-3/0/1.53', 'ae0', '', 'eth-5/0/0', 'ae0', '', 'eth-5/0/0', 'ae0', 'eth-5/0/0', '', 'jh-2.1.2']
XYUnique = set(XYList)
XYNoBlanks = (filter(None,XYUnique))
RemovedWhitespace = [item.strip() for item in XYNoBlanks]
# the order of the list is not important
# the final result should be
FinalList = ['eth-1/1/0', 'jh-3/0/1', 'eth-5/0/0','jh-5/9/9']
The entire conversion sequence (excluding uniqueness) can be accomplished with a list comprehension:
FinalList = [elem.strip() for elem in set(XYList) if elem and "." not in elem and "ae" not in elem]
filtered_l = [s for s in XYList if 'ae' not in s and '.' not in s]
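If a reproducible order is wanted even though the question says it is not required, the filtering can be combined with dict.fromkeys, which removes duplicates while keeping first-seen order. This is a sketch under that assumption; the name cleaned is made up.

```python
XYList = ['eth-1/1/0', 'ae1', 'eth-1/1/0', 'eth-1/1/0', 'ae1', 'jh-3/0/1',
          'jh-5/9/9', 'jh-3/0/1.3321', 'jh-3/0/1.53', 'ae0', '', 'eth-5/0/0',
          'ae0', '', 'eth-5/0/0', 'ae0', 'eth-5/0/0', '', 'jh-2.1.2']

# Strip whitespace first so that blank-looking items are filtered too.
cleaned = (item.strip() for item in XYList)

# dict.fromkeys drops duplicates and preserves insertion order.
FinalList = list(dict.fromkeys(
    s for s in cleaned if s and 'ae' not in s and '.' not in s
))
print(FinalList)  # ['eth-1/1/0', 'jh-3/0/1', 'jh-5/9/9', 'eth-5/0/0']
```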