I have a list that contains many elements.
I was able to find a way to remove duplicates, blank values, and whitespace.
The only things left to do are:
remove anything that contains the string "ae".
remove anything that contains a period (.).
The order of the resulting list is not important.
The final list should only contain:
FinalList = ['eth-1/1/0', 'jh-3/0/1', 'eth-5/0/0','jh-5/9/9']
Code:
XYList = ['eth-1/1/0', 'ae1', 'eth-1/1/0', 'eth-1/1/0', 'ae1', 'jh-3/0/1','jh-5/9/9', 'jh-3/0/1.3321', 'jh-3/0/1.53', 'ae0', '', 'eth-5/0/0', 'ae0', '', 'eth-5/0/0', 'ae0', 'eth-5/0/0', '', 'jh-2.1.2']
XYUnique = set(XYList)
XYNoBlanks = filter(None, XYUnique)
RemovedWhitespace = [item.strip() for item in XYNoBlanks]
# the order of the list is not important
# the final result should be
FinalList = ['eth-1/1/0', 'jh-3/0/1', 'eth-5/0/0','jh-5/9/9']
The entire conversion sequence (including deduplication, via set) can be accomplished with a single list comprehension:
FinalList = [elem.strip() for elem in set(XYList) if elem and "." not in elem and "ae" not in elem]
filtered_l = [s for s in XYList if 'ae' not in s and '.' not in s]
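Putting the steps together, a minimal sketch might look like the following. The function name clean_interfaces and its banned parameter are hypothetical, and the sketch strips whitespace before testing, on the assumption that padded entries such as ' ae0 ' should be judged by their stripped value.

```python
def clean_interfaces(items, banned=("ae", ".")):
    """Return unique, stripped, non-empty items containing no banned substring.

    Hypothetical helper for illustration; result order is not guaranteed.
    """
    stripped = {item.strip() for item in items}  # strip and deduplicate
    return [
        item for item in stripped
        if item and not any(b in item for b in banned)
    ]


XYList = ['eth-1/1/0', 'ae1', 'eth-1/1/0', 'eth-1/1/0', 'ae1', 'jh-3/0/1',
          'jh-5/9/9', 'jh-3/0/1.3321', 'jh-3/0/1.53', 'ae0', '', 'eth-5/0/0',
          'ae0', '', 'eth-5/0/0', 'ae0', 'eth-5/0/0', '', 'jh-2.1.2']
FinalList = clean_interfaces(XYList)
print(sorted(FinalList))
```

Because a set is used for deduplication, the output is sorted here only to make it easy to compare with the expected result.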
Related
I have a list whose elements are strings. Each item contains unwanted characters that I want to remove. For example, I have list = ["string1.", "string2."]. The unwanted character is ".", so I don't want that character in any element of the list. My desired list should look like list = ["string1", "string2"]. Any help? I have to remove several special characters, so the code will be used several times.
hola = ["holamundoh","holah","holish"]
print(hola[0])
print(hola[0][0])
for i in range(0,len(hola),1):
    for j in range(0,len(hola[i]),1):
        if (hola[i][j] == "h"):
            hola[i] = hola[i].translate({ord('h'): None})
print(hola)
However, I have an error in the conditional if: "string index out of range". Any help? thanks
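Strings in Python are immutable, so every modification builds a new string. In your loop, the range for j is computed from the original length of hola[i], but the string becomes shorter as soon as characters are removed, so later values of j point past its end. That is what raises "string index out of range". Build new strings with str.replace instead: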
list_ = ["string1.", "string2."]
for i, s in enumerate(list_):
    list_[i] = s.replace('.', '')
Or, without a loop:
list_ = ["string1.", "string2."]
list_ = list(map(lambda s: s.replace('.', ''), list_))
You can define the function for removing an unwanted character.
def remove_unwanted(original, unwanted):
    return [x.replace(unwanted, "") for x in original]
Then you can call this function like the following to get the result.
print(remove_unwanted(hola, "."))
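If several different characters have to go, calling replace once per character works but rescans each string every time. One possible sketch, assuming each unwanted item is a single character, uses str.maketrans and str.translate to delete them all in one pass; remove_chars is a hypothetical name.

```python
def remove_chars(strings, unwanted):
    """Return new strings with every character in `unwanted` removed.

    Hypothetical helper; `unwanted` is a string of single characters.
    """
    table = str.maketrans('', '', unwanted)  # third argument: chars to delete
    return [s.translate(table) for s in strings]


print(remove_chars(["string1.", "string2."], "."))
print(remove_chars(["holamundoh", "holah", "holish"], "h"))
```

The original list is left untouched; the function returns a new list, which avoids the indexing problems that come from changing strings while looping over their positions.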
Use str.replace for simple replacements:
lst = [s.replace('.', '') for s in lst]
Or use re.sub for more powerful and more complex regular expression-based replacements:
import re
lst = [re.sub(r'[.]', '', s) for s in lst]
Here are a few examples of more complex replacements that you may find useful, e.g., replace everything that is not a word character:
import re
lst = [re.sub(r'[\W]+', '', s) for s in lst]
Given nested list: mistake_list = [['as','as*s','sd','*ssa'],['a','ds','dfg','mal']]
Required output: corrected_list = [['a','ds','dfg','mal']]
Now the given list can contain hundreds or thousands of sublists in which the strings may or may not contain the special character *, but if it does that whole sublist has to be removed.
I have shown an example above where the mistake_list is the input nested list, and corrected_list is the output nested list.
NOTE: all sublists have an equal number of elements (I don't think it is necessary to know this though)
The filter function can help you:
mistake_list = [['as','as*s','sd','*ssa'],['a','ds','dfg','mal']]
corrected_list = list(filter(lambda l: not any("*" in x for x in l), mistake_list))
print(corrected_list)
[['a', 'ds', 'dfg', 'mal']]
You can use list comprehension:
mistake_list = [['as','as*s','sd','*ssa'],['a','ds','dfg','mal']]
corrected_list = [sublst for sublst in mistake_list if not any('*' in s for s in sublst)]
print(corrected_list) # [['a', 'ds', 'dfg', 'mal']]
The filtering condition here checks whether there is any '*' character in each item of sublst.
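The same idea can be wrapped in a function when more than one special character should disqualify a sublist. This is only a sketch; drop_sublists_with and its forbidden parameter are names made up for illustration.

```python
def drop_sublists_with(nested, forbidden="*"):
    """Keep only the sublists in which no string contains a forbidden character."""
    forbidden = set(forbidden)
    return [
        sub for sub in nested
        # forbidden & set(s) is non-empty when s contains a forbidden character
        if not any(forbidden & set(s) for s in sub)
    ]


mistake_list = [['as', 'as*s', 'sd', '*ssa'], ['a', 'ds', 'dfg', 'mal']]
corrected_list = drop_sublists_with(mistake_list)
print(corrected_list)
```

Since any stops at the first offending string, a sublist is rejected without scanning its remaining elements.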
I have elements in a list as:
temp_list = ["% Work\n"," Hard\n"," Or\n"," Go\n"," Home\n","%","% Happy Coding","%"]
I want to achieve this:
final_list = ["Work Hard Or Go Home","Happy Coding"]
The percentage sign in the elements is the separator between two comments of new lines.
Join the words and then split on %:
temp_list = ["% Work\n"," Hard\n"," Or\n"," Go\n"," Home\n","%","% Happy Coding","%"]
final_list = []
for line in map(str.strip, "".join(temp_list).split("%")):
    if not line:
        continue
    final_list.append(line.replace("\n", ""))
print(final_list)
Prints:
['Work Hard Or Go Home', 'Happy Coding']
You can accomplish this using the built-ins map and filter, and the string methods lstrip and replace:
map takes a function and an iterable, applies the function to every element, and returns a new iterator.
filter takes a function and an iterable, and drops the elements for which the function does not return a true value.
lstrip removes whitespace from the left side of a string.
replace(a, b) replaces every occurrence of a with b in a string.
flat = ""
# Make a single string from your list
for elem in temp_list:
    flat += elem
# First separate string by %
# Next filter out empty list elements
# Replace every \n with nothing and remove whitespace from left side.
your_groupings = list(
    map(lambda el: el.replace("\n", "").lstrip(),
        filter(lambda el: len(el) != 0,
               flat.split("%"))))
print(your_groupings)
> ['Work Hard Or Go Home', 'Happy Coding']
ls = []
msg = ""
for i in temp_list:
    if i == "%":
        ls.append(msg[1:].strip().replace("\n", ""))
        msg = ""
    else:
        msg += i
print(ls)
Here, check each element: if it is "%", drop the leading "%" from msg, strip the surrounding whitespace, replace "\n" with "", and add the result to the list. Otherwise, append the element to msg.
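A more compact variant of the join-then-split approach, sketched under the assumption that runs of whitespace inside a comment should collapse to single spaces: str.split with no arguments splits on any whitespace, including newlines, so rejoining the words normalises the spacing. The name split_comments is hypothetical.

```python
def split_comments(lines, sep="%"):
    """Join the fragments, split on `sep`, and normalise whitespace in each chunk."""
    chunks = "".join(lines).split(sep)
    # " ".join(chunk.split()) collapses newlines and repeated spaces
    cleaned = (" ".join(chunk.split()) for chunk in chunks)
    return [c for c in cleaned if c]  # drop the empty chunks


temp_list = ["% Work\n", " Hard\n", " Or\n", " Go\n", " Home\n",
             "%", "% Happy Coding", "%"]
final_list = split_comments(temp_list)
print(final_list)
```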
I am new to programming and have a question:
I have two lists:
list = ["ich", "du", "etc", "."]
abbr = ["etc.", "U.S"]
I need to identify abbreviations in the first list using the list of given abbreviations in the second.
I need to go through the elements of the first list, and if an element and the next element together are contained in the abbreviation list, merge the two to get a list like:
list = ["ich", "du", "etc."]
My problem is how to merge the two, that is, how to join an element to the next element. How can I use the next element here instead of "."?
for elem in list:
    if ''.join((elem, ".")) in abbr:
You can zip the list with a copy of itself shifted by one index and padded with an extra empty string at the end, so that you iterate through the sequence in pairs. Merge a pair when its concatenation is found in the abbreviations (a set makes the lookup more efficient), then skip the next pair, since its first string has already been merged:
lst = ["ich", "du", "etc", "."]
abbr = {"etc.", "U.S"}
pairs = zip(lst, lst[1:] + [''])
merged = []
for a, b in pairs:
    ab = a + b
    if ab in abbr:
        merged.append(ab)
        next(pairs, None)
    else:
        merged.append(a)
print(merged)
This outputs:
['ich', 'du', 'etc.']
Note that in Python 2.7 or earlier, zip returns a list rather than an iterator, so wrap it with iter when initializing pairs:
pairs = iter(zip(lst, lst[1:] + ['']))
You can go like this:
for elem, nextelem in zip(list,list[1:]):
You can do something like below
lst = ["ich", "du", "etc", "."]
abbr = ["etc.", "U.S"]
for elem, nextelem in zip(lst[:-1],lst[1:]):
    if elem + nextelem in abbr:
        lst.remove(elem)
        lst.remove(nextelem)
        lst.append(elem + nextelem)
print(lst)
Output
['ich', 'du', 'etc.']
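Since remove and append change the order of the list whenever the merged pair is not at the end, an index-based loop that builds a new list may be safer. The following is a sketch, not the method from the answers above; merge_abbreviations is a hypothetical name, and it assumes an abbreviation is always split into exactly two adjacent elements.

```python
def merge_abbreviations(tokens, abbreviations):
    """Merge adjacent token pairs whose concatenation is a known abbreviation."""
    known = set(abbreviations)
    merged = []
    i = 0
    while i < len(tokens):
        # Look ahead one token, if there is one
        pair = tokens[i] + tokens[i + 1] if i + 1 < len(tokens) else None
        if pair in known:
            merged.append(pair)
            i += 2  # both tokens consumed
        else:
            merged.append(tokens[i])
            i += 1
    return merged


print(merge_abbreviations(["ich", "du", "etc", "."], ["etc.", "U.S"]))
print(merge_abbreviations(["etc", ".", "ich"], ["etc.", "U.S"]))
```

The second call shows a merged pair at the start of the list staying in place.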
My list looks like this :
['', 'CCCTTTCGCGACTAGCTAATCTGGCATTGTCAATACAGCGACGTTTCCGTTACCCGGGTGCTGACTTCATACTT
CGAAGA', 'ACCGGGCCGCGGCTACTGGACCCATATCATGAACCGCAGGTG', '', '', 'AGATAAGCGTATCACG
ACCTCGTGATTAGCTTCGTGGCTACGGAAGACCGCAACAGGCCGCTCTTCTGATAAGTGTGCGG', '', '', 'ATTG
TCTTACCTCTGGTGGCATTGCAACAATGCAAATGAGAGTCACAAGATTTTTCTCCGCCCGAGAATTTCAAAGCTGT', '
TGAAGAGAGGGTCGCTAATTCGCAATTTTTAACCAAAAGGCGTGAAGGAATGTTTGCAGCTACGTCCGAAGGGCCACATA
', 'TTTTTTTAGCACTATCCGTAAATGGAAGGTACGATCCAGTCGACTAT', '', '', 'CCATGGACGGTTGGGGG
CCACTAGCTCAATAACCAACCCACCCCGGCAATTTTAACGTATCGCGCGGATATGTTGGCCTC', 'GACAGAGACGAGT
TCCGGAACTTTCTGCCTTCACACGAGCGGTTGTCTGACGTCAACCACACAGTGTGTGTGCGTAAATT', 'GGCGGGTGT
CCAGGAGAACTTCCCTGAAAACGATCGATGACCTAATAGGTAA', '']
Those are sample DNA sequences read from a file. The list can have any length, and a sequence can have anywhere from 10 to 10,000 letters. In the source file, sequences are delimited by empty lines, hence the empty items in the list. How can I join all the items between the empty ones?
Try this, it's a quick and dirty solution that works fine, but won't be efficient if the input list is really big:
lst = ['GATTACA', 'etc']
[x for x in ''.join(',' if not e else e for e in lst).split(',') if x]
This is how it works, using generator expressions and list comprehensions from the inside-out:
',' if not e else e for e in lst : replace all '' strings in the list with ','
''.join(',' if not e else e for e in lst) : join together all the strings. Now the sequences are separated by one or more , characters
''.join(',' if not e else e for e in lst).split(',') : split the string at the points where there are , characters, this produces a list
[x for x in ''.join(',' if not e else e for e in lst).split(',') if x] : finally, remove the empty strings, leaving a list of sequences
Alternatively, the same functionality could be written in a longer way using explicit loops, like this:
answer = [] # final answer
partial = [] # partial answer
for e in lst:
    if e == '': # if current element is an empty string …
        if partial: # … and there's a partial answer
            answer.append(''.join(partial)) # join and append partial answer
            partial = [] # reset partial answer
    else: # otherwise it's a new element of partial answer
        partial.append(e) # add it to partial answer
else: # this part executes after the loop exits
    if partial: # if one partial answer is left
        answer.append(''.join(partial)) # add it to final answer
The idea is the same: we keep track of the non empty-strings and accumulate them, and whenever an empty string is found, we add all the accumulated values to the answer, taking care of adding the last sublist after the loop ends. The result ends up in the answer variable, and this solution only makes a single pass across the input.
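Another single-pass option, offered as a sketch rather than as part of the answer above, is itertools.groupby with bool as the key: consecutive non-empty strings form one group and consecutive empty strings another, so joining only the truthy groups yields the sequences. The name join_runs is hypothetical, and the short strings are made-up placeholders, not the sequences from the question.

```python
from itertools import groupby


def join_runs(items):
    """Join each run of consecutive non-empty strings into a single string."""
    return [
        "".join(group)
        for non_empty, group in groupby(items, key=bool)
        if non_empty
    ]


lst = ['', 'CCCTTT', 'CGAAGA', '', '', 'AGATAA', '', 'TTTT', 'GGCG', '']
print(join_runs(lst))
```

Unlike the comma-based one-liner, this does not depend on a separator character being absent from the data.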