I have a problem, in a 2D list:
t = [['\n'], ['1', '1', '1', '1\n']]
I want to remove the "\n" from the nested lists.
You can strip all strings in the nested lists:
t = [[s.strip() for s in nested] for nested in t]
This would remove all whitespace (spaces, tabs, newlines, etc.) from the start and end of each string.
Use str.rstrip('\n') if you need to be more precise:
t = [[s.rstrip('\n') for s in nested] for nested in t]
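To make the difference between the two calls concrete, here is a small sketch. The sample list `rows` is made up for illustration and is not from the question.

```python
# Hypothetical sample data: note the leading spaces and the trailing tab.
rows = [['  a\n', 'b\t\n']]

# strip() removes all leading and trailing whitespace.
stripped = [[s.strip() for s in row] for row in rows]

# rstrip('\n') removes only trailing newline characters.
newline_only = [[s.rstrip('\n') for s in row] for row in rows]

print(stripped)      # [['a', 'b']]
print(newline_only)  # [['  a', 'b\t']]
```

So `rstrip('\n')` is the safer choice when leading spaces or tabs are meaningful.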
If you need to remove empty values too, you may have to filter twice:
t = [[s.rstrip('\n') for s in nested if not s.isspace()] for nested in t]
t = [nested for nested in t if nested]
The first line includes a stripped string only if it contains more than just whitespace, and the second line removes the lists that end up empty. In Python 2, you could also use:
t = filter(None, t)
for the latter line.
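For Python 3, a minimal sketch of the same two steps, using the list from the question. The only change is that `filter()` returns a lazy iterator there, so it is wrapped in `list()`.

```python
t = [['\n'], ['1', '1', '1', '1\n']]

# Drop whitespace-only strings and strip the trailing newline from the rest.
t = [[s.rstrip('\n') for s in nested if not s.isspace()] for nested in t]

# In Python 3, filter() returns a lazy iterator, so wrap it in list().
t = list(filter(None, t))

print(t)  # [['1', '1', '1', '1']]
```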
Input is given in ONE stretch as:
'[[F1,S1],[F2,S2],[F3,S3],[F1,S2],[F2,S3],[F3,S2],[F2,S1],[F4,S1],[F4,S3],[F5,S1]]'
and I want to convert the "string of a list of lists" into a "list of lists with all individual elements as strings"
[['F1','S1'],['F2','S2'],['F3','S3'],['F1','S2'],['F2','S3'],['F3','S2'],['F2','S1'],['F4','S1'],['F4','S3'],['F5','S1']]
How to?
I'm going to make the assumption that the input string is always formatted without any whitespace around the characters [, , or ]. This can be achieved without anything fancy or dangerous like eval:
Remove the [[ and ]] from the start and end with a string slice.
Then, split on ],[ which separates the inner lists from each other.
Then, split each inner list on , which separates the elements from each other.
There are two special cases to deal with. First, if the outer list is empty, then the string doesn't begin or end with [[ and ]]. Second, if one of the inner lists is empty, the result of split will produce a list containing a single empty string, when the correct output should be an empty list.
def parse_2d_list(s):
    if s == '[]':
        return []
    else:
        parts = s[2:-2].split('],[')
        return [p.split(',') if p else [] for p in parts]
Output:
>>> parse_2d_list('[[F1,S1],[F2,S2],[F3,S3]]')
[['F1', 'S1'], ['F2', 'S2'], ['F3', 'S3']]
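The two special cases can be checked with a few extra calls. This sketch repeats the function so it runs on its own; the inputs with empty lists are made up for illustration.

```python
def parse_2d_list(s):
    # An empty outer list has no [[ ... ]] wrapper to slice off.
    if s == '[]':
        return []
    parts = s[2:-2].split('],[')
    # ''.split(',') returns [''], so map an empty part to an empty list.
    return [p.split(',') if p else [] for p in parts]

print(parse_2d_list('[]'))            # []
print(parse_2d_list('[[]]'))          # [[]]
print(parse_2d_list('[[F1,S1],[]]'))  # [['F1', 'S1'], []]
```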
A shorter version works if neither the outer list nor any inner list can ever be empty:
def parse_2d_list(s):
    parts = s[2:-2].split('],[')
    return [p.split(',') for p in parts]
I have a list and I want to find if the string is present in the list of strings.
li = ['Convenience','Telecom Pharmacy']
txt = '1 convenience store'
I want to match the txt with the Convenience from the list.
I have tried
if any(txt.lower() in s.lower() for s in li):
    print s
print [s for s in li if txt in s]
Neither method gave the expected output.
How to match the substring with the list?
You could use set() and intersection:
In [19]: set.intersection(set(txt.lower().split()), set(s.lower() for s in list1))
Out[19]: {'convenience'}
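The same idea as a standalone script, as a rough sketch. Note two limitations: the result is lowercased, and only single-word list items can match, because `txt` is split into individual words.

```python
list1 = ['Convenience', 'Telecom Pharmacy']
txt = '1 convenience store'

# Lowercase words from the text and lowercase names from the list.
words = set(txt.lower().split())
names = {s.lower() for s in list1}

# Set intersection keeps only the entries present in both.
common = words & names
print(common)  # {'convenience'}
```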
I think split is your answer. Here is the description from the Python documentation:
string.split(s[, sep[, maxsplit]])
Return a list of the words of the string s. If the optional second argument sep is absent or None, the words are separated by arbitrary strings of whitespace characters (space, tab, newline, return, formfeed). If the second argument sep is present and not None, it specifies a string to be used as the word separator. The returned list will then have one more item than the number of non-overlapping occurrences of the separator in the string. If maxsplit is given, at most maxsplit number of splits occur, and the remainder of the string is returned as the final element of the list (thus, the list will have at most maxsplit+1 elements). If maxsplit is not specified or -1, then there is no limit on the number of splits (all possible splits are made).
The behavior of split on an empty string depends on the value of sep. If sep is not specified, or specified as None, the result will be an empty list. If sep is specified as any string, the result will be a list containing one element which is an empty string.
Use the split method on your txt variable. It will give you a list back. You can then compare the two lists to find any matches. I personally would write nested for loops to check the lists manually, but Python provides lots of tools for the job. The following link discusses different approaches to matching two lists.
How can I compare two lists in python and return matches
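The nested-loop approach described above might look roughly like this. The variable names are taken from the question's data, and the case-insensitive comparison is an assumption about what counts as a match.

```python
list1 = ['Convenience', 'Telecom Pharmacy']
txt = '1 convenience store'

matches = []
for word in txt.split():
    for item in list1:
        # Case-insensitive comparison of one word against one list item.
        if word.lower() == item.lower():
            matches.append(item)

print(matches)  # ['Convenience']
```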
Enjoy. :-)
I see two things.
Do you want to check whether the string matches an item in the list EXACTLY? In that case, nothing could be simpler:
if txt in list1:
    # do something
You can also normalize case with .upper() or .lower() on both sides if you want the match to be case-insensitive.
But if, as I understand it, you want to find whether some string in the list is part of txt, you have to use a "for" loop:
def find(list1, txt):
    # return the item if found, False otherwise
    for i in list1:
        if i.upper() in txt.upper():
            return i
    return False
It should work.
Console output:
>>> print(find(['Convenience','Telecom Pharmacy'], '1 convenience store'))
Convenience
You can try this:
>>> list1 = ['Convenience','Telecom Pharmacy']
>>> txt = '1 convenience store'
>>> filter(lambda x: txt.lower().find(x.lower()) >= 0, list1)
['Convenience']
>>> # Or you can use this as well
>>> filter(lambda x: x.lower() in txt.lower(), list1)
['Convenience']
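The output above is what Python 2 prints. In Python 3, `filter()` returns an iterator, so a sketch of the same check needs `list()` around it, or a list comprehension instead:

```python
list1 = ['Convenience', 'Telecom Pharmacy']
txt = '1 convenience store'

# filter() is lazy in Python 3; list() materializes the result.
found = list(filter(lambda x: x.lower() in txt.lower(), list1))

# Equivalent list comprehension.
found_lc = [x for x in list1 if x.lower() in txt.lower()]

print(found)     # ['Convenience']
print(found_lc)  # ['Convenience']
```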
I am currently filtering out all non-alphanumeric characters from this list.
cleanlist = []
for s in dirtylist:
    s = re.sub("[^A-Za-z0-9]", "", str(s))
    cleanlist.append(s)
What would be the most efficient way to also filter out whitespaces from this list?
This will strip whitespace from the strings and won't add empty strings to your cleanlist:
cleanlist = []
for s in dirtylist:
    s = re.sub("[^A-Za-z0-9]", "", str(s).strip())
    if s:
        cleanlist.append(s)
I'd actually use a list comprehension for this, but your code is already efficient.
pattern = re.compile("[^A-Za-z0-9]")
cleanlist = [pattern.sub('', str(s)) for s in dirtylist if str(s)]
Also, this is a duplicate: Stripping everything but alphanumeric chars from a string in Python
The largest efficiency comes from using the full power of regular expression processing: don't iterate through the list.
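Second, do not convert individual characters from string to string. Note that re.sub operates on a string, so the call below assumes dirtylist is a single string; passing a list raises a TypeError. Very simply: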
cleanlist = re.sub("[^A-Za-z0-9]+", "", dirtylist)
Just to be sure, I tested this against a couple of list comprehension and string replacement methods; the above is the fastest by at least 20%.
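A note of caution on the one-liner above: `re.sub` accepts a string or bytes-like object, so handing it a list raises `TypeError`. Also, `[^A-Za-z0-9]` already matches whitespace, so the original pattern removes spaces inside each string as well. A minimal Python 3 sketch, with made-up sample data in `dirtylist`, that applies a compiled pattern per item and drops items that end up empty:

```python
import re

pattern = re.compile(r"[^A-Za-z0-9]+")

# Hypothetical sample data for illustration.
dirtylist = [" a b-1 ", "c\td", 42, "   "]

# re.sub needs a string, so each item is converted and cleaned separately.
cleanlist = [pattern.sub("", str(s)) for s in dirtylist]
# Drop items that became empty after cleaning.
cleanlist = [s for s in cleanlist if s]

print(cleanlist)  # ['ab1', 'cd', '42']

# Passing the whole list at once is rejected.
try:
    re.sub(pattern, "", dirtylist)
    rejected = False
except TypeError:
    rejected = True
print(rejected)  # True
```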
I would like to merge certain parts of a list together depending on whether a comma is present. If a user inputs "1231,fdkgjdkfj45,294d", I would like it to be converted to ["1231", "45", "294"]. I am able to delete everything in the list that isn't a number (using a list comprehension), but I would like the program to recognize where a comma is, then merge the items in the list prior to the comma together (up until the previous comma).
I understand I haven't worded this amazingly but I think you should be able to understand what I mean.
The steps I feel are necessary are as follows:
Delete everything in the list that isn't a number or a comma (Done this, using another list and list comprehension)
Check if there are any commas next to each other and then delete duplicates. (I should be able to do this rather easily)
Use a "for" loop to check the positions in the list, and when finding a comma, merge all items in the list prior to this comma, and up until the previous comma, together. (This is what I cannot do)
Any responses would be highly appreciated.
You can split the string on the ',' character, then iterate over each chunk and join the characters that are digits.
>>> s = "1231,fdkgjdkfj45,294d"
>>> [''.join(i for i in chunk if i.isdigit()) for chunk in s.split(',')]
['1231', '45', '294']
If you are not yet familiar with list comprehensions (which is what is shown above), here is a more step-by-step solution that is approximately equivalent:
numList = []
for chunk in s.split(','):
    digits = []
    for char in chunk:
        if char.isdigit():
            digits.append(char)
    numList.append(''.join(digits))
>>> numList
['1231', '45', '294']
You just need a regex:
>>> import re
>>> str_="1231,fdkgjdkfj45,294d"
>>> re.findall(r'[0-9]+',str_) #[0-9] tells regex to look for digits only while + tells to look for one or more of them
['1231', '45', '294']
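One thing to watch for: the regex and the split-and-join approach agree on the sample input, but they differ when digits within one comma-separated chunk are interrupted by other characters. A small sketch with a made-up input:

```python
import re

# Hypothetical input: the first chunk has letters between two digit runs.
s = "12ab34,5x"

# Split on commas, then join all digits within each chunk.
joined = [''.join(c for c in chunk if c.isdigit()) for chunk in s.split(',')]

# findall returns every separate run of digits, ignoring the commas.
runs = re.findall(r'[0-9]+', s)

print(joined)  # ['1234', '5']
print(runs)    # ['12', '34', '5']
```

If the goal is "merge everything between two commas", the split-and-join version matches that description more closely.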
I need to split a string. I am using this:
def ParseStringFile(string):
    p = re.compile(r'\W+')
    result = p.split(string)
But I have an error: my result has two empty strings (''), one before 'Лев'. How do I get rid of them?
As nhahtdh pointed out, the empty string is expected since there's a \n at the start and end of the string, but if they bother you, you can filter them very quickly and efficiently.
>>> filter(None, ['', 'text', 'more text', ''])
['text', 'more text']
You could strip the leading and trailing newlines from the string before splitting it:
p.split(string.strip('\n'))
Alternatively, split the string and then remove the first and last element:
result = p.split(string)[1:-1]
The [1:-1] slice takes a copy of the result containing all indexes starting at 1 (i.e. removing the first element) and ending at -2 (i.e. the second-to-last element; the end index is exclusive).
A longer and less elegant alternative would be to modify the list in-place:
result = p.split(string)
del result[-1] # remove last element
del result[0] # remove first element
Note that in these two solutions the first and last element must be the empty string. If sometimes the input doesn't contain these empty strings at the beginning or end, then they will misbehave. However they are also the fastest solutions.
If you want to remove all empty strings in the result, even if they happen inside the list of results you can use a list-comprehension:
[word for word in p.split(string) if word]
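Putting it together in Python 3, as a rough sketch. The input string is made up (only 'Лев' comes from the question), and the `re.findall(r'\w+', ...)` line is an alternative that was not suggested above: it collects the words directly, so no empty strings appear in the first place.

```python
import re

# Hypothetical input with a newline at each end.
string = '\nЛев Толстой\n'

p = re.compile(r'\W+')

# The separators at both ends produce an empty string on each side.
raw = p.split(string)
print(raw)  # ['', 'Лев', 'Толстой', '']

# Filtering keeps only the non-empty pieces.
words = [word for word in raw if word]
print(words)  # ['Лев', 'Толстой']

# Matching the words instead of splitting on the gaps.
found = re.findall(r'\w+', string)
print(found)  # ['Лев', 'Толстой']
```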