Reversing a list of strings in Python

I have this list,
last_names = [
'Hag ', 'Hag ', 'Basmestad ', 'Grimlavaag ', 'Kleivesund ',
'Fintenes ', 'Svalesand ', 'Molteby ', 'Hegesen ']
and I want to print it reversed, so 'Hegesen' comes first, then 'Molteby', and 'Hag' at the end.
I have tried last_names.reverse(), but that returns None.
Any help?

.reverse returns None because it reverses in-place:
>>> last_names = [
... 'Hag ', 'Hag ', 'Basmestad ', 'Grimlavaag ', 'Kleivesund ',
... 'Fintenes ', 'Svalesand ', 'Molteby ', 'Hegesen ']
>>> last_names.reverse()
>>> last_names
['Hegesen ', 'Molteby ', 'Svalesand ', 'Fintenes ', 'Kleivesund ', 'Grimlavaag ', 'Basmestad ', 'Hag ', 'Hag ']
To get a reversed copy within an expression, use last_names[::-1]; it returns a new list and leaves last_names unchanged.
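A minimal sketch of the slicing approach, reusing the list from the question. The slice produces a new list, so the original order survives:

```python
last_names = [
    'Hag ', 'Hag ', 'Basmestad ', 'Grimlavaag ', 'Kleivesund ',
    'Fintenes ', 'Svalesand ', 'Molteby ', 'Hegesen ']

# A slice with step -1 walks the list from the end to the start and
# builds a new list; last_names itself is not modified.
reversed_copy = last_names[::-1]

print(reversed_copy[0])   # 'Hegesen '
print(reversed_copy[-1])  # 'Hag '
print(last_names[0])      # 'Hag ' (the original order is untouched)
```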

As stated before, .reverse() reverses the list in place. A more pythonic way to get a reversed list back is to use reversed(), which returns an iterator; wrap it in list() if you need an actual list:
>>> list(reversed([1,2,3]))
[3, 2, 1]
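If you only need to walk the names backwards (for example to print them), reversed() can be used directly without building an intermediate list. A short sketch, with the trailing spaces stripped purely for display:

```python
last_names = [
    'Hag ', 'Hag ', 'Basmestad ', 'Grimlavaag ', 'Kleivesund ',
    'Fintenes ', 'Svalesand ', 'Molteby ', 'Hegesen ']

# reversed() yields the items back to front without copying the list.
stripped = [name.strip() for name in reversed(last_names)]
print(", ".join(stripped))
# Hegesen, Molteby, Svalesand, Fintenes, Kleivesund, Grimlavaag, Basmestad, Hag, Hag
```

A reversed() iterator is exhausted after one pass; call reversed(last_names) again, or wrap it in list(), if you need to go over it twice.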

Related

How to find the longest continuous stretch of matching elements in 2 lists

I have 2 lists:
a = [
'Okay. ',
'Yeah. ',
'So ',
'my ',
'thinking ',
'is, ',
'so ',
'when ',
"it's ",
'set ',
'up ',
'just ',
'one ',
'and ',
"we're ",
'like ',
'next ',
'to ',
'each ',
'other '
]
b = [
'Okay. ',
'Yeah. ',
'Everything ',
'as ',
'normal ',
'as ',
'possible. ',
'Yeah. ',
'Yeah. ',
'Okay. ',
'Is ',
'that ',
'better? ',
'Yeah. ',
'So ',
'my ',
'thinking ',
'is, ',
'so ',
'when '
]
Each list is slightly different. However, there will be moments when a stretch of continuous elements in a will match a stretch of continuous elements in b.
For example:
The first 2 elements in both lists match. The matching list would be ['Okay. ', 'Yeah. ']. This is only 2 elements long.
There is a longer stretch of matching words. You can see that each contains the following continuous set:
['Yeah. ','So ','my ','thinking ','is, ','so ','when ']
This continuous matching sequence has 7 elements. This is the longest sequence.
I want the index of where this sequence starts for each list. For a, this should be 1 and for b this should be 13.
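I understand that I could generate every possible contiguous subsequence of a, starting with the longest, and check for a match in b, stopping at the first match. However, this seems inefficient.
How I would solve this, with difflib.SequenceMatcher from the standard library (calling find_longest_match() without arguments requires Python 3.9 or later):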
from difflib import SequenceMatcher
match = SequenceMatcher(None, a, b).find_longest_match()
print(a[match.a:match.a + match.size])
print(b[match.b:match.b + match.size])
You get:
['Yeah. ', 'So ', 'my ', 'thinking ', 'is, ', 'so ', 'when ']
['Yeah. ', 'So ', 'my ', 'thinking ', 'is, ', 'so ', 'when ']
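To get the two start indices asked for in the question, read the fields of the returned Match object. This sketch passes the bounds explicitly so it also runs before Python 3.9, and turns off autojunk; that last choice is an assumption made here, not something the answer requires:

```python
from difflib import SequenceMatcher

a = ['Okay. ', 'Yeah. ', 'So ', 'my ', 'thinking ', 'is, ', 'so ', 'when ',
     "it's ", 'set ', 'up ', 'just ', 'one ', 'and ', "we're ", 'like ',
     'next ', 'to ', 'each ', 'other ']
b = ['Okay. ', 'Yeah. ', 'Everything ', 'as ', 'normal ', 'as ', 'possible. ',
     'Yeah. ', 'Yeah. ', 'Okay. ', 'Is ', 'that ', 'better? ', 'Yeah. ',
     'So ', 'my ', 'thinking ', 'is, ', 'so ', 'when ']

# autojunk=False disables the "popular element" heuristic, which only applies
# when b has 200+ items but would then treat very frequent words as junk.
matcher = SequenceMatcher(None, a, b, autojunk=False)
# Explicit bounds (alo, ahi, blo, bhi) work on every Python 3 version.
match = matcher.find_longest_match(0, len(a), 0, len(b))

print(match)             # Match(a=1, b=13, size=7)
print(match.a, match.b)  # 1 13, the start indices in a and b
```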
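So, we start from the top of a and, for every position in b, count how many consecutive elements match, keeping the longest run. Since each comparison only continues as long as the elements match, this is fast enough for short lists, although the worst case grows with len(a) * len(b) * (length of the longest match).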
a = [
'Okay. ',
'Yeah. ',
'So ',
'my ',
'thinking ',
'is, ',
'so ',
'when ',
"it's ",
'set ',
'up ',
'just ',
'one ',
'and ',
"we're ",
'like ',
'next ',
'to ',
'each ',
'other '
]
b = [
'Okay. ',
'Yeah. ',
'Everything ',
'as ',
'normal ',
'as ',
'possible. ',
'Yeah. ',
'Yeah. ',
'Okay. ',
'Is ',
'that ',
'better? ',
'Yeah. ',
'So ',
'my ',
'thinking ',
'is, ',
'so ',
'when '
]
start = None
maxlen = 0
for i in range(len(a)):
    for j in range(len(b)):
        # count consecutive matching elements starting at a[i] and b[j]
        k = 0
        while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
            k += 1
        if k > maxlen:
            start = (i, j)
            maxlen = k
print(start, maxlen)
Output:
(1, 13) 7
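For longer lists, the classic longest-common-substring dynamic-programming approach avoids re-comparing the same run from every offset. Each table cell holds the length of the common run ending at a[i-1] and b[j-1], for O(len(a) * len(b)) time. This is a swapped-in technique, not the method of either answer, and the function name longest_common_run is hypothetical:

```python
a = ['Okay. ', 'Yeah. ', 'So ', 'my ', 'thinking ', 'is, ', 'so ', 'when ',
     "it's ", 'set ', 'up ', 'just ', 'one ', 'and ', "we're ", 'like ',
     'next ', 'to ', 'each ', 'other ']
b = ['Okay. ', 'Yeah. ', 'Everything ', 'as ', 'normal ', 'as ', 'possible. ',
     'Yeah. ', 'Yeah. ', 'Okay. ', 'Is ', 'that ', 'better? ', 'Yeah. ',
     'So ', 'my ', 'thinking ', 'is, ', 'so ', 'when ']


def longest_common_run(a, b):
    """Return (start_in_a, start_in_b, length) of the longest run of
    consecutive equal elements shared by a and b."""
    best_len, best_i, best_j = 0, 0, 0
    # cur[j] is the length of the common run ending at a[i-1] and b[j-1];
    # prev holds the same row for the previous value of i.
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # extend the run that ended one step earlier in both lists
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len = cur[j]
                    best_i, best_j = i - best_len, j - best_len
        prev = cur
    return best_i, best_j, best_len


print(longest_common_run(a, b))  # (1, 13, 7)
```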

Insert 'BCH' into map

I wanted to insert 'BCH' inside a specific location in a list, but it gave me an error message.
Here is my code:
map = [[' ', ' ', ' ', ' '],
       [' ', ' ', ' ', ' '],
       [' ', ' ', ' ', ' '],
       [' ', ' ', ' ', ' ']]
building = 'BCH'
map[0][1].append(building)
The error message was: "AttributeError: 'str' object has no attribute 'append'"
Strings are immutable and have no .append() method. If you want to concatenate to the string, assign the result back:
map[0][1] += building
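Depending on what "insert" should mean, a short sketch of the options. It uses the name grid instead of map (an assumption, so the built-in map() is not shadowed) and builds the rows with a comprehension, because [[' '] * 4] * 4 would make all four rows the same list object:

```python
grid = [[' '] * 4 for _ in range(4)]
building = 'BCH'

grid[0][1] += building       # str concatenation, assigned back to the cell
after_concat = grid[0][1]
print(repr(after_concat))    # ' BCH'

grid[0][1] = building        # replace the cell's value outright
print(grid[0])               # [' ', 'BCH', ' ', ' ']

grid[0].insert(1, building)  # list.insert adds a new element at index 1
print(grid[0])               # [' ', 'BCH', 'BCH', ' ', ' ']
```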

Is it possible to compare two lists using the first 4 characters of each item?

I am a new student learning to program in Python, and I have 2 example lists:
selected_ipc = ['H01L']
df = [[ 'F24J3/02 ', 'A123'], [ 'G01N31/10 ', 'A124'], [ 'H01L27/14 ', 'A125'], ['G21H1/10 ', 'A126'], ['H01L21/36 ', 'A127']]
I wrote a simple piece of code like this:
for item in selected_ipc:
    for item1 in df:
        if item == item1:
            print(item)
        else:
            print("No match")
but it only prints 'No match', while my expected result is
[[ 'H01L27/14 ', 'A125'], ['H01L21/36 ', 'A127']]
So I would like to ask: is it possible to compare the items of the first list with the first 4 characters of the codes in the second list?
Thank you in advance.
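Your comparison item == item1 compares the string 'H01L' with a whole sub-list, so it is never true. You could use startswith instead: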
selected_ipc = ['H01L']
df = ['F24J3/02 ', 'G01N31/10 ', 'H01L27/14 ', 'G21H1/10 ', 'H01L21/36 ']
for item in selected_ipc:
    for item1 in df:
        if item1.startswith(item):
            print(item1)
        else:
            print("No match")
Output
No match
No match
H01L27/14
No match
H01L21/36
UPDATE
For a nested list you could use a list comprehension:
selected_ipc = ['H01L']
df = [['F24J3/02 ', 'A123'], ['G01N31/10 ', 'A124'], ['H01L27/14 ', 'A125'], ['G21H1/10 ', 'A126'],
['H01L21/36 ', 'A127']]
result = [lst for lst in df if any(lst[0].startswith(e) for e in selected_ipc)]
print(result)
Output
[['H01L27/14 ', 'A125'], ['H01L21/36 ', 'A127']]
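Since str.startswith also accepts a tuple of prefixes, the any(...) in the comprehension can be folded into a single call. A sketch using the question's data:

```python
selected_ipc = ['H01L']
df = [['F24J3/02 ', 'A123'], ['G01N31/10 ', 'A124'], ['H01L27/14 ', 'A125'],
      ['G21H1/10 ', 'A126'], ['H01L21/36 ', 'A127']]

# str.startswith returns True if the string starts with any prefix in the tuple
prefixes = tuple(selected_ipc)
result = [row for row in df if row[0].startswith(prefixes)]
print(result)  # [['H01L27/14 ', 'A125'], ['H01L21/36 ', 'A127']]
```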
Alternatively, you could use a less pythonic approach with two loops:
selected_ipc = ['H01L']
df = [['F24J3/02 ', 'A123'], ['G01N31/10 ', 'A124'], ['H01L27/14 ', 'A125'], ['G21H1/10 ', 'A126'],
['H01L21/36 ', 'A127']]
result = []
for lst in df:
    found = False
    for e in selected_ipc:
        if lst[0].startswith(e):
            found = True
            result.append(lst)
            break
    if not found:
        print("No match")
print(result)
Output
No match
No match
No match
[['H01L27/14 ', 'A125'], ['H01L21/36 ', 'A127']]
selected_ipc = ['H01L']
df = ['F24J3/02 ', 'G01N31/10 ', 'H01L27/14 ', 'G21H1/10 ', 'H01L21/36 ']
l = []
for i in df:
    # `in` matches 'H01L' anywhere in the string, not only at the start
    if selected_ipc[0] in i:
        l.append(i)
print(l)
You can do it with a list comprehension like below:
selected_ipc = ['H01L']
df = ['F24J3/02 ', 'G01N31/10 ', 'H01L27/14 ', 'G21H1/10 ', 'H01L21/36 ']
for item in selected_ipc:
    match_lst = [item1 for item1 in df if item in item1]
    print(match_lst)
UPDATE
If you want to check the other elements of each sub-list in df as well (not only the first one), you can use the code below:
selected_ipc = ['H01L', 'G01N', 'A126']
df = [['F24J3/02 ', 'A123'], ['G01N31/10 ', 'A124'], ['H01L27/14 ', 'A125'], ['G21H1/10 ', 'A126'],
['H01L21/36 ', 'A127']]
match_lst = [item1 for item1 in df if any(i.startswith(item) for item in selected_ipc for i in item1)]
print(match_lst)
Output
[['G01N31/10 ', 'A124'], ['H01L27/14 ', 'A125'], ['G21H1/10 ', 'A126'], ['H01L21/36 ', 'A127']]
Use a list comprehension: check whether the key is in the first element of each item and, if so, keep that item:
res = [i for i in df if selected_ipc[0] in i[0]]
# [['H01L27/14 ', 'A125'], ['H01L21/36 ', 'A127']]
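Note that the in-based answers match 'H01L' anywhere in the code, while startswith only matches at the beginning. A small sketch showing the difference; the last code in the list is made up for illustration and does not come from the question:

```python
key = 'H01L'
# The last code is hypothetical: it contains 'H01L' but does not start with it.
codes = ['H01L27/14 ', 'G01N31/10 ', 'B01H01L2/00 ']

with_in = [c for c in codes if key in c]                # substring anywhere
with_prefix = [c for c in codes if c.startswith(key)]   # only at the start
print(with_in)      # ['H01L27/14 ', 'B01H01L2/00 ']
print(with_prefix)  # ['H01L27/14 ']
```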

How to escape specific whitespaces when splitting line into words with regex

I want to split a string into a list of words (here "word" means an arbitrary sequence of non-whitespace characters), but also keep the groups of consecutive whitespace that were used as separators, because the number of whitespace characters is significant in my data. For this simple task, I know that the following regex does the job (I use Python as an illustrative language, but the code can easily be adapted to any language that supports regexes):
import re
regexA = re.compile(r"(\S+)")
print(regexA.split("aa b+b cc dd! :ee "))
produces the expected output:
['', 'aa', ' ', 'b+b', ' ', 'cc', ' ', 'dd!', ' ', ':ee', ' ']
Now the hard part: when a word includes an opening parenthesis, all the whitespaces encountered until the matching closing parenthesis should not be considered as word separators. In other words:
regexB.split("aa b+b cc(dd! :ee (ff gg) hh) ii ")
should produce:
['', 'aa', ' ', 'b+b', ' ', 'cc(dd! :ee (ff gg) hh)', ' ', 'ii', ' ']
Using
regexB = re.compile(r'([^(\s]*\([^)]*\)|\S+)')
works for a single pair of parentheses, but fails when there are inner parentheses. How could I improve the regex to correctly skip inner parentheses?
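To see why regexB stops working with inner parentheses: [^)]* stops at the first ), so the word is cut right after (ff gg). A quick check on the input from the question:

```python
import re

regexB = re.compile(r'([^(\s]*\([^)]*\)|\S+)')
parts = regexB.split("aa b+b cc(dd! :ee (ff gg) hh) ii ")
print(parts)
# ['', 'aa', ' ', 'b+b', ' ', 'cc(dd! :ee (ff gg)', ' ', 'hh)', ' ', 'ii', ' ']
```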
And the final question: in my data, only words starting with % should be tested for the "parenthesis rule" (regexB); the other words should be handled by regexA. I have no idea how to combine two regexes in a single split.
Any hint is warmly welcome...
The PCRE regex engine supports subroutine calls, so a recursive pattern can handle balanced nested parentheses:
(?m)\s+(?=[^()]*(\([^()]*(?1)?[^()]*\))*[^()]*$)
Here (?1) calls subroutine 1, i.e. the group (\([^()]*(?1)?[^()]*\)), which makes the pattern recursive because the group contains a call to itself.
But Python's standard re module does not support subroutine calls in a regex.
So, in my Python script, I first replace every ( and ) with another distinctive character (# in this example), then apply the regex below to split, and finally turn each # back into ( or ) respectively.
Regex for splitting:
(?m)(\s+)(?=[^#]*(?:(?:#[^#]*){2})*$)
Here I split on runs of whitespace \s+ instead of your \S+, because #, ( and ) are all matched by \S. The lookahead only allows a split where an even number of # characters follows, up to the end of the string.
Python script may be like this
import re
ss = """aa b+b cc(dd! :ee ((ff gg)) hh) ii """
masked = re.sub(r"\(|\)", "#", ss)  # replace every `(` and `)` with `#`
regx = re.compile(r"(?m)(\s+)(?=[^#]*(?:(?:#[^#]*){2})*$)")
m, pos = [], 0
for piece in regx.split(masked):
    # `#` replaced `(`/`)` one character for one, so each piece has the same
    # length in the original string: slice ss to get the parentheses back
    m.append(ss[pos:pos + len(piece)])
    pos += len(piece)
print(m)
Output is
['aa', ' ', 'b+b', ' ', 'cc(dd! :ee ((ff gg)) hh)', ' ', 'ii', ' ', '']
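One caveat of the # masking approach: for balanced input, the number of parentheses after a position has the same parity as the nesting depth there, so the lookahead tests whether the depth is even rather than zero. It works on the example above because ff gg sits at depth 3. A sketch showing that whitespace at depth 2 is still split (masked_split is a hypothetical helper name):

```python
import re

# The splitting regex from the answer, applied after masking ( and ) as #
split_re = re.compile(r"(?m)(\s+)(?=[^#]*(?:(?:#[^#]*){2})*$)")


def masked_split(s):
    return split_re.split(re.sub(r"[()]", "#", s))


print(masked_split("f(a b) c"))    # ['f#a b#', ' ', 'c']  depth 1: kept together
print(masked_split("x((a b)) y"))  # ['x##a', ' ', 'b##', ' ', 'y']  depth 2: split
```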
Finally, after testing several ideas based on the answers proposed by @Wiktor Stribiżew and @Thm Lee, I came up with a set of solutions covering different levels of complexity. To reduce dependencies, I wanted to stick to the re module from the Python standard library, so here is the code:
import re
text = "aa b%b( %cc(dd! (:ee ff) gg) %hh ii) "
# Solution 1: don't process parentheses at all
regexA = re.compile(r'(\S+)')
print(regexA.split(text))
# Solution 2: works for non-nested parentheses
regexB = re.compile(r'(%[^(\s]*\([^)]*\)|\S+)')
print(regexB.split(text))
# Solution 3: works for one level of nested parentheses
regexC = re.compile(r'(%[^(\s]*\((?:[^()]*\([^)]*\))*[^)]*\)|\S+)')
print(regexC.split(text))
# Solution 4: works for arbitrary levels of nested parentheses
n, words = 0, []
for word in regexA.split(text):
    if n: words[-1] += word
    else: words.append(word)
    if n or (word and word[0] == '%'):
        n += word.count('(') - word.count(')')
print(words)
Here is the generated output:
Solution 1: ['', 'aa', ' ', 'b%b(', ' ', '%cc(dd!', ' ', '(:ee', ' ', 'ff)', ' ', 'gg)', ' ', '%hh', ' ', 'ii)', ' ']
Solution 2: ['', 'aa', ' ', 'b%b(', ' ', '%cc(dd! (:ee ff)', ' ', 'gg)', ' ', '%hh', ' ', 'ii)', ' ']
Solution 3: ['', 'aa', ' ', 'b%b(', ' ', '%cc(dd! (:ee ff) gg)', ' ', '%hh', ' ', 'ii)', ' ']
Solution 4: ['', 'aa', ' ', 'b%b(', ' ', '%cc(dd! (:ee ff) gg)', ' ', '%hh', ' ', 'ii)', ' ']
As stated in the OP, for my specific data whitespace inside parentheses only has to be escaped for words starting with %; other parentheses (e.g. in the word b%b( in my example) are not considered special. If you want to escape whitespace inside any pair of parentheses, simply remove the % character from the regexes (and the word[0] == '%' test in solution 4). Here is the result with that modification:
Solution 1: ['', 'aa', ' ', 'b%b(', ' ', '%cc(dd!', ' ', '(:ee', ' ', 'ff)', ' ', 'gg)', ' ', '%hh', ' ', 'ii)', ' ']
Solution 2: ['', 'aa', ' ', 'b%b( %cc(dd! (:ee ff)', ' ', 'gg)', ' ', '%hh', ' ', 'ii)', ' ']
Solution 3: ['', 'aa', ' ', 'b%b( %cc(dd! (:ee ff) gg)', ' ', '%hh', ' ', 'ii)', ' ']
Solution 4: ['', 'aa', ' ', 'b%b( %cc(dd! (:ee ff) gg) %hh ii)', ' ']
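The counting idea of solution 4 can also be applied character by character, tracking the nesting depth directly and producing the same alternating whitespace/word format as regexA.split. A sketch under those assumptions; split_words and its prefix parameter are hypothetical names, and prefix="" reproduces the "any pair of parentheses" variant:

```python
def split_words(text, prefix="%"):
    """Split text into alternating whitespace runs and words, starting and
    ending with a (possibly empty) whitespace run.

    Inside a word that starts with `prefix`, whitespace between an opening
    parenthesis and its matching closing one stays part of the word.
    prefix="" applies that rule to every word.
    """
    parts = [""]     # parts[-1] is the piece being built; parts[0] is whitespace
    in_word = False  # True while building a word, False while building whitespace
    guarded = False  # does the current word start with `prefix`?
    depth = 0        # open-parenthesis depth inside a guarded word
    for i, ch in enumerate(text):
        if ch.isspace() and depth == 0:
            if in_word:              # the current word ends here
                parts.append("")
                in_word = False
        else:
            if not in_word:          # a new word starts here
                parts.append("")
                in_word = True
                guarded = text.startswith(prefix, i)
            if guarded and ch == "(":
                depth += 1
            elif guarded and ch == ")" and depth > 0:
                depth -= 1
        parts[-1] += ch
    if in_word:                      # keep the trailing whitespace slot
        parts.append("")
    return parts


text = "aa b%b( %cc(dd! (:ee ff) gg) %hh ii) "
print(split_words(text))
# ['', 'aa', ' ', 'b%b(', ' ', '%cc(dd! (:ee ff) gg)', ' ', '%hh', ' ', 'ii)', ' ']
print(split_words(text, prefix=""))
# ['', 'aa', ' ', 'b%b( %cc(dd! (:ee ff) gg) %hh ii)', ' ']
```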

Removing all punctuation from a list and return the entire list in Python

I have a list from which I'm trying to strip all punctuation and the character "·", and then return the cleaned list. However, when I return the list, only the first word appears, and I'm not sure where I went wrong.
Here is the list I'm trying to strip punctuation from:
['in·vis·i·ble', 'in·vis·i·bil·i·ty, ', 'in·vis·i·ble·ness, ', 'in·vis·i·bly, ', 'qua·si-in·vis·i·ble, ', 'qua·si-in·vis·i·bly, ', 'inˌvisiˈbility, ', 'inˈvisibleness, ', 'inˈvisibly, ']
Here's what I'm getting: ['invisible']
Here is a portion of my code (it's part of a larger function):
syl = []
for words in span:
    if words not in syl:
        syl.append(words)
for text in syl:
    drop_sep = re.sub(r'·', '', text)
    return drop_sep
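Only the first word comes back because return runs inside the for loop, so the function exits after the first iteration. Instead, build the whole list at once with a list comprehension in which every occurrence of the middle dot '·' is replaced by the empty string '':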
[word.replace('·', '') for word in words]
Example
>>> words = ['in·vis·i·ble',
... 'in·vis·i·bil·i·ty, ',
... 'in·vis·i·ble·ness, ',
... 'in·vis·i·bly, ',
... 'qua·si-in·vis·i·ble, ',
... 'qua·si-in·vis·i·bly, ',
... 'inˌvisiˈbility, ',
... 'inˈvisibleness, ',
... 'inˈvisibly, ']
>>>
>>> from pprint import pprint
>>> pprint([word.replace('·', '') for word in words])
['invisible',
'invisibility, ',
'invisibleness, ',
'invisibly, ',
'quasi-invisible, ',
'quasi-invisibly, ',
'inˌvisiˈbility, ',
'inˈvisibleness, ',
'inˈvisibly, ']
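The question also asks to remove punctuation, which the replace() call does not do: the commas and the stress marks ˌ and ˈ survive. One standard-library option is str.translate with a deletion table. In this sketch, treating ˌ and ˈ as characters to drop and keeping the hyphen are assumptions, and duplicates are removed as the question's own loop tried to do:

```python
import string

words = ['in·vis·i·ble', 'in·vis·i·bil·i·ty, ', 'in·vis·i·ble·ness, ',
         'in·vis·i·bly, ', 'qua·si-in·vis·i·ble, ', 'qua·si-in·vis·i·bly, ',
         'inˌvisiˈbility, ', 'inˈvisibleness, ', 'inˈvisibly, ']

# Characters to delete: the syllable dot, the two stress marks (assumed to be
# unwanted; they are not in string.punctuation) and ASCII punctuation except
# the hyphen, which is kept so 'quasi-invisible' stays readable.
drop = "·ˌˈ" + string.punctuation.replace("-", "")
table = str.maketrans("", "", drop)

cleaned = [word.translate(table).strip() for word in words]
# dict.fromkeys removes duplicates while keeping first-seen order
unique = list(dict.fromkeys(cleaned))
print(unique)
# ['invisible', 'invisibility', 'invisibleness', 'invisibly', 'quasi-invisible', 'quasi-invisibly']
```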
