I am trying to extract every ticker symbol that starts with a $ from one column of a spreadsheet.
I have already isolated the column that contains the text; what I want now is every run of letters that immediately follows a $ symbol.
For example:
$AAPL $LOW $TSLA and so on from the entire dataset. I don't want amounts such as $1000 or $600. A symbol may be followed by a period or a space, but I only want the $ plus the letters a-z.
I haven't managed a full extraction and my code is getting messy, so here is the code that loads the data so you can see it for yourself. I am using Jupyter Notebook.
import mysql.connector
import pandas
googleSheedID = '15fhpxqWDRWkNtEFhi9bQyWUg8pDn4B-R2N18s1xFYTU'
worksheetName = 'Sheet1'
URL = 'https://docs.google.com/spreadsheets/d/{0}/gviz/tq?tqx=out:csv&sheet={1}'.format(
googleSheedID,
worksheetName
)
df = pandas.read_csv(URL)
del df['DATE']
del df['USERNAME']
del df['LINK']
del df['LINK2']
df = df[df["TWEET"].str.contains("RT")==False]  # assign the result, otherwise the filtered frame is discarded
print(df)
I'm not sure I understand exactly what you want, but the following code collects every token that starts with $ and runs up to the next blank space.
import mysql.connector
import pandas
googleSheedID = '15fhpxqWDRWkNtEFhi9bQyWUg8pDn4B-R2N18s1xFYTU'
worksheetName = 'Sheet1'
URL = 'https://docs.google.com/spreadsheets/d/{0}/gviz/tq?tqx=out:csv&sheet={1}'.format(
googleSheedID,
worksheetName
)
df = pandas.read_csv(URL)
del df['DATE']
del df['USERNAME']
del df['LINK']
del df['LINK2']
unique_results = []
for i in range(len(df['TWEET'])):
    if 'RT' in df["TWEET"][i]:
        continue
    else:
        for j in range(len(df['TWEET'][i])-1):
            if df['TWEET'][i][j] == '$':
                if df['TWEET'][i][j+1] == '1' or df['TWEET'][i][j+1] == '2' or df['TWEET'][i][j+1] == '3' or\
                   df['TWEET'][i][j+1] == '4' or df['TWEET'][i][j+1] == '5' or df['TWEET'][i][j+1] == '6' or\
                   df['TWEET'][i][j+1] == '7' or df['TWEET'][i][j+1] == '8' or df['TWEET'][i][j+1] == '9' or df['TWEET'][i][j+1] == '0':
                    continue
                else:
                    start = j
                    for k in range(start, len(df['TWEET'][i])):
                        if df['TWEET'][i][k] == ' ' or df['TWEET'][i][k:k+1] == '\n':
                            end = k
                            break
                    results = df['TWEET'][i][start:end]
                    if results not in unique_results:
                        unique_results.append(results)
print(unique_results)
edit: fixed the code
The outputs are:
['$GME', '$SNDL', '$FUBO', '$AMC', '$LOTZ', '$CLOV', '$USAS', '$AIHS', '$PLM', '$LODE', '$TTNP', '$IMTE', '', '$NAK.', '$NAK', '$CRBP', '$AREC', '$NTEC', '$NTN', '$CBAT', '$ZYNE', '$HOFV', '$GWPH', '$KERN', '$ZYNE,', '$AIM', '$WWR', '$CARV', '$VISL', '$SINO', '$NAKD', '$GRPS', '$RSHN', '$MARA', '$RIOT', '$NXTD', '$LAC', '$BTC', '$ITRM', '$CHCI', '$VERU', '$GMGI', '$WNBD', '$KALV', '$EGOC', '$Veru', '$MRNA', '$PVDG', '$DROP', '$EFOI', '$LLIT', '$AUVI', '$CGIX', '$RELI', '$TLRY', '$ACB', '$TRCH', '$TRCH.', '$TSLA', '$cciv', '$sndl', '$ANCN', '$TGC', '$tlry', '$KXIN', '$AMZN', '$INFI', '$LMND', '$COMS', '$VXX', '$LEDS', '$ACY', '$RHE', '$SINO.', '$GPL', '$SPCE', '$OXY', '$CLSN', '$FTFT', '$FTFT.....', '$BIEI', '$EDRY', '$CLEU', '$FSR', '$SPY', '$NIO', '$LI', '$XPEV,', '$UL', '$RGLG', '$SOS', '$QS', '$THCB', '$SUNW', '$MICT', '$BTC.X', '$T', '$ADOM', '$EBON', '$CLPS', '$HIHO', '$ONTX', '$WNRS', '$SOLO', '$Mara,', '$Riot,', '$SOS,', '$GRNQ,', '$RCON,', '$FTFT,', '$BTBT,', '$MOGO,', '$EQOS,', '$CCNC', '$CCIV', '$tsla', '$fsr', '$wkhs', '$ride', '$nio', '$NETE', '$DPW', '$MOSY', '$SSNT', '$PLTR', '$GSAH:', '$EQOS', '$MTSL', '$CMPS', '$CHIF', '$MU', '$HST', '$SNAP', '$CTXR', '$acy', '$FUBOTV', '$DPBE', '$HYLN', '$SPOT', '$NSAV', '$HYLN,', '$aabb', '$AAL', '$BBIG', '$ITNS', '$CTIB', '$AMPG', '$ZI', '$NUVI', '$INTC', '$TSM', '$AAPL', '$MRJT', '$RCMT', '$IZEA', '$BBIG,', '$ARKK', '$LIAUTO', '$MARA:', '$SOS:', '$XOM', '$ET', '$BRNW', '$SYPR', '$LCID', '$QCOM', '$FIZZ', '$TRVG', '$SLV', '$RAFA', '$TGCTengasco,', '$BYND', '$XTNT', '$NBY', '$sos', '$KMPH', '$', '$(0.60)', '$(0.64)', '$BIDU', '$rkt', '$GTT', '$CHUC', '$CLF', '$INUV', '$RKT', '$COST', '$MDCN', '$HCMC', '$UWMC', '$riot', '$OVID', '$HZON', '$SKT', '$FB', '$PLUG', '$BA', '$PYPL', '$PSTH.', '$NVDA', '$AMPG.', '$aese.', '$spy', '$pltr', '$MSFT', '$AMD', '$QQQ', '$LTNC', '$WKHS', '$EYES', '$RMO', '$GNUS', '$gme', '$mdmp', '$kern', '$AEI', '$BABA', '$YALA', '$TWTR', '$WISH', '$GE', '$ORCL', '$JUPW', '$TMBR', '$SSYS', 
'$NKE', '$AMPGAmpliTech', '$$$', '$$', '$RGLS', '$HOGE', '$GEGR', '$nclh', '$IGAC', '$FCEL', '$TKAT', '$OCG', '$YVR', '$IPDN.', '$IPDN', "$SINO's", '$WIMI', '$TKAT.', '$BAC', '$LZR', '$LGHL', '$F', '$GM', '$KODK', '$atvk', '$ATVK', '$AIKI', '$DS', '$AI', '$WTII', '$oxy', '$DYAI', '$DSS', '$ZKIN', '$MFH', '$WKEY', '$MKGI', '$DLPN', '$PSWW', '$SNOW', '$ALYA', '$AESE', '$CSCW', '$CIDM', '$HOFV.', '$LIVX', '$FNKO', '$HPR', '$BRQS', '$GIGM', '$APOP', '$EA', '$CUEN', '$TMBR?', '$FLNT,', '$APPS', '$METX', '$STG', '$WSRC', '$AMHC', '$VIAC', '$MO', '$UAVL', '$CS', '$MDT', '$GYST', '$CBBT', '$ASTC', '$AACG', '$WAFU.', '$WAFU', '$CASI', '$mmmw', '$MVIS', '$SNOA', '$C', '$KR', '$EWZ', '$VALE', '$EWZ.', '$CSCO', '$PINS', '$XSPA', '$VPRX', '$CEMI', '$M', '$BMRA', '$SPX', '$akt', '$SURG', '$NCLH', '$ARSN', '$ODT', '$SGBX', '$CRWD.', '$TGRR', '$PENN', '$BB', '$XOP', '$XL', '$FREQ', '$IDRA', '$DKNG', '$COHN', '$ADHC', '$ISWH', '$LEGO', '$OTRA', '$NAAC', '$HCAR', '$PPGH', '$SDAC', '$PNTM', '$OUST', '$IO', '$HQGE', '$HENC', '$KYNC', '$ATNF', '$BNSO', '$HDSN', '$AABB', '$SGH', '$BMY', '$VERY', '$EARS', '$ROKU', '$PIXY', '$APRE', '$SFET', '$SQ', '$EEIQ', '$REDU', '$CNWT', '$NFLX', '$RGBPP', '$RGBP', '$SHOP', '$VITL', '$RAAS', '$CPNG', '$JKS', '$COMP', '$NAFS']
You can use regular expressions.
\$[a-zA-Z]+
After reading the DataFrame, run the code below.
import re
# Create empty lists for the results
results = []
final_results = []
for row_num in range(len(df['TWEET'])):
    string_to_check = df['TWEET'][row_num]
    # Check for RT at the beginning of the string only.
    # if 'RT' in df["TWEET"][row_num] would have found the "RT" anywhere in the string.
    if re.match(r"^RT", string_to_check):
        continue
    else:
        # Check for all words starting with $ and followed by only alphabets.
        # This will find $FOOBAR but not $600, $6FOOBAR & $FOO6BAR
        rel_text_l = re.findall(r"\$[a-zA-Z]+", string_to_check)
        # Check for empty list
        if rel_text_l:
            # Add elements of list to another list directly
            results.extend(rel_text_l)
# Making list of the set of list to remove duplicates
final_results = list(set(results))
print(results)
print(final_results)
The results are
['$GME', '$FOOBAR', '$FOO', '$SNDL', '$FUBO', '$AMC', '$GME', '$LOTZ', '$CLOV', '$USAS', '$GOBLIN', '$LTNC']
['$LTNC', '$GOBLIN', '$AMC', '$FOO', '$FOOBAR', '$LOTZ', '$CLOV', '$SNDL', '$GME', '$USAS', '$FUBO']
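Notice that the duplicate $GME appears only once in final_results.
If you don't need to skip tweets that start with RT, all of this can be done in one line of code:
# Join the tweets first; str(df['TWEET']) would only give the truncated printed summary of the column.
direct_result = list(set(re.findall(r"\$[a-zA-Z]+", " ".join(df['TWEET'].astype(str)))))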
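The loop-based output above contains near-duplicates such as '$NAK.' next to '$NAK' and '$tsla' next to '$TSLA'. As a minimal sketch of how the regex approach could also fold those together, the function below upper-cases each match and keeps the first-seen order. The name extract_tickers and the sample tweets are made up for illustration, and it assumes the tweets are already available as a plain list of strings.

```python
import re

# "$" followed by one or more ASCII letters; digits and punctuation end the match
TICKER = re.compile(r"\$[A-Za-z]+")

def extract_tickers(tweets):
    """Return unique upper-cased tickers in first-seen order, skipping retweets."""
    seen = {}  # dict keys keep insertion order and drop duplicates
    for tweet in tweets:
        if tweet.startswith("RT"):
            continue  # retweet: ignore it entirely
        for ticker in TICKER.findall(tweet):
            seen.setdefault(ticker.upper(), None)
    return list(seen)

# Hypothetical sample data, not taken from the spreadsheet
sample_tweets = [
    "Watching $GME and $amc today, target $600.",
    "RT someone: $TSLA to the moon",
    "$GME, $NAK. and $BTC.X look interesting",
]
print(extract_tickers(sample_tweets))
```

Note that a symbol such as $BTC.X is cut at the period and comes back as $BTC, which follows from the letters-only requirement in the question.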
Good day.
I have a list that looks like this :
my_list = [
'BENEFICIAR',
'PROFIL COMPANIE',
'MONITORIZEAZĂ',
'DELIA',
'Administrator',
'Alexandros',
'Administrator',
'Andreas',
'Administrator',
'BENEFICIAR',
'MONITORIZEAZĂ',
'LEBADA',
'Mobil: 0721',
'Email: sales#lebada.com',
'',
'BENEFICIAR',
'PROFIL',
'MONITORIZEAZĂ',
'AVA',
'Email: office#ava.com',
'Site: ',
'Adresa: ',
'Oras: ',
'Judet: ',
'Cifra afaceri:',
'Numar angajati: ',
'Profit: ',
'CUI: ',
'Tel:']
My code looks like this:
n=-1
lista_beneficiar_mail={}
for lines in my_list:
    n += 1
    if lines == "MONITORIZEAZĂ":
        ben = line[n+1]
        for lines in my_list[n+1:]:
            if lines.startswith('Email'):
                mail=lines
            else:
                continue
            if lines == "MONITORIZEAZĂ":
                break
        lista_beneficiar_mail[ben]=mail
The idea is to build a dict from that list: each key is the item right after a 'MONITORIZEAZĂ' entry (index n+1), and its value is the first email that comes after it.
This does not work. I know I am doing something wrong, but I have hit a wall and no longer know what to search for or test. I suspect the answer is easy.
Thank you in advance.
To find the first email after each key value, we can use a boolean flag that records whether the current key has already found its email; the first 'Email:' line seen while the flag is False belongs to that key.
my_list = [
'BENEFICIAR',
'PROFIL COMPANIE',
'MONITORIZEAZĂ',
'DELIA',
'Administrator',
'Alexandros',
'Administrator',
'Andreas',
'Administrator',
'BENEFICIAR',
'MONITORIZEAZĂ',
'LEBADA',
'Mobil: 0721',
'Email: sales#lebada.com',
'',
'BENEFICIAR',
'PROFIL',
'MONITORIZEAZĂ',
'AVA',
'Email: office#ava.com',
'Site: ',
'Adresa: ',
'Oras: ',
'Judet: ',
'Cifra afaceri:',
'Numar angajati: ',
'Profit: ',
'CUI: ',
'Tel:']
my_dict = {}
ben = ""
found = True  # True once the current key has its email (or before any key is seen)
for i in range(len(my_list)):
    # the current key has no email yet and this line is an email
    if not found and my_list[i].startswith("Email:"):
        found = True  # we found it
        my_dict[ben] = my_list[i]
    if my_list[i] == "MONITORIZEAZĂ":
        ben = my_list[i+1]
        found = False  # new key: its email has not been found yet
This is also more efficient because the list is traversed only once, whereas your approach rescans some items several times.
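The same single-pass idea can be sketched as a small function. The name first_email_per_key and the shortened sample list are hypothetical, and the sketch assumes that a key with no email before the next marker should simply be left out of the result. It also avoids indexing i+1, so a marker at the very end of the list does not raise an error.

```python
def first_email_per_key(items, marker="MONITORIZEAZĂ", prefix="Email:"):
    """Map the entry after each marker to the first email line before the next marker."""
    result = {}
    key = None          # beneficiary still waiting for an email, if any
    take_next = False   # True when the previous item was the marker
    for item in items:
        if take_next:
            key = item          # the item right after the marker is the key
            take_next = False
        elif item == marker:
            take_next = True
            key = None          # the previous key never got an email
        elif key is not None and item.startswith(prefix):
            result[key] = item  # first email for this key
            key = None          # ignore any further emails for it
    return result

# Shortened, made-up variant of the list from the question
sample = ['BENEFICIAR', 'MONITORIZEAZĂ', 'DELIA', 'Administrator',
          'BENEFICIAR', 'MONITORIZEAZĂ', 'LEBADA', 'Mobil: 0721',
          'Email: sales#lebada.com', 'Email: second#lebada.com',
          'MONITORIZEAZĂ']
print(first_email_per_key(sample))
```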
Another approach is a while loop, so that you don't revisit elements you have already scanned.
Like this:
lista_beneficiar_mail={}
n=0
while n < len(my_list):
    if my_list[n] == "MONITORIZEAZĂ":
        ben = my_list[n+1]
        mail = ''
        for i in range(n+1, len(my_list)):
            if my_list[i].startswith('Email'):
                mail=my_list[i]
                n = i
                break
        if mail: #in case there is no email in the list you shouldn't add it
            lista_beneficiar_mail[ben]=mail
    n+= 1
I am trying to generate a plan: a list of classes per semester, where a class can only be added once its required classes have been completed or its co-requisite classes are being taken in the same semester.
My code below almost works, but it keeps reusing classes that have already been completed. I tried to prevent this with and (class_list[i][0] not in classes_done), expecting the if statement to be skipped, but the check seems to be ignored.
The rest of the if statement seems to work fine. (class_list[i][3] == '' or class_list[i][3] in classes_done) asks: does this class have a required class, and if so, has it been completed?
(class_list[i][2] in classes_for_semester or class_list[i][2] == '') asks: does this class have a co-requisite, and if so, is it in classes_for_semester or already completed?
Each entry of class_list is organized as ['name', 'credit', 'co-requisite', 'required completed classes', 'empty']. I added the other variables as comments to show what they look like.
class PlanGenerator:
    def generator(max_credit_allowed, min_credit_allowed, classes_done, class_list):
        classes_for_semester = []
        credits_for_semester = 0
        semester = 0
        full_plan = []
        # class_list = [['MA 241 ', '4', '', '', ''], ['PS 150 ', '3', 'MA 241 ', '', ''], ['UNIV 101', '1', '', '', ''], ['COM 122', '3', '', '', ''], ...]
        # max_credit_allowed = 16
        # min_credit_allowed = 12
        # classes_done=['UNIV 101']
        while len(classes_done) != len(class_list): # keep going until all classes are used
            while int(min_credit_allowed) > credits_for_semester: # keep going until at least the minimum credits are in the semester
                semester += 1
                for i in range(len(class_list)): # looping over the class list
                    if int(class_list[i][1]) + credits_for_semester < max_credit_allowed: #if this class was to be added would it go over the max credit for semester if yes go to next class
                        if (class_list[i][3] == '' or class_list[i][3] in classes_done) and (class_list[i][2] in classes_for_semester or class_list[i][2] in classes_done or class_list[i][2] == '') and (class_list[i][0] not in classes_done):
                            classes_for_semester.append(class_list[i][0])
                            credits_for_semester += int(class_list[i][1])
                            print('classes for semester', classes_for_semester)
                            print('semester credits', credits_for_semester)
            classes_done.append(classes_for_semester)
            full_plan.append(semester)
            full_plan.append(classes_for_semester)
            print('full plan', full_plan)
            classes_for_semester = []
            credits_for_semester = 0
        print('done')
        print(full_plan)
I hope my explanation makes sense.
Maybe somebody can understand my mistake and help me find a good solution.
Also if you have anything that you see would make this code more simple please let me know.
Much appreciated
First, your while int(min_credit_allowed) > credits_for_semester line is leading to an infinite loop. It needs to be changed to
while len(classes_done) != len(class_list) and int(min_credit_allowed) > credits_for_semester: # Remove the second while loop
Secondly, you're appending a list to a list, so you get a 2-D list for classes_done with
classes_done.append(classes_for_semester)
This should be
classes_done += classes_for_semester
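so that you add the items from classes_for_semester into classes_done, rather than adding a list. That nested list is also why the class_list[i][0] not in classes_done check appeared to be ignored: the class names sat inside an inner list, so they were never found directly in classes_done.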
Your new code should look like this:
def generator(max_credit_allowed, min_credit_allowed, classes_done, class_list):
    classes_for_semester = []
    credits_for_semester = 0
    semester = 0
    full_plan = []
    # class_list = [['MA 241 ', '4', '', '', ''], ['PS 150 ', '3', 'MA 241 ', '', ''], ['UNIV 101', '1', '', '', ''], ['COM 122', '3', '', '', ''], ...]
    # max_credit_allowed = 16
    # min_credit_allowed = 12
    # classes_done=['UNIV 101']
    while len(classes_done) != len(class_list) and int(min_credit_allowed) > credits_for_semester: # keep going until at least the minimum credits are in the semester
        semester += 1
        for i in range(len(class_list)): # looping over the class list
            if int(class_list[i][1]) + credits_for_semester < max_credit_allowed: #if this class was to be added would it go over the max credit for semester if yes go to next class
                if (class_list[i][3] == '' or class_list[i][3] in classes_done) and (class_list[i][2] in classes_for_semester or class_list[i][2] in classes_done or class_list[i][2] == '') and (class_list[i][0] not in classes_done):
                    classes_for_semester.append(class_list[i][0])
                    credits_for_semester += int(class_list[i][1])
                    print('classes for semester', classes_for_semester)
                    print('semester credits', credits_for_semester)
        classes_done += classes_for_semester
        full_plan.append(semester)
        full_plan.append(classes_for_semester)
        print('full plan', full_plan)
        classes_for_semester = []
        credits_for_semester = 0
    print('done')
    print(full_plan)
I would highly recommend using None instead of '' for the non-existent values; that way you can do a simple value is None check instead of comparing against an empty string.
For the lists of class information you're passing in, I would change them to classes, dictionaries, or namedtuples so that you can refer to the values by name rather than by number.
class_list[i].class_name or class_list[i]['class_name'] is a lot easier to debug later than magic indices. You can even change your for loop to iterate over the class details directly instead of using i in range(len(class_list)), like so:
for c in class_list:
    if int(c.credits) .... # Using a class or namedtuple approach as suggested above
And one minor thing that probably isn't a huge issue but could matter if these lists grow long: consider using sets instead of lists for things like classes_done and classes_for_semester. Membership tests on a set are faster, and a set also prevents duplicates from being stored (assuming you don't want to store the same class more than once).
To provide a concrete example of the namedtuple suggestion, you can do the following:
from collections import namedtuple
ClassList = namedtuple('ClassList', ['class_name', 'credits', 'coreq', 'prereq'])
class_list = [
    ClassList(class_name='MA 241', credits=4, coreq=None, prereq=None),
    ClassList(class_name='PS 150', credits=3, coreq='MA 241', prereq=None),
    # ...
]
So your for loop becomes
for c in class_list:
    if c.credits + credits_for_semester < max_credit_allowed:
        if (c.prereq is None or c.prereq in classes_done) and \
           (c.coreq in classes_for_semester or c.coreq in classes_done or c.coreq is None) and \
           (c.class_name not in classes_done):
            classes_for_semester.append(c.class_name)
            credits_for_semester += c.credits
classes_done += classes_for_semester
full_plan.append(semester)
full_plan.append(classes_for_semester)
classes_for_semester = []
credits_for_semester = 0
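Putting the namedtuple and set suggestions together, a complete planner might look roughly like the sketch below. It is not the code from the question: the names Course and plan_semesters are hypothetical, the credit cap is treated as inclusive (<= rather than the strict < used above), the minimum-credit rule is left out, and the loop stops when nothing more can be scheduled so that it cannot spin forever.

```python
from collections import namedtuple

# Hypothetical record type; None means "no co-requisite" / "no prerequisite"
Course = namedtuple("Course", ["name", "credits", "coreq", "prereq"])

def plan_semesters(courses, done, max_credits):
    """Greedily fill semesters; returns a list of lists of course names."""
    done = set(done)  # set gives fast membership tests and no duplicates
    plan = []
    remaining = [c for c in courses if c.name not in done]
    while remaining:
        semester, credits = [], 0
        for c in remaining:
            if credits + c.credits > max_credits:
                continue  # would exceed the cap, try the next course
            prereq_ok = c.prereq is None or c.prereq in done
            coreq_ok = c.coreq is None or c.coreq in done or c.coreq in semester
            if prereq_ok and coreq_ok:
                semester.append(c.name)
                credits += c.credits
        if not semester:
            break  # nothing can be scheduled; avoid an infinite loop
        done.update(semester)
        plan.append(semester)
        remaining = [c for c in remaining if c.name not in done]
    return plan

# Made-up course data for illustration
courses = [
    Course("MA 241", 4, None, None),
    Course("PS 150", 3, "MA 241", None),
    Course("UNIV 101", 1, None, None),
    Course("COM 122", 3, None, None),
    Course("MA 242", 4, None, "MA 241"),
]
print(plan_semesters(courses, ["UNIV 101"], max_credits=10))
```

One limitation of this greedy pass: a co-requisite is only recognized in the same semester if it appears earlier in the list than the course that needs it.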
I'm trying to parse item names and their corresponding values from the snippet below. The dt tags hold the names and the dd tags hold the values. A few dt tags have no corresponding dd, so not every name has a value. For any name without a value, I want to keep the value blank.
These are the elements I would like to scrape data from:
content="""
<div class="movie_middle">
<dl>
<dt>Genres:</dt>
<dt>Resolution:</dt>
<dd>1920*1080</dd>
<dt>Size:</dt>
<dd>1.60G</dd>
<dt>Quality:</dt>
<dd>1080p</dd>
<dt>Frame Rate:</dt>
<dd>23.976 fps</dd>
<dt>Language:</dt>
</dl>
</div>
"""
I've tried like below:
from bs4 import BeautifulSoup

soup = BeautifulSoup(content,"lxml")
title = [item.text for item in soup.select(".movie_middle dt")]
result = [item.text for item in soup.select(".movie_middle dd")]
vault = dict(zip(title,result))
print(vault)
It gives me messy results (wrong pairs):
{'Genres:': '1920*1080', 'Resolution:': '1.60G', 'Size:': '1080p', 'Quality:': '23.976 fps'}
My expected result:
{'Genres:': '', 'Resolution:': '1920*1080', 'Size:': '1.60G', 'Quality:': '1080p','Frame Rate:':'23.976 fps','Language:':''}
Any help on fixing the issue will be highly appreciated.
You can loop through the elements inside dl. If the current element is a dt and the next element is a dd, use the next element's text as the value; otherwise set the value to an empty string.
dl = soup.select('.movie_middle dl')[0]
elems = dl.find_all() # Returns the list of dt and dd
data = {}
for i, el in enumerate(elems):
    if el.name == 'dt':
        key = el.text.replace(':', '')
        # check if the next element is a `dd`
        if i < len(elems) - 1 and elems[i+1].name == 'dd':
            data[key] = elems[i+1].text
        else:
            data[key] = ''
You can use BeautifulSoup to parse the dl structure, and then write a function to create the dictionary:
from bs4 import BeautifulSoup as soup
import re
def parse_result(d):
    while d:
        a, *_d = d
        if _d:
            if re.findall(r'<dt', a) and re.findall(r'<dd', _d[0]):
                yield [a[4:-5], _d[0][4:-5]]
                d = _d[1:]
            else:
                yield [a[4:-5], '']
                d = _d
        else:
            yield [a[4:-5], '']
            d = []
print(dict(parse_result(list(filter(None, str(soup(content, 'html.parser').find('dl')).split('\n')))[1:-1])))
Output:
{'Genres:': '', 'Resolution:': '1920*1080', 'Size:': '1.60G', 'Quality:': '1080p', 'Frame Rate:': '23.976 fps', 'Language:': ''}
For a slightly longer, although cleaner solution, you can create a decorator to strip the HTML tags of the output, thus removing the need for the extra string slicing in the main parse_result function:
def strip_tags(f):
    def wrapper(data):
        # strip the surrounding <dt>/<dd> tags from every key and value
        return {a[4:-5]:b[4:-5] for a, b in f(data)}
    return wrapper
@strip_tags
def parse_result(d):
    while d:
        a, *_d = d
        if _d:
            if re.findall(r'<dt', a) and re.findall(r'<dd', _d[0]):
                yield [a, _d[0]]
                d = _d[1:]
            else:
                yield [a, '']
                d = _d
        else:
            yield [a, '']
            d = []
print(parse_result(list(filter(None, str(soup(content, 'html.parser').find('dl')).split('\n')))[1:-1]))
Output:
{'Genres:': '', 'Resolution:': '1920*1080', 'Size:': '1.60G', 'Quality:': '1080p', 'Frame Rate:': '23.976 fps', 'Language:': ''}
from collections import defaultdict
test = soup.text.split('\n')
d = defaultdict(list)
for i in range(len(test)):
    if (':' in test[i]) and (':' not in test[i+1]):
        d[test[i]] = test[i+1]
    elif ':' in test[i]:
        d[test[i]] = ''
d
defaultdict(list,
{'Frame Rate:': '23.976 fps',
'Genres:': '',
'Language:': '',
'Quality:': '1080p',
'Resolution:': '1920*1080',
'Size:': '1.60G'})
The logic here is that every key contains a colon. Knowing this, you can write an if/else that distinguishes the two cases: a key followed by another key, or a key followed by a value.
Edit:
In case you want to clean your keys, the following removes the : from each one:
d1 = { x.replace(':', ''): d[x] for x in d.keys() }
d1
{'Frame Rate': '23.976 fps',
'Genres': '',
'Language': '',
'Quality': '1080p',
'Resolution': '1920*1080',
'Size': '1.60G'}
The problem is that empty elements are not present. Since there is no hierarchy between the <dt> and the <dd>, I'm afraid you'll have to craft the dictionary yourself.
vault = {}
category = ""
for item in soup.find("dl").findChildren():
    if item.name == "dt":
        if category:
            # the previous <dt> had no <dd>, so store it with an empty value
            vault[category] = ""
        category = item.text
    elif item.name == "dd":
        vault[category] = item.text
        category = ""
if category:
    # a trailing <dt> without a <dd>
    vault[category] = ""
Basically this code iterates over the child elements of the <dl> and fills the vault dictionary with the values.
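If a third-party parser is not available, the same pairing idea can be sketched with the standard library's html.parser module instead of BeautifulSoup. This is a swapped-in technique, not what the answers above use; the names DlParser and parse_dl and the shortened sample markup are made up for illustration. Every dt starts with an empty value, and a dd fills in the most recent dt that has no value yet.

```python
from html.parser import HTMLParser

class DlParser(HTMLParser):
    """Collect <dt>/<dd> pairs; a <dt> with no following <dd> maps to ''."""

    def __init__(self):
        super().__init__()
        self.pairs = {}
        self._tag = None   # 'dt' or 'dd' while inside one of them
        self._key = None   # last <dt> text still waiting for a <dd>

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._tag = tag

    def handle_endtag(self, tag):
        if tag in ("dt", "dd"):
            self._tag = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._tag is None:
            return  # whitespace between tags, or text outside dt/dd
        if self._tag == "dt":
            self._key = text
            self.pairs[text] = ""          # default: no value
        elif self._key is not None:
            self.pairs[self._key] = text   # <dd> belongs to the pending <dt>
            self._key = None

def parse_dl(html):
    parser = DlParser()
    parser.feed(html)
    parser.close()
    return parser.pairs

# Shortened version of the markup from the question
sample = """
<dl>
<dt>Genres:</dt>
<dt>Resolution:</dt>
<dd>1920*1080</dd>
<dt>Language:</dt>
</dl>
"""
print(parse_dl(sample))
```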
I have a list of strings in Python, and if any element of the list contains the word "parthipan" I should print a message. But the script below is not working:
import re
a = ["paul Parthipan","paul","sdds","sdsdd"]
last_name = "Parthipan"
my_regex = r"(?mis){0}".format(re.escape(last_name))
if my_regex in a:
    print("matched")
The first element of the list contains the word "parthipan", so it should print the message.
If you want to do this with a regexp, you can't use the in operator. Use re.search() instead. But it works on a single string, not a whole list, so loop over the elements:
for elt in a:
    if re.search(my_regex, elt):
        print("Matched")
        break # stop looking
Or in more functional style:
if any(re.search(my_regex, elt) for elt in a):
    print("Matched")
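As a self-contained sketch of the re.search approach, the function below compiles the pattern once and passes re.IGNORECASE as a flag rather than embedding (?mis) in the pattern string. The name any_contains is made up for illustration; re.escape is kept so that characters such as . in the search word are matched literally.

```python
import re

def any_contains(names, word):
    """True if any string in names contains word, ignoring case."""
    pattern = re.compile(re.escape(word), re.IGNORECASE)
    return any(pattern.search(name) for name in names)

a = ["paul Parthipan", "paul", "sdds", "sdsdd"]
if any_contains(a, "parthipan"):
    print("matched")
```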
You don't need a regex for this; simply use any().
>>> a = ["paul Parthipan","paul","sdds","sdsdd"]
>>> last_name = "Parthipan".lower()
>>> if any(last_name in name.lower() for name in a):
... print("Matched")
...
Matched
Why not:
a = ["paul Parthipan","paul","sdds","sdsdd"]
last_name = "Parthipan"
if any(last_name in ai for ai in a):
    print("matched")
Also, what is this part for?
...
import re
my_regex = r"(?mis){0}".format(re.escape(last_name))
...
EDIT:
I can't see why you need a regex here. It would help if you gave some real input and the expected output. Here is a small example of how it could be done:
a = ["paul Parthipan","paul","sdds","sdsdd",'Mala_Koala','Czarna,Pala']
last_name = "Parthipan"
names=[]
breakers=[' ','_',',']
for ai in a:
    for b in breakers:
        if b in ai:
            names.append(ai.split(b))
full_names=[ai for ai in names if len(ai)==2]
last_names=[ai[1] for ai in full_names]
if any(last_name in ai for ai in last_names):
    print("matched")
But if the regex part is really needed, I can't see how '(?mis)Parthipan' could be found in 'Parthipan' with the in operator. The simplest option would be the reverse direction, 'Parthipan' in '(?mis)Parthipan', like here:
import re
a = ["paul Parthipan","paul","sdds","sdsdd",'Mala_Koala','Czarna,Pala']
last_name = "Parthipan"
names=[]
breakers=[' ','_',',']
for ai in a:
    for b in breakers:
        if b in ai:
            names.append(ai.split(b))
full_names=[ai for ai in names if len(ai)==2]
last_names=[r"(?mis){0}".format(re.escape(ai[1])) for ai in full_names]
print(last_names)
if any(last_name in ai for ai in last_names):
    print("matched")
EDIT:
With regex you have a few possibilities:
import re
a = ["paul Parthipan","paul","sdds","sdsdd",'jony-Parthipan','koala_Parthipan','Parthipan']
lastName = "Parthipan"
myRegex = r"(?mis){0}".format(re.escape(lastName))
strA=';'.join(a)
se = re.search(myRegex, strA)
ma = re.match(myRegex, strA)
fa = re.findall(myRegex, strA)
fi=[i.group() for i in re.finditer(myRegex, strA, flags=0)]
se = '' if se is None else se.group()
ma = '' if ma is None else ma.group()
print(se, 'match' if any(se) else 'no match')
print(ma, 'match' if any(ma) else 'no match')
print(fa, 'match' if any(fa) else 'no match')
print(fi, 'match' if any(fi) else 'no match')
Output. Only the first one looks right, so re.search is the one that gives the proper result:
Parthipan match
no match
['Parthipan', 'Parthipan', 'Parthipan', 'Parthipan'] match
['Parthipan', 'Parthipan', 'Parthipan', 'Parthipan'] match