creating a key, value1, value2 dictionary from list - python

I am trying to create a key: value1, value2 dictionary from a byte list and having issues with it.
Here is my list
[b'Expected in April 2018',
b'Murder At Koh E Fiza',
b'34',
b'06 April 2018',
b'Subedar Joginder Singh',
b'0',
b'06 April 2018',
b'Blackmail',
b'86',
b'06 April 2018',
b'Missing',
b'0',
b'13 April 2018',
b'October',
b'59',
b'13 April 2018',
b'Mercury',
b'0',
b'20 April 2018',
b'Omerta',
b'50']
I have tried following code:
b = dict(zip(list[1::3],(list[2::3]+list[0::3])))
but I don't get third value as key value pair.
I have also tried
b = dict(zip(list[1::3],list[2::3]+list[0::3]))
same issue I am getting following output with both of these statements
{b'Murder At Koh E Fiza': b'34', b'Subedar Joginder Singh': b'0',
b'Blackmail': b'86', b'Missing': b'0', b'October': b'59', b'Mercury': b'0',
b'Omerta': b'50'}
I am looking for the following output
b'Murder At Koh E Fiza': b'34',b'Expected in April 2018',
b'Subedar Joginder Singh': b'0',b'06 April 2018',
Please let me know

I think you are looking to associate a list or a tuple with each key in your dictionary. So something like this should work:
dict( zip(list[1::3], zip( list[2::3], list[0::3] ) ))
Which results in
{'Mercury': ('0', '13 April 2018'), 'Murder At Koh E Fiza': ('34', 'Expected in April 2018'), 'October': ('59', '13 April 2018'), 'Missing': ('0', '06 April 2018'), 'Blackmail': ('86', '06 April 2018'), 'Omerta': ('50', '20 April 2018'), 'Subedar Joginder Singh': ('0', '06 April 2018')}

You can use zip alongside with a dict comprehension
a = [b'Expected in April 2018',
b'Murder At Koh E Fiza',
b'34',
b'06 April 2018',
b'Subedar Joginder Singh',
b'0',
b'06 April 2018',
b'Blackmail',
b'86',
b'06 April 2018',
b'Missing',
b'0',
b'13 April 2018',
b'October',
b'59',
b'13 April 2018',
b'Mercury',
b'0',
b'20 April 2018',
b'Omerta',
b'50']
final = {v: [m, k] for k, v, m in zip(a, a[1:], a[2:])}
print(final.get(b'Murder At Koh E Fiza')
print(final.get(b'Subedar Joginder Singh'))
output:
[b'34', b'Expected in April 2018']
[b'0', b'06 April 2018']

You can find that out without zipping as well:
my_list = ['Expected in April 2018', 'Murder At Koh E Fiza', '34', '06
April 2018', 'Subedar Joginder Singh', '0', '06 April 2018',
'Blackmail', '86', '06 April 2018', 'Missing', '0', '13 April 2018',
'October', '59', '13 April 2018', 'Mercury', '0', '20 April 2018',
'Omerta', '50']
n = 3
composite_list = [my_list[x:x+n] for x in range(0, len(my_list),n)]
d = {n[1]: [n[0], n[2]] for n in composite_list}

Related

sort a list of string numerically in python

I know there are many questions regarding this type of sorting, I tried many time by referring to those questions and also by going through the re topic in python too
My question is:
class Example(models.Model):
_inherit = 'sorting.example'
def unable_to_sort(self):
data_list = ['Abigail Peterson Jan 25','Paul Williams Feb 1','Anita Oliver Jan 24','Ernest Reed Jan 28']
self.update({'list_of_birthday_week': ','.join(r for r in data_list)})
I need to be sorted according to the month & date like:
data_list = ['Anita Oliver Jan 24','Abigail Peterson Jan 25','Ernest Reed Jan 28','Paul Williams Feb 1']
is there any way to achieve this ?
Use a regex to extract the date than use it as key of sorted function.
import re
pattern = r'(\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\D?(?:\d{1,2}\D?))'
sort_by_date = lambda x: datetime.strptime(re.search(pattern, x).group(0), '%b %d')
out = sorted(data_list, key=sort_by_date)
Output:
>>> out
['Anita Oliver Jan 24',
'Abigail Peterson Jan 25',
'Ernest Reed Jan 28',
'Paul Williams Feb 1']
Input:
>>> data_list
['Abigail Peterson Jan 25',
'Paul Williams Feb 1',
'Ernest Reed Jan 28',
'Anita Oliver Jan 24']
You need to extract the date part from the string, and then turn the date string into a comparable format. For the first task, regexen would be a decent choice here, and for the second part, datetime.strptime would be appropriate:
>>> import re
>>> from datetime import *
>>>
>>> re.search('\w+ \d+$', 'Abigail Peterson Jan 25')
<re.Match object; span=(17, 23), match='Jan 25'>
>>> re.search('\w+ \d+$', 'Abigail Peterson Jan 25')[0]
'Jan 25'
>>>
>>> datetime.strptime('Jan 25', '%b %d')
datetime.datetime(1900, 1, 25, 0, 0)
>>> datetime.strptime(re.search('\w+ \d+$', 'Abigail Peterson Jan 25')[0], '%b %d')
datetime.datetime(1900, 1, 25, 0, 0)
Then turn that into a callback for list.sort:
>>> data_list.sort(key=lambda i: datetime.strptime(re.search('\w+ \d+$', i)[0], '%b %d'))
['Anita Oliver Jan 24', 'Abigail Peterson Jan 25', 'Ernest Reed Jan 28', 'Paul Williams Feb 1']
You can also use split() to accomplish that.
from datetime import datetime
...
def unable_to_sort(self):
data_list = ['Abigail Peterson Jan 25','Paul Williams Feb 1','Anita Oliver Jan 24','Ernest Reed Jan 28']
def get_date(data):
name, str_date = data.split(" ")[:-2], data.split(" ")[-2:]
month, day = str_date
return datetime.strptime(f"{month} {day}", "%b %d")
sorted_data_list = sorted(data_list, key=get_date)
self.update({'list_of_birthday_week': ','.join(r for r in sorted_data_list)})
You can use sorted function with keys datetime.strptime() and date value.
from datetime import datetime
data_list = ['Anita Oliver Jan 24','Abigail Peterson Jan 25','Ernest Reed Jan 28','Paul Williams Feb 1']
k=[x.split() for x in data_list]
days_sorted = sorted(k, key=lambda x: (datetime.strptime(x[2],'%b'),x[3]))
[['Anita', 'Oliver', 'Jan', '24'],
['Abigail', 'Peterson', 'Jan', '25'],
['Ernest', 'Reed', 'Jan', '28'],
['Paul', 'Williams', 'Feb', '1']]

Text mining on string dates from fuzzy example

I'm a bit stuck with proceeding with the custom function that would be able to parse in any given language, with a set of rules that would be able to identify and parse months and years including some special cases (that I will show).
Input data:
dates_eg = [["Issued Feb 2018 · No Expiration Date"], ["Oct 2021 - Present"],
["Jan 2019 - Oct 2021 · 2 yrs 10 mos"], ["1994 - 2000"], ["Sep 2010 – Sep 2015 • 5 yrs"],
["Nov 2019 – Present"], ["Sep 2015 – Mar 2017 • 1 yr 6 mos"], ["Apr 2019 – Present • 2 yrs 8 mos"],
["Issued: Aug 2018 · Expired Aug 2020"], ["Mar 2017 - Dec 2017 · 10 mos"],
["May 2021 - Present · 9 mos"]]
NOTE: No API translation should be used. Instead of it, I've created a list of FLAG words (in certain languages) and STOPWORDS that would be able to identify all the cases.
Present, Issued, Expired, No Expiration Date -> all of these four keywords/stopwords would be identified through the lists (see underneath).
Current functions work workflow:
e.g. "Issued Feb 2018 · No Expiration Date"
Firstly, algorithm named "raw_dates_extractor" is separating the start date ('Issued Feb 2018') and the end date ('No Expiration Date') with a separator ' · '
Secondly, clean_extract_dates algorithm is identifying 'Issued' and gets rid of it
All the aforementioned resulting in:
{'Start_date_raw0': 'Feb 2018', 'Start_date_raw1': 'Oct 2021', 'Start_date_raw2': 'Jan 2019', 'Start_date_raw3': '1994', 'Start_date_raw4': 'Sep 2010', 'Start_date_raw5': 'Nov 2019', 'Start_date_raw6': 'Sep 2015', 'Start_date_raw7': 'Apr 2019', 'Start_date_raw8': 'Aug 2018', 'Start_date_raw9': 'Mar 2017', 'Start_date_raw10': 'May 2021', 'End_date_raw0': 'No Expiration Date', 'End_date_raw1': 'Present', 'End_date_raw2': 'Oct 2021', 'End_date_raw3': '2000', 'End_date_raw4': 'Sep 2015', 'End_date_raw5': 'Present', 'End_date_raw6': 'Mar 2017', 'End_date_raw7': 'Present', 'End_date_raw8': 'Aug 2020', 'End_date_raw9': 'Dec 2017', 'End_date_raw10': 'Present'}
However, the desired output would go further by extracting/identifying years, months, and stopwords/flag words (like 'No Expiration Date', 'Present' would be flagged as 'ONGOING' = 1 otherwise 0.
From the first initial raw example:
"Issued Feb 2018 · No Expiration Date" would be:
Desired result:
{'start_year': 2021, 'start_month': "Apr", 'end_year': None,
'end_month': None, 'is_on_going': True})
Complete Code attempt:
Lists with Expired, Issued, No Expiration Date, Present
import re
import pandas as pd
# "Present" word in all language combination
present_all_lang = ["současnost", "I dag", "Heute", "Present", "actualidad", "aujourd’hui", "वर्तमान"
"Saat ini", "Presente",
"現在", "현재", "Kini", "heden", "Nå", "obecnie", "o momento"
"Prezent", "настоящее время", "nu", "ปัจจุบัน ",
"Kasalukuyan", "Halen", "至今", "現在"]
# "Issued" word in all language combination
issued_all_lang = ["Vydáno", "Udstedt", "Ausgestellt", "Issued", "Expedición", "Émise le", "नव॰", "Diterbitkan"
"Data di rilascio",
"発行日", "발행일", "Dikeluarkan", "Toegekend", "Utstedt", "Wydany", "Emitido em"
"Дата выдачи", "Utfärdat", "ออกเมื่อ",
"Inisyu", "tarihinde verildi", "颁发日期", "發照日期"]
# "No experience date" in all language combination
no_experience_all_lang = ["Bez data vypršení", "Ingen udløbsdato", "Kein Ablaufdatum", "No Expiration Date",
"Sin fecha de vencimiento", "Sans date d’expiration", "समाप्ति की कोई तारीख नहीं",
"Tidak Ada Tanggal Kedaluwarsa", "Nessuna data di scadenza", "有効期限なし"
"만료일 없음",
"Tiada Tarikh Tamat Tempoh", "Geen vervaldatum" "Ingen utløpsdato",
"Brak daty ważności", "Sem data de expiração", "Без истечения срока действия"
"Inget utgångsdatum", "ไม่มีวันหมดอายุ",
"Walang Petsa ng Pagtatapos"
"Sona Erme Tarihi Yok", "长期有效", "永久有效"]
# "Expired" in all lang
expires_all_lang = ["Vyprší", "Udløbet", "Gültig bis", "Expired", "Vencimiento", "Expire le",
"को समाप्त हो गया है", "Kedaluwarsa", "Data di scadenza", "有効期限", " 만료됨", "Tamat Tempoh",
"Verloopt op", "Utløper", "Wygasa", "Expira em", "A expirat la", "Срок действия истек",
"Upphörde att gälla", "หมดอายุเมื่อ", "Mag-e-expire sa", "tarihinde süresi bitiyor", "有效期至", "過期"]
# colon my appear get rid of it!
def raw_dates_extractor(list_of_examples: dates_eg) -> dict:
raw_start_date, raw_end_date, merged_start_end_dates = {}, {}, {}
for row, datum in enumerate(list_of_examples):
date_splitter = re.split("[·•–-]", datum[0])
raw_start_date[f'Start_date_raw{row}'] = date_splitter[0].strip(' ')
raw_end_date[f'End_date_raw{row}'] = date_splitter[1].strip(' ')
merged_start_end_dates = raw_start_date | raw_end_date
return merged_start_end_dates
output:
raw_dates_extractor(dates_eg)
{'Start_date_raw0': 'Issued Feb 2018', 'Start_date_raw1': 'Oct 2021', 'Start_date_raw2': 'Jan 2019', 'Start_date_raw3': '1994', 'Start_date_raw4': 'Sep 2010', 'Start_date_raw5': 'Nov 2019', 'Start_date_raw6': 'Sep 2015', 'Start_date_raw7': 'Apr 2019', 'Start_date_raw8': 'Issued: Aug 2018', 'Start_date_raw9': 'Mar 2017', 'Start_date_raw10': 'May 2021', 'End_date_raw0': 'No Expiration Date', 'End_date_raw1': 'Present', 'End_date_raw2': 'Oct 2021', 'End_date_raw3': '2000', 'End_date_raw4': 'Sep 2015', 'End_date_raw5': 'Present', 'End_date_raw6': 'Mar 2017', 'End_date_raw7': 'Present', 'End_date_raw8': 'Expired Aug 2020', 'End_date_raw9': 'Dec 2017', 'End_date_raw10': 'Present'}
def clean_extract_dates(input_dictionary: dict, aggregate_stop_words: list):
container_replica = {}
for date_element in input_dictionary:
query_req = input_dictionary[f'{date_element}']
if ":" in query_req:
query_req = query_req.replace(":", "")
if "." in query_req:
query_req = query_req.replace(".", "")
stopwords = aggregate_stop_words
query_words = query_req.split()
result_words = [word for word in query_words if word not in stopwords]
end_result = ' '.join(result_words)
container_replica[f'{date_element}'] = end_result
return container_replica
output:
clean_extract_dates(raw_dates_extractor(dates_eg), aggregate_lists)
{'Start_date_raw0': 'Feb 2018', 'Start_date_raw1': 'Oct 2021', 'Start_date_raw2': 'Jan 2019', 'Start_date_raw3': '1994', 'Start_date_raw4': 'Sep 2010', 'Start_date_raw5': 'Nov 2019', 'Start_date_raw6': 'Sep 2015', 'Start_date_raw7': 'Apr 2019', 'Start_date_raw8': 'Aug 2018', 'Start_date_raw9': 'Mar 2017', 'Start_date_raw10': 'May 2021', 'End_date_raw0': 'No Expiration Date', 'End_date_raw1': 'Present', 'End_date_raw2': 'Oct 2021', 'End_date_raw3': '2000', 'End_date_raw4': 'Sep 2015', 'End_date_raw5': 'Present', 'End_date_raw6': 'Mar 2017', 'End_date_raw7': 'Present', 'End_date_raw8': 'Aug 2020', 'End_date_raw9': 'Dec 2017', 'End_date_raw10': 'Present'}
aggregate_lists = issued_all_lang + no_experience_all_lang + expires_all_lang
saved_dictionary = clean_extract_dates(raw_dates_extractor(dates_eg), aggregate_lists)

Why is Matplotlib plotting values with an incorrect Y? [duplicate]

This question already has answers here:
Convert all strings in a list to int
(10 answers)
Closed 1 year ago.
I am trying to plot some points with a date on the X axis and a numeric score on the Y axis. However, when I plot the dates list and the score list (see below)
['03 January 2021', '05 January 2021', '06 January 2021', '08 January 2021', '10 January 2021', '10 January 2021', '22 January 2021', '22 January 2021', '18 February 2021', '24 February 2021', '03 March 2021', '31 March 2021', '31 March 2021', '21 April 2021', '21 April 2021']
['94', '97', '93', '94', '96', '97', '95', '93', '95', '97', '91', '97', '97', '95', '98']
using this code:
plt.plot(dates,scores, marker='o')
plt.xlabel('Date')
plt.ylabel('Score')
plt.xticks(rotation=45)
plt.show()
I get a plot that has a messed up Y axis - it is not in numerical order:
https://i.stack.imgur.com/2owQd.png
Does anyone know what's causing this or why the Y axis might be out of order?
Thank you Fariborz and mkrieger1!
That was the issue, and I didn't even think of it.

Concatenate ListA elements with partially matching ListB elements

Say I have two python lists as:
ListA = ['Jan 2018', 'Feb 2018', 'Mar 2018']
ListB = ['Sales Jan 2018','Units sold Jan 2018','Sales Feb 2018','Units sold Feb 2018','Sales Mar 2018','Units sold Mar 2018']
I need to get an output as:
List_op = ['Jan 2018 Sales Jan 2018 Units sold Jan 2018','Feb 2018 Sales Feb 2018 Units sold Feb 2018','Mar 2018 Sales Mar 2018 Units sold Mar 2018']
My approach so far:
res=set()
for i in ListB:
for j in ListA:
if j in i:
res.add(f'{i} {j}')
print (res)
this gives me result as:
{'Units sold Jan 2018 Jan 2018', 'Sales Feb 2018 Feb 2018', 'Units sold Mar 2018 Mar 2018', 'Units sold Feb 2018 Feb 2018', 'Sales Jan 2018 Jan 2018', 'Sales Mar 2018 Mar 2018'}
which is definitely not the solution I'm looking for.
What I think is regular expression could be a handful here but I'm not sure how to approach. Any help in this regard is highly appreciated.
Thanks in advance.
Edit:
Values in ListA and ListB are not necessarily to be in order. Therefore for a particular month/year value in ListA, the same month/year value from ListB has to be matched and picked for both 'Sales' and 'Units sold' component and needs to be concatenated.
My main goal here is to get the list which I can use later to generate a statement that I'll be using to write Hive query.
Added more explanation as suggested by #andrew_reece
Assuming no additional edge cases that need taking care of, your original code is not bad, just needs a slight update:
List_op = []
for a in ListA:
combined = a
for b in ListB:
if a in b:
combined += " " + b
List_op.append(combined)
List_op
['Jan 2018 Sales Jan 2018 Units sold Jan 2018',
'Feb 2018 Sales Feb 2018 Units sold Feb 2018',
'Mar 2018 Sales Mar 2018 Units sold Mar 2018']
Supposing ListA and ListB are sorted:
ListA = ['Jan 2018', 'Feb 2018', 'Mar 2018']
ListB = ['Sales Jan 2018','Units sold Jan 2018','Sales Feb 2018','Units sold Feb 2018','Sales Mar 2018','Units sold Mar 2018']
print([v1 + " " + v2 for v1, v2 in zip(ListA, [v1 + " " + v2 for v1, v2 in zip(ListB[::2], ListB[1::2])])])
This will print:
['Jan 2018 Sales Jan 2018 Units sold Jan 2018', 'Feb 2018 Sales Feb 2018 Units sold Feb 2018', 'Mar 2018 Sales Mar 2018 Units sold Mar 2018']
In my example I firstly concatenate ListB variables together and then join ListA with this new list.
String concatenation can become expensive. In Python 3.6+, you can use more efficient f-strings within a list comprehension:
res = [f'{i} {j} {k}' for i, j, k in zip(ListA, ListB[::2], ListB[1::2])]
print(res)
['Jan 2018 Sales Jan 2018 Units sold Jan 2018',
'Feb 2018 Sales Feb 2018 Units sold Feb 2018',
'Mar 2018 Sales Mar 2018 Units sold Mar 2018']
Using itertools.islice, you can avoid the expense of creating new lists:
from itertools import islice
zipper = zip(ListA, islice(ListB, 0, None, 2), islice(ListB, 1, None, 2))
res = [f'{i} {j} {k}' for i, j, k in zipper]

How can I use regular expressions

pattern = r"(Mon|Tues|Wednes|Thurs|Fri)day, (February|March) [0-9]{2}, [0-9]{4}\s*Day [0-9]{1}"
line = """
Wednesday, February 28, 2018
Day 4 3:00 Dismissal
All Day
Thursday, March 01, 2018
Day 5 1:30PM Dismissal
All Day
Friday, March 02, 2018
Day 6 3:00 Dismissal
All Day
Monday, March 05, 2018
Day 1 1:30 Dismissal
All Day
Tuesday, March 06, 2018
Day 2 3:00 Dismissal
All Day
Tuesday, March 06, 2018"""
result = re.findall(pattern, line)
print(result)
Won't work.
If you want to catch keys only, group it right:
pattern = r"((?:Mon|Tues|Wednes|Thurs|Fri)day), (February|March) ([0-9]{2}), ([0-9]{4})\s*Day ([0-9]{1})"
Will get:
[('Wednesday', 'February', '28', '2018', '4'), ('Thursday', 'March', '01', '2018', '5'), ('Friday', 'March', '02', '2018', '6'), ('Monday', 'March', '05', '2018', '1'), ('Tuesday', 'March', '06', '2018', '2')]
If you want to catch whole match string, don't group it, (like #ekhumoro said use ?: before a group):
pattern = r"(?:Mon|Tues|Wednes|Thurs|Fri)day, (?:February|March) [0-9]{2}, [0-9]{4}\s*Day [0-9]{1}"
Will get a list of str:
['Wednesday, February 28, 2018 \nDay 4', 'Thursday, March 01, 2018 \nDay 5', 'Friday, March 02, 2018 \nDay 6', 'Monday, March 05, 2018 \nDay 1', 'Tuesday, March 06, 2018 \nDay 2']

Categories