I have a CSV file passed into a function as a string:
csv_input = """
quiz_date,location,size
2022-01-01,london_uk,134
2022-01-02,edingburgh_uk,65
2022-01-01,madrid_es,124
2022-01-02,london_uk,125
2022-01-01,edinburgh_uk,89
2022-01-02,madric_es,143
2022-01-02,london_uk,352
2022-01-01,edinburgh_uk,125
2022-01-01,madrid_es,431
2022-01-02,london_uk,151"""
I want to print the sum of how many people were surveyed in each city by date, so something like:
Date. City. Pop-Surveyed
2022-01-01. London. 134
2022-01-01. Edinburgh. 214
2022-01-01. Madrid. 555
2022-01-02. London. 628
2022-01-02. Edinburgh. 65
2022-01-02. Madrid. 143
As I can't import pandas on my machine (I can't install it without internet access), I thought I could use a defaultdict to store the value of each city by date:
from collections import defaultdict
survery_data = csv_input.split()[1:]
survery_data = [survey.split(',') for survey in survery_data]
survey_sum = defaultdict(dict)
for survey in survery_data:
    date = survey[0]
    city = survey[1].split("_")[0]
    quantity = survey[-1]
    survey_sum[date][city] += quantity
print(survey_sum)
But doing this returns a KeyError:
KeyError: 'london'
I was hoping to get a defaultdict like:
{'2022-01-01': {'london': 134, 'edinburgh': 214, 'madrid': 555},
 '2022-01-02': {'london': 628, 'edinburgh': 65, 'madrid': 143}}
Is there a way to create a defaultdict with a structure that I could then iterate over to print out each column as above?
Try:
csv_input = """\
quiz_date,location,size
2022-01-01,london_uk,134
2022-01-02,edingburgh_uk,65
2022-01-01,madrid_es,124
2022-01-02,london_uk,125
2022-01-01,edinburgh_uk,89
2022-01-02,madric_es,143
2022-01-02,london_uk,352
2022-01-01,edinburgh_uk,125
2022-01-01,madrid_es,431
2022-01-02,london_uk,151"""
header, *rows = (
    tuple(map(str.strip, line.split(",")))
    for line in map(str.strip, csv_input.splitlines())
)
tmp = {}
for date, city, size in rows:
    key = (date, city.split("_")[0])
    tmp[key] = tmp.get(key, 0) + int(size)
out = {}
for (date, city), size in tmp.items():
    out.setdefault(date, []).append({city: size})
print(out)
Prints:
{
"2022-01-01": [{"london": 134}, {"madrid": 555}, {"edinburgh": 214}],
"2022-01-02": [{"edingburgh": 65}, {"london": 628}, {"madric": 143}],
}
Changing
survey_sum = defaultdict(dict)
to
survey_sum = defaultdict(lambda: defaultdict(int))
allows the return of
defaultdict(<function survey_sum.<locals>.<lambda> at 0x100edd8b0>, {'2022-01-01': defaultdict(<class 'int'>, {'london': 134, 'madrid': 555, 'edinburgh': 214}), '2022-01-02': defaultdict(<class 'int'>, {'edingburgh': 65, 'london': 628, 'madrid': 143})})
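This nested structure can then be iterated over to build the list. Note that quantity must also be converted with int(survey[-1]); otherwise adding a string to the default 0 raises a TypeError.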
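As a rough sketch of how the nested defaultdict approach might look end to end: the helper name sum_by_date_and_city is made up for illustration, and the misspelled city names in the input are left exactly as they appear in the question, so they show up as separate cities.

```python
from collections import defaultdict

csv_input = """\
quiz_date,location,size
2022-01-01,london_uk,134
2022-01-02,edingburgh_uk,65
2022-01-01,madrid_es,124
2022-01-02,london_uk,125
2022-01-01,edinburgh_uk,89
2022-01-02,madric_es,143
2022-01-02,london_uk,352
2022-01-01,edinburgh_uk,125
2022-01-01,madrid_es,431
2022-01-02,london_uk,151"""


def sum_by_date_and_city(csv_text):
    """Return {date: {city: total}} for the CSV text (hypothetical helper)."""
    # Missing dates get a fresh inner defaultdict; missing cities start at 0.
    survey_sum = defaultdict(lambda: defaultdict(int))
    # Skip the header row, then split each remaining line into its three fields.
    for line in csv_text.strip().splitlines()[1:]:
        date, location, size = line.split(",")
        city = location.split("_")[0]
        survey_sum[date][city] += int(size)
    return survey_sum


totals = sum_by_date_and_city(csv_input)
for date in sorted(totals):
    for city, total in totals[date].items():
        print(date, city.capitalize(), total)
```

The int(size) conversion is the part the original loop was missing; without it the += would try to add a string to an integer.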
I have the following function which produces results;
myNames = ['ULTA', 'CSCO', ...]
def get_from_min_match(var):
    temp = []
    count_elem = generate_elem_count()
    for item in count_elem:
        if var <= count_elem[item]:
            temp.append(item)
    return set(temp) if len(set(temp)) > 0 else "None"

def generate_elem_count():
    result_data = []
    for val in mapper.values():
        if type(val) == list:
            result_data += val
        elif type(val) == dict:
            for key in val:
                result_data.append(key)
    count_elem = {elem: result_data.count(elem) for elem in result_data}
    return count_elem
I call this function like this;
myNames_dict_1 = ['AME', 'IEX', 'PAYC']
myNames_dict_2 = ['ULTA', 'CSCO', 'PAYC']
mapper = {1: myNames_dict_1, 2: myNames_dict_2}
print(" These meet three values ", get_from_min_match(3))
print(" These meet four values ", get_from_min_match(4))
The output I get from these functions is as follows:
These meet three values {'ULTA', 'CSCO', 'SHW', 'MANH', 'TTWO', 'SAM', 'RHI', 'PAYC', 'AME', 'CCOI', 'RMD', 'AMD', 'UNH', 'AZO', 'APH', 'EW', 'FFIV', 'IEX', 'IDXX', 'ANET', 'SWKS', 'HRL', 'ILMN', 'PGR', 'ATVI', 'CNS', 'EA', 'ORLY', 'TSCO'}
These meet four values {'EW', 'PAYC', 'TTWO', 'AME', 'IEX', 'IDXX', 'ANET', 'RMD', 'SWKS', 'HRL', 'UNH', 'CCOI', 'ORLY', 'APH', 'PGR', 'TSCO'}
Now I want to insert the output of the get_from_min_match function into an SQLite database. The insert statement looks like this:
dbase.execute("INSERT OR REPLACE INTO min_match (DATE, SYMBOL, NAME, NUMBEROFMETRICSMET) \
VALUES (?,?,?,?)", (datetime.today(), symbol, name, NUMBEROFMETRICSMET?))
dbase.commit()
So I basically need a new function that calculates the NUMBEROFMETRICSMET parameter, rather than calling each of these functions many times, and I want its output inserted into the database. Here 3 and 4 would be the number of times the companies matched.
date ULTA name 3
date EW name 4
...
should be the result.
How can I achieve this? Thanks!
I fixed this by reusing the function I had already written, which returns the match count for every symbol:
count_elem = generate_elem_count()
print("Count Elem: " + str(count_elem))
This prints {'AMPY': 1} and so on.
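A minimal sketch of how those counts could be written to the table, one row per symbol. Several things here are assumptions rather than facts from the question: the table schema (column types, and SYMBOL as primary key so that INSERT OR REPLACE updates existing rows), the names lookup, the helper store_match_counts, and the version of generate_elem_count that takes mapper as an argument.

```python
import sqlite3
from datetime import datetime

# Example inputs in the shape used in the question.
mapper = {1: ['AME', 'IEX', 'PAYC'], 2: ['ULTA', 'CSCO', 'PAYC']}
# Hypothetical symbol -> company name lookup; the question does not show one.
names = {'PAYC': 'example name'}


def generate_elem_count(mapper):
    """Count how often each symbol occurs across the mapper's values."""
    counts = {}
    for val in mapper.values():
        for symbol in val:  # iterating a dict yields its keys
            counts[symbol] = counts.get(symbol, 0) + 1
    return counts


def store_match_counts(dbase, counts, names):
    """Insert one row per symbol with its match count (hypothetical helper)."""
    today = datetime.today().isoformat()
    rows = [(today, symbol, names.get(symbol, ''), count)
            for symbol, count in counts.items()]
    dbase.executemany(
        "INSERT OR REPLACE INTO min_match "
        "(DATE, SYMBOL, NAME, NUMBEROFMETRICSMET) VALUES (?,?,?,?)",
        rows,
    )
    dbase.commit()


dbase = sqlite3.connect(":memory:")
# Assumed schema; the real table may differ.
dbase.execute(
    "CREATE TABLE min_match "
    "(DATE TEXT, SYMBOL TEXT PRIMARY KEY, NAME TEXT, NUMBEROFMETRICSMET INTEGER)"
)
store_match_counts(dbase, generate_elem_count(mapper), names)
stored = dict(dbase.execute("SELECT SYMBOL, NUMBEROFMETRICSMET FROM min_match"))
print(stored)
```

Using executemany with a list of tuples avoids calling get_from_min_match once per threshold; the count for each symbol is computed once and stored directly.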
I have this dictionary of lists of dictionaries (I cannot change the structure for the work):
dict_countries = {'gb': [{'datetime': '1955-10-10 17:00:00', 'city': 'chester'},
                         {'datetime': '1974-10-10 23:00:00', 'city': 'chester'}],
                  'us': [{'datetime': '1955-10-10 17:00:00', 'city': 'hudson'}]
                  }
And the function:
def Seen_in_the_city(dict_countries:dict,)-> dict:
    city_dict = {}
    for each_country in dict_countries.values():
        for each_sight in each_country:
            citi = each_sight["city"]
            if citi in city_dict.keys():
                city_dict[each_sight["city"]] =+1
            else:
                city_dict[citi] =+1
    return city_dict
I get:
{'chester': 1,'hudson': 1}
instead of
{'chester': 2,'hudson': 1}
You can try using Counter (a subclass of dict) from the collections module in the Python Standard Library:
from collections import Counter
c = Counter()
for key in dict_countries:
    for d in dict_countries[key]:
        c.update(v for k, v in d.items() if k == 'city')
print(c)
Output
Counter({'chester': 2, 'hudson': 1})
Try:
output = dict()
for country, cities in dict_countries.items():
    for city in cities:
        if city["city"] not in output:
            output[city["city"]] = 0
        output[city["city"]] += 1
There is no need to write +1 to denote a positive number. In the if citi branch, += 1 adds 1 to the existing value (1 + 1), whereas =+1 is parsed as = +1 and simply assigns 1 again:
if citi in city_dict.keys():
    city_dict[each_sight["city"]] += 1
else:
    city_dict[citi] = 1
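The difference between the two operators can be seen in isolation with a small illustrative snippet (the dictionary below is made up for the demonstration):

```python
counts = {'chester': 1}

# "=+1" is parsed as "= +1": it assigns positive one and discards the old value.
counts['chester'] =+1
after_assignment = counts['chester']

# "+= 1" reads the current value, adds one, and stores the result.
counts['chester'] += 1
after_increment = counts['chester']

print(after_assignment, after_increment)
```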
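You can use groupby from itertools. It only groups consecutive equal items, so flatten the sightings and sort the city names first:
from itertools import groupby
cities = sorted(sight["city"] for sights in dict_countries.values() for sight in sights)
print({city: len(list(group)) for city, group in groupby(cities)})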
If you don't want additional imports (not that you shouldn't use Counter) here's another way:
dict_countries = {'gb': [{'datetime': '1955-10-10 17:00:00', 'city': 'chester'},
                         {'datetime': '1974-10-10 23:00:00', 'city': 'chester'}],
                  'us': [{'datetime': '1955-10-10 17:00:00', 'city': 'hudson'}]
                  }
def Seen_in_the_city(dict_countries:dict,)-> dict:
    city_dict = {}
    for each_country in dict_countries.values():
        for each_sight in each_country:
            citi = each_sight["city"]
            city_dict[citi] = city_dict.get(citi, 0) + 1
    return city_dict
print(Seen_in_the_city(dict_countries))
I am trying to use a RETURNING INTO clause with cx_Oracle (version 7.3) to grab the IDs generated by a sequence in one of my tables. However, I am not getting the sequence value I expect: I want the value of an_ash_s.nextval.
The call to my function looks like this:
self.insert_into_columns_with_return('AN_SHIPMENT', payload.shipment_columns, payload.shipment_rows, id_sequence='an_ash_s.nextval')
where payload.shipment_columns looks like
['ASH_ID', 'ASH_AJ_ID', 'ASH_CD_PROCESS_STATUS', 'ASH_PROCESS_ID', 'ASH_SHIPMENT_KEY', 'ASH_ORG_OPERATIONAL_ID', 'ASH_SEED_EQUIP', 'ASH_SEED_EQUIP_CODE', 'ASH_SHIP_DATE', 'ASH_SHIP_DATE_DSP', 'ASH_SHIP_DIRECTION', 'ASH_SHIP_DIRECTION_DSP', 'ASH_FREIGHT_TERMS', 'ASH_FREIGHT_TERMS_DSP', 'ASH_WEIGHT', 'ASH_WEIGHT_MEASURE', 'ASH_HAZ_FLAG', 'ASH_CREATE_DATE', 'ASH_PACKAGE_COUNT', 'ASH_SHIPPING_CLASS_ID', 'ASH_SHIPPING_CLASS_TYPE', 'ASH_ORG_CONSIGNOR_ID', 'ASH_ORG_CONSIGNOR_NAME', 'ASH_LOC_ORIG_ID', 'ASH_LOC_ORIG_COUNTRY_ID', 'ASH_LOC_ORIG_STATE_CODE', 'ASH_LOC_ORIG_CITY', 'ASH_LOC_ORIG_POSTAL_CODE', 'ASH_ORG_CONSIGNEE_ID', 'ASH_ORG_CONSIGNEE_NAME', 'ASH_LOC_DEST_ID', 'ASH_LOC_DEST_COUNTRY_ID', 'ASH_LOC_DEST_STATE_CODE', 'ASH_LOC_DEST_CITY', 'ASH_LOC_DEST_POSTAL_CODE', 'ASH_ERROR_MESSAGE', 'ASH_USE_CURRENT_DATE']
and payload.shipment_rows looks like:
[[310, '5', None, 'Test', '*ISP_CLT', 'LTL', 'LTL', datetime.date(2019, 4, 15), '03/15/2019', 'I', 'Inbound', 'P', 'Pre-Paid', 3000, 'LB', 'N', datetime.date(2019, 3, 24), None, '70', None, None, None, '241144', 'US', 'GA', 'ANYTOWN', '25451', None, None, '12345', 'US', 'VA', 'BANKS', '45678', None, 'N']]
Based on the feedback I have received I have modified my function to look like this:
def insert_into_columns_with_return(self, table_name, columns, rows, id_sequence=None):
    arrstr = rows
    col_str = ''
    for col_id in range(1, len(columns) + 1):
        col_str += columns[col_id - 1]
        if col_id < len(columns):
            col_str += ', '
    with self.conn.cursor() as cur:
        intCol = cur.var(int)
        childIdVar = cur.var(int, arraysize=len(arrstr))
        cur.setinputsizes(None, childIdVar)
        if(id_sequence == None):
            sql = "INSERT INTO {table_name} ({column_names}) VALUES (:arrstr) RETURNING ASH_ID INTO :intCol"
            sql = sql.format(table_name=table_name, column_names =col_str)
        elif (id_sequence != None):
            sql = "INSERT INTO {table_name} ({column_names}) VALUES ( {id_sequence}, :arrstr ) RETURNING ASH_ID INTO :intCol"
            sql = sql.format(table_name=table_name, column_names=col_str, id_sequence=id_sequence)
        cur.executemany(sql, [tuple(x) for x in arrstr])
        for ix, stri in enumerate(arrstr):
            print("IDs Str", stri, "is", childIdVar.getvalue(ix))
    self.conn.commit()
However, I am now getting the error cx_Oracle.DatabaseError: ORA-01036: illegal variable name/number.
I thought the goal was to feed executemany a list of tuples, but it does not seem to accept it.
Since your code is not complete and runnable, I tried the following snippet, adapted from the documentation:
cursor = con.cursor()
intCol = cursor.var(int)
arrstr = [("First"),
          ("Second"),
          ("Third"),
          ("Fourth"),
          ("Fifth"),
          ("Sixth"),
          ("Seventh")]
print("Adding rows", arrstr)
print(intCol.getvalue())
childIdVar = cursor.var(int, arraysize=len(arrstr))
cursor.setinputsizes(None, childIdVar)
cursor.executemany("insert into treturn values (tret.nextval, :arrstr) returning c1 into :intCol",
                   [(i,) for i in arrstr])
for ix, stri in enumerate(arrstr):
    print("IDs Str", stri, "is",
          childIdVar.getvalue(ix))
and I get this output
Adding rows ['First', 'Second', 'Third', 'Fourth', 'Fifth', 'Sixth', 'Seventh']
None
IDs Str First is [24]
IDs Str Second is [25]
IDs Str Third is [26]
IDs Str Fourth is [27]
IDs Str Fifth is [28]
IDs Str Sixth is [29]
IDs Str Seventh is [30]
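One likely reason the question's version fails while this one works is that each row there carries many values but the statement has only a single :arrstr input bind. A possible way around that is to generate one numbered bind per value. The sketch below only builds the SQL text and needs no database; build_insert_returning is a hypothetical helper, the three-column list is a shortened stand-in for the real one, and the commented usage at the end is an untested assumption about how it would be wired into cx_Oracle.

```python
def build_insert_returning(table_name, columns, id_column, id_sequence):
    """Build an INSERT ... RETURNING statement with one bind per value.

    Hypothetical helper: id_column is filled from id_sequence, every other
    column gets a numbered bind (:1, :2, ...), and the final bind receives
    the returned id.
    """
    value_columns = [col for col in columns if col != id_column]
    binds = [":%d" % n for n in range(1, len(value_columns) + 1)]
    out_bind = ":%d" % (len(value_columns) + 1)
    sql = (
        "INSERT INTO {table} ({cols}) VALUES ({seq}, {binds}) "
        "RETURNING {id_col} INTO {out}"
    ).format(
        table=table_name,
        cols=", ".join([id_column] + value_columns),
        seq=id_sequence,
        binds=", ".join(binds),
        id_col=id_column,
        out=out_bind,
    )
    return sql, len(value_columns)


sql, n_inputs = build_insert_returning(
    "AN_SHIPMENT", ["ASH_ID", "ASH_AJ_ID", "ASH_WEIGHT"], "ASH_ID", "an_ash_s.nextval"
)
print(sql)
# With a live connection this might be used roughly as follows (untested):
#   id_var = cur.var(int, arraysize=len(rows))
#   cur.setinputsizes(*([None] * n_inputs), id_var)
#   cur.executemany(sql, [tuple(row) for row in rows])
```

Each row passed to executemany would then need exactly n_inputs values, in the same order as the non-id columns.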
class MyOwnClass:
    # list who contains the queries
    queries = []
    # a template dict
    template_query = {}
    template_query['name'] = 'mat'
    template_query['age'] = '12'

obj = MyOwnClass()
query = obj.template_query
query['name'] = 'sam'
query['age'] = '23'
obj.queries.append(query)
query2 = obj.template_query
query2['name'] = 'dj'
query2['age'] = '19'
obj.queries.append(query2)
print obj.queries
It gives me
[{'age': '19', 'name': 'dj'}, {'age': '19', 'name': 'dj'}]
while I expect to have
[{'age': '23' , 'name': 'sam'}, {'age': '19', 'name': 'dj'}]
I wanted to use a template for this list because I'm going to use it very often, and some of the default values never need to be changed.
Why does template_query itself change when I do this? I'm new to Python and I'm getting pretty confused.
This is because you are pointing to the same dictionary each time and overwriting its keys:
# query = obj.template_query - dont need this
query = {}
query['name'] = 'sam'
query['age'] = '23'
obj.queries.append(query)
query2 = {} #obj.template_query-dont need this
query2['name'] = 'dj'
query2['age'] = '19'
obj.queries.append(query2)
This should demonstrate your problem:
>>> q = {'a':1}
>>> lst = []
>>> lst.append(q)
>>> q['a']=2
>>> lst
[{'a': 2}]
>>> lst.append(q)
>>> lst
[{'a': 2}, {'a': 2}]
You could implement your class differently:
class MyOwnClass:
    # a template dict
    @property
    def template_query(self):
        return {'name': 'default', 'age': -1}
This makes obj.template_query return a new dict each time.
This is because query and query2 both refer to the same object, obj.template_query in this case.
Better to make a template factory:
def template_query(**kwargs):
    template = {'name': 'some default value',
                'age': 'another default value',
                'car': 'generic car name'}
    template.update(**kwargs)
    return template
That creates a new dictionary every time it's called. So you can do:
>>> my_query = template_query(name="sam")
>>> my_query
{'name': 'sam', 'age': 'another default value', 'car': 'generic car name'}
Both query and query2 refer to the same dict; nothing is copied. Instead, you might want to make template_query() a method that constructs a new dict each time it is called, e.g. query = obj.template_query():
class MyOwnClass:
    # a template dict
    def template_query(self):
        d = {}
        d['name'] = 'mat'
        d['age'] = '12'
        d['car'] = 'ferrari'
        return d
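If you would rather keep the template as a class-level dict, another option is to hand out a copy of it on every request. This is only a sketch: the method name new_query is made up for illustration, and making queries a per-instance list is a design change that the question did not ask for.

```python
import copy


class MyOwnClass:
    # Shared defaults; callers only ever receive copies of this dict.
    template_query = {'name': 'mat', 'age': '12'}

    def __init__(self):
        # Per-instance list, so separate objects do not share their queries.
        self.queries = []

    def new_query(self, **overrides):
        """Return a fresh copy of the template with overrides applied."""
        query = copy.deepcopy(self.template_query)
        query.update(overrides)
        return query


obj = MyOwnClass()
obj.queries.append(obj.new_query(name='sam', age='23'))
obj.queries.append(obj.new_query(name='dj', age='19'))
print(obj.queries)
```

deepcopy is used so that nested values in the template, if any are added later, are not shared between queries either; for a flat dict like this one, dict(self.template_query) would behave the same.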