Dictionary from web csv file in Python - python

I am trying to return a dictionary that contains 3 different values
(the total number of sales per Branch and per Customer type).
I need to return a dictionary like
{"A": {"Member": 230, "Normal": 351},
"B": {"Member": 123, "Normal": 117},
"C": {"Member": 335, "Normal": 18}}
What am I doing wrong? Can someone please explain?
import csv
dict_from_csv = {}
file = open('supermarket_sales.csv')
csv_reader = csv.reader(file)
next(csv_reader)
for row in csv_reader:
    branch= row[1]
    Customer = row[3]
    total = float(row[9])
    curent_total= dict_from_csv.get(branch)
    if dict_from_csv.get(branch) is None:
        dict_from_csv[branch]={}
    else:
        if dict_from_csv[branch].get(Customer) is None:
            dict_from_csv[branch]={Customer: 0}
        else :
            curent_total= dict_from_csv.get(branch)
print(dict_from_csv)
I can return only:
{'A': {'Member': 0},
'C': {'Normal': 0},
'B': {'Normal': 0}}
csv file: https://app.box.com/s/f4hcfkferizntbev3ou8hso6nyfccf74

First mistake:
dict_from_csv[branch]={Customer: 0}
This creates a new dictionary for the new customer, but it also discards the existing dictionary that held the previous customer.
It should be
dict_from_csv[branch][Customer] = 0
which adds the new customer to the existing dictionary.
Second mistake: nothing ever increments the value, so you need += 1 to count the rows:
dict_from_csv[branch][customer] += 1
Full working code:
import csv
dict_from_csv = {}
with open('supermarket_sales.csv') as f:
    csv_reader = csv.reader(f)
    next(csv_reader)  # skip the header row
    for row in csv_reader:
        branch = row[1]
        customer = row[3]
        if branch not in dict_from_csv:
            dict_from_csv[branch] = {}
        if customer not in dict_from_csv[branch]:
            dict_from_csv[branch][customer] = 0
        dict_from_csv[branch][customer] += 1
print(dict_from_csv)
Result:
{
'A': {'Member': 167, 'Normal': 173},
'C': {'Normal': 159, 'Member': 169},
'B': {'Member': 165, 'Normal': 167}
}
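If the goal is the summed sales amount rather than the number of rows, the same nested-dictionary pattern can be written with collections.defaultdict. The following is only a minimal sketch: the inline sample rows, the `Total` column name and the helper name `summarize` are assumptions made for illustration, not the contents of the real file.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample standing in for supermarket_sales.csv.
SAMPLE = """Branch,Customer type,Total
A,Member,10.5
A,Normal,20.0
B,Member,5.0
A,Member,4.5
"""

def summarize(lines):
    """Return (row counts, summed totals) per branch and customer type."""
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(lambda: defaultdict(float))
    for row in csv.DictReader(lines):
        branch, customer = row["Branch"], row["Customer type"]
        counts[branch][customer] += 1
        totals[branch][customer] += float(row["Total"])
    # Convert the nested defaultdicts to plain dicts for printing.
    return ({b: dict(c) for b, c in counts.items()},
            {b: dict(t) for b, t in totals.items()})

counts, totals = summarize(io.StringIO(SAMPLE))
print(counts)
print(totals)
```

The defaultdict removes both "is the key there yet?" checks; whether to count rows or sum a column is then a one-line difference.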
BTW: if you use a pandas.DataFrame, then with
import pandas as pd
df = pd.read_csv('/home/furas/supermarket_sales.csv')
sizes = df.groupby(['Branch','Customer type']).size()
print(sizes)
you would get
Branch  Customer type
A       Member           167
        Normal           173
B       Member           165
        Normal           167
C       Member           169
        Normal           159
dtype: int64
The harder part is converting it to the expected dictionary.
There are .to_dict() and .to_json(), but called directly on this result they don't create the expected nested dictionary.
But this works for me:
result = dict()
for ((brand, customer), number) in sizes.items():
    if brand not in result:
        result[brand] = dict()
    result[brand][customer] = number
print(result)

Related

Appending dictionaries to a list from a csv file

I have a csv file like:
Rack  Tube  Well  sample_vol  solvent_vol
1     0     A1    230         400
1     1     B1    200         20
2     2     G1    5           30
3     1     A1    90          40
3     20    A1    100         90
I'm trying to make mappings between the different columns for each row, using dictionaries. But I'm stuck on how to build a single list that holds a separate dictionary for each distinct value of "Rack".
Basically I need an output like:
print(rack_list)
[{'0':230,'1':200},{'2':5},{'1':90,'20':100}]
Where each dict in the list stores the mappings for each Rack.
This is what I have so far:
csv_reader = csv.DictReader(csvfile)
header = csv_reader.fieldnames
solvent_volume_map = {}
sample_volume_map = {}
max_rack = None
rack = None
rack_list = []
for csv_row in csv_reader:
    rack = int(csv_row["Rack"])
    if max_rack == None or max_rack < rack:
        max_rack = rack
    destination_well = csv_row['Well']
    source_tube = csv_row['Tube']
    source_rack = csv_row['Rack']
    print(source_rack)
    try:
        solvent_volume = float(csv_row['solvent_vol'])
        sample_volume = float(csv_row['sample_vol'])
    except ValueError as e:
        # blank csv entry
        solvent_volume = "skip"
        sample_volume = "skip"
    solvent_volume_map[destination_well] = solvent_volume
    for i in range(max_rack):
        sample_volume_map[source_tube] = sample_volume
        rack_list.append(sample_volume_map)
You can do this either with the csv module or with the pandas package.
With the csv module
Source Code 1
import csv
with open("./test.csv", newline="") as f:
    csv_reader = csv.DictReader(f)
    header = csv_reader.fieldnames
    rack_idx_map = {}  # maps each rack number to its index in rack_list
    idx = 0  # next free index in rack_list
    rack_list = []
    for csv_row in csv_reader:
        rack = int(csv_row["Rack"])
        if rack in rack_idx_map:  # rack number already seen: update its dict
            rack_list[rack_idx_map[rack]][csv_row["Tube"]] = int(csv_row["sample_vol"])
        else:  # new rack number: add a new dict and record its index
            rack_list.append({csv_row["Tube"]: int(csv_row["sample_vol"])})
            rack_idx_map[rack] = idx
            idx += 1
print(rack_list)
print(rack_idx_map)  # rack 1 mapped at index 0, rack 2 mapped at index 1 and so on
OUTPUT
[{'0': 230, '1': 200}, {'2': 5}, {'1': 90, '20': 100}]
{1: 0, 2: 1, 3: 2}
Source Code 2
import csv
with open("./test.csv", newline="") as f:
    csv_reader = csv.DictReader(f)
    header = csv_reader.fieldnames
    rack = None
    rack_list = []
    temp_dict = {}
    prev_rack = 1
    for csv_row in csv_reader:
        rack = int(csv_row["Rack"])
        if rack != prev_rack:  # rack changed: store the finished dict
            rack_list.append(temp_dict)
            temp_dict = {}
        temp_dict[csv_row["Tube"]] = int(csv_row["sample_vol"])
        prev_rack = rack
    rack_list.append(temp_dict)  # store the dict of the last rack
print(rack_list)
OUTPUT:
[{'0': 230, '1': 200}, {'2': 5}, {'1': 90, '20': 100}]
Note on Source Code 2:
It assumes that the Rack column is sorted and starts from 1.
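If the rows are not guaranteed to be sorted by rack, a plain dict keyed by the rack value avoids that assumption. This is a minimal sketch only: the question's table is inlined as a string in place of test.csv, and `group_by_rack` is a hypothetical helper name.

```python
import csv
import io

# Sample rows copied from the question's table.
SAMPLE = """Rack,Tube,Well,sample_vol,solvent_vol
1,0,A1,230,400
1,1,B1,200,20
2,2,G1,5,30
3,1,A1,90,40
3,20,A1,100,90
"""

def group_by_rack(lines):
    """Return one {tube: sample_vol} dict per rack, in first-seen order."""
    racks = {}  # dicts preserve insertion order (Python 3.7+)
    for row in csv.DictReader(lines):
        racks.setdefault(row["Rack"], {})[row["Tube"]] = int(row["sample_vol"])
    return list(racks.values())

rack_list = group_by_rack(io.StringIO(SAMPLE))
print(rack_list)
```

With a real file, the same function can be called on the object returned by open("./test.csv", newline="").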
With the pandas package
Source Code
import pandas as pd
df = pd.read_csv("./test.csv") # read csv file
# pandas infers the column data types by default, so Tube is read as int
df["Tube"] = df["Tube"].astype(str) # cast Tube from int to str
df.groupby("Rack")[["Tube", "sample_vol"]].apply(lambda row: dict([*row.values])).tolist()
# group the rows by Rack, select the Tube and sample_vol columns, convert each group's rows to a dict, and collect the dicts in a list
OUTPUT:
[{'0': 230, '1': 200}, {'2': 5}, {'1': 90, '20': 100}]
You can use pandas:
import pandas as pd
df = pd.read_csv('1.csv')
rack_list = df.groupby(['Rack'])[['Tube','sample_vol']].apply(lambda g:dict(map(tuple, g.values.tolist()))).tolist()
print(rack_list)
Output:
[{0: 230, 1: 200}, {2: 5}, {1: 90, 20: 100}]

Prompting user to enter column names from a csv file (not using pandas framework)

I am trying to get the column names from a csv file with nearly 4000 rows. There are about 14 columns.
I am trying to store each column in a list and then prompt the user to enter at least 5 of the columns they want to look at.
The user should then be able to type how many results they want to see (they should be the smallest results from that column).
For example, if they choose clothing_brand, "8", the 8 least expensive brands are displayed.
So far, I have been able to use "with" and get a list that contains each column, but I am having trouble prompting the user to pick at least 5 of those columns.
You can use Python's built-in input() to get input from the user; to prompt several times, call it in a for loop. See the code below:
def get_user_val(no_of_entries = 5):
    print('Enter {} inputs'.format(str(no_of_entries)))
    val_list = []
    for i in range(no_of_entries):
        val_list.append(input('Enter Input {}:'.format(str(i+1))))
    return val_list
get_user_val()
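Once the names have been entered, they still need to be checked against the header, and the smallest values per column picked out. A rough sketch of those two steps, assuming every column is numeric; `read_columns`, `pick_columns`, `smallest` and the sample data are hypothetical names and values, not part of the question:

```python
import csv
import heapq
import io

def read_columns(lines):
    """Read CSV text into {column name: list of float values}."""
    reader = csv.DictReader(lines)
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(float(value))
    return columns

def pick_columns(requested, available, minimum=5):
    """Keep only known column names; raise if fewer than `minimum` remain."""
    chosen = [name for name in requested if name in available]
    if len(chosen) < minimum:
        raise ValueError(f"need at least {minimum} valid columns, got {len(chosen)}")
    return chosen

def smallest(columns, name, n):
    """Return the n smallest values of one column, in ascending order."""
    return heapq.nsmallest(n, columns[name])

# Hypothetical data: two columns and a minimum of 1 to keep the demo short.
SAMPLE = "price,weight\n30,2\n10,5\n20,1\n"
columns = read_columns(io.StringIO(SAMPLE))
chosen = pick_columns(["price", "colour"], columns, minimum=1)
print(chosen, smallest(columns, "price", 2))
```

In an interactive script, the `requested` list would come from the values returned by input(), and the ValueError could be caught to prompt again.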
I hope I understood the question correctly; is the code below what you want?
You can put the data into a dict and then sort it.
Solution1
from io import StringIO
from collections import defaultdict
import csv
import random
import pprint
def random_price():
    return random.randint(1, 10000)
def create_test_data(n_row=4000, n_col=14, sep=','):
    columns = [chr(65+i) for i in range(n_col)]  # A, B ...
    title = sep.join(columns)
    result_list = [title]
    for cur_row in range(n_row):
        result_list.append(sep.join([str(random_price()) for _ in range(n_col)]))
    return '\n'.join(result_list)
def main():
    if 'load CSV':
        test_content = create_test_data(n_row=10, n_col=5)
        dict_brand = defaultdict(list)
        with StringIO(test_content) as f:
            rows = csv.reader(f, delimiter=',')
            for idx, row in enumerate(rows):
                if idx == 0:  # title
                    columns = row
                    continue
                for i, value in enumerate(row):
                    dict_brand[columns[i]].append(int(value))
        pprint.pprint(dict_brand, indent=4, compact=True, width=120)
    user_choice = input('input columns (brand)')
    number_of_results = 5  # input('...')
    watch_columns = user_choice.split(' ')  # D E F
    for col_name in watch_columns:
        cur_brand_list = dict_brand[col_name]
        print(sorted(cur_brand_list, reverse=True)[:number_of_results])
        # print(f'{col_name} : {sorted(cur_brand_list)}')  # ASC
        # print(f'{col_name} : {sorted(cur_brand_list, reverse=True)}')  # DESC
if __name__ == '__main__':
    main()
defaultdict(<class 'list'>,
{ 'A': [9424, 6352, 5854, 5870, 912, 9664, 7280, 8306, 9508, 8230],
'B': [1539, 1559, 4461, 8039, 8541, 4540, 9447, 512, 7480, 5289],
'C': [7701, 6686, 1687, 3134, 5723, 6637, 6073, 1925, 4207, 9640],
'D': [4313, 3812, 157, 6674, 8264, 2636, 765, 2514, 9833, 1810],
'E': [139, 4462, 8005, 8560, 5710, 225, 5288, 6961, 6602, 4609]})
input columns (brand)C D
[9640, 7701, 6686, 6637, 6073]
[9833, 8264, 6674, 4313, 3812]
Solution2: Using Pandas
import pandas as pd  # StringIO and defaultdict are imported as in Solution1
def pandas_solution(test_content: str, watch_columns= ['C', 'D'], number_of_results=5):
    with StringIO(test_content) as f:
        df = pd.read_csv(StringIO(f.read()), usecols=watch_columns,
                         na_filter=False)  # skipping NA detection can improve performance
    dict_result = defaultdict(list)
    for col_name in watch_columns:
        dict_result[col_name].extend(df[col_name].sort_values(ascending=False).head(number_of_results).to_list())
    df = pd.DataFrame.from_dict(dict_result)
    print(df)
      C     D
0  9640  9833
1  7701  8264
2  6686  6674
3  6637  4313
4  6073  3812

Best simple and effective solution for data organization

I'm new to Python programming, so I'm doing a bunch of practice exercises in order to improve my skills.
I would like to show you my approach to this example; if you could let me know what you think, I would be grateful!
Exercise:
Given a list of customer IDs, I have to segment them by the following logic:
If ID is multiple of 7 and multiple of 3 then segment 'A'
If ID is multiple of 3 then segment 'B'
If ID is multiple of 7, then segment 'C'
Else, segment 'D'
What I've done:
from collections import Counter
from datetime import datetime
import os.path
import json
date = datetime.today().strftime('%Y-%m-%d')
customer_indices = [123981, 12398, 123157, 12371]  # example; the real list of IDs is longer
def segment(customer):
    # `and`, not `&`: `&` binds tighter than `==` and would match every multiple of 7
    if customer % 7 == 0 and customer % 3 == 0:
        return 'A'
    elif customer % 3 == 0:  # B and C follow the order given in the exercise
        return 'B'
    elif customer % 7 == 0:
        return 'C'
    else:
        return 'D'
def split_customers(customers):
    a = []
    b = []
    c = []
    d = []
    for customer in customers:
        if customer % 7 == 0 and customer % 3 == 0:
            a.append(customer)
        elif customer % 3 == 0:
            b.append(customer)
        elif customer % 7 == 0:
            c.append(customer)
        else:
            d.append(customer)
    return a,b,c,d
segmentation = [segment(customer) for customer in customer_indices]
print('Segmentation list: ')
print(segmentation)
print('\n')
segmentation_counter = Counter(segmentation)
print('Count of clients per segment: ')
print(f"A: {segmentation_counter['A']}")
print(f"B: {segmentation_counter['B']}")
print(f"C: {segmentation_counter['C']}")
print(f"D: {segmentation_counter['D']}")
a, b, c, d = split_customers(customer_indices)
main_dict = {'Date': date,
             'Segmentation': {
                 'A Clients': {
                     'Count': segmentation_counter['A'],
                     'Customers': a},
                 'B Clients': {
                     'Count': segmentation_counter['B'],
                     'Customers': b},
                 'C Clients': {
                     'Count': segmentation_counter['C'],
                     'Customers': c},
                 'D Clients': {
                     'Count': segmentation_counter['D'],
                     'Customers': d}}}
main_list = [main_dict]
# exist_ok avoids an error when the Data folder exists but the json file does not
os.makedirs('Data', exist_ok=True)
if os.path.isfile('Data/customer_segmentation.json'):
    with open('Data/customer_segmentation.json') as file:
        data = json.load(file)
    data.append(main_dict)
    with open('Data/customer_segmentation.json', 'w') as file:
        json.dump(data, file, indent=2)
else:
    with open('Data/customer_segmentation.json', 'w') as file:
        json.dump(main_list, file, indent=2)
The original code starts with a with open(...) block that reads the list of client IDs from a txt file in the same directory as this .py file.
The main idea of this solution is to run it every day, so that the json file gains a new entry for each day it is run; if I want to analyse the segmentation growth, that would then be pretty easy to do.
What do you think?
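One possible simplification, offered as a sketch rather than a definitive review: keep the rules in a single function and reuse it for the grouping, so the conditions are written only once and the counts come straight from the list lengths. The segment order below follows the exercise statement; the sample IDs are made up.

```python
from collections import defaultdict

def segment(customer_id):
    """Map an ID to a segment following the rules stated in the exercise."""
    if customer_id % 7 == 0 and customer_id % 3 == 0:
        return "A"
    if customer_id % 3 == 0:
        return "B"
    if customer_id % 7 == 0:
        return "C"
    return "D"

def split_customers(customer_ids):
    """Group IDs by segment in a single pass."""
    groups = defaultdict(list)
    for customer_id in customer_ids:
        groups[segment(customer_id)].append(customer_id)
    return groups

# Hypothetical IDs chosen so that every segment is hit.
groups = split_customers([21, 9, 14, 5, 42])
summary = {
    f"{name} Clients": {"Count": len(groups[name]), "Customers": groups[name]}
    for name in "ABCD"
}
print(summary)
```

The `summary` dict has the same shape as the 'Segmentation' entry in the question, so it could be dropped into the json record without a separate Counter.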

writing data to csv from dictionaries with multiple values per key

Background
I am storing data in dictionaries. The dictionaries can be of different lengths, and in a particular dictionary there could be keys with multiple values. I am trying to write the data out to a CSV file.
Problem/Solution
Image 1 shows my actual output. Image 2 shows the desired output.
CODE
import csv
from itertools import izip_longest
e = {'Lebron':[25,10],'Ray':[40,15]}
c = {'Nba':5000}
def writeData():
    with open('file1.csv', mode='w') as csv_file:
        fieldnames = ['Player Name','Points','Assist','Company','Total Employes']
        writer = csv.writer(csv_file)
        writer.writerow(fieldnames)
        for employee, company in izip_longest(e.items(), c.items()):
            row = list(employee)
            row += list(company) if company is not None else ['', ''] # Write empty fields if no company
            writer.writerow(row)
writeData()
I am open to all solutions/suggestions that can help me get my desired output format.
For a much simpler answer, you just need to add one line of code to what you have:
row = [row[0]] + row[1]
so:
for employee, company in izip_longest(e.items(), c.items()):
    row = list(employee)
    row = [row[0]] + row[1]
    row += list(company) if company is not None else ['', ''] # Write empty fields if no company
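In Python 3, izip_longest is named itertools.zip_longest. A minimal Python 3 sketch of the same idea, assuming the desired layout is one cell per value; it writes to an in-memory buffer instead of file1.csv, and `players`, `companies` and `write_rows` are renamed for illustration:

```python
import csv
import io
from itertools import zip_longest

players = {"Lebron": [25, 10], "Ray": [40, 15]}
companies = {"Nba": 5000}

def write_rows(out):
    """Write one row per player, padding missing company fields with blanks."""
    writer = csv.writer(out)
    writer.writerow(["Player Name", "Points", "Assist", "Company", "Total Employes"])
    for player, company in zip_longest(players.items(), companies.items()):
        name, stats = player  # assumes at least as many players as companies
        row = [name, *stats]  # flatten the list of stats into separate cells
        row += list(company) if company is not None else ["", ""]
        writer.writerow(row)

buffer = io.StringIO()  # in-memory stand-in for file1.csv
write_rows(buffer)
print(buffer.getvalue())
```

When writing to a real file in Python 3, open it with newline="" so the csv module controls the line endings.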
Alternatively, store everything in one defaultdict keyed by name:
from collections import defaultdict
values = defaultdict(dict)
values["Name1"] = {"Points": [], "Assist": [], "Company": "blah", "Total_Employees": 123}
To generate the output, iterate over the items in values to get the names, and fill in the other columns from the key/value pairs of the nested dict.
Again, make sure that there are no multiple entries with the same name, or key the defaultdict on a field whose entries are unique.
Demo for the example-
>>> from collections import defaultdict
>>> import csv
>>> values = defaultdict(dict)
>>> vals = [["Lebron", 25, 10, "Nba", 5000], ["Ray", 40, 15]]
>>> fields = ["Name", "Points", "Assist", "Company", "Total Employes"]
>>> for item in vals:
...     if len(item) == len(fields):
...         details = dict()
...         for j in range(1, len(fields)):
...             details[fields[j]] = item[j]
...         values[item[0]] = details
...     elif len(item) < len(fields):
...         details = dict()
...         for j in range(1, len(fields)):
...             if j+1 <= len(item):
...                 details[fields[j]] = item[j]
...             else:
...                 details[fields[j]] = ""
...         values[item[0]] = details
...
>>> values
defaultdict(<class 'dict'>, {'Lebron': {'Points': 25, 'Assist': 10, 'Company': 'Nba', 'Total Employes': 5000}, 'Ray': {'Points': 40, 'Assist': 15, 'Company': '', 'Total Employes': ''}})
>>> csv_file = open('file1.csv', 'w')
>>> writer = csv.writer(csv_file)
>>> for i in values:
...     row = [i]
...     for j in values[i]:
...         row.append(values[i][j])
...     writer.writerow(row)
...
23
13
>>> csv_file.close()
Contents of 'file1.csv':
Lebron,25,10,Nba,5000
Ray,40,15,,

re reading a csv file in python without loading it again

I wrote the following code, which works, but I want to improve it. I don't want to re-read the file, but if I delete sales_input.seek(0) it won't iterate through each row in sales. How can I improve this?
def computeCritics(mode, cleaned_sales_input = "data/cleaned_sales.csv"):
    if mode == 1:
        print "creating customer.critics.recommendations"
        critics_output = open("data/customer/customer.critics.recommendations",
                              "wb")
        ID = getCustomerSet(cleaned_sales_input)
        sales_dict = pickle.load(open("data/customer/books.dict.recommendations",
                                      "r"))
    else:
        print "creating books.critics.recommendations"
        critics_output = open("data/books/books.critics.recommendations",
                              "wb")
        ID = getBookSet(cleaned_sales_input)
        sales_dict = pickle.load(open("data/books/users.dict.recommendations",
                                      "r"))
    critics = {}
    # make critics dict and pickle it
    for i in ID:
        with open(cleaned_sales_input, 'rb') as sales_input:
            sales = csv.reader(sales_input) # read new
            for j in sales:
                if mode == 1:
                    if int(i) == int(j[2]):
                        sales_dict[int(j[6])] = 1
                else:
                    if int(i) == int(j[6]):
                        sales_dict[int(j[2])] = 1
        critics[int(i)] = sales_dict
    pickle.dump(critics, critics_output)
    print "done"
cleaned_sales_input looks like
6042772,2723,3546414,9782072488887,1,9.99,314968
6042769,2723,3546414,9782072488887,1,9.99,314968
...
where number 6 is the book ID and number 0 is the customer ID
I want to get a dict which looks like
critics = {
    CustomerID1: {
        BookID1: 1,
        BookID2: 0,
        ........
        BookIDX: 0
    },
    CustomerID2: {
        BookID1: 0,
        BookID2: 1,
        ...
    }
}
or
critics = {
    BookID1: {
        CustomerID1: 1,
        CustomerID2: 0,
        ........
        CustomerIDX: 0
    },
    BookID2: {
        CustomerID1: 0,
        CustomerID2: 1,
        ...
        CustomerIDX: 0
    }
}
I hope this isn't too much information.
Here are some suggestions:
Let's first look at this code pattern:
for i in ID:
    for j in sales:
        if int(i) == int(j[2])
notice that i is only being compared with j[2]. That's its only purpose in the loop. int(i) == int(j[2]) can only be True at most once for each i.
So, we can completely remove the for i in ID loop by rewriting it as
for j in sales:
    key = j[2]
    if key in ID:
Based on the function names getCustomerSet and getBookSet, it sounds as if
ID is a set (as opposed to a list or tuple). We want ID to be a set since
testing membership in a set is O(1) (as opposed to O(n) for a list or tuple).
Next, consider this line:
critics[int(i)] = sales_dict
There is a potential pitfall here. This line is assigning sales_dict to
critics[int(i)] for each i in ID. Each key int(i) is being mapped to the very same dict. As we loop through sales and ID, we are modifying sales_dict like this, for example:
sales_dict[int(j[6])] = 1
But this will cause all values in critics to be modified simultaneously, since all keys in critics point to the same dict, sales_dict. I doubt that is what you want.
To avoid this pitfall, we need to make copies of the sales_dict:
critics = {i:sales_dict.copy() for i in ID}
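The pitfall can be reproduced in a few lines. This is a small illustration written for Python 3; the template dict and the IDs are made-up stand-ins for the unpickled sales_dict and the real ID set.

```python
# Hypothetical template standing in for the unpickled sales_dict.
sales_dict = {101: 0, 102: 0}
ids = {1, 2}

# Every key refers to the same dict object, so one write shows up everywhere.
shared = {i: sales_dict for i in ids}
shared[1][101] = 1
leaked = shared[2][101]  # the write made under key 1 is visible under key 2

# Reset the template, then give each key its own shallow copy.
sales_dict[101] = 0
independent = {i: sales_dict.copy() for i in ids}
independent[1][101] = 1
isolated = independent[2][101]  # key 2 still holds the untouched template value

print(leaked, isolated)
```

A shallow copy is enough here because the values are plain integers; nested mutable values would need copy.deepcopy instead.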
import csv
import os
import pickle
def computeCritics(mode, cleaned_sales_input="data/cleaned_sales.csv"):
    if mode == 1:
        filename = 'customer.critics.recommendations'
        path = os.path.join("data/customer", filename)
        ID = getCustomerSet(cleaned_sales_input)
        sales_dict = pickle.load(
            open("data/customer/books.dict.recommendations", "r"))
        key_idx, other_idx = 2, 6
    else:
        filename = 'books.critics.recommendations'
        path = os.path.join("data/books", filename)
        ID = getBookSet(cleaned_sales_input)
        sales_dict = pickle.load(
            open("data/books/users.dict.recommendations", "r"))
        key_idx, other_idx = 6, 2
    print "creating {}".format(filename)
    ID = {int(item) for item in ID}
    critics = {i:sales_dict.copy() for i in ID}  # one independent copy per ID
    with open(path, "wb") as critics_output:
        # make critics dict and pickle it
        with open(cleaned_sales_input, 'rb') as sales_input:
            sales = csv.reader(sales_input)  # single pass over the file
            for j in sales:
                key = int(j[key_idx])
                if key in ID:
                    other_key = int(j[other_idx])
                    critics[key][other_key] = 1
        pickle.dump(dict(critics), critics_output)
    print "done"
@unutbu's answer is better, but if you are stuck with this structure you can read the whole file into memory once:
with open(cleaned_sales_input, 'rb') as sales_input:
    sales = list(csv.reader(sales_input))  # all rows, kept in memory
for i in ID:
    for j in sales:
        pass  # do stuff
