I have a csv file like:
Rack,Tube,Well,sample_vol,solvent_vol
1,0,A1,230,400
1,1,B1,200,20
2,2,G1,5,30
3,1,A1,90,40
3,20,A1,100,90
I'm trying to build mappings between the columns of each row using dictionaries, but I'm stuck on how to create a separate dictionary for each distinct value of "Rack" and collect those dictionaries in a single list.
Basically I need an output like:
print(rack_list)
[{'0':230,'1':200},{'2':5},{'1':90,'20':100}]
Where each dict in the list stores the mappings for each Rack.
This is what I have so far:
csv_reader = csv.DictReader(csvfile)
header = csv_reader.fieldnames
solvent_volume_map = {}
sample_volume_map = {}
max_rack = None
rack = None
rack_list = []
for csv_row in csv_reader:
    rack = int(csv_row["Rack"])
    if max_rack == None or max_rack < rack:
        max_rack = rack
    destination_well = csv_row['Well']
    source_tube = csv_row['Tube']
    source_rack = csv_row['Rack']
    print(source_rack)
    try:
        solvent_volume = float(csv_row['solvent_vol'])
        sample_volume = float(csv_row['sample_vol'])
    except ValueError as e:
        # blank csv entry
        solvent_volume = "skip"
        sample_volume = "skip"
    solvent_volume_map[destination_well] = solvent_volume
    for i in range(max_rack):
        sample_volume_map[source_tube] = sample_volume
        rack_list.append(sample_volume_map)
You can do this with either the csv module or pandas.
With the csv module
Source Code 1
import csv
with open("./test.csv", newline="") as f:
    csv_reader = csv.DictReader(f)
    header = csv_reader.fieldnames
    rack_idx_map = {}  # maps each rack number to its index in rack_list
    idx = 0  # next free index in rack_list
    rack_list = []
    for csv_row in csv_reader:
        rack = int(csv_row["Rack"])
        if rack in rack_idx_map:  # rack already seen: update its dict
            rack_list[rack_idx_map[rack]][csv_row["Tube"]] = int(csv_row["sample_vol"])
        else:  # new rack: append a new dict and record its index
            rack_list.append({csv_row["Tube"]: int(csv_row["sample_vol"])})
            rack_idx_map[rack] = idx
            idx += 1
print(rack_list)
print(rack_idx_map)  # rack 1 is at index 0, rack 2 at index 1, and so on
OUTPUT
[{'0': 230, '1': 200}, {'2': 5}, {'1': 90, '20': 100}]
{1: 0, 2: 1, 3: 2}
Source Code 2
import csv
with open("./test.csv", newline="") as f:
    csv_reader = csv.DictReader(f)
    header = csv_reader.fieldnames
    rack = None
    rack_list = []
    temp_dict = {}
    prev_rack = 1
    for csv_row in csv_reader:
        rack = int(csv_row["Rack"])
        if rack != prev_rack:
            rack_list.append(temp_dict)
            temp_dict = {}
        temp_dict[csv_row["Tube"]] = int(csv_row["sample_vol"])
        prev_rack = rack
    rack_list.append(temp_dict)
rack_list
OUTPUT:
[{'0': 230, '1': 200}, {'2': 5}, {'1': 90, '20': 100}]
PS for Source Code 2:
This assumes Rack is in sorted order and starts from 1.
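When the rows really are grouped by Rack, the same idea can be sketched with itertools.groupby from the standard library, which starts a new group every time the key changes. This is only an illustration: CSV_TEXT is an in-memory copy of the sample data and racks_from_csv is a made-up helper name, not part of the original answer.

```python
import csv
import io
from itertools import groupby

# In-memory copy of the sample CSV from the question (stands in for test.csv).
CSV_TEXT = """Rack,Tube,Well,sample_vol,solvent_vol
1,0,A1,230,400
1,1,B1,200,20
2,2,G1,5,30
3,1,A1,90,40
3,20,A1,100,90
"""


def racks_from_csv(text):
    """Return one {tube: sample_vol} dict per run of consecutive equal Rack values."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {row["Tube"]: int(row["sample_vol"]) for row in rows}
        for _, rows in groupby(reader, key=lambda row: row["Rack"])
    ]


rack_list = racks_from_csv(CSV_TEXT)
print(rack_list)
# [{'0': 230, '1': 200}, {'2': 5}, {'1': 90, '20': 100}]
```

Like Source Code 2, this only groups consecutive rows, so an unsorted file would produce several dicts for the same rack.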
With pandas
Source Code
import pandas as pd
df = pd.read_csv("./test.csv")  # read the CSV file
# pandas infers the column dtypes, so Tube is read as int
df["Tube"] = df["Tube"].astype(str)  # cast Tube from int to str
# group by Rack, select the Tube and sample_vol columns, turn each group's rows into a dict, then collect the dicts in a list
df.groupby("Rack")[["Tube", "sample_vol"]].apply(lambda row: dict([*row.values])).tolist()
OUTPUT:
[{'0': 230, '1': 200}, {'2': 5}, {'1': 90, '20': 100}]
You can use pandas:
import pandas as pd
df = pd.read_csv('1.csv')
rack_list = df.groupby(['Rack'])[['Tube','sample_vol']].apply(lambda g:dict(map(tuple, g.values.tolist()))).tolist()
print(rack_list)
Output:
[{0: 230, 1: 200}, {2: 5}, {1: 90, 20: 100}]
I am using the unleashed_py library to extract Unleashed data.
The sample of the output is as below where there could be several items in the invoice:
[{
'OrderNumber': 'SO-00000742',
'QuoteNumber': None,
'InvoiceDate': '/Date(1658496322067)/',
'InvoiceLines': [{'LineNumber': 1,
'LineType': None},
{'LineNumber': 2,
'LineType': None}],
'Guid': '8f6b89da-1e6e-42288a24-902a-038041e04f06',
'LastModifiedOn': '/Date(1658496322221)/'}]
I need to get a df:
If I run the script below, the invoice lines are appended, but the common fields (OrderNumber, QuoteNumber, InvoiceDate, Guid, and LastModifiedOn) are not repeated for each line.
order_number = []
quote_number = []
invoice_date = []
invoicelines = []
invoice_line_number = []
invoice_line_type = []
guid = []
last_modified = []
for item in df:
    order_number.append(item.get('OrderNumber'))
    quote_number.append(item.get('QuoteNumber'))
    invoice_date.append(item.get('InvoiceDate'))
    guid.append(item.get('Guid'))
    last_modified.append(item.get('LastModifiedOn'))
    lines = item.get('InvoiceLines')
    for item_sub_2 in lines:
        invoice_line_number.append('LineNumber')
        invoice_line_type.append('LineType')
df_order_number = pd.DataFrame(order_number)
df_quote_number = pd.DataFrame(quote_number)
df_invoice_date = pd.DataFrame(invoice_date)
df_invoice_line_number = pd.DataFrame(invoice_line_number)
df_invoice_line_type = pd.DataFrame(invoice_line_type)
df_guid = pd.DataFrame(guid)
df_last_modified = pd.DataFrame(last_modified)
df_row = pd.concat([
df_order_number,
df_quote_number,
df_invoice_date,
df_invoice_line_number,
df_invoice_line_type,
df_guid,
df_last_modified
], axis = 1)
What am I doing wrong?
You don't need to iterate. Create the DataFrame from the list of dictionaries you have, explode the InvoiceLines column, apply pd.Series to it, and concatenate the result with the remaining columns of the original DataFrame:
import pandas as pd
data = [{
'OrderNumber': 'SO-00000742',
'QuoteNumber': None,
'InvoiceDate': '/Date(1658496322067)/',
'InvoiceLines': [{'LineNumber': 1,
'LineType': None},
{'LineNumber': 2,
'LineType': None}],
'Guid': '8f6b89da-1e6e-42288a24-902a-038041e04f06',
'LastModifiedOn': '/Date(1658496322221)/'}]
df = pd.DataFrame(data).explode('InvoiceLines')
out = pd.concat([df['InvoiceLines'].apply(pd.Series),
                 df.drop(columns=['InvoiceLines'])],
                axis=1)
OUTPUT:
#out
LineNumber LineType OrderNumber QuoteNumber InvoiceDate \
0 1.0 NaN SO-00000742 None /Date(1658496322067)/
0 2.0 NaN SO-00000742 None /Date(1658496322067)/
Guid LastModifiedOn
0 8f6b89da-1e6e-42288a24-902a-038041e04f06 /Date(1658496322221)/
0 8f6b89da-1e6e-42288a24-902a-038041e04f06 /Date(1658496322221)/
I'm leaving the date conversion and the column renames to you, since I believe you can do those yourself.
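For the date conversion, here is a minimal standard-library sketch. It assumes the number inside '/Date(...)/' is milliseconds since the Unix epoch in UTC, which is the usual meaning of this format but is not confirmed by the question; parse_ms_date is a made-up helper name.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: the digits are milliseconds since 1970-01-01 00:00:00 UTC.
_DATE_RE = re.compile(r"/Date\((-?\d+)\)/")
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_ms_date(value):
    """Convert '/Date(<milliseconds>)/' to a timezone-aware UTC datetime; None stays None."""
    if value is None:
        return None
    match = _DATE_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"unexpected date format: {value!r}")
    # Integer arithmetic avoids float rounding in the sub-second part.
    return _EPOCH + timedelta(milliseconds=int(match.group(1)))


invoice_date = parse_ms_date('/Date(1658496322067)/')
print(invoice_date.isoformat())
# 2022-07-22T13:25:22.067000+00:00
```

With the DataFrame from the answer, something like out['InvoiceDate'].map(parse_ms_date) should apply the helper to a whole column.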
Fellow developers on Stack Overflow,
I have string data in the form
'key=apple; age=10; key=boy; age=3'
How can I convert it into a pandas DataFrame so that key and age are the column headers and the values fill the columns?
key age
apple 10
boy 3
Try this:
import pandas as pd
data = 'key=apple; age=10; key=boy; age=3'
words = data.split(";")
key = []
age = []
for word in words:
    if "key" in word:
        key.append(word.split("=")[1])
    else:
        age.append(word.split("=")[1])
df = pd.DataFrame(key, columns=["key"])
df["age"] = age
print(df)
You can try this:
import pandas as pd
str_stream = 'key=apple; age=10; key=boy; age=3'
lst_kv = str_stream.split(';')
# lst_kv => ['key=apple', ' age=10', ' key=boy', ' age=3']
# step through the list two items at a time so that each record gets its own key/age pair
res = [{s.split('=')[0].strip(): s.split('=')[1] for s in lst_kv[i:i+2]}
       for i in range(0, len(lst_kv), 2)
       ]
df = pd.DataFrame(res)
df
Output:
     key age
0  apple  10
1    boy   3
A more explicit version of the one-line res:
res = []
for i in range(0, len(lst_kv), 2):
    dct_tmp = {}
    for s in lst_kv[i:i+2]:
        kv = s.split('=')
        dct_tmp[kv[0].strip()] = kv[1]
    res.append(dct_tmp)
res
Output:
[{'key': 'apple', 'age': '10'}, {'key': 'boy', 'age': '3'}]
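Both answers rely on the string containing exactly the fields key and age in alternating order. As a rough sketch of a less position-dependent variant, the helper below (parse_records is a made-up name) starts a new record whenever a field name repeats. It assumes every record begins with the same field, which holds for the sample string.

```python
def parse_records(text, pair_sep=";", kv_sep="="):
    """Split 'k=v; k=v; ...' into a list of dicts, starting a new dict when a key repeats."""
    records = []
    current = {}
    for chunk in text.split(pair_sep):
        chunk = chunk.strip()
        if not chunk:
            continue  # tolerate a trailing separator
        key, _, value = chunk.partition(kv_sep)
        key, value = key.strip(), value.strip()
        if key in current:  # repeated key: the previous record is complete
            records.append(current)
            current = {}
        current[key] = value
    if current:
        records.append(current)
    return records


records = parse_records('key=apple; age=10; key=boy; age=3')
print(records)
# [{'key': 'apple', 'age': '10'}, {'key': 'boy', 'age': '3'}]
```

The resulting list of dicts can be passed to pd.DataFrame(records) in the same way as res above.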
I am trying to return a dictionary that contains 3 different values
(the total number of sales per Branch and per Customer Type).
I need to return a dictionary like
{"A": {"Member": 230, "Normal": 351},
"B": {"Member": 123, "Normal": 117},
"C": {"Member": 335, "Normal": 18}}
What am I doing wrong? Can someone explain it to me, please?
import csv
dict_from_csv = {}
file = open('supermarket_sales.csv')
csv_reader = csv.reader(file)
next(csv_reader)
for row in csv_reader:
    branch= row[1]
    Customer = row[3]
    total = float(row[9])
    curent_total= dict_from_csv.get(branch)
    if dict_from_csv.get(branch) is None:
        dict_from_csv[branch]={}
    else:
        if dict_from_csv[branch].get(Customer) is None:
            dict_from_csv[branch]={Customer: 0}
        else :
            curent_total= dict_from_csv.get(branch)
print(dict_from_csv)
I can return only:
{'A': {'Member': 0},
'C': {'Normal': 0},
'B': {'Normal': 0}}
CSV file: https://app.box.com/s/f4hcfkferizntbev3ou8hso6nyfccf74
First mistake:
dict_from_csv[branch]={Customer: 0}
This creates a new dictionary containing only the new customer, so it discards the dictionary that held the previous customer.
It should be
dict_from_csv[branch][Customer] = 0
which adds the new customer to the existing dictionary.
Second mistake: nothing is ever added to the counter; you need += 1 to count the rows:
dict_from_csv[branch][customer] += 1
Full working code:
import csv
f = open('supermarket_sales.csv')
csv_reader = csv.reader(f)
next(csv_reader)
dict_from_csv = {}
for row in csv_reader:
    branch= row[1]
    customer = row[3]
    if branch not in dict_from_csv:
        dict_from_csv[branch] = {}
    if customer not in dict_from_csv[branch]:
        dict_from_csv[branch][customer] = 0
    dict_from_csv[branch][customer] += 1
print(dict_from_csv)
Result:
{
'A': {'Member': 167, 'Normal': 173},
'C': {'Normal': 159, 'Member': 169},
'B': {'Member': 165, 'Normal': 167}
}
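The code above counts rows. If the goal is instead to add up the value that the question reads into total (column index 9), a sketch along the following lines may help. The sample rows and the sum_totals name are made up for illustration; only the column positions 1, 3 and 9 are taken from the question.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration; only the column positions (1, 3, 9) follow the question.
SAMPLE = """id,Branch,City,Customer type,c4,c5,c6,c7,c8,Total
1,A,X,Member,,,,,,100.5
2,A,X,Normal,,,,,,20.0
3,B,Y,Member,,,,,,7.25
4,A,X,Member,,,,,,1.5
"""


def sum_totals(lines):
    """Return {branch: {customer_type: summed total}} from CSV lines that start with a header."""
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    totals = defaultdict(lambda: defaultdict(float))
    for row in reader:
        totals[row[1]][row[3]] += float(row[9])
    # Convert the nested defaultdicts to plain dicts before returning.
    return {branch: dict(per_type) for branch, per_type in totals.items()}


result = sum_totals(io.StringIO(SAMPLE))
print(result)
# {'A': {'Member': 102.0, 'Normal': 20.0}, 'B': {'Member': 7.25}}
```

The nested defaultdict removes the need for the two "not in" checks, because missing branches and customer types are created on first use.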
BTW: if you use a pandas.DataFrame, then with
import pandas as pd
df = pd.read_csv('/home/furas/supermarket_sales.csv')
sizes = df.groupby(['Branch','Customer type']).size()
print(sizes)
you would get
Branch  Customer type
A       Member           167
        Normal           173
B       Member           165
        Normal           167
C       Member           169
        Normal           159
dtype: int64
The bigger problem is converting this to the expected dictionary.
There are .to_dict() and .to_json(), but they don't create the expected nested dictionary.
But this works for me:
result = dict()
for ((branch, customer), number) in sizes.items():
    if branch not in result:
        result[branch] = dict()
    result[branch][customer] = number
print(result)
I am trying to get the column names from a csv file with nearly 4000 rows. There are about 14 columns.
I am trying to store each column name in a list and then prompt the user to enter at least 5 columns they want to look at.
The user should then be able to type how many results they want to see (these should be the smallest values in that column).
For example, if they choose clothing_brand, "8", the 8 least expensive brands are displayed.
So far, I have been able to use "with" and get a list that contains each column, but I am having trouble prompting the user to pick at least 5 of those columns.
You can use Python's input() to read values from the user; to prompt several times, call it in a for loop. See the code below:
def get_user_val(no_of_entries = 5):
    print('Enter {} inputs'.format(str(no_of_entries)))
    val_list = []
    for i in range(no_of_entries):
        val_list.append(input('Enter Input {}:'.format(str(i+1))))
    return val_list
get_user_val()
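The loop above accepts whatever is typed. If the entries should be checked against the real column names, one possible sketch is shown below. choose_columns and its read parameter are made up for illustration; read defaults to input and is replaced by scripted answers here so that the example runs without a keyboard.

```python
def choose_columns(available, minimum=5, read=input):
    """Prompt until the user has entered `minimum` distinct, valid column names."""
    if len(set(available)) < minimum:
        raise ValueError("not enough columns to choose from")
    chosen = []
    while len(chosen) < minimum:
        name = read(f"Column {len(chosen) + 1} of {minimum}: ").strip()
        if name not in available:
            print(f"{name!r} is not a column; choose from {available}")
        elif name in chosen:
            print(f"{name!r} was already chosen")
        else:
            chosen.append(name)
    return chosen


# Non-interactive demonstration: scripted answers stand in for typed input.
scripted = iter(["A", "Z", "B", "A", "C", "D", "E"])
picked = choose_columns(list("ABCDEFG"), minimum=5, read=lambda prompt: next(scripted))
print(picked)
# ['A', 'B', 'C', 'D', 'E']
```

In real use, calling choose_columns(column_names) with no read argument prompts on the console.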
I hope I haven't misunderstood you. Is the code below what you want?
You can load the data into a dict and then sort it.
Solution 1
from io import StringIO
from collections import defaultdict
import csv
import random
import pprint
def random_price():
    return random.randint(1, 10000)
def create_test_data(n_row=4000, n_col=14, sep=','):
    columns = [chr(65+i) for i in range(n_col)]  # A, B ...
    title = sep.join(columns)
    result_list = [title]
    for cur_row in range(n_row):
        result_list.append(sep.join([str(random_price()) for _ in range(n_col)]))
    return '\n'.join(result_list)
def main():
    if 'load CSV':
        test_content = create_test_data(n_row=10, n_col=5)
        dict_brand = defaultdict(list)
        with StringIO(test_content) as f:
            rows = csv.reader(f, delimiter=',')
            for idx, row in enumerate(rows):
                if idx == 0:  # title
                    columns = row
                    continue
                for i, value in enumerate(row):
                    dict_brand[columns[i]].append(int(value))
    pprint.pprint(dict_brand, indent=4, compact=True, width=120)
    user_choice = input('input columns (brand)')
    number_of_results = 5  # input('...')
    watch_columns = user_choice.split(' ')  # D E F
    for col_name in watch_columns:
        cur_brand_list = dict_brand[col_name]
        print(sorted(cur_brand_list, reverse=True)[:number_of_results])
        # print(f'{col_name} : {sorted(cur_brand_list)}')  # ASC
        # print(f'{col_name} : {sorted(cur_brand_list, reverse=True)}')  # DESC
if __name__ == '__main__':
    main()
defaultdict(<class 'list'>,
{ 'A': [9424, 6352, 5854, 5870, 912, 9664, 7280, 8306, 9508, 8230],
'B': [1539, 1559, 4461, 8039, 8541, 4540, 9447, 512, 7480, 5289],
'C': [7701, 6686, 1687, 3134, 5723, 6637, 6073, 1925, 4207, 9640],
'D': [4313, 3812, 157, 6674, 8264, 2636, 765, 2514, 9833, 1810],
'E': [139, 4462, 8005, 8560, 5710, 225, 5288, 6961, 6602, 4609]})
input columns (brand)C D
[9640, 7701, 6686, 6637, 6073]
[9833, 8264, 6674, 4313, 3812]
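Because of reverse=True, the run above lists the largest values, while the question asked for the smallest. A small sketch of the ascending case using heapq.nsmallest from the standard library follows. smallest_per_column is a made-up helper name, and the data is copied from columns C and D of the sample output.

```python
import heapq


def smallest_per_column(columns, wanted, n):
    """Return {column name: its n smallest values in ascending order} for each wanted column."""
    return {name: heapq.nsmallest(n, columns[name]) for name in wanted}


# Values for columns C and D copied from the sample output.
data = {
    "C": [7701, 6686, 1687, 3134, 5723, 6637, 6073, 1925, 4207, 9640],
    "D": [4313, 3812, 157, 6674, 8264, 2636, 765, 2514, 9833, 1810],
}
result = smallest_per_column(data, ["C", "D"], 3)
print(result)
# {'C': [1687, 1925, 3134], 'D': [157, 765, 1810]}
```

Dropping reverse=True from the sorted() call gives the same values; nsmallest just avoids sorting the whole column when only a few results are needed.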
Solution 2: Using Pandas
import pandas as pd
from io import StringIO
from collections import defaultdict
def pandas_solution(test_content: str, watch_columns=['C', 'D'], number_of_results=5):
    with StringIO(test_content) as f:
        df = pd.read_csv(f, usecols=watch_columns,
                         na_filter=False)  # skipping NA detection can speed up parsing
    dict_result = defaultdict(list)
    for col_name in watch_columns:
        dict_result[col_name].extend(df[col_name].sort_values(ascending=False).head(number_of_results).to_list())
    df = pd.DataFrame.from_dict(dict_result)
    print(df)
C D
0 9640 9833
1 7701 8264
2 6686 6674
3 6637 4313
4 6073 3812
Background
I am storing data in dictionaries. The dictionaries can be of different lengths, and within a particular dictionary there can be keys with multiple values. I am trying to write the data to a CSV file.
Problem/Solution
Image 1 shows my actual output. Image 2 shows the desired output.
CODE
import csv
from itertools import izip_longest
e = {'Lebron':[25,10],'Ray':[40,15]}
c = {'Nba':5000}
def writeData():
    with open('file1.csv', mode='w') as csv_file:
        fieldnames = ['Player Name','Points','Assist','Company','Total Employes']
        writer = csv.writer(csv_file)
        writer.writerow(fieldnames)
        for employee, company in izip_longest(e.items(), c.items()):
            row = list(employee)
            row += list(company) if company is not None else ['', '']  # Write empty fields if no company
            writer.writerow(row)
writeData()
I am open to all solutions/suggestions that can help me get my desired output format.
For a much simpler answer, you just need to add one line of code to what you have:
row = [row[0]] + row[1]
so:
for employee, company in izip_longest(e.items(), c.items()):
    row = list(employee)
    row = [row[0]] + row[1]
    row += list(company) if company is not None else ['', '']  # Write empty fields if no company
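Note that izip_longest exists only in Python 2; in Python 3 the same function is itertools.zip_longest. A Python 3 sketch of the whole loop with the extra line applied is shown below. It writes to an in-memory buffer instead of file1.csv so that it can be run as is, and write_data is a made-up name.

```python
import csv
import io
from itertools import zip_longest  # Python 3 name for Python 2's izip_longest

e = {'Lebron': [25, 10], 'Ray': [40, 15]}
c = {'Nba': 5000}


def write_data(csv_file):
    """Write one row per player, flattening the stats list and padding missing company fields."""
    writer = csv.writer(csv_file)
    writer.writerow(['Player Name', 'Points', 'Assist', 'Company', 'Total Employes'])
    # Assumes e has at least as many entries as c, as in the question's data.
    for (name, stats), company in zip_longest(e.items(), c.items()):
        row = [name, *stats]  # same effect as row = [row[0]] + row[1]
        row += list(company) if company is not None else ['', '']
        writer.writerow(row)


buffer = io.StringIO(newline='')
write_data(buffer)
print(buffer.getvalue())
```

To write a real file, pass open('file1.csv', mode='w', newline='') instead of the buffer.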
from collections import defaultdict
values = defaultdict(dict)
values["Name1"] = {"Points": [], "Assist": [], "Company": "blah", "Total_Employees": 123}
To generate the output, iterate over the items in values to get the names, and fill in the remaining fields from the key-value pairs of each nested dict.
Again, make sure there are no multiple entries with the same name, or choose a field with unique entries as the key of the defaultdict.
Demo for the example:
>>> from collections import defaultdict
>>> import csv
>>> values = defaultdict(dict)
>>> vals = [["Lebron", 25, 10, "Nba", 5000], ["Ray", 40, 15]]
>>> fields = ["Name", "Points", "Assist", "Company", "Total Employes"]
>>> for item in vals:
...     if len(item) == len(fields):
...         details = dict()
...         for j in range(1, len(fields)):
...             details[fields[j]] = item[j]
...         values[item[0]] = details
...     elif len(item) < len(fields):
...         details = dict()
...         for j in range(1, len(fields)):
...             if j+1 <= len(item):
...                 details[fields[j]] = item[j]
...             else:
...                 details[fields[j]] = ""
...         values[item[0]] = details
...
>>> values
defaultdict(<class 'dict'>, {'Lebron': {'Points': 25, 'Assist': 10, 'Company': 'Nba', 'Total Employes': 5000}, 'Ray': {'Points': 40, 'Assist': 15, 'Company': '', 'Total Employes': ''}})
>>> csv_file = open('file1.csv', 'w')
>>> writer = csv.writer(csv_file)
>>> for i in values:
...     row = [i]
...     for j in values[i]:
...         row.append(values[i][j])
...     writer.writerow(row)
...
23
13
>>> csv_file.close()
Contents of 'file1.csv':
Lebron,25,10,Nba,5000
Ray,40,15,,
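The padding of missing fields can also be left to csv.DictWriter, whose restval argument supplies the value written for any key that is absent from a row. This is a sketch rather than part of the original answer: the rows list is a dict-based restatement of vals, and the output goes to an in-memory buffer so the example runs without touching file1.csv.

```python
import csv
import io

fields = ["Name", "Points", "Assist", "Company", "Total Employes"]
rows = [
    {"Name": "Lebron", "Points": 25, "Assist": 10, "Company": "Nba", "Total Employes": 5000},
    {"Name": "Ray", "Points": 40, "Assist": 15},  # company fields missing on purpose
]

buffer = io.StringIO(newline="")
# restval is written for every field name that a row dict does not contain.
writer = csv.DictWriter(buffer, fieldnames=fields, restval="")
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Unlike the demo above, this version also writes a header row; leave out writer.writeheader() to match the file contents shown.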