csv row import into python array - python

I have a CSV file in the following format:
a b c d
1 12.0 3.5 4.3 5.9
2 13.0 5.7 2.8 5.2
3 14.0 6.4 9.7 2.3
4 15.0 6.8 4.7 3.4
I want to import the rows into a Python array of arrays (a list of lists). Here is the pseudocode:
a = read csv
b[][] = a 2D float array in which each row holds 4 values
import rows into b
the output of b should be:
[[12.0,3.5,4.3,5.9],[13.0,5.7,2.8,5.2],[14.0,6.4,9.7,2.3],[15.0,6.8,4.7,3.4]]
how would I do this? Please let me know if you need any other clarification. Thank you.
Problem:
Not all rows are the same size. Some rows have 10 elements and others may have 7, 8 or 9.
This is what I have:
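import csv
def main():
    a = []
    # open in text mode with newline='', as the csv module expects in Python 3
    with open('test.csv', newline='') as csvfile:
        # NOTE: the file is comma-separated, so splitting on ' ' leaves
        # each whole line as a single string field
        spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
        for row in spamreader:
            a.append(row)
    print(a)
main()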
Output:
[['13,4.2,2.4,5,6.4'], ['14,3.2,3.4,5.6,7.2'], ['15,8.5,3.7,8.5,0.75'], ['16,5.4,8.3,3.5,5.4']]
How do I make the arrays turn from string into floats?

Use csv.DictReader, which skips empty lines, to get a list of dictionaries:
In [131]: import csv
...: with open('a.csv') as f:
...:     lst = list(csv.DictReader(f))
In [132]: lst
Out[132]:
[{'a': '12.0', 'b': '3.5', 'c': '4.3', 'd': '5.9'},
{'a': '13.0', 'b': '5.7', 'c': '2.8', 'd': '5.2'},
{'a': '14.0', 'b': '6.4', 'c': '9.7', 'd': '2.3'},
{'a': '15.0', 'b': '6.8', 'c': '4.7', 'd': '3.4'}]
In [134]: [{k:float(d[k]) for k in d} for d in lst] #convert values to floats
Out[134]:
[{'a': 12.0, 'b': 3.5, 'c': 4.3, 'd': 5.9},
{'a': 13.0, 'b': 5.7, 'c': 2.8, 'd': 5.2},
{'a': 14.0, 'b': 6.4, 'c': 9.7, 'd': 2.3},
{'a': 15.0, 'b': 6.8, 'c': 4.7, 'd': 3.4}]
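Since the asker mentions rows of different lengths, it may help to see how DictReader treats a short row. A minimal sketch on a hypothetical comma-separated sample (held in memory with io.StringIO instead of a real file): a missing trailing field is filled with restval, which defaults to None, so the conversion skips None values.

```python
import csv
import io

# Hypothetical sample: header row, a blank line, and a last row one value short.
sample = "a,b,c,d\n12.0,3.5,4.3,5.9\n13.0,5.7,2.8,5.2\n\n14.0,6.4,9.7\n"

reader = csv.DictReader(io.StringIO(sample))
# DictReader skips the blank line and fills the missing 'd' with restval (None),
# so only the fields that are actually present are converted to float.
rows = [{k: float(v) for k, v in row.items() if v is not None} for row in reader]
print(rows)
```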
EDIT:
to get a list of lists:
In [143]: with open('a.csv') as f:
...:     cr = csv.reader(f)
...:     skip = next(cr)  # skip the header row "a,b,c,d"
...:     print([list(map(float, l)) for l in cr])
...:
[[12.0, 3.5, 4.3, 5.9], [13.0, 5.7, 2.8, 5.2], [14.0, 6.4, 9.7, 2.3], [15.0, 6.8, 4.7, 3.4]]
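Both answers assume a comma-separated file without row numbers. The sample at the top of the question looks whitespace-separated with a leading row number, and rows may have different lengths. A minimal sketch under those assumptions; the function name read_float_rows is hypothetical, and the third sample row is deliberately shortened to mimic an uneven row:

```python
import io

# Assumed layout: a header line, then a row number followed by the values.
sample = """a b c d
1 12.0 3.5 4.3 5.9
2 13.0 5.7 2.8 5.2
3 14.0 6.4 9.7
"""

def read_float_rows(f):
    """Return a list of float lists, skipping the header and the row numbers."""
    lines = iter(f)
    next(lines)  # skip the header line "a b c d"
    rows = []
    for line in lines:
        fields = line.split()  # split on any run of whitespace
        if not fields:  # ignore blank lines
            continue
        rows.append([float(x) for x in fields[1:]])  # drop the row number
    return rows

rows = read_float_rows(io.StringIO(sample))
print(rows)  # [[12.0, 3.5, 4.3, 5.9], [13.0, 5.7, 2.8, 5.2], [14.0, 6.4, 9.7]]
```

For a real file, pass the open file object instead, e.g. with open('test.csv') as f: rows = read_float_rows(f). Because each row is converted independently, uneven row lengths need no special handling.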

Related

How to make multiindex dataframe from a nested dictionary keys and lists of values?

I have checked the advice here: Nested dictionary to multiindex dataframe where dictionary keys are column labels
However, I couldn't get it to work in my problem.
I would like to turn the dictionary into a DataFrame with a MultiIndex, where 'a', 'b' and 'c' are the names of the index levels, their values (12.0, 0.8, 1.8, bla1, bla2, bla3, bla4) are the index labels, and the values from the lists are assigned to those index entries, as in the picture of the table below.
My dictionary:
dictionary ={
"{'a': 12.0, 'b': 0.8, 'c': ' bla1'}": [200, 0.0, '0.0'],
"{'a': 12.0, 'b': 0.8, 'c': ' bla2'}": [37, 44, '0.6'],
"{'a': 12.0, 'b': 1.8, 'c': ' bla3'}": [100, 2.0, '1.0'],
"{'a': 12.0, 'b': 1.8, 'c': ' bla4'}": [400, 3.0, '1.0']
}
The result DataFrame I would like to get:
The code I tried does not create a MultiIndex; instead it puts every value on its own row:
import pandas as pd
df_a = pd.DataFrame.from_dict(dictionary, orient="index").stack().to_frame()
df_b = pd.DataFrame(df_a[0].values.tolist(), index=df_a.index)
Use ast.literal_eval to convert each string into a dictionary and build the index from there:
import pandas as pd
from ast import literal_eval
dictionary ={
"{'a': 12.0, 'b': 0.8, 'c': ' bla1'}": [200, 0.0, '0.0'],
"{'a': 12.0, 'b': 0.8, 'c': ' bla2'}": [37, 44, '0.6'],
"{'a': 12.0, 'b': 1.8, 'c': ' bla3'}": [100, 2.0, '1.0'],
"{'a': 12.0, 'b': 1.8, 'c': ' bla4'}": [400, 3.0, '1.0']
}
keys, data = zip(*dictionary.items())
index = pd.MultiIndex.from_frame(pd.DataFrame([literal_eval(i) for i in keys]))
res = pd.DataFrame(data=list(data), index=index)
print(res)
Output
                  0     1    2
a    b   c
12.0 0.8  bla1  200   0.0  0.0
          bla2   37  44.0  0.6
     1.8  bla3  100   2.0  1.0
          bla4  400   3.0  1.0
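An equivalent route builds the index from tuples with pd.MultiIndex.from_tuples and names the levels explicitly. A minimal sketch; stripping the leading space from the 'c' values and converting the string column to float are assumptions about what the asker wants, not part of the answer above:

```python
from ast import literal_eval

import pandas as pd

dictionary = {
    "{'a': 12.0, 'b': 0.8, 'c': ' bla1'}": [200, 0.0, '0.0'],
    "{'a': 12.0, 'b': 0.8, 'c': ' bla2'}": [37, 44, '0.6'],
    "{'a': 12.0, 'b': 1.8, 'c': ' bla3'}": [100, 2.0, '1.0'],
    "{'a': 12.0, 'b': 1.8, 'c': ' bla4'}": [400, 3.0, '1.0'],
}

# Parse each string key into a real dict and turn it into an (a, b, c) tuple.
# Stripping the leading space from 'c' is an assumption about the wanted labels.
tuples = []
for key in dictionary:
    parsed = literal_eval(key)
    tuples.append((parsed['a'], parsed['b'], parsed['c'].strip()))

index = pd.MultiIndex.from_tuples(tuples, names=['a', 'b', 'c'])
# Dict keys and values iterate in the same order, so rows line up with the index.
res = pd.DataFrame(list(dictionary.values()), index=index)
# The third list element is a string such as '0.6'; convert that column to float.
res[2] = res[2].astype(float)
print(res)
```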

Add nested dictionaries on matching keys

I have a nested dictionary, such as:
{'A1': {'T1': [1, 3.0, 3, 4.0], 'T2': [2, 2.0]}, 'A2': {'T1': [1, 0.0, 3, 5.0], 'T2': [2, 3.0]}}
What I want is to sum the matching entries across the sub-dictionaries (A1's T1 with A2's T1, A1's T2 with A2's T2), ignoring the first entry of each list:
3.0 + 0.0 = 3.0, 2.0 + 3.0 = 5.0 and 5.0 + 4.0 = 9.0
Desired output: [3.0, 5.0, 9.0]
How can I do this? I've tried a for loop, but it turned into a big mess.
One way is to use collections.Counter in a generator expression, and sum the resulting Counter objects:
from collections import Counter
d = {'A1': {'T1': 3.0, 'T2': 2.0}, 'A2': {'T1': 0.0, 'T2': 3.0}}
l = (Counter(i) for i in d.values())
sum(l, Counter())
# Counter({'T1': 3.0, 'T2': 5.0})
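sum() starts from 0 by default, and 0 + Counter raises a TypeError, so I pass an empty Counter() as the start argument; sum then only adds Counter objects together.
To get only the values, rebuild the generator (l above is already exhausted after one sum) and call .values():
sum((Counter(i) for i in d.values()), Counter()).values()
# dict_values([3.0, 5.0])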
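One caveat with the Counter approach: Counter addition keeps only positive totals, so a key whose values sum to zero or less disappears from the result. A minimal sketch with made-up values (T1 is chosen to cancel out), followed by a defaultdict(float) alternative that keeps every key:

```python
from collections import Counter, defaultdict

# Hypothetical values: T1 sums to 0.0 across the two sub-dictionaries.
d = {'A1': {'T1': 3.0, 'T2': 2.0}, 'A2': {'T1': -3.0, 'T2': 3.0}}

# Counter addition drops non-positive results, so T1 vanishes.
counter_total = sum((Counter(sub) for sub in d.values()), Counter())
print(counter_total)  # Counter({'T2': 5.0})

# A defaultdict(float) accumulates every key, including zero and negative totals.
totals = defaultdict(float)
for sub in d.values():
    for key, value in sub.items():
        totals[key] += value
print(dict(totals))  # {'T1': 0.0, 'T2': 5.0}
```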
You could use a list comprehension with zip:
d = {'A1': {'T1': 3.0, 'T2': 2.0}, 'A2': {'T1': 0.0, 'T2': 3.0}}
[sum(e) for e in zip(*(e.values() for e in d.values()))]
output:
[3.0, 5.0]
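This relies on dictionaries preserving insertion order (guaranteed from Python 3.7, a CPython implementation detail in 3.6) and on every sub-dictionary listing its keys in the same order.
Alternatively, you can use two for loops: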
r = {}
for dv in d.values():
    for k, v in dv.items():
        r.setdefault(k, []).append(v)
result = [sum(v) for v in r.values()]
print(result)
output:
[3.0, 5.0]
After your edit, you could use:
from itertools import zip_longest
d = {'A1': {'T1': [1, 3.0, 3, 4.0], 'T2': [2, 2.0]}, 'A2': {'T1': [1, 0.0, 3, 5.0], 'T2': [2, 3.0]}}
sum_t1, sum_t2 = list(list(map(sum, zip(*t))) for t in zip(*[e.values() for e in d.values()]))
[i for t in zip_longest(sum_t1[1:], sum_t2[1:]) for i in t if i is not None]
output:
[3.0, 5.0, 6, 9.0]
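The zip_longest answer returns [3.0, 5.0, 6, 9.0], which still contains 6 (the sum of the two 3 entries), while the asker expects [3.0, 5.0, 9.0]. If each list is a flat sequence of (identifier, value) pairs, which is only an assumption read off the example, summing just the odd positions gives the expected result. The helper name summed_values is hypothetical:

```python
from itertools import zip_longest

d = {'A1': {'T1': [1, 3.0, 3, 4.0], 'T2': [2, 2.0]},
     'A2': {'T1': [1, 0.0, 3, 5.0], 'T2': [2, 3.0]}}

# Assumption: positions 0, 2, ... hold identifiers and 1, 3, ... hold values.
def summed_values(sub_dicts, key):
    """Sum the value slots of `key` element-wise across all sub-dicts."""
    value_lists = [sub[key][1::2] for sub in sub_dicts]
    return [sum(vals) for vals in zip(*value_lists)]

subs = list(d.values())
t1 = summed_values(subs, 'T1')  # [3.0, 9.0]
t2 = summed_values(subs, 'T2')  # [5.0]

# Interleave the T1 and T2 sums, skipping the None padding from zip_longest.
result = [x for pair in zip_longest(t1, t2) for x in pair if x is not None]
print(result)  # [3.0, 5.0, 9.0]
```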

Pandas Data frame to desired python dictionary

I have a data frame which looks like the following:
      Date  Top
              A    B
2018-09-30  1.2  2.3
2018-10-01  1.5  1.7
2018-10-02  2.3  2.8
2018-10-03  7.7  7.5
2018-10-04  1.1  0.9
2018-10-05  2.1  6.5
So the columns are a MultiIndex: the first level has only two columns, 'Date' and 'Top', and 'Top' has two second-level columns, 'A' and 'B'.
I am trying to convert this into a Python dictionary.
When I use
df_dict = df.to_dict(orient = 'index')
I get this output:
{0: {('Top', 'A'): 1.2, ('Top', 'B'): 2.3, ('date', ''): '2018-09-30'},
1: {('Top', 'A'): 1.5, ('Top', 'B'): 1.7, ('date', ''): '2018-10-01'},
2: {('Top', 'A'): 2.3, ('Top', 'B'): 2.8, ('date', ''): '2018-10-02'},
3: {('Top', 'A'): 7.7, ('Top', 'B'): 7.5, ('date', ''): '2018-10-03'},
4: {('Top', 'A'): 1.1, ('Top', 'B'): 0.9, ('date', ''): '2018-10-04'},
5: {('Top', 'A'): 2.1, ('Top', 'B'): 6.5, ('date', ''): '2018-10-05'}}
Now I can access df_dict with the following code, which gives me 1.2:
df_dict[0][('Top', 'A')]
But I would like this to work:
df_dict[0]['Top']
Output: {'A': 1.2, 'B': 2.3}
It fails because 'Top' on its own is not a key of the inner dictionary; the keys are tuples such as ('Top', 'A'). I want this so that I can easily access all 'Top' values for a date.
Thanks for all the help
You can use a dict comprehension that keeps only the keys whose first level is Top:
df_dict = df.to_dict(orient = 'index')
out = {k2: v for (k1, k2), v in df_dict[0].items() if k1 == 'Top'}
print (out)
{'A': 1.2, 'B': 2.3}
Simpler is to let pandas select by index value and by the first level of the MultiIndex, then create the dict:
print (df.loc[0, 'Top'])
A 1.2
B 2.3
Name: 0, dtype: object
out = df.loc[0, 'Top'].to_dict()
print (out)
{'A': 1.2, 'B': 2.3}
EDIT:
print (df)
A B
2018-09-30 1.2 2.3
2018-10-01 1.5 1.7
2018-10-02 2.3 2.8
2018-10-03 7.7 7.5
2018-10-04 1.1 0.9
2018-10-05 2.1 6.5
df.index.name = 'date'
df = df.reset_index()
# build a two-level column MultiIndex so that no key is an empty string
df.columns = [['d','Top', 'Top'], df.columns]
# build one dictionary per first-level label ('Top' and 'd')
# and nest each one under that label in the outer dict
out = {x:df[x].to_dict(orient = 'index') for x in df.columns.levels[0]}
print (out)
{'Top': {0: {'A': 1.2, 'B': 2.3}, 1: {'A': 1.5, 'B': 1.7}, 2: {'A': 2.3, 'B': 2.8},
3: {'A': 7.7, 'B': 7.5}, 4: {'A': 1.1, 'B': 0.9}, 5: {'A': 2.1, 'B': 6.5}},
'd': {0: {'date': '2018-09-30'}, 1: {'date': '2018-10-01'},
2: {'date': '2018-10-02'}, 3: {'date': '2018-10-03'},
4: {'date': '2018-10-04'}, 5: {'date': '2018-10-05'}}}
print (out['Top'][0])
{'A': 1.2, 'B': 2.3}
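Since the goal is to look up the 'Top' values for a date, another option is to key the dictionary by the date itself instead of the row number. A minimal sketch on an assumed reconstruction of the question's frame (a ('Date', '') column plus a 'Top' group with sub-columns 'A' and 'B', only two rows shown):

```python
import pandas as pd

# Assumed reconstruction of the question's frame with MultiIndex columns.
columns = pd.MultiIndex.from_tuples([('Date', ''), ('Top', 'A'), ('Top', 'B')])
df = pd.DataFrame(
    [['2018-09-30', 1.2, 2.3],
     ['2018-10-01', 1.5, 1.7]],
    columns=columns,
)

# df['Top'] drops the first column level, leaving plain 'A' and 'B' columns;
# re-index those rows by the date strings, then build {date: {'A': ..., 'B': ...}}.
top_by_date = df['Top'].set_index(df[('Date', '')]).to_dict(orient='index')
print(top_by_date['2018-09-30'])
```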

pandas - create key value pair from grouped by data frame

I have a data frame with three columns. I would like to create a dictionary after applying groupby on the first and second columns. I can do this with for loops, but is there a pandas way of doing it?
DataFrame:
Col X Col Y Sum
A a 3
A b 2
A c 1
B p 5
B q 6
B r 7
After grouping by Col X and Col Y with df.groupby(['Col X','Col Y']).sum():
Sum
Col X Col Y
A a 3
b 2
c 1
B p 5
q 6
r 7
The dictionary I want to create:
{'A': {'a': 3, 'b': 2, 'c': 1}, 'B': {'p': 5, 'q': 6, 'r': 7}}
Use a dictionary comprehension while iterating over a groupby object:
{name: dict(zip(g['Col Y'], g['Sum'])) for name, g in df.groupby('Col X')}
{'A': {'a': 3, 'b': 2, 'c': 1}, 'B': {'p': 5, 'q': 6, 'r': 7}}
If you insisted on using to_dict somewhere, you could do something like this:
s = df.set_index(['Col X', 'Col Y']).Sum
{k: s.xs(k).to_dict() for k in s.index.levels[0]}
{'A': {'a': 3, 'b': 2, 'c': 1}, 'B': {'p': 5, 'q': 6, 'r': 7}}
Keep in mind that the to_dict method just uses comprehensions under the hood. If your use case needs something the orient options don't provide, there is no shame in writing your own comprehension.
You can also group the MultiIndexed Series by its first level:
>>> s = df.set_index(['Col X', 'Col Y'])['Sum']
>>> {k: v.reset_index(level=0, drop=True).to_dict() for k, v in s.groupby(level=0)}
{'A': {'a': 3, 'b': 2, 'c': 1}, 'B': {'p': 5, 'q': 6, 'r': 7}}
#A to_dict() solution
d = df.groupby(['Col X','Col Y']).sum().reset_index().pivot(columns='Col X',values='Sum').to_dict()
Out[70]:
{'A': {0: 3.0, 1: 2.0, 2: 1.0, 3: nan, 4: nan, 5: nan},
'B': {0: nan, 1: nan, 2: nan, 3: 5.0, 4: 6.0, 5: 7.0}}
#if you need to get rid of the nans:
{k1:{k2:v2 for k2,v2 in v1.items() if pd.notnull(v2)} for k1,v1 in d.items()}
Out[73]: {'A': {0: 3.0, 1: 2.0, 2: 1.0}, 'B': {3: 5.0, 4: 6.0, 5: 7.0}}
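The pivot above has no index argument, so the inner keys are row numbers rather than the Col Y labels. Passing index='Col Y' to pivot keeps the labels. A minimal sketch that rebuilds the question's sample frame; the values become floats because pivot fills the missing combinations with NaN:

```python
import pandas as pd

df = pd.DataFrame({
    'Col X': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Col Y': ['a', 'b', 'c', 'p', 'q', 'r'],
    'Sum': [3, 2, 1, 5, 6, 7],
})

# Same grouping as the question, then pivot with Col Y as the row index
# so the inner dictionary keys are the Col Y labels.
wide = (df.groupby(['Col X', 'Col Y'])['Sum'].sum()
          .reset_index()
          .pivot(index='Col Y', columns='Col X', values='Sum'))
d = wide.to_dict()

# Drop the NaN cells that pivot creates for missing (Col X, Col Y) pairs.
result = {outer: {inner: v for inner, v in inner_map.items() if pd.notna(v)}
          for outer, inner_map in d.items()}
print(result)
```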

Converting dataframe into sub-list or dictionaries

I have the data in tabular format (rows and columns), which I read into a dataframe (Data1):
Name D Score
0 Angelica D1 3.5
1 Angelica D2 2.0
2 Bill D1 2.0
3 Chan D3 1.0
......
I am able to convert it into a list using:
Data2 = Data1.values.tolist()
and get the below output:
[
['Angelica', 'D1', 3.5], ['Angelica', 'D2', 2.0],
['Bill', 'D1', 2.0], ['Bill', 'D2', 3.5],
['Chan', 'D8', 1.0], ['Chan', 'D3', 3.0], ['Chan', 'D4', 5.0],
['Dan', 'D4', 3.0], ['Dan', 'D5', 4.5], ['Dan', 'D6', 4.0]
]
What I want is output like this:
{
'Angelica': {'D1': 3.5, 'D2': 2.0},
'Bill': {'D1': 2.0, 'D2': 3.5},
'Chan': {'D8': 1.0, 'D3': 3.0, 'D4': 5.0},
'Dan': {'D4': 3.0, 'D5': 4.5, 'D6': 4.0}
}
How can I achieve this in Python?
You can use a dictionary comprehension after grouping the df by the Name column:
>>> df = pd.DataFrame([{'Name': 'Angela', 'Score': 3.5, 'D': 'D1'}, {'Name': 'Angela', 'Score': 2.0, 'D': 'D2'}, {'Name': 'Bill', 'Score': 2.0, 'D': 'D1'}, {'Name': 'Chan', 'Score': 1.0, 'D': 'D3'}])
>>> df
D Name Score
0 D1 Angela 3.5
1 D2 Angela 2.0
2 D1 Bill 2.0
3 D3 Chan 1.0
>>> data2 = {name: {df.loc[v].D: df.loc[v].Score for v in val} for name, val in df.groupby('Name').groups.items()}
>>> data2
{'Chan': {'D3': 1.0}, 'Angela': {'D1': 3.5, 'D2': 2.0}, 'Bill': {'D1': 2.0}}
You can zip up the values from each group after grouping by Name:
In [4]: l = [
...: ['Angelica', 'D1', 3.5], ['Angelica', 'D2', 2.0],
...: ['Bill', 'D1', 2.0], ['Bill', 'D2', 3.5],
...: ['Chan', 'D8', 1.0], ['Chan', 'D3', 3.0], ['Chan', 'D4', 5.0],
...: ['Dan', 'D4', 3.0], ['Dan', 'D5', 4.5], ['Dan', 'D6', 4.0]
...: ]
...: columns=["Name", "D", "Score"]
...: df = pd.DataFrame(l, columns=columns)
...:
In [5]: data = {name: dict(zip(v["D"], v["Score"])) for name, v in df.groupby("Name")}
In [6]: data
Out[6]:
{'Angelica': {'D1': 3.5, 'D2': 2.0},
'Bill': {'D1': 2.0, 'D2': 3.5},
'Chan': {'D3': 3.0, 'D4': 5.0, 'D8': 1.0},
'Dan': {'D4': 3.0, 'D5': 4.5, 'D6': 4.0}}
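# plain-Python alternative that works directly on the Data2 list
from collections import defaultdict
result = defaultdict(dict)
for item in Data2:
    # item is [name, d, score]; add {d: score} to that name's dict
    result[item[0]].update(dict([item[1:]]))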
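Staying in pandas, each Name group can also be turned into a dict by making 'D' the index and calling to_dict on the Score column. A minimal sketch that rebuilds Data1 from the list shown in the question:

```python
import pandas as pd

rows = [
    ['Angelica', 'D1', 3.5], ['Angelica', 'D2', 2.0],
    ['Bill', 'D1', 2.0], ['Bill', 'D2', 3.5],
    ['Chan', 'D8', 1.0], ['Chan', 'D3', 3.0], ['Chan', 'D4', 5.0],
    ['Dan', 'D4', 3.0], ['Dan', 'D5', 4.5], ['Dan', 'D6', 4.0],
]
Data1 = pd.DataFrame(rows, columns=['Name', 'D', 'Score'])

# For each Name group, index the rows by 'D' and map each D to its Score.
result = {name: group.set_index('D')['Score'].to_dict()
          for name, group in Data1.groupby('Name')}
print(result)
```

If a Name had the same D twice, to_dict would keep the last Score for that D, which matches the behaviour of the zip and defaultdict answers.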
