Store list of dictionaries as a DataFrame - python

Suppose I have a list of dictionaries:
l = [{'car':'good'},
{'mileage':'high'},
{'interior':'stylish'},
{'car':'bad'},
{'engine':'powerful'},
{'safety':'low'}]
Basically, these are noun-adjective pairs.
How can I see which adjectives are most associated with a given noun, say car?
How do I convert this to a DataFrame? I have tried pd.DataFrame(l), but the keys are not the column names I want, so it gets a little tricky.
Any help would be appreciated.

If you want this done column-wise, you have to restructure your list of dictionaries so that one dictionary represents one row. Keep in mind that a dictionary can hold each key only once, so where 'car' is repeated below, Python keeps only the last value. Your example list should then look like this (I added a second row to make it clearer):
l = [
    {'car':'good','mileage':'high','interior':'stylish','car':'bad','engine':'powerful','safety':'low'},  # row 1
    {'car':'bad','mileage':'low','interior':'old','car':'bad','engine':'powerful','safety':'low'}  # row 2
]
At this point, all you have to do is call pd.DataFrame(l).
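To see why the original one-pair-per-dictionary list is awkward, here is a small sketch of what pd.DataFrame does with it. The names `pairs` and `wide` are introduced here for illustration and are not from the question: every distinct key becomes a column, and most cells are left empty (NaN).

```python
import pandas as pd

# The list from the question: one noun-adjective pair per dictionary.
pairs = [{'car': 'good'}, {'mileage': 'high'}, {'interior': 'stylish'},
         {'car': 'bad'}, {'engine': 'powerful'}, {'safety': 'low'}]

# Each dictionary becomes one row; each distinct key becomes one column.
wide = pd.DataFrame(pairs)

print(wide.shape)                 # rows x distinct nouns
print(wide['car'].notna().sum())  # how many rows actually hold a 'car' value
```

Only the two rows that came from a 'car' dictionary have a value in the car column, which is why a two-column (noun, adjective) layout is usually easier to work with here.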
EDIT: Based on your comments, I think you need to convert the dictionary to a list to get your desired result. Here is a quick way (I'm sure it can be much more efficient):
l = [{'car':'good'},
{'mileage':'high'},
{'interior':'stylish'},
{'car':'bad'},
{'engine':'powerful'},
{'safety':'low'}]
new_list = []
for item in l:
    for key, value in item.items():
        temp = [key, value]  # one [noun, adjective] row per pair
        new_list.append(temp)
df = pd.DataFrame(new_list, columns=['Noun', 'Adjective'])

You can construct your DataFrame from a list of tuples. To get tuples from a dict, use the items() method, and build the list with a list comprehension that takes the first (key, value) pair of each dictionary. In Python 3, items() returns a view that cannot be indexed, so wrap it in list() first.
import pandas as pd
df = pd.DataFrame(data=[list(d.items())[0] for d in l], columns=['A', 'B'])
print(df)
Gives:
          A         B
0       car      good
1   mileage      high
2  interior   stylish
3       car       bad
4    engine  powerful
5    safety       low
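For the original goal of seeing which adjectives go with a given noun, one possible follow-up is to group the two-column frame by noun. This is only a sketch in Python 3; it reuses the column names Noun and Adjective from the earlier answer, and `by_noun` is a name introduced here.

```python
import pandas as pd

l = [{'car': 'good'}, {'mileage': 'high'}, {'interior': 'stylish'},
     {'car': 'bad'}, {'engine': 'powerful'}, {'safety': 'low'}]

# Flatten every (key, value) pair into one row of a two-column frame.
df = pd.DataFrame([(noun, adj) for d in l for noun, adj in d.items()],
                  columns=['Noun', 'Adjective'])

# Collect the adjectives seen for each noun.
by_noun = df.groupby('Noun')['Adjective'].apply(list)
print(by_noun['car'])  # ['good', 'bad']
```

From there, df['Noun'].value_counts() or by_noun.map(len) would show which nouns have the most adjectives attached.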

Related

match the column value based on previous paired values in python

I have a dictionary of key-value pairs and a list:
dict={0.000806:1.341382,0.023886:39.63012,7.525935:63.89669,7.571048:62.47208}
list=[7.525935,7.571048,0.000806,0.023886]
Given this list and dictionary, my expected output is:
{7.525935:63.89669,
7.571048:62.47208,
0.000806:1.341382,
0.023886:39.63012}
thing_category = dict((t, c) for c, t in category_thing.items())
list = [7.525935, 7.571048, 0.000806, 0.023886]
for stuff in list_of_things:
    if stuff in category_thing:
        print(stuff)
How can I use the list values to look up the matching dictionary values? I tried merge and map as well, but neither worked.
Using these two columns, I have to match the values against another one; I expect the matched values to follow the pairs shown in the first figure.
You can use a comprehension to construct a new dict with keys in the order of the list_ entries:
dict_ = {0.000806:1.341382,0.023886:39.63012,7.525935:63.89669,7.571048:62.47208}
list_ = [7.525935,7.571048,0.000806,0.023886]
{k: dict_[k] for k in list_}
However, you might run into problems with float keys if there are slight numeric differences.
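If the float keys do differ slightly, one possible workaround is to match keys within a tolerance instead of exactly. The sketch below is an assumption-laden illustration: the helper `lookup_close`, the tolerance of 1e-6, and the slightly perturbed first list entry are all made up for this example.

```python
import math

dict_ = {0.000806: 1.341382, 0.023886: 39.63012,
         7.525935: 63.89669, 7.571048: 62.47208}
# The first entry differs slightly from the stored key 7.525935.
list_ = [7.5259350001, 7.571048, 0.000806, 0.023886]

def lookup_close(mapping, key, rel_tol=1e-6):
    """Return the value whose key is within rel_tol of `key`, else None."""
    for k, v in mapping.items():
        if math.isclose(k, key, rel_tol=rel_tol):
            return v
    return None

# Same comprehension as above, but with the tolerant lookup.
ordered = {k: lookup_close(dict_, k) for k in list_}
print(ordered[7.5259350001])  # 63.89669
```

This scans the whole dictionary for every lookup, so it only suits small inputs; rounding both sides to a fixed number of decimals before building the dict is another option.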

How to drop element from a list inside a pandas column in Python?

I have a column in a dataframe whose cells contain lists. My dataframe column is:
[],
['NORM'],
['NORM'],
['NORM'],
['NORM'],
['MI', 'STTC'],
As you can see, there is an empty list and also a list with two elements. How can I change the lists with two elements so they keep just one of them (I don't care which one)?
I tried df.column.explode(), but this just adds more rows, and I don't want more rows; I just need to keep one element.
Thank you so much.
You can use Series.map with a custom mapping function which maps the elements of column according to desired requirements:
df['col'] = df['col'].map(lambda l: l[:1])
Result:
# print(df['col'])
0        []
1    [NORM]
2    [NORM]
3    [NORM]
4    [NORM]
5      [MI]
If i, j is the location of the cell you need to access, this will give the first element of its list:
list_ = df.loc[i, j]
if len(list_) > 0:
    print(list_[0])
Since you store lists in a pandas column, I assume you are not concerned about vectorization. So you could just use a list comprehension:
df[col] = [i[:1] for i in df[col]]
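A quick way to check that the map and list-comprehension versions agree is to run both on a small frame. The sample data below is a shortened stand-in for the column in the question, and the names `via_map`, `via_comprehension`, and `first_or_none` are introduced here.

```python
import pandas as pd

df = pd.DataFrame({'col': [[], ['NORM'], ['NORM'], ['MI', 'STTC']]})

# Slicing with [:1] keeps at most one element and is safe on empty lists.
via_map = df['col'].map(lambda l: l[:1])
via_comprehension = [i[:1] for i in df['col']]

print(via_map.tolist())  # [[], ['NORM'], ['NORM'], ['MI']]

# Variant: return the bare element instead of a one-element list.
first_or_none = df['col'].map(lambda l: l[0] if l else None)
print(first_or_none[3])  # MI
```

The slice keeps the cells as lists, which matches the output shown earlier; the last variant is only useful if a plain value per row is acceptable.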

column of list values to one flat list in Python

I have a pandas column where each row holds multiple string values. I want to combine them into one flat list so that I can count each value.
df.columnX
Row 1 ['A','B','A','C']
Row 2 ['A','C']
Row 3 ['D','A']
I want output like
Tag Count
A 4
B 1
C 2
D 1
I am trying to pull them into a list, but I get everything wrapped in double quotes as a single string:
df.columnX.values = ["'A','B',,,,,,,,,'A'"]
Thanks in advance
What about this?
df.explode('columnX').columnX.value_counts().to_frame()
Note that you need pandas >= 0.25.0 for explode to work.
If your lists are in fact strings, you can first convert them to lists (as suggested by @Jon Clements):
import ast
df.columnX = df.columnX.map(ast.literal_eval)
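Putting the two steps together, a minimal end-to-end sketch might look like this. It assumes the lists are stored as strings, as the question's last snippet suggests; the sample frame is rebuilt from the question's three rows.

```python
import ast
import pandas as pd

# Assumed sample: each cell is the string form of a list.
df = pd.DataFrame({'columnX': ["['A','B','A','C']", "['A','C']", "['D','A']"]})

# Parse each string into a real list, then give every element its own row.
df['columnX'] = df['columnX'].map(ast.literal_eval)
counts = df['columnX'].explode().value_counts()

print(counts.to_dict())
```

ast.literal_eval only evaluates Python literals, so it is a safer choice than eval for parsing data like this.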
I got it:
import ast
flatList = [item for sublist in list(df.columnX.map(ast.literal_eval)) for item in sublist]
dict((x, flatList.count(x)) for x in set(flatList))
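Calling flatList.count(x) once per distinct value rescans the whole list each time. As an alternative sketch using only the standard library, collections.Counter tallies everything in a single pass; `rows` below is a stand-in for the already-parsed column, and `tag_counts` is a name introduced here.

```python
from collections import Counter
from itertools import chain

# Stand-in for the parsed df.columnX values.
rows = [['A', 'B', 'A', 'C'], ['A', 'C'], ['D', 'A']]

# chain.from_iterable flattens the nested lists; Counter tallies in one pass.
tag_counts = Counter(chain.from_iterable(rows))
print(tag_counts.most_common(1))  # [('A', 4)]
```

With a real dataframe, the same call would take df.columnX in place of rows, assuming the cells are actual lists.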

pandas: how to convert a dataframe into a dictionary of tuples, using one column as the key and the rest as tuples of the form (col2:col3)

I am trying to figure out the best way of creating tuples of the format (x:y) from two columns in a dataframe, and then using the first column (key) of the dataframe as the dictionary key for those tuples:
     key  data_1  data_2
0  14303   24.75   25.03
1  12009   25.00   25.07
2  14303   24.99   25.15
3  12009   24.62   24.77
The resulting dictionary:
{14303 24.38:24.61 24.99:25.15
12009 24.62:24.77 25.00:25.07 }
I have tried to use iterrows and enumerate but was wondering if there is a more efficient way to achieve it
I think you wanted to append the (data_1, data_2) tuple as a value for the given key. This solution uses iterrows(), which I acknowledge you said you already use. If this is not what you are looking for, please post your code and the exact output you want. I don't know of a native pandas method that does this.
# df is the dataframe
from collections import defaultdict
sample_dict = defaultdict(list)
for _, row in df.iterrows():  # iterrows yields (index, row) pairs
    k = row.iloc[0]  # key
    d_tuple = (row.iloc[1], row.iloc[2])  # (data_1, data_2)
    sample_dict[k].append(d_tuple)
sample_dict is therefore:
defaultdict(list,
{12009.0: [(25.0, 25.07), (24.620000000000001, 24.77)],
14303.0: [(24.75, 25.030000000000001),
(24.989999999999998, 25.149999999999999)]})
sample_dict[12009] is therefore:
[(25.0, 25.07), (24.620000000000001, 24.77)]
Update:
You might take a look at this thread too:
https://stackoverflow.com/a/24368660/4938264
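If avoiding iterrows() is the goal, one possible alternative is to zip the three columns directly. This is a sketch, not a built-in pandas feature: the frame is rebuilt from the question's sample, and `result` is a name introduced here. Converting each column with tolist() keeps the keys as plain Python ints instead of the floats that iterrows() produces.

```python
from collections import defaultdict
import pandas as pd

df = pd.DataFrame({'key': [14303, 12009, 14303, 12009],
                   'data_1': [24.75, 25.00, 24.99, 24.62],
                   'data_2': [25.03, 25.07, 25.15, 24.77]})

result = defaultdict(list)
# zip walks the three columns in lockstep, one row at a time.
for k, a, b in zip(df['key'].tolist(), df['data_1'].tolist(), df['data_2'].tolist()):
    result[k].append((a, b))

print(result[12009])  # [(25.0, 25.07), (24.62, 24.77)]
```

This still loops in Python, but it skips building a Series object for every row, which is usually the slow part of iterrows().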

extract information from excel into python 2d array

I have an excel sheet with dates, time, and temp that look like this:
using python, I want to extract this info into python arrays.
The array would get the date in position 0, and then store the temps in the following positions and look like this:
temparray[0] = [20130102,34.75,34.66,34.6,34.6,....,34.86]
temparray[1] = [20130103,34.65,34.65,34.73,34.81,....,34.64]
here is my attempt, but it sucks:
from xlrd import open_workbook

wb = open_workbook('temp.xlsx')
print(wb)
for s in wb.sheets():
    for row in range(s.nrows):
        values = []
        for col in range(s.ncols):
            values.append(s.cell(row, col).value)
        print(values[0])
        print("%.2f" % values[1])
        print()
I used xlrd, but I am open to using anything. Thank you for your help.
From what I understand of your question, the problem is that you want the output to be a list of lists, and you're not getting such a thing.
And that's because there's nothing in your code that even tries to get such a thing. For each row, you build a list, print out the first value of that list, print out the second value of that list, and then forget the list.
To append each of those row lists to a big list of lists, all you have to do is exactly the same thing you're doing to append each column value to the row lists:
temparray = []
for row in range(s.nrows):
    values = []
    for col in range(s.ncols):
        values.append(s.cell(row, col).value)
    temparray.append(values)
From your comment, it looks like what you actually want is not only this, but also grouping the temperatures together by day, and also only adding the second column, rather than all of the values, for each day. Which is not at all what you described in the question. In that case, you shouldn't be looping over the columns at all. What you want is something like this:
days = []
current_day, current_date = [], None
for row in range(s.nrows):
    date = s.cell(row, 0).value  # compare cell values, not Cell objects
    if date != current_date:
        current_day, current_date = [], date
        days.append(current_day)
    current_day.append(s.cell(row, 2).value)
This code assumes that the dates are always in sorted order, as they are in your input screenshot.
I would probably structure this differently, building a row iterator to pass to itertools.groupby, but I wanted to keep this as novice-friendly, and as close to your original code, as possible.
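For reference, the itertools.groupby structure mentioned above could look roughly like this. The sketch uses hypothetical (date, time, temp) tuples in place of the spreadsheet cells, and it puts the date in position 0 of each inner list, as the question asked.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows as (date, time, temp); stand-ins for the spreadsheet cells.
rows = [
    (20130102, '00:00', 34.75),
    (20130102, '01:00', 34.66),
    (20130103, '00:00', 34.65),
    (20130103, '01:00', 34.65),
]

# groupby only merges adjacent rows, so the dates must already be sorted.
temparray = [[date] + [temp for _, _, temp in group]
             for date, group in groupby(rows, key=itemgetter(0))]

print(temparray)
# [[20130102, 34.75, 34.66], [20130103, 34.65, 34.65]]
```

With xlrd, the rows could come from a generator that yields the cell values of each sheet row; that part is left out here to keep the example self-contained.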
Also, I suspect you really don't want this:
[[date1, temp1a, temp1b, temp1c],
[date2, temp2a, temp2b]]
… but rather something like this:
{date1: [temp1a, temp1b, temp1c],
date2: [temp2a, temp2b]}
But without knowing what you're intending to do with this info, I can't tell you how best to store it.
If you are looking to keep all the data for the same dates, I might suggest using a dictionary to get a list of the temps for particular dates. Then once you get the dict initialized with your data, you can rearrange how you like. Try something like this after wb=open_workbook('temp.xlsx'):
tmpDict = {}
for s in wb.sheets():
    for row in range(s.nrows):
        date = s.cell(row, 0).value  # use the cell's value, not the Cell object, as the key
        try:
            tmpDict[date].append(s.cell(row, 2).value)
        except KeyError:
            tmpDict[date] = [s.cell(row, 2).value]
If you print tmpDict, you should get an output like:
{date1: [temp1, temp2, temp3, ...],
date2: [temp1, temp2, temp3, ...]
...}
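Before Python 3.7, dictionary keys come back in an arbitrary order (it has to do with the hash value of the key), and in later versions they follow insertion order rather than sorted order. Either way, you can construct a list of lists sorted by key from the content of the dict like so: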
tmpList = []
for key in sorted(tmpDict.keys()):
    valList = [key]
    valList.extend(tmpDict[key])
    tmpList.append(valList)
Then you'll get a list of lists ordered by date, each holding the date followed by its values, which is the structure you were originally working toward. However, you can always get to the values in the dictionary by using the keys. I typically find the dictionary easier to work with afterwards, but you can change it to any form you need.
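The same dictionary-then-sort idea can be tried without a spreadsheet. In this sketch, `rows` is made-up (date, temp) data standing in for the cell values, and collections.defaultdict replaces the try/except; the names `temps_by_date` and `tmp_list` are introduced here.

```python
from collections import defaultdict

# Made-up (date, temp) pairs, deliberately out of date order.
rows = [(20130103, 34.65), (20130102, 34.75), (20130102, 34.66)]

temps_by_date = defaultdict(list)
for date, temp in rows:
    temps_by_date[date].append(temp)  # a missing key starts as an empty list

# One inner list per date, sorted by date, with the date in position 0.
tmp_list = [[date] + temps_by_date[date] for date in sorted(temps_by_date)]
print(tmp_list)  # [[20130102, 34.75, 34.66], [20130103, 34.65]]
```

Unlike the adjacent-row grouping shown earlier, this version does not need the input to be sorted by date.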
