OK, so I have a dictionary whose values are lists, which is what I was looking for. I was wondering if there is a way to sort it while still writing the values of each list into separate columns. I have used this code to split the values into different columns:
import csv
writer = csv.writer(csvfile, delimiter=',')
for key, value in finaldict.items():
    writer.writerow([key] + value)
Is there any way to make this sort the keys before writing? Anything I try either doesn't sort it, raises an error, or breaks the part that splits the list into separate columns. To make it concrete, let's say I have this dictionary:
finaldict = {'A': [2, 1], 'C': [3, 3], 'B': [4, 3]}
I'm looking for this in the Excel file:
Parameter Val1 Val2
A 2 1
B 4 3
C 3 3
But currently I get this:
Parameter Val1 Val2
A 2 1
C 3 3
B 4 3
I'm grateful for the replies, guys. Thank you!
An alternative to the other answer (a bit cleaner, imho):
for key in sorted(finaldict.keys()):
    writer.writerow([key] + finaldict[key])
Use sorted to return a list of tuples containing the key and value, then write that instead:
for key, value in sorted(finaldict.items(), key=lambda item: item[0]):
    writer.writerow([key] + value)
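Both answers rely on the same idea: sort the keys (or the key/value pairs) before the loop, and keep `[key] + value` so the list is still spread across columns. A minimal self-contained sketch, assuming Python 3 and writing to an in-memory `io.StringIO` buffer instead of a real file (the header names are taken from the question's example):

```python
import csv
import io

finaldict = {'A': [2, 1], 'C': [3, 3], 'B': [4, 3]}

# io.StringIO stands in for the real csvfile; when writing to an actual file,
# the csv docs recommend opening it with newline=''.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=',', lineterminator='\n')
writer.writerow(['Parameter', 'Val1', 'Val2'])

# sorted() on a dict iterates over its keys in ascending order.
for key in sorted(finaldict):
    # Concatenating the key with the list keeps one value per column.
    writer.writerow([key] + finaldict[key])

csv_text = buffer.getvalue()
print(csv_text)
```

The rows come out in the order A, B, C, each followed by its two values in separate columns.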
Related
Suppose I have a list of dictionaries:
l = [{'car':'good'},
{'mileage':'high'},
{'interior':'stylish'},
{'car':'bad'},
{'engine':'powerful'},
{'safety':'low'}]
Basically these are noun-adjective pairs.
How can I visualize which adjectives are most associated with, let's say, car?
How do I convert this to a DataFrame? I have tried pd.DataFrame(l), but the keys here are not meant to be column names, so it gets a little tricky.
Any help would be appreciated.
Given that you want this done column-wise, you have to restructure your list of dictionaries so that one dictionary represents one row. Your example list would then become (I added a second row for clarity):
import pandas as pd
# A dict can hold each key only once, so each row lists 'car' a single time
l = [
    {'car':'good','mileage':'high','interior':'stylish','engine':'powerful','safety':'low'}, # row 1
    {'car':'bad','mileage':'low','interior':'old','engine':'powerful','safety':'low'} # row 2
]
At this point, all you have to do is call pd.DataFrame(l).
EDIT: Based on your comments, I think you need to convert the dictionaries into a list of [noun, adjective] pairs to get your desired result. Here is a quick way (I'm sure it can be made more efficient):
import pandas as pd
l = [{'car':'good'},
     {'mileage':'high'},
     {'interior':'stylish'},
     {'car':'bad'},
     {'engine':'powerful'},
     {'safety':'low'}]
new_list = []
# Turn every single-entry dict into a [noun, adjective] row
for item in l:
    for key, value in item.items():
        temp = [key, value]
        new_list.append(temp)
df = pd.DataFrame(new_list, columns=['Noun', 'Adjective'])
You can construct your DataFrame from a list of tuples. To get tuples from a dict, use the items() method. Build the list of tuples with a list comprehension that takes the first (and only) item of each dict.
import pandas as pd
df = pd.DataFrame(data=[next(iter(d.items())) for d in l], columns=['A', 'B'])
print(df)
Gives:
A B
0 car good
1 mileage high
2 interior stylish
3 car bad
4 engine powerful
5 safety low
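Once the data is in noun/adjective pairs, the question of which adjectives are most associated with a noun reduces to counting adjectives per noun. A plain-Python sketch using `collections.Counter` (a swap-in for a pandas group-and-count, not something either answer used), assuming the same list `l` as in the question:

```python
from collections import Counter, defaultdict

l = [{'car': 'good'},
     {'mileage': 'high'},
     {'interior': 'stylish'},
     {'car': 'bad'},
     {'engine': 'powerful'},
     {'safety': 'low'}]

# Flatten every single-entry dict into a (noun, adjective) pair.
pairs = [(noun, adj) for d in l for noun, adj in d.items()]

# Count how often each adjective appears with each noun.
adjectives_by_noun = defaultdict(Counter)
for noun, adj in pairs:
    adjectives_by_noun[noun][adj] += 1

# most_common() lists the adjectives for 'car', most frequent first.
print(adjectives_by_noun['car'].most_common())
```

These per-noun counts are the numbers a bar chart of adjectives for a given noun would plot.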
I have a pandas column whose cells hold lists of strings, and I want to flatten them into one list so that I can count each value:
df.columnX
Row 1 ['A','B','A','C']
Row 2 ['A','C']
Row 3 ['D','A']
I want output like:
Tag Count
A 4
B 1
C 2
D 1
I am trying to pull them into a list, but the values come out wrapped in double quotes:
df.columnX.values = ["'A','B',,,,,,,,,'A'"]
Thanks in advance
What about this?
df.explode('columnX').columnX.value_counts().to_frame()
Note that you need pandas >= 0.25.0 for explode to work.
If your lists are in fact strings, you can first convert them to lists (as suggested by @Jon Clements):
import ast
df.columnX = df.columnX.map(ast.literal_eval)
I got it
import ast
flatList = [item for sublist in df.columnX.map(ast.literal_eval) for item in sublist]
dict((x, flatList.count(x)) for x in set(flatList))
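The same count can be produced with `collections.Counter`, which avoids calling `list.count` once per distinct tag. A minimal plain-Python sketch, assuming the column holds string-encoded lists as in the question; `raw_column` below is a hypothetical stand-in for `df.columnX`:

```python
import ast
from collections import Counter

# Hypothetical stand-in for df.columnX: each cell is a string that looks like a list.
raw_column = ["['A','B','A','C']", "['A','C']", "['D','A']"]

# ast.literal_eval safely turns each string back into a real list.
parsed = [ast.literal_eval(cell) for cell in raw_column]

# Flatten the lists and count every tag in a single pass.
tag_counts = Counter(tag for tags in parsed for tag in tags)

for tag, count in sorted(tag_counts.items()):
    print(tag, count)
```

Counter makes one pass over the flattened values, whereas `flatList.count(x)` rescans the whole list for every distinct tag.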
I have a dictionary keyed by pairs of values (a 2-tuple as each key, mapping to one list), like so:
Initial Dict
Key : ('106338', '2006-12-27') , Value : []
Dict after populating
Key : ('106338', '2006-12-27') , Value : [8, 7, 9, 8, 7]
The value for each key pair is a list, and I need its length. I created this dictionary by first iterating over a DataFrame with itertuples() and creating a key pair with an empty list for each unique record. I then iterated over it again and populated the lists by appending the values I need under each key pair. The key pairs are generated from row values: the first item is the asset's identification number and the second is the asset's date. Here is the code for creating the dict:
import pandas as pd
perm_dict = {}
# First pass: one empty list per unique (ID, date) pair
for row in df_perm.itertuples():
    perm_dict[str(row[1]), str(row[3])] = []
# Second pass: keep row[10] only when row[9]'s date falls strictly between row[6]'s and row[5]'s
for row in df_perm.itertuples():
    if pd.Timestamp(row[6]).date() < pd.Timestamp(row[9]).date() < pd.Timestamp(row[5]).date():
        perm_dict[str(row[1]), str(row[3])].append(row[10])
My problem is that I now need to look those values up by key pair while iterating through the original DataFrame, so that I can turn the list lengths into a new column. Screenshot of DataFrame:
I am having trouble working out how to apply these counts back to the original DataFrame as a new column, only for the rows whose keys match. I can't iterate back through it to add them, because then I'd be modifying my original DataFrame while iterating over it, and I've read that's a big no-no. Any help you can provide would be greatly appreciated! Also, please let me know if I need to include more information.
Edit1
Here are the outputs after running the dictionary comprehension code provided.
This might be what you are looking for.
import pandas as pd
# sample data
d = {('106338', '2006-12-27'): [8, 7, 9, 8, 7]}
df = pd.DataFrame([['106338', '2006-12-27']], columns=['Key1', 'Key2'])
# first make dictionary mapping to length of list
d_len = {k: len(v) for k, v in d.items()}
# perform mapping
df['Len'] = list(map(d_len.get, (zip(*(df[col] for col in ('Key1', 'Key2'))))))
# output
# Key1 Key2 Len
# 106338 2006-12-27 5
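`d_len.get` returns `None` for rows whose key pair is not in the dictionary; passing a default keeps those rows at 0 instead. The dictionary itself can also be built in one pass with `defaultdict(list)`, so keys only appear once a value is appended. A plain-Python sketch, assuming hypothetical `(id, date, value)` records that have already passed the date filter (not the asker's real data):

```python
from collections import defaultdict

# Hypothetical (id, date, value) records standing in for the filtered rows.
records = [
    ('106338', '2006-12-27', 8),
    ('106338', '2006-12-27', 7),
    ('106338', '2006-12-27', 9),
    ('200001', '2007-01-05', 3),
]

# One pass: defaultdict(list) creates the empty list the first time a key appears.
perm_dict = defaultdict(list)
for asset_id, date, value in records:
    perm_dict[asset_id, date].append(value)

# Map each key pair to the length of its list.
d_len = {k: len(v) for k, v in perm_dict.items()}

# Key pairs to look up; the last one has no match, so get() falls back to 0.
rows = [('106338', '2006-12-27'), ('200001', '2007-01-05'), ('999999', '2008-02-01')]
lengths = [d_len.get(key, 0) for key in rows]
print(lengths)
```

With a DataFrame, the same comprehension over `zip(df['Key1'], df['Key2'])` produces a list that can be assigned as a new column in one step, so nothing is written to the DataFrame inside the loop.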
I am trying to figure out the best way of creating tuples of the form (x, y) from 2 columns in a DataFrame, and then using the key column of the DataFrame as the dictionary key for those tuples:
key data_1 data_2
0 14303 24.75 25.03
1 12009 25.00 25.07
2 14303 24.99 25.15
3 12009 24.62 24.77
The resulting dictionary should look like:
{14303: 24.38:24.61 24.99:25.15
 12009: 24.62:24.77 25.00:25.07}
I have tried to use iterrows and enumerate, but I was wondering if there is a more efficient way to achieve this.
I think you want to append the (data_1, data_2) tuple as a value for the given key. This solution uses iterrows(), which I acknowledge you said you already use. If this is not what you are looking for, please post your code and exactly the output you want. I don't know if there is a native method in pandas to do this.
# df is the dataframe
from collections import defaultdict
sample_dict = defaultdict(list)
for _, row in df.iterrows():
    k = row['key']  # key
    d_tuple = (row['data_1'], row['data_2'])  # (data_1, data_2)
    sample_dict[k].append(d_tuple)
sample_dict is therefore:
defaultdict(list,
{12009.0: [(25.0, 25.07), (24.620000000000001, 24.77)],
14303.0: [(24.75, 25.030000000000001),
(24.989999999999998, 25.149999999999999)]})
sample_dict[12009] is therefore:
[(25.0, 25.07), (24.620000000000001, 24.77)]
Update:
You might take a look at this thread too:
https://stackoverflow.com/a/24368660/4938264
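One way to skip `iterrows()` entirely is to zip the three columns together. This also avoids the float keys visible in the output above (`12009.0` instead of `12009`): `iterrows()` packs each row into a single Series with one common dtype, so the integer key is upcast to float. A minimal sketch with plain lists standing in for `df['key']`, `df['data_1']` and `df['data_2']` (with a real DataFrame you would pass those columns to `zip` directly):

```python
from collections import defaultdict

# Plain lists standing in for df['key'], df['data_1'] and df['data_2'].
keys = [14303, 12009, 14303, 12009]
data_1 = [24.75, 25.00, 24.99, 24.62]
data_2 = [25.03, 25.07, 25.15, 24.77]

grouped = defaultdict(list)
# zip walks the three columns in step, one (key, x, y) triple per row.
for key, x, y in zip(keys, data_1, data_2):
    grouped[key].append((x, y))

print(dict(grouped))
```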
I am building a dictionary with a dictionary comprehension from an Excel spreadsheet. The first column of the sheet holds the keys and the next 3 columns hold the values. I'd like to build a dictionary that I can use later in my script. I understand a dictionary comprehension to be built as:
d = {key: value for (key, value) in sequence}
and I can do this and get a nice key,value dictionary:
d = {str(row.getValue("Column1")): str(row.getValue("Column2")) for row in arcpy.SearchCursor(xls,"[Column1] = 'Lake_Huron'")}
I'm just not sure how to add the other 2 columns to the dictionary comprehension as the 2nd and 3rd values for each key. Is this possible?
d = {str(row.getValue("Column1")): (str(row.getValue("Column2")), str(row.getValue("Column3")), str(row.getValue("Column4"))) for row in arcpy.SearchCursor(xls,"[Column1] = 'Lake_Huron'")}
OR
d = {str(row.getValue("Column1")): tuple(str(row.getValue("Column{0}".format(i))) for i in [2, 3, 4]) for row in arcpy.SearchCursor(xls,"[Column1] = 'Lake_Huron'")}
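Both versions follow the pattern `{key_expr: tuple_of_values for row in rows}`. Since `arcpy` is only available inside ArcGIS, here is a sketch that swaps the cursor for a hypothetical list of dicts with the same column names (placeholder values, not real data), so the comprehension can be run anywhere:

```python
# Hypothetical rows standing in for the arcpy cursor results.
rows = [
    {'Column1': 'Lake_Huron', 'Column2': 'x1', 'Column3': 'y1', 'Column4': 'z1'},
    {'Column1': 'Lake_Erie', 'Column2': 'x2', 'Column3': 'y2', 'Column4': 'z2'},
]

# The if-clause plays the role of the cursor's where clause.
d = {
    str(row['Column1']): tuple(str(row['Column{0}'.format(i)]) for i in (2, 3, 4))
    for row in rows
    if row['Column1'] == 'Lake_Huron'
}
print(d)
```

Wrapping the inner generator in `tuple(...)` matters: without it, each value would be a one-shot generator object rather than the three column values. The inner loop variable (`i`) also has to differ from the outer one (`row`), otherwise the comprehension refers to the wrong object.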