Create a dictionary from DataFrame? - python

I want to create a dictionary from a dataframe in Python.
In this dataframe, one column contains all the keys and another column contains the multiple values for each key.
DATAKEY DATAKEYVALUE
name mayank,deepak,naveen,rajni
empid 1,2,3,4
city delhi,mumbai,pune,noida
I tried this code to first convert it into a simple dataframe, but the values are not separated row-wise:
columnnames=finaldata['DATAKEY']
collist=list(columnnames)
dfObj = pd.DataFrame(columns=collist)
collen=len(finaldata['DATAKEY'])
for i in range(collen):
    colname=collist[i]
    keyvalue=finaldata.DATAKEYVALUE[i]
    valuelist2=keyvalue.split(",")
    dfObj = dfObj.append({colname: valuelist2}, ignore_index=True)

You should modify your question title; it is misleading because pandas dataframes are "kind of" dictionaries themselves, which is why the first comment you got pointed to pandas' built-in .to_dict() method.
What you actually want to do is iterate over your pandas dataframe row-wise and, for each row, generate a dictionary key from the first column and a list of values from the second column.
For that you will have to use:
an empty dictionary: dict()
the method for iterating over dataframe rows: dataframe.iterrows()
a method to split a single string of values on a separator, such as the split() method you used: str.split().
With all these tools all you have to do is:
output = dict()
for index, row in finaldata.iterrows():
    output[row['DATAKEY']] = row['DATAKEYVALUE'].split(',')
Note that this generates a dictionary whose values are lists of strings. It will not work if the contents of the 'DATAKEYVALUE' column are not single strings.
Also note that this may not be the most efficient solution if you have a very large dataframe.
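For larger frames, the same mapping can be built without an explicit row loop. This is a minimal sketch, assuming both columns hold plain strings; the sample frame is rebuilt from the question's data and the variable wide is a hypothetical name:

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to be plain strings).
finaldata = pd.DataFrame({
    "DATAKEY": ["name", "empid", "city"],
    "DATAKEYVALUE": [
        "mayank,deepak,naveen,rajni",
        "1,2,3,4",
        "delhi,mumbai,pune,noida",
    ],
})

# Split every value string at once, then pair each key with its list.
output = dict(zip(finaldata["DATAKEY"], finaldata["DATAKEYVALUE"].str.split(",")))

# Optional: one column per key, one row per value
# (this step requires all lists to have the same length).
wide = pd.DataFrame(output)

print(output["empid"])  # ['1', '2', '3', '4']
```

The values are still strings; converting empid to integers would be a separate step.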

Related

How to convert dataframe columns which contains specific string to each columns to a nested dictionary?

I have a dataframe with only one row and around 84 columns, with names such as subscription_id_avg, subscription_id_std, etc. Each column holds either a mean or a standard deviation. I want to convert this dataframe to a nested dictionary such as
{"subscription_id" : {"avg": 0.36, "std": 1.5}}
How do I split the column names and convert it into a nested dictionary form?
Every column name ends in _std or _avg, so column_name[:-4] gives the desired property name. For example, 'subscription_id_avg'[:-4] yields 'subscription_id'. Using this, we can get the keys for the dictionary.
result = {}
for cl in df.columns:
    key_name = cl[:-4]   # property name without the 4-character suffix
    stat = cl[-3:]       # 'avg' or 'std'
    if key_name not in result:
        result[key_name] = {}
    result[key_name][stat] = df[cl].iloc[0]   # value from the single row
Here result will be the required Python dictionary (the name dict is avoided because it shadows the built-in). Hope this helps.
This works only if _avg and _std are the only suffixes in the column names.
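If the suffixes may differ in length (say _mean or _max), splitting on the last underscore avoids the fixed-width slice. This is a minimal sketch, assuming every column name ends in an underscore followed by the statistic name; the frame, its column names, and its numbers are illustrative only:

```python
import pandas as pd

# Hypothetical one-row frame; names and values are made up for illustration.
df = pd.DataFrame([{
    "subscription_id_avg": 0.36,
    "subscription_id_std": 1.5,
    "usage_hours_avg": 4.0,
    "usage_hours_std": 0.7,
}])

nested = {}
for col in df.columns:
    # Split on the last underscore only, so "subscription_id" stays intact.
    name, stat = col.rsplit("_", 1)
    nested.setdefault(name, {})[stat] = df[col].iloc[0]

print(nested["subscription_id"])
```

A column name without any underscore would raise a ValueError at the unpacking step, which is arguably preferable to silently producing a wrong key.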

How to access values from list of dictionaries in a dataframe?

I have a dataframe that has a column holding a list of dictionaries. For each dictionary I want to extract the values and put them in another column as a list. Please see the picture below for an example, which shows only one row of the dataframe. For each title shown in the picture, I want to extract the value and collect the values in a list, for all rows of the dataframe.
Use ast.literal_eval to convert the string to a list of dicts, then extract the title key from each record:
import ast
df['activities'].apply(lambda x: [d['title'] for d in ast.literal_eval(x)])
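A runnable sketch of the same idea follows. The sample data and the helper name extract_titles are hypothetical; the column name activities and the title key are taken from the answer:

```python
import ast
import pandas as pd

# Hypothetical data: each cell is the string form of a list of dicts.
df = pd.DataFrame({
    "activities": [
        "[{'title': 'Running', 'id': 1}, {'title': 'Cycling', 'id': 2}]",
        "[{'title': 'Swimming', 'id': 3}]",
    ]
})

def extract_titles(cell):
    # Parse the string only if needed; cells that are already lists pass through.
    records = ast.literal_eval(cell) if isinstance(cell, str) else cell
    return [record["title"] for record in records]

df["titles"] = df["activities"].apply(extract_titles)
print(df["titles"].tolist())  # [['Running', 'Cycling'], ['Swimming']]
```

If the column already contains real lists rather than strings, the ast.literal_eval step is skipped by the isinstance check.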

split contents of a column in a python pandas Dataframe and create a new Dataframe with the newly separated list of strings

My original pandas dataframe has a 'comment' column consisting of unseparated lists of strings, and another column called 'direction' indicating whether the overall content of the 'comment' column is positive or negative, where 1 represents positive comments and 0 represents negative comments.
Now I wish to create a new dataframe by splitting every string under 'comment' on the delimiter '<END>' and assigning each resulting string to a separate row with its original 'direction'.
How can I achieve this?
Try:
df['comment'] = df['comment'].str.split('<END>')
df = df.explode('comment')
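To see the effect on a small frame, here is a minimal sketch with made-up comments; '<END>' is assumed to be the delimiter, as in the answer above, and result is a hypothetical name:

```python
import pandas as pd

# Hypothetical data; the comment texts are made up for illustration.
df = pd.DataFrame({
    "comment": ["great food<END>nice staff", "too slow<END>cold soup<END>rude"],
    "direction": [1, 0],
})

result = (
    df.assign(comment=df["comment"].str.split("<END>"))
      .explode("comment")        # one row per list element; 'direction' is repeated
      .reset_index(drop=True)    # explode keeps the original index, so renumber
)
print(result.shape)  # (5, 2)
```

Using assign leaves the original df untouched, which helps when the unsplit text is still needed later.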

Select a subset of an object type cell in panda Dataframe

I try to select a subset of the object type column cells with str.split(pat="'")
dataset['pictures'].str.split(pat=",")
I want to get the values of the numbers 40092 and 39097 and the two dates of the pictures as two columns ID and DATE but as result I get one column consisting of NaNs.
'pictures' column:
{"col1":"40092","picture_date":"2017-11-06"}
{"col1":"39097","picture_date":"2017-10-31"}
...
Here's what I understood from your question:
You have a pandas DataFrame in which one of the columns contains JSON strings (or any other strings that need to be parsed into multiple columns)
E.g.
df = pd.DataFrame({'pictures': [
    '{"col1":"40092","picture_date":"2017-11-06"}',
    '{"col1":"39097","picture_date":"2017-10-31"}']
})
You want to parse the two elements ('col1' and 'picture_date') into two separate columns for further processing (or perhaps just one of them)
Define a function for parsing the row:
import json
def parse_row(r):
    j = json.loads(r['pictures'])
    return j['col1'], j['picture_date']
And use Pandas DataFrame.apply() method as follows
df1=df.apply(parse_row, axis=1,result_type='expand')
The result is a new dataframe with two columns - each containing the parsed data:
0 1
0 40092 2017-11-06
1 39097 2017-10-31
If you need just one column you can return a single element from parse_row (instead of a two element tuple in the above example) and just use df.apply(parse_row).
If the values are not in json format, just modify parse_row accordingly (Split, convert string to numbers, etc.)
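As an alternative to the row-wise apply, the column can be parsed in one pass and the result given the ID and DATE names the question asks for. This is a sketch, assuming every cell is a valid JSON object with the same two keys; the name parsed is hypothetical:

```python
import json
import pandas as pd

df = pd.DataFrame({"pictures": [
    '{"col1":"40092","picture_date":"2017-11-06"}',
    '{"col1":"39097","picture_date":"2017-10-31"}',
]})

# Parse each JSON string into a dict, then let pandas build one column per key.
parsed = pd.DataFrame(df["pictures"].map(json.loads).tolist(), index=df.index)

# ID and DATE are the column names requested in the question.
parsed = parsed.rename(columns={"col1": "ID", "picture_date": "DATE"})
parsed["DATE"] = pd.to_datetime(parsed["DATE"])

print(parsed["ID"].tolist())  # ['40092', '39097']
```

The IDs remain strings here; parsed["ID"].astype(int) would convert them if numeric IDs are needed.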
Thanks for the replies, but I solved it by loading the 'pictures' column from the dataset into a list:
picturelist = dataset['pictures'].values.tolist()
Afterwards I created a dataframe from that list and concatenated it with the original dataset without the 'pictures' column (this assumes each cell already holds a dict rather than a JSON string):
two_new_columns = pd.DataFrame(picturelist, index=dataset.index)
new_dataset = pd.concat([dataset.drop(columns='pictures'), two_new_columns], axis=1)

Converting large string of dictionaries to dictionary

I have a big dataframe consisting of 144005 rows. One of the columns of the dataframe is a string of dictionaries like
'{"Step ID":"78495","Choice Number":"0","Campaign Run ID":"23199"},
{"Step ID":"78495","Choice Number":"0","Campaign Run ID":"23199"},
{"Step ID":"78495","Choice Number":"0","Campaign Run ID":"23199"}'
I want to convert this string to separate dictionaries. I have been using json.loads() for this purpose. However, I have had to iterate over the strings one at a time, convert each to a dictionary using json.loads(), convert that to a new dataframe, and keep appending to this dataframe while iterating over the entire original dataframe.
I wanted to know whether there was a more efficient way to do this as it takes a long time to iterate over an entire dataframe of 144005 rows.
Here is a snippet of what I have been doing:
d1 = df1['attributes'].values
d2 = df1['ID'].values
for i,j in zip(d1,d2):
    data = json.loads(i)
    temp = pd.DataFrame(data, index = [j])
    temp['ID'] = j
    df2 = df2.append(temp, sort=False)
My 'attributes' column consists of one string of dictionaries per row, and the 'ID' column contains its corresponding ID.
Did it myself.
I used map along with lambda functions to efficiently apply json.loads() on each row, then I converted this data to a dataframe and stored the output.
Here it is.
l1 = df1['attributes'].values
data = map(json.loads, l1)   # json.loads can be passed directly; no lambda needed
df2 = pd.DataFrame(data)
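The snippet above drops the 'ID' column that the original loop kept. A minimal sketch that parses every row first, builds the frame once, and carries the ID along; the sample frame is a hypothetical stand-in for df1, assuming one JSON object per cell:

```python
import json
import pandas as pd

# Hypothetical stand-in for df1: one JSON object string per row, plus its ID.
df1 = pd.DataFrame({
    "ID": [101, 102],
    "attributes": [
        '{"Step ID":"78495","Choice Number":"0","Campaign Run ID":"23199"}',
        '{"Step ID":"78496","Choice Number":"1","Campaign Run ID":"23200"}',
    ],
})

# Parse all rows first, then build the frame once instead of appending per row.
records = [json.loads(s) for s in df1["attributes"]]
df2 = pd.DataFrame(records, index=df1.index)
df2["ID"] = df1["ID"]

print(df2.shape)  # (2, 4)
```

Building the frame in a single call avoids the repeated copying that row-by-row appending causes, which is the likely reason the original loop was slow.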
Check the type of the values in your column using type().
If each cell is already a dict (rather than a string), you can expand the column with:
data['your column name'].apply(pd.Series)
You will then see every key as a separate column in a dataframe, holding the corresponding values.
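A short sketch of that check and expansion, using a hypothetical frame whose cells are already dicts. The second form builds the frame directly from the list of dicts, which tends to be faster than apply(pd.Series) on large frames:

```python
import pandas as pd

# Hypothetical frame; the column name and values are illustrative only.
data = pd.DataFrame({"attributes": [{"a": 1, "b": 2}, {"a": 3, "b": 4}]})

# Inspect one cell to see whether it is a dict or still a string.
print(type(data["attributes"].iloc[0]))  # <class 'dict'>

# Expansion as described above: one column per key.
expanded = data["attributes"].apply(pd.Series)

# Same columns, built in a single constructor call.
faster = pd.DataFrame(data["attributes"].tolist(), index=data.index)
```

If the cells turn out to be strings, they need to be parsed first (for example with json.loads) before either form will work.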
