Turn a dictionary of dictionaries into a dataframe


So let's say I have a dictionary that looks like this:
{Row1 : {Col1: Data, Col2: Data ..} , Row2: {Col1: Data,...},...}
Ex dictionary:
{'1': {'0': '1', '1': '2', '2': '3'}, '2': {'0': '4', '1': '5', '2': '6'}}
I was checking out pandas from_dict method with orient='index', but it's not quite what I need.
This is what I have that works:
df_pasted_data = pd.DataFrame()
for v in dictionary.values():
    # need to set the index to 0 since im passing in a basic dictionary that looks like: col1: data, col2: data
    # otherwise it will throw an error saying ValueError: If using all scalar values, you must pass an index
    temp = pd.DataFrame(v, index=[0])
    # append doesn't happen in place so i need to set it to itself
    df_pasted_data = df_pasted_data.append(temp, ignore_index=True)
This works, but I've read online that repeatedly appending to a DataFrame is not very efficient. Is there a better way of going about this?

Use the DataFrame() constructor and the T (transpose) attribute:
import pandas as pd
df_pasted_data = pd.DataFrame(dictionary).T
print(df_pasted_data)
# Output
   0  1  2
1  1  2  3
2  4  5  6
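As a hedged aside on the transpose approach: for the example dictionary from the question, DataFrame.from_dict with orient='index' should build the same frame directly, and passing only the inner dictionaries renumbers the rows from 0 the way the append loop with ignore_index=True does. The variable names by_index and renumbered are made up for this sketch.

```python
import pandas as pd

# Example dictionary from the question: outer keys are rows, inner keys are columns.
dictionary = {'1': {'0': '1', '1': '2', '2': '3'},
              '2': {'0': '4', '1': '5', '2': '6'}}

# Outer keys become the row index, inner keys become the column labels.
by_index = pd.DataFrame.from_dict(dictionary, orient='index')

# Passing only the inner dicts drops the outer keys, so the rows are
# numbered 0..n-1, like the append loop with ignore_index=True.
renumbered = pd.DataFrame(list(dictionary.values()))

print(by_index)
print(renumbered)
```

Which variant fits depends on whether the outer keys ('1', '2') are meaningful row labels or can be discarded.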

Related

Pandas replace() string with int "Cannot set non-string value in StringArray"

I'm trying to replace strings with integers in a pandas dataframe. I've already visited here but the solution doesn't work.
Reprex:
import pandas as pd
pd.__version__
> '1.4.1'
test = pd.DataFrame(data = {'a': [None, 'Y', 'N', '']}, dtype = 'string')
test.replace(to_replace = 'Y', value = 1)
> ValueError: Cannot set non-string value '1' into a StringArray.
I know that I could do this individually for each column, either explicitly or using apply, but I am trying to avoid that. I'd ideally replace all 'Y' in the dataframe with int(1), all 'N' with int(0) and all '' with None or pd.NA, so the replace function appears to be the fastest/clearest way to do this.

Use Int8Dtype. The nullable IntXXDtype extension types allow both integer values and <NA>:
test['b'] = test['a'].replace({'Y': '1', 'N': '0', '': pd.NA}).astype(pd.Int8Dtype())
print(test)
# Output
      a     b
0  <NA>  <NA>
1     Y     1
2     N     0
3        <NA>
>>> [type(x) for x in test['b']]
[pandas._libs.missing.NAType,
numpy.int8,
numpy.int8,
pandas._libs.missing.NAType]

Find a column name and retaining certain string in that entire column values

I would like to format the "Status" column in a CSV file and keep only the strings that appear in single quotes followed by a comma ('sometext',).
Example:
Input
As in rows 2 and 3, if more than one value is found in a cell, the values should be concatenated with a pipe symbol (|), e.g. Phone|Charger.
The expected output should be written back into the same Status column, like below.
My attempt (not working):
import pandas as pd
df = pd.read_csv("test projects.csv")
scol = df.columns.get_loc("Status")
statusRegex = re.compile("'\t',"?"'\t',")
mo = statusRegex.search (scol.column)

Let's say you have df as:
df = pd.DataFrame([[[{'a':'1', 'b': '4'}]], [[{'a':'1', 'b': '2'}, {'a':'3', 'b': '5'}]]], columns=['pr'])
df:
                                             pr
0                        [{'a': '1', 'b': '4'}]
1  [{'a': '1', 'b': '2'}, {'a': '3', 'b': '5'}]
df['comb'] = df.pr.apply(lambda x: '|'.join([i['a'] for i in x]))
df:
                                             pr comb
0                        [{'a': '1', 'b': '4'}]    1
1  [{'a': '1', 'b': '2'}, {'a': '3', 'b': '5'}]  1|3

import pandas as pd
# simplified mock data
df = pd.DataFrame(dict(
    value=[23432] * 3,
    Status=[
        [{'product.type': 'Laptop'}],
        [{'product.type': 'Laptop'}, {'product.type': 'Charger'}],
        [{'product.type': 'TV'}, {'product.type': 'Remote'}]
    ]
))
# make a method to do the desired formatting / extraction of data
def da_piper(cell):
    """extracts product.type and concatenates with a pipe"""
    vals = [_['product.type'] for _ in cell]  # get only the product.type values
    return '|'.join(vals)  # join them with a pipe
# save to desired column
df['output'] = df['Status'].apply(da_piper)  # apply the method to the Status col
Additional help: you do not need read_excel, since CSV is not an Excel format. It is plain comma-separated values, a standard format that read_csv handles. In this case you can just do this:
import pandas as pd
# make a method to do the desired formatting / extraction of data
def da_piper(cell):
    """extracts product.type and concatenates with a pipe"""
    vals = [_['product.type'] for _ in cell]  # get only the product.type values
    return '|'.join(vals)  # join them with a pipe
# read csv to dataframe
df = pd.read_csv("test projects.csv")
# note: read_csv loads each cell as a string, so da_piper only works if the
# Status cells are first parsed back into lists of dicts
# apply method and save to desired column
df['Status'] = df['Status'].apply(da_piper)  # apply the method to the Status col

Thank you all for the help and suggestions. Here is the final working code.
import re
import pandas as pd
df = pd.read_csv('test projects.csv')
rows = len(df['input'])
def get_values(value):
    # collect every substring enclosed in single quotes
    m = re.findall("'(.+?)'", value)
    word = ""
    for mm in m:
        # skip the tokens that are field names or labels rather than values
        if 'value' not in str(mm):
            if 'autolabel_strategy' not in str(mm):
                if 'String Matching' not in str(mm):
                    word += mm + "|"
    # drop the trailing pipe
    return str(word).rsplit('|', 1)[0]
al_lst = []
ans_lst = []
for r in range(rows):
    auto_label = df['autolabeledValues'][r]
    answers = df['answers'][r]
    al = get_values(auto_label)
    ans = get_values(answers)
    al_lst.append(al)
    ans_lst.append(ans)
df['a'] = al_lst
df['b'] = ans_lst
df.to_csv("Output.csv", index=False)
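The filtering in get_values can also be sketched with a list comprehension and str.join, which avoids having to trim a trailing pipe. This is only an illustration: the SKIP tuple restates the three conditions from the code above, and the sample string is invented here, since the real cell contents are not shown in the thread.

```python
import re

# Substrings that mark a quoted token as one to skip
# (the same three conditions as in get_values above).
SKIP = ('value', 'autolabel_strategy', 'String Matching')

def get_values(cell):
    """Return the single-quoted tokens in `cell`, minus skipped ones, joined with '|'."""
    tokens = re.findall(r"'(.+?)'", cell)
    kept = [t for t in tokens if not any(s in t for s in SKIP)]
    return '|'.join(kept)

# Hypothetical cell contents, made up for illustration only.
sample = "[{'value': 'Phone', 'autolabel_strategy': 'String Matching'}, {'value': 'Charger'}]"
result = get_values(sample)
print(result)  # Phone|Charger
```

With a helper like this, each column could be filled via df['autolabeledValues'].apply(get_values) instead of an explicit index loop, assuming every cell is a string.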

Express list comprehension as for loop

I'm currently trying to wrap my head around list comprehensions, and I'm practicing by taking examples and rewriting comprehensions as loops and vice versa. This is probably a really easy mistake, or a can't-see-the-forest-for-the-trees situation. Take the following expression from an example project:
rows = []
data = ['a', 'b']
res = ['1', '2']
rows.append({data[counter]: res[counter] for counter, _ in enumerate(data)})
print(rows) gives:
[{'a': '1', 'b': '2'}]
How do I write this as a for loop? My attempt below wraps each item in its own curly braces instead of putting both in one dict.
for counter, _ in enumerate(data):
    rows.append({data[counter]: res[counter]})
print(rows) gives:
[{'a': '1'}, {'b': '2'}]
Am I missing something? Or do I have to merge the items by hand when using a for loop?

The problem in your code is that you create a dictionary for each item in data and append it to rows in each iteration.
To get the desired behaviour, update the same dict in each iteration, and append it to rows only after the loop has finished.
Try this:
rows = []
data = ['a', 'b']
res = ['1', '2']
payload = {}
for counter, val in enumerate(data):
    payload[val] = res[counter]
rows.append(payload)
Another compact way to write it might be:
rows.append(dict(zip(data,res)))
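To make the equivalence concrete, here is a small sketch that builds the same dictionary three ways (the original comprehension, an explicit loop, and dict(zip(...))) and checks that they agree. The names by_comprehension, by_loop and by_zip are made up for this example.

```python
data = ['a', 'b']
res = ['1', '2']

# 1. The dict comprehension from the question.
by_comprehension = {data[i]: res[i] for i, _ in enumerate(data)}

# 2. The equivalent for loop: one dict, filled one key per iteration.
by_loop = {}
for key, value in zip(data, res):
    by_loop[key] = value

# 3. The compact form: dict() accepts an iterable of (key, value) pairs.
by_zip = dict(zip(data, res))

# The append happens once, after the dict is complete.
rows = [by_loop]
print(rows)  # [{'a': '1', 'b': '2'}]
```

The comprehension creates one dict for the whole loop, which is why the append belongs outside the for loop in the expanded form.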

On every iteration of the for loop you are creating a new dictionary and appending it to the list. If you want to store one whole dictionary in the list, build it up inside the loop and append it once afterwards. Something like this gives the output you expected:
rows = []
data = ['a', 'b']
res = ['1', '2']
myDict = {}
for counter, _ in enumerate(data):
    myDict[data[counter]] = res[counter]
rows.append(myDict)
print(rows)
Output:
[{'a': '1', 'b': '2'}]

How to find the similarities for set and print the value

I want to turn a dictionary and a list into sets, and then, wherever a.keys() matches b, print a.values().
Example:
c = [{'1': '0'}, {'0': '5'},{'2': '0'}]
d = {1,2}
I want to turn these two into sets, then find everything they have in common and print the matches without changing their order.
For example, I want to print this:
{'1': '0'}
{'2': '0'}
Is it possible to use set?
Below is my code:
a = set(c.keys()) & set(d)
print(a)
for x in a:
    y,z = c[x]

Since your example set contains integers while the keys in your example dicts are strings, you should convert the integers in the set to strings first. After that you can simply loop through each dict in the list, and if the keys of the dict intersect with the set, print the dict, since it's a match:
d = set(map(str, d))
for i in c:
    if i.keys() & d:
        print(i)
This outputs:
{'1': '0'}
{'2': '0'}
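The same idea can be sketched so that the matches are collected in a list, in their original order, and d itself is left unchanged. The names wanted and matches are made up for this example.

```python
c = [{'1': '0'}, {'0': '5'}, {'2': '0'}]
d = {1, 2}

# Convert the integers to strings so they are comparable with the dict keys.
wanted = {str(n) for n in d}

# dict.keys() returns a set-like view, so & gives the keys the two have in common.
# Iterating over the list c keeps the original order of the dicts.
matches = [item for item in c if item.keys() & wanted]

for item in matches:
    print(item)
```

Because the list is traversed in order and only filtered, the sequence of the printed dicts follows c, not the (unordered) set.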

First of all, your input values are awkward to work with as specified. The dictionary c is easier to use as a single dictionary of keys and values than as a list of one-item dictionaries, as you have it. The keys should be integers rather than strings; otherwise you need to cast them from string to int later on. The second item, d, is written with curly braces, which makes it a set rather than a dictionary; a list of integers works just as well.
Here's the code that specifies the input values correctly and gives you the desired output:
c = {1: '0', 0: '5', 2: '0'}
d = [1,2]
distinct_keys = c.keys() & set(d)
# {1, 2}
distinct_values = {key: value for key, value in c.items() if key in distinct_keys}
# {1: '0', 2: '0'}
distinct_values
This gives {1: '0', 2: '0'} as output.

Split list elements to key/val dictionary

I have this:
query='id=10&q=7&fly=none'
and I want to split it to create a dictionary like this:
d = { 'id':'10', 'q':'7', 'fly':'none'}
How can I do it with little code?

By splitting twice, once on '&' and then on '=' for every element resulting from the first split:
query='id=10&q=7&fly=none'
d = dict(i.split('=') for i in query.split('&'))
Now, d looks like:
{'id': '10', 'q': '7', 'fly': 'none'}
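One caveat with splitting on '=' is a value that itself contains an equals sign: the pair then splits into three parts and dict() raises a ValueError. A minimal sketch of a guard, using the maxsplit argument of str.split; the query string below is a made-up example, not from the question.

```python
# Hypothetical query whose q value contains '=' (invented for illustration).
query = 'id=10&q=a=b&fly=none'

# split('=', 1) splits only on the first '=', so each pair yields exactly
# two parts even if the value contains further '=' characters.
d = dict(pair.split('=', 1) for pair in query.split('&'))
print(d)  # {'id': '10', 'q': 'a=b', 'fly': 'none'}
```

This still assumes every pair contains at least one '='; a bare key such as 'debug' would need separate handling.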

In your case, the more convenient way would be to use the urllib.parse module:
import urllib.parse as urlparse
query = 'id=10&q=7&fly=none'
d = {k:v[0] for k,v in urlparse.parse_qs(query).items()}
print(d)
The output:
{'id': '10', 'q': '7', 'fly': 'none'}
Note that urlparse.parse_qs() is more useful if the same key appears more than once in a query string, because it collects every value for that key in a list. Here is an example:
query = 'id=10&q=7&fly=none&q=some_identifier&fly=flying_away'
d = urlparse.parse_qs(query)
print(d)
The output:
{'id': ['10'], 'q': ['7', 'some_identifier'], 'fly': ['none', 'flying_away']}
https://docs.python.org/3/library/urllib.parse.html#urllib.parse.parse_qs
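When each key is expected only once, urllib.parse.parse_qsl may be a slightly shorter route, since it returns (key, value) pairs that dict() accepts directly. A small sketch, with the caveat that a repeated key keeps only its last value:

```python
from urllib.parse import parse_qsl

query = 'id=10&q=7&fly=none'

# parse_qsl returns a list of (key, value) tuples; dict() turns them into a mapping.
d = dict(parse_qsl(query))
print(d)  # {'id': '10', 'q': '7', 'fly': 'none'}

# With a repeated key, later pairs overwrite earlier ones in the dict.
last_wins = dict(parse_qsl('q=7&q=some_identifier'))
print(last_wins)  # {'q': 'some_identifier'}
```

Like parse_qs, parse_qsl also decodes percent-escapes and '+' signs, which the plain split approach does not.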

This is what I came up with:
dict_query = {}
query='id=10&q=7&fly=none'
query_list = query.split("&")
for i in query_list:
    query_item = i.split("=")
    dict_query.update({query_item[0]: query_item[1]})
print(dict_query)
dict_query returns what you want. This code works by splitting the query up into the different parts, and then for each of the new parts, it splits it by the =. It then updates the dict_query with each new value. Hope this helps!
