Dataframe with arrays and key-value pairs - python

I have a JSON structure that I need to convert into a dataframe. I have converted it with the pandas library, but I am having issues with two columns: one is an array and the other one is a key-value pair.
Pito: {"pito-key": "Number"}
Value: [{"WRITESTAMP": "2018-06-28T16:30:36Z", "S":"41bbc22","VALUE":"2"}]
How can I break these columns out into separate dataframe columns?

As far as I understood your question, you can apply regular expressions to do that.
import re

import pandas as pd

data = {'pito': ['{"pito-key": "Number"}'],
        'value': ['[{"WRITESTAMP": "2018-06-28T16:30:36Z", "S":"41bbc22","VALUE":"2"}]']}
df = pd.DataFrame(data)

def get_value(row):
    # pull the number out of ..."VALUE":"2"}]
    v = re.findall(r'VALUE\":\".*\"', row['value'])
    return int(v[0][8:-1])

def get_pito(row):
    # pull the word out of {"pito-key": "Number"}
    v = re.findall(r'key\": \".*\"', row['pito'])
    return v[0][7:-1]

df['value'] = df.apply(get_value, axis=1)
df['pito'] = df.apply(get_pito, axis=1)
df.head()
Here I create two functions that extract the values you want from those unwieldy strings.
Let me know if that's not what you meant.
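Since both columns hold valid JSON, an alternative worth considering (just a sketch, using the sample strings above) is to parse them with the standard json module instead of regular expressions, so the fields can be pulled out by key:

```python
import json

import pandas as pd

data = {'pito': ['{"pito-key": "Number"}'],
        'value': ['[{"WRITESTAMP": "2018-06-28T16:30:36Z", "S":"41bbc22","VALUE":"2"}]']}
df = pd.DataFrame(data)

# json.loads turns the strings into a dict / list of dicts,
# so no regex slicing is needed
df['pito'] = df['pito'].map(lambda s: json.loads(s)['pito-key'])
df['value'] = df['value'].map(lambda s: int(json.loads(s)[0]['VALUE']))
```

This avoids the brittle character offsets (`[8:-1]`, `[7:-1]`) in the regex version.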

Related

Adding empty rows in Pandas dataframe

I'd like to append empty rows at consistent intervals in my dataframe.
I have the following code, which does part of what I want, but I'm struggling to adjust it to my needs:
s = pd.Series('', data_only_trades.columns)
f = lambda d: d.append(s, ignore_index=True)
set_rows = np.arange(len(data_only_trades)) // 4
empty_rows = data_only_trades.groupby(set_rows, group_keys=False).apply(f).reset_index(drop=True)
How can I adjust the code so I add two or more rows instead of one?
How can I set a starting point (e.g. it should start at row 5 -- do I have to use .loc in arange then?)
I also tried this code, but I struggled to set the starting row and to make the values blank (I got NaN):
df_new = pd.DataFrame()
for i, row in data_only_trades.iterrows():
    df_new = df_new.append(row)
    for _ in range(2):
        df_new = df_new.append(pd.Series(), ignore_index=True)
Thank you!
I think you can use NumPy:
import numpy as np
v = np.ndarray(shape=(numberOfRowsYouWant, df.values.shape[1]), dtype=object)
v[:] = ""
pd.DataFrame(np.vstack((df.values, v)))
But if you want to keep your approach, simply convert NaN to "":
df.fillna("")
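To answer the two follow-up questions (how many rows, and where to start), one option is to build the blank block once and concatenate it after every group, shifting the group key so the first insert lands where you want. This is only a sketch with made-up data, since data_only_trades isn't shown:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': range(10)})  # stand-in for data_only_trades

n_blank = 2  # how many empty rows per insert
every = 4    # then insert after every 4 further rows
start = 5    # first insert comes after the first 5 rows

blank = pd.DataFrame('', index=range(n_blank), columns=df.columns)
idx = np.arange(len(df))
# first group holds `start` rows, every later group holds `every` rows
keys = np.where(idx < start, -1, (idx - start) // every)

out = pd.concat(
    [pd.concat([g, blank]) for _, g in df.groupby(keys)],
    ignore_index=True,
)
```

Using empty strings in the blank block (rather than appending an empty Series) is what avoids the NaN values you were getting.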

How to substitute NaN for a text in a DataFrame?

I have a DataFrame and I need to change the content of the cells of a specific column to a text content (for example "not registered").
I have tried different options; these are some of them:
dftotal.fillna({"Computer_OS":"not registered", "Computer_OS_version":"not registered"}, inplace=True)
dftotal.loc[(dftotal["Computer_OS"]=="NaN"),"Computer_OS"] = "not registered"
This assumes that all values in the Computer_OS column are strings; otherwise you would need to change the datatype first.
import re

import numpy as np
import pandas as pd

def txt2nan(x):
    """
    Return NaN if the given string x contains
    a letter, else return the original x.

    Parameters
    ----------
    x : str
    """
    if re.match('[a-zA-Z]', x):
        return np.nan
    else:
        return x

df = pd.DataFrame({"os": ["tsd", "ssad d", "sd", "1", "2", "3"]})
df["os"] = df["os"].apply(txt2nan)
A better solution is to vectorize the above operation:
df["os"] = np.where(df["os"].str.match('[a-zA-Z]'), np.nan, df["os"])
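Going back to the original goal (NaN to text, rather than text to NaN): if the missing entries really are NaN, the fillna call with a dict does exactly what you want. A short sketch with made-up data:

```python
import numpy as np
import pandas as pd

dftotal = pd.DataFrame({
    "Computer_OS": ["Windows", np.nan, "Linux"],
    "Computer_OS_version": [np.nan, "10.4", np.nan],
})

# fillna replaces actual NaN values; comparing against the string "NaN"
# (as in the .loc attempt above) never matches them
dftotal = dftotal.fillna({"Computer_OS": "not registered",
                          "Computer_OS_version": "not registered"})
```

If your fillna attempt did nothing, the likely cause is that the "missing" cells hold a literal string like "NaN" rather than real NaN values.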

What is the best way to convert a string in a pandas dataframe to a list?

Basically I have a dataframe with lists that have been read in as strings and I would like to convert them back to lists.
Below is what I am currently doing, but I'm still learning and feel like there must be a better (more efficient/Pythonic) way to go about this. Any help/constructive criticism would be much appreciated!
import pandas as pd
import ast
df = pd.DataFrame(data=['[-1,0]', '[1]', '[1,2]'], columns = ['example'])
type(df['example'][0])
>> str
n = df.shape[0]
temp = []
temp2 = []
for i in range(n):
    temp = ast.literal_eval(df['example'][i])
    temp2.append(temp)
df['new_col_lists'] = temp2
type(df['new_col_lists'][0])
>> list
Maybe you could use a map:
df['example'] = df['example'].map(ast.literal_eval)
With pandas, there is almost always a way to avoid the for loop.
You can use .apply
Ex:
import pandas as pd
import ast
df = pd.DataFrame(data=['[-1,0]', '[1]', '[1,2]'], columns = ['example'])
df['example'] = df['example'].apply(ast.literal_eval)
print( type(df['example'][0]) )
Output:
<type 'list'>
You could use apply with a lambda which splits and converts your strings:
df['new_col_lists'] = df['example'].apply(lambda s: [int(v.strip()) for v in s[1:-1].split(',')])
Use float cast instead of int if needed.
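For example, if the lists hold decimals, the same split-based approach just swaps int for float:

```python
import pandas as pd

df = pd.DataFrame(data=['[-1.5,0.25]', '[1.0]'], columns=['example'])
# strip the brackets, split on commas, and cast each piece to float
df['new_col_lists'] = df['example'].apply(
    lambda s: [float(v.strip()) for v in s[1:-1].split(',')])
```

Note this only works for flat numeric lists; for anything nested or quoted, ast.literal_eval is the safer choice.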

Append model output to pd df rows

I'm trying to put Pyomo model output into pandas.DataFrame rows. I'm accomplishing it now by saving data as a .csv, then reading the .csv file as a DataFrame. I would like to skip the .csv step and put output directly into a DataFrame.
When I find an optimal solution with Pyomo, the optimal assignments are 1 in the model.x[i] output data (0 otherwise). model.x[i] is indexed by the dict keys in v (model.x is Pyomo-specific syntax).
Pyomo assigns timeItem[i], platItem[i], payItem[i], demItem[i], and v[i] for each value that is part of an optimal solution. The 0807results.csv file accurately captures the optimal assignments, showing the values of timeItem[i], platItem[i], payItem[i], demItem[i], and v[i] for each valid assignment in the optimal solution.
When model.x[i] is 1, how can I get timeItem[i], platItem[i], payItem[i], demItem[i], v[i] directly into a DataFrame? Your assistance is greatly appreciated. My current code is below.
import datetime

from pandas import read_csv

# `value`, `model`, `timeItem`, etc. come from the Pyomo model code
index = sorted(v.keys())
with open('0807results.csv', 'w') as f:
    for i in index:
        if value(model.x[i]) > 0:
            f.write("%s,%s,%s,%s,%s\n" % (timeItem[i], platItem[i], payItem[i], demItem[i], v[i]))

now = datetime.datetime.now()
dtg = now.strftime("%Y%m%d_%H%M")
df = read_csv('0807results.csv')
df.columns = ['Time', 'Platform', 'Payload', 'DemandType', 'Value']
# convert payload types to string so they are not summed
df['Payload'] = df['Payload'].astype(str)
df = df.sort_values('Time')
df.to_csv('results' + dtg + '.csv')
# do stats & visualization with pandas df
I have no idea what is in the timeItem etc. iterables from the code you've posted. However, I suspect that something similar to:
import pandas as pd
results = pd.DataFrame([timeItem, platItem, payItem, demItem, v], index=["time", "plat", "pay", "dem", "v"]).T
will work.
If you want to filter on 1s in model.x, you might add it as a column as well, and do a filter with pandas directly:
import pandas as pd
results = pd.DataFrame([timeItem, platItem, payItem, demItem, v, model.x], index=["time", "plat", "pay", "dem", "v", "x"]).T
filtered_results = results[results["x"]>0]
You can also use the DataFrame.from_records() function:
import pandas

def record_generator():
    for i in sorted(v.keys()):
        if value(model.x[i]) > 1E-6:  # integer tolerance
            yield (timeItem[i], platItem[i], payItem[i], demItem[i], v[i])

df = pandas.DataFrame.from_records(
    record_generator(), columns=['Time', 'Platform', 'Payload', 'DemandType', 'Value'])
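Stripped of the Pyomo specifics, the from_records pattern can be sketched like this (the lookup dicts here are made-up stand-ins, since timeItem etc. aren't shown):

```python
import pandas as pd

# hypothetical stand-ins for the Pyomo data
timeItem = {0: 1, 1: 2}
platItem = {0: 'A', 1: 'B'}
x_value = {0: 1.0, 1: 0.0}  # stand-in for value(model.x[i])

def record_generator():
    # yield one record per index whose decision variable is "on"
    for i in sorted(timeItem):
        if x_value[i] > 1e-6:  # integer tolerance
            yield (timeItem[i], platItem[i])

df = pd.DataFrame.from_records(record_generator(), columns=['Time', 'Platform'])
```

from_records consumes the generator lazily, so no intermediate list or .csv file is needed.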

How do I search for a tuple of values in pandas?

I'm trying to write a function to swap a dictionary of targets with results in a pandas dataframe. I'd like to match a tuple of values and swap in new values. I tried building it as follows, but the row select isn't working. I feel like I'm missing some critical function here.
import pandas
testData=pandas.DataFrame([["Cats","Parrots","Sandstone"],["Dogs","Cockatiels","Marble"]],columns=["Mammals","Birds","Rocks"])
target=("Mammals","Birds")
swapVals={("Cats","Parrots"):("Rats","Canaries")}
for x in swapVals:
    # Attempt 1:
    # testData.loc[x, target] = swapVals[x]
    # Attempt 2:
    testData[testData.loc[:, target] == x, target] = swapVals[x]
This was written in Python 2, but the basic idea should work for you. It uses the apply function:
import pandas
testData=pandas.DataFrame([["Cats","Parrots","Sandstone"],["Dogs","Cockatiels","Marble"]],columns=["Mammals","Birds","Rocks"])
swapVals={("Cats","Parrots"):("Rats","Canaries")}
target=["Mammals","Birds"]
def swapper(in_row):
    temp = tuple(in_row.values)
    if temp in swapVals:
        return list(swapVals[temp])
    else:
        return in_row

testData[target] = testData[target].apply(swapper, axis=1)
testData
Note that if you loaded the other keys into the dict, you could do the apply without the swapper function:
import pandas
testData=pandas.DataFrame([["Cats","Parrots","Sandstone"],["Dogs","Cockatiels","Marble"]],columns=["Mammals","Birds","Rocks"])
swapVals={("Cats","Parrots"):("Rats","Canaries"), ("Dogs","Cockatiels"):("Dogs","Cockatiels")}
target=["Mammals","Birds"]
testData[target] = testData[target].apply(lambda x: list(swapVals[tuple(x.values)]), axis=1)
testData
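As a sketch of a non-apply alternative: build a boolean mask by comparing the target columns against each key tuple, then assign through .loc, which is essentially what the failed row select above was reaching for:

```python
import pandas as pd

testData = pd.DataFrame([["Cats", "Parrots", "Sandstone"],
                         ["Dogs", "Cockatiels", "Marble"]],
                        columns=["Mammals", "Birds", "Rocks"])
target = ["Mammals", "Birds"]
swapVals = {("Cats", "Parrots"): ("Rats", "Canaries")}

for old, new in swapVals.items():
    # True for rows whose (Mammals, Birds) pair equals the key tuple
    mask = (testData[target] == pd.Series(old, index=target)).all(axis=1)
    testData.loc[mask, target] = new
```

The original attempt failed because `testData[mask_df, target]` is not valid indexing; .loc with a per-row boolean mask and a column list is the supported form.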
