Append model output to pd df rows

Append model output to pd df rows - python

I'm trying to put Pyomo model output into pandas.DataFrame rows. I'm accomplishing it now by saving data as a .csv, then reading the .csv file as a DataFrame. I would like to skip the .csv step and put output directly into a DataFrame.
When I accomplish an optimization solution with Pyomo, the optimal assignments are 1 in the model.x[i] output data (0 otherwise). model.x[i] is indexed by dict keys in v. model.x is specific syntax to Pyomo
Pyomo assigns a timeItem[i], platItem[i], payItem[i], demItem[i], v[i] for each value that presents an optimal solution. The 0807results.csv file produces an accurate file of the optimal assignments showing the value of timeItem[i], platItem[i], payItem[i], demItem[i], v[i] for each valid assignment in the optimal solution.
When model.x[i] is 1, how can I get timeItem[i], platItem[i], payItem[i], demItem[i], v[i] directly into a DataFrame? Your assistance is greatly appreciated. My current code is below.
index=sorted(v.keys())
with open('0807results.csv', 'w') as f:
for i in index:
if value(model.x[i])>0:
f.write("%s,%s,%s,%s,%s\n"%(timeItem[i],platItem[i],payItem[i], demItem[i],v[i]))
from pandas import read_csv
now = datetime.datetime.now()
dtg=(now.strftime("%Y%m%d_%H%M"))
df = read_csv('0807results.csv')
df.columns = ['Time', 'Platform','Payload','DemandType','Value']
# convert payload types to string so not summed
df['Payload'] = df['Payload'].astype(str)
df = df.sort_values('Time')
df.to_csv('results'+(dtg)+'.csv')
# do stats & visualization with pandas df

I have no idea what is in the timeItem etc iterables from the code you've posted. However, I suspect that something similar to:
import pandas as pd
results = pd.DataFrame([timeItem, platItem, payItem, demItem, v], index=["time", "plat", "pay", "dem", "v"]).T
Will work.
If you want to filter on 1s in model.x, you might add it as a column as well, and do a filter with pandas directly:
import pandas as pd
results = pd.DataFrame([timeItem, platItem, payItem, demItem, v, model.x], index=["time", "plat", "pay", "dem", "v", "x"]).T
filtered_results = results[results["x"]>0]

You can also use the DataFrame.from_records() function:
def record_generator():
for i in sorted(v.keys()):
if value(model.x[i] > 1E-6): # integer tolerance
yield (timeItem[i], platItem[i], payItem[i], demItem[i], v[i])
df = pandas.DataFrame.from_records(
record_generator(), columns=['Time', 'Platform', 'Payload', 'DemandType', 'Value'])

Related

fastest way to iterate through uneven columns to find the existing value

I have 2 DataFrames of (df1) 35k and (df2) 76k rows where I need to check whether df1["col1"] elements exist in df2["col2"] sub-elements. The code seems to be working fine on a sample dataset I have provided but the runtime takes forever on the original one. Here is a for-loop code I used on the sample dataset:
import pandas as pd
post_token_list = [['wXrL3TbK'], ['wXmTQKw1'], ['wXvnlWej'], ['wXvXBjKp']]
tokens_list = [['wXv3qoPQ', 'wXvT7ylu', 'wXvnIJuH', 'wXvXH7vy', 'wXvDXSS1', 'wXvjVE1F', 'wXvPV6z1', 'wXvHF1uw',
'wXvH1q03', 'wXvnTlcr', 'wXvDEG9U', 'wXLfZtO6', 'wXvLDDDl', 'wXvHTgjk', 'wXvHDDr8', 'wXvPBLbu',
'wXvvxXHI', 'wXvPBFge', 'wXvLxSii', 'wXvDhk2h', 'wXv3Alan', 'wXvvQuKy', 'wXvvQ6LO', 'wXpHNjw9'],
['wXYr2lVk', 'wXXj7iDP', 'wXXXIsQr', 'wXQbXKz6', 'wXN3tMp1', 'wXMfZV5N', 'wXvnlWej', 'wXSDyEaW',
'wXQ7mM78', 'wXMPvojh', 'wXMjo-8G', 'wXLfZtO6', 'wXN3tMp1'],
['wXr_jZmX', 'wXr7D0AM', 'wXrzjhxL', 'wXrfjQNe', 'wXrnihqT', 'wXrjyqm5', 'wXr3CD4h', 'wXrnSZsy',
'wXrTieP7', 'wXLfZtO6', 'wXgHVwkc', 'wXdvewsV', 'wXrfxZeg', 'wXrLB7Zo', 'wXprtX71', 'wXrHhjtO',
'wXrzwKBt', 'wXqz-RlY', 'wXq_fp7F', 'wXq7Po7n', 'wXq7fC73', 'wXqzvRSW', 'wXqf_PQ3', 'wXML2vCd'],
['wXv3aQrv', 'wXvn6ONM', 'wXvfaG0M', 'wXvf6LIr', 'wXvjJBg_', 'wXvL6M-0', 'wXv7p2cd', 'wXv3poSs',
'wXvz5kUz', 'wXvrZz0_', 'wXv_YVCb', 'wXLfZtO6', 'wXvX5Hgi', 'wXvz3Ptg', 'wXvHJUU-', 'wXvr4fB7',
'wXvnlWej', 'wXv_YUrK', 'wXv7Id05', 'wXv7IYOV', 'wXvfYfLo', 'wXv7Y3AV', 'wXvT4_pE', 'wXvPovRt'],
['wXoDui-2', 'wXoT9yTg', 'wXmTQKw1', 'wXormLxu', 'wXMX-NNQ', 'wXo7kUfB', 'wXon0rt_', 'wXozT-3V',
'wXnvYjEc', 'wXnTn9D6', 'wXnLH7Cz', 'wXn_2HV_', 'wXnPGou9', 'wXnPVSNo', 'wXuG0sl3', 'wXnjAs7X',
'wXm38mLv', 'wXmnj5Oh', 'wXmfjQ2h', 'wXm_wXuD', 'wXlPOUmy', 'wXcfHkmx', 'wXQ_62cx', 'wXUD3qyx']]
df1 = pd.DataFrame({"col1": post_token_list})
df2 = pd.DataFrame({"col2": tokens_list})
query_bounce = []
def query_bounce_checker(dataset_clicked, dataset_loaded, col1, col2):
for i in dataset_clicked[col1]:
for j in i:
[query_bounce.append(k) for k in dataset_loaded[col2] if j in k]
return query_bounce
query_bounce_checker(df1, df2, "col1", "col2")
i, j, and k values are used to access and compare the elements and sub-elements of the two respecting columns.
Speed is a contributing factor for me, and the function written here is not fast enough for a dataset of this size.

If this is actually what you want, this should be pretty fast.
import numpy as np
np.intersect1d(np.hstack(df1.col1),np.hstack(df2.col2))
Output
array(['wXmTQKw1', 'wXvnlWej'], dtype='<U8')

I am not sure if it is what you want. If you just want to check which values in df1 also exist in df2, you can transform two dataframes into arrays and use np.in1d() to do so.
Try this:
array1 = np.array((','.join(df1['col1'].apply(lambda x: ','.join(x)))).split(','))
array2 = np.array((','.join(df2['col2'].apply(lambda x: ','.join(x)))).split(','))
print(array1[np.in1d(array1,array2)])
Output:
['wXmTQKw1' 'wXvnlWej']

How to form a matrix of distances between sites in Python?

I have all the data (sites and distances already).
Now I have to form a string matrix to use as an input for another python script.
I have sites and distances as (returned from a query, delimited as here):
A|B|5
A|C|3
A|D|9
B|C|7
B|D|2
C|D|6
How to create this kind of matrix?
A|B|C|D
A|0|5|3|9
B|5|0|7|2
C|3|7|0|6
D|9|2|6|0
This has to be returned as a string from python and I'll have more than 1000 sites, so it should be optimized for such size.
Thanks

I have no doubt it could be done in a cleaner way (because Python).
I will do some more research later on but I do want you to have something to start with, so here it is.
import pandas as pd
data = [
('A','B',5)
,('A','C',3)
,('A','D',9)
,('B','C',7)
,('B','D',2)
,('C','D',6)
]
data.extend([(y,x,val) for x,y,val in data])
df = pd.DataFrame(data, columns=['x','y','val'])
df = df.pivot_table(values='val', index='x', columns='y')
df = df.fillna(0)
Here is a demo for 1000x1000 (take about 2 seconds)
import pandas as pd, itertools as it
data = [(x,y,val) for val,(x,y) in enumerate(it.combinations(range(1000),2))]
data.extend([(y,x,val) for x,y,val in data])
df = pd.DataFrame(data, columns=['x','y','val'])
df = df.pivot_table(values='val', index='x', columns='y')
df = df.fillna(0)

Dataframe with arrays and key-pairs

I have a JSON structure which I need to convert it into data-frame. I have converted through pandas library but I am having issues in two columns where one is an array and the other one is key-pair value.
Pito Value
{"pito-key": "Number"} [{"WRITESTAMP": "2018-06-28T16:30:36Z", "S":"41bbc22","VALUE":"2"}]
How to break columns into the data-frames.

As far as I understood your question, you can apply regular expressions to do that.
import pandas as pd
import re
data = {'pito':['{"pito-key": "Number"}'], 'value':['[{"WRITESTAMP": "2018-06-28T16:30:36Z", "S":"41bbc22","VALUE":"2"}]']}
df = pd.DataFrame(data)
def get_value(s):
s = s[1]
v = re.findall(r'VALUE\":\".*\"', s)
return int(v[0][8:-1])
def get_pito(s):
s = s[0]
v = re.findall(r'key\": \".*\"', s)
return v[0][7:-1]
df['value'] = df.apply(get_value, axis=1)
df['pito'] = df.apply(get_pito, axis=1)
df.head()
Here I create 2 functions that transform your scary strings to values you want them to have
Let me know if that's not what you meant

How do I search for a tuple of values in pandas?

I'm trying to write a function to swap a dictionary of targets with results in a pandas dataframe. I'd like to match a tuple of values and swap out new values. I tried building it as follows, but the the row select isn't working. I feel like I'm missing some critical function here.
import pandas
testData=pandas.DataFrame([["Cats","Parrots","Sandstone"],["Dogs","Cockatiels","Marble"]],columns=["Mammals","Birds","Rocks"])
target=("Mammals","Birds")
swapVals={("Cats","Parrots"):("Rats","Canaries")}
for x in swapVals:
#Attempt 1:
#testData.loc[x,target]=swapVals[x]
#Attempt 2:
testData[testData.loc[:,target]==x,target]=swapVals[x]

This was written in Python 2, but the basic idea should work for you. It uses the apply function:
import pandas
testData=pandas.DataFrame([["Cats","Parrots","Sandstone"],["Dogs","Cockatiels","Marble"]],columns=["Mammals","Birds","Rocks"])
swapVals={("Cats","Parrots"):("Rats","Canaries")}
target=["Mammals","Birds"]
def swapper(in_row):
temp =tuple(in_row.values)
if temp in swapVals:
return list(swapVals[temp])
else:
return in_row
testData[target] = testData[target].apply(swapper, axis=1)
testData
Note that if you loaded the other keys into the dict, you could do the apply without the swapper function:
import pandas
testData=pandas.DataFrame([["Cats","Parrots","Sandstone"],["Dogs","Cockatiels","Marble"]],columns=["Mammals","Birds","Rocks"])
swapVals={("Cats","Parrots"):("Rats","Canaries"), ("Dogs","Cockatiels"):("Dogs","Cockatiels")}
target=["Mammals","Birds"]
testData[target] = testData[target].apply(lambda x: list(swapVals[tuple(x.values)]), axis=1)
testData

work with chunked data while groupby operations are needed

I have a dataset df with three columns: 'String_key_val', 'Float_other_val1', 'Int_other_val2'. I want to groupby on key_val, then extract the sum of val1 (resp. val2) with respect to these groups. Here is my code:
df = pandas.read_csv('test.csv')
grouped = df.groupby('String_key_val')
series_calculus1 = grouped['Float_other_val1'].sum()
series_calculus2 = grouped['Int_other_val2'].sum()
res = pandas.concat([series_calculus1, series_calculus2], axis=1)
res.to_csv('output_test.csv')
My problem is: My entry dataset is 10GB and I have 4Go Ram so I need to chunk my calculus but I can't see how. I thought of using HDFStore, but since I only have to build a numerical dataset, I see no point of storing complete DataFrame, and I don't think HDFStore can store simple arrays.
What can I do?

I believe a simple approach would be something along these lines....
import pandas as pd
summary = pd.DataFrame()
chunker = pd.read_csv('test.csv',iterator=True,chunksize=50000)
for chunk in chunker:
group = chunk.groupby('String_key_val')
out = group[['Float_other_val1','Int_other_val2']].sum()
summary = summary.append(out)
summary = summary.reset_index()
group = summary.groupby('String_key_val')
summary = group[['Float_other_val1','Int_other_val2']].sum()

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Append model output to pd df rows - python

Related

fastest way to iterate through uneven columns to find the existing value

How to form a matrix of distances between sites in Python?

Dataframe with arrays and key-pairs

How do I search for a tuple of values in pandas?

work with chunked data while groupby operations are needed

Categories

Resources