python pandas dataframe misplaced - python

My pandas DataFrame does not place items in the correct columns when I append a new row to it.
I use a function to make appending easier.
When I append without the function, it works fine.
(image omitted)
Code:
from emailsender import email_send
import pandas as pd
import numpy as np

try:
    file = pd.read_csv("customers.csv")
except FileNotFoundError:
    pass

customers = {"name": ["name"],
             "last": ["last"],
             "age_range": [0],
             "emails": ["namelast@gmail.com"]}
df_customers = pd.DataFrame(customers)

def add_customer(df, name=np.nan, last=np.nan, age=np.nan, email=np.nan):
    return df.append({"name": name,
                      "last": last,
                      "age_range": age,
                      "emails": email}, ignore_index=True)

df_customers = add_customer(df_customers, "mohamed", "miboun", "mohamedwapana@gmail.com")
print(df_customers)

You can only append a Series, DataFrame, or list-like object to a DataFrame, but you are appending a dictionary. So try this:
...
# your code before the append
...
df_to_add = pd.DataFrame({"name": [name], "last": [last], "age_range": [age], "emails": [email]})
return df.append(df_to_add, ignore_index=True)
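Note that `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, so on a recent pandas the same helper has to use `pd.concat`. A minimal sketch with the column names from the question (note the original bug: passing the email positionally lets it land in the `age` slot, so it is passed by keyword here):

```python
import numpy as np
import pandas as pd

def add_customer(df, name=np.nan, last=np.nan, age=np.nan, email=np.nan):
    # Build a one-row DataFrame and concatenate it; pd.concat replaces
    # DataFrame.append, which was removed in pandas 2.0.
    row = pd.DataFrame({"name": [name], "last": [last],
                        "age_range": [age], "emails": [email]})
    return pd.concat([df, row], ignore_index=True)

df = pd.DataFrame({"name": ["name"], "last": ["last"],
                   "age_range": [0], "emails": ["namelast@gmail.com"]})
# Passing email by keyword keeps it out of the age slot.
df = add_customer(df, "mohamed", "miboun", email="mohamedwapana@gmail.com")
print(df)
```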

Related

Pandas function explode does not work on this DataSeries

The pandas explode function doesn't expand the object's elements into rows as it should.
import pandas as pd
import requests
import io
from pandas.io.json import json_normalize

# url, headers and payload are defined elsewhere
response = requests.request("GET", url, headers=headers, data=payload)
response_text = response.text.encode('utf8')
fundingRate = pd.read_json(response_text, orient='columns', typ='frame')
fundingC = pd.DataFrame(fundingRate['data'])
fundingC = fundingC.T
fundingC = fundingC.astype(object)
fundingdataMap = fundingC['dataMap']
fundingdataMap = fundingdataMap.astype(str)
fundingdataMap = fundingdataMap.str.slice(start=10)
fundingdataMap.explode()
fundingdataMap Series: https://www.pythonanywhere.com/user/armaniallie93/files/home/armaniallie93/fundingdataMap.txt
output
data [0.07280400000000001, 0.013058, 0.01, 0.01, 0....
Name: dataMap, dtype: object
After casting the column elements to strings and slicing the portion I want, there is no error, but explode still doesn't produce the expected result. Any insight into why?
The reason for the error is quite simple. You have a dictionary which you are trying to explode, which would not work.
#Removing the first row with dictionary
df.iloc[1:].explode('data')
#Without removing first row
df.explode('data')
You will have to take a call on how you want to convert this dictionary into a list. That would require a lambda function.
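A sketch of that lambda approach on a toy frame (stand-in data, not the funding-rate response): convert any dict cells to a list of their values, then explode. Note that `explode` returns a new object, so the result must be assigned.

```python
import pandas as pd

# Toy frame standing in for the real data: one cell holds a dict
# (which explode cannot expand), the other holds a list.
df = pd.DataFrame({"data": [{"a": 0.07, "b": 0.013}, [0.01, 0.02]]})

# Convert dict cells into lists of their values with a lambda,
# then explode; each list element becomes its own row.
df["data"] = df["data"].apply(lambda v: list(v.values()) if isinstance(v, dict) else v)
exploded = df.explode("data")
print(exploded)
```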

Using pandas, how do I turn one csv file column into list and then filter a different csv with the created list?

Basically, I have one CSV file called 'Leads.csv' that contains all the sales leads we already have. I want to turn its 'Leads' column into a list, then check a 'Report' CSV to see whether any of those leads are already in there and filter them out.
Here's what I have tried:
import pandas as pd
df_leads = pd.read_csv('Leads.csv')
leads_list = df_leads['Leads'].values.tolist()
df = pd.read_csv('Report.csv')
df = df.loc[(~df['Leads'].isin(leads_list))]
df.to_csv('Filtered Report.csv', index=False)
Any help is much appreciated!
You can try:
import pandas as pd
df_leads = pd.read_csv('Leads.csv')
df = pd.read_csv('Report.csv')
set_filtered = set(df['Leads'])-(set(df_leads['Leads']))
df_filtered = df[df['Leads'].isin(set_filtered)]
Note: sets are significantly faster than lists for this membership operation.
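A self-contained sketch of the set-difference filtering, with tiny in-memory frames standing in for Report.csv and Leads.csv:

```python
import pandas as pd

# Stand-ins for Report.csv and Leads.csv.
df = pd.DataFrame({"Leads": ["a@x.com", "b@x.com", "c@x.com"]})
df_leads = pd.DataFrame({"Leads": ["b@x.com"]})

# Keep only report leads that are not already in the leads file.
set_filtered = set(df["Leads"]) - set(df_leads["Leads"])
df_filtered = df[df["Leads"].isin(set_filtered)]
print(df_filtered)
```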

How can I use a CSV file for Python pdblp instead of a ticker reference for getting API from con.ref

I am very new to Python and I want to replace an exact ticker with a reference to a column of a DataFrame I created from a CSV file. Can this be done? I'm using:
import pandas as pd
import numpy as np
import pdblp as pdblp
import blpapi as blp
con = pdblp.BCon(debug=False, port=8194, timeout=5000)
con.start()
con.ref("CLF0CLH0 Comdty","PX_LAST")
tickers = pd.read_csv("Tick.csv")
So "tickers" has a column 'ticker1', which is a list of tickers. I want to replace
con.ref("CLF0CLH0 Comdty","PX_LAST") with something like
con.ref([tickers('ticker1')],"PX_LAST")
Any ideas?
Assuming you want to load all tickers into one DataFrame, I think it would look something like this:
df = pd.DataFrame()
for ticker in tickers['ticker1']:
    df_tmp = con.ref(ticker, "PX_LAST")  # con.ref returns the records as a DataFrame
    df = df.append(df_tmp)
Ended up using the following .tolist() approach, and it worked well:
tickers = pd.read_csv("Tick.csv")
tickers1 = tickers['ticker'].tolist()
con.ref(tickers1, "PX_LAST")
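The `.tolist()` step itself can be sketched without a Bloomberg connection, using an in-memory CSV in place of Tick.csv (the ticker values here are made up):

```python
import io
import pandas as pd

# In-memory stand-in for Tick.csv, with one 'ticker' column.
csv_text = "ticker\nCLF0CLH0 Comdty\nCLG0CLJ0 Comdty\n"
tickers = pd.read_csv(io.StringIO(csv_text))

# Convert the column to a plain Python list; con.ref accepts either
# a single ticker string or a list of tickers.
tickers1 = tickers["ticker"].tolist()
print(tickers1)
```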

Pandas adding date of file creation as a variable

I have multiple CSV files in a folder. I want to add "date_created" as a variable to my DataFrame for each CSV file. Currently I have something like this:
import glob
import pandas as pd
df = pd.concat([pd.read_csv(f, encoding="utf-16", delimiter = "^") for f in glob.glob('*.csv')])
df.to_csv("all_together.csv")
How could I do this?
Use assign with a custom function:
import glob
import os
import platform

import pandas as pd

# https://stackoverflow.com/a/39501288
def creation_date(path_to_file):
    """
    Try to get the date that a file was created, falling back to when it was
    last modified if that isn't possible.
    See http://stackoverflow.com/a/39501288/1709587 for explanation.
    """
    if platform.system() == 'Windows':
        return os.path.getctime(path_to_file)
    else:
        stat = os.stat(path_to_file)
        try:
            return stat.st_birthtime
        except AttributeError:
            # We're probably on Linux. No easy way to get creation dates here,
            # so we'll settle for when its content was last modified.
            return stat.st_mtime

L = [pd.read_csv(f, encoding="utf-16", delimiter="^").assign(date_created=creation_date(f))
     for f in glob.glob('*.csv')]
df = pd.concat(L, ignore_index=True)
df.to_csv("all_together.csv")
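A runnable sketch of the same assign-per-file pattern, using temporary files and the modification time (`st_mtime`) as a portable stand-in for a creation date:

```python
import os
import tempfile

import pandas as pd

# Two small CSVs in a temp directory stand in for the real folder.
tmp = tempfile.mkdtemp()
for name, body in [("a.csv", "x\n1\n"), ("b.csv", "x\n2\n")]:
    with open(os.path.join(tmp, name), "w") as fh:
        fh.write(body)

# assign() adds a constant column per file before concatenation.
frames = [
    pd.read_csv(os.path.join(tmp, f)).assign(
        date_created=os.path.getmtime(os.path.join(tmp, f)))
    for f in sorted(os.listdir(tmp))
]
df = pd.concat(frames, ignore_index=True)
print(df)
```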

Pandas apply, how to combine the results returned

In pandas apply, the applied function takes each row of the DataFrame and returns another DataFrame. How can I combine (append) the DataFrames returned by apply? For example:
# this is an example
import pandas as pd
import numpy as np

def newdata(X, data2):
    return X - data2[data2['no'] != X['no']].sample(1, random_state=100)

col = ['no', 'a', 'b']
data1 = pd.DataFrame(np.column_stack((range(5), np.random.rand(5, 2))), columns=col)
data2 = pd.DataFrame(np.column_stack((range(3), np.random.rand(3, 2))), columns=col)
Newdata = data1.apply(newdata, args=(data2,), axis=1)
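One way to combine the per-row results (a sketch, not from the original thread): instead of leaving the one-row DataFrames nested inside the Series that `apply` returns, collect them in a list and concatenate with `pd.concat`.

```python
import numpy as np
import pandas as pd

def newdata(X, data2):
    # Subtract one sampled row of data2 (with a different 'no') from
    # row X; the result is a one-row DataFrame.
    return X - data2[data2['no'] != X['no']].sample(1, random_state=100)

col = ['no', 'a', 'b']
rng = np.random.default_rng(0)  # seeded for reproducibility
data1 = pd.DataFrame(np.column_stack((range(5), rng.random((5, 2)))), columns=col)
data2 = pd.DataFrame(np.column_stack((range(3), rng.random((3, 2)))), columns=col)

# Collect the per-row DataFrames and concatenate them into one frame.
Newdata = pd.concat([newdata(row, data2) for _, row in data1.iterrows()],
                    ignore_index=True)
print(Newdata.shape)
```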
