I'm running a for loop using pandas that checks whether another DataFrame with the same name has already been created. If it has, I just want to append the values to the corresponding columns. If it has not, I want to create the df and append the values to the named columns.
dflistglobal = []
####
For loop that generate a, b, and c variables every time it runs.
####
###
The following code runs inside the for loop, so every time it runs it should generate a, b, and c, then check whether a df has been created with a specific name. If yes, it should append the values to that "listname". If not, it should create a new list with "listname". The list name changes on every iteration, and it can repeat during this for loop.
###
if listname not in dflistglobal:
    dflistglobal.append(listname)
    listname = pd.DataFrame(columns=['a', 'b', 'c'])
    listname = listname.append({'a':a, 'b':b, 'c':c}, ignore_index=True)
else:
    listname = listname.append({'a':a, 'b':b, 'c':c}, ignore_index=True)
I am getting the following error:
  File "test.py", line 150, in <module>
    functiontest(image, results, list)
  File "test.py", line 68, in funtiontest
    listname = listname.append({'a':a, 'b':b, 'c':c}, ignore_index=True)
AttributeError: 'str' object has no attribute 'append'
The initial if statement runs fine, but the else statement causes problems.
Solved this issue by not using a separate pandas DataFrame per name. Inside the for loop I generated a unique identifier for each listname, then appended a, b, c, and listname as one row to a plain list. At the end you build one large df from that list, which can be filtered using the groupby function.
Not sure if this will be helpful for anyone, but avoiding many small pandas dfs and collecting the rows in a list worked best for me.
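A minimal sketch of that list-based approach, in case it helps. The loop body, the sample values, and the names records and groups are made up for illustration; only the idea (collect rows in a list, build one df, split with groupby) comes from the description above.

```python
import pandas as pd

records = []  # plain list of row dicts, filled inside the loop

for i in range(6):
    # Stand-ins for the real logic that produces a, b, c and the name.
    listname = f"list{i % 2}"
    a, b, c = i, i * 2, i * 3
    records.append({"a": a, "b": b, "c": c, "listname": listname})

# Build one DataFrame at the end instead of appending row by row.
df = pd.DataFrame(records)

# groupby splits the big DataFrame back into one piece per name.
groups = {name: group for name, group in df.groupby("listname")}
```

Building the DataFrame once at the end also avoids copying the data on every iteration.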
That error tells you that listname is a string at that point, and strings have no append method.
You may want to check if somewhere in your code you are adding a string to your list dflistglobal.
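The error is easy to reproduce in isolation. This is only a sketch, and it assumes (the question does not show it) that listname is set to a plain string name at the start of each iteration, so the else branch ends up calling append on a str:

```python
listname = "results_a"  # hypothetical name; a plain string, not a DataFrame

try:
    listname.append({"a": 1, "b": 2, "c": 3})
except AttributeError as exc:
    message = str(exc)  # same message as in the traceback
```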
EDIT: Possible solution
I'm not sure how you are naming your DataFrames, and I don't see how you can access them afterwards.
Instead of using a list, you can store your DataFrames inside a dictionary dict = {"name": df}. This will let you easily access the DataFrames by name.
import pandas as pd
import random

df_dict = {}

# For loop
for _ in range(10):
    # Logic to get your variables (example)
    a = random.randint(1, 10)
    b = random.randint(1, 10)
    c = random.randint(1, 10)

    # Logic to get your DataFrame name (example)
    df_name = f"dataframe{random.randint(1,10)}"

    if df_name not in df_dict.keys():
        # DataFrame with same name does not exist
        new_df = pd.DataFrame(columns=['a', 'b', 'c'])
        new_df = new_df.append({'a':a, 'b':b, 'c':c}, ignore_index=True)
        df_dict[df_name] = new_df
    else:
        # DataFrame with same name exists
        updated_df = df_dict[df_name].append({'a':a, 'b':b, 'c':c}, ignore_index=True)
        df_dict[df_name] = updated_df
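Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the code above only runs on older versions. A sketch of the same dictionary idea using pd.concat instead; the samples list is made-up input standing in for whatever your loop produces:

```python
import pandas as pd

# Made-up sample input: (name, a, b, c) tuples standing in for the loop's values.
samples = [("df_x", 1, 2, 3), ("df_y", 4, 5, 6), ("df_x", 7, 8, 9)]

df_dict = {}
for df_name, a, b, c in samples:
    # One-row DataFrame holding the new values.
    row = pd.DataFrame([{"a": a, "b": b, "c": c}])
    if df_name in df_dict:
        # DataFrame with same name exists: concatenate the new row onto it.
        df_dict[df_name] = pd.concat([df_dict[df_name], row], ignore_index=True)
    else:
        # DataFrame with same name does not exist yet.
        df_dict[df_name] = row
```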
Also, for more info, you may want to visit this question
I hope it was clear and it helps.
I'm new to pandas. I'm trying to add a new column to a df (df['new_col']), but when I do, I get this error:
import requests
import pandas as pd
import json
res = requests.get("http://api.etherscan.io/api?module=account&action=txlist&address=0xddbd2b932c763ba5b1b7ae3b362eac3e8d40121a&startblock=0&endblock=99999999&sort=asc&apikey=YourApiKeyToken")
j = res.json()
df = pd.DataFrame(j['result'])
#add column
df = df['new_col'] = '12'
print(df.head())
Traceback (most recent call last):
  File "pandas_csv.py", line 8, in <module>
    df = df['new_col'] = '12'
TypeError: 'str' object does not support item assignment
Any ideas?
Just to explain why that's happening, here's a simpler MCVE of the problem:
d = {1: "a"}
d = d[1] = "3"
TypeError: 'str' object does not support item assignment
This happens because, as described here, a chained assignment assigns the value to each target from left to right, so df = df['new_col'] = '12' is equivalent to:
df = '12'
df['new_col'] = '12'
Now, it should be obvious why the error is happening. df is overwritten with a string before the 'new_col' assignment happens.
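You can check the order of the assignments yourself with plain Python, no pandas needed. A small sketch (the variable names are just for illustration):

```python
d = {1: "a"}

try:
    # Targets are assigned left to right: first d = "3", then d[1] = "3".
    d = d[1] = "3"
except TypeError as exc:
    message = str(exc)

# d was already rebound to the string before the item assignment failed.
d_after_failure = d

# The fix: assign to the key only, without rebinding the name.
e = {1: "a"}
e[1] = "3"
```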
Just replace
df = df['new_col'] = '12'
by
df['new_col'] = '12'
I loaded a csv, tried to pipe some functions and get the following error:
AttributeError: 'NoneType' object has no attribute 'pipe'
df = pd.read_csv('file.csv')

def func1(df):
    df['newcol'] = ...some code

def func2(df):
    df['newcol2'] = ...some code

(
    df.pipe(func1)
      .pipe(func2)
)
When I print out df, it prints the DataFrame normally. No idea why I get that error. Pandas v0.24.2. Python v3.7
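You need to return df from func1, because pipe passes func1's return value on as the input to func2. A function without a return statement returns None, which is why the second .pipe call fails.
I had the same issue because I was not returning df from a function in the middle of the chain. Hope this helps!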
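A minimal sketch of a chain that works. The column x and the function bodies are made up, since the original "...some code" is not shown:

```python
import pandas as pd

def func1(df):
    df = df.copy()                     # avoid mutating the caller's DataFrame
    df["newcol"] = df["x"] * 2
    return df                          # without this, pipe hands None to func2

def func2(df):
    df = df.copy()
    df["newcol2"] = df["newcol"] + 1
    return df

df = pd.DataFrame({"x": [1, 2, 3]})
result = df.pipe(func1).pipe(func2)
```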
When I try to apply the values_count() method to a Series within a function, I am told that 'Series' object has no attribute 'values_counts'.
def replace_1_occ_feat(col_list, df):
    for col in col_list:
        feat_1_occ = df[col].values_counts()[df[col].values_counts() == 1].index
        feat_means = df[col].groupby(col)['SalePrice'].mean()
        feat_means_no_1_occ = feat_means.iloc[feat_means.difference(feat_1_occ),:]
        for feat in feat_1_occ:
            # Find the closest mean SalePrice
            replacement = (feat_means_no_1_occ - feat_means.iloc[feat,:]).idxmin()
            df.col.replace(feat, replacement, inplace = True)
However, when running df.column.values_count() outside a function, it works.
The problem occurs on the first line, when the values_counts() method is used.
I checked the pandas version; it's 0.23.0.
The method is value_counts(). Note that only "counts" is plural; "value" is singular.
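For reference, a small sketch of the correctly spelled call, used the same way as the first line of the function (finding values that occur exactly once). The sample Series is made up:

```python
import pandas as pd

s = pd.Series(["a", "b", "a", "c"])

counts = s.value_counts()            # occurrences of each distinct value
single = counts[counts == 1].index   # values that occur exactly once
```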
I've written the function (tested and working) below:
import pandas as pd
def ConvertStrDateToWeekId(strDate):
    dateformat = '2016-7-15 22:44:09'
    aDate = pd.to_datetime(strDate)
    wk = aDate.isocalendar()[1]
    yr = aDate.isocalendar()[0]
    Format_4_5_4_date = str(yr) + str(wk)
    return Format_4_5_4_date
and from what I have seen online I should be able to use it this way:
ml_poLines = result.value.select('PURCHASEORDERNUMBER', 'ITEMNUMBER', 'PRODUCTCOLORID', 'RECEIVINGWAREHOUSEID', ConvertStrDateToWeekId('CONFIRMEDDELIVERYDATE'))
However when I "show" my dataframe the "CONFIRMEDDELIVERYDATE" column is the original datetime string! NO errors are given.
I've also tried this:
ml_poLines['WeekId'] = (ConvertStrDateToWeekId(ml_poLines['CONFIRMEDDELIVERYDATE']))
and get the following error:
"ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions." which makes no sense to me.
I've also tried this with no success.
x = ml_poLines.toPandas();
x['testDates'] = ConvertStrDateToWeekId(x['CONFIRMEDDELIVERYDATE'])
ml_poLines2 = spark.createDataFrame(x)
ml_poLines2.show()
The above generates the following error:
AttributeError: 'Series' object has no attribute 'isocalendar'
What have I done wrong?
Your function ConvertStrDateToWeekId expects a single string. But in the following line the argument of the function call is a whole Series of strings, and a Series has no isocalendar method:
x['testDates'] = ConvertStrDateToWeekId(x['CONFIRMEDDELIVERYDATE'])
A possible workaround for this error is to use the apply method of the Series, which calls the function once per element:
x['testDates'] = x['CONFIRMEDDELIVERYDATE'].apply(ConvertStrDateToWeekId)
But without more information about the kind of data you are processing it is hard to provide further help.
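A self-contained sketch of the apply call on the pandas side. The two sample dates are made up, and the function is the one from the question minus the unused dateformat variable:

```python
import pandas as pd

def ConvertStrDateToWeekId(strDate):
    aDate = pd.to_datetime(strDate)   # a single Timestamp for a single string
    iso = aDate.isocalendar()         # (ISO year, ISO week, ISO weekday)
    return str(iso[0]) + str(iso[1])

# Made-up sample data standing in for ml_poLines.toPandas()
x = pd.DataFrame({"CONFIRMEDDELIVERYDATE": ["2016-07-15 22:44:09",
                                            "2016-12-26 08:00:00"]})

# apply calls the function once per element, so it always receives a string.
x["testDates"] = x["CONFIRMEDDELIVERYDATE"].apply(ConvertStrDateToWeekId)
```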
This was the work-around that I got to work:
# convert the confirmedDeliveryDate to a WeekId
x = ml_poLines.toPandas()
x['WeekId'] = x[['ITEMNUMBER', 'CONFIRMEDDELIVERYDATE']].apply(lambda y: ConvertStrDateToWeekId(y[1]), axis=1)
ml_poLines = spark.createDataFrame(x)
ml_poLines.show()
Not quite as clean as I would like.
Maybe someone else can propose a cleaner solution.
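One possibly cleaner option on the pandas side is to skip the row-wise lambda and convert the whole column at once. This is only a sketch: it assumes pandas 1.1 or newer (where Series.dt.isocalendar() is available) and no missing dates, and the sample data is made up.

```python
import pandas as pd

# Made-up sample data standing in for ml_poLines.toPandas()
x = pd.DataFrame({"CONFIRMEDDELIVERYDATE": ["2016-07-15 22:44:09",
                                            "2016-12-26 08:00:00"]})

# Parse the whole column once, then take ISO year and week for every row.
dates = pd.to_datetime(x["CONFIRMEDDELIVERYDATE"])
iso = dates.dt.isocalendar()   # DataFrame with columns: year, week, day

# Same string format as the original function: year followed by week number.
x["WeekId"] = iso["year"].astype(int).astype(str) + iso["week"].astype(int).astype(str)
```

As in the original function, the week number is not zero-padded here.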