I'm trying to add empty columns to my dataset on Colab, but it gives me this error, and when I run the same code on my local machine it works perfectly fine. Does anybody know a possible solution for this?
My code:
dataframe["Comp"] = ''
dataframe["Negative"] = ''
dataframe["Neutral"] = ''
dataframe["Positive"] = ''
dataframe
Error message:
TypeError: Expected unicode, got pandas._libs.properties.CachedProperty
I ran into a similar issue today.
"Expected unicode, got pandas._libs.properties.CachedProperty"
My dataframe (called df) has a time index. When I add a new column to it and fill it with numpy.array data, it raises this error. I tried setting it with df.index or df.index.values; it always raises this error.
Finally, I solved it in 3 steps:
df = df.reset_index()
df['new_column'] = new_column_data # it is np.array format
df = df.set_index('original_index_name')
This question is the same as https://stackoverflow.com/a/67997139/16240186, and there's a simple way to solve it: df = df.asfreq('H')  # freq can be 'min', 'D', 'M', 'S', '5min', etc.
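A minimal sketch combining both workarounds on a DataFrame with a DatetimeIndex (the column names, values, and hourly frequency below are assumptions made for illustration):
import numpy as np
import pandas as pd

# hypothetical frame with a DatetimeIndex
df = pd.DataFrame(
    {"value": [1.0, 2.0, 3.0]},
    index=pd.to_datetime(["2021-01-01 00:00", "2021-01-01 01:00", "2021-01-01 02:00"]),
)

# workaround 1: give the index an explicit frequency, then assign the array
df = df.asfreq('H')
df["new_column"] = np.array([10, 20, 30])

# workaround 2: drop the index, assign, then restore it
df = df.reset_index()
df["another_column"] = np.array([4, 5, 6])
df = df.set_index("index")  # 'index' is the name reset_index gives an unnamed DatetimeIndex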
Related
So I have a csv, and I am trying to load it into a dataframe via
df = pd.read_csv("watchlist.csv", sep='\s{2,}',)
It seems to work fine when I print(df)
Also, when I print the columns, this is the output I get:
print(df.columns) #- OUTPUT:
Index([',Name,Growth,Recommendation,CurrentRatio,TotalCash,Debt,Revenue,PercentageSharesOut,PercentageInstitutions,PercentageInsiders,PricetoBook,ShortRatio,RegularMarketPrice'], dtype='object')
The trouble I'm having is that when I then try to access a column with something like
med_debt = math.floor(df.Debt), or even
print(df.Debt)
I get an attribute error:
AttributeError: 'DataFrame' object has no attribute 'Debt'
Any assistance here would be appreciated.
The sep='\s{2,}' parameter causes the whole header row to be read as a single string column name, because the commas are no longer treated as separators. For example:
>>> df = pd.read_csv("weather", sep='\s{2,}')
>>> df.columns
Index(['Date/Time,Temp (C),Dew Point Temp (C),Rel Hum (%),Wind Spd (km/h),
Visibility (km),Stn Press (kPa),Weather'], dtype='object')
>>> df.index
RangeIndex(start=0, stop=8784, step=1)
When you try to access a specific column, e.g. math.floor(df.Debt), it returns
AttributeError: 'DataFrame' object has no attribute 'Debt'
and df["Debt"] raises
KeyError: 'Debt'
To access specific columns of df this way, read the file with the default separator:
df = pd.read_csv("watchlist.csv")
The separator is not splitting the csv correctly; try leaving it out and letting the csv reader use the default value of ',' instead.
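A minimal sketch of the fix (the file name and the median aggregation are assumptions for illustration):
import math
import pandas as pd

# let read_csv use its default comma separator so each field becomes its own column
df = pd.read_csv("watchlist.csv")

print(df.columns)                            # 'Name', 'Debt', ... now appear as separate columns
print(df["Debt"])                            # bracket access works, and so does df.Debt
med_debt = math.floor(df["Debt"].median())   # aggregate to a scalar before calling math.floor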
This is my piece of code:
def segregate_files(self, list_of_csv, each_sub_folder):
    new_list_of_csv = []
    for each_csv in list_of_csv:
        pattern = f"{each_sub_folder}/(.*?)/"
        self.data_centre = re.search(pattern, each_csv).group(1)
        if "org_dashboards/" in each_csv:
            each_csv = each_csv.replace("org_dashboards/", f"{self.file_path}/")
        else:
            each_csv = each_csv.replace("dashboards/", f"{self.file_path}/")
        df = pd.read_csv(each_csv)
        print(df.shape)
        df["Data Centre"] = self.data_centre
        print(df.shape)
        df.to_csv(each_csv)
        new_list_of_csv.append(each_csv)
        # self.list_of_sub_folder.append(f"files/{blob_name}")
    print(new_list_of_csv)
    self.aggregate_csv_path = f"{self.file_path}/{each_sub_folder}"
    return new_list_of_csv, self.aggregate_csv_path
My dataframe is properly able to read the csv, and there is no error in df["Data Centre"] = self.data_centre; only the shape remains the same. FYI, the value of self.data_centre is also correct.
Sorry my bad. It was a file write issue. Now it has been resolved. Thank you.
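For reference, a minimal sketch of the read / add-column / write round trip (the path and value below are hypothetical); passing index=False to to_csv also keeps an extra unnamed index column from being prepended on every rewrite:
import pandas as pd

csv_path = "files/example.csv"      # hypothetical path
df = pd.read_csv(csv_path)
df["Data Centre"] = "eu-west-1"     # hypothetical data centre value
df.to_csv(csv_path, index=False)    # index=False avoids adding an 'Unnamed: 0' column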
Based on the responses I received in Pandas SettingWithCopyWarning: I'm thoroughly confused and the clear explanation I found at Pandas - Get first row value of a given column, I thought I had all my SettingWithCopyWarning errors solved. Unfortunately, they are back (Python 3.8.5), and I'd appreciate your assistance. My dataframe df has a column 'SBPi_min_time' which I refer to as t_min
df.head()
SBPi_max_time SBPi_max SBPi_min_time SBPi_min delta_p
0 52.257 119.626 55.903 111.256 8.370
1 59.513 118.580 60.562 114.395 4.185
2 62.632 119.626 63.650 112.999 6.627
3 65.721 121.021 67.279 114.395 6.626
4 69.344 120.672 72.414 113.348 7.324
If I now try to copy a value from one line of df to the previous line, I get the infamous SettingWithCopyWarning. I have tried 5 distinct approaches, and get the error in every single case. It's worth noting that the first approach is the one that is recommended in the posts that I have posted links to:
df.iloc[i, df.columns.get_loc('SBPi_min_time')] = df.iloc[i+1, df.columns.get_loc('SBPi_min_time')]
df.iloc[i, df.columns.get_loc(t_min)] = df.iloc[i+1, df.columns.get_loc(t_min)]
df.iloc[i, 3] = df.iloc[i+1, 3]
df[t_min].iloc[i] = df[t_min].iloc[i+1]
df[t_min][i] = df[t_min][i+1]
If there was a way to create a new object (in this case a float) from df[t_min][i+1], I could do so and then set df[t_min][i] to it, but there doesn't seem to be a way to do it:
df_copy = df.copy(deep = True)
df[t_min][i] = df_copy[t_min][i+1]
gives me the same error. What on earth am I doing wrong, and what's the fix?
Many thanks in advance
Thomas Philips
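As an aside, a minimal self-contained sketch of the iloc/get_loc assignment on a freshly constructed frame, where no warning is raised; the warning typically appears when df is itself a slice of another DataFrame, in which case taking an explicit .copy() at the point where df is created is the usual remedy (an assumption here, since the code that builds df isn't shown):
import pandas as pd

df = pd.DataFrame({
    "SBPi_max_time": [52.257, 59.513],
    "SBPi_min_time": [55.903, 60.562],
})

t_min = "SBPi_min_time"
i = 0

# on a frame that is not a view of another frame, this assignment is warning-free
df.iloc[i, df.columns.get_loc(t_min)] = df.iloc[i + 1, df.columns.get_loc(t_min)]

# if df was produced by slicing a larger frame, copy it explicitly at that point, e.g.
# df = bigger_df[bigger_df["some_flag"]].copy()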
I've written the function (tested and working) below:
import pandas as pd
def ConvertStrDateToWeekId(strDate):
    dateformat = '2016-7-15 22:44:09'  # example of the expected input format (unused)
    aDate = pd.to_datetime(strDate)
    wk = aDate.isocalendar()[1]
    yr = aDate.isocalendar()[0]
    Format_4_5_4_date = str(yr) + str(wk)
    return Format_4_5_4_date
and from what I have seen online I should be able to use it this way:
ml_poLines = result.value.select('PURCHASEORDERNUMBER', 'ITEMNUMBER', 'PRODUCTCOLORID', 'RECEIVINGWAREHOUSEID', ConvertStrDateToWeekId('CONFIRMEDDELIVERYDATE'))
However, when I "show" my dataframe, the "CONFIRMEDDELIVERYDATE" column is still the original datetime string! No errors are given.
I've also tried this:
ml_poLines['WeekId'] = (ConvertStrDateToWeekId(ml_poLines['CONFIRMEDDELIVERYDATE']))
and get the following error:
"ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions." which makes no sense to me.
I've also tried this with no success.
x = ml_poLines.toPandas();
x['testDates'] = ConvertStrDateToWeekId(x['CONFIRMEDDELIVERYDATE'])
ml_poLines2 = spark.createDataFrame(x)
ml_poLines2.show()
The above generates the following error:
AttributeError: 'Series' object has no attribute 'isocalendar'
What have I done wrong?
Your function ConvertStrDateToWeekId takes a string. But in the following line the argument of the function call is a series of strings:
x['testDates'] = ConvertStrDateToWeekId(x['CONFIRMEDDELIVERYDATE'])
A possible workaround for this error is to use the pandas apply function:
x['testDates'] = x['CONFIRMEDDELIVERYDATE'].apply(ConvertStrDateToWeekId)
But without more information about the kind of data you are processing it is hard to provide further help.
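A self-contained pandas version of that suggestion, for illustration (the sample dates below are made up):
import pandas as pd

def ConvertStrDateToWeekId(strDate):
    aDate = pd.to_datetime(strDate)
    wk = aDate.isocalendar()[1]
    yr = aDate.isocalendar()[0]
    return str(yr) + str(wk)

# hypothetical data standing in for the CONFIRMEDDELIVERYDATE column
x = pd.DataFrame({"CONFIRMEDDELIVERYDATE": ["2016-7-15 22:44:09", "2016-12-31 08:00:00"]})
x["testDates"] = x["CONFIRMEDDELIVERYDATE"].apply(ConvertStrDateToWeekId)
print(x)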
This was the work-around that I got to work:
# convert the CONFIRMEDDELIVERYDATE to a WeekId
x = ml_poLines.toPandas()
x['WeekId'] = x[['ITEMNUMBER', 'CONFIRMEDDELIVERYDATE']].apply(lambda y: ConvertStrDateToWeekId(y[1]), axis=1)
ml_poLines = spark.createDataFrame(x)
ml_poLines.show()
Not quite as clean as I would like.
Maybe someone else can propose a cleaner solution.
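One possible cleaner route, sketched here without access to the original data (so treat it as an assumption), is to stay in Spark and build the week id from the built-in date functions instead of round-tripping through pandas; note that year() is the calendar year, which can differ from the ISO year returned by isocalendar() around the turn of a year:
from pyspark.sql import functions as F

ml_poLines = ml_poLines.withColumn(
    "WeekId",
    F.concat(
        F.year(F.to_timestamp("CONFIRMEDDELIVERYDATE")).cast("string"),
        F.weekofyear(F.to_timestamp("CONFIRMEDDELIVERYDATE")).cast("string"),
    ),
)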
I am trying to create a dataframe with Python, which works fine with the following command:
df_test2 = DataFrame(index = idx, data=(["-54350","2016-06-25T10:29:57.340Z","2016-06-25T10:29:57.340Z"]))
but when I try to get the data from a variable instead of hard-coding it into the data argument, e.g.:
r6 = ["-54350", "2016-06-25T10:29:57.340Z", "2016-06-25T10:29:57.340Z"]
df_test2 = DataFrame(index = idx, data=(r6))
I expected this to be equivalent and to work, but I get:
ValueError: DataFrame constructor not properly called!
Reason for the error:
It seems a string representation of the data isn't accepted by the DataFrame constructor.
Fix/Solutions:
import ast
# convert the string representation back into a Python object
data = ast.literal_eval(r6)  # renamed to avoid shadowing the built-in dict
# and use it as the input
df_test2 = DataFrame(index=idx, data=data)
which will solve the error.
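For illustration, a minimal self-contained sketch of that idea; the string value assigned to r6 here is an assumption about how the data might have arrived (e.g. read from a file or an API as text):
import ast
import pandas as pd

idx = [0, 1, 2]  # hypothetical index

# if r6 arrived as a string representation of a list ...
r6 = '["-54350", "2016-06-25T10:29:57.340Z", "2016-06-25T10:29:57.340Z"]'
data = ast.literal_eval(r6)  # ... turn it back into a real Python list
df_test2 = pd.DataFrame(index=idx, data=data)

# if r6 is already a plain list, it can be passed to the constructor directly:
# df_test2 = pd.DataFrame(index=idx, data=r6)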