Iterating pandas frame fails - python

I'm very new to Python. I recently downloaded this project, which is used to analyze stock trends on Reddit. The project is located here:
They have this code in process.py:
def calculate_df(df):
    data_df = df.filter(['tickers', 'score', 'sentiment'])
    tickers_processed = pd.DataFrame(df.tickers.explode().value_counts())
    tickers_processed = tickers_processed.rename(columns={'tickers': 'counts'})
    tickers_processed['score'] = 0.0
    tickers_processed['sentiment'] = 0.0
    for idx, row_tick in enumerate(tickers_processed.iloc):
I'm getting an error when I try to enumerate the tickers_processed.iloc
Exception has occurred: NotImplementedError
ix is not iterable
Stack trace:
File "C:\Users\MyUser\Desktop\NLP\trading-bot-base\tickerrain\process.py", line 113, in calculate_df
    for idx, row_tick in enumerate(tickers_processed.iloc):
File "C:\Users\MyUser\Desktop\NLP\trading-bot-base\tickerrain\process.py", line 152, in processed_df
    return calculate_df(df), calculate_df(df_3), calculate_df(df_1)
I've looked at a few other questions about this; they suggested trying something like this instead:
for idx, row_tick in tickers_processed.iloc[::1]
I tried that and it didn't work either. Does anyone know how I can enumerate the rows?

Try using df.iterrows():
for idx, row_tick in tickers_processed.iterrows():
    ...
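A minimal runnable sketch of that fix, using made-up Reddit-style data in place of the real input (the ticker lists and scores here are hypothetical). The column produced by value_counts is renamed explicitly because its name varies across pandas versions:

```python
import pandas as pd

# Hypothetical input standing in for the real scraped data
df = pd.DataFrame({'tickers': [['AAPL', 'TSLA'], ['AAPL']],
                   'score': [1, 2],
                   'sentiment': [0.5, -0.1]})

tickers_processed = pd.DataFrame(df.tickers.explode().value_counts())
tickers_processed.columns = ['counts']  # value_counts column name varies by pandas version
tickers_processed['score'] = 0.0
tickers_processed['sentiment'] = 0.0

# iterrows() yields (index label, row Series) pairs
for idx, row_tick in tickers_processed.iterrows():
    print(idx, row_tick['counts'])
```

Here idx is the ticker symbol (the index label) and row_tick is a Series holding that ticker's counts, score, and sentiment.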

Related

Python - trying to Calculate the distance between Starting post code and Destination post code for each entry of my data. Issue with Country [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 7 months ago.
I have the following table:
Country | Starting Post Code | Destination post Code
US      | 99685              | 65039
GB      | AB15               | DD9
I am trying to run the following query to return the road miles between the starting and destination postcodes, but as I am learning Python I am struggling to pass in the Country from the table above. I can use the commented-out code to pass either 'GB' or 'US', but I need this variable to come from the table.
import pandas as pd
import pgeocode
df = pd.read_excel("C:\\Users\\APP DEV\Python\\distance\\Road\\Address.xlsx",sheet_name=0)
#dist = pgeocode.GeoDistance('GB')
dist = pgeocode.GeoDistance(df['Country']).astype(str).to_list()
df["Distance"]=dist.query_postal_code(df['Starting Post Code'].astype(str).to_list(),df['Destination post Code'].astype(str).to_list())
print(df)
The issue is with the following line:
dist = pgeocode.GeoDistance(df['Country']).astype(str).to_list()
The error traceback I get is as follows:
Traceback (most recent call last):
  File "c:\Users\kyddgorg\Desktop\APP DEV\Python\distance\Road\Distance.py", line 6, in <module>
    dist = pgeocode.GeoDistance(df['Country']).astype(str).to_list()
  File "C:\Users\kyddgorg\AppData\Local\Programs\Python\Python310\lib\site-packages\pgeocode.py", line 333, in __init__
    super().__init__(country)
  File "C:\Users\kyddgorg\AppData\Local\Programs\Python\Python310\lib\site-packages\pgeocode.py", line 193, in __init__
    country = country.upper()
  File "C:\Users\kyddgorg\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\generic.py", line 5575, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'Series' object has no attribute 'upper'
Thanks for any help
I took the liberty of modifying the column labels to unify their names; please make sure they match your data files.
import pgeocode
import pandas as pd
df = pd.read_excel("C:\\Users\\APP DEV\Python\\distance\\Road\\Address.xlsx",sheet_name=0)
## This is a dictionary that simulates the data
# data = {
#     "Country": ["US", "GB"],
#     "Starting post code": ["99685", "AB15"],
#     "Destination post code": ["65039", "DD9"],
# }
# df = pd.DataFrame.from_dict(data=data)
df["Distance"] = df.apply(
    lambda row: pgeocode.GeoDistance(row["Country"]).query_postal_code(
        row["Starting post code"], row["Destination post code"]
    ),
    axis=1,
)
print(df)
I must warn you that using apply row by row is quite inefficient and may scale badly if you have millions of rows.
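A faster pattern is to group by Country and issue one vectorized query per group, since GeoDistance takes a single country code. The sketch below uses a made-up query_distances function in place of pgeocode.GeoDistance(...).query_postal_code (so it runs without the pgeocode data download); the grouping and re-alignment logic is the part being illustrated:

```python
import pandas as pd

data = {
    "Country": ["US", "GB", "US"],
    "Starting post code": ["99685", "AB15", "10001"],
    "Destination post code": ["65039", "DD9", "90210"],
}
df = pd.DataFrame(data)

# Hypothetical stand-in for pgeocode.GeoDistance(country).query_postal_code
def query_distances(country, starts, ends):
    # pretend every pair in this country is 100 km apart
    return [100.0] * len(starts)

# one vectorized query per country, keeping each group's original index
parts = []
for country, group in df.groupby("Country"):
    km = query_distances(
        country,
        group["Starting post code"].to_list(),
        group["Destination post code"].to_list(),
    )
    parts.append(pd.Series(km, index=group.index))

# re-assemble in the frame's original row order
df["Distance"] = pd.concat(parts).sort_index()
```

With the real library, query_distances would be replaced by constructing pgeocode.GeoDistance(country) once per group and calling its query_postal_code with the two lists.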

how can I make a for loop to populate a DataFrame?

First of all, thanks to everyone who takes the time to help.
I have started to learn Python and came across an opportunity to use it to my advantage at work.
I basically made a script that reads a Google Sheets file, imports it into pandas, and cleans up the data.
In the end, I just want the names of the agents in the columns, with all of their values from the resolucao column below them, so I can take the average amount of time for each agent; but I'm struggling to do it with a list comprehension / for loop.
This is what the DataFrame looks like after I cleaned it up
And this is the Code that I tried to Run
PS: Sorry for the messy code.
agentes_unique = list(df['Agente'].unique())
agentes_duplicated = df['Agente']
value_resolucao_duplicated = df['resolucao']
n_of_rows = []
for row in range(len(df)):
    n_of_rows.append(row)
i = 0
while n_of_rows[i] < len(n_of_rows):
    df2 = pd.DataFrame({agentes_unique[i]: (value for value in df['resolucao'][i] if df['Agente'][i] == agentes_unique[i])})
    i += 1
df2.to_excel('teste.xlsx', index=True, header=True)
But in the end it came to this error:
Traceback (most recent call last):
File "C:\Users\FELIPE\Desktop\Python\webscraping\.venv\lib\site-packages\pandas\core\indexes\range.py", line 385, in get_loc
return self._range.index(new_key)
ValueError: 0 is not in range
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "c:\Users\FELIPE\Desktop\Python\webscraping\bot_csv_extract\bot_login\main.py", line 50, in <module>
df2 = pd.DataFrame({'Agente': (valor for valor in df['resolucao'][i] if df['Agente'][i] == 'Gabriel')})
File "C:\Users\FELIPE\Desktop\Python\webscraping\.venv\lib\site-packages\pandas\core\series.py", line 958, in __getitem__
return self._get_value(key)
File "C:\Users\FELIPE\Desktop\Python\webscraping\.venv\lib\site-packages\pandas\core\series.py", line 1069, in _get_value
loc = self.index.get_loc(label)
File "C:\Users\FELIPE\Desktop\Python\webscraping\.venv\lib\site-packages\pandas\core\indexes\range.py", line 387, in get_loc
raise KeyError(key) from err
KeyError: 0
I feel like I'm making some obvious mistake, but I can't fix it.
Again, thanks to anyone who tries to help
Are you looking to do something like this? This is just sample data, but it should be a good start if I understand what you want to do.
data = {
    'Column1': ['Data', 'Another_Data', 'More_Data', 'Last_Data'],
    'Agente': ['Name', 'Another_Name', 'More_Names', 'Last_Name'],
    'Data': [1, 2, 3, 4]
}
df = pd.DataFrame(data)
df = df.pivot(index='Column1', columns=['Agente'], values='Data')
df.reset_index()
It is not recommended to use for loops against pandas DataFrames: It is considered messy and inefficient.
With some practice you will be able to approach problems in such a way that you will not need to use for loops in these cases.
From what I understand, your goal can be realized in 3 simple steps:
1. Select the 2 columns of interest. I recommend you take a look at how to access different elements of a DataFrame:
df = df[["Agente", "resolucao"]]
2. Convert the column you want to average to a numeric value. Say seconds:
df["resolucao"] = pd.to_timedelta(df['resolucao'].astype(str)).dt.total_seconds()
3. Apply an average aggregation, via the groupby() function:
df = df.groupby(["Agente"]).mean().reset_index()
Hope this helps.
For next time, I also recommend not posting the data as an image, so that others can reproduce your code.
Cheers and keep it up!
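The three steps above can be run end to end on a small made-up frame (the agent names and durations here are hypothetical stand-ins for the cleaned Google Sheets data):

```python
import pandas as pd

# Hypothetical cleaned data: one duration string per ticket
df = pd.DataFrame({
    'Agente': ['Gabriel', 'Gabriel', 'Maria'],
    'resolucao': ['0:10:00', '0:20:00', '0:30:00'],
})

# 1. keep the two columns of interest
df = df[['Agente', 'resolucao']]

# 2. convert each duration to a numeric value (seconds)
df['resolucao'] = pd.to_timedelta(df['resolucao'].astype(str)).dt.total_seconds()

# 3. average per agent
df = df.groupby(['Agente']).mean().reset_index()
print(df)
```

The result has one row per agent with the mean resolution time in seconds, which avoids the manual while loop entirely.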

Expected unicode, got pandas._libs.properties.CachedProperty

I'm trying to add empty columns to my dataset on Colab, but it gives me this error. When I run it on my local machine, it works perfectly fine. Does anybody know a possible solution?
My code:
dataframe["Comp"] = ''
dataframe["Negative"] = ''
dataframe["Neutral"] = ''
dataframe["Positive"] = ''
dataframe
Error message
TypeError: Expected unicode, got pandas._libs.properties.CachedProperty
I ran into a similar issue today:
"Expected unicode, got pandas._libs.properties.CachedProperty"
My dataframe (called df) has a time index. When I added a new column to it and filled it with numpy array data, it raised this error. I tried setting it with df.index or df.index.values; it always raised this error.
Finally, I solved it in 3 steps:
df = df.reset_index()
df['new_column'] = new_column_data # it is np.array format
df = df.set_index('original_index_name')
This question is the same as https://stackoverflow.com/a/67997139/16240186, and there's a simple way to solve it: df = df.asfreq('H') # freq can be min, D, M, S, 5min, etc.
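A runnable sketch of the 3-step workaround above, using a small hypothetical frame with a DatetimeIndex (the index name and values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a DatetimeIndex, mimicking the setup described above
idx = pd.date_range('2023-01-01', periods=3, freq='D', name='original_index_name')
df = pd.DataFrame({'value': [1.0, 2.0, 3.0]}, index=idx)

new_column_data = np.array([10, 20, 30])  # np.array format, as above

# step 1: drop the index into a regular column
df = df.reset_index()
# step 2: assign the new column while the index is a plain RangeIndex
df['new_column'] = new_column_data
# step 3: restore the original index
df = df.set_index('original_index_name')
```
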

Deleting rows from Python Dataframe with condition

I'm trying to delete some rows from a huge dataset in pandas. I decided to use the iterrows() function to search for the indexes to delete (since I know that deleting while iterating is a bad idea).
Right now it looks like that:
list_to_delete = []
rows_to_delete = {}
for index, row in train.iterrows():
    if <some conditions>:
        list_to_delete.append(int(index))
        rows_to_delete[int(index)] = row
train = train.drop([train.index[i] for i in list_to_delete])
It gives me this error:
Traceback (most recent call last):
File "C:/Users/patka/PycharmProjects/PARSER/getStatistics.py", line 115, in <module>
train = train.drop([train.index[i] for i in list_to_delete])
File "C:/Users/patka/PycharmProjects/PARSER/getStatistics.py", line 115, in <listcomp>
train = train.drop([train.index[i] for i in list_to_delete])
File "C:\Users\patka\PycharmProjects\PARSER\venv\lib\site-packages\pandas\core\indexes\base.py", line 3958, in __getitem__
return getitem(key)
IndexError: index 25378 is out of bounds for axis 0 with size 25378
How is that possible?
Before that, I created a copy of this dataset and tried to delete the chosen rows from the copy while iterating through the original (with inplace=True). Unfortunately, that gave an error saying that a NoneType object has no attribute 'drop'.
I would appreciate your help very much.
My example row looks like that:
resolution Done
priority Major
created 2000-07-04T13:13:52.000+0200
status Resolved
Team XBee
changelog {'Team" : {'from':...
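The IndexError likely arises because iterrows() already yields index labels, so list_to_delete holds labels, and wrapping them in train.index[i] indexes positionally a second time. A minimal sketch with made-up data and a placeholder condition (the real <some conditions> is unknown):

```python
import pandas as pd

# Hypothetical data standing in for the real training set
train = pd.DataFrame({'priority': ['Major', 'Minor', 'Major'],
                      'status': ['Resolved', 'Open', 'Open']})

list_to_delete = []
for index, row in train.iterrows():
    if row['status'] == 'Open':        # placeholder condition
        list_to_delete.append(index)   # index is already the label

# pass the labels to drop directly; no train.index[...] lookup needed
train = train.drop(list_to_delete)

# the same deletion without iterrows, via a boolean mask:
# train = train[train['status'] != 'Open']
```
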

KeyError: 'labels [data] not contained in axis'

I have tried several different methods to add a row to an existing pandas DataFrame. For example, I tried the solution here. However, I was not able to correct the issue, so I have reverted to my original code in hopes someone can help me here.
Here is my code:
print('XDF Created, Starting Bucket Separation...')
XDFDFdrop = pd.DataFrame.duplicated(XDFDF, subset='LastSurveyMachineID')
index_of_unique = XDFDF.drop_duplicates(subset='LastSurveyMachineID')
for index, row in zip(XDFDFdrop, XDFDF.itertuples()):
    if index:
        goodBucket.append(row)
    else:
        badBucket.append(row)
goodBucketDF = pd.DataFrame(goodBucket)
badBucketDF = pd.DataFrame(badBucket)
print('Bucket Separation Complete, EmailPrefix to F+L Test Starting...')
for emp, fname, lname, row1 in zip(goodBucketDF['EmailPrefix'], goodBucketDF['Fname'], goodBucketDF['Lname'], goodBucketDF.itertuples()):
    for emp2, row2 in zip(goodBucketDF['EmailPrefix'], goodBucketDF.itertuples()):
        if columns != rows:
            temp = fuzz.token_sort_ratio((fname + lname), emp)
            temp2 = fuzz.token_sort_ratio((fname + lname), emp2)
            if abs(temp - temp2) < 10:
                badBucketDF.append(list(row2))
                goodBucketDF = goodBucketDF.drop(row2)
                removed = True
        rows += 1
    if removed:
        badBucketDF.append(list(row2))
        goodBucketDF = goodBucketDF.drop(row2)
        removed = False
    columns += 1
Please note: XDFDF is a relatively large dataset that is built using pandas and was pulled from a database (it should not affect the code you see; I just figured I would disclose that information).
This is my Error:
Traceback (most recent call last):
File "/Users/john/PycharmProjects/Greatness/venv/Recipes.py", line 122, in <module>
goodBucketDF = goodBucketDF.drop([rows])
File "/Users/john/PycharmProjects/Greatness/venv/lib/python3.6/site-packages/pandas/core/frame.py", line 3694, in drop
errors=errors)
File "/Users/john/PycharmProjects/Greatness/venv/lib/python3.6/site-packages/pandas/core/generic.py", line 3108, in drop
obj = obj._drop_axis(labels, axis, level=level, errors=errors)
File "/Users/john/PycharmProjects/Greatness/venv/lib/python3.6/site-packages/pandas/core/generic.py", line 3140, in _drop_axis
new_axis = axis.drop(labels, errors=errors)
File "/Users/john/PycharmProjects/Greatness/venv/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 4387, in drop
'labels %s not contained in axis' % labels[mask])
KeyError: 'labels [(15, '1397659289', 'joshi.penguin#gmail.com', 'jim', 'smith', '1994-05-04', 'joshi.penguin', 'CF032611-8A86-4688-9715-E1278E75D046')] not contained in axis'
Process finished with exit code 1
I would like to take a row from one DataFrame and place it in the other DataFrame (it does not need to be in order, and I don't care whether the index duplicates or not). Once it is in its new DataFrame, I want to remove it from the old one.
My current issue is removing the row from the old DataFrame. Any help would be appreciated.
If you have any questions about the code, please let me know and I will respond as soon as I can. Thank you for your help.
Edit 1
Below I have included a printout of row1. Hopefully this will help as well.
Pandas(Index=1, _1=2, entity_id='1180722688', email='assassin_penguin#live.com', Fname='jim', Lname='smith', Birthdate='1990-09-14', EmailPrefix='assassin_penguin', LastSurveyMachineID=None)
Given that XDFDF is a pandas.DataFrame, shouldn't the following work?
XDFDFdrop = pd.DataFrame.duplicated(XDFDF,subset='LastSurveyMachineID')
goodBucket = XDFDF.loc[~XDFDFdrop] #the ~ negates a boolean array
badBucket = XDFDF.loc[XDFDFdrop]
Edit:
The updated error comes from passing an entire row, rather than an index label, to pandas.DataFrame.drop.
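A minimal sketch of moving a row between frames by its index label, with made-up bucket data (pd.concat is used here because the older DataFrame.append has been removed from recent pandas versions):

```python
import pandas as pd

# Hypothetical buckets standing in for goodBucketDF / badBucketDF
good = pd.DataFrame({'EmailPrefix': ['joshi.penguin', 'assassin_penguin'],
                     'Fname': ['jim', 'jim']})
bad = pd.DataFrame({'EmailPrefix': ['spam.account'],
                    'Fname': ['bob']})

label = 0  # index *label* of the row to move

# copy the row across, then drop it by label -- not by passing the row itself
bad = pd.concat([bad, good.loc[[label]]])
good = good.drop(label)
```

Passing the label to drop is what the original loop should do instead of goodBucketDF.drop(row2), which hands drop a whole row tuple.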
