Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 3 years ago.
I am running a Jupyter notebook with Python 3.0, using pandas to read from an Excel file. I am using a converter to change values in a column, e.g. -1 to 1. I keep getting a syntax error in the converters argument.
import pandas as pd
def convert_adult(cell):
    if cell == -1:
        return 1
    return cell
df = pd.read_excel("Merged.xlsx", "Sheet1", converters = ['adult_not_activation':convert_adult])
This returns:
File "<ipython-input-17-8eea921e19bb>", line 10
df = pd.read_excel("Merged.xlsx", "Sheet1", converters = ['adult_not_activation':convert_adult])
^
SyntaxError: invalid syntax
I am at the beginning of my journey with Python and pandas, so I hope my problem is not too trivial to ask. Cheers
The error points you to the problem :) You are using a list where a dict is expected.
df = pd.read_excel("Merged.xlsx", "Sheet1", converters = {'adult_not_activation': convert_adult})
The converters argument takes a dictionary. You are giving it a list, but with dictionary syntax inside.
df = pd.read_excel("Merged.xlsx", "Sheet1", converters = {'adult_not_activation':convert_adult}) # {} instead of []
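A minimal, self-contained sketch of the dict-shaped converters argument. To avoid needing an Excel file, it uses pd.read_csv on hypothetical in-memory data (read_csv accepts converters in the same {column: function} form). One difference to keep in mind: read_csv hands each cell to the converter as a string, so this version parses it with int() first, whereas read_excel passes the cell content as read from the workbook, which is why the original convert_adult can compare against -1 directly. The last part shows that the bracketed version from the question fails at compile time, before pandas is ever involved.

```python
import io

import pandas as pd


def convert_adult(cell):
    # read_csv passes each raw cell to the converter as a string,
    # so parse it before comparing (read_excel passes the cell's value instead).
    value = int(cell)
    return 1 if value == -1 else value


# Hypothetical in-memory data standing in for Merged.xlsx / Sheet1.
csv_text = "id,adult_not_activation\n1,-1\n2,0\n3,1\n4,-1\n"

# converters must be a dict mapping column label -> function.
df = pd.read_csv(io.StringIO(csv_text), converters={"adult_not_activation": convert_adult})
print(df["adult_not_activation"].tolist())  # [1, 0, 1, 1]

# The list-with-colons form from the question is not valid Python syntax at all.
list_form_is_invalid = False
try:
    compile("converters = ['adult_not_activation': convert_adult]", "<demo>", "exec")
except SyntaxError:
    list_form_is_invalid = True
print("list form raises SyntaxError:", list_form_is_invalid)
```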
Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 3 years ago.
I'm having the following error:
"ParserError: Error tokenizing data. C error: out of memory"
This happens when I try to read a large CSV file (5 GB), even though I select only the columns that interest me and set the necessary parameters. I've also tried the chunksize parameter.
df = pd.read_csv('file.csv', encoding = 'ISO-8859-1', usecols = names_columns, low_memory = False, nrows = 10000)
The strange thing is that when I set nrows = 1000, it works.
I've loaded dataframes with many more rows than that and they worked perfectly, but this one gives this error.
Does anyone have any suggestions?
From this answer:
There should not be a need to mess with low_memory. Remove that parameter option.
Specifying dtypes (should always be done)
Consider the example of a file with a column called user_id. It contains 10 million rows, and user_id is always a number. Adding dtype={'user_id': int} to the pd.read_csv() call lets pandas know, as it reads the file, that this column contains only integers.
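A sketch combining the suggestions above: read only the needed columns with explicit dtypes, and process the file in pieces with chunksize so the whole parsed file is never held in memory at once. The data, column names, dtypes, and the score filter are hypothetical stand-ins; for the real file you would pass its path and encoding='ISO-8859-1' as in the question. Concatenating the chunks at the end only saves memory if each chunk is filtered or aggregated first.

```python
import io

import pandas as pd

# Hypothetical small stand-in for the 5 GB file.
csv_text = (
    "user_id,name,score,notes\n"
    "1,a,3.5,x\n"
    "2,b,4.0,y\n"
    "3,c,2.5,z\n"
)

names_columns = ["user_id", "score"]               # only the columns we need
dtypes = {"user_id": "int64", "score": "float64"}  # declare types up front

pieces = []
reader = pd.read_csv(io.StringIO(csv_text), usecols=names_columns, dtype=dtypes, chunksize=2)
for chunk in reader:
    # Reduce each chunk before keeping it (hypothetical filter).
    pieces.append(chunk[chunk["score"] > 3.0])

df = pd.concat(pieces, ignore_index=True)
print(df)
```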
Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 3 years ago.
I am new to pandas and I have been struggling with how to use the pivot function.
I am trying to make flat_type the index.
When I tried to use the pivot function, I kept getting this error:
TypeError: pivot() got multiple values for argument 'index'
And I have no idea how to fix it.
Any help or suggestion would be greatly appreciated. Thank you, and have a nice day!
link to dataset: https://data.gov.sg/dataset/median-rent-by-town-and-flat-type
code:
import pandas as pd
import numpy as np
df = pd.read_csv('median-rent-by-town-and-flat-type (1).csv',na_values=['na','-'])
mydf = df.dropna()
mydf = mydf.reset_index()
mydf.pivot(mydf,index="flat_type",columns="town",values="median_rent")
Remove mydf from inside the parentheses of pivot: you are already calling the DataFrame.pivot method on mydf, so it is passed automatically.
mydf.pivot(index="flat_type",columns="town",values="median_rent")
Another solution is to use the top-level pandas.pivot function, i.e. change mydf.pivot to pd.pivot and keep mydf as the first argument:
pd.pivot(mydf, index="flat_type",columns="town",values="median_rent")
Try removing mydf from the arguments:
mydf.pivot(index="flat_type", columns="town", values="median_rent")
Thank you.
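A small sketch with hypothetical rows shaped like the median-rent data, showing the method form and the function form side by side, plus the failing call from the question. The exact TypeError message depends on the pandas version: in older releases the extra positional mydf lands on the index parameter, giving "got multiple values for argument 'index'", while recent releases make these parameters keyword-only and report a positional-argument error instead, so the sketch only checks that a TypeError is raised.

```python
import pandas as pd

# Hypothetical rows shaped like the data.gov.sg median-rent file.
mydf = pd.DataFrame({
    "town": ["ANG MO KIO", "ANG MO KIO", "BEDOK", "BEDOK"],
    "flat_type": ["3-RM", "4-RM", "3-RM", "4-RM"],
    "median_rent": [2000, 2400, 1900, 2300],
})

# Method form: mydf is the object the method is called on, so don't pass it again.
wide = mydf.pivot(index="flat_type", columns="town", values="median_rent")

# Function form: the DataFrame is the first argument.
wide2 = pd.pivot(mydf, index="flat_type", columns="town", values="median_rent")

# The call from the question: mydf is passed twice, which raises TypeError.
raised = False
try:
    mydf.pivot(mydf, index="flat_type", columns="town", values="median_rent")
except TypeError:
    raised = True

print(wide)
```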
Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 3 years ago.
For the first time, I'm unable to use the .shift() method on a column of the DataFrame I'm working with; I get a 'DataFrame' object is not callable error.
sdf = quandl.get("AAII/AAII_SENTIMENT", authtoken="mytoken")
sdf = pd.DataFrame(data = sdf)
sdf = sdf.infer_objects()
sdf.index = pd.to_datetime(sdf.index, dayfirst=True)
sdf = sdf.iloc[9:,]
sdf['sp500_2w_future_close'] = sdf(['S&P 500 Weekly Close']).shift(-2)
I was expecting to get a new column displaying the S&P 500 Weekly Close from two rows down, and instead got this weird error. Help, please!
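The error comes from sdf(['S&P 500 Weekly Close']): parentheses try to call the DataFrame like a function, while selecting a column uses square brackets. A minimal sketch with hypothetical weekly closes in place of the Quandl download; shift(-2) moves values up by two rows, so each row receives the close from two rows later and the last two rows become NaN.

```python
import pandas as pd

# Hypothetical weekly closes standing in for the AAII/AAII_SENTIMENT data.
sdf = pd.DataFrame(
    {"S&P 500 Weekly Close": [100.0, 101.5, 99.8, 102.3, 103.1]},
    index=pd.date_range("2020-01-03", periods=5, freq="W-FRI"),
)

# Square brackets select the column; .shift(-2) pulls each value up two rows.
sdf["sp500_2w_future_close"] = sdf["S&P 500 Weekly Close"].shift(-2)
print(sdf)
```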
Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 4 years ago.
I have to set the result of
df.groupby(['région'])['counts'].sum()
as the column c2 of my dataframe.
So I do this:
df['c2'] = pd.to_numeric(df.groupby(['région'])['counts'].sum()).astype(float)
Thus
pd.to_numeric(df.groupby(['région'])['counts'].sum()).astype(float)
has type float, and so df['c2'] should also have type float.
However, when I tried to print the column of my dataframe df['c2'] all values are NaN.
How can I solve this?
EDIT 1:
My code is here
In your code, after this part:
import numpy as np
d_copy = d.copy()
Do this:
d_copy['counts2'] = d_copy.groupby(['region'])['counts'].transform('count')
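A sketch of why the NaNs appear, using hypothetical data with the question's column names. df.groupby(['région'])['counts'].sum() returns one value per region, indexed by region name; assigning that Series to a column aligns it on the index, and since the DataFrame's row labels are 0, 1, 2 and so on rather than region names, nothing matches and every value becomes NaN. transform() instead returns a result with the original row index, one value per row. The answer above uses transform('count'), which counts rows per region; transform('sum') gives the per-region total that the question's sum() computes.

```python
import pandas as pd

# Hypothetical data using the column names from the question.
df = pd.DataFrame({
    "région": ["Nord", "Nord", "Sud", "Sud", "Sud"],
    "counts": [1, 2, 3, 4, 5],
})

# One row per region, indexed by region name ('Nord', 'Sud') ...
totals = df.groupby(["région"])["counts"].sum()
# ... so assigning it aligns on index labels 0..4, matches nothing, and gives NaN.
df["c2_aligned"] = totals

# transform() keeps the original row index: one value per row.
df["c2"] = df.groupby(["région"])["counts"].transform("sum").astype(float)
df["counts2"] = df.groupby(["région"])["counts"].transform("count")
print(df)
```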
Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 6 years ago.
I need to look up a value in a CSV file based on a criterion passed to a function. When I run my program, I get all the values instead of only the one associated with my entry. Any help will be appreciated.
The data looks something like this:
rose,7.95
lily,3.95
begonia,5.95
The function I created is:
def problem3_8(csv_pricefile, flower):
    import csv
    archivo = open(csv_pricefile)
    for row in csv.reader(archivo):
        if flower in row[0]:
            print(row[1])
    archivo.close()
When I ran the program using the following line:
problem3_7("flowers.csv","rose")
I get all the values in the file, like this:
7.95
3.95
5.95
But the output should be only the value associated with the flower I passed in (the second argument):
7.95
Thanks
I ran the code you gave and got the correct output of 7.95.
Is it possible you called the wrong function? In your question you referred to the function problem3_7 instead of the function problem3_8
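As the answer says, the original function works when it is the one actually called. Separately, two small things could bite later: flower in row[0] is a substring test (searching for "rose" would also match a hypothetical "primrose" row), and the file is not closed if an exception occurs before archivo.close(). A sketch of a variant that compares exactly, uses with, and returns the price instead of printing it; the temporary file simply holds the sample data from the question.

```python
import csv
import os
import tempfile


def problem3_8(csv_pricefile, flower):
    # 'with' closes the file even if an exception is raised.
    with open(csv_pricefile, newline="") as archivo:
        for row in csv.reader(archivo):
            # Exact match; skip blank lines, which csv.reader yields as [].
            if row and row[0] == flower:
                return row[1]
    return None  # flower not found


# Write the sample data from the question to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("rose,7.95\nlily,3.95\nbegonia,5.95\n")
    path = tmp.name

result = problem3_8(path, "rose")
missing = problem3_8(path, "tulip")
os.remove(path)
print(result)  # 7.95
```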