I imported stock/options data into a data frame and want to use pandas to manually filter for specific criteria. I renamed a few columns and then later on I tried to do a bit of cleaning so I can work with the data.
I tried to replace percentage signs then convert the data type to a float by doing this:
df = df['IV'].str.rstrip("%").astype(float)
df = df['IV_Rank'].str.rstrip("%").astype(float)/100
df = df['IV PCT'].str.rstrip("%").astype(float)/100
When I run that code I get the error message: KeyError: 'IV'. I got the same error for the other columns when I ran each line independently, even after copy-pasting the column names and trying the old names. I am not sure what to do, so any help would be appreciated.
That's because you are overwriting the entire dataframe with a single column: after the first line, df is just the 'IV' Series, so the next column lookup fails with a KeyError. This is what I think you are trying to do:
df['IV'] = df['IV'].str.rstrip("%").astype(float)
df['IV_Rank'] = df['IV_Rank'].str.rstrip("%").astype(float)/100
df['IV PCT'] = df['IV PCT'].str.rstrip("%").astype(float)/100
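An equivalent, slightly more compact variant (assuming the column names are exactly as shown above) loops over the two percentage columns that need rescaling instead of repeating the pattern:

# Strip the trailing '%' and convert; IV stays on its original scale,
# the other two columns are rescaled to fractions.
df['IV'] = df['IV'].str.rstrip('%').astype(float)
for col in ['IV_Rank', 'IV PCT']:
    df[col] = df[col].str.rstrip('%').astype(float) / 100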
While reading a DataFrame into Atoti using the following code, I get the error shown below.
#Code
global_data = session.read_pandas(df, keys=["Row ID"], table_name="Global_Superstore")
#error
ArrowInvalid: Could not convert '2531' with type str: tried to convert to int64
How do I solve this? Any help is appreciated.
I was trying to read a DataFrame using Atoti functions.
There are values with different types in that particular column. If you aren't going to preprocess the data and you're fine with that column being read as a string, then you should specify the exact datatypes of each of your columns (or that particular column), either when you load the dataframe with pandas, or when you read the data into a table with the function you're currently using:
import atoti as tt

global_superstore = session.read_pandas(
    df,
    keys=["Row ID"],
    table_name="Global_Superstore",
    types={
        "<invalid_column>": tt.type.STRING
    },
)
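If you would rather fix this on the pandas side before calling read_pandas, you can force the offending column to a single type up front. A minimal sketch, assuming the problematic column is called "Postal Code" (a hypothetical name, substitute the column from your own error message):

import pandas as pd

# Force the mixed-type column to be parsed as strings when loading the CSV
# ("Postal Code" is only a placeholder for the real column name).
df = pd.read_csv("global_superstore.csv", dtype={"Postal Code": str})

# Or cast a column that is already loaded so every value has one type
df["Postal Code"] = df["Postal Code"].astype(str)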
I am importing a file that is semicolon-delimited. My code:
df = pd.read_csv('bank-full.csv', sep = ';')
print(df.shape)
When I use this in Jupyter Notebooks and Spyder I get a shape output of (45211, 1). When I print my dataframe the data looks like this at this point:
<bound method NDFrame.head of age;"job";"marital";"education";"default";"balance";"housing";"loan";"contact";"day";"month";"duration";"campaign";"pdays";"previous";"poutcome";"y"
0 58;"management";"married";"tertiary";"no";2143...
I can get the correct shape by using
df = pd.read_csv('bank-full.csv', sep = '[;]')
print(df.shape)
or
df = pd.read_csv('bank-full.csv', sep = '\;')
print(df.shape)
However, when I do this the data seems to get pulled in as though each row were a single string. The first and last columns pick up leading and trailing double quotation marks respectively, and nothing I try strips them, so either way I am stuck with many columns typed as object and unable to force them into integers when needed. My data comes out like this:
"age ""job"" ""marital"" ""education"" ""default"" \
0 "58 ""management"" ""married"" ""tertiary"" ""no""
with final column:
""y"""
0 ""no"""
I have reached out to classmates and had them send me their .csv file, restarted from scratch, tried a different UI, and even copy/pasted their line of code to read and shape the data, and I still get nothing. I have used every resource except asking here and am out of ideas.
CSVs are usually separated by commas, but sometimes the cells are separated by a different character or characters. Since I don't have access to your exact dataset, I will give you advice that should help you overall.
First, look at the CSV and assess what character(s) are separating each value, then use that as the value in "sep" during your pd.read_csv() call.
Then, whatever columns you want to convert to numeric, you can use pd.to_numeric() to convert the data type. This may present problems if any of the values in the column cannot be converted to numeric, and you will then need to do additional data cleaning.
Below is an example of how to do this to a particular column that I am calling "col":
import pandas as pd

df = pd.read_csv('bank-full.csv', sep='[;]')

# Convert the column (called 'col' here) to a numeric dtype
df['col'] = pd.to_numeric(df['col'])
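If some values cannot be parsed as numbers, pd.to_numeric raises an error by default. One way to handle that during cleaning (a sketch, using the same hypothetical 'col' column) is to coerce unparseable values to NaN and inspect them afterwards:

# Turn unparseable values into NaN instead of raising an error
df['col'] = pd.to_numeric(df['col'], errors='coerce')

# Rows that failed to convert are now NaN and can be cleaned or dropped
print(df[df['col'].isna()])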
Let me know if you have further questions, or better yet, share the data with me if you can't get this to work for you.
Hi, I have looked on Stack Overflow but have not found a solution to my problem. Any help is highly appreciated.
After importing a CSV I noticed that all of the columns have dtype object rather than float.
My goal is to convert all the columns except the YEAR column to float. I have read that you first have to strip blanks from the columns, then convert NaNs to 0, and then convert the strings to floats. But with the code below I'm getting an error.
My code in my Jupyter notebook is:
And I get the following error.
How do I have to change the code?
All the columns but the YEAR column have to be set to float.
If you can help me set the YEAR column to datetime, that would also be very nice. But my main problem is getting the data right so I can start making calculations.
Thanks
Runy
Easiest would be
df = df.astype(float)
df['YEAR'] = df['YEAR'].astype(int)
Also, your code fails because you have two columns with the same name BBPWN, so when you do df['BBPWN'], you will get a dataframe with those two columns. Then, df['BBPWN'].str will fail.
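If the columns contain stray blanks or values that cannot be converted directly (as the question suggests), a slightly more defensive sketch is to strip whitespace and coerce each non-YEAR column with pd.to_numeric; this assumes the duplicate BBPWN column has been renamed first so the column names are unique:

import pandas as pd

# Convert every column except YEAR: strip surrounding blanks and coerce
# anything unparseable to NaN instead of raising an error.
cols = df.columns.difference(['YEAR'])
df[cols] = df[cols].apply(lambda s: pd.to_numeric(s.str.strip(), errors='coerce'))
df['YEAR'] = df['YEAR'].astype(int)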
I have a dataframe like the one shown below.
I know the column names are 'FR', 'ig' and 'te' with the help of the command below.
dataFramesDict['Tri'].columns
What does name = 'level_1' mean here? Moreover, I also don't see subject_ID in the columns or index list. What is subject_ID here?
How do I get the output to look like the one shown below?
I tried the code below to rename 'level_1' to 'subject_ID', but it doesn't work:
dataFramesDict['Tri'].index = dataFramesDict['Tri'].index.rename('subject_ID')
Please note that this is just sample data. I am only interested in changing the first column name and dropping that 'level_1'; it has nothing to do with the data itself.
I am unable to create a dataframe of this type through sample code. The dataframe shown above is the result of other, more complex code, so I have provided a screenshot of it.
Try this:
df.columns.name = ''
df.reset_index(inplace=True)
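A small sketch of what those two lines do, using a made-up stand-in for dataFramesDict['Tri'] (the real dataframe comes from a screenshot, so the structure below is only an assumption): the index is named 'subject_ID' and the columns axis carries the stray 'level_1' label.

import pandas as pd

# Illustrative stand-in: index named 'subject_ID', columns labelled 'level_1'
df = pd.DataFrame(
    {'FR': [1, 2], 'ig': [3, 4], 'te': [5, 6]},
    index=pd.Index(['S1', 'S2'], name='subject_ID'),
)
df.columns.name = 'level_1'

df.columns.name = ''          # remove the 'level_1' label on the columns axis
df.reset_index(inplace=True)  # turn the index into a regular 'subject_ID' column
print(df)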
Using IPython for interactive manipulation, the autocomplete feature helps expand column names quickly.
But given the column object, I'd like to get its name, and I haven't found a simple way to do it. Is there one?
I'm trying to avoid typing the full "ALongVariableName"
x = "ALongVariableName"
relevantColumn = df[x]
Instead I type "df.AL<Tab>" to get my series. So I have:
relevantColumn = df.ALongVariableName #now how can I get x?
But that series object doesn't carry its name or index in the dataframe. Did I miss it?
Thanks!
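For what it's worth, a Series selected from a DataFrame does keep the originating column label in its .name attribute, so something along these lines should recover x:

relevantColumn = df.ALongVariableName

# The Series remembers which column it came from
x = relevantColumn.name   # 'ALongVariableName'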