I have my code here:
import pandas as pd
df = pd.DataFrame({'key': 1}, index=[0])
df.to_parquet(r"C:\Users\name\Desktop\file.parquet", engine='pyarrow', index=False)
df = pd.read_parquet(r"C:\Users\name\Desktop\file.parquet", engine='fastparquet')
Whenever I run it, I get the same output:
key
0 1
It's not showing the contents of my parquet file. I tried pd.DataFrame.reset_index(), I tried deleting the dataframe and creating a new one, I tried df.reset_index(), I tried everything and the output is always the same. How do I reset what I did? I think the issue arose when I did this:
df = pd.DataFrame({'key': 1}, index=[0])
and now it seems like I can't go back. I even deleted my file and created a new one.
I am using drop() to try to remove a row from a dataframe, based on an index. Nothing seems to happen. I get lots of errors if I play with the syntax, but the below example yields the same dataframe I started with.
data= {"col1":[1, 3, 3,3],"col2":[4,5,6,4],"col3":[7,6,6,8]}
testdf2 = pd.DataFrame(data)
testdf2.drop([1])
testdf2
I assume I'm missing something obvious?
drop returns a new DataFrame and leaves the original unchanged, so you must assign the result back to your df (or to a new variable).
import pandas as pd
data= {"col1":[1, 3, 3,3],"col2":[4,5,6,4],"col3":[7,6,6,8]}
testdf2 = pd.DataFrame(data)
testdf2 = testdf2.drop([1])
testdf2
Alternatively, supply inplace.
testdf2.drop([1], inplace=True)
However, inplace=True can lead to complications around views versus copies, so I usually reassign.
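To make the difference visible, here is a minimal sketch using the same data as the question. The names dropped, rows_before, rows_after and remaining_labels are only there for illustration.

```python
import pandas as pd

data = {"col1": [1, 3, 3, 3], "col2": [4, 5, 6, 4], "col3": [7, 6, 6, 8]}
testdf2 = pd.DataFrame(data)

# drop() returns a new DataFrame; the original is left untouched.
dropped = testdf2.drop([1])
rows_before = len(testdf2)   # the original still has every row
rows_after = len(dropped)    # the returned frame has one row fewer

# Reassigning keeps the result under the original name.
testdf2 = testdf2.drop([1])
remaining_labels = list(testdf2.index)
```

Note that drop([1]) removes the row whose index label is 1, not the row at position 1. With a default integer index the two happen to coincide.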
Hope that helps!
I am working with a dataset. As a precautionary measure, I created a backup copy using the following command.
Original dataframe: df
df_copy = df.copy(deep = True)
Then I dropped a few columns from the original dataframe (df) by mistake using inplace = True.
I tried to undo the operation, but to no avail.
So, the question is: how do I get my original dataframe (df) back from the copied dataframe (df_copy)?
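You cannot restore the original object itself. Code like the following only rebinds the name df to a new copy of the backup:
df = df_copy.copy(deep = True)
Every other variable that referenced the original df still references the old, modified object after the assignment above.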
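A small sketch of that behaviour, assuming a toy frame with columns a, b and c and a second variable named alias (both hypothetical, not from the question):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
df_copy = df.copy(deep=True)   # backup taken before the mistake
alias = df                     # hypothetical second reference to the same object

df.drop(columns=["b", "c"], inplace=True)  # the accidental in-place drop

# Rebinding the name: df now points at a fresh copy of the backup.
df = df_copy.copy(deep=True)

restored_columns = list(df.columns)   # the name df sees all columns again
alias_columns = list(alias.columns)   # the old object is still modified
```

So if df is the only name you use, reassigning from the backup is enough in practice. It only falls short when other variables still hold the old object.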
I created a dictionary to pass into df.rename() in order to change some of the column names into something readable. However, only some of the column names change while others stay the same.
I'm learning from a site that has Jupyter notebook integrated into it and also following along using Jupyter notebook on another laptop as a fallback.
Everything is fine on the site's Jupyter notebook but something is wrong on my laptop's. I've checked for spelling errors and spacing but everything seems fine.
df = pd.DataFrame({'Record ID': [1,2,3], 'Gender. What is your Gender?': ['M', 'F', 'M']})
dictionary = {'Record ID': 'id', 'Gender. What is your Gender?': 'gender'}
updatedDF = df.rename(dictionary, axis=1)
So for example, Record ID changes successfully to id but Gender. What is your Gender? stays the same. It should change to gender.
Well, I'm led to believe this is actually a Unicode problem, and I'm not exactly sure how to fix that directly.
I came across a workaround that bypasses it by taking the column name from df.columns instead of typing it out.
df = pd.DataFrame({'Record ID': [1,2,3], 'Gender. What is your Gender.': ['M', 'F', 'M']})
dictionary = {'Record ID': 'id', df.columns[-1]: 'gender'}
updatedDF = df.rename(dictionary, axis=1)
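If the cause really is a hidden Unicode character, one way to check and clean it up might look like the sketch below. The non-breaking space in the header is an assumption made for the demonstration; the real file may contain a different character, so inspect the repr() output first.

```python
import unicodedata
import pandas as pd

# Hypothetical data: the second header contains a non-breaking space (\u00a0)
# that looks identical to a normal space on screen.
df = pd.DataFrame({"Record ID": [1, 2, 3],
                   "Gender. What is your\u00a0Gender?": ["M", "F", "M"]})

# repr() exposes hidden characters that print() would hide.
raw_names = [repr(name) for name in df.columns]

# Normalize compatibility characters and trim surrounding whitespace.
df.columns = [unicodedata.normalize("NFKC", name).strip() for name in df.columns]

dictionary = {"Record ID": "id", "Gender. What is your Gender?": "gender"}
updatedDF = df.rename(dictionary, axis=1)
new_names = list(updatedDF.columns)
```

NFKC normalization maps the non-breaking space to an ordinary space, after which the plain-text key in the dictionary matches the column name.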
I have attached a screenshot of my Excel sheet. I want to store the length of every string in a SUPPLIER_ID LENGTH column, but when I run my code, the CSV columns are blank.
When I use the same code on a different CSV, it works well.
I am using the following code but am not able to print the data.
I have attached a snippet of the CSV. Can somebody tell me why this is happening?
import pandas as pd
data = pd.read_csv(r'C:/Users/patesari/Desktop/python work/nba.csv')
df = pd.DataFrame(data, columns= ['SUPPLIER_ID','ACTION'])
data.dropna(inplace = True)
data['SUPPLIER_ID']= data['SUPPLIER_ID'].astype(str)
data['SUPPLIER_ID LENGTH']= data['SUPPLIER_ID'].str.len()
data['SUPPLIER_ID']= data['SUPPLIER_ID'].astype(float)
data
print(df)
data.to_csv("C:/Users/patesari/Desktop/python work/nba.csv")
I faced a similar problem in the past.
Instead of:
df = pd.DataFrame(data, columns= ['SUPPLIER_ID','ACTION'])
Type this:
data.columns=['SUPPLIER_ID','ACTION']
Also, I don't understand why you created the DataFrame df. It was unnecessary in my opinion.
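One possible reason for the blank columns: when pd.DataFrame is given an existing DataFrame plus a columns list, it selects those columns by name, and any name that does not match exactly comes back as all NaN. The sketch below assumes a header with a trailing space, which is only a guess at what the real CSV contains.

```python
import io
import pandas as pd

# Hypothetical CSV whose first header has a trailing space: "SUPPLIER_ID ".
csv_text = "SUPPLIER_ID ,ACTION\n1001,buy\n20,sell\n"
data = pd.read_csv(io.StringIO(csv_text))

# Selecting by a name that is not in the frame yields an all-NaN column.
df = pd.DataFrame(data, columns=["SUPPLIER_ID", "ACTION"])
all_blank = bool(df["SUPPLIER_ID"].isna().all())

# Stripping the header names first makes the selection match.
data.columns = data.columns.str.strip()
data["SUPPLIER_ID LENGTH"] = data["SUPPLIER_ID"].astype(str).str.len()
lengths = data["SUPPLIER_ID LENGTH"].tolist()
```

Printing list(data.columns) right after read_csv is a quick way to see whether the names in the file match the ones used in the code.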
Aren't you getting a SettingWithCopyWarning from pandas? I would imagine (I haven't run this code) that these lines
data['SUPPLIER_ID']= data['SUPPLIER_ID'].astype(str)
data['SUPPLIER_ID LENGTH']= data['SUPPLIER_ID'].str.len()
data['SUPPLIER_ID']= data['SUPPLIER_ID'].astype(float)
would not do anything, and should be replaced with
data.loc[:, 'SUPPLIER_ID']= data['SUPPLIER_ID'].astype(str)
data.loc[:, 'SUPPLIER_ID LENGTH']= data['SUPPLIER_ID'].str.len()
data.loc[:, 'SUPPLIER_ID']= data['SUPPLIER_ID'].astype(float)
I'm trying to drop the last row in a dataframe created by pandas in python and seem to be having trouble.
import numpy as np
import pandas as pd
index = pd.date_range('1/1/2000', periods=8)
df = pd.DataFrame(np.random.randn(8, 3), index=index, columns=['A', 'B', 'C'])
I tried the drop method like this:
df.drop([shape(df)[0]-1], axis = 0)
but it keeps saying label not contained in the axis.
I also tried to drop by index name and it still doesn't seem to be working.
Any advice would be appreciated. Thanks!!!
df.iloc[:-1]
returns a new DataFrame without the last row. (In older pandas this was written df.ix[:-1], but .ix was removed in pandas 1.0.)
Referencing the DataFrame directly to retrieve all but the last index worked for me.
df[:-1]
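For reference, a minimal sketch of two ways to do this with a current pandas version, assuming the same date-indexed frame as in the question. The names without_last and also_without_last are illustrative.

```python
import numpy as np
import pandas as pd

index = pd.date_range("1/1/2000", periods=8)
df = pd.DataFrame(np.random.randn(8, 3), index=index, columns=["A", "B", "C"])

# Positional slicing: everything except the last row.
without_last = df.iloc[:-1]

# drop() needs an index label, so pass the last label rather than a position.
also_without_last = df.drop(df.index[-1])
```

The original attempt failed because drop() looks up labels, and the integer 7 is not one of the date labels in this index.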