I have a dataframe where I want to use the groupby function on the Region column. The grouping itself works fine.
I am doing
import pandas as pd
#df=pd.read_csv(r'C:\Users\mobeen\Downloads\pminus.csv')
df=pd.read_csv(r'C:\Users\final.csv')
print(df)
df1=[v for k, v in df.groupby('region')]
df1
df1.to_csv('filename2',na_rep='Nan',index=False)
but when I then try to write the output to a CSV file, it throws the following error:
AttributeError: 'list' object has no attribute 'to_csv'
How can I write it to CSV?
I already checked this but it is not working.
Your comprehension turns the groupby result into a list of sub-DataFrames, not a single DataFrame. You can loop over the list and export each one to CSV in append mode:
dfs = [v for k, v in df.groupby('region')]
for df in dfs:
    df.to_csv('filename2', mode='a', na_rep='Nan', index=False)
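If you want each region in its own file instead of one appended file, the group key yielded by groupby can name the output. A minimal sketch with made-up sample data (the region values, column names, and output directory are assumptions, not from the question):

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for final.csv
df = pd.DataFrame({
    "region": ["east", "east", "west"],
    "sales": [10, 20, 30],
})

out_dir = tempfile.mkdtemp()
for key, group in df.groupby("region"):
    # One file per group, named after the group key
    group.to_csv(os.path.join(out_dir, f"{key}.csv"), na_rep="Nan", index=False)

print(sorted(os.listdir(out_dir)))  # ['east.csv', 'west.csv']
```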
You are calling to_csv from the module (as in pd.to_csv(df1, 'filename2', na_rep='Nan', index=False)). Call df1.to_csv('filename2', na_rep='Nan', index=False) instead, as to_csv is a method of the DataFrame itself.
Although your code does not show the call to to_csv(), the error suggests that you're calling the function directly on the pandas module, i.e. pd.to_csv(). You should call the function on the dataframe instead, like so: df.to_csv(filename).
Related
I have a dataframe that I have generated myself like:
I want to save it using pickle (the only way I know to do that):
df.to_pickle(file_name)
To use it then with:
df = pd.read_pickle(file_name)
The problem is that doing this produces a file that the other program doesn't know how to read:
df = pd.read_pickle("dataframe.pkl")
print(df)
I'm getting an AttributeError:
AttributeError: 'DataFrame' object has no attribute '_data'
Thanks for your help.
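For what it's worth, the round trip itself works fine within a single pandas installation; this particular error usually appears when the pickle was written by a different pandas version than the one reading it. A minimal sketch of the round trip (the file name is illustrative):

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})

path = os.path.join(tempfile.mkdtemp(), "dataframe.pkl")
df.to_pickle(path)          # serialize with the current pandas
df2 = pd.read_pickle(path)  # reads back fine with the same version

print(df.equals(df2))  # True
```

If two programs running different pandas versions must share the file, a version-neutral format such as CSV (to_csv/read_csv) is a safer interchange format than pickle.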
For a current project, I am planning to clean a pandas DataFrame of its null values. For this purpose, I want to use pandas.DataFrame.fillna, which is apparently a solid solution for data cleanups.
When running the code below, however, I receive the following error: AttributeError: module 'pandas' has no attribute 'df'. I tried several ways to rewrite the line df = pd.df().fillna, none of which changed the outcome.
Is there any smart tweak to get this running?
import string
import json
import pandas as pd
# Loading and normalising the input file
file = open("sp500.json", "r")
data = json.load(file)
df = pd.json_normalize(data)
df = pd.df().fillna
When you load the file into pandas, pd.json_normalize(data) already returns a DataFrame instance, so df is the DataFrame you want; there is no pd.df. Call fillna on df directly (note that fillna needs a fill value or method):
df = pd.json_normalize(data)
df = df.fillna(0)  # fillna requires a value (or a method) to fill with
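As a quick illustration of fillna on a DataFrame with missing values (toy data, not the sp500.json structure):

```python
import numpy as np
import pandas as pd

# Toy DataFrame with a couple of NaN holes
df = pd.DataFrame({"price": [1.0, np.nan, 3.0], "volume": [np.nan, 5.0, 6.0]})

cleaned = df.fillna(0)  # replace every NaN with 0

print(int(cleaned.isna().sum().sum()))  # 0
```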
I'm trying to take a dictionary object in python, write it out to a csv file, and then read it back in from that csv file.
But it's not working. When I try to read it back in, it gives me the following error:
EmptyDataError: No columns to parse from file
I don't understand this for two reasons. Firstly, if I used pandas' very own to_csv method, it should
be giving me the correct format for a csv. Secondly, when I print the header values of the dataframe I'm trying to save (by doing print(df.columns.values)), it says I do in fact have headers ("one" and "two"). So if the object I was writing out had column names, I don't know why they aren't found when I try to read it back.
import pandas as pd
testing = {"one":1,"two":2 }
df = pd.DataFrame(testing, index=[0])
file = open('testing.csv','w')
df.to_csv(file)
new_df = pd.read_csv("testing.csv")
What am I doing wrong?
Thanks in advance for the help!
pandas.DataFrame.to_csv accepts a path directly, and if you pass an open file handle yourself you also have to close it: here the handle is never closed, so the buffered data may not have reached the file by the time read_csv runs. Just remove the open() call and pass the path, and pass index=False to skip the index column.
import pandas as pd
testing = {"one":1,"two":2 }
df = pd.DataFrame(testing, index=[0])
df.to_csv('testing.csv', index=False)
new_df = pd.read_csv("testing.csv")
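To see that buffering is the culprit, closing the handle before reading also fixes the original code. A sketch (using a temporary path instead of the question's testing.csv):

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"one": [1], "two": [2]})
path = os.path.join(tempfile.mkdtemp(), "testing.csv")

f = open(path, "w")
df.to_csv(f, index=False)
f.close()  # flushes the buffer; without this, read_csv may see an empty file

new_df = pd.read_csv(path)
print(list(new_df.columns))  # ['one', 'two']
```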
I have a single, large file. It has 40,955,924 lines and is >13GB. I need to be able to separate this file out into individual files based on a single field; if I were using a pd.DataFrame I would use this:
for k, v in df.groupby(['id']):
    v.to_csv(k, sep='\t', header=True, index=False)
However, I get the error KeyError: 'Column not found: 0' there is a solution to this specific error on Iterate over GroupBy object in dask, but this requires using pandas to store a copy of the dataframe, which I cannot do. Any help on splitting this file up would be greatly appreciated.
You want to use apply() for this:
def do_to_csv(df):
    df.to_csv(df.name, sep='\t', header=True, index=False)
    return df

df.groupby(['id']).apply(do_to_csv, meta=df._meta).size.compute()
Note:
- the group key is stored in the dataframe name
- we return the dataframe and supply a meta; this is not strictly necessary, but you will need to compute on something and it's convenient to know exactly what that thing is
- the final output will be the number of rows written.
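The same df.name trick works in plain pandas too: inside groupby().apply(), each group chunk carries its group key as .name. A toy sketch (the column names and values are made up):

```python
import pandas as pd

df = pd.DataFrame({"id": ["a", "a", "b"], "val": [1, 2, 3]})

# Each group passed to apply has .name set to its group key,
# so returning it yields one key per group
names = df.groupby("id").apply(lambda g: g.name)

print(list(names))  # ['a', 'b']
```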
I'm trying to write a dataframe, read from an Excel file with 4 sheets of 3 columns and 50 rows each, to a csv using pandas. I'm getting the following error: AttributeError: 'dict' object has no attribute 'to_csv'. I believe I'm writing the syntax correctly, but could anyone point out where my syntax is incorrect in trying to write a dataframe to a csv?
'dict' object has no attribute 'to_csv'
import pandas as pd
import numpy as np
df = pd.read_excel("filelocation.xlsx",
                   sheetname=['pnl1 Data ','pnl2 Data','pnl3 Data','pnl4 Data'],
                   skiprows=8, parse_cols="B:D", keep_default_na='FALSE', na_values=['NULL'])
df.to_csv('filelocation.csv', line_terminator=',', index=False, header=False) #error occurs on this line
Your intuition is right; there's nothing wrong with the syntax in your code.
You are receiving the AttributeError because you are reading data from multiple sheets within your workbook, which generates a dictionary of DataFrames (instead of a single DataFrame), on which you then attempt to call to_csv (a method available only on a DataFrame).
As your code is written, the keys of the dictionary you generate correspond to the names of the worksheets, and the values are the respective DataFrames. It's all explained in the docs for the read_excel() method.
To write a csv file containing the aggregate data from all the worksheets, you could loop through the worksheets and append each DataFrame to your file (this works if your sheets have the same structure and dimensions):
import pandas as pd
import numpy as np

sheets = ['pnl1 Data ','pnl2 Data','pnl3 Data','pnl4 Data']
for sheet in sheets:
    df = pd.read_excel("filelocation.xlsx",
                       sheetname=sheet,
                       skiprows=8,
                       parse_cols="B:D",
                       keep_default_na='FALSE',
                       na_values=['NULL'])
    with open('filelocation.csv', 'a') as f:
        df.to_csv(f, line_terminator=',', index=False, header=False)
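An alternative to appending inside the loop: since read_excel with a list of sheet names returns a dict of DataFrames, pd.concat can merge them in one step and you write the CSV once. A sketch simulating that dict in memory (the sheet names and data are illustrative, not read from the workbook):

```python
import pandas as pd

# Stand-in for what read_excel returns when given a list of sheet names:
# a dict mapping sheet name -> DataFrame
sheets = {
    "pnl1 Data": pd.DataFrame({"x": [1, 2]}),
    "pnl2 Data": pd.DataFrame({"x": [3, 4]}),
}

combined = pd.concat(sheets.values(), ignore_index=True)
print(len(combined))  # 4
# combined.to_csv('filelocation.csv', index=False)  # then write once
```

This assumes the sheets share the same columns; if they don't, concat will union the columns and fill the gaps with NaN.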