I'm trying to take a dictionary object in python, write it out to a csv file, and then read it back in from that csv file.
But it's not working. When I try to read it back in, it gives me the following error:
EmptyDataError: No columns to parse from file
I don't understand this for two reasons. First, since I used pandas' own to_csv method, it should be producing a correctly formatted CSV. Second, when I print the header values of the dataframe I'm trying to save (with print(df.columns.values)), it shows that I do in fact have headers ("one" and "two"). So if the object I wrote out had column names, I don't know why they wouldn't be found when I read it back.
import pandas as pd
testing = {"one":1,"two":2 }
df = pd.DataFrame(testing, index=[0])
file = open('testing.csv','w')
df.to_csv(file)
new_df = pd.read_csv("testing.csv")
What am I doing wrong?
Thanks in advance for the help!
pandas.DataFrame.to_csv can take a path directly, so there is no need to open the file yourself. In your snippet the handle you opened is never flushed or closed, so read_csv finds an empty file, hence the EmptyDataError. Remove the open() call and pass the path instead, and pass index=False to skip writing the index column.
import pandas as pd
testing = {"one":1,"two":2 }
df = pd.DataFrame(testing, index=[0])
df.to_csv('testing.csv', index=False)
new_df = pd.read_csv("testing.csv")
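If you do want to write through a file handle you opened yourself, a with block makes sure the data is flushed and the file is closed before read_csv opens it again. A minimal sketch of that variant (not required for the fix above):

import pandas as pd

testing = {"one": 1, "two": 2}
df = pd.DataFrame(testing, index=[0])

# The with block closes (and flushes) the handle before read_csv runs;
# newline='' avoids extra blank lines when passing a handle to to_csv.
with open('testing.csv', 'w', newline='') as f:
    df.to_csv(f, index=False)

new_df = pd.read_csv("testing.csv")
print(new_df)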
Related
Hi, I am trying to create multiple CSV files from a single big CSV using Python. The original CSV file has data for multiple stocks at 1-minute date/time intervals, with Open, High, Low, Close, and Volume as the other columns.
Sample data from original file is here
At first, I tried to copy an individual Ticker and all of its corresponding rows to a new file with the following code:
import pandas as pd
excel_file_path=r'C:\Users\mahan\Documents\test projects\01_07_APR_WEEKLY_expiry_data_VEGE_NF_AND_BNF_Options_Desktop_Vege.csv'
export_path=r"C:\Users\mahan\Documents\exportfiles\{output_file_name}_sheet.csv"
data = pd.read_csv(excel_file_path, index_col="Ticker")  # Making a data frame from the csv file
rows = data.loc[['NIFTYWK17500CE']]  # Retrieving rows by the loc method
output_file_name ="NIFTYWK17500CE_"
print(type(rows))
rows
rows.to_csv(export_path)
Result was something like this:
a file was saved with the name "{output_file_name}__sheet.csv"
I failed at naming the file, but the data for all the rows with Ticker value 'NIFTYWK17500CE' was copied correctly.
Then I tried to create an array of the unique values in the "Ticker" column, created a dataframe from the original file with all the data, and tried to use a for loop over the values in the array, matching each one against the first column 'Ticker' and copying that data to a new file whose name includes the value.
Code as below:
import pandas as pd
excel_file_path=r'C:\Users\mahan\Documents\test projects\01_07_APR_WEEKLY_expiry_data_VEGE_NF_AND_BNF_Options_Desktop_Vege.csv'
df2 = pd.read_csv(excel_file_path)
df2_uniques = df2['Ticker'].unique()
df2_counts = df2['Ticker'].value_counts()

for value in df2_uniques:
    value = value.replace(' ', '_')
    export_path = r"C:\Users\mahan\Documents\exportfiles\{value}__sheet.csv"
    df = pd.read_csv(excel_file_path, index_col="Ticker")
    rows = df.loc[['value']]
    print(type(rows))
    rows.to_csv(export_path)
Received an error:
KeyError: "None of [Index(['value'], dtype='object', name='Ticker')] are in the [index]"
Where did I go wrong:
In naming the file properly to save, in the earlier code.
In the second code.
Any help is really appreciated. Thanks in advance.
SOLVED
What worked for me was the following (with comments):
import pandas as pd
excel_file_path=r'C:\Users\mahan\Documents\test projects\01_07_APR_WEEKLY_expiry_data_VEGE_NF_AND_BNF_Options_Desktop_Vege.csv'
df2 = pd.read_csv(excel_file_path)
df2_uniques = df2['Ticker'].unique()

for value in df2_uniques:
    value = value.replace(' ', '_')
    df = pd.read_csv(excel_file_path, index_col="Ticker")
    rows = df.loc[[value]]  # Changed from 'value' to value
    print(type(rows))
    rows.to_csv(r'_' + value + '.csv')
    # Removed export_path, as the filename and file path together were giving me a hard time to figure out.
    # The files get saved in the same path as the original imported file, so that'll do. Sharing just for reference.
Final output looks like this:
I can't know for sure without seeing the dataframe, but the error indicates that there is no column name 'Ticker'. It appears that you set this column to be the index, so you can try df2_uniques = set(df2.index).
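For reference, here is a minimal sketch of that suggestion, assuming the CSV is loaded with index_col="Ticker" as in the question (the path is a placeholder):

import pandas as pd

excel_file_path = r'C:\path\to\your_data.csv'  # placeholder; use the same path as in the question
df2 = pd.read_csv(excel_file_path, index_col="Ticker")

# With "Ticker" as the index, the unique tickers come from the index;
# df2['Ticker'] would raise a KeyError here.
df2_uniques = set(df2.index)
print(df2_uniques)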
Changed
rows=df.loc[['value']]
to
rows=df.loc[[value]]
Also, removed export_path, as having the filename and file path together was giving me a hard time to figure out.
The files get saved in the same path as the original imported file, so that'll do. Sharing just for reference.
Final code that worked looked like this:
import pandas as pd
excel_file_path=r'C:\Users\mahan\Documents\test projects\01_07_APR_WEEKLY_expiry_data_VEGE_NF_AND_BNF_Options_Desktop_Vege.csv'
df2 = pd.read_csv(excel_file_path)
df2_uniques = df2['Ticker'].unique()

for value in df2_uniques:
    value = value.replace(' ', '_')
    df = pd.read_csv(excel_file_path, index_col="Ticker")
    rows = df.loc[[value]]  # Changed from 'value' to value
    print(type(rows))
    rows.to_csv(r'_' + value + '.csv')
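As an alternative (not what this answer used), a groupby-based sketch reads the source CSV only once and still writes one file per Ticker; the paths below are placeholders:

import os
import pandas as pd

source_csv = r'C:\path\to\source.csv'    # placeholder
out_dir = r'C:\path\to\exportfiles'      # placeholder
os.makedirs(out_dir, exist_ok=True)

df = pd.read_csv(source_csv)

# groupby("Ticker") yields one sub-frame per unique ticker,
# so the CSV is not re-read on every loop iteration.
for ticker, group in df.groupby("Ticker"):
    safe_name = str(ticker).replace(' ', '_')
    group.to_csv(os.path.join(out_dir, safe_name + '_sheet.csv'), index=False)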
I have an excel file that contains the names of 60 datasets.
I'm trying to write a piece of code that "enters" the Excel file, accesses a specific dataset (whose name is in the Excel file), gathers and analyses some data, and finally creates a new column in the Excel file and inserts the information gathered beforehand.
I can do most of it, except for the part of adding a new column and entering the data.
I was trying to do something like this:
path_data = **the path to the excel file**
recap = pd.read_excel(os.path.join(path_data,'My_Excel.xlsx')) # where I access the Excel file
recap['New information Column'] = Some Value
Is this a correct way of doing this? If not, can someone suggest a better way (one that actually works)?
Thank you a lot!
You can import the excel file into python using pandas.
import pandas as pd
df = pd.read_excel(r'Path\Filename.xlsx')
print(df)
If you have many sheets, then you could do this:
import pandas as pd
df = pd.read_excel(r'Path\Filename.xlsx', sheet_name='sheetname')
print(df)
To add a new column you could do the following:
df['name of the new column'] = 'things to add'
Then when you're ready, you can export it as xlsx:
import openpyxl
# to excel
df.to_excel(r'Path\filename.xlsx')
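Putting it together for the original question, a rough sketch of the read-modify-write round trip (the path, column name, and value are placeholders; index=False keeps pandas from adding an extra index column):

import os
import pandas as pd

path_data = r'C:\path\to\your\data'              # placeholder
excel_path = os.path.join(path_data, 'My_Excel.xlsx')

recap = pd.read_excel(excel_path)                # read the existing workbook
recap['New information Column'] = 'Some Value'   # placeholder value(s) computed from your analysis

# Overwrites the original file with the new column included.
recap.to_excel(excel_path, index=False)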
I want to display the name of the CSV file that is read by the pandas.read_csv() function. I tried the code below, but I couldn't display the CSV file name.
import pandas as pd
df=pd.read_csv("abc.csv")
print(df.info())
I want to display the "abc". Guide me for my situation. Thanks in advance.
The pandas.read_csv() method accepts a file object (actually any file-like object with a read() method).
And file objects have a name attribute that holds the name of the opened file.
Admittedly, I see the exercise as somewhat pointless since you already know the file name beforehand, but for the sake of completeness, here you go:
import pandas as pd
csv_file = open("your_csv_filename.csv")
print(csv_file.name)
df = pd.read_csv(csv_file)
When you use the pandas read_csv function, you get a dataframe that does not include the file name, so the solution is to store the name of the .csv in a variable and then print it. You can read more in the pandas.DataFrame documentation.
import pandas as pd
name = "abc.csv"
df=pd.read_csv(name)
print(name.split(".")[0])
You can use something like this, since read_csv does not save the file name.
Using glob gives you the ability to use wildcard patterns to read all the CSV files in a folder.
import glob
import pandas as pd

data = {}
for filename in glob.glob("/path/of/the/csv/files/*.csv"):
    data[filename.split("/")[-1].split(".")[0]] = pd.read_csv(filename)

for key, value in data.items():
    print(key)
    print(value.info())
    print("\n\n")
filename.split("/")[-1].split('.')[0]
The above line may look complicated, but it just splits the file path twice: the first split keeps the file name after the last "/", and the second drops the ".csv" extension.
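If the paths might use Windows-style backslashes, splitting on "/" will not work; os.path handles either separator. A small variation on the same idea:

import glob
import os
import pandas as pd

data = {}
for filename in glob.glob("/path/of/the/csv/files/*.csv"):
    # basename drops the directory and splitext drops the ".csv" extension,
    # regardless of which path separator the operating system uses.
    stem = os.path.splitext(os.path.basename(filename))[0]
    data[stem] = pd.read_csv(filename)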
I am trying to parse this CSV data, which has quotes in an unusual pattern in between fields and a semicolon at the end of each row.
I am not able to parse this file correctly using pandas.
Here is the link to the data (the pastebin for some reason was not recognizing it as text/CSV, so it picked up some random formatting; please ignore that):
https://paste.gnome.org/pr1pmw4w2
I have tried using "," as the delimiter, and a plain read_csv call with only the file name as a parameter.
header = ["Organization_Name","Organization_Name_URL","Categories","Headquarters_Location","Description","Estimated_Revenue_Range","Operating_Status","Founded_Date","Founded_Date_Precision","Contact_Email","Phone_Number","Full_Description","Investor_Type","Investment_Stage","Number_of_Investments","Number_of_Portfolio_Organizations","Accelerator_Program_Type","Number_of_Founders_(Alumni)","Number_of_Alumni","Number_of_Funding_Rounds","Funding_Status","Total_Funding_Amount","Total_Funding_Amount_Currency","Total_Funding_Amount_Currency_(in_USD)","Total_Equity_Funding_Amount","Total_Equity_Funding_Amount_Currency","Total_Equity_Funding_Amount_Currency_(in_USD)","Number_of_Lead_Investors","Number_of_Investors","Number_of_Acquisitions","Transaction_Name","Transaction_Name_URL","Acquired_by","Acquired_by_URL","Announced_Date","Announced_Date_Precision","Price","Price_Currency","Price_Currency_(in_USD)","Acquisition_Type","IPO_Status,Number_of_Events","SimilarWeb_-_Monthly_Visits","Number_of_Founders","Founders","Number_of_Employees"]
pd.read_csv("data.csv", sep=",", encoding="utf-8", names=header)
First, you can just read the data in normally; all of it will land in the first column. Then you can use the pyparsing module to split each row on ',' and assign the result back. You just need to do this for all the rows. I hope this solves your query.
import pyparsing as pp
import pandas as pd
df = pd.read_csv('input.csv')
df.loc[0] = pp.commaSeparatedList.parseString(df['Organization Name'][0]).asList()
Output:
df  # (since there are 42 columns, pasting just a snippet)
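To extend the same idea to every row at once, here is a sketch; it assumes all of the raw text landed in the single first column and that the header list from the question matches the number of parsed fields:

import pandas as pd
import pyparsing as pp

raw = pd.read_csv('input.csv')  # everything lands in one column

# Parse each raw line into its comma-separated fields,
# then rebuild the frame with the intended header.
parsed_rows = [pp.commaSeparatedList.parseString(line).asList()
               for line in raw[raw.columns[0]]]
fixed = pd.DataFrame(parsed_rows, columns=header)  # header as defined in the question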
I'm trying to work with data in a google spreadsheet, reading it into a csv and then working with it as a dataframe using pandas.read_csv().
I can get the CSV read out into a variable (the variable "data" below), but cannot then use pandas.read_csv() on the variable. I've tried casting it as a string, using os.getcwd(), etc.
import requests
import pandas as pd

r = requests.get('I put my google sheets url here')
data = r.text
print(data)
#csv is printed out properly
df = pd.read_csv(filepath_or_buffer = data, header = 1, usecols = ["Latitude", "Longitude"])
print(df)
No matter what I try, I always get a FileNotFoundError.
I'm a python newbie, so I'm probably missing something something really obvious. Thank you!
If the first parameter to read_csv is a plain string (as it is in your case), pandas treats it as a file path and tries to open it. Hence the FileNotFoundError.
You need your data in a file-like object. Try using io.StringIO like so:
import io
import requests
import pandas as pd

r = requests.get('I put my google sheets url here')
data = r.text
buffer = io.StringIO(data)
df = pd.read_csv(filepath_or_buffer=buffer, header=1, usecols=["Latitude", "Longitude"])
You can do this with StringIO:
import pandas as pd
import io
import requests

url = "I put my google sheets csv url here"
s = requests.get(url).content
c = pd.read_csv(io.StringIO(s.decode('utf-8')))
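Worth noting: if the link is a direct CSV export URL, read_csv can fetch it by itself, with no requests call needed. The URL below is only a placeholder pattern:

import pandas as pd

# Placeholder; replace <sheet-id> with your sheet's id (export?format=csv returns plain CSV).
url = "https://docs.google.com/spreadsheets/d/<sheet-id>/export?format=csv"
df = pd.read_csv(url, header=1, usecols=["Latitude", "Longitude"])
print(df)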