Opening csv file in jupyter notebook - python

I tried to open a csv file in jupyter notebook, but it shows error message. And I didn't understand the error message. CSV file and jupyter notebook file is in the same directory. plz check the screenshot to see the error message
jupyter notebook code
csv file and jupyter notebook file is in same directory

As others have written it's a bit difficult to understand what exactly is your problem.
But why don't you try something like:
with open("file.csv", "r") as table:
for row in table:
print(row)
# do something
Or:
import pandas as pd
df = pd.read_csv("file.csv", sep=",")
# shows top 10 rows
df.head(10)
# do something

You can use the in-built csv package
import csv
with open('my_file.csv') as csv_file:
csv_reader = csv.reader(csv_file, delimiter=',')
for row in csv_reader:
print(row)
This will print each row as an array of items representing each cell.
However, using Jupyter notebook you should use Pandas to nicely display the csv as a table.
import pandas as pd
df = pd.read_csv("test.csv")
# Displays top 5 rows
df.head(5)
# Displays whole table
df
Resources
The csv module implements classes to read and write tabular data in CSV format. It allows programmers to say, “write this data in the format preferred by Excel,” or “read data from this file which was generated by Excel,” without knowing the precise details of the CSV format used by Excel.
Read More CSV: https://docs.python.org/3/library/csv.html
pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
Read More Pandas: https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html

Use pandas for csv reading.
import pandas as pd
df=pd.read_csv("AppleStore.csv")
You can used head/tail function to see the values. Use dtypes to see the types of all the values. You can check the documentation.

Related

Reading XLSB (binary) file with Pandas read_excel using pyxlsb reads empty rows for some xlsb file

I'm trying to read binary Excel files using read_excel method in pandas with pyxlsb engine as below:
import pandas as pd
df = pd.read_excel('test.xlsb', engine='pyxlsb')
If the xlsb file is like this file (Right now, I'm sharing this file via WeTransfer, but if there is a better way to share files on StackOverflow, let me know), the returned dataframe is filled with NaN's. I suspected that it might be because the file was saved with active cell pointing at the empty cells after the data originally. So I tried this:
import pandas as pd
with open('test.xlsb', 'rb') as data:
data.seek(0,0)
df = pd.read_excel(data, engine='pyxlsb')
but it still doesn't seem to work. I also tried reading the data from byte number 0 (from the beginning), writing it into a new file, 'test_1.xlsb', and finally reading it with pandas, but that doesn't work.
with open('test.xlsb','rb') as data:
data.seek(0,0)
with open('test_1.xlsb','wb') as outfile:
outfile.write(data.read())
df = pd.read_excel('test_1.xlsb', engine='pyxlsb')
If anyone has suggestion as to what might be going on and how to resolve it, I'd greatly appreciate the help.

trying to import a excel csv (?!) file with panda

I am new to Python/Panda and I am trying to import the following file in Jupyter notebook via pd.read_
Initial file lines:
either pd.read_excel or pd.read_csv returned an error.
eliminating the first row allowed me to read the file but all csv data were not separated.
could you share the line of code you have used so far to import the data?
Maybe try this one here:
data = pd.read_csv(filename, delimiter=',')
It is always easier for people to help you if you share the relevant code accompanied by the error you are getting.

how can I quickly convert in python an xlsx file into a csv file?

I have a 140MB Excel file I need to analyze using pandas. The problem is that if I open this file as xlsx it takes python 5 minutes simply to read it. I tried to manually save this file as csv and then it takes Python about a second to open and read it! There are different 2012-2014 solutions that why Python 3 don't really work on my end.
Can somebody suggest how to convert very quickly file 'C:\master_file.xlsx' to 'C:\master_file.csv'?
There is a project aiming to be very pythonic on dealing with data called "rows". It relies on "openpyxl" for xlsx, though. I don't know if this will be faster than Pandas, but anyway:
$ pip install rows openpyxl
And:
import rows
data = rows.import_from_xlsx("my_file.xlsx")
rows.export_to_csv(data, open("my_file.csv", "wb"))
I faced the same problem as you. Pandas and openpyxl didn't work for me.
I came across with this solution and that worked great for me:
import win32com.client
xl=win32com.client.Dispatch("Excel.Application")
xl.DisplayAlerts = False
xl.Workbooks.Open(Filename=your_file_path,ReadOnly=1)
wb = xl.Workbooks(1)
wb.SaveAs(Filename='new_file.csv', FileFormat='6') #6 means csv
wb.Close(False)
xl.Application.Quit()
wb=None
xl=None
Here you convert the file to csv by means of Excel. All the other ways that I tried refuse to work.
Use read-only mode in openpyxl. Something like the following should work.
import csv
import openpyxl
wb = load_workbook("myfile.xlsx", read_only=True)
ws = wb['sheetname']
with open("myfile.csv", "wb") as out:
writer = csv.writer(out)
for row in ws:
values = (cell.value for cell in row)
writer.writerow(values)
Fastest way that pops to mind:
pandas.read_excel
pandas.DataFrame.to_csv
As an added benefit, you'll be able to do cleanup of the data before saving it to csv.
import pandas as pd
df = pd.read_excel('C:\master_file.xlsx', header=0) #, sheetname='<your sheet>'
df.to_csv('C:\master_file.csv', index=False, quotechar="'")
At some point, dealing with lots of data will take lots of time. Just a fact of life. Good to look for options if it's a problem, though.

Read data of a CSV file to create a new CSV file

I have some data on a CSV file. As you can see in the code, I can read the file and print the info I need. The problem is when I try to create a new CSV file with some info of Original CSV file. I would like to save my analyzed info in a new CSV. I don't know how to use the original info to make a new file.
Data.csv
enter image description here
import csv
with open('Data.csv') as csvfile:
readCSV = csv.reader(csvfile, delimiter=',')
for row in readCSV:
analyzed = (row[0],row[3],row[3]<0.25)
print(analyzed)
You probably want to use pandas when it comes to CSV files or table-like data:
import pandas as pd
df_data = pd.DataFrame.from_csv('Data.csv')
# Analyze
for index, row in df_data.iterrows():
pass
df_data.to_csv('new_Data.csv')
For reading you have several options like
pandas.DataFrame.from_csv
pandas.read_csv
pandas.read_table
and, as you see, use
pandas.DataFrame.to_csv
to save your transformed or newly created DataFrame.
For installation run
pip install pandas

xlsxwriter - how to create from csv file

I'm going to poll controllers in our data centers and output all of them to a csv file. The python tool, xlsxwriter, looks to be the best for it. However, I don't see any mention of how to simply take a csv file and convert it to xlsx.
Xlsxwriter seems to be great for making an xlsx file based on the python script it's in, but I don't know how to gather that data from a csv file.
If you don't mind an answer with another package dependency, I highly recommend pandas for I/O operations like this. It's hard to beat in terms of both code economy and performance. Also, if you need to do any manipulations (filtering, sorting, etc.) on the data before writing to xslx, it's already in a handy dataframe.
You could do something like:
import pandas as pd
import xlsxwriter
path = 'some/path/'
#read the csv into a pandas dataframe
data = pd.read_csv(path + 'input.csv')
#setup the writer
writer = pd.ExcelWriter(path + 'output.xlsx', engine='xlsxwriter')
#write the dataframe to an xlsx file
data.to_excel(writer, sheet_name='mysheet', index=False)
writer.save()

Categories