I am not sure if I am making this question correctly but here's my issue:
I have a .csv file (InjectionWells.csv) that I need to split into columns based on commas. When I do it, it just doesn't work and I can only think might be an encoding but I don't know how to fix it. Can someone shed a light?
Here are few lines of the actual file:
API#,Operator,Operator ID,WellType,WellName,WellNumber,OrderNumbers,Approval Date,County,Sec,Twp,Rng,QQQQ,LAT,LONG,PSI,BBLS,ZONE,,,
3500300026,PHOENIX PETROCORP INC,19499,2R,SE EUREKA UNIT-TUCKER #1,21,133856,9/6/1977,ALFALFA,13,28N,10W,C-SE SE,36.9003240,-98.2182600,"2,500",300,CHEROKEE,,,
3500300163,CHAMPLIN EXPLORATION INC,4030,2R,CHRISTENSEN,1,470258,11/27/2002,ALFALFA,21,28N,09W,C-NW NW,36.8966360,-98.1777200,"2,400","1,000",RED FORK,,,
3500320786,LINN OPERATING INC,22182,2R,NE CHEROKEE UNIT,85,329426,8/19/1988,ALFALFA,24,27N,11W,SE NE,36.8061130,-98.3258400,"1,050","1,000",RED FORK,,,
3500321074,SANDRIDGE EXPLORATION & PRODUCTION LLC,22281,2R,VELMA,2-19,281652,7/11/1985,ALFALFA,19,28N,10W,SW NE NE SW,36.8885890,-98.3185300,"3,152","1,000",RED FORK,,,
I have tried both of these and non of them work:
1.
import pandas as pd
df=pd.read_csv('InjectionWells.csv', sep=',')
print(df)
import pandas as pd
test_data2=pd.read_csv('InjectionWells.csv', sep=',', encoding='utf-8')
test_data2.head()
As your CSV files contain some non-ASCII characters also, you need to pass a different encoding. UTF-8 can't handle that.
I tried this and it's working:
import pandas as pd
test_data2=pd.read_csv('InjectionWells.csv', sep=',', encoding='ISO-8859-1')
print(test_data2)
I'm trying to read binary Excel files using read_excel method in pandas with pyxlsb engine as below:
import pandas as pd
df = pd.read_excel('test.xlsb', engine='pyxlsb')
If the xlsb file is like this file (Right now, I'm sharing this file via WeTransfer, but if there is a better way to share files on StackOverflow, let me know), the returned dataframe is filled with NaN's. I suspected that it might be because the file was saved with active cell pointing at the empty cells after the data originally. So I tried this:
import pandas as pd
with open('test.xlsb', 'rb') as data:
data.seek(0,0)
df = pd.read_excel(data, engine='pyxlsb')
but it still doesn't seem to work. I also tried reading the data from byte number 0 (from the beginning), writing it into a new file, 'test_1.xlsb', and finally reading it with pandas, but that doesn't work.
with open('test.xlsb','rb') as data:
data.seek(0,0)
with open('test_1.xlsb','wb') as outfile:
outfile.write(data.read())
df = pd.read_excel('test_1.xlsb', engine='pyxlsb')
If anyone has suggestion as to what might be going on and how to resolve it, I'd greatly appreciate the help.
I tried to open a csv file in jupyter notebook, but it shows error message. And I didn't understand the error message. CSV file and jupyter notebook file is in the same directory. plz check the screenshot to see the error message
jupyter notebook code
csv file and jupyter notebook file is in same directory
As others have written it's a bit difficult to understand what exactly is your problem.
But why don't you try something like:
with open("file.csv", "r") as table:
for row in table:
print(row)
# do something
Or:
import pandas as pd
df = pd.read_csv("file.csv", sep=",")
# shows top 10 rows
df.head(10)
# do something
You can use the in-built csv package
import csv
with open('my_file.csv') as csv_file:
csv_reader = csv.reader(csv_file, delimiter=',')
for row in csv_reader:
print(row)
This will print each row as an array of items representing each cell.
However, using Jupyter notebook you should use Pandas to nicely display the csv as a table.
import pandas as pd
df = pd.read_csv("test.csv")
# Displays top 5 rows
df.head(5)
# Displays whole table
df
Resources
The csv module implements classes to read and write tabular data in CSV format. It allows programmers to say, “write this data in the format preferred by Excel,” or “read data from this file which was generated by Excel,” without knowing the precise details of the CSV format used by Excel.
Read More CSV: https://docs.python.org/3/library/csv.html
pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
Read More Pandas: https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html
Use pandas for csv reading.
import pandas as pd
df=pd.read_csv("AppleStore.csv")
You can used head/tail function to see the values. Use dtypes to see the types of all the values. You can check the documentation.
I have a 140MB Excel file I need to analyze using pandas. The problem is that if I open this file as xlsx it takes python 5 minutes simply to read it. I tried to manually save this file as csv and then it takes Python about a second to open and read it! There are different 2012-2014 solutions that why Python 3 don't really work on my end.
Can somebody suggest how to convert very quickly file 'C:\master_file.xlsx' to 'C:\master_file.csv'?
There is a project aiming to be very pythonic on dealing with data called "rows". It relies on "openpyxl" for xlsx, though. I don't know if this will be faster than Pandas, but anyway:
$ pip install rows openpyxl
And:
import rows
data = rows.import_from_xlsx("my_file.xlsx")
rows.export_to_csv(data, open("my_file.csv", "wb"))
I faced the same problem as you. Pandas and openpyxl didn't work for me.
I came across with this solution and that worked great for me:
import win32com.client
xl=win32com.client.Dispatch("Excel.Application")
xl.DisplayAlerts = False
xl.Workbooks.Open(Filename=your_file_path,ReadOnly=1)
wb = xl.Workbooks(1)
wb.SaveAs(Filename='new_file.csv', FileFormat='6') #6 means csv
wb.Close(False)
xl.Application.Quit()
wb=None
xl=None
Here you convert the file to csv by means of Excel. All the other ways that I tried refuse to work.
Use read-only mode in openpyxl. Something like the following should work.
import csv
import openpyxl
wb = load_workbook("myfile.xlsx", read_only=True)
ws = wb['sheetname']
with open("myfile.csv", "wb") as out:
writer = csv.writer(out)
for row in ws:
values = (cell.value for cell in row)
writer.writerow(values)
Fastest way that pops to mind:
pandas.read_excel
pandas.DataFrame.to_csv
As an added benefit, you'll be able to do cleanup of the data before saving it to csv.
import pandas as pd
df = pd.read_excel('C:\master_file.xlsx', header=0) #, sheetname='<your sheet>'
df.to_csv('C:\master_file.csv', index=False, quotechar="'")
At some point, dealing with lots of data will take lots of time. Just a fact of life. Good to look for options if it's a problem, though.
I met a DF file which is encoded in binary format. But when I open it using Vim, still I can see characters like "pandas.core.frame", "numpy.core.multiarray". So I guess it is related with Python. However I know little about the Python language. Though I have tried using pandas and numpy modules, I failed to read the file. Could you guys give any suggestion on this issue? Thank you in advance. Here is the Dropbox link to the DF file: https://www.dropbox.com/s/b22lez3xysvzj7q/flux.df
Looks like DataFrame stored with pickle, use read_pickle() to read it:
import pandas as pd
df = pd.read_pickle('flux.df')