I am trying to read an xlsb file into a pandas DataFrame.
import pandas as pd
a_data = pd.ExcelFile(
r'C:\\Desktop\\a.xlsb')
df_data = pd.read_excel(a_data, 'Sheet1', engine='pyxlsb')
print(df_data.head())
When I run the script I keep getting this error.
OSError: File contains no valid workbook part
You can use pyxlsb; pandas has supported it as a read_excel engine since version 1.0.0.
Use the following code, passing the file path itself (not the string 'a_data') to read_excel:
import pandas as pd
df = pd.read_excel(r'C:\Desktop\a.xlsb', sheet_name='Sheet1', engine='pyxlsb')
print(df.head())
You will have to install pyxlsb first with the command: pip install pyxlsb
Language: Python 3.8.3
I get this error when I try to read my xlsx file: ModuleNotFoundError: No module named 'xlxswriter'
import xlxswriter
import pandas as pd
from pandas import DataFrame
path = ('mypath.xlxs')
xl = pd.ExcelFile(path)
print(xl.sheet_names)
How can I fix this?
Instead of typing xlxs, type xlsx, both in the import and in the file name, like this:
import xlsxwriter
import pandas as pd
from pandas import DataFrame
path = ('mypath.xlsx')
xl = pd.ExcelFile(path)
print(xl.sheet_names)
It'll work.
The module name is xlsxwriter not xlxswriter, so replace that line with:
import xlsxwriter
I am trying to build a panel dataset by appending, row-wise and under the same column names, the data tables on pp. 149-157 of this PDF file:
https://www.uv.mx/personal/clelanda/files/2013/02/Garber-2000-Famous-first-bubbles.pdf
Here is the code I am currently using:
!pip install tabula-py
!pip install pandas
import pandas as pd
import tabula
from google.colab import files
def getLocalFiles():
_files = files.upload()
if len(_files) >0:
for k,v in _files.items():
open(k,'wb').write(v)
getLocalFiles()
#directory path
!ls
#Reading pdf tables
file = "bubbles.pdf"
path = 'bubbles.pdf'
tables = tabula.read_pdf(path, pages = [149,150,151,152,153,154,155,156,157], columns= (1,2,3,4,5,6,7))
print(tables)
#passing to csv format
# page_1 was never defined; assuming tables is a list of DataFrames, take the first one
df = tables[0]
print(df)
df.to_csv('test.csv', index=False)
This is the output data:
How can I append all of the PDF tables?
Thanks in advance.
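One possible approach, sketched under assumptions: if tabula.read_pdf returns a list of DataFrames (one per detected table), pandas.concat can stack that list row-wise once the frames share column names. The example below uses small stand-in frames instead of the real PDF output. The helper stack_tables, the column names, and the values are all hypothetical and only there for illustration.

```python
import pandas as pd


def stack_tables(tables, column_names):
    """Stack a list of DataFrames row-wise under one shared set of column names."""
    renamed = []
    for table in tables:
        # Skip tables whose width does not match; those need manual cleanup.
        if table.shape[1] != len(column_names):
            continue
        # set_axis returns a copy with the new column labels.
        renamed.append(table.set_axis(column_names, axis=1))
    # ignore_index=True renumbers the rows 0..n-1 in the combined frame.
    return pd.concat(renamed, ignore_index=True)


# Stand-ins for the per-page tables (hypothetical data, not from the PDF).
page_a = pd.DataFrame({"c0": ["item1", "item2"], "c1": [10.0, 12.5]})
page_b = pd.DataFrame({"x": ["item3"], "y": [11.0]})

panel = stack_tables([page_a, page_b], ["item", "price"])
print(panel)
```

With the real data, the list returned by tabula.read_pdf would take the place of [page_a, page_b], and panel.to_csv('test.csv', index=False) would write the combined table to disk.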
Hi, I am unable to read a CSV file from a URL using:
import pandas as pd
import numpy as np
data_url = 'https://data.baltimorecity.gov/Financial/Real-Property-Taxes/27w9-urtv.csv'
df = pd.read_csv(data_url)
df.head()
I got an error: "not acceptable"
I also tried other code using "requests", but none of it worked. How do I fix this?
Your URL wasn't correct. This should work:
import pandas as pd
data_url = 'https://data.baltimorecity.gov/resource/27w9-urtv.csv'
df = pd.read_csv(data_url)
df.head()
I'm still new to Python. I'm trying to import an Excel file, but I get a FileNotFoundError.
This is what I'm running:
import pandas as pd
practiceset = (r'C:\Users\michael\Desktop\Work\Transpo\'Transportation2016.xlsx')
df = pd.read_excel(practiceset)
print (df)
The python file is in the same folder as the doc, so I'm confused.
There is a stray apostrophe before the file name in your path. Remove it:
import pandas as pd
practiceset = (r'C:\Users\michael\Desktop\Work\Transpo\Transportation2016.xlsx')
df = pd.read_excel(practiceset)
print (df)
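Since the script sits in the same folder as the workbook, another option is to build the path from that folder and confirm the file exists before calling read_excel. Below is a minimal sketch using the standard-library pathlib. The helper name locate_file is hypothetical, and the demo runs in a temporary folder standing in for the real one.

```python
from pathlib import Path
import tempfile


def locate_file(folder, filename):
    """Return the path to filename inside folder, or raise with a folder listing."""
    candidate = Path(folder) / filename
    if candidate.is_file():
        return candidate
    # Show what is actually in the folder so typos are easy to spot.
    present = sorted(p.name for p in Path(folder).iterdir())
    raise FileNotFoundError(f"{candidate} not found; folder contains: {present}")


# Demo in a temporary folder standing in for the real one.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "Transportation2016.xlsx").touch()
    found_name = locate_file(tmp, "Transportation2016.xlsx").name
    try:
        # The stray apostrophe from the question makes the lookup fail.
        locate_file(tmp, "'Transportation2016.xlsx")
        stray_quote_failed = False
    except FileNotFoundError:
        stray_quote_failed = True
```

Inside a script, Path(__file__).parent gives the folder containing the script, so locate_file(Path(__file__).parent, 'Transportation2016.xlsx') would return a path that can be passed straight to pd.read_excel.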
Hi, I am new to open source tech. I am using Anaconda3-5.1.0-Windows-x86_64 and Microsoft Excel 2016. Reading an Excel file with pandas throws a FileNotFoundError for the code below.
import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
path= "D:\sample.xlsx"
print(path)
df = pd.read_excel(path, sheet_name = 'Sheet1')
print('Column headings:')
print(df.columns)
The error message is FileNotFoundError: [Errno 2] No such file or directory: 'D:\\sample.xlsx'
I was trying to read 'D:\sample.xlsx' but the function tries to open the file as 'D:\\sample.xlsx'.
Can anyone advise on this issue, or let me know if more details are needed?
Change
path= "D:\sample.xlsx"
to
path= r"D:\sample.xlsx" or path= "D:\\sample.xlsx"
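To see why the two spellings are equivalent, and why Python displays such a path with doubled backslashes in error messages, here is a small sketch in plain Python. The path D:\temp\new.xlsx is a hypothetical example, chosen because \t and \n are escape sequences.

```python
raw = r"D:\sample.xlsx"       # raw string: backslashes are kept literally
escaped = "D:\\sample.xlsx"   # escaped: each \\ in the source is one backslash

same_path = raw == escaped
print(same_path)

# repr() doubles the backslash, which is how error messages display the path.
shown = repr(raw)
print(shown)

# Hypothetical path where a missing r prefix silently changes the string:
# "\t" becomes a tab character and "\n" a newline.
intended = r"D:\temp\new.xlsx"
mangled = "D:\temp\new.xlsx"
print(len(intended), len(mangled))
```

Note that \s is not a recognized escape sequence, so "D:\sample.xlsx" happens to produce the same string as the raw version (newer Python versions warn about it). If the error persists after the change, the file may simply not be at that location.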