Python Read In Google Spreadsheet Using Pandas

I have a file in Google Sheets that I want to read into a pandas DataFrame, but I get an error I don't understand.
This is the code:
import pandas as pd
sheet_id = "1HUbEhsYnLxJP1IisFcSKtHTYlFj_hHe5v21qL9CVyak"
df = pd.read_csv(f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?
gid=556844753&format=csv")
print(df)
And this is the error:
File "c:\Users\blaghzao\Documents\Stage PFA(Laghzaoui Brahim)\google_sheet.py", line 3, in <module>
df = pd.read_csv(f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?gid=556844753&format=csv")
File "C:\Users\blaghzao\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "C:\Users\blaghzao\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py", line 680, in read_csv
return _read(filepath_or_buffer, kwds)
File "C:\Users\blaghzao\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py", line 581, in _read
return parser.read(nrows)
File "C:\Users\blaghzao\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\readers.py", line 1255, in read
index, columns, col_dict = self._engine.read(nrows)
File "C:\Users\blaghzao\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 225, in read
chunks = self._reader.read_low_memory(nrows)
File "pandas\_libs\parsers.pyx", line 805, in pandas._libs.parsers.TextReader.read_low_memory
File "pandas\_libs\parsers.pyx", line 861, in pandas._libs.parsers.TextReader._read_rows
File "pandas\_libs\parsers.pyx", line 847, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas\_libs\parsers.pyx", line 1960, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 89 fields in line 3, saw 243

I found the answer; the problem was just the access permissions of the file.

Remove the gid parameter from the code:
import pandas as pd
sheet_id = "1HUbEhsYnLxJP1IisFcSKtHTYlFj_hHe5v21qL9CVyak"
df = pd.read_csv(f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?format=csv")
print(df)
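If you do need a specific worksheet tab, keeping gid in the export URL also works once the sheet's sharing is set to "Anyone with the link can view"; a minimal sketch reusing the IDs from the question:
import pandas as pd

sheet_id = "1HUbEhsYnLxJP1IisFcSKtHTYlFj_hHe5v21qL9CVyak"
gid = "556844753"  # the tab id from the sheet's URL in the question
# assumes sharing is set to "Anyone with the link can view"
url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?format=csv&gid={gid}"
df = pd.read_csv(url)
print(df.head())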

As far as I know, this error arises with the comma delimiter when a line contains more commas than expected. Can you try read_csv() as below to skip those lines?
df = pd.read_csv(f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?gid=556844753&format=csv", on_bad_lines='skip')
This skips the bad lines, so you can pinpoint the problem from which lines get dropped. I believe the CSV export does not match the format pandas read_csv() expects.
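On pandas 1.4 or newer (an assumption about your version), on_bad_lines also accepts a callable with the python engine, which lets you capture the skipped lines instead of losing them silently; a sketch:
import pandas as pd

bad_rows = []

def log_bad_line(line):
    # pandas passes the malformed row as a list of strings
    bad_rows.append(line)
    return None  # returning None drops the row

url = f"https://docs.google.com/spreadsheets/d/{sheet_id}/export?gid=556844753&format=csv"
df = pd.read_csv(url, engine="python", on_bad_lines=log_bad_line)
print(f"Skipped {len(bad_rows)} malformed rows")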

Related

pandas how to read this row?

Data sample: the program goes wrong on the second row, since it has 7 commas while normal rows have only 6.
7558,1488,1738539,,,,1
7559,1489,1702292,,"(segment \"Pesnya, ili Kak velikij Luarsab khor organizovyval\")",8,1
7560,1489,2146930,1975,,21,1
It is from the cast_info table of the IMDB dataset ([IMDB][2] is from a database task called cardinality estimation). Its separator is ",", but when the separator appears inside a quoted string, pandas can't parse the row.
The error log:
File "\pytorch\lib\site-packages\pandas\io\parsers\readers.py", line 488, in _read
return parser.read(nrows)
File "\pytorch\lib\site-packages\pandas\io\parsers\readers.py", line 1047, in read
index, columns, col_dict = self._engine.read(nrows)
File "\pytorch\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 223, in read
chunks = self._reader.read_low_memory(nrows)
File "pandas\_libs\parsers.pyx", line 801, in pandas._libs.parsers.TextReader.read_low_memory
File "pandas\_libs\parsers.pyx", line 857, in pandas._libs.parsers.TextReader._read_rows
File "pandas\_libs\parsers.pyx", line 843, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas\_libs\parsers.pyx", line 1925, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 7 fields in line 7559, saw 8
How can I solve it?
[2]: https://www.imdb.com/interfaces/
Try this; I think it should work:
import pandas as pd
df = pd.read_csv(data_path, sep=",")
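Note that sep="," is already the default, so this alone may not change anything. Since the failing row embeds backslash-escaped quotes inside a quoted field, escapechar is more likely the relevant option; a sketch against the sample rows from the question:
import io
import pandas as pd

# the second row contains \" inside a quoted field; declaring backslash
# as the escape character keeps that field in one piece
sample = (
    '7558,1488,1738539,,,,1\n'
    '7559,1489,1702292,,"(segment \\"Pesnya, ili Kak velikij Luarsab khor organizovyval\\")",8,1\n'
    '7560,1489,2146930,1975,,21,1\n'
)
df = pd.read_csv(io.StringIO(sample), header=None, escapechar='\\')
print(df)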

Unable to add more than one read_json method to a dataframe

I have two files, each of which I am trying to read into a dataframe. However, when I add more than one read_json call to my code, I get the following error:
File "/Users/pmall/Desktop/Python Pratice/hello.py", line 29, in <module>
df=pd.read_json('/Users/pmall/Desktop/Python Pratice/test2.json')
File "/Users/pmall/anaconda3/lib/python3.7/site-packages/pandas/io/json/json.py", line 427, in read_json
result = json_reader.read()
File "/Users/pmall/anaconda3/lib/python3.7/site-packages/pandas/io/json/json.py", line 537, in read
obj = self._get_object_parser(self.data)
File "/Users/pmall/anaconda3/lib/python3.7/site-packages/pandas/io/json/json.py", line 556, in _get_object_parser
obj = FrameParser(json, **kwargs).parse()
File "/Users/pmall/anaconda3/lib/python3.7/site-packages/pandas/io/json/json.py", line 652, in parse
self._parse_no_numpy()
File "/Users/pmall/anaconda3/lib/python3.7/site-packages/pandas/io/json/json.py", line 871, in _parse_no_numpy
loads(json, precise_float=self.precise_float), dtype=None)
ValueError: Expected object or value
The code I am trying is as follows:
import pandas as pd
df=pd.read_json('/Users/pmall/Desktop/Python Pratice/test1.json')
df2=pd.read_json('/Users/pmall/Desktop/Python Pratice/test2.json')
Why am I unable to add more than one read_json?
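No answer is recorded here, but "ValueError: Expected object or value" generally means the file's contents are not valid JSON (an empty file fails the same way). A quick diagnostic sketch, reusing the path from the traceback:
import json

# let the stdlib parser report exactly where test2.json is malformed
with open('/Users/pmall/Desktop/Python Pratice/test2.json') as f:
    try:
        json.load(f)
        print("valid JSON")
    except json.JSONDecodeError as err:
        print(f"invalid JSON: {err}")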

Python script works but throws error - pandas.errors tokenizing data , Expected 9 fields saw 10

I am new to Python. I am trying to read a JSON response from requests and filter it using pandas to save to a CSV file. The script works and gives me all the data, but it throws this error after execution. I can't figure out why, or how to get past it.
Error:
script.py line 42, in <module>
df = pd.read_csv("Data_script4.csv")
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-
packages/pandas/io/parsers.py", line 686, in read_csv
return _read(filepath_or_buffer, kwds)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-
packages/pandas/io/parsers.py", line 458, in _read
data = parser.read(nrows)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-
packages/pandas/io/parsers.py", line 1196, in read
ret = self._engine.read(nrows)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-
packages/pandas/io/parsers.py", line 2155, in read
data = self._reader.read(nrows)
File "pandas/_libs/parsers.pyx", line 847, in pandas._libs.parsers.TextReader.read
File "pandas/_libs/parsers.pyx", line 862, in
pandas._libs.parsers.TextReader._read_low_memory
File "pandas/_libs/parsers.pyx", line 918, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 905, in
pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 2042, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 9 fields in line 53,
saw 10
This is my script:
import argparse
import ast
import csv
import os
import pandas as pd

if __name__ == '__main__':
    parser = argparse.ArgumentParser("gets data")
    parser.add_argument("-o", dest="org", help="org name")
    parser.add_argument("-p", dest="pat", help="pat value")
    args = parser.parse_args()
    org = args.org
    token = args.pat
    url = "https://dev.azure.com/{org_name}/_apis/git/repositories?api-version=6.0".format(org_name=org)
    data = getproject(url, token)  # getproject is defined elsewhere in the script
    data_file = open("Data_script4.csv", "w", newline='')
    val = data['value']
    csv_writer = csv.writer(data_file)
    count = 0  # count was never initialized in the original snippet
    for i in val:
        if count == 0:
            header = i.keys()
            csv_writer.writerow(header)
            count += 1
        csv_writer.writerow(i.values())
    data_file.close()  # close (and flush) before reading the file back
    pro_name = []
    time = []
    df = pd.read_csv("Data_script4.csv")
    for i in df["project"]:
        res = ast.literal_eval(i)
        pro_name.append(res['name'])
        time.append(res['lastUpdateTime'])
    del df["project"]
    df["project name"] = pro_name
    df["lastUpdateTime"] = time
    df = df[["id", "name", "url", "project name", "lastUpdateTime", "defaultBranch", "size", "remoteUrl", "sshUrl", "webUrl"]]
    df.head()
    df.to_csv("Data_Filtered.csv", index=False)
    print("\nFile Created Successfully...")
    os.remove('Data_script4.csv')
How can I resolve this issue?
Your question was answered here
Here's the takeaway:
You need to substitute:
df = pd.read_csv("Data_script4.csv")
with this:
df = pd.read_csv('Data_script4.csv', error_bad_lines=False)
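Note that error_bad_lines was deprecated in pandas 1.3 and removed in 2.0; on newer versions (an assumption about your environment) the equivalent is:
df = pd.read_csv('Data_script4.csv', on_bad_lines='skip')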

I am having trouble running my csv file through my code and I don't understand the error message

I am trying to run a CSV file through my code and work with the data, but I am receiving an error message that I don't fully understand.
Here is the csv file
There is a lot more code, but I will only include what is relevant to the problem. Comment below if you need more info.
import pandas as pd
df_playoffs = pd.read_csv('/Users/hannahbeegle/Desktop/playoff_teams.csv.numbers', encoding='latin-1', index_col = 'team')
df_playoffs.fillna('None', inplace=True)
Here is the error message:
Traceback (most recent call last):
File "/Users/hannahbeegle/Desktop/Baseball.py", line 130, in <module>
df_playoffs = pd.read_csv('/Users/hannahbeegle/Desktop/playoff_teams.csv.numbers', encoding='latin-1', index_col = 'team')
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/parsers.py", line 702, in parser_f
return _read(filepath_or_buffer, kwds)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/parsers.py", line 435, in _read
data = parser.read(nrows)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/parsers.py", line 1139, in read
ret = self._engine.read(nrows)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/parsers.py", line 1995, in read
data = self._reader.read(nrows)
File "pandas/_libs/parsers.pyx", line 899, in pandas._libs.parsers.TextReader.read
File "pandas/_libs/parsers.pyx", line 914, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas/_libs/parsers.pyx", line 968, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 955, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 2172, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 3, saw 2
It looks as though your csv may be tab-delimited. In the line that reads the csv, change it to something like:
pd.read_csv('/Users/hannahbeegle/Desktop/playoff_teams.csv.numbers', sep="\t", encoding='latin-1', index_col = 'team')
[edited section after comments]
If the data is "ragged", you could try breaking it up into a dictionary and then using that to build the dataframe. Here's an example I tried with a mocked-up sample dataset:
import pandas as pd

record_dict = {}
with open("variable_columns.csv", mode="r") as file:
    for line in file:
        split_line = line.split()
        record_dict[split_line[0]] = split_line[1:]

df_playoffs = pd.DataFrame.from_dict(record_dict, orient='index')
df_playoffs.sample(5)
You might need to look at the line.split() call and pass "\t" as the split parameter (i.e. line.split("\t")), but you can experiment with this.
Also, notice that pandas has forced the data to be rectangular, so some of the columns will contain None for the "short" rows.
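An alternative sketch, assuming no row has more than 20 fields: give read_csv explicit column names wider than the longest row, and short rows are padded with NaN instead of raising ParserError:
import pandas as pd

# names=range(20) is a hypothetical width; widen it if rows can be longer
df_playoffs = pd.read_csv("variable_columns.csv", sep="\t", header=None,
                          names=range(20))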

Python, Pandas write to dataframe, lxml.etree.SerialisationError: IO_WRITE

This code picks the wanted rows from a dataframe. The original data is in Excel format; I load it into a dataframe here.
I want to pick all the rows whose "Test Date" falls in 201506 or 201508 and write them to an Excel file. These lines work fine:
import pandas as pd
data_short = {
    'Contract_type': ["Other", "Other", "Type-I", "Type-I", "Type-I", "Type-II", "Type-II", "Type-III", "Type-III", "Part-time"],
    'Test Date': ["20150816", "20150601", "20150204", "20150609", "20150204", "20150806", "20150201", "20150615", "20150822", "20150236"],
    'Test_time': ["16:26", "07:39", "18:48", "22:32", "03:54", "03:30", "04:00", "22:02", "13:43", "10:29"],
}
df = pd.DataFrame(data_short)
data_201508 = df[df['Test Date'].astype(str).str.startswith('201508')]
data_201506 = df[df['Test Date'].astype(str).str.startswith('201506')]
data_68 = data_201506.append(data_201508)
writer = pd.ExcelWriter("C:\\test-output.xlsx", engine = 'openpyxl')
data_68.to_excel(writer, "Sheet1", index = False)
writer.save()
But when I applied them to a larger file, ~600,000 rows with 25 columns (65 MB in file size), it returned the error below:
Traceback (most recent call last):
File "C:\Python27\Working Scripts\LL move pick wanted ATA in months.py", line 15, in <module>
writer.save()
File "C:\Python27\lib\site-packages\pandas\io\excel.py", line 732, in save
return self.book.save(self.path)
File "C:\Python27\lib\site-packages\openpyxl\workbook\workbook.py", line 263, in save
save_workbook(self, filename)
File "C:\Python27\lib\site-packages\openpyxl\writer\excel.py", line 239, in save_workbook
writer.save(filename, as_template=as_template)
File "C:\Python27\lib\site-packages\openpyxl\writer\excel.py", line 222, in save
self.write_data(archive, as_template=as_template)
File "C:\Python27\lib\site-packages\openpyxl\writer\excel.py", line 80, in write_data
self._write_worksheets(archive)
File "C:\Python27\lib\site-packages\openpyxl\writer\excel.py", line 163, in _write_worksheets
xml = sheet._write(self.workbook.shared_strings)
File "C:\Python27\lib\site-packages\openpyxl\worksheet\worksheet.py", line 776, in _write
return write_worksheet(self, shared_strings)
File "C:\Python27\lib\site-packages\openpyxl\writer\worksheet.py", line 263, in write_worksheet
xf.write(worksheet.page_breaks.to_tree())
File "src/lxml/serializer.pxi", line 1016, in lxml.etree._FileWriterElement.__exit__ (src\lxml\lxml.etree.c:142025)
File "src/lxml/serializer.pxi", line 904, in lxml.etree._IncrementalFileWriter._write_end_element (src\lxml\lxml.etree.c:140218)
File "src/lxml/serializer.pxi", line 999, in lxml.etree._IncrementalFileWriter._handle_error (src\lxml\lxml.etree.c:141711)
File "src/lxml/serializer.pxi", line 195, in lxml.etree._raiseSerialisationError (src\lxml\lxml.etree.c:131087)
lxml.etree.SerialisationError: IO_WRITE
Does this mean my computer is not good enough (8 GB RAM, Win10)? Is there a way to optimize the code (for example, to consume less memory)? Thank you.
By the way, this question is similar to I/O Error while saving Excel file - Python, but that one has no solution...
I found a solution: write the output to CSV instead (it can be opened in Excel as well).
data_wanted_all.to_csv("C:\\test-output.csv", index=False)
Posting this here in case someone encounters the same problem. Let me know if this question should be removed. :)
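If memory is still tight, to_csv also accepts a chunksize so rows are written in batches rather than all at once; a sketch with the same frame:
data_wanted_all.to_csv("C:\\test-output.csv", index=False, chunksize=100000)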
