I have 3 different tables I'm looking to directly push to 3 separate tabs in a Google Sheet. I set up the GSpread connection and that's working well. I started to adjust my first print statement into what I thought would append the information to Tab A (waveData), but no luck.
I'm looking to append the information to the FIRST blank row in a tab. Basically, so that the data will be ADDED to what is already in there.
I'm trying to use append_rows to do this, but am hitting a "gspread.exceptions.APIError: {'code': 400, 'message': 'Invalid value at 'data.values' (type.googleapis.com/google.protobuf.ListValue).
I'm really new to this, just thought it would be a fun project to evaluate wave sizes in NJ across all major surf spots, but really in over my head (no pun intended).
Any thoughts?
import requests
import pandas as pd
import gspread
gc = gspread.service_account(filename='creds.json')
sh = gc.open_by_key('152qSpr-4nK9V5uHOiYOWTWUx4ojjVNZMdSmFYov-n50')
waveData = sh.get_worksheet(0)
tideData = sh.get_worksheet(1)
lightData = sh.get_worksheet(2)
# AddValue = ["Test", 25, "Test2"]
# lightData.insert_row(AddValue, 3)
id_list = [
'/Belmar-Surf-Report/3683/',
'/Manasquan-Surf-Report/386/',
'/Ocean-Grove-Surf-Report/7945/',
'/Asbury-Park-Surf-Report/857/',
'/Avon-Surf-Report/4050/',
'/Bay-Head-Surf-Report/4951/',
'/Belmar-Surf-Report/3683/',
'/Boardwalk-Surf-Report/9183/',
]
for x in id_list:
    waveData.append_rows(pd.read_html(requests.get('http://magicseaweed.com' + x).text)
                         [2].iloc[:9, [0, 1, 2, 3, 4, 6, 7, 12, 15]].to_json(), value_input_option="USER_ENTERED")
    # print(pd.read_html(requests.get('http://magicseaweed.com' + x).text)[0])
    # print(pd.read_html(requests.get('http://magicseaweed.com' + x).text)[1])
From your following reply,
there really is no relationship between the 3. When I scrape with IMPORTHTML into Google sheets, those are just Tables at the locations 0,1, and 2. I'm basically just trying to have an output of each table on a separate tab
I understood that you want to retrieve the values with pd.read_html(requests.get('http://magicseaweed.com' + x).text)[2].iloc[:9, [0, 1, 2, 3, 4, 6, 7, 12, 15]] for each path in id_list, and put those values into a sheet in a Google Spreadsheet.
It seems that append_rows cannot use JSON data directly; it requires a 2-dimensional array (a list of rows). I'm also concerned about the NaN values in the dataframe, so the scripts below replace them with empty strings.
When these points are reflected in your script, how about the following modification?
Modified script 1:
In this sample, all values are put into a sheet.
import requests
import pandas as pd
import gspread
gc = gspread.service_account(filename='creds.json')
sh = gc.open_by_key('152qSpr-4nK9V5uHOiYOWTWUx4ojjVNZMdSmFYov-n50')
waveData = sh.get_worksheet(0)
id_list = [
"/Belmar-Surf-Report/3683/",
"/Manasquan-Surf-Report/386/",
"/Ocean-Grove-Surf-Report/7945/",
"/Asbury-Park-Surf-Report/857/",
"/Avon-Surf-Report/4050/",
"/Bay-Head-Surf-Report/4951/",
"/Belmar-Surf-Report/3683/",
"/Boardwalk-Surf-Report/9183/",
]
# I modified the below script.
res = []
for x in id_list:
    df = pd.read_html(requests.get("http://magicseaweed.com" + x).text)[2].iloc[:9, [0, 1, 2, 3, 4, 6, 7, 12, 15]].fillna("")
    values = [[x], df.columns.values.tolist(), *df.values.tolist()]
    res.extend(values)
    res.append([])
waveData.append_rows(res, value_input_option="USER_ENTERED")
When this script is run, the retrieved values are put into the 1st sheet. In this sample modification, the path is written above each table and a blank row is inserted after it. Please modify this for your actual situation.
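The shape that append_rows expects can be checked without any network access. The following is only a minimal sketch in plain Python: to_sheet_rows is a hypothetical helper (not part of gspread or pandas) that builds a list of rows from a header and some records, and replaces NaN with an empty string in the same spirit as fillna("") in the script above.

```python
import math


def to_sheet_rows(header, records):
    """Return a 2-dimensional list (header row first) with NaN replaced by ''.

    Hypothetical helper for illustration; the result is the kind of value
    that would be passed as the first argument of append_rows.
    """
    def clean(cell):
        # NaN is a float that is not equal to itself; math.isnan detects it.
        if isinstance(cell, float) and math.isnan(cell):
            return ""
        return cell

    return [list(header)] + [[clean(cell) for cell in row] for row in records]


rows = to_sheet_rows(["time", "height"], [["6am", 2.5], ["9am", float("nan")]])
print(rows)
```

Each inner list becomes one spreadsheet row, which is why a JSON string (a single value) is rejected where a list of lists is accepted.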
Modified script 2:
In this sample, each value is put into each sheet.
import requests
import pandas as pd
import gspread
gc = gspread.service_account(filename='creds.json')
sh = gc.open_by_key('152qSpr-4nK9V5uHOiYOWTWUx4ojjVNZMdSmFYov-n50')
id_list = [
"/Belmar-Surf-Report/3683/",
"/Manasquan-Surf-Report/386/",
"/Ocean-Grove-Surf-Report/7945/",
"/Asbury-Park-Surf-Report/857/",
"/Avon-Surf-Report/4050/",
"/Bay-Head-Surf-Report/4951/",
"/Belmar-Surf-Report/3683/",
"/Boardwalk-Surf-Report/9183/",
]
obj = {e.title: e for e in sh.worksheets()}
for e in id_list:
    if e not in obj:
        obj[e] = sh.add_worksheet(title=e, rows="1000", cols="26")
for x in id_list:
    df = pd.read_html(requests.get("http://magicseaweed.com" + x).text)[2].iloc[:9, [0, 1, 2, 3, 4, 6, 7, 12, 15]].fillna("")
    values = [df.columns.values.tolist(), *df.values.tolist()]
    obj[x].append_rows(values, value_input_option="USER_ENTERED")
When this script is run, the sheets are checked and created with the sheet names of the values in id_list, and each value is put to each sheet.
Reference:
append_rows
I have a table containing coordinates with associated labels A, B and C. I want to add another column that simply translates the labels to 1, 2 and 3.
import xlsxwriter
# Create some example data
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
labels = ["A", "B", "C", "B", "A"]
# Create a new Excel file and add a worksheet
workbook = xlsxwriter.Workbook('scatter_plot.xlsx')
worksheet = workbook.add_worksheet('Data')
# write column headings
worksheet.write(0,0,'x')
worksheet.write(0,1,'y')
worksheet.write(0,2,'labels')
# Write the data to the worksheet
for i in range(len(x)):
    worksheet.write(i+1, 0, x[i])
    worksheet.write(i+1, 1, y[i])
    worksheet.write(i+1, 2, labels[i])
# Formula that writes a new column where A = 1 B = 2 C = 3
worksheet.write_dynamic_array_formula('D2:D6', '=IFS(LEFT(C2:C6,1)="A",1,LEFT(C2:C6,1)="B",2,LEFT(C2:C6,1)="C",3,TRUE,NA())')
# Add a scatter chart to the worksheet
chart = workbook.add_chart({'type': 'scatter'})
chart.add_series({
'name': 'X vs Y',
'categories': '=Data!$A$2:$A$6',
'values': '=Data!$B$2:$B$6',
})
# Insert the chart into the worksheet
worksheet.insert_chart("F1", chart)
# Save the Excel file
workbook.close()
I run this and get an Excel file as output.
The formula has no syntax errors in Excel, but I have to manually press Enter on the cell for the formula to be applied. Shouldn't this happen automatically?
IFS() is a so-called Future Function in Excel and generally needs to be prefixed with _xlfn. (the prefix won't show up in the formula in Excel):
worksheet.write_dynamic_array_formula('D2:D6', '=_xlfn.IFS(LEFT(C2:C6,1)="A",1,LEFT(C2:C6,1)="B",2,LEFT(C2:C6,1)="C",3,TRUE,NA())')
Or you can use the use_future_functions constructor option:
workbook = xlsxwriter.Workbook('scatter_plot.xlsx', {'use_future_functions': True})
Either of those changes will produce the desired result.
For more information see the Formulas added in Excel 2010 section of the XlsxWriter docs.
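To make the prefixing idea concrete, here is a rough sketch of what adding the _xlfn. prefix amounts to. It is not XlsxWriter's implementation: add_xlfn_prefix and the small FUTURE_FUNCTIONS set are assumptions for illustration, and the set is only a subset of the functions listed in the XlsxWriter docs. The regex is also naive and would rewrite a function name that appears inside a quoted string.

```python
import re

# Illustrative subset of Excel "future functions"; the real list is longer.
FUTURE_FUNCTIONS = {"IFS", "SWITCH", "CONCAT", "TEXTJOIN", "MAXIFS", "MINIFS"}

# A function name directly followed by "(", not preceded by a word
# character or a dot (so an existing "_xlfn." prefix is left alone).
_FUNCTION_CALL = re.compile(r"(?<![\w.])([A-Za-z][A-Za-z0-9.]*)\(")


def add_xlfn_prefix(formula, names=FUTURE_FUNCTIONS):
    """Prefix known future functions in a formula string with '_xlfn.'."""
    def replace(match):
        name = match.group(1)
        if name.upper() in names:
            return "_xlfn." + name + "("
        return match.group(0)

    return _FUNCTION_CALL.sub(replace, formula)


formula = '=IFS(LEFT(C2:C6,1)="A",1,TRUE,NA())'
print(add_xlfn_prefix(formula))
```

In practice the use_future_functions option shown above does this for you, so a helper like this is only useful for understanding why the unprefixed formula is not evaluated on load.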
I want to read R objects back to python in Jupyter. For example, in Jupyter this example reads a dataframe generated in python and processed in R. Now I process this dataframe and create a new one that I want to be able to read to python.
Python cell:
# enables the %%R magic, not necessary if you've already done this
%load_ext rpy2.ipython
import pandas as pd
df = pd.DataFrame({
'cups_of_coffee': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
'productivity': [2, 5, 6, 8, 9, 8, 0, 1, 0, -1]
})
R cell:
%%R -i df
# import df from global environment
df$time = 1
df_new = df
df_new
If I move to a new Python cell, the new dataframe df_new is not recognized, so I cannot read it.
I tried this:
%Rget df_new
But I don't know how to assign it to a pandas dataframe or pass it to a Python function.
How can I switch back to a Python cell and read the new dataframe created in the R cell?
I tried something myself and it worked, although I couldn't find any good documentation for it.
You can simply do:
df_python = %Rget df_new
This worked for me.
I have a .xlsx file which looks as the attached file. What is the most common way to extract the different data parts from this excel file in Python?
Ideally there would be a method that is defined as :
pd.read_part_csv(columns=['data1', 'data2','data3'], rows=['val1', 'val2', 'val3'])
and returns an iterator over pandas dataframes which hold the values in the given table.
Here is a solution with pylightxl that might be a good fit for your project if all you are doing is reading. I wrote the solution in terms of rows, but it could just as well be done in terms of columns. See the docs for more info on pylightxl: https://pylightxl.readthedocs.io/en/latest/quickstart.html
import pylightxl
db = pylightxl.readxl('Book1.xlsx')
# pull out all the rowIDs where data groups start
keyrows = [rowID for rowID, row in enumerate(db.ws('Sheet1').rows,1) if 'val1' in row]
# find the columnIDs where data groups start (like in your example, not all data groups start in col A)
keycols = []
for keyrow in keyrows:
    # add +1 since python index start from 0
    keycols.append(db.ws('Sheet1').row(keyrow).index('val1') + 1)
# define a dict to hold your data groups
datagroups = {}
# populate datatables
for tableIndex, keyrow in enumerate(keyrows,1):
    i = 0
    # data groups: keys are group IDs starting from 1, list: list of data rows (ie: val1, val2...)
    datagroups.update({tableIndex: []})
    while True:
        # pull out the current group row of data, and remove leading cells with keycols
        datarow = db.ws('Sheet1').row(keyrow + i)[keycols[tableIndex-1]:]
        # check if the current row is still part of the datagroup
        if datarow[0] == '':
            # current row is empty and is no longer part of the data group
            break
        datagroups[tableIndex].append(datarow)
        i += 1
print(datagroups[1])
print(datagroups[2])
[[1, 2, 3, ''], [4, 5, 6, ''], [7, 8, 9, '']]
[[9, 1, 4], [2, 4, 1], [3, 2, 1]]
Note that each row of table 1 has an extra '' at the end. That is because the sheet is wider than that data group. You can remove it with list.remove(''), which deletes the first '' it finds in a row.
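Since list.remove('') only deletes one occurrence, a row with several trailing empty cells needs a little more care. A minimal sketch, assuming the rows look like the output above; strip_trailing_empty is a hypothetical helper and not part of pylightxl:

```python
def strip_trailing_empty(row, empty=""):
    """Return a copy of row without the empty cells at its end.

    Empty cells in the middle of the row are kept, so column positions
    of the real data do not shift.
    """
    end = len(row)
    while end > 0 and row[end - 1] == empty:
        end -= 1
    return row[:end]


table1 = [[1, 2, 3, ""], [4, 5, 6, ""], [7, 8, 9, ""]]
cleaned = [strip_trailing_empty(row) for row in table1]
print(cleaned)
```

Trimming only from the right is a deliberate choice here: a blank cell inside a table is usually data, while blanks past the last column are just sheet padding.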
I'm using win32com.client to write data to an excel file.
This takes too much time (the code below simulates the amount of data I want to update excel with, and it takes ~2 seconds).
Is there a way to update multiple cells (with different values) in one call rather than filling them one by one? or maybe using a different method which is more efficient?
I'm using python 2.7 and office 2010.
Here is the code:
from win32com.client import Dispatch
xlsApp = Dispatch('Excel.Application')
xlsApp.Workbooks.Add()
xlsApp.Visible = True
workSheet = xlsApp.Worksheets(1)
for i in range(300):
    for j in range(20):
        workSheet.Cells(i+1,j+1).Value = (i+10000)*j
A few suggestions:
ScreenUpdating off, manual calculation
Try the following:
xlsApp.ScreenUpdating = False
xlsApp.Calculation = -4135 # manual
try:
    #
    worksheet = ...
    for i in range(...):
        #
finally:
    xlsApp.ScreenUpdating = True
    xlsApp.Calculation = -4105 # automatic
Assign several cells at once
Using VBA, you can set a range's value to an array. Setting several values at once might be faster:
' VBA code
ActiveSheet.Range("A1:D1").Value = Array(1, 2, 3, 4)
I have never tried this from Python, but I suggest trying something like:
worksheet.Range("A1:D1").Value = [1, 2, 3, 4]
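For the 300 by 20 loop in the question, the same idea would mean building all the values first and assigning them in one call. The sketch below only builds the data in plain Python; build_grid is a hypothetical helper, and the COM assignment is left as a comment because it needs Excel and is untested here.

```python
def build_grid(n_rows=300, n_cols=20):
    """Build the same values as the nested loop in the question,
    as a list of rows (one inner list per spreadsheet row)."""
    return [[(i + 10000) * j for j in range(n_cols)] for i in range(n_rows)]


grid = build_grid()
print(len(grid), len(grid[0]))

# Assumed usage with win32com (not run here):
# top_left = workSheet.Cells(1, 1)
# bottom_right = workSheet.Cells(len(grid), len(grid[0]))
# workSheet.Range(top_left, bottom_right).Value = grid
```

The point of the single assignment is to replace 6000 separate COM calls with one, which is where most of the time in the original loop is likely spent.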
A different approach
Consider using openpyxl or xlwt. openpyxl lets you create .xlsx files without having Excel installed. xlwt does the same thing for .xls files.
Using the range suggestion from the other answer, I wrote this:
from win32com.client import Dispatch
def writeLineToExcel(wsh, line):
    # chr(n + 96).upper() maps 1 -> "A", 2 -> "B", ...; this only covers columns A to Z
    wsh.Range("A1:" + chr(len(line) + 96).upper() + "1").Value = line
xlApp = Dispatch("Excel.Application")
xlApp.Visible = 1
xlDoc = xlApp.Workbooks.Open("test.xlsx")
wsh = xlDoc.Sheets("Sheet1")
writeLineToExcel(wsh, [1, 2, 3, 4])
You may also write multiple lines at once:
def writeLinesToExcel(wsh, lines):  # assume that all lines have the same length
    # The column letter comes from the length of a line, the row number from the number of lines
    wsh.Range("A1:" + chr(len(lines[0]) + 96).upper() + str(len(lines))).Value = lines
writeLinesToExcel(wsh, [[1, 2, 3, 4],
                        [5, 6, 7, 8],
                        [9, 10, 11, 12],
                        [13, 14, 15, 16],
                        ])
Note that you can also set ranges via numeric addresses by using the following code:
cl1 = Sheet1.Cells(X1,Y1)
cl2 = Sheet1.Cells(X2,Y2)
Range = Sheet1.Range(cl1,cl2)
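If you prefer to keep string addresses, the chr() arithmetic used above stops working after column Z. A small sketch of a more general conversion follows; column_letter and range_address are hypothetical helpers, not part of win32com.

```python
def column_letter(index):
    """Convert a 1-based column number to Excel letters (1 -> A, 27 -> AA)."""
    if index < 1:
        raise ValueError("column index must be >= 1")
    letters = ""
    while index > 0:
        # Excel columns are a bijective base-26 system, hence the "- 1".
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def range_address(lines):
    """Address of the block starting at A1 that fits a list of equal-length rows."""
    return "A1:" + column_letter(len(lines[0])) + str(len(lines))


print(range_address([[1, 2, 3, 4], [5, 6, 7, 8]]))
```

The result could then be passed as wsh.Range(range_address(lines)).Value = lines, assuming all rows have the same length.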
I have python code below that will loop through a table and print out values within a particular column. What is not shown is the form in which the user selects a Feature Layer. Once the Feature Layer is selected a second Dropdown is populated with all the Column Headings for that Feature and the user chooses which Column they want to focus on. Now within the python script, I simply print out each value within that column. But I want to store each value in a List or Array and get Distinct values. How can I do this in Python?
Also is there a more efficient way to loop through the table than to go row by row? That is very slow for some reason.
many thanks
# Import system modules
import sys, string, os, arcgisscripting
# Create the Geoprocessor object
gp = arcgisscripting.create(9.3)
gp.AddToolbox("E:/Program Files (x86)/ArcGIS/ArcToolbox/Toolboxes/Data Management Tools.tbx")
# Declare our user input args
input_dataset = sys.argv[1] #This is the Feature Layer the User wants to Query against
Atts = sys.argv[2] #This is the Column Name The User Selected
#Lets Loop through the rows to get values from a particular column
fc = input_dataset
gp.AddMessage(Atts)
rows = gp.searchcursor(fc)
row = rows.next()
NewList = []
for row in gp.SearchCursor(fc):
    ##grab field values
    fcValue = fields.getvalue(Atts)
    NewList.add(fcValue)
You can store distinct values in a set:
>>> a = [ 1, 2, 3, 1, 5, 3, 2, 1, 5, 4 ]
>>> b = set( a )
>>> b
{1, 2, 3, 4, 5}
>>> b.add( 5 )
>>> b
{1, 2, 3, 4, 5}
>>> b.add( 6 )
>>> b
{1, 2, 3, 4, 5, 6}
Also you can make your loop more pythonic, although I'm not sure why you loop over the row to begin with (given that you are not using it):
for row in gp.searchcursor( fc ):
    ##grab field values
    fcValue = fields.getvalue(Atts)
    gp.AddMessage(fcValue)
And btw, """ text """ is not a comment. Python only has single line comments starting with #.
One way to get distinct values is to use a set to see if you've seen the value already, and display it only when it's a new value:
fcValues = set()
for row in gp.searchcursor(fc):
    ##grab field values
    fcValue = fields.getvalue(Atts)
    if fcValue not in fcValues:
        gp.AddMessage(fcValue)
        fcValues.add(fcValue)
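The same seen-set pattern can also collect the distinct values into a list in the order they first appear, which a plain set does not guarantee. A minimal sketch in plain Python, with no ArcGIS dependency; distinct_in_order is a hypothetical helper name:

```python
def distinct_in_order(values):
    """Return the distinct items of values, keeping first-seen order."""
    seen = set()
    result = []
    for value in values:
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result


a = [1, 2, 3, 1, 5, 3, 2, 1, 5, 4]
print(distinct_in_order(a))
```

Inside the cursor loop, the field value read from each row would take the place of the items of a.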