Append data to an existing PyTables table - python

I am new to PyTables and have implemented a few basic techniques for inserting and retrieving data from a PyTables table. However, I am not sure how to insert data into an existing table, because everything I read in the tutorial covers creating a new table (using the h5file.createTable() method). Here is the basic code from the tutorial for inserting data into a PyTables table created from scratch:
h5file = openFile("tutorial1.h5", mode="w", title="Test file")
group = h5file.createGroup("/", 'detector', 'Detector information')
table = h5file.createTable(group, 'readout', Particle, "Readout example")
particle = table.row  # missing from the snippet as posted; needed before the loop
for i in xrange(10):
    particle['name'] = 'Particle: %6d' % (i)
    particle['TDCcount'] = i % 256
    particle['ADCcount'] = (i * 256) % (1 << 16)
    particle['grid_i'] = i
    particle['grid_j'] = 10 - i
    particle['pressure'] = float(i * i)
    particle['energy'] = float(particle['pressure'] ** 4)
    particle['idnumber'] = i * (2 ** 34)
    # Insert a new particle record
    particle.append()
table.flush()
P.S. There is one place in this tutorial that talks about appending data to an existing table, but it uses the table that was just created from scratch and gives no idea how to select a pre-existing table to append data to. Kindly help. Thanks.

You need to open your file in append mode ("a"). Also, do not create the group and table again. This appends another 10 rows:
import tables

class Particle(tables.IsDescription):
    name = tables.StringCol(16)       # 16-character string
    idnumber = tables.Int64Col()      # signed 64-bit integer
    ADCcount = tables.UInt16Col()     # unsigned short integer
    TDCcount = tables.UInt8Col()      # unsigned byte
    grid_i = tables.Int32Col()        # 32-bit integer
    grid_j = tables.Int32Col()        # 32-bit integer
    pressure = tables.Float32Col()    # float (single-precision)
    energy = tables.Float64Col()      # double (double-precision)

h5file = tables.openFile("tutorial1.h5", mode="a")
table = h5file.root.detector.readout
particle = table.row
for i in range(10, 20):
    particle['name'] = 'Particle: %6d' % (i)
    particle['TDCcount'] = i % 256
    particle['ADCcount'] = (i * 256) % (1 << 16)
    particle['grid_i'] = i
    particle['grid_j'] = 10 - i
    particle['pressure'] = float(i * i)
    particle['energy'] = float(particle['pressure'] ** 4)
    particle['idnumber'] = i * (2 ** 34)
    # Insert a new particle record
    particle.append()
h5file.close()
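For reference, the camelCase calls above are the PyTables 2.x API; PyTables 3.x renamed them to PEP 8 style (openFile becomes open_file, createTable becomes create_table, and so on). A minimal sketch of the same append under the 3.x names (unset fields fall back to their column defaults):

import tables

# Same append as above, using the renamed PyTables 3.x API.
h5file = tables.open_file("tutorial1.h5", mode="a")
table = h5file.root.detector.readout
particle = table.row
for i in range(20, 30):
    particle['name'] = 'Particle: %6d' % (i)
    particle['idnumber'] = i * (2 ** 34)
    # ... fill the remaining fields as above ...
    particle.append()
table.flush()
h5file.close()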

Related

Interactive data table visual in Power BI via R/Python

I am attempting to display a table in Power BI that includes a 'checkbox' field the user can tick or untick.
So far, the closest I've come is to create such a table in R using the rhandsontable library:
DF = data.frame(integer = 1:10,
                numeric = rnorm(10),
                logical = rep(TRUE, 10),
                character = LETTERS[1:10],
                factor = factor(letters[1:10], levels = letters[10:1],
                                ordered = TRUE),
                factor_allow = factor(letters[1:10], levels = letters[10:1],
                                      ordered = TRUE),
                date = seq(from = Sys.Date(), by = "days", length.out = 10),
                stringsAsFactors = FALSE)

rhandsontable(DF, width = 600, height = 300) %>%
  hot_col("factor_allow", allowInvalid = TRUE)
This solution works in R. However, in Power BI (through the R script visual), it fails with the error "cannot display visual".
From what I can tell, this is because Power BI cannot display dynamic HTML visuals. If this is correct, is there a way around this, or an alternative solution?
Any suggestions will be greatly appreciated!

SQLAlchemy core pagination with Flask when passing text fragments to generate the SQL

For a simple select, pagination works as implemented here:
mheader_dict = dict(request.headers)
no_of_pgs = 0
if 'Maxpage' in mheader_dict.keys():
    max_per_pg = int(mheader_dict['Maxpage'])
else:
    max_per_pg = 100
page_no = int(request.headers.get('Pageno', type=int, default=1))
offset1 = (page_no - 1) * max_per_pg
s = select([orders])
if s is not None:
    s = s.limit(max_per_pg).offset(offset1)
rs = g.conn.execute(s)
g.conn above is the connection object.
When text() is used to build the statement, how do I specify the limit? How do I fix the code below?
s1 = text('select d.*, (select array(select localities.name from localities, localities_boys '
          'where localities.id = localities_boys.locality_id and localities_boys.boy_id = d.id '
          'and localities_boys.boy_id is not null)) from delivery_boys d order by d.id;')
page_no = int(request.headers.get('Pageno', type=int, default=1))
offset1 = (page_no - 1) * max_per_pg
s1 = s1.limit(max_per_pg).offset(offset1)
rs1 = g.conn.execute(s1)
If s1 = s1.compile(engine) is used, it returns a sqlalchemy.dialects.postgresql.psycopg2.PGCompiler_psycopg2 object, which doesn't have limit functionality either.
How can I convert a sqlalchemy.sql.elements.TextClause to a sqlalchemy.sql.selectable.Select using SQLAlchemy core 1.0.8 to solve the above?
Using SQLAlchemy core 1.0.8, Python 2.7, Flask 0.12.
I converted the TextClause to a Select like this:
s1 = select([text('d.*, (select array(select localities.name from localities, localities_boys '
                  'where localities.id = localities_boys.locality_id and localities_boys.boy_id = d.id '
                  'and localities_boys.boy_id is not null)) from delivery_boys d')])
Hence I was able to use limit and offset on the generated Select object.
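Putting it together, a minimal sketch of the paginated query under that approach (max_per_pg, offset1 and g.conn are the names from the question; the SELECT fragment is shortened here for illustration):

from sqlalchemy import select, text

# Wrapping the raw fragment in select([...]) yields a Select object,
# so limit() and offset() become available.
s1 = select([text('d.* from delivery_boys d')])
s1 = s1.limit(max_per_pg).offset(offset1)
rs1 = g.conn.execute(s1)
for row in rs1:
    print(row)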

Python - How to create an Excel calculated field without modifying the original data source

I have two tables in Excel (screenshot omitted).
I've created an Excel pivot table using Python, but I could not find a simple way to create a calculated field inside it (as I would with VBA) that matches Region from the left table with Region from the right table.
So I did this instead, using the win32com.client module:
First, I stored the contents of the tables in two lists: myTable and myRates.
Then, I added a new column to the original left table containing the calculation CA * (1 + rate). The code:
calField = [['CA Bonifié']]  # first element is the title of the new column
for a, testMyTable in enumerate(myTable):
    for b, testMyRates in enumerate(myRates):
        if a > 0 and b > 0:
            if testMyTable[0] == testMyRates[0]:
                calField.append([testMyTable[len(testMyTable) - 1] * (1 + testMyRates[1])])
for i, testDataRow in enumerate(calField):
    for j, testDataItem in enumerate(testDataRow):
        Sheet1.Cells(i + 1, len(testMyTable) + 1).Value = testDataItem
(Screenshots of the resulting "Source" sheet and the created "TCD" sheet omitted.)
The result is OK, but I don't like this method because it alters the original table, so I'm looking for a simpler way to do this.
Thanks in advance for your help.
P.S. The whole code is below, in case it helps.
import win32com.client

Excel = win32com.client.gencache.EnsureDispatch('Excel.Application')
win32c = win32com.client.constants
Excel.Visible = True
wb = Excel.Workbooks.Open('C:/Users/Documents/Python/classeur.xlsx')
Sheet1 = wb.Worksheets('Source')

def getContiguousRange(fichier, sheet, row, col):
    bottom = row
    while sheet.Cells(bottom + 1, col).Value not in [None, '']:
        bottom = bottom + 1
    right = col
    while sheet.Cells(row, right + 1).Value not in [None, '']:
        right = right + 1
    return sheet.Range(sheet.Cells(row, col), sheet.Cells(bottom, right)).Value

myTable = getContiguousRange(fichier=wb, sheet=Sheet1, row=1, col=1)
myRates = getContiguousRange(fichier=wb, sheet=Sheet1, row=1, col=8)

calField = [['CA Bonifié']]
for a, testMyTable in enumerate(myTable):
    for b, testMyRates in enumerate(myRates):
        if a > 0 and b > 0:
            if testMyTable[0] == testMyRates[0]:
                calField.append([testMyTable[len(testMyTable) - 1] * (1 + testMyRates[1])])
for i, testDataRow in enumerate(calField):
    for j, testDataItem in enumerate(testDataRow):
        Sheet1.Cells(i + 1, len(testMyTable) + 1).Value = testDataItem

cl1 = Sheet1.Cells(1, 1)
cl2 = Sheet1.Cells(len(myTable), len(myTable[0]) + 1)
pivotSourceRange = Sheet1.Range(cl1, cl2)
pivotSourceRange.Select()

Sheet2 = wb.Sheets.Add(After=wb.Sheets(1))
Sheet2.Name = 'TCD'
cl3 = Sheet2.Cells(4, 1)
pivotTargetRange = Sheet2.Range(cl3, cl3)
pivotTableName = 'tableauCroisé'

pivotCache = wb.PivotCaches().Create(SourceType=win32c.xlDatabase, SourceData=pivotSourceRange, Version=win32c.xlPivotTableVersion14)
pivotTable = pivotCache.CreatePivotTable(TableDestination=pivotTargetRange, TableName=pivotTableName, DefaultVersion=win32c.xlPivotTableVersion14)

pivotTable.PivotFields('Service').Orientation = win32c.xlRowField
pivotTable.PivotFields('Service').Position = 1
pivotTable.PivotFields('Region').Orientation = win32c.xlPageField
pivotTable.PivotFields('Region').Position = 1
pivotTable.PivotFields('Region').CurrentPage = 'IDF'

dataField = pivotTable.AddDataField(pivotTable.PivotFields('CA'))
dataField.NumberFormat = '# ### €'
calculField = pivotTable.AddDataField(pivotTable.PivotFields('CA Bonifié'))
calculField.NumberFormat = '# ### €'

# wb.SaveCopyAs('C:/Users/Documents/Python/tcd.xlsx')
# wb.Close(True)
# Excel.Application.Quit()
Note: I'm working in Sheet1, as the image shows all relevant indices and it's easier to verify. You can move the formula to the pivot table in a later step, once it is verified.
STEP: Replace column E with the formula =VLOOKUP
Reference: how-to-use-vlookup-match
Replace the following in your code:
for row, testDataRow in enumerate(calField, 2):
    # Sheet1.Cells(i+1, len(testMyTable)+1).Value = testDataItem
    Sheet1.Cells(row, 5).Formula = '=VLOOKUP(A{}, H1:I5, MATCH(H1,H1:I1))'.format(row)
The result should show the matching Taux!
Come back and confirm the results are OK!
STEP: Compute Taux
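Alternatively, if the goal is a genuine pivot calculated field rather than a helper column, the Excel COM object model exposes CalculatedFields on the pivot table. A hedged sketch only: the field name and the flat 10% formula are illustrative placeholders, and note that a pivot calculated field can only reference fields already present in the pivot source, so it cannot look the rate up from the second table by itself:

# Hypothetical: add a calculated field directly to the pivot table.
# 'CA Majoré' and '=CA*1.1' are illustrative placeholders.
pivotTable.CalculatedFields().Add('CA Majoré', '=CA*1.1', True)
calcField = pivotTable.AddDataField(pivotTable.PivotFields('CA Majoré'))
calcField.NumberFormat = '# ### €'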

Google Sheets API - Formatting inserted values

With this code I've appended a bunch of rows to a Google Spreadsheet.
The request goes well and returns the updatedRange shown below.
result = service.spreadsheets().values().append(
    spreadsheetId=spreadsheetId,
    range=rangeName,
    valueInputOption="RAW",
    insertDataOption="INSERT_ROWS",
    body=body
).execute()
print(result)
print("Range updated")
updateRange = result['updates']['updatedRange']
Now I would like to make a batchUpdate request to set the formatting or set a protected range, but those APIs require a range specified as startRowIndex, endRowIndex, and so on.
How can I retrieve the row indices from the updatedRange?
While waiting for a native or better answer, I'll post a function I've created to translate a named range into a gridRange.
The function is far from perfect and does not translate the sheet name to a sheet ID (I left that task to another function), but it accepts named ranges in the forms:
sheet!A:B
sheet!A1:B
sheet!A:B5
sheet!A1:B5
Here is the code:
import re

def namedRange2Grid(self, rangeName):
    # Written as a method ('self' is unused here). Column letters are decoded
    # as bijective base-26 (A=1 ... Z=26, AA=27 ...).
    ascii_uppercase = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    match = re.match(r".*?!([A-Z0-9]+):([A-Z0-9]+)", rangeName)
    if match:
        start = match.group(1)
        end = match.group(2)
        matchStart = re.match(r"([A-Z]+)([0-9]+)?", start)
        matchEnd = re.match(r"([A-Z]+)([0-9]+)?", end)
        if matchStart and matchEnd:
            GridRange = {}
            letterStart = matchStart.group(1)
            letterEnd = matchEnd.group(1)
            if matchStart.group(2):
                GridRange['startRowIndex'] = int(matchStart.group(2)) - 1
            if matchEnd.group(2):
                GridRange['endRowIndex'] = int(matchEnd.group(2))  # end row is exclusive
            i = 0
            for letter in letterStart:
                i = i * 26 + ascii_uppercase.index(letter) + 1
            GridRange['startColumnIndex'] = i - 1
            i = 0
            for letter in letterEnd:
                i = i * 26 + ascii_uppercase.index(letter) + 1
            GridRange['endColumnIndex'] = i  # end column is exclusive
            return GridRange
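A quick usage sketch (assuming the method sits on a helper-class instance called helper, which is hypothetical here):

grid = helper.namedRange2Grid('sheet!A1:B5')
print(grid)
# {'startRowIndex': 0, 'endRowIndex': 5, 'startColumnIndex': 0, 'endColumnIndex': 2}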

Python script to build least-cost paths between several polygons: How to speed it up?

I created a Python program which uses the ArcGIS "CostPath" function to automatically build least-cost paths (LCPs) between several polygons contained in the shapefile "selected_patches.shp". My program seems to work, but it is much too slow: I must build 275,493 LCPs. Unfortunately, I don't know how to speed it up (I am a beginner in Python and ArcGIS). Or is there another way to rapidly calculate least-cost paths between several polygons with ArcGIS (I use ArcGIS 10.1)? Here is my code:
# Import system modules
import arcpy
from arcpy import env
from arcpy.sa import *

arcpy.CheckOutExtension("Spatial")

# Overwrite outputs
arcpy.env.overwriteOutput = True

# Set the workspace
arcpy.env.workspace = r"C:\Users\LCP"

# Set the extent environment
arcpy.env.extent = "costs.tif"

rowsInPatches_start = arcpy.SearchCursor("selected_patches.shp")
for rowStart in rowsInPatches_start:
    ID_patch_start = rowStart.getValue("GRIDCODE")
    # SQL expression for the Select Layer By Attribute function
    expressionForSelectInPatches_start = "GRIDCODE=%s" % (ID_patch_start)
    # Process: Select Layer By Attribute in Patches_start
    arcpy.MakeFeatureLayer_management("selected_patches.shp", "Selected_patch_start", expressionForSelectInPatches_start)
    # Process: Cost Distance
    outCostDist = CostDistance("Selected_patch_start", "costs.tif", "", "outCostLink.tif")
    # Save the output
    outCostDist.save("outCostDist.tif")
    rowsInSelectedPatches_end = arcpy.SearchCursor("selected_patches.shp")
    for rowEnd in rowsInSelectedPatches_end:
        ID_patch_end = rowEnd.getValue("GRIDCODE")
        # SQL expression for the Select Layer By Attribute function
        expressionForSelectInPatches_end = "GRIDCODE=%s" % (ID_patch_end)
        # Process: Select Layer By Attribute in Patches_end
        arcpy.MakeFeatureLayer_management("selected_patches.shp", "Selected_patch_end", expressionForSelectInPatches_end)
        # Process: Cost Path
        outCostPath = CostPath("Selected_patch_end", "outCostDist.tif", "outCostLink.tif", "EACH_ZONE", "FID")
        # Save the output
        outCostPath.save('P_' + str(int(ID_patch_start)) + '_' + str(int(ID_patch_end)) + ".tif")
        # Write the path attributes to a .txt file
        outfile = open('P_' + str(int(ID_patch_start)) + '_' + str(int(ID_patch_end)) + ".txt", "w")
        rowsTxt = arcpy.SearchCursor('P_' + str(int(ID_patch_start)) + '_' + str(int(ID_patch_end)) + ".tif")
        for rowTxt in rowsTxt:
            value = rowTxt.getValue("Value")
            count = rowTxt.getValue("Count")
            pathcost = rowTxt.getValue("PATHCOST")
            startrow = rowTxt.getValue("STARTROW")
            startcol = rowTxt.getValue("STARTCOL")
            print value, count, pathcost, startrow, startcol
            outfile.write(str(value) + " " + str(count) + " " + str(pathcost) + " " + str(startrow) + " " + str(startcol) + "\n")
        outfile.close()
Thanks very much for your help.
Writing to disk, as opposed to computing the cost, can be a bottleneck; consider adding a thread to handle all of your writes.
This:
for rowTxt in rowsTxt:
    value = rowTxt.getValue("Value")
    count = rowTxt.getValue("Count")
    pathcost = rowTxt.getValue("PATHCOST")
    startrow = rowTxt.getValue("STARTROW")
    startcol = rowTxt.getValue("STARTCOL")
    print value, count, pathcost, startrow, startcol
    outfile.write(str(value) + " " + str(count) + " " + str(pathcost) + " " + str(startrow) + " " + str(startcol) + "\n")
can be converted into a thread function by making rowsTxt a global variable and having your thread write to disk from rowsTxt.
After you complete all of your processing, you can set an additional global boolean so that your thread function ends when everything has been written, and then close your thread.
An example thread class I currently use:
import threading

class ThreadExample:
    def __init__(self):
        self.receiveThread = None

    def startRXThread(self):
        self.receiveThread = threading.Thread(target=self.receive)
        self.receiveThread.start()

    def stopRXThread(self):
        if self.receiveThread is not None:
            self.receiveThread._Thread__stop()  # private CPython 2 API; a stop flag is more robust
            self.receiveThread.join()
            self.receiveThread = None

    def receive(self):
        while True:
            # do stuff for the life of the thread
            # in my case, I listen on a socket for data
            # and write it out
            pass
So for your case, you could add an instance variable to the thread class:
self.rowsTxt
and then update your receive() to check self.rowsTxt: if it is not empty, handle it as you do in the snippet above, then set self.rowsTxt back to None. Your main function updates the thread's self.rowsTxt as it gets each rowsTxt. Consider using a buffer such as a list for self.rowsTxt so you don't miss writing anything, as sketched below.
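A minimal sketch of that buffered-writer idea, using a queue instead of a bare attribute so no rows are lost (Python 2 module names; the row format matches the question's write loop, everything else is illustrative):

import threading
import Queue  # 'queue' in Python 3

class RowWriter:
    """Drains batches of rows to a file on a background thread."""
    def __init__(self, path):
        self.queue = Queue.Queue()
        self.done = False
        self.thread = threading.Thread(target=self._drain, args=(path,))
        self.thread.start()

    def write(self, rows):
        # Called from the main processing loop; returns immediately.
        self.queue.put(rows)

    def _drain(self, path):
        with open(path, 'a') as f:
            while not (self.done and self.queue.empty()):
                try:
                    rows = self.queue.get(timeout=0.5)
                except Queue.Empty:
                    continue
                for r in rows:
                    f.write(' '.join(str(v) for v in r) + '\n')

    def close(self):
        # Signal the writer to finish once the queue is drained.
        self.done = True
        self.thread.join()

The main loop would then call writer.write([(value, count, pathcost, startrow, startcol)]) instead of writing inline, and writer.close() once all processing is done.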
The most immediate change you can make to significantly improve speed would be to switch to the data access cursors (e.g. arcpy.da.SearchCursor()). To illustrate, I ran a benchmark test a while back to see how the data access cursors perform compared to the old cursors.
The figure (not reproduced here) showed the results of a benchmark of the new da UpdateCursor method versus the old UpdateCursor method. Essentially, the benchmark performs the following workflow:
1. Create random points (10, 100, 1000, 10000, 100000)
2. Randomly sample from a normal distribution and add the value to a new column in the random points attribute table with a cursor
3. Run 5 iterations of each random point scenario for both the new and old UpdateCursor methods and write the mean values to lists
4. Plot the results
import arcpy, os, numpy, time

arcpy.env.overwriteOutput = True
outws = r'C:\temp'
fc = os.path.join(outws, 'randomPoints.shp')

iterations = [10, 100, 1000, 10000, 100000]
old = []
new = []
meanOld = []
meanNew = []

for x in iterations:
    arcpy.CreateRandomPoints_management(outws, 'randomPoints', '', '', x)
    arcpy.AddField_management(fc, 'randFloat', 'FLOAT')
    for y in range(5):
        # Old method, ArcGIS 10.0 and earlier
        start = time.clock()
        rows = arcpy.UpdateCursor(fc)
        for row in rows:
            # generate random float from normal distribution
            s = float(numpy.random.normal(100, 10, 1))
            row.randFloat = s
            rows.updateRow(row)
        del row, rows
        end = time.clock()
        total = end - start
        old.append(total)
        del start, end, total

        # New method, ArcGIS 10.1 and later
        start = time.clock()
        with arcpy.da.UpdateCursor(fc, ['randFloat']) as cursor:
            for row in cursor:
                # generate random float from normal distribution
                s = float(numpy.random.normal(100, 10, 1))
                row[0] = s
                cursor.updateRow(row)
        end = time.clock()
        total = end - start
        new.append(total)
        del start, end, total
    meanOld.append(round(numpy.mean(old), 4))
    meanNew.append(round(numpy.mean(new), 4))

#######################
# plot the results
import matplotlib.pyplot as plt
plt.plot(iterations, meanNew, label='New (da)')
plt.plot(iterations, meanOld, label='Old')
plt.title('arcpy.da.UpdateCursor -vs- arcpy.UpdateCursor')
plt.xlabel('Random Points')
plt.ylabel('Time (seconds)')  # time.clock() measures seconds, not minutes
plt.legend(loc=2)
plt.show()
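Applied to the question's code, the table-dump loop would become something like the following sketch (field names are taken from the question; out_tif stands for the per-pair raster name built in the loop):

# Hypothetical rewrite of the question's write loop with the da cursor.
# arcpy.da.SearchCursor returns plain tuples, which are fast to unpack.
fields = ["Value", "Count", "PATHCOST", "STARTROW", "STARTCOL"]
with arcpy.da.SearchCursor(out_tif, fields) as cursor:
    for value, count, pathcost, startrow, startcol in cursor:
        outfile.write("%s %s %s %s %s\n" % (value, count, pathcost, startrow, startcol))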
