I'm trying to update a Microsoft Word 2010 table by deleting its contents (except the first row's contents) using Python and the win32com client component. I even looked at the MSDN library (http://msdn.microsoft.com/en-us/library/bb244515.aspx) and found that Delete could help me here, something like this.
(Also had a look at: How to read contents of a Table in an MS-Word file using Python?)
# delete row 1 of table 1
doc.Tables(1).Rows(1).Delete
# delete cell 1,1 of table 1
doc.Tables(1).Cell(1, 1).Delete
But on doing the above steps, the table row isn't deleted (nor is cell (1,1)). Is there something I'm missing? Any suggestions are welcome.
The Python function for clearing the table contents is pasted below.
def updateTable(name):
    # tell Word to open the document
    word.Documents.Open(IP_Directory_Dest + "\\" + name)
    # get a handle on the open document
    doc = word.Documents(1)
    ## doc.Content.Text = "This is the string to add to the document."
    ## doc.Content.MoveEnd()
    ## doc.Content.Select
    ## doc.Tables(1).Rows(2).Select
    ## Selection.InsertRowsBelow
    ## doc.Tables(1).Rows(1).Cells(1).Range.Text = '123123 '
    ## doc.Tables(1).Rows.Add()  # add a row
    # specifically select TABLE # 1
    table = doc.Tables(1)
    # count the number of rows in TABLE # 1
    numRows = table.Rows.Count
    # count the number of columns
    numCols = table.Columns.Count
    print('Number of Rows in TABLE', numRows)
    print('Number of Columns in TABLE', numCols)
    # print row 1 of TABLE # 1 -- for checking
    print('### 1 - CHECK this ONE ... ', table.Rows(1).Range.Text)
    # delete row 1 of table 1
    doc.Tables(1).Rows(1).Delete
    # delete cell 1,1 of table 1
    doc.Tables(1).Cell(1, 1).Delete
    # again count the number of rows in the table
    numRows = table.Rows.Count
    print(numRows)
    # print row 1 of TABLE # 1 -- after deleting the first row --> for checking
    print('### 2 - CHECK this ONE ... ', table.Rows(1).Range.Text)
    # get the number of tables in the document
    numberTables = doc.Tables.Count
    print(numberTables)
    # delete ALL tables
    tables = doc.Tables
    for table in tables:
        # get the content of Row # 1, Column # 1 of the table
        print(table.Cell(Row=1, Column=1).Range.Text)
        ## table.Delete(Row=2)
        # this one deletes the whole table (which is not needed)
        # table.Delete()
    # re-save in the IP folder
    doc.SaveAs(IP_Directory_Dest + "\\" + name)
    # close the document
    doc.Close()
Please ignore the commented-out sections (I was experimenting there to make things work).
All,
So this is what I have figured out; I'm sharing the code I wrote to fix it.
While doing this, I learned how to clear the contents of a table (a specific row and column), how to add a row, and how to get the count of columns and rows in a Word table, among other things. I also figured out that, since there is not much documentation available for the python/win32 APIs (except the MSDN library), one way to get used to them is to read the VB code (most of it is at MSDN: http://msdn.microsoft.com/en-us/library/bb244515.aspx) and write the corresponding python-win32 code. That's my understanding.
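For example, the MSDN VBA one-liner for adding a row maps almost one-to-one; a small sketch, assuming doc is an open Document object:

# VB (MSDN):    ActiveDocument.Tables(1).Rows.Add
# python-win32: same object model, called through COM
doc.Tables(1).Rows.Add()
# reading a cell works the same way:
txt = doc.Tables(1).Cell(1, 1).Range.Text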
########################
#
# Purpose : To update the table contents present in the file
#
#   name       : name of the document to process.
#   tableCount : specific table number to edit.
#
#######################
def updateTable(name, tableCount):
    # tell Word to open the document
    word.Documents.Open(IP_Directory_Dest + "\\" + name)
    # get a handle on the open document
    doc = word.Documents(1)
    # answer to Question # 2 (how to update a specific cell in a TABLE)
    # clearing Table # 1, Row # 1, Cell # 1 content
    doc.Tables(1).Rows(1).Cells(1).Range.Text = ''
    # clearing Table # 1, Row # 1, Col # 4 content
    doc.Tables(1).Cell(1, 4).Range.Text = ''
    # specifically select TABLE # tableCount
    table = doc.Tables(tableCount)
    # count the number of rows in the table
    numRows = table.Rows.Count
    # count the number of columns
    numCols = table.Columns.Count
    print('Number of Rows in TABLE', numRows)
    print('Number of Columns in TABLE', numCols)
    # NOTE : ROW # 1 WILL NOT BE DELETED, AS IT IS COMMON FOR BOTH IP AND IR
    # delete and add the same number of rows
    for row in range(numRows):
        # add a row at the end of the table
        doc.Tables(tableCount).Rows.Add()
        # delete row 2 of the table (so the row count stays the same)
        doc.Tables(tableCount).Rows(2).Delete()
    # re-save in the IP folder
    doc.SaveAs(IP_Directory_Dest + "\\" + name)
    # close the document
    doc.Close()
Add empty parentheses at the end of the line:
doc.Tables(1).Rows(1).Delete()
Those parentheses are necessary to call a method.
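A minimal illustration of the difference, assuming doc is an open Document object:

# without parentheses this only references the COM method object;
# nothing happens to the table
doc.Tables(1).Rows(1).Delete
# with parentheses the method is invoked and the row is actually deleted
doc.Tables(1).Rows(1).Delete()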
Let's say we have a CSV file with payment information like this:
somthing1:500.00
somthing2:300.00
somthing3:200.00
I need to update different rows in tableB in a Firebird database, which has a column called paid, with the values from the second column of the CSV file. For that reason I created a global temporary table just to pump the data into it and then update the desired table. What I tried is this:
idd = 0
with open('file.csv', 'r', encoding='utf-8', newline='') as file:
    reader = list(csv.reader(file, delimiter=':'))
    for row in reader:
        idd += 1
        c.execute("""INSERT INTO temp_table (id, code, paid) VALUES (?,?,?)""",
                  (idd, str(row[0]), str(row[1])))
This works as expected - the table is populated. After that I tried to update the other table like this:
c.execute("""select paid from temp_table;""")
res = c.fetchall()
for r in res:
    c.execute(f"""UPDATE tableB SET paid = {r[0]} WHERE oid = 10;""")
This runs, but for every result from the SELECT query it updates all of the rows with that same value, so they all end up identical. I tried MERGE in the database itself with the same result - every row is updated with the same value from temp_table:
MERGE INTO tableB b
USING (SELECT paid FROM temp_table) e
ON b.paid = 0 AND oid = 10
WHEN MATCHED THEN
UPDATE SET b.paid = e.paid;
I need some way to update the first row of tableB with the first row from the CSV, and so on. What am I missing?
The problem is that you are not correlating rows from temp_table with rows in tableB.
In your initial attempt, you update all rows in tableB with oid = 10. Similarly, in your second attempt you update all rows with paid = 0 without correlating them to temp_table.
You need to add a condition that matches rows between the tables; say both have an id column:
MERGE INTO tableB b
USING (SELECT id, paid FROM temp_table) e
ON b.paid = 0 AND b.oid = 10 AND b.id = e.id
WHEN MATCHED THEN
    UPDATE SET b.paid = e.paid;
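On the Python side the same correlation can be done row by row - a minimal sketch, assuming both tables share that id column and c is an open cursor:

# pair each temp_table row with its tableB row by id, instead of
# updating every row that has oid = 10
c.execute("SELECT id, paid FROM temp_table")
for row_id, paid in c.fetchall():
    c.execute("UPDATE tableB SET paid = ? WHERE oid = 10 AND id = ?",
              (paid, row_id))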
I have a query that reaches into a MySQL database and grabs row data matching the column "cab", using a variable passed from a previous HTML page. That variable is cabwrite.
The SQL response works just fine: it queries and matches the column 'cab' with all the data points in the rows that match that cab id.
Once that happens, I remove the data I don't need (the line identifier and cab).
The output from that is result_set.
However, when I print the data to verify it's what I expect, I'm met with the same data for every row I have.
Example data:
The query finds 4 matching rows.
This is currently what I'm getting:
data =
["(g11,none,tech11)","(g2,none,tech13)","(g3,none,tech15)","(g4,none,tech31)"]
["(g11,none,tech11)","(g2,none,tech13)","(g3,none,tech15)","(g4,none,tech31)"]
["(g11,none,tech11)","(g2,none,tech13)","(g3,none,tech15)","(g4,none,tech31)"]
["(g11,none,tech11)","(g2,none,tech13)","(g3,none,tech15)","(g4,none,tech31)"]
Code:
cursor = connection1.cursor(MySQLdb.cursors.DictCursor)
cursor.execute("SELECT * FROM devices WHERE cab=%s", [cabwrite])
result_set = cursor.fetchall()
data = []
for row in result_set:
    localint = "('%s','%s','%s')" % (row["localint"], row["devicename"], row["hostname"])
    l = str(localint)
    data.append(l)
    print(data)
This is what I want it to look like:
data = [(g11,none,tech11),(g2,none,tech13),(g3,none,tech15),(g4,none,tech31)]
["('Gi3/0/13','None','TECH2_HELP')", "('Gi3/0/7','None','TECH2_1507')", "('Gi1/0/11','None','TECH2_1189')", "('Gi3/0/35','None','TECH2_4081')", "('Gi3/0/41','None','TECH2_5625')", "('Gi3/0/25','None','TECH2_4598')", "('Gi3/0/43','None','TECH2_1966')", "('Gi3/0/23','None','TECH2_2573')", "('Gi3/0/19','None','TECH2_1800')", "('Gi3/0/39','None','TECH2_1529')"]
Thanks Tripleee - I did what you recommended and found my issue: a legacy FOR clause upstream in my code was causing it.
I have two lists: one contains the column names of categorical variables and the other numeric, as shown below.
cat_cols = ['stat','zip','turned_off','turned_on']
num_cols = ['acu_m1','acu_cnt_m1','acu_cnt_m2','acu_wifi_m2']
These are the columns names in a table in Redshift.
I want to pass these as a parameter to pull only the numeric columns from a table in Redshift (PostgreSQL), write that to a CSV, and close the CSV.
Next I want to pull only cat_cols, open the CSV, append to it, and close it.
My query so far:
# 1. Pull num data:
seg = ['seg1', 'seg2']
sql_data = str(""" SELECT {num_cols} """ + """FROM public.""" + str(seg) + """ order by random() limit 50000 ;""")
df_data = pd.read_sql(sql_data, cnxn)
# Write to csv.
df_data.to_csv("df_sample.csv", index=False)

# 2. Pull cat data:
sql_data = str(""" SELECT {cat_cols} """ + """FROM public.""" + str(seg) + """ order by random() limit 50000 ;""")
df_data = pd.read_sql(sql_data, cnxn)
# Append to df_sample.csv and close the csv.
with open("df_sample.csv", 'rw'):
    ## Append to the csv ##
This is the first time I am trying to do selective querying based on Python lists, and I am stuck on how to pass a list as column names in a SELECT from the table.
Can someone please help me with this?
If you want to build the query as a string, in your case it will be better to use the format method or f-strings (which require Python 3.6+).
An example for your case, using only the built-in format function:
seg = ['seg1', 'seg2']
num_cols = ['acu_m1', 'acu_cnt_m1', 'acu_cnt_m2', 'acu_wifi_m2']

query = """
    SELECT {} FROM public.{} order by random() limit 50000;
""".format(', '.join(num_cols), seg)

print(query)
If you want to use only one item from the seg list, pass seg[0] or seg[1] to format.
I hope this will help you!
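Putting it together with the CSV steps from the question - a minimal sketch, assuming cnxn is an open connection and seg[0] is the intended table name; the append step relies on pandas' mode='a':

import pandas as pd

num_cols = ['acu_m1', 'acu_cnt_m1', 'acu_cnt_m2', 'acu_wifi_m2']
cat_cols = ['stat', 'zip', 'turned_off', 'turned_on']
seg = ['seg1', 'seg2']

# pull the numeric columns and write a fresh CSV
sql_num = "SELECT {} FROM public.{} order by random() limit 50000;".format(
    ', '.join(num_cols), seg[0])
pd.read_sql(sql_num, cnxn).to_csv("df_sample.csv", index=False)

# pull the categorical columns and append them to the same CSV
sql_cat = "SELECT {} FROM public.{} order by random() limit 50000;".format(
    ', '.join(cat_cols), seg[0])
pd.read_sql(sql_cat, cnxn).to_csv("df_sample.csv", mode='a', index=False)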
I've got an ESRI Point Shape file with (amongst others) a nMSLINK field and a DIAMETER field. The MSLINK is not unique, because of a spatial join. What I want to achieve is to keep only the features in the shapefile that have a unique MSLINK and the smallest DIAMETER value, together with the corresponding values in the other fields. I can use a search cursor to achieve this (looping through all features and removing each feature that does not comply), but this takes ages (> 75000 features). I was wondering if e.g. numpy could do the trick faster in ArcMap/arcpy.
I think that kind of processing would definitely be a lot faster if you work in memory instead of interacting with ArcGIS for each row - for example, by first reading all the rows into a Python object (a namedtuple would probably be a good option here). Then you can work out which rows to delete or insert.
The fastest approach depends on the data: a) if you have a lot of repeated (MSLINK) rows, the fastest is to insert just the ones you need into a new layer; or b) if the rows to be deleted are few compared to the total, deleting is faster.
For a) you'll need to fetch all fields into the tuple, including the point coordinates, so that you can create a new feature class and insert the new rows.
# Example of Variant a:
from collections import namedtuple
import arcpy

# assuming the following:
#   source_fc   - contains the name of the feature class
#   the_path    - contains the path to the shape
#   cleaned_fc  - the name of the cleaned feature class

# use all fields of source_fc plus the shape token to get a tuple with xy
# coordinates (using 'mslink' and 'diam' here to simplify the example)
fields = ['mslink', 'diam', 'field3', ... ]
all_fields = fields + ['SHAPE@XY']

# define a namedtuple to hold and work with the rows; use the name 'point'
# to hold the coordinates tuple
Row = namedtuple('Row', fields + ['point'])

data = []
with arcpy.da.SearchCursor(source_fc, all_fields) as sc:
    for r in sc:
        # unpack the values from each row into a new Row (namedtuple) and
        # append it to data
        data.append(Row(*r))

# now just delete the rows we don't want; for this, the easiest way is
# probably to sort the list first by MSLINK and then by diameter ...
data = sorted(data, key=lambda x: (x.mslink, x.diam))

# ... now just keep the first entry for each mslink
to_keep = []
last_mslink = None
for d in data:
    if last_mslink != d.mslink:
        last_mslink = d.mslink
        to_keep.append(d)

# create a new feature class with the same fields as the source_fc
arcpy.CreateFeatureclass_management(
    out_path=the_path, out_name=cleaned_fc, template=source_fc)

with arcpy.da.InsertCursor(cleaned_fc, all_fields) as ic:
    for r in to_keep:
        ic.insertRow(r)
And for alternative b) I would fetch just three fields: a unique ID, MSLINK, and the diameter. Then build a delete list (here you only need the unique IDs), loop through the feature class again, and delete the rows whose ID is on the delete list. Just to be safe, I would duplicate the feature class first and work on a copy.
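A minimal sketch of variant b, assuming the same simplified field names ('mslink', 'diam') as in variant a, and using arcpy's OID@ token as the unique ID:

import arcpy

# first pass: track the minimum-diameter row per mslink and collect the
# object IDs of every other row
best = {}          # mslink -> (diam, oid) of the smallest diameter so far
to_delete = set()
with arcpy.da.SearchCursor(source_fc, ['OID@', 'mslink', 'diam']) as sc:
    for oid, mslink, diam in sc:
        if mslink not in best:
            best[mslink] = (diam, oid)
        elif diam < best[mslink][0]:
            to_delete.add(best[mslink][1])   # previous minimum loses
            best[mslink] = (diam, oid)
        else:
            to_delete.add(oid)

# second pass: delete the collected rows (on a copy, as noted above)
with arcpy.da.UpdateCursor(source_fc, ['OID@']) as uc:
    for (oid,) in uc:
        if oid in to_delete:
            uc.deleteRow()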
There are a few steps you can take to accomplish this task more efficiently. First and foremost, using the data access (arcpy.da) cursors instead of the older cursors will increase the speed of your process. This assumes you are working in 10.1 or beyond. Then you can employ Summary Statistics, namely its ability to find a minimum value based on a case field. For you, the case field would be nMSLINK.
The code below first creates a statistics table with all unique 'nMSLINK' values and their corresponding minimum 'DIAMETER' values. I then use a table select to keep only the rows whose 'FREQUENCY' field is not 1. From there I iterate through the new table and build a list of strings that will make up the final SQL statement. After this iteration, I use Python's join function to create an SQL string that looks something like this:
("nMSLINK" = 'value1' AND "DIAMETER" <> 624.0) OR ("nMSLINK" = 'value2' AND "DIAMETER" <> 1302.0) OR ("nMSLINK" = 'value3' AND "DIAMETER" <> 1036.0) ...
The sql selects rows where nMSLINK values are not unique and where DIAMETER values are not the minimum. Using this SQL, I select by attribute and delete selected rows.
This SQL statement is written assuming your feature class is in a file geodatabase and that 'nMSLINK' is a string field and 'DIAMETER' is a numeric field.
The code has the following inputs:
Feature: the feature class to be analyzed
Workspace: a folder that will temporarily store a couple of intermediate tables
TempTableName1: a name for the first temporary table
TempTableName2: a name for the second temporary table
Field1: the non-unique field
Field2: the field with the numeric values you wish to find the lowest of
Code:
# Import modules
from arcpy import *
import os

# Local variables
# Feature to analyze
Feature = r"C:\E1B8\ScriptTesting\Workspace\Workspace.gdb\testfeatureclass"
# Workspace to export table of identicals
Workspace = r"C:\E1B8\ScriptTesting\Workspace"
# Names of temp DBF table files
TempTableName1 = "Table1"
TempTableName2 = "Table2"
# Field names
Field1 = "nMSLINK"   # nonunique
Field2 = "DIAMETER"  # field with numeric values

# Make a layer to allow selection
MakeFeatureLayer_management(Feature, "lyr")

# Path for first temp table
Table = os.path.join(Workspace, TempTableName1)

# Create statistics table with min value per nMSLINK
Statistics_analysis(Feature, Table, [[Field2, "MIN"]], [Field1])

# SQL to select rows with frequency not equal to one
sql = '"FREQUENCY" <> 1'

# Path for second temp table
Table2 = os.path.join(Workspace, TempTableName2)

# Select rows with frequency not equal to one
TableSelect_analysis(Table, Table2, sql)

# Empty list for sql bits
li = []

# Iterate through the second table
cursor = da.SearchCursor(Table2, [Field1, "MIN_" + Field2])
for row in cursor:
    # Add SQL bit to the list
    sqlbit = '("' + Field1 + '" = \'' + row[0] + '\' AND "' + Field2 + '" <> ' + str(row[1]) + ")"
    li.append(sqlbit)
del row
del cursor

# Create SQL for selection of unwanted features
sql = " OR ".join(li)
print(sql)

# Select based on SQL
SelectLayerByAttribute_management("lyr", "", sql)
# Delete selected features
DeleteFeatures_management("lyr")

# Delete temp files
Delete_management("lyr")
Delete_management(Table)
Delete_management(Table2)
This should be quicker than a straight-up cursor. Let me know if this makes sense. Good luck!
I am searching for a persistent data storage solution that can handle heterogenous data stored on disk. PyTables seems like an obvious choice, but the only information I can find on how to append new columns is a tutorial example. The tutorial has the user create a new table with added column, copy the old table into the new table, and finally delete the old table. This seems like a huge pain. Is this how it has to be done?
If so, what are better alternatives for storing mixed data on disk that can accommodate new columns with relative ease? I have looked at sqlite3 as well and the column options seem rather limited there, too.
Yes, you must create a new table and copy the original data. This is because Tables are a dense format. This gives them a huge performance benefit, but one of the costs is that adding new columns is somewhat expensive.
Thanks to Anthony Scopatz for his answer.
Searching the web and GitHub, I found two examples showing how to add a column in PyTables:
The original version ('Example showing how to add a column in PyTables') works, but is somewhat difficult to migrate.
The revised version isolates the copying logic, but some terms it uses are deprecated and it has a minor error when adding the new column.
Based on their contributions, I updated the code for adding a new column in PyTables (Python 3.6, Windows).
# -*- coding: utf-8 -*-
"""
PyTables, append a column
"""
import tables as tb

pth = 'd:/download/'

# Describe a water class
class Water(tb.IsDescription):
    waterbody_name = tb.StringCol(16, pos=1)   # 16-character string
    lati = tb.Int32Col(pos=2)                  # integer
    longi = tb.Int32Col(pos=3)                 # integer
    airpressure = tb.Float32Col(pos=4)         # float (single-precision)
    temperature = tb.Float64Col(pos=5)         # double (double-precision)

# Open a file in "w"rite mode
# (if you don't include pth, the file is created next to the code)
fileh = tb.open_file(pth + "myadd-column.h5", mode="w")

# Create a table in the root directory and append data ...
tableroot = fileh.create_table(fileh.root, 'root_table', Water,
                               "A table at root", tb.Filters(1))
tableroot.append([("Mediterranean", 10, 0, 10 * 10, 10 ** 2),
                  ("Mediterranean", 11, -1, 11 * 11, 11 ** 2),
                  ("Adriatic", 12, -2, 12 * 12, 12 ** 2)])
print("\nContents of the table in root:\n",
      fileh.root.root_table[:])

# Create a new table in the newgroup group and append several rows
group = fileh.create_group(fileh.root, "newgroup")
table = fileh.create_table(group, 'orginal_table', Water, "A table",
                           tb.Filters(1))
table.append([("Atlantic", 10, 0, 10 * 10, 10 ** 2),
              ("Pacific", 11, -1, 11 * 11, 11 ** 2),
              ("Atlantic", 12, -2, 12 * 12, 12 ** 2)])
print("\nContents of the original table in newgroup:\n",
      fileh.root.newgroup.orginal_table[:])

# close the file
fileh.close()

#%% Open it again in append mode
fileh = tb.open_file(pth + "myadd-column.h5", "a")
group = fileh.root.newgroup
table = group.orginal_table

# Isolated copying logic
def append_column(table, group, name, column):
    """Returns a copy of `table` with an empty `column` appended named `name`."""
    description = table.description._v_colObjects.copy()
    description[name] = column
    copy = tb.Table(group, table.name + "_copy", description)
    # Copy the user attributes
    table.attrs._f_copy(copy)
    # Fill the rows of the new table with default values
    for i in range(table.nrows):
        copy.row.append()
    # Flush the rows to disk
    copy.flush()
    # Copy the columns of the source table to the destination
    # (iterate over the source table's columns, not the new description,
    # since the new column has no source data yet)
    for col in table.description._v_colObjects:
        getattr(copy.cols, col)[:] = getattr(table.cols, col)[:]
    # choose whether to remove the original table
    # table.remove()
    return copy

# Get a description of the table in dictionary format and add a column to it
# (shown for illustration; append_column does the same internally)
descr = table.description._v_colObjects
descr2 = descr.copy()
descr2["hot"] = tb.BoolCol(dflt=False)

# append the original data plus the new (empty) column to table2
table2 = append_column(table, group, "hot", tb.BoolCol(dflt=False))

# Fill the new column
table2.cols.hot[:] = [row["temperature"] > 11 ** 2 for row in table]

# Move table2 to new_table; you can reuse the original table's name here.
table2.move('/newgroup', 'new_table')

# Print the new table
print("\nContents of the table with column added:\n",
      fileh.root.newgroup.new_table[:])

# Finally, close the file
fileh.close()