How to update an Excel column? - python

I have two Excel files, Table 1 and Table 2, as shown.
Every row has a unique ID, and I wish to update the Cost column in Table 1 according to the data in Table 2.
How can I update the "Cost" column in Table 1 according to the "Invoice id" in Table 2 automatically?
I have tried the code below, but unfortunately it doesn't work: no matching occurs, and the Table 1 cost remains unchanged. I have checked the cell types for the invoice ID and cost columns in both files, and they are identical (all of them are int).
However, I still can't figure out which part is wrong.
# Import the library we are going to use
import openpyxl

book1 = 'Book1.xlsx'
book2 = 'Book2.xlsx'

Book1 = openpyxl.load_workbook(book1)
book1_firstsheet = Book1['template']
Book2 = openpyxl.load_workbook(book2)
book2_firstsheet = Book2['template']

for book1_rownumber in range(2, book1_firstsheet.max_row):
    book1_invoice = book1_firstsheet.cell(row=book1_rownumber, column=3).value
    book1_cost = book1_firstsheet.cell(row=book1_rownumber, column=4).value
    for book2_rownumber in range(2, book2_firstsheet.max_row):
        book2_invoice = book2_firstsheet.cell(row=book2_rownumber, column=3).value
        book2_cost = book2_firstsheet.cell(row=book2_rownumber, column=4).value
        if book1_invoice == book2_invoice:
            book1_cost = book2_cost

new_file = "new_workbook.xlsx"
Book1.save(new_file)

Change the line after the if condition (book1_cost = book2_cost) to the following:
book1_firstsheet.cell(row=book1_rownumber, column=4).value = book2_firstsheet.cell(row=book2_rownumber, column=4).value
Assigning book2_cost to book1_cost only rebinds a local Python variable; it never writes back to the worksheet. Setting the cell's .value directly is what updates the sheet.
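For reference, here is a minimal sketch of the corrected loop under the same assumptions as the question (invoice IDs in column 3, costs in column 4, data starting at row 2). Note that range() excludes its upper bound, so max_row + 1 is needed to include the last row:

import openpyxl

Book1 = openpyxl.load_workbook('Book1.xlsx')
book1_firstsheet = Book1['template']
Book2 = openpyxl.load_workbook('Book2.xlsx')
book2_firstsheet = Book2['template']

for book1_rownumber in range(2, book1_firstsheet.max_row + 1):
    book1_invoice = book1_firstsheet.cell(row=book1_rownumber, column=3).value
    for book2_rownumber in range(2, book2_firstsheet.max_row + 1):
        book2_invoice = book2_firstsheet.cell(row=book2_rownumber, column=3).value
        if book1_invoice == book2_invoice:
            # Write to the cell itself, not to a local variable
            book1_firstsheet.cell(row=book1_rownumber, column=4).value = \
                book2_firstsheet.cell(row=book2_rownumber, column=4).value

Book1.save("new_workbook.xlsx")

For larger sheets, building a dict mapping invoice ID to cost from Book2 first would avoid the nested loop and make the update O(n + m) instead of O(n * m).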


Search SQL request with two tables on PostgreSQL. SQLAlchemy. Python

I need help writing a query in SQL or SQLAlchemy.
The first table is named Rows:

sid        | unit_sid
-----------+------------
ROW_UUID1  | UNIT_UUID1
ROW_UUID2  | UNIT_UUID1
ROW_UUID3  | UNIT_UUID
The second table is named Records:

row_sid (== sid from Rows) | item_sid    | content (str)
---------------------------+-------------+---------------
ROW_UUID1                  | ITEM_UUID1  | Description 1
ROW_UUID1                  | ITEM_UUID2  | Description 1
ROW_UUID2                  | ITEM_UUID1  | Description 3
ROW_UUID2                  | ITEM_UUID2  | Description 2
ROW_UUID3                  | ITEM_UUID1  | Description 5
ROW_UUID3                  | ITEM_UUID2  | Description 1
I need an example of an SQL query where I can specify a search for several content values across different item_sids.
For example, I need all rows where
item_sid == ITEM_UUID1 and content == Description 1
item_sid == ITEM_UUID2 and content == Description 1
A query like the one below will not work for me, because I need to search on two item_sids at the same time to receive unique rows:
select row_sid
from rows
left join record on rows.sid = record.row_sid
where (item_sid = '877aeeb4-c68e-4942-b259-288e7aa3c04b' and
       content like '%TEXT%')
  and (item_sid = 'cc22f239-db6c-4041-92c6-8705cb621525' and
       content like '%TEXT2%')
group by row_sid
I solved it like this:

select row_sid
from rows
left join record on rows.sid = record.row_sid
where (item_sid = '877aeeb4-c68e-4942-b259-288e7aa3c04b' and
       content like '%TEXT%')
   or (item_sid = 'cc22f239-db6c-4041-92c6-8705cb621525' and
       content like '%TEXT2%')
group by row_sid
having count(row_sid) = 2

But maybe there is a more elegant solution? I want to query a varying number of item_sids (2-5) at the same time.
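The same pattern generalizes to any number of pairs: OR together one (item_sid, content) predicate per pair and require the number of distinct matches to equal the number of pairs. A hedged SQLAlchemy sketch of that idea, assuming SQLAlchemy 1.4+ and a hypothetical mapped class Record with row_sid, item_sid, and content columns:

from sqlalchemy import and_, or_, func, select

def rows_matching(session, criteria):
    # criteria: list of (item_sid, content_pattern) pairs, e.g. 2-5 of them
    predicates = [
        and_(Record.item_sid == item_sid, Record.content.like(pattern))
        for item_sid, pattern in criteria
    ]
    stmt = (
        select(Record.row_sid)
        .where(or_(*predicates))
        .group_by(Record.row_sid)
        # count distinct item_sids so one record matching twice
        # cannot inflate the count
        .having(func.count(Record.item_sid.distinct()) == len(criteria))
    )
    return session.execute(stmt).scalars().all()

Using count(distinct item_sid) instead of count(row_sid) guards against a row being counted twice if two patterns happen to match the same record.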

Python docxtpl - How to change the size of the column?

I am using docxtpl to generate Word documents. I am inserting multiple tables into a Word document and wondering how I can change the size of a specific column in the code.
Here is the table in my template.docx:
In the generated report, the table looks like the image:
Here is a glance at how I insert the table into the docx:
## Create a pandas DataFrame from the dict
df_bauwerke = pd.DataFrame.from_dict(outDict['Bauwerke']['Schlüsselbauwerke'], orient='index')

## Convert the pandas df to the DocxTemplate table format
table_bauwerke = {
    "bauwerke_col_labels": list(df_bauwerke.columns.values),
    "bauwerke_tbl_contents": [{"cols": rows} for rows in df_bauwerke.values.tolist()],
}
context_to_load.update(table_bauwerke)
I would like to change the widths of the 'Jahr' and 'Name' columns (Name wider, Jahr narrower) while the rest stay as they are. Can I influence that from my script?
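docxtpl wraps a python-docx Document, so one hedged option is to adjust widths after rendering. A minimal sketch, assuming the Bauwerke table is the first table in the document and that 'Jahr' and 'Name' are the first two columns (both indices are assumptions); Word tends to honor widths set on the individual cells rather than on the column object, so the sketch sets them cell by cell:

from docxtpl import DocxTemplate
from docx.shared import Cm

tpl = DocxTemplate("template.docx")
tpl.render(context_to_load)

doc = tpl.get_docx()   # the underlying python-docx Document
table = doc.tables[0]  # assumption: the Bauwerke table comes first

for col_idx, width in [(0, Cm(2)), (1, Cm(8))]:  # 'Jahr' narrow, 'Name' wide
    for cell in table.columns[col_idx].cells:
        cell.width = width

tpl.save("report.docx")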

Pull column names along with data in Teradata Python module

I am running the below snippet in python:
with udaExec.connect(method="ODBC", system=<server IP>, username=<user>, password=<pwd>) as session:
    for row in session.execute("""sel top 3 * from retail.employee"""):
        print(row)
The above query returns data without the column names. How do I pull the column names along with the data from the employee table while using the teradata Python module in Python 3.x?
I would use pandas with teradata to get full control of the data.
import teradata
import pandas as pd

with udaExec.connect(method="ODBC", system=<server IP>, username=<user>, password=<pwd>) as session:
    query = '''sel top 3 * from retail.employee'''
    df = pd.read_sql(query, session)
    print(df.columns.tolist())  # columns
    print(df.head(2))           # the first 2 rows
I've found pandas pretty thick, but useful at times.
But I see the column names are in the cursor description: https://pypi.org/project/teradatasql/#CursorAttributes
The index on that PyPI page isn't working for me, so you'll probably have to scroll down, but you should find the following:
.description
Read-only attribute consisting of a sequence of seven-item sequences that each describe a result set column, available after a SQL request is executed.
.description[Column][0] provides the column name.
.description[Column][1] provides the column type code as an object comparable to one of the Type Objects listed below.
.description[Column][2] provides the column display size in characters. Not implemented yet.
.description[Column][3] provides the column size in bytes.
.description[Column][4] provides the column precision if applicable, or None otherwise.
.description[Column][5] provides the column scale if applicable, or None otherwise.
.description[Column][6] provides the column nullability as True or False.
If you want to replicate pandas' to_dict, you can do the following:

import teradatasql

# conn: dict of connection parameters (host, user, password, ...)
with teradatasql.connect(**conn) as con:
    with con.cursor() as cur:
        cur.execute("sel top 3 * from retail.employee;")
        rows = cur.fetchall()
        columns = [d[0] for d in cur.description]
        # one dict per row, keyed by column name
        list_of_dict = [
            {columns[i]: rows[j][i] for i in range(len(columns))}
            for j in range(len(rows))
        ]
Result:
[
{
"Name":"John Doe",
"SomeOtherEmployeeColumn":"arbitrary data"
}
]
Have you tried:

with udaExec.connect(method="ODBC", system=<server IP>, username=<user>, password=<pwd>) as session:
    for row in session.execute("""sel top 3 * from retail.employee"""):
        print(row.name + ": " + row.val)
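Alternatively, the teradata module follows the Python DB API, so the cursor returned by execute() should itself expose .description, mirroring the teradatasql approach above. A hedged sketch (attribute availability may vary by module version):

with udaExec.connect(method="ODBC", system=<server IP>, username=<user>, password=<pwd>) as session:
    cursor = session.execute("sel top 3 * from retail.employee")
    column_names = [d[0] for d in cursor.description]
    print(column_names)
    for row in cursor:
        # rows from the teradata module can also be indexed by column name
        print({name: row[name] for name in column_names})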

Is the only way to add a column in PyTables to create a new table and copy?

I am searching for a persistent data storage solution that can handle heterogeneous data stored on disk. PyTables seems like an obvious choice, but the only information I can find on how to append new columns is a tutorial example. The tutorial has the user create a new table with the added column, copy the old table into the new table, and finally delete the old table. This seems like a huge pain. Is this how it has to be done?
If so, what are better alternatives for storing mixed data on disk that can accommodate new columns with relative ease? I have looked at sqlite3 as well, and the column options seem rather limited there, too.
Yes, you must create a new table and copy over the original data. This is because Tables are a dense format. That gives them huge performance benefits, but one of the costs is that adding new columns is somewhat expensive.
Thanks to Anthony Scopatz for his answer.
Searching the web and GitHub, I found examples showing how to add columns in PyTables.
The original version, "Example showing how to add a column in PyTables", is difficult to migrate.
A revised version isolates the copying logic, but some of its terms are deprecated and it has a minor error when adding new columns.
Based on their contributions, I updated the code for adding a new column in PyTables (Python 3.6, Windows).
# -*- coding: utf-8 -*-
"""
PyTables: append a column to an existing table
"""
import tables as tb

pth = 'd:/download/'

# Describe a water class
class Water(tb.IsDescription):
    waterbody_name = tb.StringCol(16, pos=1)  # 16-character string
    lati = tb.Int32Col(pos=2)                 # integer
    longi = tb.Int32Col(pos=3)                # integer
    airpressure = tb.Float32Col(pos=4)        # float (single-precision)
    temperature = tb.Float64Col(pos=5)        # double (double-precision)

# Open a file in "w"rite mode
# (if pth is omitted, the file is created next to the code)
fileh = tb.open_file(pth + "myadd-column.h5", mode="w")

# Create a table in the root directory and append data...
tableroot = fileh.create_table(fileh.root, 'root_table', Water,
                               "A table at root", tb.Filters(1))
tableroot.append([("Mediterranean", 10, 0, 10*10, 10**2),
                  ("Mediterranean", 11, -1, 11*11, 11**2),
                  ("Adriatic", 12, -2, 12*12, 12**2)])
print("\nContents of the table in root:\n",
      fileh.root.root_table[:])

# Create a new table in the newgroup group and append several rows
group = fileh.create_group(fileh.root, "newgroup")
table = fileh.create_table(group, 'original_table', Water, "A table", tb.Filters(1))
table.append([("Atlantic", 10, 0, 10*10, 10**2),
              ("Pacific", 11, -1, 11*11, 11**2),
              ("Atlantic", 12, -2, 12*12, 12**2)])
print("\nContents of the original table in newgroup:\n",
      fileh.root.newgroup.original_table[:])

# Close the file
fileh.close()

#%% Open it again in append mode
fileh = tb.open_file(pth + "myadd-column.h5", "a")
group = fileh.root.newgroup
table = group.original_table

# Isolated copying logic
def append_column(table, group, name, column):
    """Return a copy of `table`, with an empty column `name` appended."""
    description = table.description._v_colObjects.copy()
    description[name] = column
    copy = tb.Table(group, table.name + "_copy", description)
    # Copy the user attributes
    table.attrs._f_copy(copy)
    # Fill the rows of the new table with default values
    for i in range(table.nrows):
        copy.row.append()
    # Flush the rows to disk
    copy.flush()
    # Copy the columns of the source table to the destination;
    # iterate over the *source* description so the freshly added
    # column keeps its default values (the earlier version's minor
    # error was relying on a global variable here instead)
    for col in table.description._v_colObjects:
        getattr(copy.cols, col)[:] = getattr(table.cols, col)[:]
    # Optionally remove the original table
    # table.remove()
    return copy

# Append the original data plus the new "hot" column to table2
table2 = append_column(table, group, "hot", tb.BoolCol(dflt=False))

# Fill the new column
table2.cols.hot[:] = [row["temperature"] > 11**2 for row in table]

# Move table2; you can reuse the original table's name here
table2.move('/newgroup', 'new_table')

# Print the new table
print("\nContents of the table with the column added:\n",
      fileh.root.newgroup.new_table[:])

# Finally, close the file
fileh.close()

Populate Unique ID field after Sorting, Python

I am trying to create a new unique ID field in an Access table. I already have one field called SITE_ID_FD, but it is historical. The format of the unique values in that field doesn't match our current format, so I am creating a new field with the new format.
Old format: M001, M002, K003, K004, S005, M006, etc.
New format: 12001, 12002, 12003, 12004, 12005, 12006, etc.
I wrote the following script:
fc = r"Z:\test.gdb\testfc"
x = 12001
cursor = arcpy.UpdateCursor(fc)
for row in cursor:
row.setValue("SITE_ID", x)
cursor.updateRow(row)
x+= 1
This works fine, but it populates the new ID field based on the default sorting by ObjectID. I need to sort two fields first and then populate the new ID field based on that sorting (I want to sort by a field called SITE and then by the old ID field SITE_ID_FD).
I tried manually sorting the two fields in hopes that Python would honor the sort, but it doesn't. I'm not sure how to do this in Python. Can anyone suggest a method?
A possible solution is to specify, when you create your update cursor, the fields by which you wish it to be sorted. This is explained in the documentation: http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//000v0000003m000000
The signature goes like this:
UpdateCursor(dataset, {where_clause}, {spatial_reference}, {fields}, {sort_fields})
You are interested only in the sort_fields argument. Assuming your code works well on a sorted table and you want the table ordered ascending, the second part of your code should look like this:
fc = r"Z:\test.gdb\testfc"
x = 12001
cursor = arcpy.UpdateCursor(fc,"","","","SITE A, SITE_ID_FD A")
#if you want to sort it descending you need to write it with a D
#>> cursor = arcpy.UpdateCursor(fc,"","","","SITE D, SITE_ID_FD D")
for row in cursor:
row.setValue("SITE_ID", x)
cursor.updateRow(row)
x+= 1
i hope this helps
I added a link to the arcpy docs in a comment, but from what I can tell, this will create a new, sorted dataset:
import arcpy
from arcpy import env

env.workspace = r"z:\test.gdb"
arcpy.Sort_management("testfc", "testfc_sort", [["SITE", "ASCENDING"],
                                                ["SITE_ID_FD", "ASCENDING"]])
And this will, on the sorted dataset, do what you want:
fc = r"Z:\test.gdb\testfc_sort"
x = 12001
cursor = arcpy.UpdateCursor(fc)
for row in cursor:
row.setValue("SITE_ID", x)
cursor.updateRow(row)
x+= 1
I'm assuming there's some way to just copy the sorted/modified dataset back over the original, so it's all good?
I'll admit, I don't use arcpy, and the docs could be a lot more explicit.
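For the copy-back step, one hedged option (assuming a file geodatabase with no open locks on the data) is to delete the original feature class and rename the sorted copy into its place:

import arcpy
from arcpy import env

env.workspace = r"z:\test.gdb"
# Replace the original with the sorted copy; both tools are standard
# Data Management tools, but make sure nothing holds a schema lock.
arcpy.Delete_management("testfc")
arcpy.Rename_management("testfc_sort", "testfc")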
