I am using docxtpl to generate Word documents. I am inserting multiple tables into a Word document and wondering how I can change the width of a specific column in the code.
Here is the table in my template.docx:
This is how the table looks in the generated report:
Here is a glance at how I insert the table into the docx:
import pandas as pd

## Create pandas data frame from the dict
df_bauwerke = pd.DataFrame.from_dict(outDict['Bauwerke']['Schlüsselbauwerke'], orient='index')

## Convert pandas df to DocxTemplate table format
table_bauwerke = {
    "bauwerke_col_labels": list(df_bauwerke.columns.values),
    "bauwerke_tbl_contents": [{"cols": rows} for rows in df_bauwerke.values.tolist()]}
context_to_load.update(table_bauwerke)
I would like to change the width of the columns 'Jahr' and 'Name' (Name should be wider and Jahr narrower) and have the rest stay as they are. Can I influence that in my script?
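docxtpl itself only fills in the template, so one approach is to post-process the rendered document with python-docx and set cell widths on the columns you care about. A minimal sketch, assuming the rendered file is called output.docx, the target table is the first table in the document, and 'Jahr' and 'Name' are the first two columns (all of these are assumptions):

from docx import Document
from docx.shared import Cm

doc = Document("output.docx")          # rendered file name is an assumption
table = doc.tables[0]                  # assumes the target table is the first table

# Word tends to honour widths set on individual cells, so set every cell in the column.
for cell in table.columns[0].cells:    # assumed 'Jahr' column
    cell.width = Cm(1.5)               # narrower
for cell in table.columns[1].cells:    # assumed 'Name' column
    cell.width = Cm(6.0)               # wider

doc.save("output.docx")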
I have two Excel files, Table 1 and Table 2, as shown:
Each entry has a unique ID, and I wish to update the Cost column in Table 1 according to the data in Table 2.
How can I update the "Cost" column in Table 1 according to the "Invoice id" in Table 2 automatically?
I have tried the code below, and unfortunately it does not seem to work: no matching occurs, and the Table 1 cost remains as it was. I have checked the cell type of both the invoice ID and cost columns in both files and they are identical (all of them are int).
However, I still can't figure out which part is wrong.
# Import the library we are going to use
import openpyxl
from openpyxl import load_workbook

book1 = 'Book1.xlsx'
book2 = 'Book2.xlsx'

Book1 = openpyxl.load_workbook(book1)
book1_firstsheet = Book1['template']
Book2 = openpyxl.load_workbook(book2)
book2_firstsheet = Book2['template']

for book1_rownumber in range(2, book1_firstsheet.max_row):
    book1_invoice = book1_firstsheet.cell(row=book1_rownumber, column=3).value
    book1_cost = book1_firstsheet.cell(row=book1_rownumber, column=4).value
    for book2_rownumber in range(2, book2_firstsheet.max_row):
        book2_invoice = book2_firstsheet.cell(row=book2_rownumber, column=3).value
        book2_cost = book2_firstsheet.cell(row=book2_rownumber, column=4).value
        if book1_invoice == book2_invoice:
            book1_cost = book2_cost

new_file = "new_workbook.xlsx"
Book1.save(new_file)
Change your last line inside the if condition (book1_cost = book2_cost) to the following:
book1_firstsheet.cell(row=book1_rownumber, column=4).value = book2_firstsheet.cell(row=book2_rownumber, column=4).value
and it should work, in my opinion. The original assignment only rebinds the local Python variable book1_cost; it never writes anything back into the worksheet cell.
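As a side note, a dictionary lookup avoids the nested loop entirely. A minimal sketch under the same assumptions as the question (sheet named 'template', invoice IDs in column 3, costs in column 4):

import openpyxl

Book1 = openpyxl.load_workbook('Book1.xlsx')
Book2 = openpyxl.load_workbook('Book2.xlsx')
ws1 = Book1['template']
ws2 = Book2['template']

# Map invoice id -> cost from Book2 (max_row is inclusive, hence the +1)
costs = {ws2.cell(row=r, column=3).value: ws2.cell(row=r, column=4).value
         for r in range(2, ws2.max_row + 1)}

# Write the matching cost back into Book1
for r in range(2, ws1.max_row + 1):
    invoice = ws1.cell(row=r, column=3).value
    if invoice in costs:
        ws1.cell(row=r, column=4).value = costs[invoice]

Book1.save("new_workbook.xlsx")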
I have two lists: one contains the column names of categorical variables and the other numeric ones, as shown below.
cat_cols = ['stat','zip','turned_off','turned_on']
num_cols = ['acu_m1','acu_cnt_m1','acu_cnt_m2','acu_wifi_m2']
These are column names in a table in Redshift.
I want to pass these as a parameter to pull only the numeric columns from a table in Redshift (PostgreSQL), write that to a CSV, and close the CSV.
Next, I want to pull only cat_cols, open the CSV, append to it, and close it.
My query so far:
# 1. Pull num data:
seg = ['seg1','seg2']
sql_data = str(""" SELECT {num_cols} """ + """FROM public.""" + str(seg) + """ order by random() limit 50000 ;""")
df_data = pd.read_sql(sql_data, cnxn)

# Write to csv.
df_data.to_csv("df_sample.csv", index=False)

# 2. Pull cat data:
sql_data = str(""" SELECT {cat_cols} """ + """FROM public.""" + str(seg) + """ order by random() limit 50000 ;""")
df_data = pd.read_sql(sql_data, cnxn)

# Append to df_sample.csv and close the connection to the csv.
with open("df_sample.csv", 'rw'):
    ## Append to the csv ##
This is the first time I am trying to do selective querying based on Python lists, and hence I am stuck on how to pass the list as column names to select from the table.
Can someone please help me with this?
If you want to build the query as a string, in your case it will be better to use the format method or f-strings (requires Python 3.6+).
An example for your case, using only the built-in str.format method:
seg = ['seg1', 'seg2']
num_cols = ['acu_m1','acu_cnt_m1','acu_cnt_m2','acu_wifi_m2']
query = """
SELECT {} FROM public.{} order by random() limit 50000;
""".format(', '.join(num_cols), seg)
print(query)
If you want to use only one item from the seg list, pass seg[0] or seg[1] to the format call.
I hope this will help you!
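Building on that, here is a hedged f-string sketch of the full flow in the question, assuming an open connection cnxn and querying one segment (seg[0]) at a time; the categorical pull is appended to the same CSV with mode='a':

import pandas as pd

num_cols = ['acu_m1', 'acu_cnt_m1', 'acu_cnt_m2', 'acu_wifi_m2']
cat_cols = ['stat', 'zip', 'turned_off', 'turned_on']
seg = ['seg1', 'seg2']

# 1. Pull the numeric columns and write them to a fresh CSV.
num_query = f"SELECT {', '.join(num_cols)} FROM public.{seg[0]} ORDER BY random() LIMIT 50000;"
pd.read_sql(num_query, cnxn).to_csv("df_sample.csv", index=False)

# 2. Pull the categorical columns and append them to the same CSV.
cat_query = f"SELECT {', '.join(cat_cols)} FROM public.{seg[0]} ORDER BY random() LIMIT 50000;"
pd.read_sql(cat_query, cnxn).to_csv("df_sample.csv", mode='a', index=False)

Note that appending this way stacks the categorical rows below the numeric ones with their own header line; if the two sets of columns should sit side by side instead, it is probably simpler to concatenate the two DataFrames and write the CSV once.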
I am running the below snippet in python:
with udaExec.connect(method="ODBC", system=<server IP>, username=<user>, password=<pwd>) as session:
    for row in session.execute("""sel top 3 * from retail.employee"""):
        print(row)
The above query returns data without the column names. How do I pull the column names along with the data from the employee table while using the teradata Python module in Python 3.x?
I would use pandas and teradata to get full control of the data.
import teradata
import pandas as pd

with udaExec.connect(method="ODBC", system=<server IP>, username=<user>, password=<pwd>) as session:
    query = '''sel top 3 * from retail.employee'''
    df = pd.read_sql(query, session)
    print(df.columns.tolist())  # columns
    print(df.head(2))           # beautiful first 2 rows
I've found pandas pretty heavyweight, but useful at times.
But I see the column names are in the cursor description: https://pypi.org/project/teradatasql/#CursorAttributes
The index on that PyPI page isn't working for me, so you'll probably have to scroll down, but you should find the following:
.description
Read-only attribute consisting of a sequence of seven-item sequences that each describe a result set column, available after a SQL request is executed.
.description[Column][0] provides the column name.
.description[Column][1] provides the column type code as an object comparable to one of the Type Objects listed below.
.description[Column][2] provides the column display size in characters. Not implemented yet.
.description[Column][3] provides the column size in bytes.
.description[Column][4] provides the column precision if applicable, or None otherwise.
.description[Column][5] provides the column scale if applicable, or None otherwise.
.description[Column][6] provides the column nullability as True or False.
If you want to replicate pandas to_dict, you can do the following:
import teradatasql

# conn is a dict of connection parameters (host, user, password, ...)
with teradatasql.connect(**conn) as con:
    with con.cursor() as cur:
        cur.execute("sel top 3 * from retail.employee;")
        rows = cur.fetchall()
        columns = [d[0] for d in cur.description]
        # One dict per fetched row, keyed by column name
        list_of_dict = [{columns[i]: rows[j][i] for i in range(len(columns))} for j in range(len(rows))]
Result:
[
{
"Name":"John Doe",
"SomeOtherEmployeeColumn":"arbitrary data"
}
]
Have you tried:
with udaExec.connect(method="ODBC", system=<server IP>, username=<user>, password=<pwd>) as session:
    for row in session.execute("""sel top 3 * from retail.employee"""):
        print(row.name + ": " + row.val)
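For completeness: session.execute in the teradata module returns a DB API style cursor, so the column names should also be available from its description attribute; a minimal sketch under that assumption:

with udaExec.connect(method="ODBC", system=<server IP>, username=<user>, password=<pwd>) as session:
    cursor = session.execute("sel top 3 * from retail.employee")
    # Standard DB API: each description entry is (name, type_code, ...)
    column_names = [d[0] for d in cursor.description]
    print(column_names)
    for row in cursor:
        print(row)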
I have a table X in BigQuery with 170,000 rows. The values in this table are based on complex calculations done on the values from a table Y. These are done in Python so as to automate the ingestion when Y gets updated.
Every time Y updates, I recompute the values needed for X in my script and insert them using streaming with the function below:
def stream_data(table, json_data):
    data = json.loads(str(json_data))
    # Reload the table to get the schema.
    table.reload()
    rows = [data]
    errors = table.insert_data(rows)
    if not errors:
        print('Loaded 1 row into {}'.format(table))
    else:
        print('Errors:')
The problem here is that I have to delete all rows in the table before I insert. I know a query to do this, but it fails because BigQuery does not allow DML while there is a streaming buffer on the table, and apparently this lasts for up to a day.
Is there a workaround where I can delete all rows in X, recompute based on Y, and then insert the new values using the code above?
Possibly turning the streaming buffer off?
Another option would be to drop the whole table and recreate it. But my table is huge, with 60 columns, and the JSON for the schema would be huge. I couldn't find samples where I can create a new table with a schema passed from JSON or a file; some samples of this would be great (a sketch follows after these options).
A third option is to make the streaming insert smart so that it does an update instead of an insert if the row has changed. This again is a DML operation and goes back to the original problem.
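On the drop-and-recreate option: with the older google-cloud-bigquery client used in these snippets, a schema can be kept in a JSON file as a list of {"name", "type", "mode"} entries and turned into SchemaField objects. A minimal sketch, where the helper name, file name, and field layout are assumptions (newer client versions create tables via client.create_table instead):

import json
from google.cloud import bigquery

def create_table_from_json_schema(client, dataset_name, table_name, schema_path):
    # schema_path points to a JSON file shaped like:
    # [{"name": "col1", "type": "STRING", "mode": "NULLABLE"}, ...]
    with open(schema_path) as fp:
        fields = json.load(fp)
    schema = [bigquery.SchemaField(fld["name"], fld["type"], mode=fld.get("mode", "NULLABLE"))
              for fld in fields]
    dataset = client.dataset(dataset_name)
    table = dataset.table(table_name, schema)   # same dataset.table(name, schema) call as in the snippets here
    table.create()
    return table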
UPDATE:
Another approach I tried is to delete the table and recreate it. Before deleting, I copy the schema so I can set it on the new table:
def stream_data(json_data):
    bigquery_client = bigquery.Client("myproject")
    dataset = bigquery_client.dataset("mydataset")
    table = dataset.table("test")
    data = json.loads(json_data)
    schema = table.schema
    table.delete()
    table = dataset.table("test")
    # Set the table schema
    table = dataset.table("test", schema)
    table.create()
    rows = [data]
    errors = table.insert_data(rows)
    if not errors:
        print('Loaded 1 row')
    else:
        print('Errors:')
This gives me an error:
ValueError: Set either 'view_query' or 'schema'.
UPDATE 2:
The key was to call table.reload() before schema = table.schema; that fixed the error above, because without the reload the local table object has an empty schema.
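Put together, a hedged sketch of the delete-and-recreate flow with that fix applied (project, dataset, and table names taken from the snippet above):

def stream_data(json_data):
    bigquery_client = bigquery.Client("myproject")
    dataset = bigquery_client.dataset("mydataset")
    table = dataset.table("test")
    table.reload()                           # fetch metadata so table.schema is populated
    schema = table.schema
    data = json.loads(json_data)
    table.delete()
    table = dataset.table("test", schema)    # new table object carrying the saved schema
    table.create()
    errors = table.insert_data([data])
    if errors:
        print('Errors: {}'.format(errors))
    else:
        print('Loaded 1 row')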
I am searching for a persistent data storage solution that can handle heterogeneous data stored on disk. PyTables seems like an obvious choice, but the only information I can find on how to append new columns is a tutorial example. The tutorial has the user create a new table with the added column, copy the old table into the new table, and finally delete the old table. This seems like a huge pain. Is this how it has to be done?
If so, what are better alternatives for storing mixed data on disk that can accommodate new columns with relative ease? I have looked at sqlite3 as well and the column options seem rather limited there, too.
Yes, you must create a new table and copy the original data. This is because Tables are a dense format. This gives them a huge performance benefit, but one of the costs is that adding new columns is somewhat expensive.
Thanks to Anthony Scopatz for his answer.
I searched the web, and on GitHub I found examples of how to add columns in PyTables.
Example showing how to add a column in PyTables
The original version, "Example showing how to add a column in PyTables", is somewhat difficult to migrate.
The revised version isolates the copying logic, but some terms are deprecated and it has a minor error when adding new columns.
Based on their contributions, I updated the code for adding a new column in PyTables (Python 3.6, Windows).
# -*- coding: utf-8 -*-
"""
PyTables, append a column
"""
import tables as tb

pth = 'd:/download/'

# Describe a water class
class Water(tb.IsDescription):
    waterbody_name = tb.StringCol(16, pos=1)   # 16-character string
    lati = tb.Int32Col(pos=2)                  # integer
    longi = tb.Int32Col(pos=3)                 # integer
    airpressure = tb.Float32Col(pos=4)         # float (single-precision)
    temperature = tb.Float64Col(pos=5)         # double (double-precision)

# Open a file in "w"rite mode
# if you don't include pth, the file will be created in the same path as the code.
fileh = tb.open_file(pth + "myadd-column.h5", mode="w")

# Create a table in the root directory and append data...
tableroot = fileh.create_table(fileh.root, 'root_table', Water,
                               "A table at root", tb.Filters(1))
tableroot.append([("Mediterranean", 10, 0, 10*10, 10**2),
                  ("Mediterranean", 11, -1, 11*11, 11**2),
                  ("Adriatic", 12, -2, 12*12, 12**2)])
print("\nContents of the table in root:\n",
      fileh.root.root_table[:])

# Create a new table in the newgroup group and append several rows
group = fileh.create_group(fileh.root, "newgroup")
table = fileh.create_table(group, 'orginal_table', Water, "A table", tb.Filters(1))
table.append([("Atlantic", 10, 0, 10*10, 10**2),
              ("Pacific", 11, -1, 11*11, 11**2),
              ("Atlantic", 12, -2, 12*12, 12**2)])
print("\nContents of the original table in newgroup:\n",
      fileh.root.newgroup.orginal_table[:])

# close the file
fileh.close()

#%% Open it again in append mode
fileh = tb.open_file(pth + "myadd-column.h5", "a")
group = fileh.root.newgroup
table = group.orginal_table

# Isolated the copying logic
def append_column(table, group, name, column):
    """Return a copy of `table` with a new `column` named `name` appended (filled with default values)."""
    description = table.description._v_colObjects.copy()
    description[name] = column
    copy = tb.Table(group, table.name + "_copy", description)
    # Copy the user attributes
    table.attrs._f_copy(copy)
    # Fill the rows of the new table with default values
    for i in range(table.nrows):
        copy.row.append()
    # Flush the rows to disk
    copy.flush()
    # Copy the columns of the source table to the destination
    for col in table.description._v_colObjects:
        getattr(copy.cols, col)[:] = getattr(table.cols, col)[:]
    # choose whether to remove the original table
    # table.remove()
    return copy

# Get a description of the table in dictionary format
descr = table.description._v_colObjects
descr2 = descr.copy()

# Add a column to the description
descr2["hot"] = tb.BoolCol(dflt=False)

# append original and added data to table2
table2 = append_column(table, group, "hot", tb.BoolCol(dflt=False))

# Fill the new column
table2.cols.hot[:] = [row["temperature"] > 11**2 for row in table]

# Move table2 to table; you can use the same name as the original one.
table2.move('/newgroup', 'new_table')

# Print the new table
print("\nContents of the table with column added:\n",
      fileh.root.newgroup.new_table[:])

# Finally, close the file
fileh.close()