Fill tables in a Word template with Python (DocxTemplate, Jinja2)

I am trying to fill a table in a Word document from Python with DocxTemplate (docxtpl), and I am having trouble doing it properly. I want to use two dictionaries to fill a single table, shown in the figure below.
[Figure: table to fill]
The two dictionaries are filled in a loop, and I render the template document at the end.
The input used to build the dictionaries is a database extraction written in SQL.
My main issue is filling the table with data that lives in the two different dictionaries.
The code below shows the two dictionaries with example values.
# -*- coding: utf8 -*-
from docxtpl import DocxTemplate

if __name__ == "__main__":
    document = DocxTemplate("template.docx")
    DicoOccuTable = {'`num_carnet_adresses`': '`annuaire_telephonique`\n`carnet_adresses`\n`carnet_adresses_complement`',
                     '`num_eleve`': '`CFA_apprentissage_ctrl_coherence`\n`CFA_apprentissage_ctrl_examen`'}
    DicoChamp = {'`num_carnet_adresses`': 72, '`num_eleve`': 66}
    template_values = {}
    #
    template_values["keys"] = [[{"name": cle, "occu": val} for cle, val in DicoChamp.items()],
                               [{"table": vals} for cles, vals in DicoOccuTable.items()]]
    #
    document.render(template_values)
    # nomTable is set by the loop shown further down
    document.save('output/' + nomTable.replace('`', '') + '.docx')
As a result, the two rows of the table are created, but nothing is written in them.
I should add that I have only been working with Python for one week, so I suspect I am not handling the different objects properly here.
If you have any suggestions, I would appreciate them!
Here is the loop that builds the dictionaries; it may help you understand where my code goes wrong :)
for c in ChampList:
    with open("db_reference.sql", "r") as f:
        listTable = []
        line = f.readlines()
        for l in line:
            if 'CREATE TABLE' in l:
                begin = True
                linecreateTable = l
                x = linecreateTable.split()
                nomTable = x[2]
            elif c in l and begin == True:
                listTable.append(nomTable)
            elif ') ENGINE=MyISAM DEFAULT CHARSET=latin1;' in l:
                begin = False
        nbreOccu=len(listTable)
        Tables = "\n".join(listTable)
        DicoChamp.update({c:nbreOccu})
        DicoOccuTable.update({c:Tables})
        # DicoChamp = {c:nbreOccu}
template_values = {}
Thank you very much!

I finally found a solution to this problem.
Instead of using two dictionaries, I created a single dictionary with this structure:
Dico = { Champ : [Occu, Tables] }
The full code for creating the table is below:
from docxtpl import DocxTemplate

document = DocxTemplate("template.docx")
template_values = {}
Context = {}
for c in ChampList:
    listTable = []
    nbreOccu = 0
    OccuTables = []
    with open("db_reference.sql", "r") as g:
        listTable = []
        ligne = g.readlines()
        for li in ligne:
            if 'CREATE TABLE' in li:
                begin = True
                linecreateTable2 = li
                y = linecreateTable2.split()
                nomTable2 = y[2]
            elif c in li and begin == True:
                listTable.append(nomTable2)
            elif ') ENGINE=MyISAM DEFAULT CHARSET=latin1;' in li:
                begin = False
            elif '/*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;' in li:
                # End of the dump: store [count, table names] for this field
                nbreOccu = len(listTable)
                inter = "\n".join(listTable)
                OccuTables.append(nbreOccu)
                OccuTables.append(inter)
                ChampNumPropre = c.replace('`', '')
                Context.update({ChampNumPropre: OccuTables})
            else:
                continue
template_values["keys"] = [{"label": cle, "cols": val} for cle, val in Context.items()]
#
document.render(template_values)
# nomTable comes from the surrounding script (not shown in this snippet)
document.save('output/' + nomTable.replace('`', '') + '.docx')
I used a template table with the following structure:
[Figure: template table]
I hope you find your answer here. Good luck!
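As a rough illustration of the data structure used in this answer, the sketch below builds the same "keys" list from a small hand-written dictionary, without opening a template. The sample field names and counts are taken from the question; the helper name build_context is hypothetical. The exact template tags used in the Word table are not shown in the post, so how the template consumes "label" and "cols" is an assumption.

```python
def build_context(fields):
    """Turn {field: [count, tables]} into the list-of-rows shape passed to render()."""
    return {"keys": [{"label": name, "cols": cols} for name, cols in fields.items()]}


# Sample data modeled on the question (values are illustrative).
fields = {
    "num_carnet_adresses": [72, "`annuaire_telephonique`\n`carnet_adresses`"],
    "num_eleve": [66, "`CFA_apprentissage_ctrl_coherence`"],
}

context = build_context(fields)
for row in context["keys"]:
    # One template row per field: the label, then one cell per entry in cols.
    print(row["label"], row["cols"][0])
```

Each row is one dictionary, so a single loop in the template can reach both the occurrence count and the table names, which is what the two separate lists in the question could not offer.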

Related

PyLucene cannot find a word that is present in previously indexed text

I use PyLucene 9.4.1 to index a document, and I just noticed a weird problem. Some words, e.g. 'baby', are present in the document, but PyLucene is unable to find them in the index.
This is my code to index the document:
(The document can be downloaded from here.)
filepath = os.getcwd() + '/' + 'wiki_movie_plots_deduped.csv'

def indexDocument(title, year, plot):
    ft = FieldType()
    ft.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS)
    ft.setStored(True)
    ft.setTokenized(True)
    ft.setStoreTermVectors(True)
    ft.setStoreTermVectorOffsets(True)
    ft.setStoreTermVectorPositions(True)
    doc = document.Document()
    doc.add(document.Field("Title", title, ft))
    doc.add(document.Field("Plot", plot, ft))
    writer.addDocument(doc)

def CloseWriter():
    writer.close()

def makeInvertedIndex(file_path):
    df = pd.read_csv(file_path)
    print(df.columns)
    docid = 0
    for i in df.index:
        print(docid, '-', df['Title'][i])
        indexDocument(df['Title'][i], df['Release Year'][i], df['Plot'][i])
        docid += 1

indexPath = File('index/').toPath()
indexDir = FSDirectory.open(indexPath)
writerConfig = IndexWriterConfig(EnglishAnalyzer())
writer = IndexWriter(indexDir, writerConfig)
inverted = makeInvertedIndex(filepath)
CloseWriter()
This is the code to search the created index for a keyword:
keyword = 'baby'
fieldname = 'Title'
result = list()
indexPath = File('index/').toPath()
directory = FSDirectory.open(indexPath)
analyzer = StandardAnalyzer()
reader = DirectoryReader.open(directory)
searcher = IndexSearcher(DirectoryReader.open(directory))
query = QueryParser(fieldname, analyzer).parse(keyword)
print('query', query)
numdocs = searcher.count(query)
print("#-docs:", numdocs)
searcher.setSimilarity(BM25Similarity(1.2,0.75))
scoreDocs = searcher.search(query, 1000).scoreDocs # it returns TopDocs object containing scoreDocs and totalHits
# scoreDoc object contains docId and score
print('total hit:', searcher.search(query, 100).totalHits)
print("%s total matching documents" % (len(scoreDocs)))
Any help to understand the problem is appreciated.
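One thing that stands out in the two snippets is that the index is written with EnglishAnalyzer while the query is parsed with StandardAnalyzer. EnglishAnalyzer applies Porter stemming, which rewrites "baby" to "babi", so an unstemmed query term would not match the stored term. The toy sketch below is not Lucene: it uses a hypothetical one-rule stemmer (toy_stem) and a plain dictionary as the index, only to show how an analyzer mismatch produces zero hits. Using the same analyzer on both sides is probably the first thing to try.

```python
def toy_stem(token):
    """Hypothetical stand-in for a stemmer: only the trailing 'y' -> 'i' rule."""
    if len(token) > 2 and token.endswith("y"):
        return token[:-1] + "i"
    return token


def analyze(text, stem):
    """Lowercase, split on whitespace, and optionally stem each token."""
    tokens = text.lower().split()
    return [toy_stem(t) if stem else t for t in tokens]


def build_index(docs, stem):
    """Map each analyzed term to the set of document ids that contain it."""
    index = {}
    for doc_id, text in enumerate(docs):
        for term in analyze(text, stem):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, keyword, stem):
    """Look up the analyzed query terms in the index."""
    hits = set()
    for term in analyze(keyword, stem):
        hits |= index.get(term, set())
    return hits


docs = ["Baby Driver", "The Boss Baby"]
index = build_index(docs, stem=True)  # index side stems, like EnglishAnalyzer

no_stem_hits = search(index, "baby", stem=False)  # query side does not stem
same_analysis_hits = search(index, "baby", stem=True)  # same analysis on both sides
print(no_stem_hits, same_analysis_hits)
```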

Python Chess Data (FEN) into Stockfish for Python

I am trying to use Stockfish to evaluate a chess position given in FEN notation, all in Python. I am mainly using two libraries: pgnToFen, which I found on GitHub (https://github.com/SindreSvendby/pgnToFen), and the MIT-licensed Stockfish wrapper (https://github.com/zhelyabuzhsky/stockfish). I have hit problem after problem. Stockfish not only fails to analyse this FEN position (3b2k1/1p3pp1/8/3pP1P1/pP3P2/P2pB3/6K1/8 b f3 -), it also loops forever. I thought changing the library's source code would be manageable, so I changed _put(), but I am unable to feed in dummy values because stdin.flush() won't execute once I give it those values. This means I don't think I can even skip to the next row in my dataframe. :( The code I changed is below.
def _put(self, command: str, tmp_time) -> None:
    if not self.stockfish.stdin:
        raise BrokenPipeError()
    self.stockfish.stdin.write(f"{command}\n")
    try:
        self.stockfish.stdin.flush()
    except:
        if command != "quit":
            self.stockfish.stdin.write('isready\n')
            try:
                time.sleep(tmp_time)
                self.stockfish.stdin.flush()
            except:
                #print ('Imma head out', file=sys.stderr)
                raise ValueError('Imma head out...')
            #sys.stderr.close()

def get_evaluation(self) -> dict:
    """Evaluates current position
    Returns:
        A dictionary of the current advantage with "type" as "cp" (centipawns) or "mate" (checkmate in)
    """
    evaluation = dict()
    fen_position = self.get_fen_position()
    if "w" in fen_position:  # w can only be in FEN if it is whites move
        compare = 1
    else:  # stockfish shows advantage relative to current player, convention is to do white positive
        compare = -1
    self._put(f"position {fen_position}", 5)
    self._go()
    x=0
    while True:
        x=x+1
        text = self._read_line()
        #print(text)
        splitted_text = text.split(" ")
        if splitted_text[0] == "info":
            for n in range(len(splitted_text)):
                if splitted_text[n] == "score":
                    evaluation = {
                        "type": splitted_text[n + 1],
                        "value": int(splitted_text[n + 2]) * compare,
                    }
        elif splitted_text[0] == "bestmove":
            return evaluation
        elif x == 500:
            evaluation = {
                "type": 'cp',
                "value": 10000,
            }
            return evaluation
And, last but not least, I changed this line in the __init__ constructor:
self._stockfish_major_version: float = float(self._read_line().split(" ")[1])
The script that imports this code is below; this is where the errors pop up.
import pandas as pd
import re
import nltk
import numpy as np
from stockfish import Stockfish
import os
import sys
sys.path.insert(0, r'C:\Users\path\to\pgntofen')
import pgntofen
#nltk.download('punkt')
#Changed models.py for major version line 39 in stockfish from int to float
stockfish = Stockfish(r"C:\Users\path\to\Stockfish.exe")
file = r'C:\Users\path\to\selenium-pandas output.csv'
chunksize = 10 ** 6
for chunk in pd.read_csv(file, chunksize=chunksize):
    for index, row in chunk.iterrows():
        FullMovesStr = str(row['FullMoves'])
        FullMovesStr = FullMovesStr.replace('+', '')
        if "e.p" in FullMovesStr:
            row.to_csv(r'C:\Users\MyName\Logger.csv', header=None, index=False, mode='a')
            print('Enpassant')
            continue
        tokens = nltk.word_tokenize(FullMovesStr)
        movelist = []
        for tokenit in range(len(tokens)):
            if "." in str(tokens[tokenit]):
                try:
                    tokenstripped = re.sub(r"[0-9]+\.", "", tokens[tokenit])
                    token = [tokenstripped, tokens[tokenit+1]]
                    movelist.append(token)
                except:
                    continue
            else:
                continue
        DFMoves = pd.DataFrame(movelist, columns=[['WhiteMove', 'BlackMove']])
        DFMoves['index'] = row['index']
        DFMoves['Date'] = row['Date']
        DFMoves['White'] = row['White']
        DFMoves['Black'] = row['Black']
        DFMoves['W ELO'] = row['W ELO']
        DFMoves['B ELO'] = row['B ELO']
        DFMoves['Av ELO'] = row['Av ELO']
        DFMoves['Event'] = row['Event']
        DFMoves['Site'] = row['Site']
        DFMoves['ECO'] = row['ECO']
        DFMoves['Opening'] = row['Opening']
        pd.set_option('display.max_rows', DFMoves.shape[0]+1)
        print(DFMoves[['WhiteMove', 'BlackMove']])
        seqmoves = []
        #seqmovesBlack = []
        evalmove = []
        pgnConverter = pgntofen.PgnToFen()
        #stockfish.set_fen_position("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1")
        #rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
        for index, row in DFMoves.iterrows():
            try:
                stockfish.set_fen_position("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1")
            except:
                evalmove.append("?")
                continue
            #stockfish.set_fen_position("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1")
            pgnConverter.resetBoard()
            WhiteMove = str(row['WhiteMove'])
            BlackMove = str(row['BlackMove'])
            if index == 0:
                PGNMoves1 = [WhiteMove]
                seqmoves.append(WhiteMove)
                #seqmoves.append(BlackMove)
            else:
                seqmoves.append(WhiteMove)
                #seqmoves.append(BlackMove)
                PGNMoves1 = seqmoves.copy()
            #print(seqmoves)
            try:
                pgnConverter.pgnToFen(PGNMoves1)
                fen = pgnConverter.getFullFen()
            except:
                break
            try:
                stockfish.set_fen_position(fen)
                print(stockfish.get_board_visual())
                evalpos = stockfish.get_evaluation()
                evalmove.append(evalpos)
            except:
                pass
            try:
                stockfish.set_fen_position("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1")
            except:
                evalmove.append("?")
                continue
            pgnConverter.resetBoard()
            if index == 0:
                PGNMoves2 = [WhiteMove, BlackMove]
                seqmoves.append(BlackMove)
            else:
                seqmoves.append(BlackMove)
                PGNMoves2 = seqmoves.copy()
            try:
                pgnConverter.pgnToFen(PGNMoves2)
                fen = pgnConverter.getFullFen()
            except:
                break
            try:
                stockfish.set_fen_position(fen)
                print(stockfish.get_board_visual())
                evalpos = stockfish.get_evaluation()
                print(evalpos)
                evalmove.append(evalpos)
            except:
                pass
        #DFMoves['EvalWhite'] = evalwhite
        #DFMoves['EvalBlack'] = evalblack
        print(evalmove)
So the detailed question is how to get stockfish.get_evaluation() to skip this FEN position (3b2k1/1p3pp1/8/3pP1P1/pP3P2/P2pB3/6K1/8 b f3 -) or, better yet, how to fix the underlying problem. I have been working on this for quite a while, so any insight would be very much appreciated.
My specs are Windows 10, Python 3.9, an Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz, and 64.0 GB of RAM.
Thanks :)
OK. It seems your FEN is invalid (3b2k1/1p3pp1/8/3pP1P1/pP3P2/P2pB3/6K1/8 b f3 -), so check that first. The python-chess library (https://python-chess.readthedocs.io/en/latest/index.html) lets you work with both FEN and chess engines, which is pretty cool. Here is an example that combines the two:
import chess
import chess.engine
import chess.pgn

pgn = open("your_pgn_file.pgn")
game = chess.pgn.read_game(pgn)
engine = chess.engine.SimpleEngine.popen_uci("your_stockfish_path.exe")
# Iterate through all moves, play them on a board and analyse them.
board = game.board()
for move in game.mainline_moves():
    board.push(move)
    print(engine.analyse(board, chess.engine.Limit(time=0.1))["score"])
# Shut the engine process down once the game has been analysed.
engine.quit()
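A full FEN record has six space-separated fields: piece placement, side to move, castling rights, en passant square, halfmove clock, and fullmove number. The string in the question has only four, and the castling field appears to be the one that is missing. The sketch below is a minimal structural check, assuming you want to detect such rows before handing them to an engine. The function name looks_like_full_fen is hypothetical, and the check does not verify that the position is legal.

```python
import re


def looks_like_full_fen(fen):
    """Structural check only: six fields with plausible contents."""
    fields = fen.split()
    if len(fields) != 6:
        return False
    placement, side, castling, en_passant, halfmove, fullmove = fields
    ranks = placement.split("/")
    if len(ranks) != 8:
        return False
    for rank in ranks:
        if not re.fullmatch(r"[pnbrqkPNBRQK1-8]+", rank):
            return False
        # Digits count empty squares, letters count one piece each.
        width = sum(int(ch) if ch.isdigit() else 1 for ch in rank)
        if width != 8:
            return False
    return (
        side in ("w", "b")
        and re.fullmatch(r"-|[KQkq]{1,4}", castling) is not None
        and re.fullmatch(r"-|[a-h][36]", en_passant) is not None
        and halfmove.isdigit()
        and fullmove.isdigit()
    )


start = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
broken = "3b2k1/1p3pp1/8/3pP1P1/pP3P2/P2pB3/6K1/8 b f3 -"
print(looks_like_full_fen(start), looks_like_full_fen(broken))
```

In the row loop from the question, a position that fails this check could be logged and skipped instead of being passed to get_evaluation().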

How to simplify Python code with a for loop or another approach

I have the following code, which uses the ElementTree and pandas modules in Python:
import xml.etree.ElementTree as ET
import pandas as pd

file_xml = ET.parse('example1.xml')
rootXML = file_xml.getroot()

def transfor_data_atri(rootXML):
    file_xml = ET.parse(rootXML)
    data_XML = [
        {"Name": signal.attrib["Name"],
         # "Value": signal.attrib["Value"]
         "Value": int(signal.attrib["Value"].split(' ')[0])
         } for signal in file_xml.findall(".//Signal")
    ]
    signals_df = pd.DataFrame(data_XML)
    extract_name_value(signals_df)

def extract_name_value(signals_df):
    #print(signals_df)
    signal_ig_st = signals_df[signals_df.Name.isin(["Status"])]
    row_values_ig_st = signal_ig_st.T
    vector_ig_st = row_values_ig_st.iloc[[1]]
    signal_nav_DSP_rq = signals_df[signals_df.Name.isin(["SetDSP"])]
    row_values_nav_DSP_rq = signal_nav_DSP_rq.T
    vector_nav_DSP_rq = row_values_nav_DSP_rq.iloc[[1]]
    signal_HMI_st = signals_df[signals_df.Name.isin(["HMI"])]
    row_values_HMI_st = signal_HMI_st.T
    vector_HMI_st = row_values_HMI_st.iloc[[1]]
    signal_delay_ac = signals_df[signals_df.Name.isin(["Delay"])]
    row_values_delay_ac = signal_delay_ac.T
    vector_delay_ac = row_values_delay_ac.iloc[[1]]
    signal_AutoConfigO_Rear = signals_df[signals_df.Name.isin(["AutoConfigO_Rear"])]
    row_values_AutoConfigO_Rear = signal_AutoConfigO_Rear.T
    vector_AutoConfigO_Rear = row_values_AutoConfigO_Rear.iloc[[1]]
    signal_ACO_Front = signals_df[signals_df.Name.isin(["AutoConfigO_Front"])]
    row_values_ACO_Front = signal_ACO_Front.T
    vector_ACO_Front = row_values_ACO_Front.iloc[[1]]
    signal_ACO_Drvr = signals_df[signals_df.Name.isin(["AutoConfigO_Drvr"])]
    row_values_ACO_Drvr = signal_ACO_Drvr.T
    vector_ACO_Drvr = row_values_ACO_Drvr.iloc[[1]]
    signal_ACO_Allst = signals_df[signals_df.Name.isin(["AutoConfigO_Allst"])]
    row_values_ACO_Allst = signal_ACO_Allst.T
    vector_ACO_Allst = row_values_ACO_Allst.iloc[[1]]
    signal_RURRq_st = signals_df[signals_df.Name.isin(["RUResReqstStat"])]
    row_values_RURRq_st = signal_RURRq_st.T
    vector_RURRq_st = row_values_RURRq_st.iloc[[1]]
    signal_RURqSy_st = signals_df[signals_df.Name.isin(["RUReqstrSystem"])]
    row_values_RURqSy_st = signal_RURqSy_st.T
    vector_RURqSy_st = row_values_RURqSy_st.iloc[[1]]
    signal_RUAudS_st = signals_df[signals_df.Name.isin(["RUSource"])]
    row_values_RUAudS_st = signal_RUAudS_st.T
    vector_RUAudS_st = row_values_RUAudS_st.iloc[[1]]
    signal_DSP = signals_df[signals_df.Name.isin(["DSP"])]
    row_values_DSP = signal_DSP.T
    vector_DSP = row_values_DSP.iloc[[1]]
    print('1: ', vector_ig_st)
    print('2: ', vector_nav_DSP_rq)
    print('3: ', vector_HMI_st)
    print('4: ', vector_delay_ac)
The output of the above is the first four prints, and it is correct because it is what is wanted. However, I have to simplify the code so that it can read any XML file with the same structure as example.xml, not only example1.xml.
The simplified code should return the data for the names listed in the names_list variable below, but without using this variable, which is hard-coded:
names_list = [
'Status', 'SetDSP', 'HMI', 'Delay', 'AutoConfigO_Rear',
'AutoConfigO_Front', 'AutoConfigO_Drvr','AutoConfigO_Allst',
'RUResReqstStat', 'RUReqstrSystem', 'RUSource', 'DSP'
]
That way, when the client supplies another XML file with the same structure but with names that are not in the code, it can be read without problems. Thank you very much in advance.
I hope I'm understanding the question correctly. My understanding is that you want to generate the body of extract_name_value() dynamically so that it is less bulky in your code.
I'm sorry, but I did not understand the "for i in signal_name: print(i)" part of the question. Perhaps you can rephrase it to help me understand?
My solution for the extract_name_value() part would be to use the exec() function, a built-in for dynamic execution.
name_list = ['Status', 'SetDSP', 'HMI', 'Delay', 'AutoConfigO_Rear',
             'AutoConfigO_Front', 'AutoConfigO_Drvr', 'AutoConfigO_Allst',
             'RUResReqstStat', 'RUReqstrSystem', 'RUSource', 'DSP']

def _build_extract_name_value_func(name_list):
    # Build one block of source code per signal name.
    extract_name_value_func = ""
    for name in name_list:
        holder_func = f"""
signal_{name} = signals_df[signals_df.Name.isin(['{name}'])]
row_values_{name} = signal_{name}.T
vector_{name} = row_values_{name}.iloc[[1]]
vector_list.append(vector_{name})
"""
        extract_name_value_func += holder_func
    return extract_name_value_func

def extract_name_value(name_list):
    # signals_df and vector_list must already exist in the scope where exec() runs.
    extract_name_value_func = _build_extract_name_value_func(name_list)
    exec(extract_name_value_func)
The code was not tested with actual data, because I am not familiar with handling XML structures, but I hope the Python part is of some help to you.
I was able to solve it. I used a for loop and iterated over the dataframe itself:
for i in signals_df.Name:
    signal = signals_df[signals_df.Name.isin([i])]
    row_values = signal.T
    vector = row_values.iloc[[1]]
    print(vector)
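As a possible variation on the loop above, the names can also be collected straight from the XML without building a DataFrame first. This is a minimal sketch, assuming each Signal element carries Name and Value attributes as in the question's code. The inline XML sample and the function name signal_values are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with the attribute layout assumed from the question.
SAMPLE_XML = """
<Root>
  <Signal Name="Status" Value="1 (On)"/>
  <Signal Name="SetDSP" Value="0 (Off)"/>
  <Signal Name="Delay" Value="25 ms"/>
</Root>
"""


def signal_values(root):
    """Map every Signal name to the integer at the start of its Value attribute."""
    return {
        signal.attrib["Name"]: int(signal.attrib["Value"].split(" ")[0])
        for signal in root.findall(".//Signal")
    }


root = ET.fromstring(SAMPLE_XML.strip())
for name, value in signal_values(root).items():
    print(name, value)
```

Because the names come from the file itself, no hard-coded list is needed when a file with different signal names is supplied.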

Searching through html in scrapy?

Is it possible to use a for loop to search the text of tags for a certain phrase? I've been trying to write this loop, but it hasn't been working. Any help is appreciated, thanks! Here is my code:
def parse_page(self, response):
    titles2 = response.xpath('//div[@id = "mainColumn"]/h1/text()').extract_first()
    year = response.xpath('//div[@id = "mainColumn"]/h1/span/text()').extract()[0].strip()
    aud = response.xpath('//div[@id="scorePanel"]/div[2]')
    a_score = aud.xpath('./div[1]/a/div/div[2]/div[1]/span/text()').extract()
    a_count = aud.xpath('./div[2]/div[2]/text()').extract()
    c_score = response.xpath('//a[@id = "tomato_meter_link"]/span/span[1]/text()').extract()[0].strip()
    c_count = response.xpath('//div[@id = "scoreStats"]/div[3]/span[2]/text()').extract()[0].strip()
    info = response.xpath('//div[@class="panel-body content_body"]/ul')
    mp_rating = info.xpath('./li[1]/div[2]/text()').extract()[0].strip()
    genre = info.xpath('./li[2]/div[2]/a/text()').extract_first()
    date = info.xpath('./li[5]/div[2]/time/text()').extract_first()
    box = response.xpath('//section[@class = "panel panel-rt panel-box "]/div')
    actor1 = box.xpath('./div/div[1]/div/a/span/text()').extract()
    actor2 = box.xpath('./div/div[2]/div/a/span/text()').extract()
    actor3 = box.xpath('./div/div[3]/div/a/span/text()').extract_first()
    for x in info.xpath('//li'):
        if info.xpath("./li[x]/div[1][contains(text(), 'Box Office: ')/text()]]
            box_office = info.xpath('./li[x]/div[2]/text()')
        else if info.xpath('./li[x]/div[1]/text()').extract[0] == "Runtime: "):
            runtime = info.xpath('./li[x]/div[2]/time/text()')
Your for loop is completely wrong:
1. You're calling xpath() on info but searching from the document root. Use a relative path instead:
    for x in info.xpath('.//li'):
2. x is an HTML node element, and you can use it this way:
        if x.xpath("./div[1][contains(., 'Box Office: ')]"):
            box_office = x.xpath('./div[2]/text()').extract_first()
I think you might need re() or re_first() to match the phrase.
For example:
    elif x.xpath('./div[1]/text()').re_first('Runtime:'):
        runtime = x.xpath('./div[2]/time/text()').extract_first()
You also need to modify your for loop, because the variable x in it is a Selector, not a number, so it is not valid to use it like this: li[x].
gangabass made a good point about this in the other answer.
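To make the "query relative to the current node" idea concrete without a running spider, here is a sketch that uses the standard library's ElementTree on a small, made-up, well-formed snippet. ElementTree's XPath subset is not Scrapy's selector API (it has no contains(), for instance), so the label comparison is done in Python. The markup, labels, and the function name extract_fields are assumptions modeled on the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup shaped like the list in the question.
SAMPLE = """
<ul>
  <li><div>Rating: </div><div>PG-13</div></li>
  <li><div>Box Office: </div><div>$1,234,567</div></li>
  <li><div>Runtime: </div><div><time>120 minutes</time></div></li>
</ul>
"""


def extract_fields(info):
    """Walk each <li> and read values relative to that node, not the root."""
    fields = {}
    for li in info.findall("./li"):
        label = (li.findtext("./div[1]") or "").strip()
        if label.startswith("Box Office"):
            fields["box_office"] = li.findtext("./div[2]").strip()
        elif label.startswith("Runtime"):
            fields["runtime"] = li.findtext("./div[2]/time").strip()
    return fields


info = ET.fromstring(SAMPLE.strip())
print(extract_fields(info))
```

The same shape carries over to Scrapy: loop over the li selectors and call xpath() on each one with a path that starts with "./".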

Index similar entries in Python

I have a column of data (easily imported from Google Docs thanks to gspread) that I'd like to intelligently align. I ingest entries into a dictionary. Input can include email, twitter handle or a blog URL. For example:
mike.j@gmail.com
@mikej45
j.mike@world.eu
_http://tumblr.com/mikej45
Right now, the "dumb" version is:
def NomineeCount(spreadsheet):
    worksheet = spreadsheet.sheet1
    nominees = worksheet.col_values(6) # F = 6
    unique_nominees = {}
    for c in nominees:
        pattern = re.compile(r'\s+')
        c = re.sub(pattern, '', c)
        if unique_nominees.has_key(c) == True: # If we already have the name
            unique_nominees[c] += 1
        else:
            unique_nominees[c] = 1
    # Print out the alphabetical list of nominees with leading vote count
    for w in sorted(unique_nominees.keys()):
        print string.rjust(str(unique_nominees[w]), 2)+ " " + w
    return nominees
What's an efficient(-ish) way to add in some smarts during the if process?
You can try with defaultdict:
from collections import defaultdict
unique_nominees = defaultdict(lambda: 0)
unique_nominees[c] += 1
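Building on the counting idea, one way to add some "smarts" is to normalize each entry before counting it. The sketch below is written for Python 3 and uses collections.Counter. The normalization rules (lowercasing, dropping a URL scheme, dropping a leading "@") are assumptions about what should count as the same nominee, and the function names are hypothetical. It does not try to link an email address to a Twitter handle.

```python
import re
from collections import Counter


def normalize(entry):
    """Collapse an email, Twitter handle, or blog URL to a rough comparison key."""
    key = re.sub(r"\s+", "", entry).lower()
    key = re.sub(r"^_?https?://(www\.)?", "", key)  # drop the URL scheme
    key = key.lstrip("@")  # drop the Twitter '@'
    return key.rstrip("/")


def count_nominees(entries):
    """Count entries by normalized key, skipping blank cells."""
    return Counter(normalize(e) for e in entries if e.strip())


entries = ["@MikeJ45", " @mikej45 ", "mike.j@gmail.com", "Mike.J@gmail.com", ""]
counts = count_nominees(entries)
for name in sorted(counts):
    print(str(counts[name]).rjust(2), name)
```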
