Conversion of text sentences to CoNLL format in Python
I want to convert normal English text into CoNLL-U format for MaltParser, to find dependencies in the text, in Python. I tried this in Java but failed to do so. Below is the format I'm looking for:
String[] tokens = new String[11];
tokens[0] = "1\thappiness\t_\tN\tNN\tDD|SS\t2\tSS";
tokens[1] = "2\tis\t_\tV\tVV\tPS|SM\t0\tROOT";
tokens[2] = "3\tthe\t_\tAB\tAB\tKS\t2\t+A";
tokens[3] = "4\tkey\t_\tPR\tPR\t_\t2\tAA";
tokens[4] = "5\tof\t_\tN\tEN\t_\t7\tDT";
tokens[5] = "6\tsuccess\t_\tP\tTP\tPA\t7\tAT";
tokens[6] = "7\tin\t_\tN\tNN\t_\t4\tPA";
tokens[7] = "8\tthis\t_\tPR\tPR\t_\t7\tET";
tokens[8] = "9\tlife\t_\tR\tRO\t_\t10\tDT";
tokens[9] = "10\tfor\t_\tN\tNN\t_\t8\tPA";
tokens[10] = "11\tsure\t_\tP\tIP\t_\t2\tIP";
I have tried this in Java, but I cannot use the Stanford APIs; I want the same in Python.
// This is the Java example, but here the tokens need to be produced by code, not written manually:
MaltParserService service = new MaltParserService(true);
// Build the token array in the CoNLL data format
String[] tokens = new String[11];
// ... tokens[0] through tokens[10] filled with the same CoNLL lines shown above ...
// Print out the string array
for (int i = 0; i < tokens.length; i++) {
    System.out.println(tokens[i]);
}
// Reads the data format specification file
DataFormatSpecification dataFormatSpecification = service.readDataFormatSpecification(args[0]);
// Use the data format specification file to build a dependency structure based on the string array
DependencyStructure graph = service.toDependencyStructure(tokens, dataFormatSpecification);
// Print the dependency structure
System.out.println(graph);
There is now a port of the Stanford library to Python (with improvements) called Stanza. You can find it here: https://stanfordnlp.github.io/stanza/
Example of usage:
>>> import stanza
>>> stanza.download('en') # download English model
>>> nlp = stanza.Pipeline('en') # initialize English neural pipeline
>>> doc = nlp("Barack Obama was born in Hawaii.") # run annotation over a sentence
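From there you can get CoNLL-U out of the annotated document. A minimal sketch, assuming a recent Stanza release (the helper lives in stanza.utils.conll; older versions exposed slightly different names):

from stanza.utils.conll import CoNLL

# Write the annotated document to a CoNLL-U file
CoNLL.write_doc2conll(doc, "sentence.conllu")

# Or assemble the tab-separated token lines yourself, mirroring the Java array
for sent in doc.sentences:
    for word in sent.words:
        print("\t".join(str(v) for v in (
            word.id, word.text, word.lemma, word.upos,
            word.xpos, word.feats or "_", word.head, word.deprel)))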
Related
How to read inline styles from wxPython
I'm trying to put text into a RichTextCtrl and then, after the user has made edits, I want to get the edited text back out along with the styles. It's the second part I'm having trouble with. Of all the methods to get styles out of the buffer, none of them are really user-friendly. The best I've come up with is to walk through the text a character at a time with GetStyleForRange(range, style). There has got to be a better way to do this! Here's my code now, which walks through gathering a list of text segments and styles. Please give me a better way to do this. I have to be missing something.

buffer: wx.richtext.RichTextBuffer = self.rtc.GetBuffer()
end = len(buffer.GetText())

# Variables for text/style reading loop
ch: str
curStyle: str
i: int = 0
style = wx.richtext.RichTextAttr()
text: List[str] = []
textItems: List[Tuple[str, str]] = []

# Read the style of the first character
self.rtc.GetStyleForRange(wx.richtext.RichTextRange(i, i + 1), style)
curStyle = self.describeStyle(style)

# Loop until we hit the end. Use a while loop so we can control the index increment.
while i < end + 1:
    # Read the current character and its style as `ch` and `newStyle`
    ch = buffer.GetTextForRange(wx.richtext.RichTextRange(i, i))
    self.rtc.GetStyleForRange(wx.richtext.RichTextRange(i, i + 1), style)
    newStyle = self.describeStyle(style)

    # If the style has changed, we flush the collected text and start a new collection
    if text and newStyle != curStyle and ch != '\n':
        newText = "".join(text)
        textItems.append((newText, curStyle))
        text = []
        self.rtc.GetStyleForRange(wx.richtext.RichTextRange(i + 1, i + 2), style)
        curStyle = self.describeStyle(style)
    # Otherwise, collect the character and continue
    else:
        i += 1
        text.append(ch)

# Capture the last text being collected
newText = "".join(text)
textItems.append((newText, newStyle))
Here's a C++ version of the solution I mentioned in the comment above. It's a simple tree walk using a queue, so I think it should be translatable to python easily.

const wxRichTextBuffer& buffer = m_richText1->GetBuffer();
std::deque<const wxRichTextObject*> objects;

objects.push_front(&buffer);

while ( !objects.empty() )
{
    const wxRichTextObject* curObject = objects.front();
    objects.pop_front();

    if ( !curObject->IsComposite() )
    {
        wxRichTextRange range = curObject->GetRange();
        const wxRichTextAttr& attr = curObject->GetAttributes();

        // Do something with range and attr here.
    }
    else
    {
        // This is a composite object. Add its children to the queue.
        // The children are added in reverse order to do a depth first walk.
        const wxRichTextCompositeObject* curComposite =
            static_cast<const wxRichTextCompositeObject*>(curObject);
        size_t childCount = curComposite->GetChildCount();

        for ( int i = childCount - 1; i >= 0; --i )
        {
            objects.push_front(curComposite->GetChild(i));
        }
    }
}
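For reference, here is a rough Python translation of that walk. It's a sketch assuming wxPython Phoenix, where wx.richtext.RichTextObject exposes the same IsComposite/GetChildCount/GetChild/GetRange/GetAttributes methods as the C++ API:

from collections import deque

import wx.richtext

def walk_buffer(rtc: "wx.richtext.RichTextCtrl"):
    # Depth-first walk over the rich-text object tree
    objects = deque([rtc.GetBuffer()])
    while objects:
        cur = objects.popleft()
        if not cur.IsComposite():
            rng = cur.GetRange()
            attr = cur.GetAttributes()
            # Do something with rng and attr here.
            yield rng, attr
        else:
            # Composite object: push children in reverse order
            # so the walk stays depth-first.
            for i in reversed(range(cur.GetChildCount())):
                objects.appendleft(cur.GetChild(i))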
How can I save the headers and values in an HTML <script> as a table in a CSV file?
I'm new to writing code. Using Selenium and BeautifulSoup, I managed to reach the script I want among dozens of scripts on the web page. I am looking for scripts[17]. When this code is executed, scripts[17] gives the result shown below. The last part of my code:

html = driver.page_source
soup = BeautifulSoup(html, "html.parser")
scripts = soup.find_all("script")
x = scripts[17]
print(x)

Result/output (note: the list of dates comes first in scripts[17]; slide the bar):

<script language="JavaScript"> var theHlp='/yardim/matris.asp';var theTitle = 'Piyasa Değeri';var theCaption='Cam (TL)';var lastmod = '';var h='<a class=hisselink href=../Hisse/HisseAnaliz.aspx?HNO=';var e='<a class=hisselink href=../endeks/endeksAnaliz.aspx?HNO=';var d='<center><font face=symbol size=1 color=#FF0000><b>ß</b></font></center>';var u='<center><font face=symbol size=1 color=#008000><b>İ</b></font></center>';var n='<center><font face=symbol size=1 color=#00A000><b>=</b></font></center>';var fr='<font color=#FF0000>';var fg='<font color=#008000>';var theFooter=new Array();var theCols = new Array();theCols[0] = new Array('Hisse',4,50);theCols[1] = new Array('2012.12',1,60);theCols[2] = new Array('2013.03',1,60);theCols[3] = new Array('2013.06',1,60);theCols[4] = new Array('2013.09',1,60);theCols[5] = new Array('2013.12',1,60);theCols[6] = new Array('2014.03',1,60);theCols[7] = new Array('2014.06',1,60);theCols[8] = new Array('2014.09',1,60);theCols[9] = new Array('2014.12',1,60);theCols[10] = new Array('2015.03',1,60);theCols[11] = new Array('2015.06',1,60);theCols[12] = new Array('2015.09',1,60);theCols[13] = new Array('2015.12',1,60);theCols[14] = new Array('2016.03',1,60);theCols[15] = new Array('2016.06',1,60);theCols[16] = new Array('2016.09',1,60);theCols[17] = new Array('2016.12',1,60);theCols[18] = new Array('2017.03',1,60);theCols[19] = new Array('2017.06',1,60);theCols[20] = new Array('2017.09',1,60);theCols[21] = new Array('2017.12',1,60);theCols[22] = new Array('2018.03',1,60);theCols[23] = new Array('2018.06',1,60);theCols[24] = new Array('2018.09',1,60);theCols[25] = new Array('2018.12',1,60);theCols[26] = new Array('2019.03',1,60);theCols[27] = new Array('2019.06',1,60);theCols[28] = new Array('2019.09',1,60);theCols[29] = new Array('2019.12',1,60);theCols[30] = new Array('2020.03',1,60);var theRows = new Array();
theRows[0] = new Array ('<b>'+h+'30>ANA</B></a>','1,114,919,783.60','1,142,792,778.19','1,091,028,645.38','991,850,000.48','796,800,000.38','697,200,000.34','751,150,000.36','723,720,000.33','888,000,000.40','790,320,000.36','883,560,000.40','927,960,000.42','737,040,000.33','879,120,000.40','914,640,000.41','927,960,000.42','1,172,160,000.53','1,416,360,000.64','1,589,520,000.72','1,552,500,000.41','1,972,500,000.53','2,520,000,000.67','2,160,000,000.58','2,475,000,000.66','2,010,000,000.54','2,250,000,000.60','2,077,500,000.55','2,332,500,000.62','3,270,000,000.87','2,347,500,000.63');
theRows[1] = new Array ('<b>'+h+'89>DEN</B></a>','55,200,000.00','55,920,000.00','45,960,000.00','42,600,000.00','35,760,000.00','39,600,000.00','40,200,000.00','47,700,000.00','50,460,000.00','45,300,000.00','41,760,000.00','59,340,000.00','66,600,000.00','97,020,000.00','81,060,000.00','69,300,000.00','79,800,000.00','68,400,000.00','66,900,000.00','66,960,000.00','71,220,000.00','71,520,000.00','71,880,000.00','60,600,000.00','69,120,000.00','62,640,000.00','57,180,000.00','89,850,000.00','125,100,000.00','85,350,000.00');
theRows[2] = new Array ('<b>'+h+'269>SIS</B></a>','4,425,000,000.00','4,695,000,000.00','4,050,000,000.00','4,367,380,000.00','4,273,120,000.00','3,644,720,000.00','4,681,580,000.00','4,913,000,000.00','6,188,000,000.00','5,457,000,000.00','6,137,000,000.00','5,453,000,000.00','6,061,000,000.00','6,954,000,000.00','6,745,000,000.00','6,519,000,000.00','7,851,500,000.00','8,548,500,000.00','9,430,000,000.00','9,225,000,000.00','10,575,000,000.00','11,610,000,000.00','9,517,500,000.00','13,140,000,000.00','12,757,500,000.00','13,117,500,000.00','11,677,500,000.00','10,507,500,000.00','11,857,500,000.00','9,315,000,000.00');
theRows[3] = new Array ('<b>'+h+'297>TRK</B></a>','1,692,579,200.00','1,983,924,800.00','1,831,315,200.00','1,704,000,000.00','1,803,400,000.00','1,498,100,000.00','1,803,400,000.00','1,884,450,000.00','2,542,160,000.00','2,180,050,000.00','2,069,200,000.00','1,682,600,000.00','1,619,950,000.00','1,852,650,000.00','2,040,600,000.00','2,315,700,000.00','2,641,200,000.00','2,938,800,000.00','3,599,100,000.00','4,101,900,000.00','5,220,600,000.00','5,808,200,000.00','4,689,500,000.00','5,375,000,000.00','3,787,500,000.00','4,150,000,000.00','3,662,500,000.00','3,712,500,000.00','4,375,000,000.00','3,587,500,000.00');
var thetable=new mytable();thetable.tableWidth=650;thetable.shownum=false;thetable.controlaccess=true;thetable.visCols=new Array(true,true,true,true,true);thetable.initsort=new Array(0,-1);thetable.inittable();thetable.refreshTable();</script>

My purpose is to extract this output into a table and save it as a CSV file. How can I extract this script the way I want? All dates should be on top, all names should be on the far left (the first column), and all values should be in between:

Hisse  2012.12           2013.03           2013.06           ...
ANA    1,114,919,783.60  1,142,792,778.19  1,091,028,645.38  ...
DEN    55,200,000.00     55,920,000.00     45,960,000.00     ...
...
Solution

The custom function process_scripts() will produce what you are looking for. I am using the dummy data given below (at the end). First we check that the code does what is expected, and so we create a pandas dataframe to see the output. You could also open this Colab Jupyter Notebook and run it on the cloud for free; that lets you skip any installation or setup and simply focus on examining the solution itself.

1. Processing A Single Script

## Define CSV file-output folder-path
OUTPUT_PATH = './output'

## Process scripts
dfs = process_scripts(scripts = [s],
                      output_path = OUTPUT_PATH,
                      save_to_csv = False,
                      verbose = 0)
print(dfs[0].reset_index(drop=True))

Output:

  Name           2012.12  ...            2019.12           2020.03
0  ANA  1,114,919,783.60  ...   3,270,000,000.87  2,347,500,000.63
1  DEN     55,200,000.00  ...     125,100,000.00     85,350,000.00
2  SIS  4,425,000,000.00  ...  11,857,500,000.00  9,315,000,000.00
3  TRK  1,692,579,200.00  ...   4,375,000,000.00  3,587,500,000.00

[4 rows x 31 columns]

2. Processing All the Scripts

You can process all your scripts using the custom function process_scripts(). The code is given below.

## Define CSV file-output folder-path
OUTPUT_PATH = './output'

## Process scripts
dfs = process_scripts(scripts,
                      output_path = OUTPUT_PATH,
                      save_to_csv = True,
                      verbose = 0)

## To clear the output dir-contents
#!rm -f $OUTPUT_PATH/*

I did this on Google Colab and it worked as expected.

3. Making Paths in an OS-agnostic Manner

Making paths for Windows or unix-based systems can be very different. The following shows a method to achieve that without having to worry about which OS the code will run on. I have used the os library here; however, I would suggest you look at the pathlib library as well.

# Define relative path for output-folder
OUTPUT_PATH = './output'

# Dynamically define absolute path
pwd = os.getcwd() # present-working-directory
OUTPUT_PATH = os.path.join(pwd, os.path.abspath(OUTPUT_PATH))

4. Code: Custom Function process_scripts()

Here we use the re (regular expression) library, along with pandas for organizing the data in a tabular format and then writing to a CSV file. The tqdm library is used to give you a nice progress bar while processing multiple scripts. Please see the comments in the code to know what to do if you are not running it from a Jupyter notebook. The os library is used for path manipulation and creation of the output directory.

#pip install -U pandas
#pip install tqdm

import pandas as pd
import re # regex
import os
from tqdm.notebook import tqdm
# Use the following line if not using a jupyter notebook
# from tqdm import tqdm

def process_scripts(scripts,
                    output_path='./output',
                    save_to_csv: bool=False,
                    verbose: int=0):
    """Process all scripts and return a list of dataframes and
    optionally write each dataframe to a CSV file.

    Parameters
    ----------
    scripts: list of scripts
    output_path (str): output-folder-path for csv files
    save_to_csv (bool): default is False
    verbose (int): prints output for verbose>0

    Example
    -------
    OUTPUT_PATH = './output'
    dfs = process_scripts(scripts,
                          output_path = OUTPUT_PATH,
                          save_to_csv = True,
                          verbose = 0)
    ## To clear the output dir-contents
    #!rm -f $OUTPUT_PATH/*
    """
    ## Define regex patterns and compile for speed
    pat_header = re.compile(r"theCols\[\d+\] = new Array\s*\([\'](\d{4}\.\d{1,2})[\'],\d+,\d+\)")
    pat_line = re.compile(r"theRows\[\d+\] = new Array\s*\((.*)\).*")
    pat_code = re.compile("([A-Z]{3})")

    # Calculate zfill-digits
    zfill_digits = len(str(len(scripts)))
    print(f'Total scripts: {len(scripts)}')

    # Create output_path
    if not os.path.exists(output_path):
        os.makedirs(output_path)

    # Define a list of dataframes:
    # an accumulator of all scripts
    dfs = []

    ## If you do not have tqdm installed, uncomment the
    #  next line and comment out the following line.
    #for script_num, script in enumerate(scripts):
    for script_num, script in enumerate(tqdm(scripts, desc='Scripts Processed')):
        ## Extract: Headers, Rows
        #  Rows: code (Name: single column), line-data (multi-column)
        headers = script.strip().split('\n', 0)[0]
        headers = ['Name'] + re.findall(pat_header, headers)
        lines = re.findall(pat_line, script)
        codes = [re.findall(pat_code, line)[0] for line in lines]

        # Clean data for each row
        lines_data = dict()
        for line, code in zip(lines, codes):
            line_data = line.replace("','", "|").split('|')
            line_data[-1] = line_data[-1].replace("'", "")
            line_data[0] = code
            lines_data.update({code: line_data.copy()})

        if verbose > 0:
            print('{}: {}'.format(script_num, codes))

        ## Load data into a pandas-dataframe
        #  and write to csv.
        df = pd.DataFrame(lines_data).T
        df.columns = headers
        dfs.append(df.copy()) # update list

        # Write to CSV
        if save_to_csv:
            num_label = str(script_num).zfill(zfill_digits)
            script_file_csv = f'Script_{num_label}.csv'
            script_path = os.path.join(output_path, script_file_csv)
            df.to_csv(script_path, index=False)

    return dfs

5. Dummy Data
## Dummy Data
s = """
<script language="JavaScript"> var theHlp='/yardim/matris.asp';var theTitle = 'Piyasa Değeri';var theCaption='Cam (TL)';var lastmod = '';var h='<a class=hisselink href=../Hisse/HisseAnaliz.aspx?HNO=';var e='<a class=hisselink href=../endeks/endeksAnaliz.aspx?HNO=';var d='<center><font face=symbol size=1 color=#FF0000><b>ß</b></font></center>';var u='<center><font face=symbol size=1 color=#008000><b>İ</b></font></center>';var n='<center><font face=symbol size=1 color=#00A000><b>=</b></font></center>';var fr='<font color=#FF0000>';var fg='<font color=#008000>';var theFooter=new Array();var theCols = new Array();theCols[0] = new Array('Hisse',4,50);theCols[1] = new Array('2012.12',1,60);theCols[2] = new Array('2013.03',1,60);theCols[3] = new Array('2013.06',1,60);theCols[4] = new Array('2013.09',1,60);theCols[5] = new Array('2013.12',1,60);theCols[6] = new Array('2014.03',1,60);theCols[7] = new Array('2014.06',1,60);theCols[8] = new Array('2014.09',1,60);theCols[9] = new Array('2014.12',1,60);theCols[10] = new Array('2015.03',1,60);theCols[11] = new Array('2015.06',1,60);theCols[12] = new Array('2015.09',1,60);theCols[13] = new Array('2015.12',1,60);theCols[14] = new Array('2016.03',1,60);theCols[15] = new Array('2016.06',1,60);theCols[16] = new Array('2016.09',1,60);theCols[17] = new Array('2016.12',1,60);theCols[18] = new Array('2017.03',1,60);theCols[19] = new Array('2017.06',1,60);theCols[20] = new Array('2017.09',1,60);theCols[21] = new Array('2017.12',1,60);theCols[22] = new Array('2018.03',1,60);theCols[23] = new Array('2018.06',1,60);theCols[24] = new Array('2018.09',1,60);theCols[25] = new Array('2018.12',1,60);theCols[26] = new Array('2019.03',1,60);theCols[27] = new Array('2019.06',1,60);theCols[28] = new Array('2019.09',1,60);theCols[29] = new Array('2019.12',1,60);theCols[30] = new Array('2020.03',1,60);var theRows = new Array();
theRows[0] = new Array ('<b>'+h+'30>ANA</B></a>','1,114,919,783.60','1,142,792,778.19','1,091,028,645.38','991,850,000.48','796,800,000.38','697,200,000.34','751,150,000.36','723,720,000.33','888,000,000.40','790,320,000.36','883,560,000.40','927,960,000.42','737,040,000.33','879,120,000.40','914,640,000.41','927,960,000.42','1,172,160,000.53','1,416,360,000.64','1,589,520,000.72','1,552,500,000.41','1,972,500,000.53','2,520,000,000.67','2,160,000,000.58','2,475,000,000.66','2,010,000,000.54','2,250,000,000.60','2,077,500,000.55','2,332,500,000.62','3,270,000,000.87','2,347,500,000.63');
theRows[1] = new Array ('<b>'+h+'89>DEN</B></a>','55,200,000.00','55,920,000.00','45,960,000.00','42,600,000.00','35,760,000.00','39,600,000.00','40,200,000.00','47,700,000.00','50,460,000.00','45,300,000.00','41,760,000.00','59,340,000.00','66,600,000.00','97,020,000.00','81,060,000.00','69,300,000.00','79,800,000.00','68,400,000.00','66,900,000.00','66,960,000.00','71,220,000.00','71,520,000.00','71,880,000.00','60,600,000.00','69,120,000.00','62,640,000.00','57,180,000.00','89,850,000.00','125,100,000.00','85,350,000.00');
theRows[2] = new Array ('<b>'+h+'269>SIS</B></a>','4,425,000,000.00','4,695,000,000.00','4,050,000,000.00','4,367,380,000.00','4,273,120,000.00','3,644,720,000.00','4,681,580,000.00','4,913,000,000.00','6,188,000,000.00','5,457,000,000.00','6,137,000,000.00','5,453,000,000.00','6,061,000,000.00','6,954,000,000.00','6,745,000,000.00','6,519,000,000.00','7,851,500,000.00','8,548,500,000.00','9,430,000,000.00','9,225,000,000.00','10,575,000,000.00','11,610,000,000.00','9,517,500,000.00','13,140,000,000.00','12,757,500,000.00','13,117,500,000.00','11,677,500,000.00','10,507,500,000.00','11,857,500,000.00','9,315,000,000.00');
theRows[3] = new Array ('<b>'+h+'297>TRK</B></a>','1,692,579,200.00','1,983,924,800.00','1,831,315,200.00','1,704,000,000.00','1,803,400,000.00','1,498,100,000.00','1,803,400,000.00','1,884,450,000.00','2,542,160,000.00','2,180,050,000.00','2,069,200,000.00','1,682,600,000.00','1,619,950,000.00','1,852,650,000.00','2,040,600,000.00','2,315,700,000.00','2,641,200,000.00','2,938,800,000.00','3,599,100,000.00','4,101,900,000.00','5,220,600,000.00','5,808,200,000.00','4,689,500,000.00','5,375,000,000.00','3,787,500,000.00','4,150,000,000.00','3,662,500,000.00','3,712,500,000.00','4,375,000,000.00','3,587,500,000.00');
var thetable=new mytable();thetable.tableWidth=650;thetable.shownum=false;thetable.controlaccess=true;thetable.visCols=new Array(true,true,true,true,true);thetable.initsort=new Array(0,-1);thetable.inittable();thetable.refreshTable();</script>
"""

## Make a dummy list of scripts
scripts = [s for _ in range(10)]
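To spot-check one of the generated files afterwards (with ten scripts, zfill pads the counter to two digits, so the first file comes out as Script_00.csv):

import pandas as pd

df = pd.read_csv('./output/Script_00.csv')
print(df.head())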
According to the <script> provided in your question, you can do something like the code below to get a list of values for each name (ANA, DEN, ...), where aaa holds the script text as a string:

for _ in range(1, len(aaa.split("<b>'")) - 1):
    s = aaa.split("<b>'")[_].split("'")
    print(_)
    lst = []
    for i in s:
        if "</B>" in i:
            name = i.split('>')[1].split("<")[0]
            print("{} = ".format(name), end="")
        if any(j.isdigit() for j in i) and ',' in i:
            lst.append(i)
    print(lst)

It's just example code, so it's not that beautiful :) Hope this will help you.
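If you go this route, you can accumulate the (name, values) pairs and write them out with the csv module. A sketch; here `dates` and `rows` stand in for the headers and row pairs you would collect from the loop above, shown with example values from the question:

import csv

# Assumptions: `dates` holds the column headers pulled from theCols,
# `rows` holds (name, values) pairs collected inside the loop above.
dates = ['2012.12', '2013.03', '2013.06']
rows = [('ANA', ['1,114,919,783.60', '1,142,792,778.19', '1,091,028,645.38'])]

with open("values.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Hisse"] + dates)  # header row: dates on top
    for name, values in rows:
        writer.writerow([name] + values)  # name first, values after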
Transcribe .docx files via python-docx to modify font and font size. Need to reconstruct paragraphs in target files
The intention is to transcribe .docx files to have a modified font and font size while keeping the run attributes such as bold, underline, italic etc. I'll then add some headers and graphics to the newly created target.docx files. How do I reconstruct paragraphs from runs? Each run currently gets its own separate line!

from docx import Document
from docx.shared import Pt

def main(filename):
    try:
        src_doc = Document(filename)
        trg_doc = Document()
        style = trg_doc.styles['Normal']
        font = style.font
        font.name = 'Times'
        font.size = Pt(11)

        for p_cnt in range(len(src_doc.paragraphs)):
            for r_cnt in range(len(src_doc.paragraphs[p_cnt].runs)):
                curr_run = src_doc.paragraphs[p_cnt].runs[r_cnt]
                print('Run: ', curr_run.text)
                paragraph = trg_doc.add_paragraph()
                if curr_run.bold:
                    paragraph.add_run(curr_run.text).bold = True
                elif curr_run.italic:
                    paragraph.add_run(curr_run.text).italic = True
                elif curr_run.underline:
                    paragraph.add_run(curr_run.text).underline = True
                else:
                    paragraph.add_run(curr_run.text)

        trg_doc.save('../Output/the_target.docx')
    except IOError:
        print('There was an error opening the file')

if __name__ == '__main__':
    main("../Input/Current_File.docx")

Input:

1.0 PURPOSE
The purpose of this procedure is to ensure all feedback is logged, documented and any resulting complaints are received, evaluated, and reviewed in accordance with 21 CFR Part 820 and ISO 13485

Output:

PURPOSE
The purpose of this procedure is to ensure all feedback is logged, documented and any resulting complaints are received, evaluated, and reviewed in accordance with 21 CFR P
art 820 and ISO 13485
.
You're adding a new paragraph for each run. Your core loop needs to look more like this:

for src_paragraph in src_doc.paragraphs:
    tgt_paragraph = tgt_doc.add_paragraph()
    for src_run in src_paragraph.runs:
        print('Run: ', src_run.text)
        tgt_run = tgt_paragraph.add_run(src_run.text)
        if src_run.bold:
            tgt_run.bold = True
        if src_run.italic:
            tgt_run.italic = True
        if src_run.underline:
            tgt_run.underline = True
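Put together with the font setup from the question, the whole transcription might look like this (a sketch; the file paths are the ones from the question):

from docx import Document
from docx.shared import Pt

src_doc = Document("../Input/Current_File.docx")
tgt_doc = Document()

# Set the base font once on the Normal style
style = tgt_doc.styles['Normal']
style.font.name = 'Times'
style.font.size = Pt(11)

# One target paragraph per source paragraph; runs stay inside it
for src_paragraph in src_doc.paragraphs:
    tgt_paragraph = tgt_doc.add_paragraph()
    for src_run in src_paragraph.runs:
        tgt_run = tgt_paragraph.add_run(src_run.text)
        # Copy the tri-state run attributes (None means "inherit from style")
        if src_run.bold:
            tgt_run.bold = True
        if src_run.italic:
            tgt_run.italic = True
        if src_run.underline:
            tgt_run.underline = True

tgt_doc.save('../Output/the_target.docx')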
Replaced

for p_cnt in range(len(src_doc.paragraphs)):
    for r_cnt in range(len(src_doc.paragraphs[p_cnt].runs)):
        curr_run = src_doc.paragraphs[p_cnt].runs[r_cnt]

where the construction of runs occurs. I now use a construction similar to that suggested by scanny, so each run no longer becomes a paragraph.

src_doc = docx.Document(path)
trgt_doc = docx.api.Document()

# Generate new target file from source file
for src_paragraph in src_doc.paragraphs:
    src_paragraph_format = src_paragraph.paragraph_format

    # Get target section(s) for headers/footers
    sections = trgt_doc.sections
    section = sections[0]
    sectPr = section._sectPr
    footer = section.footer
    paragraph = footer.paragraphs[0]

    trgt_paragraph = trgt_doc.add_paragraph()
    trgt_paragraph_format = trgt_paragraph.paragraph_format
    trgt_paragraph.style.name = src_paragraph.style.name
    trgt_paragraph_format.left_indent = src_paragraph_format.left_indent
    trgt_paragraph_format.right_indent = src_paragraph_format.right_indent
    trgt_paragraph_format.space_before = Pt(2)
    trgt_paragraph_format.space_after = Pt(2)
    font = trgt_paragraph.style.font
    font.name = 'Times'
    font.size = Pt(11)

    # Transcribe source file runs
    for src_run in src_paragraph.runs:
        trgt_run = trgt_paragraph.add_run(src_run.text)
        trgt_paragraph_format = trgt_paragraph.paragraph_format
        if src_run.font.highlight_color == WD_COLOR_INDEX.BRIGHT_GREEN:
            trgt_run.font.highlight_color = WD_COLOR_INDEX.BRIGHT_GREEN
        if src_run.bold:
            trgt_run.bold = True
        if src_run.italic:
            trgt_run.italic = True
        if src_run.underline:
            trgt_run.underline = True
How to replace part of a variable name in a template file
I am trying to write a file from a sample template file. I need to replace ONLY $UPPERINTERFACE with interface.

This is the sample template.txt:

localparam $UPPERINTERFACE_WDTH = 1;
localparam $UPPERINTERFACE_DPTH = 8;
localparam $UPPERINTERFACE_WTCHD = 2;

This is the code:

from string import Template

intf = "interface"
rdfh = open("template.txt", "r").readlines()
wrfh = open("myfile.txt", "w")
for line in rdfh:
    s = Template(line)
    s = s.substitute(UPPERINTERFACE=intf.upper())
    wrfh.write(s)
wrfh.close()

Expected output:

localparam interface_WDTH = 1;
localparam interface_DPTH = 8;
localparam interface_WTCHD = 2;

As it is taking $UPPERINTERFACE_WDTH as the variable to be replaced, I am getting the following error:

KeyError: 'UPPERINTERFACE_WDTH'

Is there any way I can replace only $UPPERINTERFACE with interface here?
You can use curly braces {} to narrow down the template key, as in the following template string:

>>> line = 'localparam ${UPPERINTERFACE}_WDTH = 1;'
>>> Template(line).substitute(UPPERINTERFACE=intf.upper())
'localparam INTERFACE_WDTH = 1;'

The documentation states the following:

${identifier} is equivalent to $identifier. It is required when valid identifier characters follow the placeholder but are not part of the placeholder, such as "${noun}ification".
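So after editing template.txt to use ${UPPERINTERFACE}_WDTH (and likewise for the other placeholders), the original loop works unchanged. A compact sketch, with the file names from the question:

from string import Template

intf = "interface"

with open("template.txt") as src, open("myfile.txt", "w") as dst:
    for line in src:
        # The braces mark where the placeholder name ends, so _WDTH
        # is no longer swallowed into the identifier.
        dst.write(Template(line).substitute(UPPERINTERFACE=intf.upper()))

Note that intf.upper() produces INTERFACE; drop the .upper() if you want the lowercase interface shown in your expected output.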
Retrieve User Entry IDs from MAPI
I extended the win32comext MAPI with the interface IExchangeModifyTable to edit ACLs via MAPI. I can modify existing ACL entries, but I am stuck adding new entries. I need the user's entry ID to add one, according to this C example (example source: MSDN):

STDMETHODIMP AddUserPermission(
    LPSTR szUserAlias,
    LPMAPISESSION lpSession,
    LPEXCHANGEMODIFYTABLE lpExchModTbl,
    ACLRIGHTS frights)
{
    HRESULT hr = S_OK;
    LPADRBOOK lpAdrBook;
    ULONG cbEid;
    LPENTRYID lpEid = NULL;
    SPropValue prop[2] = {0};
    ROWLIST rowList = {0};
    char szExName[MAX_PATH];

    // Replace with "/o=OrganizationName/ou=SiteName/cn=Recipients/cn="
    char* szServerDN = "/o=org/ou=site/cn=Recipients/cn=";
    strcpy(szExName, szServerDN);
    strcat(szExName, szUserAlias);

    // Open the address book.
    hr = lpSession->OpenAddressBook(0, 0, MAPI_ACCESS_MODIFY, &lpAdrBook);
    if ( FAILED( hr ) ) goto cleanup;

    // Obtain the entry ID for the recipient.
    hr = HrCreateDirEntryIdEx(lpAdrBook, szExName, &cbEid, &lpEid);
    if ( FAILED( hr ) ) goto cleanup;

    prop[0].ulPropTag = PR_MEMBER_ENTRYID;
    prop[0].Value.bin.cb = cbEid;
    prop[0].Value.bin.lpb = (BYTE*)lpEid;
    prop[1].ulPropTag = PR_MEMBER_RIGHTS;
    prop[1].Value.l = frights;
    rowList.cEntries = 1;
    rowList.aEntries->ulRowFlags = ROW_ADD;
    rowList.aEntries->cValues = 2;
    rowList.aEntries->rgPropVals = &prop[0];

    hr = lpExchModTbl->ModifyTable(0, &rowList);
    if ( FAILED( hr ) ) goto cleanup;
    printf("Added user permission. \n");

cleanup:
    if (lpAdrBook) lpAdrBook->Release();
    return hr;
}

I can open the address book, but HrCreateDirEntryIdEx is not provided in the pywin32 MAPI. I found it in the Exchange extension, which does not compile on my system (the missing-library problem). Do you have any idea how to retrieve the user's entry ID? Thanks, Patrick
I got this piece of code and it works fine:

from binascii import b2a_hex, a2b_hex
import active_directory as ad

# entry_type, see http://msdn.microsoft.com/en-us/library/cc840018.aspx
# + AB_DT_CONTAINER 0x000000100
# + AB_DT_TEMPLATE  0x000000101
# + AB_DT_OOUSER    0x000000102
# + AB_DT_SEARCH    0x000000200
# ab_flags, maybe see here: https://svn.openchange.org/openchange/trunk/libmapi/mapidefs.h
def gen_exchange_entry_id(user_id, ab_flags=0, entry_type=0):
    muidEMSAB = "DCA740C8C042101AB4B908002B2FE182"
    version = 1

    # Find user and bail out if it's not there
    ad_obj = ad.find_user(user_id)
    if not ad_obj:
        return None

    return "%08X%s%08X%08X%s00" % (
        ab_flags,
        muidEMSAB,
        version,
        entry_type,
        b2a_hex(ad_obj.legacyExchangeDN.upper()).upper(),
    )

data = gen_exchange_entry_id("myusername")
print data
print len(a2b_hex(data))
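To sanity-check an ID produced this way, you can reverse the packing, since the format string above fixes each field's width. A sketch:

from binascii import a2b_hex

def decode_exchange_entry_id(data):
    # Mirrors "%08X%s%08X%08X%s00" above: 8 hex digits of flags,
    # the 32-digit muidEMSAB GUID, version, type, then the hex DN.
    ab_flags   = int(data[0:8], 16)
    muid       = data[8:40]
    version    = int(data[40:48], 16)
    entry_type = int(data[48:56], 16)
    dn         = a2b_hex(data[56:-2])  # drop the trailing "00" terminator
    return ab_flags, muid, version, entry_type, dn

# Example: fields of the ID generated above
# print(decode_exchange_entry_id(data))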