Python - looking for a faster way to extract substrings from string

I have a long string from which I've extracted the substrings I want, but I'm looking for a method that uses fewer lines of code to get the same output. I'm after all the substrings that start with CN=, dropping everything else up to the semicolon.
Example list output (see picture).
The script I'm currently using is below:
import re
import fnmatch
import os
# System call
os.system("")
# Class of different styles
class style():
    BLACK = '\033[30m'
    RED = '\033[31m'
    GREEN = '\033[32m'
    YELLOW = '\033[33m'
    BLUE = '\033[34m'
    MAGENTA = '\033[35m'
    CYAN = '\033[36m'
    WHITE = '\033[37m'
    UNDERLINE = '\033[4m'
    RESET = '\033[0m'
CNString = "CN=User2,OU=blurb,OU=Test,DC=Test,DC=Testal;CN=User4,OU=blurb,OU=Test,DC=Test,DC=Testal;CN=User56,OU=blurb,OU=Test,DC=Test,DC=Testal;CN=User9,OU=blurb,OU=Test,DC=Test,DC=Testal;CN=Jane45 user,OU=blurb,OU=Test,DC=Test,DC=Testal;CN=User-Donna,OU=blurb,OU=Test,DC=Test,DC=Testal;CN=User76 smith,OU=blurb,OU=Test4,DC=Test,DC=Testal;CN=Pink Panther,OU=blurb,OU=Test,DC=Testing,DC=Testal;CN=Testuser78,OU=blurb,OU=Tester,DC=Test,DC=Testal;CN=great Scott,OU=blurb,OU=Test,DC=Test,DC=Local;CN=Leah Human,OU=blurb,OU=Test,DC=Test,DC=Testal;CN=Alan Desai,OU=blurb,OU=Test,DC=Test,DC=Testal;CN=Duff Beer,OU=Groups,OU=Test,DC=Test,DC=Testal;CN=Jane Doe,OU=Users,OU=Test76,DC=Test,DC=Testal;CN=simple user67,OU=Users,OU=Test,DC=Test,DC=Testal;CN=test O'Lord,OU=Users,OU=Test,DC=Concero,DC=Testal"
newstring1 = CNString.replace(';','];')
print(newstring1)
newstring2 = newstring1.replace(',OU=',',[OU=')
print(newstring2)
newstring3 = newstring2.replace(',[OU','],[OU')
print(newstring3)
newstring4 = newstring3.replace('],[OU',',[OU')
print(newstring4)
newstring5 = newstring4.replace('];',']];')
print(newstring5)
endstring = "]]"
newstring6 = newstring5 + endstring
print(newstring6)
newstring7 = re.sub(r"\[.*?\]","()",newstring6)  # raw string avoids invalid-escape warnings
print(newstring7)
print(style.YELLOW + "Line Break")
newstring8 = newstring7.replace(',()]','')
print(style.RESET + newstring8)
newstring9 = newstring8.split(';')
for cnname in newstring9:
    print(style.GREEN + cnname)

Not sure why your code is juggling with those square brackets. Wouldn't this do it?
names = re.findall(r"\bCN=[^,;]*", CNString)

cn_list = [elem.split(",")[0] for elem in CNString.split(";") if elem.startswith("CN=")]
If I print cn_list I obtain:
['CN=User2', 'CN=User4', 'CN=User56', 'CN=User9', 'CN=Jane45 user', 'CN=User-Donna', 'CN=User76 smith', 'CN=Pink Panther', 'CN=Testuser78', 'CN=great Scott', 'CN=Leah Human', 'CN=Alan Desai', 'CN=Duff Beer', 'CN=Jane Doe', 'CN=simple user67', "CN=test O'Lord"]
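For comparison, here is a minimal sketch that runs both suggestions side by side on a shortened sample. The name cn_string and the three-record sample are illustrative stand-ins for the question's CNString, not part of the original post.

```python
import re

# Shortened sample in the same shape as the question's CNString (illustrative).
cn_string = (
    "CN=User2,OU=blurb,OU=Test,DC=Test,DC=Testal;"
    "CN=Jane45 user,OU=blurb,OU=Test,DC=Test,DC=Testal;"
    "CN=test O'Lord,OU=Users,OU=Test,DC=Concero,DC=Testal"
)

# Regex approach: "CN=" followed by everything up to the next comma or semicolon.
by_regex = re.findall(r"\bCN=[^,;]*", cn_string)

# Split approach: one record per ';', keep the first ','-separated field.
by_split = [rec.split(",")[0] for rec in cn_string.split(";") if rec.startswith("CN=")]

print(by_regex)
print(by_split)
```

On input shaped like this the two lists are identical. They would differ only if a CN= component appeared somewhere other than the start of a record: the regex would pick it up, the split version would not.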

Related

Shadowed text without knowingly doing it using python-pptx. How to remove it?

I am generating a pptx slide with textboxes using python-pptx module. I am able to generate four textboxes with some text formatted to my liking. However, the text is always formatted with shadowing, where I do not want any. I have been searching the documentation but I cannot seem to find any reference of shadow formatting. The code I have been using is below.
import pptx
from datetime import datetime
import pandas as pd
#Colour Definition
black = (0, 0, 0)
Purple = (91, 43, 130)
#Presentation Information
pptTitle = 'Title'
pptSubtitle = 'Subtitle'
author = 'Name'
date = datetime.now().strftime('%B %d, %Y')
prs = pptx.Presentation()
slide = prs.slides.add_slide(prs.slide_layouts[6])
prs.slide_width, prs.slide_height = pptx.util.Cm(33.867), pptx.util.Cm(19.05)
titleFormat_df = pd.DataFrame.from_dict({
2: [pptTitle, pptx.util.Pt(36), Purple, pptx.enum.text.MSO_ANCHOR.BOTTOM, pptx.util.Cm(1.27), pptx.util.Cm(3.19), pptx.util.Cm(31.33), pptx.util.Cm(5.08)],
3: [pptSubtitle, pptx.util.Pt(28), black, pptx.enum.text.MSO_ANCHOR.TOP, pptx.util.Cm(1.27), pptx.util.Cm(8.29), pptx.util.Cm(31.33), pptx.util.Cm(2.75)],
4: [author, pptx.util.Pt(24), black, pptx.enum.text.MSO_ANCHOR.BOTTOM, pptx.util.Cm(1.27), pptx.util.Cm(11.06), pptx.util.Cm(17.59), pptx.util.Cm(2.03)],
5: [date, pptx.util.Pt(20), black, pptx.enum.text.MSO_ANCHOR.TOP, pptx.util.Cm(1.27), pptx.util.Cm(13.12), pptx.util.Cm(17.59), pptx.util.Cm(1.1)]
})
titleFormat_df.index = ['Text', 'Font Size', 'Colour', 'Vert Anchor', 'left', 'top', 'width', 'height']
for sid in titleFormat_df:
    sh = slide.shapes.add_shape(pptx.enum.shapes.MSO_SHAPE.RECTANGLE, titleFormat_df[sid]['left'], titleFormat_df[sid]['top'], titleFormat_df[sid]['width'], titleFormat_df[sid]['height'])
    sh.fill.background()
    sh.line.fill.background()
    tf = sh.text_frame
    tf.clear()
    tf.vertical_anchor = titleFormat_df[sid]['Vert Anchor']
    tf.paragraphs[0].alignment = pptx.enum.text.PP_ALIGN.LEFT
    run = tf.paragraphs[0].add_run()
    run.text = titleFormat_df[sid]['Text']
    run.font.name = 'Arial'
    run.font.size = titleFormat_df[sid]['Font Size']
    c = titleFormat_df[sid]['Colour']
    run.font.color.rgb = pptx.dml.color.RGBColor(c[0], c[1], c[2])
prs.save('test.pptx')
How can I remove this formatting option? And how can I add it back when I do want it?
Thanks in advance!

Pyparsing nested transformString

I had something working for a little while to transform a tag from Lua to HTML, but recently I hit a special case where those tags can be nested. Here is a quick sample from my code:
from pyparsing import Literal, Word, Suppress, SkipTo, LineEnd, hexnums
text = "|c71d5FFFFI'm saying something in color|cFFFFFFFF then in white |r|r"
def colorize (t):
    hexRGB = "".join(list(t.hex)[:6])
    return "<span style=\"color:#{};\">{}</span>".format(hexRGB, t.content)
vbar = Literal("|")
eol = LineEnd().suppress()
endTag = ((vbar + (Literal("r")|Literal("R"))|eol))
parser = (
    Suppress(vbar + (Literal("c")|Literal("C"))) +
    Word(hexnums, exact=8).setResultsName("hex") +
    SkipTo(endTag).setResultsName("content") +
    Suppress(endTag)
).addParseAction(colorize)
result = parser.transformString(text)
print (result)
I saw another similar question, Pyparsing: nested Markdown emphasis, but my problem is a bit different: sometimes there is no closing tag and the end of the line acts as one.

You can add a while loop to iterate over result until all the colors are found:
from pyparsing import Literal, Word, Suppress, SkipTo, LineEnd, hexnums
text = "|c71d5FFFFI'm saying something in color|cFFFFFFFF then in white |r|r"
def colorize (t):
    hexRGB = "".join(list(t.hex)[:6])
    return "<span style=\"color:#{};\">{}</span>".format(hexRGB, t.content)
vbar = Literal("|")
eol = LineEnd().suppress()
endTag = ((vbar + (Literal("r")|Literal("R"))|eol))
parser = (
    Suppress(vbar + (Literal("c")|Literal("C"))) +
    Word(hexnums, exact=8).setResultsName("hex") +
    SkipTo(endTag).setResultsName("content") +
    Suppress(endTag)
).addParseAction(colorize)
result = parser.transformString(text)
new_result = parser.transformString(result)
# Re-apply the transform until another pass changes nothing
while(result != new_result):
    result = new_result
    new_result = parser.transformString(result)
print (result)
when text = "|c71d5FFFFI'm saying something in color|cFFFFFFFF then in white |r|r":
output:
<span style="color:#71d5FF;">I'm saying something in color<span style="color:#FFFFFF;"> then in white</span></span>
when text = "|c71d5FFFFI'm saying something in color"
output:
<span style="color:#71d5FF;">I'm saying something in color</span>
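The loop in this answer is a "repeat until nothing changes" pattern, and it does not depend on pyparsing. As a rough illustration, the sketch below swaps in a plain regular expression for the grammar; the names TAG, colorize_once and colorize_all are made up for this example, and the regex is only an approximation of the parser above (it keeps whitespace that pyparsing would skip).

```python
import re

# Hypothetical regex stand-in for the pyparsing grammar:
# "|c" + 8 hex digits, lazily matched content, then "|r" or end of line.
TAG = re.compile(r"\|[cC]([0-9a-fA-F]{8})(.*?)(?:\|[rR]|$)", re.MULTILINE)


def colorize_once(text):
    """One pass: wrap each match in a <span>, using the first 6 hex digits."""
    def repl(m):
        return '<span style="color:#{};">{}</span>'.format(m.group(1)[:6], m.group(2))
    return TAG.sub(repl, text)


def colorize_all(text):
    """Repeat the pass until the string stops changing (a fixed point)."""
    result = colorize_once(text)
    while True:
        new_result = colorize_once(result)
        if new_result == result:
            return result
        result = new_result


print(colorize_all("|c71d5FFFFouter|cFFFFFFFFinner|r|r"))
```

Each pass consumes at least one opening tag, so the loop ends once no "|c" tags remain.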

Color Text Only Partially Working [duplicate]

When I input something like:
print('\x1b[6;30;42m' + 'Success!' + '\x1b[0m')
the output I get has no color.
I've heard you only need to enable VT100 emulation on Windows, but when I search for how to do that, I haven't found any answers.
All answers are much appreciated!

print("\033[1;42m Success! \033[0m")
The color codes are listed below:
none = "\033[0m"
black = "\033[0;30m"
dark_gray = "\033[1;30m"
blue = "\033[0;34m"
light_blue = "\033[1;34m"
green = "\033[0;32m"
light_green = "\033[1;32m"
cyan = "\033[0;36m"
light_cyan = "\033[1;36m"
red = "\033[0;31m"
light_red = "\033[1;31m"
purple = "\033[0;35m"
light_purple = "\033[1;35m"
brown = "\033[0;33m"
yellow = "\033[1;33m"
light_gray = "\033[0;37m"
white = "\033[1;37m"
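As a small sketch of how such a table might be used, the helper below wraps a string in one of the codes and resets afterwards. CODES and colorize are hypothetical names for this example. The os.system("") call is an assumption: it is the same call the first script on this page makes, and it is a commonly reported workaround for getting recent Windows consoles to interpret ANSI sequences, not a documented API.

```python
import os

RESET = "\033[0m"
# Subset of the table above, stored without the escape prefix and trailing "m".
CODES = {"red": "0;31", "green": "0;32", "yellow": "1;33"}


def colorize(text, color):
    """Wrap text in the escape sequence for `color`, then reset attributes."""
    return "\033[" + CODES[color] + "m" + text + RESET


if os.name == "nt":
    # Assumption: an empty shell call is widely reported to switch the
    # Windows 10+ console into ANSI mode; behaviour may vary by terminal.
    os.system("")

print(colorize("Success!", "green"))
```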

Python docx add_paragraph() inserts leading newline

I'm able to use a paragraph object to select font size, color, bold, etc. within a table cell. But, add_paragraph() seems to always insert a leading \n into the cell and this messes up the formatting on some tables.
If I just assign to cell.text (for example cell.text = ''), it doesn't insert this newline, but then I can't control the text attributes.
Is there a way to eliminate this leading newline?
Here is my function:
from docx.enum.dml import MSO_COLOR_TYPE
from docx.enum.text import WD_ALIGN_PARAGRAPH
from docx.shared import Inches, Pt, RGBColor

def add_table_cell(table, row, col, text, fontSize=8, r=0, g=0, b=0, width=-1):
    cell = table.cell(row,col)
    if (width!=-1):
        cell.width = Inches(width)
    para = cell.add_paragraph(style=None)
    para.alignment = WD_ALIGN_PARAGRAPH.LEFT
    run = para.add_run(text)
    run.bold = False
    run.font.size = Pt(fontSize)
    run.font.color.type == MSO_COLOR_TYPE.RGB  # comparison only, has no effect
    run.font.color.rgb = RGBColor(r, g, b)

I tried the following and it worked for me. Not sure if it is the best approach:
cells[0].text = 'Some text' #Write the text to the cell
#Modify the paragraph alignment, first paragraph
cells[0].paragraphs[0].paragraph_format.alignment=WD_ALIGN_PARAGRAPH.CENTER

The solution I found is to use the text attribute instead of add_paragraph(), and then use add_run():
row_cells[0].text = ''
row_cells[0].paragraphs[0].add_run('Total').bold = True
row_cells[0].paragraphs[0].paragraph_format.alignment = WD_ALIGN_PARAGRAPH.RIGHT

I've looked through the documentation for the cell, and add_paragraph() is not the problem. The problem is that a cell, by default, already has a paragraph inside it.
class docx.table._Cell:
paragraphs: ... By default, a new cell contains a single paragraph. Read-only
Therefore, if you want to add paragraphs starting on the first line of the cell, you should delete the default paragraph first. Since python-docx doesn't have paragraph.delete(), you can use the function mentioned in this GitHub issue: feature: Paragraph.delete()
def delete_paragraph(paragraph):
    p = paragraph._element
    p.getparent().remove(p)
    p._p = p._element = None
Therefore, you should do something like:
cell = table.cell(0,0)
paragraph = cell.paragraphs[0]
delete_paragraph(paragraph)
paragraph = cell.add_paragraph('text you want to add', style='style you want')
Update (10/8/2022)
Sorry, the above approach is rather unnecessary.
It's much more intuitive to edit the default paragraph than to delete it and add a new one.
For the function add_table_cell, just replace para = cell.add_paragraph(style=None) with para = cell.paragraphs[0]
followed by para.style = None; the para.style = None line is not really necessary, as it should be the default value for a new paragraph.

Here is what worked for me. I don't call add_paragraph(). I just reference the first paragraph with this call -> para = cell.paragraphs[0]. Everything else after that is the usual api calls.
table = doc.add_table( rows=1, cols=3 ) # bar codes
for tableRow in table.rows:
    for cell in tableRow.cells:
        para = cell.paragraphs[0]
        run = para.add_run( "*" + specIDStr + "*" )
        font = run.font
        font.name = 'Free 3 of 9'
        font.size = Pt( 20 )
        run = para.add_run( "\n" + specIDStr
            + "\n" + firstName + " " + lastName
            + "\tDOB: " + dob )
        font = run.font
        font.name = 'Arial'
        font.size = Pt( 8 )

colour terminal text effects with Python

I'm trying to implement colour cycling on my text in Python, i.e. I want it to cycle through the colours for every character typed (amongst other effects). My progress so far has been hacked together from an ANSI colour recipe; improvement suggestions are welcome.
I was also vaguely aware of, but have never used, termcolor, colorama and curses.
During the hack I managed to make the attributes stop working (i.e. reverse, blink, etc.), and it's not perfect, probably mainly because I don't understand these lines properly:
cmd.append(format % (colours[tmpword]+fgoffset))
c=format % attrs[tmpword] if tmpword in attrs else None
If anyone can clarify that a bit, I would appreciate it. This runs and does something, but it's not quite there. I changed the code so that instead of having to separate colour commands from your string, you can include them.
#!/usr/bin/env python
'''
"arg" is a string or None
if "arg" is None : the terminal is reset to his default values.
if "arg" is a string it must contain "sep" separated values.
if args are found in globals "attrs" or "colors", or start with "#" \
they are interpreted as ANSI commands else they are output as text.
#* commands:
#x;y : go to xy
# : go to 1;1
## : clear screen and go to 1;1
#[colour] : set foreground colour
^[colour] : set background colour
examples:
echo('#red') : set red as the foreground color
echo('#red ^blue') : red on blue
echo('#red #blink') : blinking red
echo() : restore terminal default values
echo('#reverse') : swap default colors
echo('^cyan #blue reverse') : blue on cyan <=> echo('blue cyan)
echo('#red #reverse') : a way to set up the background only
echo('#red #reverse #blink') : you can specify any combinaison of \
attributes in any order with or without colors
echo('#blink Python') : output a blinking 'Python'
echo('## hello') : clear the screen and print 'hello' at 1;1
colours:
{'blue': 4, 'grey': 0, 'yellow': 3, 'green': 2, 'cyan': 6, 'magenta': 5, 'white': 7, 'red': 1}
'''
'''
Set ANSI Terminal Color and Attributes.
'''
from sys import stdout
import random
import sys
import time
esc = '%s['%chr(27)
reset = '%s0m'%esc
format = '1;%dm'
fgoffset, bgoffset = 30, 40
for k, v in dict(
    attrs = 'none bold faint italic underline blink fast reverse concealed',
    colours = 'grey red green yellow blue magenta cyan white'
    ).items(): globals()[k]=dict((s,i) for i,s in enumerate(v.split()))
bpoints = ( " [*] ", " [!] ", )
def echo(arg=None, sep=' ', end='\n', rndcase=True, txtspeed=0.03, bnum=0):
    cmd, txt = [reset], []
    if arg:
        if bnum != 0:
            sys.stdout.write(bpoints[bnum-1])
        # split the line up into 'sep' seperated values - arglist
        arglist=arg.split(sep)
        # cycle through arglist - word seperated list
        for word in arglist:
            if word.startswith('#'):
                ### First check for a colour command next if deals with position ###
                # go through each fg and bg colour
                tmpword = word[1:]
                if tmpword in colours:
                    cmd.append(format % (colours[tmpword]+fgoffset))
                    c=format % attrs[tmpword] if tmpword in attrs else None
                    if c and c not in cmd:
                        cmd.append(c)
                    stdout.write(esc.join(cmd))
                    continue
                # positioning (starts with #)
                word=word[1:]
                if word=='#':
                    cmd.append('2J')
                    cmd.append('H')
                    stdout.write(esc.join(cmd))
                    continue
                else:
                    cmd.append('%sH'%word)
                    stdout.write(esc.join(cmd))
                    continue
            if word.startswith('^'):
                ### First check for a colour command next if deals with position ###
                # go through each fg and bg colour
                tmpword = word[1:]
                if tmpword in colours:
                    cmd.append(format % (colours[tmpword]+bgoffset))
                    c=format % attrs[tmpword] if tmpword in attrs else None
                    if c and c not in cmd:
                        cmd.append(c)
                    stdout.write(esc.join(cmd))
                    continue
            else:
                for x in word:
                    if rndcase:
                        # thankyou mark!
                        if random.randint(0,1):
                            x = x.upper()
                        else:
                            x = x.lower()
                    stdout.write(x)
                    stdout.flush()
                    time.sleep(txtspeed)
                stdout.write(' ')
                time.sleep(txtspeed)
    if txt and end: txt[-1]+=end
    stdout.write(esc.join(cmd)+sep.join(txt))
if __name__ == '__main__':
    echo('##') # clear screen
    #echo('#reverse') # attrs are ahem not working
    print 'default colors at 1;1 on a cleared screen'
    echo('#red hello this is red')
    echo('#blue this is blue #red i can ^blue change #yellow blah #cyan the colours in ^default the text string')
    print
    echo()
    echo('default')
    echo('#cyan ^blue cyan blue')
    print
    echo()
    echo('#cyan this text has a bullet point',bnum=1)
    print
    echo('#yellow this yellow text has another bullet point',bnum=2)
    print
    echo('#blue this blue text has a bullet point and no random case',bnum=1,rndcase=False)
    print
    echo('#red this red text has no bullet point, no random case and no typing effect',txtspeed=0,bnum=0,rndcase=False)
    # echo('#blue ^cyan blue cyan')
    #echo('#red #reverse red reverse')
    # echo('yellow red yellow on red 1')
    # echo('yellow,red,yellow on red 2', sep=',')
    # print 'yellow on red 3'
    # for bg in colours:
    #     echo(bg.title().center(8), sep='.', end='')
    #     for fg in colours:
    #         att=[fg, bg]
    #         if fg==bg: att.append('blink')
    #         att.append(fg.center(8))
    #         echo(','.join(att), sep=',', end='')
    #for att in attrs:
    #    echo('%s,%s' % (att, att.title().center(10)), sep=',', end='')
    # print
    from time import sleep, strftime, gmtime
    colist='#grey #blue #cyan #white #cyan #blue'.split()
    while True:
        try:
            for c in colist:
                sleep(.1)
                echo('%s #28;33 hit ctrl-c to quit' % c,txtspeed=0)
                echo('%s #29;33 hit ctrl-c to quit' % c,rndcase=False,txtspeed=0)
                #echo('#yellow #6;66 %s' % strftime('%H:%M:%S', gmtime()))
        except KeyboardInterrupt:
            break
        except:
            raise
    echo('#10;1')
    print
I should also mention that I have absolutely no idea what this line does :) Well, I can see that it puts the colours into a dictionary object, but how it does it is confusing. I'm not used to this Python syntax yet.
for k, v in dict(
    attrs = 'none bold faint italic underline blink fast reverse concealed',
    colours = 'grey red green yellow blue magenta cyan white'
    ).items(): globals()[k]=dict((s,i) for i,s in enumerate(v.split()))

This is rather convoluted code, but sticking to your question about these lines:
cmd.append(format % (colours[tmpword]+fgoffset))
This expression appends to the list named cmd the result of interpolating the string held in the variable format with the value of the expression (colours[tmpword]+fgoffset), which adds the numeric code stored in the colour table (colours) under the key tmpword to fgoffset.
The format string contains '1;%dm', which means it expects an integer that will replace the "%d" inside it (Python's % string substitution is inherited from C's printf formatting). Your colours table, on the other hand, is built in a convoluted way that I'd recommend in no code, setting the entries directly in globals(), but let's assume it holds the correct numeric value for each colour. In that case, adding it to fgoffset (30) gives the standard ANSI foreground codes 30 to 37, and adding it to bgoffset (40) gives the background codes 40 to 47.
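To make the arithmetic concrete, here is a small sketch that builds the two tables as ordinary dictionaries rather than through globals(), then formats one foreground and one background code. The variable names fmt, red_fg and blue_bg are chosen for this example only; the word lists and offsets are copied from the question.

```python
# Plain-dictionary equivalents of the tables the question creates via globals().
attrs = {name: i for i, name in enumerate(
    'none bold faint italic underline blink fast reverse concealed'.split())}
colours = {name: i for i, name in enumerate(
    'grey red green yellow blue magenta cyan white'.split())}

fmt = '1;%dm'               # same template as the question's `format`
fgoffset, bgoffset = 30, 40

# Integer addition happens first, then %d substitutes the sum into the template.
red_fg = fmt % (colours['red'] + fgoffset)
blue_bg = fmt % (colours['blue'] + bgoffset)

print(colours)
print(red_fg, blue_bg)
```

Each name maps to its position in the space-separated string, which is what the enumerate(v.split()) part of the original one-liner produces.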
Now the second line you are in doubt about:
c=format % attrs[tmpword] if tmpword in attrs else None
This if is just Python's conditional expression (ternary operator), equivalent to the C-style expr ? val1 : val2.
It is equivalent to:
if tmpword in attrs:
    c = format % attrs[tmpword]
else:
    c = None
Note that it has lower precedence than the % operator, so the else branch yields a bare None.
Maybe you would prefer:
c= (format % attrs[tmpword]) if tmpword in attrs else ''
instead
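A quick way to check the precedence claim is to run the expression against a key that exists and one that does not. In this sketch, fmt stands in for the question's format variable, attrs is a trimmed-down table, and attr_code is a hypothetical helper name.

```python
fmt = '1;%dm'
attrs = {'bold': 1, 'reverse': 7}   # trimmed-down stand-in for the question's table


def attr_code(word):
    # `%` binds tighter than the conditional expression, so this parses as
    # (fmt % attrs[word]) if word in attrs else None
    return fmt % attrs[word] if word in attrs else None


print(attr_code('reverse'))
print(attr_code('missing'))
```

Because the lookup attrs[word] sits in the branch that is only evaluated when the condition holds, an unknown word yields None rather than raising KeyError.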
