I'm writing a basic script to create simple charts using Python's svgwrite. I have been able to create groups containing other items, such as circles, paths and lines. However, when I add several text elements to a group, they are not shown as grouped when I open the SVG figure in Inkscape. The text shows up all right, but it is just not grouped.
This is my piece of code:
# Create group for constellation names
const_names = dwg.add(dwg.g(id='constellation_names',
                            stroke='none',
                            fill=config.constellation_name_font.color.get_hex_rgb(),
                            fill_opacity=config.constellation_name_font.color.get_float_alpha(),
                            font_size=config.constellation_name_font.size*pt,
                            font_family=config.constellation_name_font.font_family))
log.warning("Constellation name groups are not working!")
if config.constellation_name_enable:
    w, h = constellation.get_mean_position()
    # Add every text item into the group
    const_names.add(dwg.text(constellation.name,
                             insert=(w*pt, h*pt),
                             )
                    )
Turns out this was a type-8 error (I had a bug in my code). This is how my code ended up looking. All text instances are now grouped in a single group.
def _add_constellation_names(dwg, constellations, config):
    const_names = dwg.add(dwg.g(id='constellation_names',
                                stroke='none',
                                fill=config.constellation_name_font.color.get_hex_rgb(),
                                fill_opacity=config.constellation_name_font.color.get_float_alpha(),
                                font_size=config.constellation_name_font.size*pt,
                                font_family=config.constellation_name_font.font_family))
    for constellation in constellations:
        # Per-constellation overrides of the group's fill settings
        kwargs = {}
        if constellation.custom_color is not None:
            kwargs["fill"] = constellation.custom_color.get_hex_rgb()
            kwargs["fill_opacity"] = constellation.custom_color.get_float_alpha()
        w, h = constellation.get_mean_position()
        const_names.add(dwg.text(constellation.get_display_name(),
                                 insert=(w*pt, h*pt),
                                 text_anchor="middle",
                                 **kwargs,
                                 )
                        )
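To check the grouping without opening Inkscape, the generated SVG can be parsed with the standard library. This is only a sketch: the helper name texts_in_group and the sample SVG are made up for illustration, and it assumes the file uses the standard SVG namespace.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"

def texts_in_group(svg_source, group_id):
    """Return the text of every <text> that is a direct child of <g id=group_id>."""
    root = ET.fromstring(svg_source)
    for group in root.iter(SVG_NS + "g"):
        if group.get("id") == group_id:
            return [t.text for t in group.findall(SVG_NS + "text")]
    return None  # no group with that id

# Hand-written sample, not real output of the script above
sample = """<svg xmlns="http://www.w3.org/2000/svg">
  <g id="constellation_names" stroke="none">
    <text x="10" y="20">Orion</text>
    <text x="30" y="40">Lyra</text>
  </g>
  <text x="50" y="60">Stray</text>
</svg>"""

print(texts_in_group(sample, "constellation_names"))  # ['Orion', 'Lyra']
```

With svgwrite, the string to check could come from dwg.tostring(). A text element that ends up outside the group (like "Stray" in the sample) will be missing from the returned list.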
We receive paper invoices. We take images of these invoices and want to extract the information contained in the cells of the tabular region(s), and export it as CSV or similar.
The tables include multiple columns, and the cells contain numbers and words.
I have been searching for ML-based Python procedures to do this, expecting it to be a relatively straightforward task (or maybe I'm mistaken), but have not had much luck finding one.
I can detect the horizontal and vertical lines and combine them to locate the cells. But retrieving the information contained within the cells is proving problematic.
Could I please get help?
I followed one procedure from this reference, but came across an error with "bitnot":
import pytesseract
extract=[]
for i in range(len(order)):
    for j in range(len(order[i])):
        inside=''
        if(len(order[i][j])==0):
            extract.append(' ')
        else:
            for k in range(len(order[i][j])):
                side1,side2,width,height = order[i][j][k][0],order[i][j][k][1], order[i][j][k][2],order[i][j][k][3]
                final_extract = bitnot[side2:side2+h, side1:side1+width]
                final_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 1))
                get_border = cv2.copyMakeBorder(final_extract,2,2,2,2, cv2.BORDER_CONSTANT,value=[255,255])
                resize = cv2.resize(get_border, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
                dil = cv2.dilate(resize, final_kernel,iterations=1)
                ero = cv2.erode(dil, final_kernel,iterations=2)
                ocr = pytesseract.image_to_string(ero)
                if(len(ocr)==0):
                    ocr = pytesseract.image_to_string(ero, config='--psm 3')
                inside = inside +" "+ ocr
            extract.append(inside)
a = np.array(extract)
dataset = pd.DataFrame(a.reshape(len(hor), total))
dataset.to_excel("output1.xlsx")
The error I get is this:
final_extract = bitnot[side2:side2+h, side1:side1+width]
NameError: name 'bitnot' is not defined
I need to find all projects and shared projects within a GitLab group with subgroups. I managed to list the names of all projects like this:
group = gl.groups.get(11111, lazy=True)
# find all projects, also in subgroups
projects=group.projects.list(include_subgroups=True, all=True)
for prj in projects:
    print(prj.attributes['name'])
print("")
What I am missing is a way to also list the shared projects within the group. Or, to put it another way: find all projects where my group is a member. Is this possible with the Python API?
So, inspired by the answer of sytech, I found out that it was not working at first because the shared projects were still hidden in the subgroups. So I came up with the following code, which digs through all levels of subgroups to find all shared projects. I assume this can be written far more elegantly, but it works for me:
# group definition
main_group_id = 11111
# create empty list that will contain final result
list_subgroups_id_all = []
# create empty list that acts as temporary storage of the results outside the function
list_subgroups_id_stored = []

# function to create a list of subgroups of a group (id)
def find_subgroups(group_id):
    # retrieve group object
    group = gl.groups.get(group_id)
    # create empty list to store ids of subgroups
    list_subgroups_id = []
    # iterate through group to find ids of all subgroups
    for sub in group.subgroups.list():
        list_subgroups_id.append(sub.id)
    return list_subgroups_id

# function to iterate over the various groups for subgroup detection
def iterate_subgroups(group_id, list_subgroups_id_all):
    # for a given id, find existing subgroups (id) and store them in a list
    list_subgroups_id = find_subgroups(group_id)
    # add the found items to the list storage variable, so that the results are not overwritten
    list_subgroups_id_stored.append(list_subgroups_id)
    # for each found subgroup id, test if it is already part of the total id list
    # if not, store it and test for more subgroups
    for test_id in list_subgroups_id:
        if test_id not in list_subgroups_id_all:
            # add it to total subgroup id list (final results list)
            list_subgroups_id_all.append(test_id)
            # check whether test_id contains more subgroups
            list_subgroups_id_tmp = iterate_subgroups(test_id, list_subgroups_id_all)
            # if so, append to stored subgroup list that is currently checked
            list_subgroups_id_stored.append(list_subgroups_id_tmp)
    return list_subgroups_id_all

# find all subgroups, sub-subgroups, etc. and store their ids in a list
list_subgroups_id_all = iterate_subgroups(main_group_id, list_subgroups_id_all)
print("***ids of all subgroups***")
print(list_subgroups_id_all)
print("")
print("***names of all subgroups***")
list_names = []
for ids in list_subgroups_id_all:
    group = gl.groups.get(ids)
    group_name = group.attributes['name']
    list_names.append(group_name)
print(list_names)
#print(list_subgroups_name_all)
print("")
# print all directly integrated projects of the main group, also those in subgroups
print("***integrated projects***")
group = gl.groups.get(main_group_id)
projects=group.projects.list(include_subgroups=True, all=True)
for prj in projects:
    print(prj.attributes['name'])
print("")
# print all shared projects
print("***shared projects***")
for sub in list_subgroups_id_all:
    group = gl.groups.get(sub)
    for shared_prj in group.shared_projects:
        print(shared_prj['path_with_namespace'])
print("")
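The subgroup search above can probably be written more compactly with a single loop and a set of visited ids. The following is a sketch, not a drop-in replacement: collect_subgroup_ids and its list_children argument are hypothetical names, and the dictionary stands in for the real GitLab hierarchy so the example runs on its own.

```python
def collect_subgroup_ids(root_id, list_children):
    """Return the ids of all descendants of root_id.

    list_children is a hypothetical callable mapping a group id
    to the list of its direct subgroup ids.
    """
    found = []
    seen = {root_id}
    stack = [root_id]
    while stack:
        current = stack.pop()
        for child in list_children(current):
            if child not in seen:  # guard against visiting a group twice
                seen.add(child)
                found.append(child)
                stack.append(child)
    return found

# Stand-in for the GitLab hierarchy, for illustration only
fake_tree = {11111: [2, 3], 2: [4], 3: [], 4: []}
print(collect_subgroup_ids(11111, lambda gid: fake_tree.get(gid, [])))  # [2, 3, 4]
```

With python-gitlab, list_children could be something like lambda gid: [s.id for s in gl.groups.get(gid).subgroups.list(all=True)], using the same calls as the code above. Note that list() without all=True returns only the first page of results, so large groups may otherwise be truncated.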
One question remains: at the very beginning I retrieve the main group by its id (here: 11111), but can I also get this id by looking up the name of the group? Something like group_id = gl.group.get(attribute={'name','foo'}) (which does not work)?
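One possible approach to the lookup by name is to list groups with a search term and then filter for an exact match. This is a sketch under assumptions: find_group_id is a hypothetical helper, and it assumes the search parameter is accepted by gl.groups.list on your GitLab version and that the token can see the group.

```python
def find_group_id(gl, name):
    """Return the id of the group whose name matches exactly, or None."""
    # The search term matches substrings, so filter for the exact name.
    for group in gl.groups.list(search=name, all=True):
        if group.name == name:
            return group.id
    return None
```

If several groups share the same name in different namespaces, this returns the first one found; comparing full_path instead of name would be one way to disambiguate.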
You can get the shared projects by the .shared_projects attribute:
group = gl.groups.get(11111)
for proj in group.shared_projects:
    print(proj['path_with_namespace'])
However, you cannot use the lazy=True argument to gl.groups.get here: a lazy object is created without making an API request, so its attributes are not populated.
>>> group = gl.groups.get(11111, lazy=True)
>>> group.shared_projects
AttributeError: shared_projects
I have this data frame.
As you can see, I have 3 indexes: Chapter, ParaIndex (paragraph index) and SentIndex (sentence index). I have 70 chapters, 1699 paragraphs, and 6999 sentences.
Each of them starts from the beginning (0 or 1). The problem is that I want to make a widget to call up a specific sentence that is located in a specific paragraph of a chapter, something like this:
https://towardsdatascience.com/interactive-controls-for-jupyter-notebooks-f5c94829aee6
but for extracting a specific sentence in a specific paragraph of a specific chapter.
I think I need another index (like ChapParaSent, an abbreviation for all three) or even a multidimensional index that shows exactly where each sentence is located.
Any idea how I can do that using ipywidgets?
https://ipywidgets.readthedocs.io/en/latest/examples/Using%20Interact.html
@interact
def showDetail(Chapter=(1,70), ParaIndex=(0,1699), SentIndex=(0,6999)):
    return df.loc[(df.Chapter == Chapter) & (df.ParaIndex == ParaIndex) & (df.SentIndex == SentIndex)]
The problem with this is that we do not know how many paragraphs each chapter has, nor which number SentIndex starts from in each paragraph, so most of the time we get no result.
The aim is to adapt this (or define a new index) so that, whatever the slider positions, we always get exactly one sentence.
For example, here I have the result:
but when I changed to this:
I do not get any result. The reason is obvious: there is no index 1-2-1, since in chapter 1, paragraph index 2, SentIndex starts from 2!
One solution I saw was a complete definition of a multidimensional data frame, but I need something easier that I can use with ipywidgets...
Many thanks.
I'm sure there is an easier solution out there, but this works, I guess.
import pandas as pd
data = [
dict(Chapter=0, ParaIndex=0, SentIndex=0, content="0"),
dict(Chapter=1, ParaIndex=1, SentIndex=1, content="a"),
dict(Chapter=1, ParaIndex=1, SentIndex=2, content="b"),
dict(Chapter=2, ParaIndex=2, SentIndex=3, content="c"),
dict(Chapter=2, ParaIndex=2, SentIndex=4, content="d"),
dict(Chapter=2, ParaIndex=3, SentIndex=5, content="e"),
dict(Chapter=3, ParaIndex=4, SentIndex=6, content="f"),
]
df = pd.DataFrame(data)
def showbyindex(target_chapter, target_paragraph, target_sentence):
    # Rows of the requested chapter
    df_chapter = df.loc[df.Chapter==target_chapter]
    # Translate the paragraph position within the chapter into its ParaIndex value
    unique_paragraphs = df_chapter.ParaIndex.unique()
    paragraph_idx = unique_paragraphs[target_paragraph]
    df_paragraph = df_chapter.loc[df_chapter.ParaIndex==paragraph_idx]
    # Pick the sentence by its position within the paragraph
    return df_paragraph.iloc[target_sentence]
showbyindex(target_chapter=2, target_paragraph=0, target_sentence=1)
Edit:
If you want the sliders only to be within a valid range you can define IntSliders for your interact decorator:
import ipywidgets as widgets
from ipywidgets import interact
chapter_slider = widgets.IntSlider(min=0, max=max(df.Chapter.unique()), step=1, value=0)
paragraph_slider = widgets.IntSlider(min=0, max=1, step=1, value=0)
sentence_slider = widgets.IntSlider(min=0, max=1, step=1, value=0)
# decorator goes directly above the definition of showbyindex
@interact(target_chapter=chapter_slider, target_paragraph=paragraph_slider, target_sentence=sentence_slider)
Now you have to check the valid number of paragraphs/sentences within your showbyindex function and set the sliders' value/max accordingly.
if(...):
    paragraph_slider.max = ...
...
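The check for valid slider ranges could look roughly like the sketch below. To keep it runnable without pandas it works on plain dictionaries (for example the result of df.to_dict("records")); slider_limits is a hypothetical helper name, not part of ipywidgets.

```python
def slider_limits(records, chapter, paragraph_pos):
    """Return (max paragraph position, max sentence position) for the sliders.

    records: list of dicts with Chapter, ParaIndex and SentIndex keys.
    paragraph_pos: position of the paragraph within the chapter, as in showbyindex.
    """
    rows = [r for r in records if r["Chapter"] == chapter]
    # Unique paragraph indexes in order of appearance
    paragraphs = list(dict.fromkeys(r["ParaIndex"] for r in rows))
    if not paragraphs:
        return 0, 0
    # Clamp a stale slider value left over from a longer chapter
    paragraph_pos = min(paragraph_pos, len(paragraphs) - 1)
    sentences = [r for r in rows if r["ParaIndex"] == paragraphs[paragraph_pos]]
    return len(paragraphs) - 1, len(sentences) - 1

records = [
    dict(Chapter=1, ParaIndex=1, SentIndex=1),
    dict(Chapter=1, ParaIndex=1, SentIndex=2),
    dict(Chapter=2, ParaIndex=2, SentIndex=3),
    dict(Chapter=2, ParaIndex=2, SentIndex=4),
    dict(Chapter=2, ParaIndex=3, SentIndex=5),
]
print(slider_limits(records, chapter=2, paragraph_pos=0))  # (1, 1)
```

Inside the interact callback, the returned pair could then be assigned to paragraph_slider.max and sentence_slider.max before looking up the sentence.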
I am using the python-pptx library for pptx manipulation. I want to add a bullet list in the pptx document.
I am using the following snippet to add list item:
p = text_frame.add_paragraph()
run = p.add_run()
p.level = 0
run.text = "First"
But it does not display bullet points; please guide.
It is currently not possible to access the bullet property using python-pptx, but I want to share a workaround that has served me well.
This requires the use of a pptx template, in which we exploit the fact that the levels in a slide layout can be customized individually.
For instance, in the slide layout you could set level 0 to be normal text, level 1 to be bullets, and level 2 to be numbers or any other list style you want. You can then modify font size, indentation (using the ruler at the top), and any other property of each level to get the look you want.
For my use-case, I just set levels 1 and 2 to have the same indentation and size as level 0, making it possible to create bullet lists and numbered lists by simply setting the level to the corresponding value.
This is how my slide layout looks in the template file:
[Image: slide layout example]
And this is how I set the corresponding list style in the code:
p.level = 0 # Regular text
p.level = 1 # Bullet
p.level = 2 # Numbers
In theory, you should be able to set it up exactly the way you want, even with indented sub-lists and so on. The only limitation I am aware of is that there seems to be a maximum of 8 levels that can be customized in the slide layout.
My solution:
from pptx.oxml.xmlchemy import OxmlElement
def SubElement(parent, tagname, **kwargs):
    element = OxmlElement(tagname)
    element.attrib.update(kwargs)
    parent.append(element)
    return element

def makeParaBulletPointed(para):
    """Bullets are set to Arial,
    actual text can be a different font"""
    pPr = para._p.get_or_add_pPr()
    ## Set marL and indent attributes
    pPr.set('marL','171450')
    pPr.set('indent','171450')
    ## Add buFont
    _ = SubElement(parent=pPr,
                   tagname="a:buFont",
                   typeface="Arial",
                   panose="020B0604020202020204",
                   pitchFamily="34",
                   charset="0"
                   )
    ## Add buChar
    _ = SubElement(parent=pPr,
                   tagname='a:buChar',
                   char="•")
This question is still up to date on May 27, 2021.
Following up on @OD1995's answer, I would like to add a little more detail as well as my take on the problem.
I created a new package with the following code:
from pptx.oxml.xmlchemy import OxmlElement
def getBulletInfo(paragraph, run=None):
    """Returns the attributes of the given <a:pPr> OxmlElement
    as well as its runs font-size.

    *param: paragraph* pptx _paragraph object
    *param: run* [optional] specific _run object
    """
    pPr = paragraph._p.get_or_add_pPr()
    if run is None:
        run = paragraph.runs[0]
    p_info = {
        "marL": pPr.attrib['marL'],
        "indent": pPr.attrib['indent'],
        "level": paragraph.level,
        "fontName": run.font.name,
        "fontSize": run.font.size,
    }
    return p_info
def SubElement(parent, tagname, **kwargs):
    """Helper for Paragraph bullet Point
    """
    element = OxmlElement(tagname)
    element.attrib.update(kwargs)
    parent.append(element)
    return element
def pBullet(
    paragraph,  # paragraph object
    font,  # name of the font to apply to the bullet
    marL='864000',
    indent='-322920',
    size='350000'  # bullet size in 1/1000 of a percent (100000 = 100%)
):
    """The bullet uses the given font;
    the actual text can be a different font
    """
    pPr = paragraph._p.get_or_add_pPr()
    # Set marL and indent attributes
    # Indent is the space between the bullet and the text.
    pPr.set('marL', marL)
    pPr.set('indent', indent)
    # Add buSzPct (bullet size relative to the text size)
    _ = SubElement(parent=pPr,
                   tagname="a:buSzPct",
                   val=size  # previously hard-coded, which ignored the size argument
                   )
    # Add buFont
    _ = SubElement(parent=pPr,
                   tagname="a:buFont",
                   typeface=font,
                   # panose="020B0604020202020204",
                   # pitchFamily="34",
                   # charset="0"
                   )
    # Add buChar
    _ = SubElement(parent=pPr,
                   tagname='a:buChar',
                   char="•"
                   )
The reason I did this is that I was frustrated that the bullet character was not the same size as the original and the text was stuck to the bullet.
getBulletInfo() allows me to retrieve information from an existing paragraph.
I use this information to populate the element's attributes (so that it is identical to the template).
Anyway, the main addition is the creation of a sub-element <a:buSzPct> (documentation here and here). This is a size percentage that can go from 25% to 400% (100000 = 100%).
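Since the val attribute is expressed in thousandths of a percent, a small conversion helper can make calls easier to read. This is a sketch: bullet_size_val is a hypothetical name, and the default bounds assume the 25% to 400% range of the OOXML bullet size percentage type; adjust them if your files need something else.

```python
def bullet_size_val(percent, lo=25, hi=400):
    """Convert a bullet size in percent to the string for <a:buSzPct val="...">."""
    if not lo <= percent <= hi:
        raise ValueError(f"bullet size must be between {lo}% and {hi}%, got {percent}%")
    return str(round(percent * 1000))  # 100% -> "100000"

print(bullet_size_val(100))  # 100000
print(bullet_size_val(350))  # 350000
```

The result could be passed wherever the val attribute of a:buSzPct is set.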
Try this:
p = text_frame.add_paragraph()
p.level = 0
p.text = "First"
Or if the text_frame already has a paragraph:
p = text_frame.paragraphs[0]
p.level = 0
p.text = "First"
What I want to do is duplicate a controller to the other side and rename it, replacing _L with _R. I select the controller, and the script creates a group for it and then another group to mirror it onto the right side, renaming that other group to _R. Then it unparents the first group to the world. That's all I want to do, but I'm stuck on the renaming. I know I have to sort the list in reverse order to rename it, but whenever I do, Maya says:
More than one object matches name
The duplicated object has a different parent name but the same child names. Please tell me how I should do this and what I'm missing.
import maya.cmds as cmds
list = cmds.ls(sl=1)
grp = cmds.group(em=1, name=("grp" + list[0]))
# creating constraint to match transform and deleting it
pc = cmds.pointConstraint(list, grp, o=[0,0,0], w=1)
oc = cmds.orientConstraint(list, grp, o=[0,0,0], w=1)
cmds.delete(pc, oc)
# parenting it to controller
cmds.parent(list, grp)
# creating new group to reverse it to another side
Newgrp = cmds.group(em=1)
cmds.parent(grp, Newgrp)
Reversedgrp = cmds.duplicate(Newgrp)
cmds.setAttr(Reversedgrp[0] +'.sx', -1)
selection = cmds.ls(Reversedgrp, long=1)
selection.sort(key=len, reverse=1)
Renaming in Maya is very annoying, because the names are your only handle to the objects themselves.
The usual trick is basically:
1. Duplicate the items with the rr flag, so you only get the top nodes.
2. Use listRelatives with the ad and f (full path) flags to get all the children of the duplicated top node in long form, like |Parent|Child|Grandchild. In this form the entire hierarchy above the name is listed in order (you can get this form with cmds.ls(l=True) on objects as well).
3. Sort that list and then reverse it. This puts children ahead of their parents, so you can start with the leaf nodes and work your way upwards.
4. Now loop through the items and apply your renaming pattern.
So something like this, though you probably want to replace the selection here with something you control:
import maya.cmds as cmds
dupes = cmds.duplicate(cmds.ls(sl=True), rr=True) # duplicate, return only roots
dupes += cmds.listRelatives(dupes, ad=True, f=True) # add children as long names
longnames = cmds.ls(dupes, l=True) # make sure we have long name for root
longnames.sort() # usually these sort automatically, but it's good to be safe
for item in longnames[::-1]: # this is shorthand for 'walk through the list backwards'
    shortname = item.rpartition("|")[-1] # get the last bit of the name
    cmds.rename(item, shortname.replace("r","l")) # at last, rename the item
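Why the order matters can be seen without Maya. Once a parent is renamed, the stored long names of its children no longer point at anything, so the children have to be renamed first. The sketch below uses plain strings; deepest_first and mirrored_short_name are hypothetical helpers, and sorting by hierarchy depth is swapped in here as a more explicit alternative to sort-then-reverse.

```python
def deepest_first(long_names):
    """Order Maya-style long names so children come before their parents."""
    return sorted(long_names, key=lambda path: path.count("|"), reverse=True)

def mirrored_short_name(long_name, old="_L", new="_R"):
    """Return the new short name for a node, given its long name."""
    return long_name.rpartition("|")[-1].replace(old, new)

paths = ["|grp_ctl_L", "|grp_ctl_L|ctl_L", "|grp_ctl_L|ctl_L|ctl_LShape"]
for path in deepest_first(paths):
    print(path, "->", mirrored_short_name(path))
```

In Maya, each pair printed here would correspond to one cmds.rename(long_name, new_short_name) call, issued in this order.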
Thanks, "theodox", it was very useful. I am still a little confused about sorting, long names, short names and .rpartition, but anyway I finally created this script.
import maya.cmds as cmds
_list = cmds.ls(sl=1)
grp = cmds.group(em=1, name=("grp_"+ _list[0]))
#creating constraint to match transform and deleting it.
pc=cmds.pointConstraint( _list, grp, o=[0,0,0],w=1 )
oc=cmds.orientConstraint( _list, grp, o=[0,0,0],w=1 )
cmds.delete(pc,oc)
cmds.parent( _list, grp )
Newgrp=cmds.group(em=1)
cmds.parent(grp,Newgrp)
#duplicating new group and reversing it to negative side
dupes = cmds.duplicate(cmds.ls(Newgrp,s=0), rr=True) # duplicate, return only roots
cmds.setAttr( dupes[0] +'.sx', -1 )
#renaming
dupes += cmds.listRelatives(dupes, ad=True, f=True) # add children as long names
longnames = cmds.ls(dupes, l=True,s=0) # make sure we have long name for root
longnames.sort() # usually these sort automatically, but it's good to be safe
print(longnames)
for item in longnames[::-1]: # this is shorthand for 'walk through the list backwards'
    shortname = item.rpartition("|")[-1] # get the last bit of the name
    cmds.rename(item, shortname.replace("_L","_R")) # at last, rename the item
#ungrouping back to world and deleting unused nodes
cmds.parent( grp, world=True )
duplicatedGrp=cmds.listRelatives(dupes[0], c=True)
cmds.parent( duplicatedGrp, world=True )
cmds.delete(dupes[0],Newgrp)
Anyone can use this code for mirroring controllers; just change "_L" and "_R" in the rename command.
Thank you.