conversion newick to graphml using python


I would like to convert a tree from Newick to a format such as GraphML that I can open with Cytoscape.
I have a file "small.newick" that contains:
((raccoon:1,bear:6):0.8,((sea_lion:11.9, seal:12):7,((monkey:100,cat:47):20, weasel:18):2):3,dog:25);
So far, I have done it this way (Python 3.6.5 |Anaconda):
from Bio import Phylo
import networkx
Tree = Phylo.read("small.newick", 'newick')
G = Phylo.to_networkx(Tree)
networkx.write_graphml(G, 'small.graphml')
There is a problem with the unnamed Clade objects, which I can fix using this code:
from Bio import Phylo
import networkx
def clade_names_fix(tree):
    for idx, clade in enumerate(tree.find_clades()):
        if not clade.name:
            clade.name = str(idx)
Tree = Phylo.read("small.newick", 'newick')
clade_names_fix(Tree)
G = Phylo.to_networkx(Tree)
networkx.write_graphml(G, 'small.graphml')
This gives me something that seems nice enough.
My questions are:
Is that a good way to do it? It seems weird to me that the function does not take care of the internal node names.
If you replace one node name with a sufficiently long string, it is trimmed by the call Phylo.to_networkx(Tree). How can I avoid that?
Example: substitution of "dog" by "test_tring_that_create_some_problem_later_on"

Looks like you got pretty far on this already. I can only suggest a few alternatives/extensions to your approach...
Unfortunately, I couldn't find a Cytoscape app that can read this format. I tried searching for PHYLIP, NEWICK and PHYLO. You might have more luck:
http://apps.cytoscape.org/
There is an old Cytoscape 2.x plugin that could read this format, but to run this you would need to install Cytoscape 2.8.3, import the network, then export as xGMML (or save as CYS) and then try to open in Cytoscape 3.7 in order to migrate back into the land of living code. Then again, if 2.8.3 does what you need for this particular case, then maybe you don't need to migrate:
http://apps.cytoscape.org/apps/phylotree
The best approach is programmatic, which you have already explored. Finding an R or Python package that turns NEWICK into iGraph or GraphML is a solid strategy. Note that there are updated and slick Cytoscape libraries in those languages as well, so you can do all label cleanup, layout, data visualization, analysis, export, etc., within the scripting environment:
https://bioconductor.org/packages/release/bioc/html/RCy3.html
https://py2cytoscape.readthedocs.io/en/latest/
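As a rough illustration of the programmatic route, here is a minimal sketch that uses only the Python standard library rather than Biopython and networkx. It is an assumption-laden toy, not a replacement for those packages: the parser only handles plain Newick with optional names and branch lengths (no quoted labels or comments), and the helper names parse_newick and to_graphml, as well as the n0, n1, ... labels given to unnamed internal nodes, are made up for this example. Names are copied as-is, so long labels are not shortened.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def parse_newick(text):
    """Parse a simple Newick string into (nodes, edges).

    nodes is a list of node names; edges is a list of
    (parent, child, branch_length) tuples, branch_length being a float or None.
    """
    text = "".join(text.split()).rstrip(";")  # drop whitespace and the final ";"
    nodes, edges = [], []
    pos = 0
    unnamed = 0

    def read_token():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in "(),:":
            pos += 1
        return text[start:pos]

    def read_subtree():
        nonlocal pos, unnamed
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            children.append(read_subtree())
            while text[pos] == ",":
                pos += 1
                children.append(read_subtree())
            pos += 1  # consume ")"
        name = read_token()
        if not name:
            # Hypothetical naming scheme for unnamed internal nodes.
            name = "n%d" % unnamed
            unnamed += 1
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            length = float(read_token())
        nodes.append(name)
        edges.extend((name, child, child_len) for child, child_len in children)
        return name, length

    read_subtree()
    return nodes, edges


def to_graphml(nodes, edges):
    """Serialize nodes and weighted edges as a GraphML string."""
    root = ET.Element("graphml", {"xmlns": GRAPHML_NS})
    ET.SubElement(root, "key", {"id": "d0", "for": "edge",
                                "attr.name": "weight", "attr.type": "double"})
    graph = ET.SubElement(root, "graph", {"edgedefault": "undirected"})
    for name in nodes:
        ET.SubElement(graph, "node", {"id": name})
    for parent, child, length in edges:
        edge = ET.SubElement(graph, "edge", {"source": parent, "target": child})
        if length is not None:
            ET.SubElement(edge, "data", {"key": "d0"}).text = repr(length)
    return ET.tostring(root, encoding="unicode")


newick = ("((raccoon:1,bear:6):0.8,((sea_lion:11.9, seal:12):7,"
          "((monkey:100,cat:47):20, weasel:18):2):3,dog:25);")
nodes, edges = parse_newick(newick)
graphml_text = to_graphml(nodes, edges)
# To save it: open("small.graphml", "w").write(graphml_text)
```

A leaf that happens to be called n0, n1, etc. would clash with the generated labels, so a real script would want a safer naming scheme.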

After some research, I actually found a solution that works.
I decided to provide the link here for you, dear reader:
going to github

FYI for anyone coming across this now: I think the first issue mentioned here has since been solved in Biopython. Using the same data as above, the networkx graph that is built contains all the internal nodes of the tree as well as the terminal nodes.
import matplotlib.pyplot as plt
import networkx
from Bio import Phylo
Tree = Phylo.read("small.newick", 'newick')
G = Phylo.to_networkx(Tree)
networkx.draw_networkx(G)
plt.savefig("small_graph.png")
Specs:
Python 3.8.10,
Bio 1.78,
networkx 2.5

Related

Python PPTX workaround to add Transitions to slides

I successfully automated the creation of pptx presentations using python-pptx, customising background, inserting text, images, etc.
How can I add custom Transitions to my slides? (E.g. "Transitions" > "Fade" from PowerPoint). As I could not find a function, my idea is to use workaround functions (going deep into xml): where do I start?
python 3.10.4,
PowerPoint v16.54,
MacOS Big Sur 11.6

So, on playing with this, the following worked for me:
xml = '''
<mc:AlternateContent xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006">
<mc:Choice xmlns:p14="http://schemas.microsoft.com/office/powerpoint/2010/main" Requires="p14">
<p:transition xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main" spd="slow" p14:dur="3400">
<p14:ripple />
</p:transition>
</mc:Choice>
<mc:Fallback>
<p:transition xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main" spd="slow">
<p:fade />
</p:transition>
</mc:Fallback>
</mc:AlternateContent>
'''
xmlFragment = parse_xml(xml)
slide.element.insert(-1, xmlFragment)
Where slide is the slide object in python-pptx.
You probably need the following import:
from pptx.oxml import parse_xml
Where I have ripple I first tested with reveal - and that worked as well. I'm sure there are other transitions. Before I attempt to add them to md2pptx I will want to find some more and figure out what kind of UI I want to surface them with.
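If you want to try other effects without hand-editing the string each time, a small helper along these lines might do. This is only a sketch: transition_xml and its parameters are names I made up, it simply templates the fragment above, and only ripple and reveal are reported to work here, so any other effect name is an untested assumption. The well-formedness check uses the standard library and does not need python-pptx.

```python
import xml.etree.ElementTree as ET

MC_NS = "http://schemas.openxmlformats.org/markup-compatibility/2006"
P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"
P14_NS = "http://schemas.microsoft.com/office/powerpoint/2010/main"


def transition_xml(p14_effect="ripple", duration_ms=3400, speed="slow"):
    """Build the AlternateContent fragment with a p14 effect and a fade fallback."""
    return (
        f'<mc:AlternateContent xmlns:mc="{MC_NS}">'
        f'<mc:Choice xmlns:p14="{P14_NS}" Requires="p14">'
        f'<p:transition xmlns:p="{P_NS}" spd="{speed}" p14:dur="{duration_ms}">'
        f'<p14:{p14_effect} />'
        f'</p:transition>'
        f'</mc:Choice>'
        f'<mc:Fallback>'
        f'<p:transition xmlns:p="{P_NS}" spd="{speed}">'
        f'<p:fade />'
        f'</p:transition>'
        f'</mc:Fallback>'
        f'</mc:AlternateContent>'
    )


# Check that the generated fragment is well-formed XML before handing it on.
fragment = transition_xml("reveal", duration_ms=2000)
parsed = ET.fromstring(fragment)

# With python-pptx available, the fragment would then be used as above:
# slide.element.insert(-1, parse_xml(fragment))
```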
Hope this helps.

Use imshow with Matlab Python engine

After building and installing the Python engine shipped with Matlab 2019b in Anaconda
(TestEnvironment) PS C:\Program Files\MATLAB\R2019b\extern\engines\python> C:\Users\USER\Anaconda3\envs\TestEnvironment\python.exe .\setup.py build -b C:\Users\USER\MATLAB\build_temp install
for Python 3.7 I wrote a simple script to test a couple of features I'm interested in:
import matlab.engine as ml_e
# Start Matlab engine
eng = ml_e.start_matlab()
# Load MAT file into engine. The result is a dictionary
mat_file = "samples/lena.mat"
lenaMat = eng.load("samples/lena.mat")
print("Variables found in \"" + mat_file + "\"")
for key in lenaMat.keys():
    print(key)
# print(lenaMat["lena512"])
# Use the variable from the MAT file to display it as an image
eng.imshow(lenaMat["lena512"], [])
I have a problem with imshow() (or any similar function that displays a figure in the Matlab GUI on the screen) namely that it shows quickly and then disappears, which - I guess - at least confirms that it is possible to use it. The only possibility to keep it on the screen is to add an infinite loop at the end:
while True:
    continue
For obvious reasons this is not a good solution. I am not looking for a conversion of Matlab data to NumPy or similar and displaying it using matplotlib or similar third party libraries (I am aware that SciPy can load MAT files for example). The reason is simple - I would like to use Matlab (including loading whole environments) and for debugging purposes I'd like to be able to show this and that result without having to go through loops and hoops of converting the data manually.
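One low-tech alternative to the busy loop, sketched here under the assumption that the figure vanishes only because the script (and with it the engine) exits, is to block on user input instead of spinning. The helper name hold_until_enter and its reader parameter are made up for this example; the reader argument exists only so the function can be exercised without a keyboard.

```python
def hold_until_enter(prompt="Press Enter to close the figure and exit... ",
                     reader=input):
    """Block until the user confirms, without burning CPU in a loop."""
    reader(prompt)


# Intended use at the end of the script above (not executed here):
# eng.imshow(lenaMat["lena512"], [])
# hold_until_enter()
# eng.quit()
```

This keeps the Python process alive while it waits, which is all the infinite loop was doing.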

Network is broken after loading from pickle [duplicate]

Context: I'm trying to run another researcher's code - it describes a traffic model for the Bay Area road network, which is subject to seismic hazard. I'm new to Python and therefore would really appreciate some help debugging the following error.
Issue: When I try to run the code for the sample data provided with the file, following the instructions in the README, I get the following error.
DN0a226926:quick_traffic_model gitanjali$ python mahmodel_road_only.py
You are considering 2 ground-motion intensity maps.
You are considering 1743 different site locations.
You are considering 2 different damage maps (1 per ground-motion intensity map).
Traceback (most recent call last):
File "mahmodel_road_only.py", line 288, in <module>
main()
File "mahmodel_road_only.py", line 219, in main
G = get_graph()
File "mahmodel_road_only.py", line 157, in get_graph
G = add_superdistrict_centroids(G)
File "mahmodel_road_only.py", line 46, in add_superdistrict_centroids
G.add_node(str(1000000 + i))
File "/Library/Python/2.7/site-packages/networkx-2.0-py2.7.egg/networkx/classes/digraph.py", line 412, in add_node
if n not in self._succ:
AttributeError: 'DiGraph' object has no attribute '_succ'
Debugging: Based on some other questions, it seems like this error stems from an issue with the networkx version (I'm using 2.0) or the Python version (I'm using 2.7.10). I went through the migration guide cited in other questions and found nothing that I needed to change in mahmodel_road_only.py. I also checked the digraph.py file and found that self._succ is defined. I also checked the definition of get_graph(), shown below, which calls networkx, but didn't see any obvious issues.
def get_graph():
    import networkx
    '''loads full mtc highway graph with dummy links and then adds a few
    fake centroidal nodes for max flow and traffic assignment'''
    G = networkx.read_gpickle("input/graphMTC_CentroidsLength3int.gpickle")
    G = add_superdistrict_centroids(G)
    assert not G.is_multigraph() # Directed! only one edge between nodes
    G = networkx.freeze(G) #prevents edges or nodes to be added or deleted
    return G
Question: How can I resolve this problem? Is it a matter of changing the Python or Networkx versions? If not, what next steps could you recommend for debugging?

I believe your problem is similar to that in AttributeError: 'DiGraph' object has no attribute '_node'
The issue there is that the graph being investigated was created in networkx 1.x and then pickled. The graph then has the attributes that a networkx 1.x object has. I believe this happened for you as well.
You've now opened it and you're applying tools from networkx 2.x to that graph. But those tools assume that it's a networkx 2.x DiGraph, with all the attributes expected in a 2.x DiGraph. In particular it expects _succ to be defined for a node, which a 1.x DiGraph does not have.
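To make the mechanism concrete, here is a toy sketch that does not use networkx at all. DiGraphV1 and DiGraphV2 are stand-in classes I made up, and the attribute names succ and _succ are assumed to mirror the 1.x and 2.x layouts. By default, unpickling creates the object without calling __init__ and then restores the saved attribute dictionary; the sketch imitates that by hand so it does not depend on how the pickle locates the class.

```python
class DiGraphV1:
    """Stand-in for an old-style class: adjacency kept in `succ`."""

    def __init__(self):
        self.succ = {}

    def add_node(self, n):
        if n not in self.succ:
            self.succ[n] = {}


class DiGraphV2:
    """Stand-in for a new-style class: adjacency kept in `_succ`."""

    def __init__(self):
        self._succ = {}

    def add_node(self, n):
        if n not in self._succ:
            self._succ[n] = {}


# The state an old pickle would carry: the instance's attribute dictionary.
old = DiGraphV1()
old.add_node("a")
saved_state = dict(old.__dict__)

# Imitate default unpickling into the new class: no __init__, just restore
# the saved attributes. `_succ` is therefore never created.
restored = DiGraphV2.__new__(DiGraphV2)
restored.__dict__.update(saved_state)

try:
    restored.add_node("b")
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Building a fresh new-style object from the old data avoids the problem.
rebuilt = DiGraphV2()
for node in saved_state["succ"]:
    rebuilt.add_node(node)
rebuilt.add_node("b")
```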
So here are two approaches that I believe will work:
Short term solution
Remove networkx 2.x and replace with networkx 1.11.
This is not optimal because networkx 2.x is more powerful. Also code that has been written to work in both 2.x and 1.x (following the migration guide you mentioned) will be less efficient in 1.x (for example there will be places where the 1.x code is using lists and the 2.x code is using generators).
Long term solution
Convert the 1.x graph into a 2.x graph (I can't test easily as I don't have 1.x on my computer at the moment - If anyone tries this, please leave a comment saying whether this works and whether your network was weighted):
#you need commands here to load the 1.x graph G
#
import networkx as nx #networkx 2.0
H = nx.DiGraph() #if it's a DiGraph()
#H=nx.Graph() #if it's a typical networkx Graph().
H.add_nodes_from(G.nodes(data=True))
H.add_edges_from(G.edges(data=True))
The data=True is used to make sure that any edge/node weights are preserved. H is now a networkx 2.x DiGraph, with the edges and nodes having whatever attributes G had. The networkx 2.x commands should work on it.
Bonus longer term solution
Contact the other researcher and warn him/her that the code example is now out of date.

Why do I get a Graph when I specify create_using=nx.DiGraph

I'm new to networkx in Python and have a problem creating a DiGraph. I specify create_using=nx.DiGraph when I create the graph from an adjacency matrix stored in a pandas DataFrame, but I get a Graph instead of a DiGraph. Can anyone explain why?

This is apparently a bug that is fixed in newer versions of networkx. The problem is on this line.
You can either install a newer version of networkx to fix it, or implement the solution yourself (you just need to add one word). If you choose the second option, open the file networkx/convert_matrix.py, which on my system is found at /usr/local/lib/python3.6/site-packages/networkx/convert_matrix.py (open the file as root using sudo), and change line 191 from:
G = from_numpy_matrix(A, create_using)
to
G = from_numpy_matrix(A, create_using=create_using)
And listo, the bug should be solved. Note: the argument is create_using=nx.DiGraph().
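Why one word matters can be shown with a toy example that does not touch networkx. The functions below are stand-ins I made up, and the assumption is that the inner function has another parameter (called parallel_edges here) sitting before create_using, so a positional second argument ends up in the wrong slot.

```python
def from_matrix(A, parallel_edges=False, create_using=None):
    """Toy stand-in: report which graph type would be built."""
    return create_using if create_using is not None else "Graph"


def from_adjacency_buggy(A, create_using=None):
    # Positional call: create_using is bound to parallel_edges by mistake.
    return from_matrix(A, create_using)


def from_adjacency_fixed(A, create_using=None):
    # Keyword call: create_using reaches the parameter it was meant for.
    return from_matrix(A, create_using=create_using)


buggy_result = from_adjacency_buggy([[0, 1], [0, 0]], create_using="DiGraph")
fixed_result = from_adjacency_fixed([[0, 1], [0, 0]], create_using="DiGraph")
```

In the buggy variant the requested type is silently ignored and the default is used, which matches the symptom described in the question.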

How can I display an empty staff using music21?

I am trying to produce a quick reworking of some educational materials on music, showing how the associated media assets (images, audio files) might be created from "code" in a Jupyter notebook using the Python music21 package.
It seems the simplest steps are the hardest. For example, how do I create an empty staff, or a staff populated by notes but without a clef at the start?
If I do something like:
from music21 import *
s = stream.Stream()
s.append(note.Note('G4', type='whole'))
s.append(note.Note('A4', type='whole'))
s.append(note.Note('B4', type='whole'))
s.append(note.Note('C5', type='whole'))
s.show()
I get the following?

Try creating a stream.Measure object, so that barlines before the notes don't appear.
Music21 puts barlines and clefs, etc., in by default. You can manually put in a time signature of 4/1 and a treble clef and set them with ".style.hideObjectOnPrint" (or just ".hideObjectOnPrint" on older m21 versions). You will probably need to also set .rightBarline = bar.Barline('none') or something like that for the end.
It is possible, but I haven't ever fully tried all the parts of it.
