I have a list of networkx graphs, and I am trying to write a text file containing a massive edge list of all graphs. If you run the following code:
from torch_geometric.datasets import TUDataset
dataset = TUDataset(root='data/TUDataset', name='MUTAG')
Then go to data->TUDataset->MUTAG->raw; I am trying to replicate those raw files, but using my own data.
My raw data is a MATLAB .mat file containing a struct whose first column, A, holds each individual graph's adjacency matrix, from which I create the networkx graphs:
from scipy.io import loadmat
import pandas as pd
raw_data = loadmat('data_final3.mat', squeeze_me=True)
data = pd.DataFrame(raw_data['Graphs'])
import networkx as nx
A = data.pop('A')
nx_graph = []
for i in range(len(A)):
    nx_graph.append(nx.Graph(A[i]))
I created the MUTAG_graph_indicator file using:
with open('graph_indicator.txt', 'w') as f:
    for i in range(len(nx_graph)):
        f.write((str(i)+'\n')*len(nx_graph[i].nodes))
If there is a way to do this using either Python or MATLAB, I would greatly appreciate the help. Yes, torch_geometric does have from_networkx, but it doesn't seem to produce the same information as building the torch_geometric graphs the same way as the sample data.
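For reference, here is a minimal sketch of how the edge list file (the DS_A.txt-style file in the raw folder) could be written from the networkx graphs built above. It assumes the TUDataset convention that node ids are consecutive and 1-based across all graphs, and that graph ids also start at 1:

offset = 0
with open('A.txt', 'w') as fa, open('graph_indicator.txt', 'w') as fi:
    for graph_id, g in enumerate(nx_graph, start=1):
        nodes = list(g.nodes)
        # map each graph-local node label to a global, 1-based node id
        index = {n: offset + k + 1 for k, n in enumerate(nodes)}
        # one graph_indicator line per node of this graph
        fi.write((str(graph_id) + '\n') * len(nodes))
        for u, v in g.edges:
            # the sample raw files list each undirected edge in both directions
            fa.write(f'{index[u]}, {index[v]}\n')
            fa.write(f'{index[v]}, {index[u]}\n')
        offset += len(nodes)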
I've been pulling my hair out trying to make a bipartite graph from a csv file, and so far all I have is a pandas matrix that looks like this
My code so far is just
import networkx as nx
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# import pyexcel as pe
# import pyexcel.ext.xlsx
from networkx.algorithms import bipartite
mat = pd.read_csv("networkdata3.csv")
# mat = pd.read_excel("networkdata1.xlsx", sheet_name="sheet_name_1")
print(mat.info())
sand = nx.from_pandas_adjacency(mat)
and I have no clue what I'm doing wrong. Initially I tried to read it in as the original xlsx file, but then I converted it to a csv and it started reading. I assume I can't build the graph because the column labels are decimals, and the error that comes out claims that the columns don't match the index. So how else should I be doing this to actually make some progress?
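For what it's worth, nx.from_pandas_adjacency raises that error whenever the DataFrame's index and columns are not identical, which is what happens when read_csv turns the first data row into column labels. A minimal sketch of an alternative route, assuming (hypothetically) that networkdata3.csv stores row labels in its first column and a rectangular rows-by-columns incidence table in the rest:

import pandas as pd
import networkx as nx
from networkx.algorithms import bipartite
from scipy.sparse import csr_matrix

# Hypothetical layout: first column holds row labels, remaining columns are
# the second bipartite node set, non-zero cells become weighted edges.
mat = pd.read_csv("networkdata3.csv", index_col=0)

# For a rectangular matrix, build the bipartite graph from its biadjacency
# matrix instead of from_pandas_adjacency (which insists index == columns).
sand = bipartite.from_biadjacency_matrix(csr_matrix(mat.values))

# Nodes 0..(n_rows-1) correspond to mat.index, the remaining nodes to mat.columns.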
I'm VERY new to xarray, and I tried to import a satellite netCDF file into Python with xarray, using this file: https://tropomi.gesdisc.eosdis.nasa.gov/data//S5P_TROPOMI_Level2/S5P_L2__NO2____HiR.1/2020/003/S5P_OFFL_L2__NO2____20200103T170946_20200103T185116_11525_01_010302_20200105T100506.nc
This is the code I used:
import xarray as xr
import numpy as np
import pandas as pd
tropomi = xr.open_dataset('test2.nc', engine = 'netcdf4')
tropomi
The output, however, does not show any data variables, only 53 global attributes - why is this happening?
Thanks!
I figured it out. When you open the file without a group defined, you get only the global attributes and no variables. You need to include group='PRODUCT' to get the data products, like this:
tropomi = xr.open_dataset('test2.nc', group='PRODUCT')
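As an aside (not part of the original answer), if you are unsure which group names a given product file uses, you can list them with the netCDF4 library directly:

import netCDF4

# List the groups available in the file; for TROPOMI L2 products this
# typically includes 'PRODUCT' and 'METADATA' (assumption).
with netCDF4.Dataset('test2.nc') as ds:
    print(list(ds.groups))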
I am trying to load a large number of data files from the same folder in Python. The ultimate goal here is to simply choose which file I would like to use in calculations, rather than individually opening files.
Here is what I have. This seems to work for opening the data in the files, but I am having a hard time choosing a specific file to work with (and assigning a value to each column in each file).
import astropy
import numpy as np
import matplotlib.pyplot as plt
dir = '/S34_east_tfa/'
import glob, os
os.chdir(dir)
for file in glob.glob("*.data"):
    data = np.loadtxt(file)
    print(data)

Time = data[:, 0]
Use a Python dictionary instead of overwriting the results in the data variable inside your loop.
data_dict = dict()
for file in glob.glob("*.data"):
    data_dict[file] = np.loadtxt(file)
Is this what you were looking for?
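You can then pull out whichever file you want by its filename and split it into columns; a short sketch (the filename below is hypothetical):

# Pick one particular file's array out of the dictionary by name.
data = data_dict['S34_east_001.data']

# Assign each column of that file to its own variable.
Time = data[:, 0]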
I am importing a CSV file in which all of the data loads into a single column (the file has the values separated by ";").
Is there any way to get the data to load into Anaconda (using pandas) so that it is in separate columns, or can it be manipulated afterwards into columns?
The data can be found at the following web-address (this is data about sunspots):
http://www.sidc.be/silso/INFO/snmtotcsv.php
From this website http://www.sidc.be/silso/datafiles
I have managed to do this so far:
#Start by loading the pandas command set
from pandas import *
#Initial setup commands
import warnings
warnings.simplefilter('ignore', FutureWarning)
import matplotlib
matplotlib.rcParams['axes.grid'] = True # show gridlines by default
%matplotlib inline
from scipy.stats import spearmanr
#load data from CSV file
startdata = read_csv('SN_m_tot_V2.0.csv',header=None)
startdata = startdata.reset_index()
I received an answer elsewhere; the lines of code that take into account the lack of column headings AND the separator being a semi-colon are:
colnames=['Year','Month','Year (fraction)','Sunspot number','Std dev.','N obs.','Provisional']
ssdata=read_csv('SN_m_tot_V2.0.csv',sep=';',header=None,names=colnames)
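As a quick check (not part of the original answer), the first few rows and a simple plot confirm the columns parsed as intended, using the column names defined above:

# Peek at the parsed columns and plot the sunspot number against the fractional year.
print(ssdata.head())
ssdata.plot(x='Year (fraction)', y='Sunspot number')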
I have a .dat file that I want to use in my script, which draws a scatter graph from the data in that .dat file. I have been manually converting .dat files to .csv for this purpose, but I find that unsatisfactory.
This is what I am using currently.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
filename=raw_input('Enter filename ')
csv = pd.read_csv(filename)
data=csv[['deformation','stress']]
data=data.astype(float)
x=data['deformation']
y=data['stress']
plt.scatter(x,y,s=0.5)
fit=np.polyfit(x,y,15)
p=np.poly1d(fit)
plt.plot(x,p(x),"r--")
plt.show()
A programmer friend told me it would be more convenient to convert it to JSON and use it as such. How would I go about this?
Try using the numpy read functions:
import numpy as np

# For a binary .dat file, fromfile needs the element dtype spelled out, e.g. float:
yourArray = np.fromfile('YourData.dat', dtype=float)

# For a plain-text .dat file, loadtxt parses it into an array directly:
yourArray = np.loadtxt('YourData.dat')
loadtxt is more flexible than fromfile
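As for the JSON idea in the question, pandas can also read the .dat file directly and write a JSON copy; a sketch, assuming the file is whitespace-delimited text with 'deformation' and 'stress' columns (column names taken from the question):

import pandas as pd

# Read the whitespace-delimited .dat file straight into a DataFrame.
df = pd.read_csv('YourData.dat', sep=r'\s+')

# Optionally keep a JSON copy, one record per row.
df.to_json('YourData.json', orient='records')

# The scatter/fit script can then use df['deformation'] and df['stress'] as before.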