I have 50 different folders (numbered 0 to 50), each containing the data I want to plot in files named data_A, data_B, data_C and data_D.
How do I iterate through the 50 folders, collect the data, apply a numerical operation, and append the output (V) to a list?
The final goal is to make a boxplot of (V) for each folder.
I hope this attempt at code helps in understanding my aim:
import numpy as np

directory = '/path_to_data/'
folders = range(51)  # folder names 0 .. 50

DATA = []
NAMES = []
for i in folders:
    A = np.genfromtxt(directory + str(i) + '/data_A.dat')
    B = np.genfromtxt(directory + str(i) + '/data_B.dat')
    C = np.genfromtxt(directory + str(i) + '/data_C.dat')
    D = np.genfromtxt(directory + str(i) + '/data_D.dat')
    V = (A + B + C + D) / 4  # average of the four data sets
    DATA.append(V)
    NAMES.append('folder' + str(i))
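For the final boxplot, I imagine something like this minimal matplotlib sketch (assuming DATA and NAMES are collected as above):

import matplotlib.pyplot as plt

# one box per folder, labelled with the folder name
plt.boxplot(DATA, labels=NAMES)
plt.xticks(rotation=90)
plt.ylabel('V')
plt.show()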
Thanks!
I really need help from Networkx/graph experts.
Let's say I have the following data frames, which I would like to convert to graphs. Then I would like to map the two graphs by corresponding nodes, based on the description and priority attributes.
df1
From    description    To        priority
10      Start          20, 50    1
20      Left           40        2
50      Bottom         40        2
40      End            -         1
df2
From    description    To        priority
60      Start          70, 80    1
70      Left           80, 90    2
80      Left           100       2
90      Bottom         100       2
100     End            -         1
I have converted the two data frames and created a graph for each (g1 and g2).
Now I am trying to match nodes based on their description and priority, with each node mapped only once: for example 10/60, 40/100 and 50/90, but not 20/70, 20/80 and 70/80. Node 20 has three candidate mappings, which is not what I want. I would like each node to be mapped only once; otherwise the candidates should be merged into a single node and marked red to differentiate them.
"A node should only be mapped once" means, for example: if I want to map node 10, it has priority 1 and description Start in the first graph, and the only node in the second graph with the same priority and description is 60. But node 20 in the first graph has priority 2 and description Left, and the second graph has two nodes with priority 2 and description Left, namely 70 and 80. This creates ambiguity: I cannot map 20 twice, as 20/70 and 20/80, so I would rather merge them into a single node as shown below in the sample graph.
I am expecting the following result.
To get the above result, I tried the following Python code.
mapped_list = []
for node_1, data_1 in g1.nodes(data=True):
    for node_2, data_2 in g2.nodes(data=True):
        if (g1.nodes[node_1]['priority'] == g2.nodes[node_2]['priority']
                and g1.nodes[node_1]['description'] == g2.nodes[node_2]['description']):
            # check if the node already exists in mapped_list
            if (node_1 in mapped_list) and (node_2 in mapped_list):
                pass
            else:
                name = str(node_1) + '/' + str(node_2)
                mapped_list.append((data_1["priority"], data_1["description"], node_1, name))
                mapped_list.append((data_2["priority"], data_2["description"], node_2, name))
Can anyone help me achieve the result shown in the figure/graph above? Any help is appreciated.
The way I'd go about this instead is to build a new graph by taking the nx.union of both graphs, and then "combine" the pairs of nodes that share attributes using nx.contracted_nodes.
Let's start by creating both graphs from the dataframes:
import networkx as nx
import pandas as pd

df1 = df1.drop(columns='To').join(df1.To.str.replace(' ', '').str.split(',').explode())
df2 = df2.drop(columns='To').join(df2.To.str.replace(' ', '').str.split(',').explode())

g1 = nx.from_pandas_edgelist(df1.iloc[:-1, [0, 3]].astype(int),
                             source='From', target='To', create_using=nx.DiGraph)
g2 = nx.from_pandas_edgelist(df2.iloc[:-1, [0, 3]].astype(int),
                             source='From', target='To', create_using=nx.DiGraph)
df1_node_ix = df1.assign(graph='graph1').set_index('From').rename_axis('nodes')
nx.set_node_attributes(g1, values=df1_node_ix.description.to_dict(),
                       name='description')
nx.set_node_attributes(g1, values=df1_node_ix.priority.to_dict(),
                       name='priority')
nx.set_node_attributes(g1, values=df1_node_ix.graph.to_dict(),
                       name='graph')

df2_node_ix = df2.assign(graph='graph2').set_index('From').rename_axis('nodes')
nx.set_node_attributes(g2, values=df2_node_ix.description.to_dict(),
                       name='description')
nx.set_node_attributes(g2, values=df2_node_ix.priority.to_dict(),
                       name='priority')
nx.set_node_attributes(g2, values=df2_node_ix.graph.to_dict(),
                       name='graph')
Now by taking the nx.union of both graphs, we have:
g3 = nx.union(g1, g2)

import matplotlib.pyplot as plt
from networkx.drawing.nx_agraph import graphviz_layout

plt.figure(figsize=(8, 5))
pos = graphviz_layout(g3, prog='dot')
nx.draw(g3, pos=pos,
        with_labels=True,
        node_size=1500,
        node_color='red',
        arrowsize=20)
What we can do now is come up with a data structure that lets us easily combine the pairs of nodes sharing attributes. For that we can sort the nodes by their description; sorting enables us to use itertools.groupby to group consecutive equal pairs of nodes, which we can then combine with nx.contracted_nodes and write back over the same graph. The nodes can be relabeled as specified in the question with nx.relabel_nodes:
from itertools import groupby

g3_node_view = g3.nodes(data=True)
# sort by description and priority so equal pairs end up consecutive
sorted_by_descr = sorted(g3_node_view,
                         key=lambda x: (x[1]['description'], x[1]['priority']))

node_colors = dict()
colors = {'Bottom': 'saddlebrown', 'Start': 'lightblue',
          'Left': 'green', 'End': 'lightblue'}
all_graphs = {'graph1', 'graph2'}

for _, grouped_by_descr in groupby(sorted_by_descr,
                                   key=lambda x: x[1]['description']):
    for _, group in groupby(grouped_by_descr, key=lambda x: x[1]['priority']):
        grouped_nodes = list(group)
        nodes = [i[0] for i in grouped_nodes]
        graphs = {i[1]['graph'] for i in grouped_nodes}
        # check if there are two nodes that share attributes
        # and both belong to different graphs
        if len(nodes) == 2 and graphs == all_graphs:
            # contract both nodes and update the graph
            g3 = nx.contracted_nodes(g3, *nodes)
            # define the new contracted node name and relabel
            new_node = '/'.join(map(str, nodes))
            g3 = nx.relabel_nodes(g3, {nodes[0]: new_node})
            node_colors[new_node] = colors[grouped_nodes[0][1]['description']]
        else:
            for node in nodes:
                node_colors[node] = 'red'
Which would give:
plt.figure(figsize=(10, 7))
pos = graphviz_layout(g3, prog='dot')
nx.draw(g3, pos=pos,
        with_labels=True,
        node_size=2500,
        nodelist=list(node_colors.keys()),
        node_color=list(node_colors.values()),
        arrowsize=20)
This is my code:
# assumed imports for the helpers used below (biosppy plus pyhrv's
# tools, frequency_domain and time_domain modules)
import numpy as np
import biosppy
import pyhrv.tools as tools
import pyhrv.frequency_domain as fd
import pyhrv.time_domain as td

output_data = []
out = ''
i = 0
P = 500
X = 40000
while i < 600:
    subVals = values[i:i + X]
    signal = subVals.val1
    signal, rpeaks = biosppy.signals.ecg.ecg(signal, show=False)[1:3]
    rpeaks = rpeaks.tolist()
    nni = tools.nn_intervals(rpeaks)
    fre = fd.welch_psd(nni)
    tm = td.nni_parameters(nni)
    f1 = fre['fft_peak']
    t1 = tm['nni_min']
    f11 = np.asarray(f1)
    t11 = np.asarray(t1)
    input_t = np.append(f11, t11)
    output_t = subVals.BLEEDING
    output_t = int(round(np.mean(output_t)))
    i += P
As you can see, we are in a loop, and the goal here is to create a data frame or a CSV file from input_t and output_t. Here is an example of their values in one iteration:
input_t
array([2.83203125e-02, 1.21093750e-01, 3.33984375e-01, 8.17000000e+02])
output_t
0
I am trying to create a matrix where each row holds one iteration's input_t in the first four columns and output_t in the last column. Based on the code, since i must stay below 600, starts at 0 and steps by P = 500, the loop runs twice, which makes 2 rows in total and 5 columns (4 values from input_t and 1 value from output_t). I tried append, and I tried something like out += ",", but I am not sure why that is not working.
Initialize a variable as a list before the loop and append each result to it:
out = []
while i < 600:
    ....
    input_t = np.append(f11, t11)
    output_t = subVals.BLEEDING
    output_t = int(round(np.mean(output_t)))
    # note: input_t is a numpy array, so input_t + [output_t] would add
    # element-wise; build the row as a plain list instead
    out.append(list(input_t) + [output_t])
Now out is a list of lists, which you can load into a DataFrame.
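A minimal sketch of that last step, loading out into a DataFrame and writing a CSV (the column names here are hypothetical placeholders):

import pandas as pd

# four feature columns from input_t plus the label from output_t
df_out = pd.DataFrame(out, columns=['f1', 'f2', 'f3', 'nni_min', 'bleeding'])
df_out.to_csv('features.csv', index=False)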
I need help writing code for a work project. I have written a script that uses pandas to read an Excel file, and a while-loop that iterates through each row and appends the latitude/longitude data onto a map (Folium, OpenStreetMap).
The issue I've run into has to do with the GPS data. I download a CSV file with vehicle coordinates. On some of the vehicles I'm tracking, the GPS loses signal for whatever reason and doesn't come back online for hundreds of miles. This causes issues when I'm using line plots to track the vehicle movement on the map: I end up getting long straight lines running across cities, since Folium tries to connect the last GPS coordinate before the vehicle went offline with the next coordinate available once the vehicle is back online, which could be hundreds of miles away, as shown here. I think that every time the script finds a gap in the GPS coords, it could start a completely new line plot and append it to the existing map. That way I would still see the entire vehicle route on the map, but without the long lines connecting broken points.
My idea is to have my script calculate the absolute difference between consecutive longitude values. If the difference between two points is greater than 0.01, I want my program to end the current loop and start a new one, initializing new variables. I will not know how many new loops are needed, since there's no way to predict how many times the GPS will go offline/online in each vehicle.
https://gist.github.com/tapanojum/81460dd89cb079296fee0c48a3d625a7
import folium
import pandas as pd

# Pulls CSV file from this location and adds headers to the columns
df = pd.read_csv('Example.CSV', names=['Longitude', 'Latitude'])

lat = (df.Latitude / 10 ** 7)  # Converting Lat/Lon into decimal degrees
lon = (df.Longitude / 10 ** 7)

zoom_start = 17  # Zoom level and starting location when map is opened
mapa = folium.Map(location=[lat[1], lon[1]], zoom_start=zoom_start)

i = 0
j = (lat[i] - lat[i - 1])
location = []
while i < len(lat):
    if abs(j) < 0.01:
        location.append((lat[i], lon[i]))
        i += 1
    else:
        break
# This section is where additional loops would ideally be generated
# Line plot settings
c1 = folium.MultiPolyLine(locations=[location], color='blue', weight=1.5, opacity=0.5)
c1.add_to(mapa)
mapa.save(outfile="Example.html")
Here's pseudocode for how I want to accomplish this:
1) Python reads the csv
2) Converts Long/Lat into decimal degrees
3) Inits location1
4) Runs a while loop to append coords
5) If abs(j) >= 0.01, breaks the loop
6) Inits location(2,3,...)
7) Generates a new while i < len(lat): loop using location(2,3,...)
8) Repeats steps 5-7 while i < len(lat) (as many times as there are instances of abs(j) >= 0.01)
9) Creates (c1, c2, c3,...) = folium.MultiPolyLine(locations=[location], color='blue', weight=1.5, opacity=0.5) for each location variable
10) Creates c1.add_to(mapa) for each c1, c2, c3... listed above
11) mapa.save
Any help would be tremendously appreciated!
UPDATE:
Working Solution
import folium
import pandas as pd

# Pulls CSV file from this location and adds headers to the columns
df = pd.read_csv('EXAMPLE.CSV', names=['Longitude', 'Latitude'])

lat = (df.Latitude / 10 ** 7)  # Converting Lat/Lon into decimal degrees
lon = (df.Longitude / 10 ** 7)

zoom_start = 17  # Zoom level and starting location when map is opened
mapa = folium.Map(location=[lat[1], lon[1]], zoom_start=zoom_start)

i = 1
location = []
while i < (len(lat) - 1):
    location.append((lat[i], lon[i]))
    i += 1
    j = (lat[i] - lat[i - 1])
    if abs(j) > 0.01:
        # gap detected: draw the segment collected so far and start a new one
        c1 = folium.MultiPolyLine(locations=[location], color='blue', weight=1.5, opacity=0.5)
        c1.add_to(mapa)
        location = []

# draw the final segment, which does not end on a gap
if location:
    folium.MultiPolyLine(locations=[location], color='blue', weight=1.5, opacity=0.5).add_to(mapa)

mapa.save(outfile="Example.html")
Your while loop looks wonky: you only set j once, outside the loop. Also, I think you want a list of line segments. Did you want something like this:
i = 0
segment = 0
locations = []
while i < len(lat):
    locations.append([])  # start a new segment
    # add points to the current segment until all are
    # consumed or a disconnect is detected
    while i < len(lat):
        locations[segment].append((lat[i], lon[i]))
        i += 1
        # guard the look-ahead so the last point doesn't index past the end
        if i < len(lat) and abs(lat[i] - lat[i - 1]) > 0.01:
            break
    segment += 1
When this is done, locations will be a list of segments, e.g.:
[ segment0, segment1, ..... ]
and each segment will be a list of points, e.g.:
[ (lat,lon), (lat,lon), ..... ]
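As a usage sketch, the segments could then be drawn and saved in one go (this assumes the mapa object and the MultiPolyLine call from the question, which accepts a list of segments):

# each inner list becomes its own polyline, so gaps are not bridged
c1 = folium.MultiPolyLine(locations=locations, color='blue', weight=1.5, opacity=0.5)
c1.add_to(mapa)
mapa.save(outfile="Example.html")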
I need to handle some hourly weather data from CSV files with 8,760 values per column. For example, I need to plot a histogram of the longest coherent calms of wind speed, meaning periods with less than 3 m/s.
I have already created a histogram of the wind speed distribution, but this one is much harder. I need some kind of routine that counts the consecutive hours below 3 m/s, collects those runs, and plots them in the end.
My idea is a routine that asks every value "less than 3?"; if yes, it starts a new calm and continues until the answer is no, then finishes the calm, and so on. In the end there should be a lot of calms, from one hour up to approx. 48 hours. The output is a histogram of these calms sorted by frequency.
I didn't expect somebody to write the code for me, sorry if it seemed like that. I just asked for an idea, but I think I've almost got it.
Here is my code so far. It should create a vector for every calm and put it into a dictionary. It works, but every key is filled with the same vector and I'm not sure how to fix this. (The vector itself is fine: it starts when a value is <= 3 and counts until a value is > 3.)
import numpy as np
import matplotlib.pyplot as plt

# read column v_wind
saved_column = df.v_wind

fig, ax = plt.subplots()

# collect calm vectors in an empty dictionary, keys taken from an array of range 100
vector_coll = {}
a = np.array(range(100))

# calm = flag (inside a calm?), i = index of the current calm, b = current calm values
calm = 0
i = -1
b = []
for t in range(0, 8760, 1):
    if df.v_wind[t] <= 3:
        if calm == 0:
            b = []
            b = np.append(b, [df.v_wind[t]])
            calm = 1
        else:
            b = np.append(b, [df.v_wind[t]])
    else:
        calm = 0
        i = i + 1
        # suspected problem: this inner loop assigns the current b to all
        # 100 keys, so every key ends up holding the same (last) vector
        for i in np.array(range(100)):
            vector_coll[str(a[i])] = b

# print(vector_coll.keys())
# print(vector_coll['1'])
for key in vector_coll.keys():
    if len(vector_coll[key]) == 0:
        print('empty')
    else:
        print('full')
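For reference, a hedged sketch of one way to store each calm under its own key and then build the histogram of calm durations (assuming the same df.v_wind column as above):

import numpy as np
import matplotlib.pyplot as plt

vector_coll = {}
calm_values = []  # values of the calm currently being collected

for t in range(8760):
    if df.v_wind[t] <= 3:
        calm_values.append(df.v_wind[t])
    elif calm_values:
        # the calm has ended: store its vector under its own key and reset
        vector_coll[str(len(vector_coll))] = np.array(calm_values)
        calm_values = []

if calm_values:  # close a calm that runs to the end of the year
    vector_coll[str(len(vector_coll))] = np.array(calm_values)

# duration of each calm in hours = length of its stored vector
durations = [len(v) for v in vector_coll.values()]
plt.hist(durations, bins=range(1, 50))
plt.xlabel('calm duration (hours)')
plt.ylabel('frequency')
plt.show()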
I'm trying to get the uvIndex for all the lat/lon points present in a grib2 file.
This is the link from where I'm getting the file. The problem is that I'm not able to understand the structure of the file well enough to get the data out. I'm using pygrib to read the file.
Here's the code I've tried:
import pygrib

grbs = pygrib.open('uv.t12z.grbf01.grib2')
grb = grbs.select(name='UV index')[0]
print(grb.data(23.5, 55.5))
What I'm trying to achieve is either to iterate over all the lat/lons and print the corresponding uvIndex value, or to enter a lat/lon and get the corresponding value. I read the pygrib docs but couldn't find any suitable command that serves my purpose. Please help.
You have to iterate through the GRIB file and find the desired record, then get the data, like here:

for g in grbs:
    # print info about all GRIB records and check names
    print(g.shortName, g.typeOfLevel, g.level)
    if g.shortName == shortName and g.typeOfLevel == typeOfLevel and g.level == level:
        tmp = np.array(g.values)
        # now work with tmp as a numpy array
To get the lat and lon arrays use lt, ln = g.latlons(), where g is an element of grbs.
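Building on that, a hedged sketch of the question's second goal (look up the value at a given lat/lon) by taking the nearest grid point; the target coordinates are the example values from the question:

import numpy as np

lats, lons = grb.latlons()   # 2-D arrays with the grid coordinates
vals = np.array(grb.values)  # same shape as lats and lons

target_lat, target_lon = 23.5, 55.5

# squared distance from every grid point to the target,
# then the indices of the nearest point
dist = (lats - target_lat) ** 2 + (lons - target_lon) ** 2
iy, ix = np.unravel_index(np.argmin(dist), dist.shape)
print(vals[iy, ix], lats[iy, ix], lons[iy, ix])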
Read the examples in the Python section at https://software.ecmwf.int/wiki/display/GRIB/GRIB+API+examples (pygrib uses this library to read GRIB).
The fastest way to get data from a large GRIB file is to build an index:

# use the attributes you want to build the index on
indx = pygrib.index(gribfile, 'typeOfLevel', 'level', 'parameterName')

# important: msg is an array and may have more than one record
# get U wind component at 10 m above ground
msg = indx.select(level=10, typeOfLevel="heightAboveGround",
                  parameterName="U U-component of wind m s**-1")
u10 = np.array(msg[0].values)

# get V wind component at 10 m above ground
msg = indx.select(level=10, typeOfLevel="heightAboveGround",
                  parameterName="V V-component of wind m s**-1")
v10 = np.array(msg[0].values)
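As a small usage example, the two component arrays can then be combined into the 10 m wind speed:

# wind speed magnitude from the U and V components
ws10 = np.sqrt(u10 ** 2 + v10 ** 2)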