My goal is to create several AreaChart3D plots automatically.
For example, I have the following picture:
This table is output automatically by a tool.
There may be only one graph, two graphs, or even 100 graphs (the number does not matter much). What matters is that the layout is always the same: Location, Speed, and some time values inside.
Now I would like the second sheet (ws2_obj) to contain 4 graphs, or maybe 2, depending on how many the tool outputs.
With a fixed number of graphs this would have been easier.
Because the number of graphs is not fixed, I have to cover the entire sheet, and I do not know how to do that.
There is also another question: how can I handle Depth (% of base) using Python?
import openpyxl as xl
from openpyxl.chart import (
    AreaChart3D,
    Reference,
)
wb_obj = xl.load_workbook('Plots.xlsx')
ws_obj = wb_obj.active
ws2_obj = wb_obj.create_sheet("Graphs")
c1 = AreaChart3D()
c1.legend = None
c1.style = 15
cats = Reference(ws_obj, min_col=1, min_row=7, max_row=200)
data = Reference(ws_obj, min_col=2, min_row=6, max_col=8, max_row=200)
c1.add_data(data, titles_from_data=True)
c1.set_categories(cats)
ws2_obj.add_chart(c1, "A1")
wb_obj.save("Plots.xlsx")
The code above produces only one graph. How should I proceed to create 2, 4, or 100 graphs?
Later edit 1:
I tried something like this and it is almost working:
for i in range(1, 4):
    c1 = AreaChart3D()
    cats = Reference(ws_obj, min_col=1, min_row=7, max_row=200)
    data = Reference(ws_obj, min_col=2, min_row=6, max_col=i * int(step), max_row=200)
    c1.title = ws_obj.cell(row=1, column=i * int(step)).value
    c1.legend = None
    c1.style = 15
    c1.y_axis.title = 'Fire Time'
    c1.x_axis.title = 'Temperature'
    c1.z_axis.title = "Velocity"
    c1.add_data(data, titles_from_data=True)
    c1.set_categories(cats)
    ws2_obj.add_chart(c1, "A2")
The last call, ws2_obj.add_chart(c1, "A2"), seems to be the problematic one.
Instead of "A2" I would like to use something like ws2_obj.add_chart(c1, cell(row=2, column=i)).value, but that does not work.
Later Edit 2
I have observed that to add a chart at a certain cell, you have to use something like: ws2_obj.add_chart(my_chart, "R2")
To use this in the for loop, I tried to find a way to obtain a value such as R2.
Please see below:
import re

my_cells = []
for i in range(1, 4):
    my_cell = ws2_obj.cell(row=1, column=i * int(step) - (int(step) - 1))
    my_cells.append(my_cell)
print("My_Cell:", my_cells)
new_cells = []
for i in my_cells:
    # str(cell) looks like "<Cell 'Graphs'.A1>"; this grabs the ".A1" part
    new_cells.append(re.findall(r"\W\w\d", str(i)))
new_new_cells = []
for i in new_cells:
    new_new_cells.append(i[0])
print("new_new_cells:", new_new_cells)
# Strip the leading non-alphanumeric character, leaving e.g. "A1"
final_list = [re.sub('[^a-zA-Z0-9]+', '', _) for _ in new_new_cells]
print("final list:", final_list)
And the output will be ['A1', 'H1', 'O1']
and then I can output the graph:
for i in range(1, 4):
    c1 = AreaChart3D()
    # my_cell = ws2_obj.cell(row=i, column=i * int(step))
    cats = Reference(ws_obj, min_col=1, min_row=7, max_row=255)
    data = Reference(ws_obj, min_col=2, min_row=6, max_col=i * int(step), max_row=255)
    c1.title = ws_obj.cell(row=1, column=i * int(step)).value
    c1.legend = None
    c1.style = 20
    c1.y_axis.title = 'Time'
    c1.x_axis.title = 'Location'
    c1.z_axis.title = "Velocity"
    c1.add_data(data, titles_from_data=True)
    c1.set_categories(cats)
    c1.x_axis.scaling.max = 75
    c1.y_axis.scaling.max = 50
    c1.z_axis.scaling.max = 25
    ws2_obj.add_chart(c1, str(final_list[i - 1]))
You can create a list of the series data positions (the cell where each data series starts), with one element per series. Iterate over the list, create a chart for each element, and make sure each chart is placed at a unique position.
Example code with comments is below.
import openpyxl as xl
from openpyxl.chart import (
AreaChart3D,
Reference,
)
def create_chart(tl, maxr, hdr, x_ax):
    """
    Creates a standard Area 3D Chart
    """
    cht = AreaChart3D()
    cht.legend = None
    cht.style = 15
    cht.title = hdr + " Chart"
    cht.x_axis.title = x_ax
    cht.y_axis.title = 'Something'  # Some text for the y axis
    data = Reference(ws_obj, min_col=tl[0], min_row=tl[1], max_col=tl[0]+1, max_row=maxr-1)
    cht.add_data(data, titles_from_data=True)
    return cht
## Sheet constants
chart_header = 'Speed' # It is assumed this is located in a merged cell
x_axis_header = 'Location'
series_topleft_header = 25
## Load Workbook and Sheet of Excel with data series
wb_obj = xl.load_workbook('Plots.xlsx')
ws_obj = wb_obj.active
## Get the total used rows in the sheet (end of the series table)
maxrows = ws_obj.max_row
speed_row = ''
speed_col_start = ''
speed_col_end = ''
speed_col_letter = ''
## Get a list of merged cells in the sheet; these contain the headers used for position referencing
merge_list = [m.coord for m in ws_obj.merged_cells.ranges]
## Search for the row with header name 'Speed' to use as reference for series data positioning
for merge_element in ws_obj.merged_cells:
    merge_cell_val = merge_element.start_cell.internal_value
    ## Skip merged cells whose value is not text (e.g. empty or numeric)
    if isinstance(merge_cell_val, str) and merge_cell_val.lower() == chart_header.lower():
        speed_row = merge_element.max_row
        speed_col_start = merge_element.min_col
        speed_col_end = merge_element.max_col
        speed_col_letter = merge_element.start_cell.column_letter
        series_header_row = speed_row + 1
        series1_start = speed_col_letter + str(series_header_row+1)
"""
Obtain the location of the top left cell where the series data exists
This searches the row below the header (containing the text 'Speed') for the first
series header (i.e. 25 in the example) and adds each position to the series_position_list
"""
series_position_list = []
for row in ws_obj.iter_rows(min_row=series_header_row,
                            max_row=series_header_row,
                            min_col=speed_col_start,
                            max_col=speed_col_end):
    for cell in row:
        if cell.value == series_topleft_header:
            series_position_list.append([cell.column, series_header_row])
## Create the Charts
"""
With the series_position_list indicating the top left cell of the series data
and the number of rows in the series determined by maxrows - 1. This data
can be passed to the create_chart function to create the chart.
Charts are placed below the series data table from Column A with two charts
per row. First row for chart location is 2 rows below the series table.
"""
chart_start_row = maxrows + 2
chart_col = 'A'
"""
The series_position_list is used to create 1 chart per series
The chart creation function takes the top left coordinate and max rows along
with Chart header name and x axis header name
"""
for enum, top_left in enumerate(series_position_list, 1):
    chart_obj = create_chart(top_left,
                             maxrows,
                             chart_header + ' ' + str(enum),
                             x_axis_header)
    ## This sets the position the chart will be placed. Based on standard size
    ## of plot area the charts are 16 rows and 10 columns apart
    if enum == 1:
        pass
    elif enum % 2 == 1:
        chart_col = 'A'
        chart_start_row += 16
    else:
        chart_col = 'J'
    ## Adds chart to the Excel sheet
    print(f"Adding chart {chart_header + ' ' + str(enum)} to Excel:")
    print(f"Series Data Start; Row:{str(top_left[1]+1)} Column:{top_left[0]}")
    ws_obj.add_chart(chart_obj, chart_col + str(chart_start_row))
    print("--------------\n")
wb_obj.save("Plots.xlsx")
-----------------Additional Information--------------
add_chart is a method that accepts two arguments: the chart object and, optionally, an anchor point (i.e. the top-left cell where the chart is placed in the sheet). The use of .value at the end of
ws2_obj.add_chart(c1, cell(row=2, column=i)).value
is invalid: you are not entering the result of the method into a cell, you are using the method to add the chart object c1 at position cell(row=2, column=i). Using cell(row=2, column=i) on its own is also invalid, since cell is not defined as a standalone name. You may have meant to use ws2_obj.cell(row=2, column=i) as the anchor. The add_chart method would accept this, but saving the workbook would raise an error when the anchor point is checked, because the anchor is expected to be an "Excel style coordinate", i.e. a string like 'A2' rather than a cell object like ws2_obj.cell(row=2, column=i). Even using (2, 1) would fail the same check.
To set the anchor points I will show two options: all charts on the same row, and X charts across a row with the next X charts starting on the next row, and so on.
Place all charts on the same row;
If you put all charts on the same row, the row coordinate does not change and only the column position needs adjusting for each chart.
You can generate the anchor points as shown below. The example code uses a for loop with 18 elements;
from openpyxl.utils.cell import coordinate_to_tuple
from openpyxl.utils import get_column_letter
anchor = 'A2' # Position of anchor, first anchor point is 'A2'
column_separation = 9 # Number of columns to separate each chart
for i in range(0, 18):
    coord_tuple = coordinate_to_tuple(anchor)
    row = coord_tuple[0]
    col_offset = column_separation if i > 0 else 0
    col_new = get_column_letter(coord_tuple[1] + col_offset)
    anchor = f'{col_new}{row}'
    print(f'Adding chart at Anchor point {anchor}')
    ws2_obj.add_chart(c1, anchor)
This will put the charts at the following anchor points;
A2, J2, S2, AB2, AK2, AT2, BC2, BL2, BU2, CD2, CM2, CV2, DE2, DN2, DW2, EF2, EO2, EX2
Place the charts in a pattern;
Placing the charts in a pattern of rows and columns is similar to the previous code. However, when the number of charts across a row reaches your limit, the 'row' value has to change and the column resets back to 'A'.
The example code again uses a for loop with 18 elements and splits the charts into rows of max_chart_row charts, set to 5 in this case;
from openpyxl.utils.cell import coordinate_to_tuple
from openpyxl.utils import get_column_letter
anchor = 'A2'
column_separation = 9
max_chart_row = 5
for i in range(0, 18):
    coord_tuple = coordinate_to_tuple(anchor)
    row = coord_tuple[0]
    col_offset = column_separation if i > 0 else 0
    # When the number of charts across the row is reached, set the row to 16 more than the current
    # and reset the column offset to 0
    if i % (max_chart_row) == 0 and i != 0:
        row = row + 16
        col_offset = 0
        col_new = get_column_letter(col_offset+1)
    else:
        col_new = get_column_letter(coord_tuple[1] + col_offset)
    anchor = f'{col_new}{row}'
    print(f'Adding chart at Anchor point {anchor}')
    ws2_obj.add_chart(c1, anchor)
This will put the charts at the following anchor points;
A2, J2, S2, AB2, AK2,
A18, J18, S18, AB18, AK18,
A34, J34, S34, AB34, AK34,
A50, J50, S50
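As an alternative to updating the anchor string on every pass, the same grid can be computed directly from the chart index. The sketch below is only an illustration under stated assumptions: column_letter and grid_anchor are hypothetical helper names, not part of openpyxl, and column_letter reimplements what openpyxl's get_column_letter already provides purely so the snippet runs without a workbook. The spacing values (9 columns, 16 rows, 5 charts per row, first row 2) are taken from the example above.

```python
def column_letter(index):
    """Convert a 1-based column index to Excel letters (1 -> 'A', 28 -> 'AB').

    Hypothetical stand-in for openpyxl.utils.get_column_letter.
    """
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def grid_anchor(chart_index, per_row=5, col_sep=9, row_sep=16, first_row=2):
    """Return the anchor string for the chart with the given 0-based index."""
    # divmod splits the index into (which grid row, which position in that row)
    grid_row, grid_col = divmod(chart_index, per_row)
    return f"{column_letter(1 + grid_col * col_sep)}{first_row + grid_row * row_sep}"


anchors = [grid_anchor(i) for i in range(18)]
print(anchors)
```

Because each anchor depends only on the index, a loop over the charts could pass grid_anchor(i) straight to add_chart without tracking the previous anchor.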
I receive time series data from a broker and want to implement condition monitoring on it. I want to analyze the data in a window of size 10, and the window size must always stay the same. When the 11th value arrives, I need to check it against two thresholds calculated from the 10 values inside the window. If the 11th value is an outlier, I must discard it; if it is within range, I must delete the first element and append the 11th value to the end of the list. This way the size of the window stays the same. The code is simplified. Data arrives every second.
temp_list = []
window_size = 10
if len(temp_list) <= window_size :
    temp_list.append(data)
if len(temp_list) == 10:
    avg = statistics.mean(temp_list)
    std = statistics.stdev(temp_list)
    u_thresh = avg + 3*std
    l_thresh = avg - 3*std
    temp_list.append(data)
    if temp_list[window_size] < l_thresh or temp_list[window_size] > u_thresh:
        temp_list.pop(-1)
    else:
        temp_list.pop(0)
        temp_list.append(data)
With this code the list does not get updated: the 11th value is stored and then no new data is added. I don't know how to implement this correctly. Sorry if it is a simple question; I am still not very comfortable with Python lists. Thank you for any hint or help.
As your code currently stands, if the last data point is to be kept you add it twice. You can simplify the code to make it clearer and more straightforward.
##First setup your imports and initial variables
import statistics

temp_list = []
window_size = 10
Then -
while True:
    data = get_data()  ## Placeholder (hypothetical function): generate/get data here
    ## If less than 10 data points add them to list
    if len(temp_list) < window_size:
        temp_list.append(data)
    ## If already at 10 check if it is within the needed range
    else:
        avg = statistics.mean(temp_list)
        std = statistics.stdev(temp_list)
        u_thresh = avg + 3*std
        l_thresh = avg - 3*std
        ## If within range add point to end of list and remove first element
        if l_thresh <= data <= u_thresh:
            temp_list.pop(0)
            temp_list.append(data)
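The same fixed-size window could also be sketched with collections.deque, which drops the oldest element automatically when maxlen is set. This is a minimal sketch under assumptions: update_window is a hypothetical helper name, the readings are made-up sample values, and the function relies on the caller creating the deque with maxlen equal to the window size.

```python
import statistics
from collections import deque


def update_window(window, value, size=10, k=3.0):
    """Store value in the window unless it is an outlier.

    Returns True if the value was stored, False if it was rejected.
    Assumes `window` was created as deque(maxlen=size).
    """
    if len(window) < size:
        window.append(value)  # still filling the first window
        return True
    avg = statistics.mean(window)
    std = statistics.stdev(window)
    if avg - k * std <= value <= avg + k * std:
        window.append(value)  # maxlen makes the deque discard the oldest value
        return True
    return False  # outlier: window is left unchanged


window = deque(maxlen=10)
# Made-up sample readings; 100 is far outside mean +/- 3 standard deviations
for reading in [10, 11, 9, 10, 12, 10, 9, 11, 10, 10, 100, 11]:
    update_window(window, reading)
print(list(window))
```

With a deque there is no separate pop(0) step, so the "added twice" mistake from the question cannot occur.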
I have a python code that is collecting information from a dataframe (df1) like this
for ind, data in enumerate(df1.Link):
    print(data)
    result = getInformation(driver, links)
    for i in result['information']:
        df1.loc[ind, "numOfWorkers"] = i["numOfWorkers"]
The output is saved to a dataframe like the one shown in the photo:
Is there any way to update my code so that, before it returns the dataframe, it applies this condition:
if numOfWorkers >= 30, then once 2 links meet this condition, the code breaks and returns the result?
Can someone help?
It would be better to put the logic in the code you already have. I would keep a count of the number of records matching the conditions, then exit out of the loop using break (rather than a while loop):
...
workers_threshold = 30
records_matching_threshold = 0
max_records_for_matching_records = 2
for i in result['information']:
    df1.loc[ind, "numOfWorkers"] = i["numOfWorkers"]
    # Count records that meet the condition (>= 30, as stated in the question)
    if i["numOfWorkers"] >= workers_threshold:
        records_matching_threshold += 1
    # Stop once the required number of matching records has been reached
    if records_matching_threshold >= max_records_for_matching_records:
        break
Note that the variable names above are purposefully long to make their purposes clear in my example.
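Since the question loops over links and then over each result, the counter has to survive across links and the outer loop also has to stop. The sketch below shows one way this might look without pandas or a web driver. It is an illustration under assumptions: collect_until, fetch and fake_pages are hypothetical names, and fake_pages stands in for whatever getInformation returns.

```python
def collect_until(links, fetch, threshold=30, max_matches=2):
    """Collect numOfWorkers per link, stopping after max_matches links qualify."""
    results = {}
    matches = 0
    for link in links:
        for record in fetch(link)["information"]:
            results[link] = record["numOfWorkers"]
            if record["numOfWorkers"] >= threshold:
                matches += 1
        # Checked in the outer loop so that processing of further links stops too
        if matches >= max_matches:
            break
    return results


# Made-up data shaped like the result dictionary used in the question
fake_pages = {
    "link-a": {"information": [{"numOfWorkers": 12}]},
    "link-b": {"information": [{"numOfWorkers": 45}]},
    "link-c": {"information": [{"numOfWorkers": 30}]},
    "link-d": {"information": [{"numOfWorkers": 80}]},
}
collected = collect_until(list(fake_pages), fake_pages.get)
print(collected)
```

Here the fourth link is never fetched, because the second match is found at the third link.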
i = 2
while i != 0:
    if numOfWorkers >= 30:
        i -= 1
I am trying to filter a CSV file: I need to store the prices of different commodities that are > 1000 in separate arrays. I can get the values for one commodity correctly, but the arrays for the other commodities are just duplicates of the first one.
The CSV file looks like the figure below:
CODE
import matplotlib.pyplot as plt
import csv
import pandas as pd
import numpy as np
# csv file name
filename = "CommodityPrice.csv"
# List gold price above 1000
gold_price_above_1000 = []
palladiun_price_above_1000 = []
gold_futr_price_above_1000 = []
cocoa_future_price_above_1000 = []
df = pd.read_csv(filename)
commodity = df["Commodity"]
price = df['Price']
for gold_price in price:
    if (gold_price <= 1000):
        break
    else:
        for gold in commodity:
            if ('Gold' == gold):
                gold_price_above_1000.append(gold_price)
                break
for palladiun_price in price:
    if (palladiun_price <= 1000):
        break
    else:
        for palladiun in commodity:
            if ('Palladiun' == palladiun):
                palladiun_price_above_1000.append(palladiun_price)
                break
for gold_futr_price in price:
    if (gold_futr_price <= 1000):
        break
    else:
        for gold_futr in commodity:
            if ('Gold Futr' == gold_futr):
                gold_futr_price_above_1000.append(gold_futr_price)
                break
for cocoa_future_price in price:
    if (cocoa_future_price <= 1000):
        break
    else:
        for cocoa_future in commodity:
            if ('Cocoa Future' == cocoa_future):
                cocoa_future_price_above_1000.append(cocoa_future_price)
                break
print(gold_price_above_1000)
print(palladiun_price_above_1000)
print(gold_futr_price_above_1000)
print(cocoa_future_price_above_1000)
plt.ylim(1000, 3000)
plt.plot(gold_price_above_1000)
plt.plot(palladiun_price_above_1000)
plt.plot(gold_futr_price_above_1000)
plt.plot(cocoa_future_price_above_1000)
plt.title('Commodity Price(>=1000)')
y = np.array(gold_price_above_1000)
plt.ylabel("Price")
plt.show()
print("SUCCESS")
Here is my question in detail:
Please use pandas and matplotlib to sort the data in the CSV file and plot the sorted data as charts. The expected output is shown in the following figures.
Figure 1, upper plot: take all products in the CSV with Price >= 1000, mark all their prices in April and May, and draw them as a line graph. The year must be removed from the dates in the output. Each line's label must be marked and displayed. The chart title and the x-axis and y-axis titles must be set. The y-axis range is 1000 to 3000, and the line color is not specified.
Figure 1, lower plot: take all products in the CSV with Price >= 1000, mark their Change% in April and May, and draw them as a line graph with point markers. The markers must use a style other than '.' and 'o', and the lines must use a style other than solid. The year must be removed from the dates in the output. Each line's label must be marked and displayed. The chart title and the x-axis and y-axis titles must be set. Grid lines must be added, the y-axis range is -15 to +15, and the line color is not specified.
Figure 2, upper and lower plots: the condition changes to 1000 > Price >= 500. The other conditions are basically the same as in Figure 1, except that the markers and lines in the lower plot of Figure 2 must use styles different from those in Figure 1.
The two plots of Figure 1 must be displayed in the same window, and likewise the two plots of Figure 2.
All of your blocks of code are doing the exact same thing. Changing the name of the iterator doesn't change what it does.
for gold_price in price:
for palladiun_price in price:
for gold_futr_price in price:
for cocoa_future_price in price:
Each of these loops goes through the exact same data. You haven't subsetted for specific commodities.
Using the break statement in that loop doesn't make sense either. It should be a pass.
Basically, for every number above 1000, you iterate through your entire Commodity column and add that number to the list every time you see a specific commodity.
Read up on how to index and select data in pandas.
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html
gold_price_above_1000 = df[(df.Commodity=='Gold') & (df.Price>1000)]
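To get one list per commodity without repeating that line four times, the filtered frame could be grouped by commodity. The following is a sketch under assumptions: the small DataFrame is made-up sample data that only mirrors the Commodity and Price column names from the question, and prices_by_commodity is a hypothetical variable name.

```python
import pandas as pd

# Made-up sample rows; only the column names come from the question
df = pd.DataFrame(
    {
        "Commodity": ["Gold", "Palladiun", "Gold", "Cocoa Future", "Palladiun"],
        "Price": [1700.5, 2100.0, 1710.0, 950.0, 2150.0],
    }
)

# Keep only rows above the threshold, then split the prices by commodity
above_1000 = df[df["Price"] > 1000]
prices_by_commodity = {
    name: group["Price"].tolist() for name, group in above_1000.groupby("Commodity")
}
print(prices_by_commodity)
```

Each list in the dictionary can then be passed to plt.plot with the commodity name as its label.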
I need to handle some hourly weather data from CSV files with 8,760 values per column. For example, I need to plot a histogram of the longest continuous calms in wind speed, meaning periods below 3 m/s.
I have already created a histogram of the wind speed distribution, but this one is much harder. I need some kind of loop that counts the consecutive hours below 3 m/s, groups them together, and plots them at the end.
My idea is a loop that asks of every value "less than 3?". If yes, it creates a new calm and continues until the answer is no, then finishes the calm, and so on. In the end there should be a lot of calms lasting from one hour to approx. 48 hours. The output is a histogram of these calms sorted by frequency.
I didn't expect somebody to write the code for me, sorry if it seems like that. I just asked for an idea, but I think I almost have it.
Here is my code so far. It should create a vector for every calm and put it into a dictionary. It works, but every key is filled with the same vector and I'm not sure how to fix this. (The vector itself is fine: it starts at <= 3 and counts until >= 3.)
#read column v_wind
saved_column = df.v_wind
fig, ax = plt.subplots()
#collecting vectors in empty dictionary
# array range 100
vector_coll = {}
a = np.array(range(100))
#for loop create vector
#set calm to zero
#i = calm vectors
#b = empty array
calm = 0
i = -1
b = []
for t in range(0, 8760, 1):
    if df.v_wind[t] <= 3:
        if calm == 0:
            b = []
            b = np.append(b, [df.v_wind[t]])
            calm = 1
        else:
            b = np.append(b, [df.v_wind[t]])
    else:
        calm = False
        calm = 0
        i = i + 1
        for i in np.array(range(100)):
            vector_coll[str(a[i])] = b
#print(vector_coll.keys())
#print(vector_coll['1'])
for i in vector_coll.keys():
    if vector_coll[i] == []:
        print('empty')
    else:
        print('full')
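The run-counting idea described in the question (ask each hour "less than 3?", extend the current calm while the answer is yes) could be sketched with itertools.groupby, which groups consecutive values that give the same answer. This is a minimal sketch under assumptions: calm_lengths is a hypothetical helper name, the speeds are made-up values, and the strict "< 3" comparison follows the prose of the question rather than the "<= 3" used in its code.

```python
from collections import Counter
from itertools import groupby


def calm_lengths(speeds, threshold=3.0):
    """Return the length in hours of every consecutive run below threshold."""
    return [
        sum(1 for _ in run)
        for is_calm, run in groupby(speeds, key=lambda v: v < threshold)
        if is_calm
    ]


# Made-up hourly wind speeds in m/s
speeds = [1.0, 2.5, 4.0, 0.5, 0.7, 2.9, 5.0, 3.0, 1.2]
lengths = calm_lengths(speeds)
histogram = Counter(lengths)  # maps calm length -> number of occurrences
print(lengths, histogram)
```

The lengths list could be passed directly to a histogram plot, so no dictionary of vectors is needed.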