This is a bit of an open question, and any feedback would be appreciated. Essentially, I want to create the chart below (possibly in Highcharts, highcharter, Python, or just R). What is the name of this specific chart (a ranked heatmap)?
The issue I keep running into on each of my attempts is that there is no fixed y-axis. As the chart shows, each year is ranked from best-performing asset to worst, with each asset having a specific colour.
I have tried to create a heatmap, but because the y-axis is still fixed, the ranked aspect does not work. Below is a draft version of what I tried to create in highcharter.
JS fiddle reference: https://www.highcharts.com/demo/heatmap
So, if someone can point me in the right direction or share their thoughts on creating a chart like the first one, that would be useful.
Thank you in advance.
You can simply calculate the y value based on your data, assuming you have a data format similar to the one below:
const columnsData = [{
year: '2001',
data: [{
name: 'A',
value: 55
}, {
name: 'B',
value: 45
}, ...]
}, {
year: '2002',
data: [...]
}, {
year: '2003',
data: [...]
}];
You can loop through data and build data structure required by Highcharts:
const processedData = [];

columnsData.forEach(column => {
    // Sort each year's assets so that the index (used as y) reflects the rank
    column.data.sort((a, b) => a.value - b.value);
    column.data.forEach((dataEl, index) => {
        processedData.push({
            name: column.year,
            y: index,
            value: dataEl.value,
            dataLabels: {
                format: dataEl.name + ': ' + dataEl.value
            }
        });
    });
});
Highcharts.chart('container', {
...,
series: [{
data: processedData,
...
}]
});
Live demo: https://jsfiddle.net/BlackLabel/jkzsbv4c/
API Reference: https://api.highcharts.com/highcharts/series.heatmap.data
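If you would rather prepare the data in Python (one of the options mentioned in the question), the same ranking step could look like the sketch below; the sample values are made up purely for illustration.
import operator

# Sample data in the same shape as columnsData above (values are made up).
columns_data = [
    {'year': '2001', 'data': [{'name': 'A', 'value': 55}, {'name': 'B', 'value': 45}]},
    {'year': '2002', 'data': [{'name': 'A', 'value': 40}, {'name': 'B', 'value': 60}]},
]

processed_data = []
for column in columns_data:
    # Sorting by value makes the list index the rank, which becomes the y value.
    ranked = sorted(column['data'], key=operator.itemgetter('value'))
    for index, data_el in enumerate(ranked):
        processed_data.append({
            'name': column['year'],
            'y': index,
            'value': data_el['value'],
            'dataLabels': {'format': f"{data_el['name']}: {data_el['value']}"},
        })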
I am trying to use the Google Sheets API for Python to format only a specific column's results to a "NUMBER" type, but I am struggling to get it to work properly. Am I doing something wrong with the "range" block? Values are getting appended to the column (via a different API call), and when they get appended they do not come back as formatted numbers that, when highlighting the entire column, result in a numbered sum.
id_sampleforstackoverflow = 'abcdefg123xidjadsfh192810'
cost_sav_body = {
"requests": [
{
"repeatCell": {
"range": {
"sheetId": 0,
"startRowIndex": 2,
"endRowIndex": 6,
"startColumnIndex": 0,
"endColumnIndex": 6
},
"cell": {
"userEnteredFormat": {
"numberFormat": {
"type": "NUMBER",
"pattern": "#.0#;#.0#"
}
}
},
"fields": "userEnteredFormat.numberFormat"
}
}
]
}
cost_sav_sum = service.spreadsheets().batchUpdate(spreadsheetId=id_sampleforstackoverflow, body=cost_sav_body).execute()
So when I run the above with the rest of my code, the values get appended; however, when highlighting the column, it simply gives me a count of the objects and not a formatted number summing the total of the values (i.e. there are three values of -24, but I only see a "Count" of 3 instead of -72).
I am using the GCP recommendations API for machineType to append the cost projection -> cost -> units value to the column (the values append as, for example, -24).
Can someone help?
Documentation I have already gone through:
https://cloud.google.com/blog/products/application-development/formatting-cells-with-the-google-sheets-api
https://developers.google.com/sheets/api/guides/formats
https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/other#GridRange
I was able to figure out the problem. When reporting the cost values (as explained above), I was converting the output to a string with the Python str() method. I removed the str() call, kept the rest of the code you see above, and now things are posting correctly:
#spend = str(element.primary_impact.cost_projection.cost.units)
spend = element.primary_impact.cost_projection.cost.units
So, FYI for anyone else wondering: make sure the str() method is not used if you need to apply a custom number format to those particular cells!
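For completeness, the append itself can be done with the values().append endpoint; below is a minimal sketch, where the range 'Sheet1!A:A' and the single-value row are illustrative rather than taken from my actual code:
# Appending the numeric value so Sheets treats it as a number, not text.
# 'Sheet1!A:A' is an illustrative range.
append_body = {"values": [[spend]]}  # spend is an int here, not wrapped in str()
service.spreadsheets().values().append(
    spreadsheetId=id_sampleforstackoverflow,
    range="Sheet1!A:A",
    valueInputOption="USER_ENTERED",  # lets Sheets parse the value as a number
    body=append_body
).execute()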
I am working on a little stock-market project in Python.
Every week, a status.xlsx file is generated that tells me what sectors make up my portfolio, e.g.
xls = pd.ExcelFile('Status_week_9.xlsx')
df1=pd.read_excel(xls)
print(df1)
I am looping over all files, so I get a dataframe similar to this for every file:
sector pct
Consumer Cyclical 0.319638
Industrials 0.203268
Financial Services 0.200217
...
Now I would like to loop through these weekly reports and pass the data to a Chart.js template in order to get a stacked bar for each week that shows the increase / decrease of sectors over time, e.g. https://jsfiddle.net/qg1ya5vk/4/ (demo only, does not add up to 1)
The template file is shown at the bottom of this post.
The idea was to use a template with placeholders for the chart and use
from string import Template
to replace the placeholders with the corresponding values.
The variables are "labels" and "dataset". "Labels" consists of the calendar weeks. This can be extracted from the filenames. So far, so good ;-)
Now for the "dataset", I'm not sure how to get information from the n dataframes.
One dataset would have to look like this:
{label: 'Energy',
data: [0.037975, 0.038512, 0.039817, 0.065010],}
So this would mean that the Energy sector had a share of 3.7975% in week 1, 3.8512% in week 2 etc. Complicating things even further, it's also possible that one sector is not present in one week, so I would have to add a 0 then, because the sector is not part of the original dataframe.
There can be n (comma-separated) datasets.
I probably have to transpose the dataframe but it doesn't quite do the trick (yet).
So far my code looks like this:
import pandas as pd
import glob
import os
from string import Template

labels = []
dataset = []
files = ...

for i in files:
    cw = i.split('_')[3].split('.')[0]  # extract calendar week from filename
    xls = pd.ExcelFile(i)
    df1 = pd.read_excel(xls, 'Sectors')  # select sheet 'Sectors'
    df1['CW'] = cw  # add cw to df1
    df1_t = df1.T  # transpose df1
    sectors = df1.sector.to_list()
    share = df1.pct.to_list()
    labels.append(cw)
    dataset.append(df1_t)  # ??
    # target shape per dataset:
    # {
    #   label: 'Energy',
    #   data: [0.037975, 0.038512, 0.039817, 0.065010],
    # }

d = {'label': labels, 'datasets': dataset}

# open Chart template and put in values:
with open('template.txt', 'r') as f:
    src = Template(f.read())
    result = src.substitute(d)
print(result)
How would you generate the datasets?
I am also thinking this is a little bit long-winded and error-prone. Maybe there is another way to tackle this?
Template for chart:
var label = $label;
var ctx = document.getElementById("myChart4").getContext('2d');
var myChart = new Chart(ctx, {
type: 'bar',
data: {
labels: label,
datasets: [
$datasets //**this is the culprit**
],
},
options: {
tooltips: {
    mode: 'x',
    displayColors: true,
},
scales: {
xAxes: [{
stacked: true,
gridLines: {
display: false,
}
}],
yAxes: [{
stacked: true,
ticks: {
beginAtZero: true,
},
type: 'linear',
}]
},
responsive: true,
maintainAspectRatio: false,
legend: { position: 'bottom' },
}
});
What you are looking for is the json module. You can simply arrange your data in the correct form in Python and write it to a JSON file that Chart.js can understand.
import json
data = {'label' : labels, 'datasets' : dataset}
with open('data.json', 'w') as f:
json.dump(data, f)
This question is about how to get your JSON into Chart.js.
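As for building the dataset list itself, here is one possible sketch. It assumes the weekly dataframes have the 'sector' and 'pct' columns shown in the question and are collected as (calendar_week, dataframe) pairs in a list called frames, a name introduced for this example:
import pandas as pd

# frames is assumed to be a list of (calendar_week, dataframe) pairs
# collected in the loop; each dataframe has 'sector' and 'pct' columns.
combined = pd.concat(
    [df.assign(CW=cw) for cw, df in frames],
    ignore_index=True
)

# Pivot so rows are weeks and columns are sectors; a sector missing in a
# given week becomes NaN, which fillna(0) turns into the required 0.
wide = combined.pivot(index='CW', columns='sector', values='pct').fillna(0)

labels = list(wide.index)
dataset = [
    {'label': sector, 'data': wide[sector].tolist()}
    for sector in wide.columns
]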
I need to count viewers by program for a streaming channel from a JSON logfile.
I identify the programs by their start times.
So far I have two dataframes.
The first one contains all the timestamps from the logfile:
viewers_from_log = pd.read_json('sqllog.json', encoding='UTF-8')
# Convert date string to pandas datetime object:
viewers_from_log['time'] = pd.to_datetime(viewers_from_log['time'])
Source JSON file:
[
{
"logid": 191605,
"time": "0:00:17"
},
{
"logid": 191607,
"time": "0:00:26"
},
{
"logid": 191611,
"time": "0:01:20"
}
]
The second contains the starting times and titles of the programs
import json

with open('programs.json') as f:
    programs_start_time = pd.DataFrame.from_dict(json.load(f), orient='index')
Source JSON file:
{
"2019-05-29": [
{
"title": "\"Amiről a kövek mesélnek\"",
"startTime_dt": "2019-05-29T00:00:40Z"
},
{
"title": "Koffer - Kedvcsináló Kul(t)túrák Külföldön",
"startTime_dt": "2019-05-29T00:22:44Z"
},
{
"title": "Gubancok",
"startTime_dt": "2019-05-29T00:48:08Z"
}
]
}
So what I need to do is count the entries per program in the log file and link them to the program titles.
My approach is to slice the log data for each date range from the program data and get the shape, then add a column to the program data with the results:
import pandas as pd
# setup test data
log_data = {'Time': ['2019-05-30 00:00:26', '2019-05-30 00:00:50', '2019-05-30 00:05:50','2019-05-30 00:23:26']}
log_data = pd.DataFrame(data=log_data)
program_data = {'Time': ['2019-05-30 00:00:00', '2019-05-30 00:22:44'],
'Program': ['Program 1', 'Program 2']}
program_data = pd.DataFrame(data=program_data)
counts = []
for index, row in program_data.iterrows():
    try:
        # count log entries between this program's start and the next program's start
        log_range = log_data[(log_data['Time'] > program_data.loc[index].values[0]) & (log_data['Time'] < program_data.loc[index + 1].values[0])]
        counts.append(log_range.shape[0])
    except KeyError:
        # last program: count everything after its start
        log_range = log_data[log_data['Time'] > program_data.loc[index].values[0]]
        counts.append(log_range.shape[0])

# add additional column with collected counts
program_data['Counts'] = counts
Output:
Time Program Counts
0 2019-05-30 00:00:00 Program 1 3
1 2019-05-30 00:22:44 Program 2 1
A working (but maybe a little quick and dirty) method:
Use the .shift(-1) method on the timestamp column of the programs_start_time dataframe to get an additional column named date_end indicating the end timestamp of each TV program.
Then, for each example_timestamp in the log file, you can query the TV programs dataframe like this: df[(df['date_start'] <= example_timestamp) & (df['date_end'] > example_timestamp)] (make sure you substitute df with your dataframe's name, programs_start_time). This will give you exactly one dataframe row, from which you can extract the name of the TV program.
Hope this helps!
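A minimal sketch of that idea, assuming the date_start/date_end column names described above and a made-up example timestamp:
import pandas as pd

# programs_start_time has a 'startTime_dt' column as in the question.
programs_start_time['date_start'] = pd.to_datetime(programs_start_time['startTime_dt'])
# End of each program = start of the next one (NaT for the last row).
programs_start_time['date_end'] = programs_start_time['date_start'].shift(-1)

example_timestamp = pd.Timestamp('2019-05-29 00:30:00', tz='UTC')  # made-up value
match = programs_start_time[
    (programs_start_time['date_start'] <= example_timestamp)
    & (programs_start_time['date_end'] > example_timestamp)
]
print(match['title'].iloc[0])  # the program running at example_timestamp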
Solution with histogram, using numpy:
import pandas as pd
import numpy as np
df_p = pd.DataFrame([
{
"title": "\"Amiről a kövek mesélnek\"",
"startTime_dt": "2019-05-29T00:00:40Z"
},
{
"title": "Koffer - Kedvcsináló Kul(t)túrák Külföldön",
"startTime_dt": "2019-05-29T00:22:44Z"
},
{
"title": "Gubancok",
"startTime_dt": "2019-05-29T00:48:08Z"
}
])
df_v = pd.DataFrame([
{
"logid": 191605,
"time": "2019-05-29 0:00:17"
},
{
"logid": 191607,
"time": "2019-05-29 0:00:26"
},
{
"logid": 191611,
"time": "2019-05-29 0:01:20"
}
])
df_p.startTime_dt = pd.to_datetime(df_p.startTime_dt)
df_v.time = pd.to_datetime(df_v.time)
# here's the part where I convert datetimes to timestamps in seconds; astype(int) yields nanoseconds, hence the // 10**9
programmes_start = df_p.startTime_dt.astype(int).values // 10**9
viewings_starts = df_v.time.astype(int).values // 10**9
# make bins for histogram
# add zero to the beginning of the array
# add value that is time an hour after the start of the last given programme to the end of the array
programmes_start = np.pad(programmes_start, (1, 1), mode='constant', constant_values=(0, programmes_start.max()+3600))
histogram = np.histogram(viewings_starts, bins=programmes_start)
print(histogram[0])
# prints [2 1 0 0]
Interpretation: there were 2 log entries before 'Amiről a kövek mesélnek' started, 1 log entry between the starts of 'Amiről a kövek mesélnek' and 'Koffer - Kedvcsináló Kul(t)túrák Külföldön', 0 log entries between the starts of 'Koffer - Kedvcsináló Kul(t)túrák Külföldön' and 'Gubancok', and 0 entries after the start of 'Gubancok'. Which, looking at the data you provided, seems correct :) Hope this helps.
NOTE: I assume that you have the dates of the viewings. They are not in the example log file, but they appear in the screenshot, so I assumed you can compute/get them somehow and added them by hand to the input dict.
I have a collection with documents like this:
{
"_id" : "1234567890",
"area" : "Zone 63",
"last_state" : "Cloudy",
"recent_indices" : [
21,
18,
33,
...
38,
41
],
"Report_stats" : [
{
"date_hour" : "2017-01-01 01",
"count" : 31
},
{
"date_hour" : "2017-01-01 02",
"count" : 20
},
...
{
"date_hour" : "2018-08-26 13",
"count" : 3
}
]
}
which is supposed to be updated based on some online real-time reports
and assume each report looks like this:
{
'datetime' : '2018-08-26 13:48:11.677635',
'areas' : 'Zone 3; Zone 45; Zone 63',
'status' : 'Clear',
'index' : '33'
}
Now I have to update the collection in a way that:
Each time a new 'area' (say Zone 1025) shows up in a report, a new document is added to hold the related data
The new 'index' is appended to the "recent_indices" list, while "last_state" is updated to 'status'
Based on the 'datetime', the respective "Report_stats.count" is incremented by 1, or a new "Report_stats" entry (the 'datetime' at hour resolution, with 'count' set to 1) is inserted
The way to do each of these updates separately is fairly obvious; the problem is: how can I do all of these simultaneously in a single update/upsert task?
I tried update_one and find_one_and_update (as well as update and find_and_modify) using PyMongo, but I was not able to resolve the problem this way.
So I started to wonder whether there is a simple, single operation to do this, or whether I should try to fix it in a different way altogether.
Can you please help me do this, or (since a lot of data is being gathered and has to be processed) suggest a low-cost alternative?
Thank you!
I am unsure if I understand your question, but if your problem revolves around upsert, i.e. update the record or add it if it is not there, you can do it by adding one parameter like this:
update_one({'_id': 1}, {'$set': {}}, upsert=True)
If you want to update multiple fields, you can simply do it by setting your updated document:
// filter document
{
    name: 'Kanika',
    age: 19
},
// set document
{
    name: 'Andy',
    age: 30
}
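In PyMongo, that pattern might look like the sketch below (database, collection, and field values are illustrative):
from pymongo import MongoClient

collection = MongoClient()['mydb']['mycollection']  # illustrative names

# Upsert: update the matching document, or insert one if no match exists.
collection.update_one(
    {'name': 'Kanika', 'age': 19},           # filter document
    {'$set': {'name': 'Andy', 'age': 30}},   # set document
    upsert=True
)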
Please try looking into https://docs.mongodb.com/manual/reference/method/db.collection.update/ and see if it helps.
Thanks, Kanika
The best solution I have reached so far is this:
if mycollection.find_one({'area': 'zone 45', 'Report_stats.date_hour': '2018-08-26 13'}):
mycollection.update_one({'area': 'zone 45', 'Report_stats.date_hour': '2018-08-26 13'},
{
'$inc': {
'Report_stats.$.count': 1
},
'$set': {
'last_state': 'Clear'
},
'$push': {
'recent_indices': 33,
}
},
)
else:
mycollection.update_one({'area': 'zone 45'},
{
'$set': {
'last_state': 'Clear'
},
'$push': {
'recent_indices': 33,
'Report_stats':{'date_hour':'2018-08-26 13', 'count':1}
}
},
upsert = True
)
However, it still performs two operations (a find followed by an update) to update one document for one report, which is not quite satisfactory.
Any better suggestions?
What I figured out from your reply above is that if Report_stats.date_hour exists in your document, then you increment the counter; otherwise you just push a new entry.
I believe we can do it using $cond or $switch. Can you please take a look?
https://docs.mongodb.com/manual/reference/operator/aggregation/cond/#exp._S_cond
Meanwhile, I am trying to write the whole query for you; let's see if it works.
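As a starting point, here is a sketch of what a single pipeline-style update could look like; this assumes MongoDB 4.2+ (which accepts an aggregation pipeline in update_one) and hard-codes the same example values as your snippet above:
mycollection.update_one(
    {'area': 'zone 45'},
    [
        {'$set': {
            'last_state': 'Clear',
            # append the new index; $ifNull covers a freshly upserted document
            'recent_indices': {'$concatArrays': [
                {'$ifNull': ['$recent_indices', []]}, [33]
            ]},
            'Report_stats': {'$cond': [
                # does an entry for this hour already exist?
                {'$in': ['2018-08-26 13',
                         {'$ifNull': ['$Report_stats.date_hour', []]}]},
                # yes: increment just that entry's count
                {'$map': {
                    'input': '$Report_stats',
                    'as': 's',
                    'in': {'$cond': [
                        {'$eq': ['$$s.date_hour', '2018-08-26 13']},
                        {'date_hour': '$$s.date_hour',
                         'count': {'$add': ['$$s.count', 1]}},
                        '$$s'
                    ]}
                }},
                # no: push a new entry with count 1
                {'$concatArrays': [
                    {'$ifNull': ['$Report_stats', []]},
                    [{'date_hour': '2018-08-26 13', 'count': 1}]
                ]}
            ]}
        }}
    ],
    upsert=True
)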
Thanks, Kanika
I would like to ask for some guidance regarding the following code structure.
For example, suppose I have the following lists already created and filled with data:
movie_name
movie_description
movie_poster
movie_year
movie_duration
Is it possible to make a dictionary with the following structure, with data taken from the lists:
{ movie_name : [
{description : movie_description},
{poster : movie_poster},
{year : movie_year},
{duration : movie_duration}
]
}
Thank you.
If the indices of these lists align, you can just zip them together to get tuples containing these values. You can then turn those into the data structure you want with a simple dictionary comprehension. (Note that this builds a nested dictionary per movie rather than the list of single-key dictionaries sketched above, which is usually easier to work with.)
res = {
name: {
'description': description,
'poster': poster,
'year': year,
'duration': duration
}
for name, description, poster, year, duration in zip(
movie_name,
movie_description,
movie_poster,
movie_year,
movie_duration
)
}
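For example, with hypothetical sample lists (values made up purely for illustration):
movie_name = ['Movie A', 'Movie B']
movie_description = ['A space adventure', 'A quiet drama']
movie_poster = ['a.jpg', 'b.jpg']
movie_year = [1999, 2004]
movie_duration = [120, 95]

# After running the comprehension above, res is:
# {'Movie A': {'description': 'A space adventure', 'poster': 'a.jpg',
#              'year': 1999, 'duration': 120},
#  'Movie B': {'description': 'A quiet drama', 'poster': 'b.jpg',
#              'year': 2004, 'duration': 95}}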