I'm currently struggling with a basic function of the influxdb_client library in Python. I have a set of time series data which I want to write into an InfluxDB instance on a different host. My current code looks something like this:
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import ASYNCHRONOUS

client = InfluxDBClient(url=f"http://{ip}:{port_db}", token=token, org=org)
write_api = client.write_api(write_options=ASYNCHRONOUS)
p = Point("title_meas").field("column_data", value_data)
write_api.write(bucket=bucket, org=org, record=p)
Now, I have a specific timestamp for each point that I want to use as the InfluxDB timestamp, but whatever I try, it keeps writing the system time of my host device (and since I'm working with historical data, I need to set the timestamps myself). How can I set my own timestamps, and is there an easier way than adding my data line by line with the Point method... something like a Pandas DataFrame maybe?
Thankful for any advice.
You can write via line protocol, Point objects, a Pandas DataFrame, or a JSON dictionary. All are viable methods.
If you care about throughput, line protocol is the fastest, but if tiny speed differences are not important, just use whatever you want. I highly recommend reading this. The timestamp you are looking to set on an InfluxDB data point is the "_time" column.
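For reference, a minimal line protocol sketch with an explicit timestamp; the measurement name, field value and nanosecond timestamp below are just placeholders, and the same write_api call accepts the string directly:
# line protocol: <measurement> <field>=<value> <timestamp in nanoseconds>
record = "title_meas column_data=1.23 1628532296865943000"
write_api.write(bucket=bucket, org=org, record=record)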
To add it to a Point do:
p = Point("title_meas").field("column_data", value_data).time('2021-08-09T18:04:56.865943')
or json dictionary protocol:
p = {'measurement': 'title_meas',
     'time': '2021-08-09T18:04:56.865943',
     'tags': {'sometag': 'sometag'},
     'fields': {'column_data': value_data}}
Easiest way to ensure timestamps are what you expect is to use UTC/ISO format.
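And since you asked about Pandas: the client can also write a whole DataFrame in one call, taking the timestamps from the DataFrame index. A minimal sketch, assuming your timestamps and values live in two lists (timestamps and values are placeholder names for your own data):
import pandas as pd

df = pd.DataFrame({"column_data": values},
                  index=pd.to_datetime(timestamps, utc=True))
write_api.write(bucket=bucket, org=org, record=df,
                data_frame_measurement_name="title_meas")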
I'm studying array database management systems a bit, in particular Rasdaman. I understand the architecture superficially and how the system works with sets and multidimensional arrays instead of the tables that are usual in relational DBMSs. I'm trying to store my own type of data to check whether this kind of database can give me better performance for my specific problem (geospatial data in a particular format: DGGS). To do so I have created my own basic type based on a structure as indicated by the documentation, created my array type, set type and finally my collection for testing. I'm trying to insert data into this collection with the following idea:
query_executor.execute_update_from_file("insert into test_json_dict values decode($1, 'json', '{\"formatParameters\": {\"domain\": \"[0:1000]\",\"basetype\": struct { char k, long v } } })'", "...path.../rasdapy-demo/dggs_sample.json")
I'm using the rasdapy library to work from Python instead of using rasql only (I use rasql anyway to validate small things), but I have been fighting with error messages that give little to no information:
Internal error: RasnetClientComm::executeQuery(): illegal status value 5
My source file has this type of data in it:
{
"N1": 6
}
A simple dict with a key and a value; I want to save both things. I also tried a bigger dict with multiple keys and values in it, but as the Rasdaman decode function expects a basetype definition (if I understand correctly), I changed my data source to a simple dict. It is obvious that I'm not writing the appropriate definition for decoding, or that my source file has the wrong format, but I haven't been able to find any examples on the web. Any ideas on how to proceed? Maybe I'm approaching this whole thing from the wrong perspective and should try the OGC Web Coverage Service (WCS) standard instead? I don't understand it yet, so I have been avoiding it. Anyway, any advice or direction is greatly appreciated. Thanks in advance.
Edit:
I have been trying to load CSV data with the following format:
1 930
2 461
..
and the following query
query_executor.execute_update_from_file("insert into test_json_dict values decode($1, 'csv', '{\"formatParameters\": {\"domain\": \"[1:255]\",\"basetype\": struct { char key, long value } } })'", "...path.../rasdapy-demo/dggs_sample_4.csv")
but still no results, even though it looks quite similar to the CSV/JSON examples in the documentation. What could be the issue?
It seems that my problem was trying to use the rasdapy library. The lib works fine, but when working with data formats like CSV and JSON it is best to use the rasql command-line option. The documentation states:
filePaths - An array of absolute paths to input files to be decoded, e.g. ["/path/to/rgb.tif"]. This improves ingestion performance if the data is on the same machine as the rasdaman server, as the network transport is bypassed and the data is read directly from disk. Supported only for GDAL, NetCDF, and GRIB data formats.
and also it says:
As a first parameter the data to be decoded must be specified. Technically this data must be in the form of a 1D char array. Usually it is specified as a query input parameter with $1, while the binary data is attached with the --file option of the rasql command-line client tool, or with the corresponding methods in the client API.
It would be interesting to know whether rasdapy takes this into account. Anyhow, using rasql gives far better error messages, so I recommend it to anyone having a similar problem.
An example command could be:
rasql -q 'insert into test_basic values decode($1, "csv", "{ \"formatParameters\": {\"domain\": \"[0:1,0:2]\",\"basetype\": \"long\" } }")' --out string --file "/home/rasdaman/Documents/TFM/include/DGGS-Comparison/rasdapy-demo/dggs_sample_6.csv" --user rasadmin --passwd rasadmin
using this data:
1,2,3,2,1,3
After that you just have to keep making it more complex as you need.
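If you still want to drive the ingestion from Python, a minimal sketch of simply shelling out to rasql with the same query, file and credentials as above (this assumes rasql is installed on the machine running the script; adapt the path and credentials to your setup):
import subprocess

# same query as the rasql command-line example above, kept as a raw string
query = r'insert into test_basic values decode($1, "csv", "{ \"formatParameters\": {\"domain\": \"[0:1,0:2]\",\"basetype\": \"long\" } }")'

result = subprocess.run(
    ["rasql", "-q", query,
     "--out", "string",
     "--file", "/home/rasdaman/Documents/TFM/include/DGGS-Comparison/rasdapy-demo/dggs_sample_6.csv",
     "--user", "rasadmin", "--passwd", "rasadmin"],
    capture_output=True, text=True)
print(result.stdout)
print(result.stderr)  # rasql's more helpful error messages show up here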
I am absolutely new to Python, or coding for that matter, so any help would be greatly appreciated. I have around 21 Salesforce orgs and am trying to get some information from each org into one place to send out in an email.
import pandas as pd
df = pd.read_csv("secretCSV.csv", usecols = ['client','uname','passw','stoken'])
username = df.loc[[1],'uname'].values[0]
password = df.loc[[1],'passw'].values[0]
sectocken = df.loc[[1],'stoken'].values[0]
I have saved all my usernames, passwords and security tokens in the secretCSV.csv file, and with the above code I can get the data for one row, since the index value I have given is 1. I would like to know how I can loop through this, increasing the index value after each iteration, until all rows from the CSV file have been read.
Thank you in advance for any assistance you all can offer.
Adil
--
You can iterate over the dataframe, but it's highly not recommended (inefficient, looks bad, too much code, etc.).
df = pd.read_csv("secretCSV.csv", usecols = ['client','uname','passw','stoken'])
so DO NOT DO THIS EVEN IF IT WORKS:
for i in range(0, df.shape[0]):
    username = df.loc[[i],'uname'].values[0]
    password = df.loc[[i],'passw'].values[0]
    sectocken = df.loc[[i],'stoken'].values[0]
Instead, do this:
sec_list = [(u,p,s) for _,u,p,s in df.values]
now you have a sec_list with tuples (username, password, sectocken)
access example: sec_list[0][1] - as in row=0 and get the password (located at [1]).
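If the goal is then to hit each org in turn, you can loop over sec_list directly. A rough sketch, assuming you use the simple_salesforce package (that package and the query are my assumptions, not something from your code):
from simple_salesforce import Salesforce

for username, password, sectoken in sec_list:
    # one connection per org; credentials come straight from the CSV
    sf = Salesforce(username=username, password=password,
                    security_token=sectoken)
    # run whatever per-org query you need here, e.g.:
    # result = sf.query("SELECT Id, Name FROM Organization")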
Pandas is great when you want to apply operations to a large set of data, but is usually not a good fit when you want to manipulate individual cells in python. Each cell would need to be converted to a python object each time its touched.
For your goals, I think the standard csv module is what you want
import csv
with open("secretCSV.csv", newline='') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row (assumes columns are client,uname,passw,stoken)
    for client, username, password, sectoken in reader:
        ...  # do all the things
Thank you everyone for your responses. I think I will first start with learning Python and then get back to this. I should have learnt coding before coding. :)
Also, I was able to iterate (sorry, most of you said not to iterate over the dataframe) and get the credentials from the file.
I actually have 21 Salesforce orgs and am trying to get license information from each of them and email it to certain people on a daily basis. I didn't want to expose the Salesforce credentials, hence the flat file option.
I have built the code to get the Salesforce license details and I am able to pull them in the format I want for one client. However, I have to do this for 21 clients, so I thought of iterating over the credentials and running the getLicense function in a loop until all 21 clients' data is fetched.
I will learn Python, or at least a little bit more than what I know now, and come back to this. Until then, Informatica and batch scripts will have to do.
Thank you again to each one of you for your help!
Adil
--
I am trying to get all the data from a view (Lotus Notes) with LotusScript and Python (the noteslib module) and export it to CSV, but the problem is that this takes too much time. I have tried two ways, both looping through all documents:
import noteslib

db = noteslib.Database('database','file.nsf')
view = db.GetView('My View')
doc = view.GetFirstDocument()
data = list()
while doc:
    data.append(doc.ColumnValues)
    doc = view.GetNextDocument(doc)
Getting about 1,000 lines of data took 70 seconds, but the view has about 85,000 lines, so fetching everything this way takes far too long, especially since exporting all the data to CSV manually via File->Export in Lotus Notes only takes about 2 minutes.
I also tried a second way with AllEntries, but it was even slower:
database = []
ec = view.AllEntries
ent = ec.Getfirstentry()
while ent:
    row = []
    for v in ent.Columnvalues:
        row.append(v)
    database.append(row)
    ent = ec.GetNextEntry(ent)
Everything that I found on the Internet is based on "NextDocument" or "AllEntries". Is there any way to do it faster?
It is (or at least used to be) very expensive from a time standpoint to open a Notes document, like you are doing in your code.
Since you are saying that you want to export the data that is being displayed in the view, you could use the NotesViewEntry class instead. It should be much faster.
Set col = view.AllEntries
Set entry = col.GetFirstEntry()
Do Until entry Is Nothing
    values = entry.ColumnValues '*** Array of column values
    '*** Do stuff here
    Set entry = col.GetNextEntry(entry)
Loop
I wrote a blog about this back in 2013:
http://blog.texasswede.com/which-is-faster-columnvalues-or-getitemvalue/
Something is going on with your code "outside" the view navigation: you already chose the most performant way to navigate a view, using GetFirstDocument and GetNextDocument. Using the NotesViewNavigator as mentioned in the comments will be slightly better, but not significantly.
You might get a little bit of performance out of your code by setting view.AutoUpdate = False to stop the view object from refreshing when something in the backend changes. But as you only read data and do not change view data, that will not give you much of a performance boost.
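In the Python code from the question, that would just be one extra line right after getting the view (a small sketch; I'm assuming the property is exposed the same way through noteslib/COM):
view = db.GetView('My View')
view.AutoUpdate = False  # stop the view from refreshing while we read it
doc = view.GetFirstDocument()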
My suggestion: Identify the REAL bottleneck of your code by commenting out single sections to find out when it starts to get slower:
First attempt:
while doc:
    doc = view.GetNextDocument(doc)
Slow?
If not then next attempt:
while doc:
    arr = doc.ColumnValues
    doc = view.GetNextDocument(doc)
Slow?
If yes: ColumnValues is your enemy...
If not then next attempt:
while doc:
    arr = doc.ColumnValues
    data.append(arr)
    doc = view.GetNextDocument(doc)
I would be very interested to get your results of where it starts to become slow.
I would suspect the performance issue is using COM/ActiveX in Python to access Notes databases. Transferring data via COM involves datatype 'marshalling', possibly at every step, and especially for 'out-of-process' method/property calls.
I don't think there is any way around this in COM. You should consider arranging a Notes 'agent' to do this for you instead (LotusScript or Java, maybe). Even a basic LotusScript agent can export thousands of docs per minute. A further alternative may be to look at the Notes C API (not an easy option, and it requires API calls from Python).
I am setting up a weather camera which will provide a live stream of the current conditions outside, but I also would like to overlay continuously updated weather conditions (temperature, wind speed/direction, current weather) from a local National Weather Service weather station, from a browser API source provided in JSON format.
I have had success extracting the desired values from a different API source using a Python script I wrote; however, long story short, that API source is unreliable. Therefore I am using the API from the official National Weather Service ASOS station at my nearby airport. The output from the new API source I am polling is rather complicated, however, with various tiers of nesting. I have not worked with Python very long, and the tutorials and guides online have either been for other languages (Java or C++ mostly) or have not worked for my specific case.
First off, here is the structure of the JSON that I am receiving:
I underlined the values I am trying to extract. They are listed under the OBSERVATIONS section, associated with precip_accum_24_hour_value_1, wind_gust_value_1, wind_cardinal_direction_value_1d, and so on. The issue is there are two values underneath each observation so the script I have tried isn't returning the values I want. Here is the code I have tried:
import urllib.request
import json
f = urllib.request.urlopen('https://api.synopticdata.com/v2/stations/latest?token=8c96805fbf854373bc4b492bb3439a67&stid=KSTC&complete=1&units=english&output=json')
json_string = f.read()
parsed_json = json.loads(json_string)
for each in parsed_json['STATION']:
    observations = each['OBSERVATIONS']
    print(observations)
This prints out everything underneath the OBSERVATIONS in the JSON as expected, as one long string.
{'precip_accum_24_hour_value_1': {'date_time': '2018-12-06T11:53:00Z', 'value': 0.01}, 'wind_gust_value_1': {'date_time': '2018-12-12T01:35:00Z', 'value': 14.0},
to show a small snippet of the output I am receiving. I was hoping I could individually extract the values I want from this output, but everything I have attempted has not worked. I would really appreciate some guidance on finishing this piece of code so I can return the values I am looking for. I realize it may require some kind of loop or special syntax.
Try something like this:
for each in parsed_json['STATION']:
    observations = each['OBSERVATIONS']
    for k, v in observations.items():
        print(k, v["value"])
JSON maps well onto Python's dictionary and list types, so accessing substructures can be done with a[<index-or-key>] syntax. Iteration over the key-value pairs of a dictionary can be done as I've shown above. If you're not familiar with dictionaries in Python yet, I'd recommend reading about them. Searching online should yield a lot of good tutorials.
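For example, to grab just a couple of the readings you underlined, using the key names visible in your snippet above, you can index straight into the nested dictionaries (using .get() would be safer if a station might be missing a reading):
for each in parsed_json['STATION']:
    obs = each['OBSERVATIONS']
    wind_gust = obs['wind_gust_value_1']['value']
    precip_24h = obs['precip_accum_24_hour_value_1']['value']
    print(wind_gust, precip_24h)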
Does this help?
When you say the JSON is complicated, it really is just nested dictionaries within the main JSON response. You would access them in the same way as you would the initial JSON blob:
import urllib.request
import json
f = urllib.request.urlopen('https://api.synopticdata.com/v2/stations/latest?token=8c96805fbf854373bc4b492bb3439a67&stid=KSTC&complete=1&units=english&output=json')
json_string = f.read()
parsed_json = json.loads(json_string)
for each in parsed_json['STATION']:
    for value in each:
        print(value, each[value])
I'm using Instagram-API-python to create an application. I'm getting a JSON response with the value below.
'device_timestamp': 607873890651
I tried to convert this value to something readable using Python.
import time
readable = time.ctime(607873890651)
print(readable)
It gives the following result, which doesn't seem correct.
Sun Oct 3 16:00:51 21232
I'm not very familiar with Instagram-API-python. Can someone please help me solve this problem?
The data is very likely to be incorrect.
A timestamp is a very standard way to store a date-time: counting the seconds that have passed since January 1st, 1970, also known as the Unix epoch.
I looked for "Instagram 'device_timestamp'" on Google and all the user-provided values made sense, but yours doesn't.
This is probably an error from the database, it happens.
Use the mentioned ctime conversion, but take the 'taken_at' field if available.
Don't use device_timestamp; use the taken_at field instead. taken_at then needs to be multiplied by 1000, because Java's Date expects milliseconds.
In Java it looks like this
Date data = new Date(taken_at * 1000L);
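For completeness, the same thing in Python: time.ctime already expects seconds, so no multiplication by 1000 is needed there. The taken_at value below is just a made-up example standing in for the field from your JSON response:
import time

taken_at = 1544645651  # example 'taken_at' value in Unix seconds
readable = time.ctime(taken_at)
print(readable)  # e.g. Wed Dec 12 ... 2018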