Python - I want to increase the Row-Index automatically

I am absolutely new to Python, or coding for that matter, hence any help would be greatly appreciated. I have around 21 Salesforce orgs and am trying to get some information from each org into one place to send out in an email.
import pandas as pd
df = pd.read_csv("secretCSV.csv", usecols = ['client','uname','passw','stoken'])
username = df.loc[[1],'uname'].values[0]
password = df.loc[[1],'passw'].values[0]
sectocken = df.loc[[1],'stoken'].values[0]
I have saved all my usernames, passwords, and security tokens in the secretCSV.csv file, and with the above code I can get the data for one row, since the index value is hard-coded. I would like to know how I can loop through this, increasing the index value after each iteration until all rows from the CSV file are read.
Thank you in advance for any assistance you all can offer.
Adil
--

You can iterate over the dataframe, but it is strongly discouraged (inefficient, hard to read, too much code, etc.)
df = pd.read_csv("secretCSV.csv", usecols=['client', 'uname', 'passw', 'stoken'])
so DO NOT DO THIS, EVEN IF IT WORKS:
for i in range(df.shape[0]):
    username = df.loc[[i], 'uname'].values[0]
    password = df.loc[[i], 'passw'].values[0]
    sectocken = df.loc[[i], 'stoken'].values[0]
Instead, do this:
sec_list = [(u, p, s) for _, u, p, s in df.values]
Now you have sec_list, a list of tuples (username, password, sectocken).
Access example: sec_list[0][1] - as in row=0, get the password (located at index [1]).
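Since the end goal here is to run the same task once per org, a minimal sketch of that loop (get_license is a hypothetical placeholder for whatever per-org work you do, not a real function):
for username, password, sectocken in sec_list:
    # get_license is a hypothetical placeholder for the per-org task
    get_license(username, password, sectocken)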

Pandas is great when you want to apply operations to a large set of data, but it is usually not a good fit when you want to manipulate individual cells in Python. Each cell has to be converted to a Python object each time it's touched.
For your goals, I think the standard csv module is what you want:
import csv
with open("secretCSV.csv", newline='') as f:
    reader = csv.DictReader(f)  # consumes the header row and maps each column by name
    for row in reader:
        username, password, sectoken = row['uname'], row['passw'], row['stoken']
        # do all the things

Thank you everyone for your responses. I think I will first start with learning Python and then get back to this. I should have learnt coding before coding. :)
Also, I was able to iterate (sorry, most of you said not to iterate over the dataframe) and get the credentials from the file.
I actually have 21 Salesforce orgs and am trying to get license information from each of them and email it to certain people on a daily basis. I didn't want to expose Salesforce credentials, hence I went with a flat-file option.
I have built the code to get the Salesforce license details and am able to pull them in the format I want for one client. However, I have to do this for 21 clients, and thought of iterating over the credentials so I can run the getLicense function in a loop until all 21 clients' data is fetched.
I will learn Python, or at least learn a little bit more than what I know now, and come back to this again. Until then, Informatica and batch scripts will have to do.
Thank you again to each one of you for your help!
Adil
--

Related

Write custom timestamps to InfluxDB with Python

I'm currently struggling with a basic function of the influx client in Python. I have a set of time series data which I want to write into an InfluxDB instance on a different host. My current code looks kinda like this:
client = InfluxDBClient(url=f"http://{ip}:{port_db}", token=token, org=org)
write_api = client.write_api(write_options=ASYNCHRONOUS)
p = Point("title_meas").field("column_data", value_data)
write_api.write(bucket=bucket, org=org, record=p)
Now, I have a specific timestamp for each point that I want to use as the InfluxDB key/timestamp, but whatever I try, it keeps adding the system time of my host device (since I'm working with historical data, I need to adjust the time specs). How can I achieve my custom timestamps, or is there an easier way instead of using the Point method and adding my data line by line... something like a Pandas dataframe maybe?
Thankful for every advice.
You can write via line protocol, Point objects, a Pandas DataFrame, or a JSON dictionary. All are viable methods.
If you care about throughput, line protocol is the fastest, but if tiny speed differences are not important, just use whichever you prefer. I highly recommend reading this. The "tag" you are looking to modify on the influx datapoint is called "_time".
To add it to a Point, do:
p = Point("title_meas").field("column_data", value_data).time('2021-08-09T18:04:56.865943')
or via the JSON dictionary protocol:
p = {'measurement': 'title_meas',
     'time': '2021-08-09T18:04:56.865943',
     'tags': {'sometag': 'sometag'},
     'fields': {'column_data': value_data}}
Easiest way to ensure timestamps are what you expect is to use UTC/ISO format.
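Since the question asks about a Pandas route: the influxdb-client package can also write a whole DataFrame in one call, taking the timestamps from the DataFrame's datetime index. A minimal sketch, reusing the ip/port_db/token/org/bucket variables from the question:
import pandas as pd
from influxdb_client import InfluxDBClient
from influxdb_client.client.write_api import SYNCHRONOUS

# The datetime index supplies the custom timestamps (UTC recommended).
df = pd.DataFrame(
    {"column_data": [1.0, 2.0]},
    index=pd.to_datetime(["2021-08-09T18:04:56Z", "2021-08-09T18:05:56Z"]),
)

client = InfluxDBClient(url=f"http://{ip}:{port_db}", token=token, org=org)
write_api = client.write_api(write_options=SYNCHRONOUS)
write_api.write(bucket=bucket, org=org, record=df,
                data_frame_measurement_name="title_meas")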

Store list elements in single variable for query

I am currently facing a probably very simple problem, and I am overcomplicating the solution.
I have an Excel file with city names and postal codes.
I read the file and extract the postal codes (PLZ) with:
zipfile = pd.read_excel("file.xlsx")
zipcode = pd.DataFrame(zipfile, columns=['PLZ']).values
Output is: [80331][80333] ....
Each ZIP code is later used to conduct a query on a website.
For that I use bs4 and requests and the following line of code (this is not the complete code, just the relevant line):
data = {'tx_ybpn_storefinder[searchReq][term]': zip}
The process is:
1. Enter the ZIP code from the list (in "zip")
2. Query the website
3. Save the results (data) of the website query
4. Query with the next ZIP code
5. Save the data from that query
6. Repeat for every ZIP code in the list
I think I have to work with a for/while loop combination here, but I don't know how. Is it necessary to store each ZIP code in a unique variable?
Thanks in advance!
I think I have to work with a for/while loop combination here
Right. Loop over the values in the PLZ column:
zipcode = pd.read_excel("file.xlsx")
for zip in zipcode['PLZ']:
    data = {'tx_ybpn_storefinder[searchReq][term]': zip}
    # query the website, etc.
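A minimal end-to-end sketch of that loop; the URL here is a placeholder (the real endpoint and the parsing of each response depend on the site being queried):
import pandas as pd
import requests

zipcodes = pd.read_excel("file.xlsx")
results = []
for plz in zipcodes['PLZ']:
    data = {'tx_ybpn_storefinder[searchReq][term]': plz}
    # placeholder endpoint; substitute the real store-finder URL
    response = requests.post("https://example.com/storefinder", data=data)
    results.append(response.text)  # parse with bs4 here instead of keeping raw HTML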

Entrez eFetch Accession Number

We are currently working on a project where we need to access the 'NP_' accession number from ClinVar. However, when we use the Entrez.efetch() function, this information appears to be missing in the result. Here is a link to the website page where the NP_ number is listed:
https://www.ncbi.nlm.nih.gov/clinvar/variation/558834/
And here is the Python sample script code that fetches the XML result:
handle = Entrez.efetch(db="clinvar", id=558834, rettype='variation', retmode="text")
print(handle.read())
Interestingly enough, this used to return the NP number in the results; however, it seems the website formatting/style changed since we last developed our Python script, and we cannot figure out how to retrieve the NP number now.
Any help would be greatly appreciated! Thank you for your time and input!
You need to format it like a new query, not an old one:
handle = Entrez.efetch(db="clinvar", id=558834, rettype='vcv', is_variationid="true", from_esearch="true")
print(handle.read())
See also: https://www.ncbi.nlm.nih.gov/clinvar/docs/maintenance_use/
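A slightly fuller sketch of the same call, assuming Biopython's Entrez module; NCBI asks that Entrez.email be set so they can contact you if your requests cause problems:
from Bio import Entrez

Entrez.email = "you@example.org"  # placeholder; use your own address
handle = Entrez.efetch(db="clinvar", id=558834, rettype="vcv",
                       is_variationid="true", from_esearch="true")
print(handle.read())
handle.close()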

Efficient way to get data from a Lotus Notes view

I am trying to get all the data from a view (Lotus Notes) with LotusScript and Python (the noteslib module) and export it to CSV, but the problem is that this takes too much time. I have tried two ways, both looping through all documents:
import noteslib

db = noteslib.Database('database', 'file.nsf')
view = db.GetView('My View')
doc = view.GetFirstDocument()
data = list()
while doc:
    data.append(doc.ColumnValues)
    doc = view.GetNextDocument(doc)
It took me 70 seconds to get about 1,000 lines of data, but the view has about 85,000 lines, so getting all the data this way would take far too long. By comparison, when I export manually via File->Export in Lotus Notes, it takes about 2 minutes to export all the data to CSV.
I also tried a second way with AllEntries, but it was even slower:
database = []
ec = view.AllEntries
ent = ec.Getfirstentry()
while ent:
    row = []
    for v in ent.Columnvalues:
        row.append(v)
    database.append(row)
    ent = ec.GetNextEntry(ent)
Everything that I found on the Internet is based on "NextDocument" or "AllEntries". Is there any way to do it faster?
It is (or at least used to be) very expensive from a time standpoint to open a Notes document, like you are doing in your code.
Since you are saying that you want to export the data that is being displayed in the view, you could use the NotesViewEntry class instead. It should be much faster.
Set col = view.AllEntries
Set entry = col.GetFirstEntry()
Do Until entry Is Nothing
    values = entry.ColumnValues '*** Array of column values
    '*** Do stuff here
    Set entry = col.GetNextEntry(entry)
Loop
I wrote a blog about this back in 2013:
http://blog.texasswede.com/which-is-faster-columnvalues-or-getitemvalue/
Something else is going on with your code "outside" the view navigation: you already chose the most performant way to navigate a view, using GetFirstDocument and GetNextDocument. Using the NotesViewNavigator mentioned in the comments will be slightly better, but not significantly.
You might get a little more performance out of your code by setting view.AutoUpdate = False to prevent the view object from refreshing when something in the backend changes. But as you only read data and do not change view data, that will not give you much of a performance boost.
My suggestion: identify the REAL bottleneck of your code by commenting out single sections to find out where it starts to get slower.
First attempt:
while doc:
    doc = view.GetNextDocument(doc)
Slow?
If not, then next attempt:
while doc:
    arr = doc.ColumnValues
    doc = view.GetNextDocument(doc)
Slow?
If yes: ColumnValues is your enemy...
If not, then next attempt:
while doc:
    arr = doc.ColumnValues
    data.append(arr)
    doc = view.GetNextDocument(doc)
I would be very interested to see your results and where it starts to become slow.
I suspect the performance issue is using COM/ActiveX in Python to access Notes databases. Transferring data via COM involves datatype 'marshalling', possibly at every step, and especially for 'out-of-process' method/property calls.
I don't think there is any way around this in COM. You should consider arranging a Notes 'agent' to do this for you instead (LotusScript or Java, maybe). Even a basic LotusScript agent can export thousands of docs per minute. A further alternative may be to look at the Notes C API (not an easy option, and it requires API calls from Python).
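If you do stay on the Python/COM side, a minimal sketch that reuses the question's own noteslib calls, streams rows straight to CSV instead of accumulating a big list, and applies the AutoUpdate = False suggestion from above:
import csv
import noteslib

db = noteslib.Database('database', 'file.nsf')
view = db.GetView('My View')
view.AutoUpdate = False  # don't let the view refresh while we read it

with open('export.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    doc = view.GetFirstDocument()
    while doc:
        writer.writerow(doc.ColumnValues)
        doc = view.GetNextDocument(doc)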

Creating a simple webpage with Python, where template content is populated from a database (or a pandas dataframe) based on query

I use Python mainly for data analysis, so I'm pretty used to pandas. But apart from basic HTML, I have little experience with web development.
For work I want to make a very simple webpage that, based on the address/query, populates a template page with info from an SQL database (even if it has to go through a dataframe or CSV first, that's fine for now). I've done searches, but I just don't know the keywords to ask for (hence sorry if this is a duplicate or if the title isn't as clear as it could be).
What I'm imagining (the simplest example; excuse my lack of knowledge here!). Example dataframe:
import pandas as pd
df = pd.DataFrame(index=[1,2,3], columns=["Header","Body"], data=[["a","b"],["c","d"],["e","f"]])
Out[1]:
  Header Body
1      a    b
2      c    d
3      e    f
User puts in page, referencing the index 2:
"example.com/database.html?id=2" # Or whatever the syntax is.
Output page (since id=2, it takes the row data from index 2, so "c" and "d"):
<html><body>
Header<br>
c<p>
Body<br>
d<p>
</body></html>
It should be pretty simple, right? But where do I start? Which Python library? I hear about Django and Flask, but are they overkill for this? Is there an example I could follow? And lastly, how does the syntax work for the webpage address?
Cheers!
PS: I realise I should probably just query the SQL database directly and cut out the pandas middle-man; it's just that I'm more familiar with pandas, hence the example above.
Edit: I a word.
You can start with Flask. It is easy to set up, and there are lots of good resources online.
Start with this minimal web app tutorial: http://flask.pocoo.org/docs/1.0/quickstart/
Example snippet:
from flask import Flask, request
import pandas as pd

app = Flask(__name__)

@app.route('/database')
def database():
    id = request.args.get('id')  # if the key doesn't exist, returns None
    df = pd.DataFrame(index=[1,2,3], columns=["Header","Body"],
                      data=[["a","b"],["c","d"],["e","f"]])
    header = df.loc[int(id), "Header"]
    body = df.loc[int(id), "Body"]
    return '''<html><body>Header<br>{}<p>Body<br>{}<p></body></html>'''.format(header, body)
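With the app running (for example via app.run(), which serves on port 5000 by default), visiting http://localhost:5000/database?id=2 should return the page built from row 2, i.e. "c" and "d".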
For a more detailed webpage, add a template.
Good luck!
