I connect to my API client, send the credentials, make the request, and put the returned data into a DataFrame.
Then I have to upload this data to a sheet, which will be connected to Power BI as a data source so I can build a dashboard and monitor some KPIs.
It's a simple and common ETL process, but to be honest, I'm a rookie and I'm doing my best.
The code above this point just connects to the API; this is where the "extraction" begins:
if response_page.status_code == 200:
    if page == 1:
        df = pd.DataFrame(json.loads(response_page.content)["list"])
    else:
        df2 = pd.DataFrame(json.loads(response_page.content)["list"])
        df = pd.concat([df, df2])  # DataFrame.append was removed in pandas 2.0
Then I pick only the columns I need:
columnas = ['orderId','totalValue','paymentNames']
df2 = df[columnas]
df2
This is what the DataFrame looks like:
Example: this is the DataFrame to which I need to append the new data.
Then I connect to Google Sheets: I send the credentials and open the spreadsheet ("carrefourMetodosDePago") and the worksheet ("transacciones"):
sa = gspread.service_account(filename="service_account.json")
sh = sa.open("carrefourMetodosDePago")
wks = sh.worksheet("transacciones")
The magic begins:
wks.update([df2.columns.values.tolist()] + df2.values.tolist())
With this statement I upload the data shown above to the sheet.
I need the new data returned by the API to be appended to the data already in the sheet, so that every time I run the code the sheet holds the current data plus the new data.
How can I do that? Should I use a for loop to iterate over each new row and append it to the sheet?
This is the best I could do; I think I've hit a wall here.
If I explained myself badly, just let me know.
If you have read this far, thank you for your time :)
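One way to do this without looping over individual cells is gspread's `append_rows`, which adds rows below the existing data instead of overwriting from A1 as `update` does. A minimal sketch, assuming `orderId` is unique per order and sits in the first column of the sheet; the helper name `append_new_rows` is hypothetical:

```python
def append_new_rows(wks, rows, key_col=1):
    """Append only the rows whose key is not yet present in the sheet.

    wks     -- a gspread Worksheet (anything with col_values/append_rows)
    rows    -- list of row lists, e.g. df2.values.tolist()
    key_col -- 1-based column holding the unique key (assumed: orderId)
    """
    # col_values returns the column contents, header cell included
    existing = set(wks.col_values(key_col))
    fresh = [row for row in rows if str(row[key_col - 1]) not in existing]
    if fresh:
        # append_rows writes below the last row that already has data
        wks.append_rows(fresh, value_input_option="USER_ENTERED")
    return len(fresh)

# Usage with the objects from the question (needs real credentials):
# added = append_new_rows(wks, df2.values.tolist())
```

With this approach the header row only needs to be written once, for example with the original `wks.update(...)` call on an empty sheet.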
I'm a novice when it comes to Python and in order to learn it, I was working on a side project. My goal is to track card prices of my YGO cards using the yu-gi-oh prices API https://yugiohprices.docs.apiary.io/#
I want to enter the print tag for each card manually and then have the API pull the data and populate the spreadsheet with the card's name and trait in addition to the price data, so that the spreadsheet is updated every time I run the code.
My idea was to use a for loop to have the API look up each print tag, store the information in an empty dictionary, and then write the results to the Excel file. I added an example of the spreadsheet.
Please let me know if I can clarify further. Any suggestions to the code that would help me achieve the goal for this project would be appreciated. Thanks in advance
import requests
import response as rsp
import urllib3
import urlopen
import json
import pandas as pd
df = pd.read_excel("api_ygo.xlsx")
print(df[:5]) # See the first 5 rows
response = requests.get('http://yugiohprices.com/api/price_for_print_tag/print_tag')
print(response.json())
data = []
for i in df:
    print_tag = i[2]
    request = requests.get('http://yugiohprices.com/api/price_for_print_tag/print_tag' + print_tag)
    data.append(print_tag)
print(data)
def jprint(obj):
    text = json.dumps(obj, sort_keys=True, indent=4)
    print(text)
jprint(response.json())
Example Spreadsheet
Iterating over a pandas dataframe can be done using df.apply(). This has the added advantage that you can store the results directly in your dataframe.
First define a function that returns the desired result. Then apply that function to the relevant column and assign the output to a new column:
import requests
import pandas as pd
import time
df = pd.DataFrame(['EP1-EN002', 'LED6-EN007', 'DRL2-EN041'], columns=['print_tag']) #just dummy data, in your case this is pd.read_excel
def get_tag(print_tag):
    request = requests.get('http://yugiohprices.com/api/price_for_print_tag/' + print_tag) #this url works, the one in your code wasn't correct
    time.sleep(1) #sleep for a second to prevent sending too many API calls per minute
    return request.json()
df['result'] = df['print_tag'].apply(get_tag)
You can now export this column to a list of dictionaries with df['result'].tolist(). Or even better, you can flatten the results into a new dataframe with pd.json_normalize:
df2 = pd.json_normalize(df['result'])
df2.to_excel('output.xlsx') # save dataframe as new excel file
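To make the flattening step concrete, here is a small sketch of what `pd.json_normalize` does with nested dictionaries. The two records are made-up stand-ins, not real responses from the API, so their keys are assumptions:

```python
import pandas as pd

# Hypothetical nested results, shaped like what df['result'] might hold
results = [
    {"status": "success", "data": {"name": "Card A", "price": 1.25}},
    {"status": "success", "data": {"name": "Card B", "price": 0.40}},
]

# Nested keys become dotted column names such as 'data.name'
flat = pd.json_normalize(results)
print(flat.columns.tolist())
```

Each nested dictionary becomes a set of columns in one flat row, which is the shape `to_excel` needs.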
I'm writing data in a Google Sheet using this function :
def Export_Data_To_Sheets(df):
    response_date = service.spreadsheets().values().update(
        spreadsheetId=SAMPLE_SPREADSHEET_ID_input,
        valueInputOption='RAW',
        range=SAMPLE_RANGE_NAME,
        body=dict(
            majorDimension='ROWS',
            values=df.T.reset_index().T.values.tolist()[1:])
    ).execute()
    print('Sheet successfully Updated')
It works well, but I have two tabs in my Google Sheet and I would like to choose which one I write the data to. I don't know how I can do this.
At this point in the code:
range=SAMPLE_RANGE_NAME
You can replace this value with a sheet and cell reference, something like:
range="Sheet1!A1:D5"
Reference
Writing a Single Range
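If the tab name contains spaces or punctuation, A1 notation expects it wrapped in single quotes. A small helper can build the range string; the function name and the tab name "Monthly Data" below are made up for illustration:

```python
def a1_range(tab_name, cells="A1"):
    """Build an A1-notation range such as 'Monthly Data'!A1:D5.

    Quoting the tab name is always allowed; a single quote inside
    the name is escaped by doubling it.
    """
    escaped = tab_name.replace("'", "''")
    return f"'{escaped}'!{cells}"

# e.g. pass range=a1_range("Monthly Data", "A1:D5") to values().update(...)
print(a1_range("Monthly Data", "A1:D5"))
```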
My task is to get data from Google Sheets and put those values into a text box on a web page, using Python.
In my Google Sheet, I have 450 rows of comma-separated values.
I need to put the data from all 450 rows into the web page's text box using Selenium's send_keys().
##getting data from google sheets.
scope = ['https://www.googleapis.com/auth/drive']
credentials = ServiceAccountCredentials.from_json_keyfile_name('json',scope)
client = gspread.authorize(credentials)
workbook = client.open_by_url("https://docs.googlesheet") # using google sheet here
sheet1 = workbook.worksheet("Sheet1")
##converted sheet1 data into a dataframe called dera.
dera = gd.get_as_dataframe(sheet1, evaluate_formulas=True, skiprows=0, has_header=True)
##from dera dataframe reading 'names' column and removing null values.
del = dera[['names']].dropna()
##Converted my dataframe into list- I have read it will be easy to put list(z) values in send keys
z = del['names'].values.tolist()
Selenium code:
driver = webdriver.Chrome(executable_path="/Users/naveenbabudadla/Documents/automation/chromedriver")
driver.get("https://google.com/") # using google.com as example
driver.find_element_by_xpath("//div[text()='Maximum 5000 names'] /..//textarea").send_keys(z) ## got stuck here.
time.sleep(2)
I'm not able to pass "z" to Selenium's send_keys correctly.
Can someone help me with this?
So let's set aside all the code before z = del['names'].values.tolist() and assume it works. You should rename del to something else, though, because del is a reserved keyword in Python and cannot be used as a variable name.
So it seems that z is a list. Let's say it's a list of strings; then it would look like ['value1', 'value2']. If you want to send it to your browser as is, you should convert it to a string with str(z). If instead you want to send a single value from the list, you could send z[0]. These are all guesses on my part, though; you should provide the exact stack trace of your error.
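Another option, assuming the text box expects one name per line (this depends on the page, so treat the separator as a guess), is to join the list into a single string before sending it. `names_to_text` is a hypothetical helper:

```python
def names_to_text(names, sep="\n"):
    """Join a list of values into one string, skipping blank entries."""
    cleaned = (str(name).strip() for name in names)
    return sep.join(name for name in cleaned if name)

z = ["value1", " value2 ", ""]
text = names_to_text(z)
# With Selenium this would then be: element.send_keys(text)
```

If the page wants commas instead, `names_to_text(z, sep=", ")` produces a single comma-separated line.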
I have this chunk of script:
def alltime_tracker(sheet_url,sheet_name,concatfile):
    temp2 = pd.read_csv(sheet_url)
    temp2 = temp2[(temp2['Time Complete'] != today)]
    temp2 = pd.concat([temp2,concatfile])
    temp2.to_csv(sheet_name,encoding='utf-8', index=False)
sheet1_url = 'https://docs.google.com...'
sheet1_name = 'sheet1.csv'
alltime_tracker(sheet1_url,sheet1_name,damagedclaim)
The intended output is a tracker that is updated every day. sheet1_url is a blank spreadsheet with headers, while concatfile is a different spreadsheet containing headers and today's data. I've connected this script to Google Docs/the Google API so that it runs automatically every day. However, when the script ran on the second day, it produced multiple headers (three in total after two days of running). Can anybody help me with this issue so that there is only one header whenever the script runs? Thank you.
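One possible cause, and this is only a guess since the sheet contents are not shown, is that a header row ends up stored as an ordinary data row and is then read back on the next run. Under that assumption, a sketch like the following drops any row whose values simply repeat the column names before saving. `drop_repeated_headers` is a hypothetical helper, and the "Claim" column is dummy data:

```python
import numpy as np
import pandas as pd

def drop_repeated_headers(df):
    """Remove rows whose values are just a copy of the column names."""
    header = [str(col) for col in df.columns]
    is_header = np.array(
        [[str(v) for v in row] == header
         for row in df.itertuples(index=False, name=None)],
        dtype=bool,
    )
    return df[~is_header].reset_index(drop=True)

# Dummy data: the second row is a stray copy of the header
tracker = pd.DataFrame(
    [["A1", "2024-01-01"], ["Claim", "Time Complete"], ["B2", "2024-01-02"]],
    columns=["Claim", "Time Complete"],
)
clean = drop_repeated_headers(tracker)
```

In the script this could be applied to temp2 right after pd.concat and before to_csv.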
I am a beginner in Python. I have written a few DBQ statements in Excel to fetch results into the workbook, which should be refreshed whenever the workbook is opened. I have set the correct options in the connection properties.
Below is my Python code to refresh all connections:
import win32com.client
import time
xl = win32com.client.DispatchEx("Excel.Application")
wb = xl.workbooks.open("D:\\Excel sheets\\Test_consolidation.xlsx")
xl.Visible = True
time.sleep(10)
wb.Refreshall()
I have 3 sheets in the Excel file, which have 3 different connections. I want to refresh them one after the other.
Can someone help me with the Python code to refresh the connections individually? I would be really grateful for your help.
If you want to refresh all of them, but one after the other, then instead of wb.Refreshall() the code would be:
for conn in wb.connections:
    conn.Refresh()
If you want to link (in a dictionary for example) a connection to a sheet:
dict_conn_sheet = {} # create a new dict
for conn in wb.connections: # iterate over each connection in your excel file
    name_conn = conn.Name # get the name of the connection
    sheet_conn = conn.Ranges(1).Parent.Name # get the name of the sheet linked to this connection
    # add a key (the name of the sheet) and the value (the name of the connection) into the dictionary
    dict_conn_sheet[sheet_conn] = name_conn
Note: this approach breaks down if one sheet has more than one connection, because each sheet name can only map to a single connection in the dictionary.
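If a sheet can hold several connections, one way around this limitation is to map each sheet name to a list of connection names instead of a single name. A sketch under the same assumption as the loop above, namely that conn.Ranges(1).Parent.Name gives the sheet of a connection; the function name is made up:

```python
from collections import defaultdict

def connections_by_sheet(connections):
    """Map each sheet name to the names of all connections on it."""
    mapping = defaultdict(list)
    for conn in connections:
        sheet_name = conn.Ranges(1).Parent.Name
        mapping[sheet_name].append(conn.Name)
    return dict(mapping)

# Usage with the workbook from the question (Windows + Excel only):
# for name in connections_by_sheet(wb.connections).get('Sheet1', []):
#     wb.connections(name).Refresh()
```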
Then, if you want to update only one connection on a specific sheet (in my example it is called Sheet1):
sheet_name = 'Sheet1'
# refresh the connection linked to sheet_name,
# if it exists in the dictionary dict_conn_sheet
wb.connections(dict_conn_sheet[sheet_name]).Refresh()
Finally, if you already know the name of the connection you want to update (let's say connection_Raj), just enter:
name_conn = 'connection_Raj'
wb.connections(name_conn).Refresh()
I hope this is clear, even if it does not exactly answer your question, as I'm not sure I understood what you want to do.