Python - adding multiple tables into a single CSV with pandas

I'm wondering how to get tables parsed with pandas into a single CSV. I have managed to get each table into its own separate CSV, but I would like them all in one CSV. This is my current code, which produces multiple CSVs:
import pandas as pd

url = "https://fasttrack.grv.org.au/RaceField/ViewRaces/228697009?raceId=318809897"
data = pd.read_html(url, attrs={'class': 'ReportRaceDogFormDetails'})
for i, datas in enumerate(data):
    datas.to_csv("new{}.csv".format(i), header=False, index=False)

You only need concat, because data is a list of DataFrames:
df = pd.concat(data, ignore_index=True)
df.to_csv('new.csv', header=False, index=False)
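If you also need to know which table each row came from, concat can label the pieces through its keys parameter; a small sketch (the level names here are my own, not from the thread):

import pandas as pd

# Label each table with its position in the list returned by read_html;
# keep the index (the default) so the 'table' label shows up in the CSV
df = pd.concat(data, keys=range(len(data)), names=['table', 'row'])
df.to_csv('new_labelled.csv', header=False)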

You have two options:
Option 1: Tell pandas to append while writing to the CSV file.
data = pd.read_html(url, attrs={'class': 'ReportRaceDogFormDetails'})
for datas in data:
    datas.to_csv("new.csv", header=False, index=False, mode='a')
Option 2: Merge all the tables into one DataFrame, then write that to the CSV file.
data = pd.read_html(url, attrs={'class': 'ReportRaceDogFormDetails'})
df = pd.concat(data, ignore_index=True)
df.to_csv("new.csv", header=False, index=False)
Edit
To still keep the DataFrames separated in the csv file, we have to stick with option 1, with a few additions:
data = pd.read_html(url, attrs={'class': 'ReportRaceDogFormDetails'})
with open('new.csv', 'a') as csv_stream:
    for datas in data:
        datas.to_csv(csv_stream, header=False, index=False)
        csv_stream.write('\n')

all_dfs = []
for datas in data:
    all_dfs.append(datas)  # collect the DataFrames themselves; to_csv returns None, so appending its result would not work
result = pd.concat(all_dfs, ignore_index=True)
result.to_csv("new.csv", header=False, index=False)


Python Pandas Read from column A & B instead of column name

I'm relatively new to Python.
I have an Excel file with Column A ("url") and Column B ("name").
In the future the columns will have no header names, so I need to read directly from columns A and B, starting from the first cell.
I tried using index_col=0 but can't really seem to get the hang of it.
This is a simple image-download script.
import requests
import pandas as pd

df = pd.read_excel(r'C:\Users\exdata1.xlsx')
for index, row in df.iterrows():
    url = row['url']
    file_name = url.split('/')
    r = requests.get(url)
    file_name = (row['name'] + ".jpeg")
    if r.status_code == 200:
        with open(file_name, "wb") as f:
            f.write(r.content)
            print(file_name)
I tried this below without any good result:
url = row['index_col(0)']  # 0 for Excel column "A"
file_name = (row['index_col(1)'] + ".jpeg")  # 1 for Excel column "B"
Appreciate any support!
You can set header=None as an argument of pandas.read_excel and give names to your columns.
Try this:
import requests
import pandas as pd

df = pd.read_excel(r'C:\Users\exdata1.xlsx', header=None, names=['url', 'name'])
for index, row in df.iterrows():
    url = row['url']
    file_name = url.split('/')
    r = requests.get(url)
    file_name = (row['name'] + '.jpeg')
    if r.status_code == 200:
        with open(file_name, 'wb') as f:
            f.write(r.content)
            print(file_name)
If your file has no column names, pandas assigns a default name such as Unnamed: 0 to each column; you can check that by printing df.info() or df.head().
You can assign column names when reading the file, so your df always has column names, or rename them afterwards:
df.rename(columns={"Unnamed: 0": 'url', "Unnamed: 1": 'name'}, inplace=True)
Then you are good to go.
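As a further option (my suggestion, not from the answers above), read_excel can also select columns by their Excel letters, which pairs well with header=None when the sheet has no header row:

import pandas as pd

# usecols accepts Excel-style letter ranges; names labels the otherwise unnamed columns
df = pd.read_excel(r'C:\Users\exdata1.xlsx', header=None,
                   usecols='A:B', names=['url', 'name'])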

Append Values to CSV and retain the old data

I have to append data to a CSV. The problem I am facing is that instead of appending I am overwriting, so I am not able to retain the old data. Example:
finalDf = pd.DataFrame(columns=['sourcez', 'tergetz', 'TMP'])
df = pd.DataFrame()
df["sourcez"] = ["str(source_Path)"]
df["tergetz"] = ["str(target_path)"]
df["TMP"] = ["total_matching_points"]
finalDf = finalDf.append(df)
finalDf.to_csv('Testing.csv', index=False)
Now, if I add a new value:
finalDf = pd.DataFrame(columns=['sourcez', 'tergetz', 'TMP'])
df = pd.DataFrame()
df["sourcez"] = ["str(source_Path)_New"]
df["tergetz"] = ["str(target_path)_New"]
df["TMP"] = ["total_matching_points_New"]
finalDf = finalDf.append(df)
finalDf.to_csv('Testing.csv', index=False)
It keeps only the latest data in the csv; I want both sets of data in the csv. Any idea?
I have tried to create a new csv with a pandas DataFrame, and I want to append the values instead of overwriting.
I have tried:
finalDf = pd.DataFrame(columns=['sourcez', 'tergetz', 'TMP'])
df = pd.DataFrame()
df["sourcez"] = ["str(source_Path)"]
df["tergetz"] = ["str(target_path)"]
df["TMP"] = ["total_matching_points"]
finalDf = finalDf.append(df)
finalDf.to_csv('Testing.csv', index=False, mode='a+')
But the problem is that the heading is repeated in the csv:
sourcez,tergetz,TMP
str(source_Path),str(target_path),total_matching_points
sourcez,tergetz,TMP
str(source_Path)_New,str(target_path)_New,total_matching_points_New
How can I remove the repeated headings sourcez,tergetz,TMP?
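A common fix (a suggestion of mine, not from this thread) is to write the header only when the file does not yet exist, so every later append adds data rows but no heading:

import os
import pandas as pd

def append_row(df, path='Testing.csv'):
    # Header goes out only on the first write; later calls append rows only
    df.to_csv(path, mode='a', index=False, header=not os.path.exists(path))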

How to copy info from one excel to a template excel?

I am working with Excel and I have to export some columns into another file, but this second file is a template with colors, a company logo, and so on.
Is there any way to preserve the look and functionality that template.xlsx has?
My code:
import pandas as pd

# variables for source file, worksheets, and empty dictionary for dataframes
spreadsheet_file = pd.ExcelFile('example.xlsx')
worksheets = spreadsheet_file.sheet_names
appended_data = {}
cat_dic = {"Part Number": "CÓDIGO", "QTY": "QT", "Description": "DESCRIÇÃO",
           "Material": "MATERIAL", "Company": "MARCA", "Category": "OPERAÇÃO"}
d = {}
for sheet_name in worksheets:
    df = pd.read_excel(spreadsheet_file, sheet_name)
    # Keep only the requested columns: "Part Number", "QTY", "Description", "Material", "Company", "Category"
    df = df[["Part Number", "QTY", "Description", "Material", "Company", "Category"]]
    # Sort first by Category, then by Description (both descending)
    df = df.sort_values(['Category', 'Description'], ascending=[False, False])
    appended_data = df.to_dict()
    # Rename the keys to the Portuguese headers
    d = dict((cat_dic[key], value) for (key, value) in appended_data.items())

# Export the data
df2 = pd.DataFrame(d)
df2.to_excel('template2.xlsx', sheet_name='Projeto', index=False)
[Screenshots: the template file and my current output]
Thanks in advance for any help.
You will need to use openpyxl if you want to update only the text and keep the format, colors, etc. as-is in the template. Updated code below. Note that:
- I have not taken your df2 code, as the template file already has the new headers; only the data from each worksheet is written into the file.
- You can read each worksheet with read_excel, but writing needs to go through openpyxl.load_workbook, saving the file once all worksheets have been read.
- Open the template file with load_workbook before the for loop, and save to a new file, template2, after the for loop completes.
spreadsheet_file = pd.ExcelFile('example.xlsx')
worksheets = spreadsheet_file.sheet_names
#cat_dic = {"Part Number":"CÓDIGO", "QTY":"QT", "Description":"DESCRIÇÃO", "Material":"MATERIAL", "Company":"MARCA","Category":"OPERAÇÃO"}
#d = {}

import openpyxl
from openpyxl.utils.dataframe import dataframe_to_rows

wb = openpyxl.load_workbook('Template.xlsx')  # your template file
ws = wb['Sheet1']
rownumber = 2  # skip 2 rows and start writing from row 3 - the first two are headers in the template file

for sheet_name in worksheets:
    df = pd.read_excel(spreadsheet_file, sheet_name)
    # Keep only the requested columns: "Part Number", "QTY", "Description", "Material", "Company", "Category"
    df = df[["Part Number", "QTY", "Description", "Material", "Company", "Category"]]
    # Sort first by Category, then by Description (both descending)
    df = df.sort_values(['Category', 'Description'], ascending=[False, False])
    # Read all rows from df, but not the index or header
    rows = dataframe_to_rows(df, index=False, header=False)
    for r_idx, row in enumerate(rows, 1):
        for c_idx, value in enumerate(row, 1):
            # Write to the cell, offset by rownumber
            ws.cell(row=r_idx + rownumber, column=c_idx, value=value)
    rownumber += len(df)  # move rownumber to the end, so the next worksheet's data comes after this sheet's

wb.save('template2.xlsx')

Merge more than one list of table data and save in CSV format using pandas

With the following code, when I iterate and print I get all the table data, but when I store it in CSV format using pandas I get only the first list of table data. How do I store all of them in a single CSV file?
import requests
import pandas as pd

isins = ['LU0526609390:EUR', 'IE00BHBX0Z19:EUR']
for isin in isins:
    html = requests.get(f'https://markets.ft.com/data/funds/tearsheet/historical?s={isin}').content
    df_list = pd.read_html(html)
    dfs = df_list
    #print(dfs)
    for df in dfs:
        df.to_csv('data.csv', header=False, index=True)
        #print(df)
The idea is to collect the DataFrames in dfs, then concatenate them and generate a single csv file.
import requests
import pandas as pd

isins = ['LU0526609390:EUR', 'IE00BHBX0Z19:EUR']
dfs = []
for isin in isins:
    html = requests.get(f'https://markets.ft.com/data/funds/tearsheet/historical?s={isin}').content
    dfs.extend(pd.read_html(html))

df = pd.concat(dfs)
df.to_csv('data.csv', header=False, index=True)
The loop was overwriting the file. The following does not save everything as one file but writes one file per table, so you can see what was wrong:
isins = ['LU0526609390:EUR', 'IE00BHBX0Z19:EUR']
k = 0
for isin in isins:
    html = requests.get(f'https://markets.ft.com/data/funds/tearsheet/historical?s={isin}').content
    df_list = pd.read_html(html)
    dfs = df_list
    #print(dfs)
    for df in dfs:
        df.to_csv(str(k) + 'data.csv', header=False, index=True)
        #print(df)
        k = k + 1
An easy answer would be to use pd.concat() to create a new df and save that. What do you want the csv to look like, though? The result of this concatenation is shown in this screenshot: https://i.stack.imgur.com/BvZ1X.png
I don't know whether this is sufficient, as the data is not really labelled (which might be a problem if you plan to search for more than two funds).
import requests
import pandas as pd

funds = ['LU0526609390:EUR', 'IE00BHBX0Z19:EUR']
df_list = []
for fund in funds:
    html = requests.get(f'https://markets.ft.com/data/funds/tearsheet/historical?s={fund}').content
    df_list.extend(pd.read_html(html))

df_final = pd.concat(df_list)
# print(df_final)
df_final.to_csv('data.csv', header=False, index=True)
(I replaced isin with fund, as isin is already used as a method name in pandas.)
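If labelling matters, one option (my own extension, not part of the answer above) is to pass the fund codes as keys to concat, so every row records the fund it came from:

import requests
import pandas as pd

funds = ['LU0526609390:EUR', 'IE00BHBX0Z19:EUR']
# Concatenate the tables per fund first, then key the outer concat by fund code
per_fund = []
for fund in funds:
    html = requests.get(f'https://markets.ft.com/data/funds/tearsheet/historical?s={fund}').content
    per_fund.append(pd.concat(pd.read_html(html)))
df_final = pd.concat(per_fund, keys=funds, names=['fund', 'row'])
df_final.to_csv('data.csv', header=False, index=True)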

Pulling contents of div tags with beautifulsoup and creating a pandas dataframe

import requests
import pandas as pd
from bs4 import BeautifulSoup

date = '2017-08-04'
writer = pd.ExcelWriter('MLB Daily Data.xlsx')
url_4 = 'http://www.baseballpress.com/lineups/' + date
resp_4 = requests.get(url_4)
soup_4 = BeautifulSoup(resp_4.text, "lxml")
lineups = soup_4.findAll('div', attrs={'class': 'players'}, limit=None)
row_lineup = 0
for lineup in lineups:
    lineup1 = lineup.prettify()
    lineup2 = lineup1.replace('>'&&'<', ',')  # this line fails: '&&' is not valid Python
    df4 = pd.DataFrame(eval(lineup2))
    df4.to_excel(writer, sheet_name='Starting Lineups', startrow=row_lineup, startcol=0)
    row_lineup = row_lineup + len(df4.index) + 3
writer.save()
I am trying to get the starting lineups from the webpage, convert them into a pandas DataFrame, and then save it to an Excel file. I'm having an issue with turning the scraped content into a DataFrame. I replaced the brackets with commas because I figured that would turn it into csv format.
This may get you moving in the right direction, where each row is a lineup:
data = [[x.text for x in y.findAll('a')] for y in lineups]
df = pd.DataFrame(data)
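From there, writing into the workbook can follow the question's own pattern; a sketch assuming the writer and lineups from the question are still in scope (writer.save() matches the older pandas used in this thread):

# Hypothetical continuation: write the lineup rows into the Excel file
df.to_excel(writer, sheet_name='Starting Lineups', index=False)
writer.save()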
