How can I convert the output of a PrettyTable to a pandas DataFrame and save it as an Excel file?
This is the code that produces the PrettyTable output:
import difflib
from prettytable import PrettyTable
prtab = PrettyTable()
prtab.field_names = ['Item_1', 'Item_2']
# Items_1 and Items_2 are lists of strings defined elsewhere
for item in Items_2:
    prtab.add_row([item, difflib.get_close_matches(item, Items_1)])
print(prtab)
I'm trying to convert this to a pandas DataFrame, but I get the error "DataFrame constructor not properly called!". The code I use for the conversion is shown below:
AA = pd.DataFrame(prtab, columns = ['Item_1', 'Item_2']).reset_index()
I found this method recently:
pretty_table.get_csv_string()
It converts the table to a CSV string, which you can then write to a CSV file.
I use it like this:
tbl_as_csv = pretty_table.get_csv_string().replace('\r','')
# the with block closes the file even if the write fails
with open("output_path.csv", "w") as text_file:
    text_file.write(tbl_as_csv)
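Since the question asks for a DataFrame rather than a CSV file, the same CSV string can also be parsed directly by pandas. The following is a minimal sketch, not part of the original answer: `csv_text` is an assumed stand-in for whatever `get_csv_string()` returned, and the item values are made up for illustration.

```python
import io

import pandas as pd

# Assumed stand-in for the text returned by pretty_table.get_csv_string();
# the list of matches appears as a quoted string because it contains a comma.
csv_text = (
    "Item_1,Item_2\r\n"
    "apple,\"['apples', 'appel']\"\r\n"
    "pear,[]\r\n"
)

# Parse the CSV text in memory instead of going through a file on disk
df = pd.read_csv(io.StringIO(csv_text))

# From here the DataFrame could be saved with, for example:
# df.to_excel("output.xlsx", index=False)
```

Note that the match lists come back as plain strings, not Python lists, because CSV has no list type.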
Load the data into a DataFrame first, then export to PrettyTable and Excel:
import io
import difflib
import pandas as pd
import prettytable as pt
data = []
for item in Items_2:
    data.append([item, difflib.get_close_matches(item, Items_1)])
df = pd.DataFrame(data, columns=['Item_1', 'Item_2'])
# Export to prettytable
# https://stackoverflow.com/a/18528589/190597 (Ofer)
# Use io.StringIO with Python3, use io.BytesIO with Python2
output = io.StringIO()
df.to_csv(output)
output.seek(0)
print(pt.from_csv(output))
# Export to Excel file
filename = '/tmp/output.xlsx'
# the with block saves and closes the workbook on exit
with pd.ExcelWriter(filename) as writer:
    df.to_excel(writer, sheet_name='Sheet1')
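For a version that runs on its own, here is a small sketch of the DataFrame-first approach with made-up sample lists. `Items_1` and `Items_2` below are hypothetical data, and joining the matches into one string is an optional design choice (not from the answer) that keeps each spreadsheet cell readable.

```python
import difflib

import pandas as pd

# Hypothetical sample lists standing in for the asker's Items_1 and Items_2
Items_1 = ["apple", "apples", "banana", "orange"]
Items_2 = ["appel", "bananna", "kiwi"]

rows = []
for item in Items_2:
    matches = difflib.get_close_matches(item, Items_1)
    # Join the list of matches into a single comma-separated string
    rows.append({"Item_1": item, "Item_2": ", ".join(matches)})

df = pd.DataFrame(rows, columns=["Item_1", "Item_2"])
print(df)
```

An item with no close match ends up with an empty string in the second column.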
I have been working on automating a series of reports in Python. I am trying to create a series of pivot tables from an imported CSV (binlift.csv). I have found the pandas library very useful for this; however, I can't seem to find anything that helps me write the pivot tables created by pandas to my Excel document (Template.xlsx), and I was wondering if anyone can help. So far I have written the following code:
import openpyxl
import csv
from datetime import datetime
import datetime
import pandas as pd
import numpy as np
file1 = "Template.xlsx" # template file
file2 = "binlift.csv" # raw data csv
wb1 = openpyxl.load_workbook(file1) # opens template
ws1 = wb1.create_sheet("Raw Data") # create a new sheet in template called Raw Data
summary = wb1.worksheets[0] # variables given to sheets for manipulation
rawdata = wb1.worksheets[1]
headings = ["READER","BEATID","LIFTYEAR","LIFTMONTH","LIFTWEEK","LIFTDAY","TAGGED","UNTAGGEDLIFT","LIFT"]
df = pd.read_csv(file2, names=headings)
pivot_1 = pd.pivot_table(df, index=["LIFTYEAR", "LIFTMONTH","LIFTWEEK"], values=["TAGGED","UNTAGGEDLIFT","LIFT"],aggfunc=np.sum)
pivot_2 = pd.pivot_table(df, index=["LIFTYEAR", "LIFTMONTH"], values=["TAGGED","UNTAGGEDLIFT"],aggfunc=np.sum)
pivot_3 = pd.pivot_table(df, index=["READER"], values=["TAGGED","UNTAGGEDLIFT","LIFT"],aggfunc=np.sum)
print(pivot_1)
print(pivot_2)
print(pivot_3)
wb1.save('test.xlsx')
pandas has an option to write 'xlsx' files.
Here we get all the level-0 index values of the pivot table, then loop over them one by one, subset the table for each value, and write that part of the table to its own sheet.
# the with block saves the workbook on exit
with pd.ExcelWriter('output.xlsx') as writer:
    for manager in pivot_1.index.get_level_values(0).unique():
        temp_df = pivot_1.xs(manager, level=0)
        # sheet names must be strings
        temp_df.to_excel(writer, sheet_name=str(manager))
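To make the subsetting step concrete, here is a small sketch on made-up data. The column names follow the question, but the numbers and the helper names `split_by_first_level` and `write_sheets` are hypothetical. The Excel step is wrapped in a function so the splitting logic can be checked without writing a file.

```python
import pandas as pd

# Made-up sample rows using column names from the question
df = pd.DataFrame({
    "LIFTYEAR": [2015, 2015, 2016, 2016],
    "LIFTMONTH": [11, 12, 1, 1],
    "TAGGED": [5, 7, 3, 4],
    "LIFT": [6, 9, 3, 5],
})
pivot = pd.pivot_table(
    df, index=["LIFTYEAR", "LIFTMONTH"], values=["TAGGED", "LIFT"], aggfunc="sum"
)


def split_by_first_level(table):
    """Map each level-0 index value (as a string) to its sub-table."""
    keys = table.index.get_level_values(0).unique()
    return {str(key): table.xs(key, level=0) for key in keys}


def write_sheets(sheets, path):
    """Write each sub-table to its own sheet; the with block saves on exit."""
    with pd.ExcelWriter(path) as writer:
        for name, frame in sheets.items():
            frame.to_excel(writer, sheet_name=name)


sheets = split_by_first_level(pivot)
# write_sheets(sheets, "output.xlsx")  # needs an Excel engine such as openpyxl
```

Each sub-table is indexed by the remaining level (here LIFTMONTH), so one sheet per year holds that year's monthly totals.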
I want to know if there is a simple way to get a DataFrame from an xlsm file. I tried pandas alone with pd.ExcelFile, but it doesn't read the data correctly,
so for now I have this:
import xlrd
import pandas as pd
cartera_improd = xlrd.open_workbook("CARTERA IMPRODUCTIVA - FORMATOV1.xlsm")
base_ici = cartera_improd.sheet_by_name("BASE ICI")
print (base_ici.row_values(1))
print (base_ici.nrows)
data_ici = list()
for i in range(base_ici.nrows):
    data_ici.append(base_ici.row_values(i))
data_ici = pd.DataFrame(data_ici)
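To read an xlsm file, you just have to use:
import pandas as pd
# read_excel loads the first sheet by default; pass sheet_name to pick another, e.g. sheet_name='BASE ICI'
df = pd.read_excel('CARTERA IMPRODUCTIVA - FORMATOV1.xlsm')
print(df.head())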
I am trying to load data from a web source and save it as an Excel file, but I am not sure how to do it. What should I do?
import requests
import pandas as pd
import xmltodict
url = "https://www.kstan.ua/sitemap.xml"
res = requests.get(url)
raw = xmltodict.parse(res.text)
data = [[r["loc"], r["lastmod"]] for r in raw["urlset"]["url"]]
print("Number of sitemaps:", len(data))
df = pd.DataFrame(data, columns=["links", "lastmod"])
df.to_csv("output.csv", index=False)
OR
df.to_excel("output.xlsx")
You can write the DataFrame to Excel using the pandas ExcelWriter, like this:
import pandas as pd
# dataframe is the DataFrame you want to save
with pd.ExcelWriter('path_to_file.xlsx') as writer:
    dataframe.to_excel(writer)
If you want to create multiple sheets in the same file:
with pd.ExcelWriter('csv_s/results.xlsx') as writer:
    same_res.to_excel(writer, sheet_name='same')
    diff_res.to_excel(writer, sheet_name='sheet2')
I have a CSV file with 1 million lines. I want to call a lookup function on each row's 1st column and append its result as a new column in the same CSV (if possible).
What I want is something like this:
for each row in dataframe
    string = row[1]
    result = lookupFunction(string)
    row.append(result)
I know I could do it with Python's csv library by opening my CSV, reading each row, doing my operation, and writing the results to a new CSV.
This is my code using Python's csv library:
import csv
with open(rawfile, 'r') as f:
    with open(newFile, 'a') as csvfile:
        csvwriter = csv.writer(csvfile, delimiter=' ')
        for line in f:
            pass  # do operation
However, I really want to do it with pandas because it would be something new to me.
This is what my data looks like
77,#oshkosh # tannersville pa,,PA,US
82,#osithesakcom ca,,CA,US
88,#osp open records or,,OR,US
89,#ospbco tel ord in,,IN,US
98,#ospwmnwithn return in,,IN,US
99,#ospwmnwithn tel ord in,,IN,US
100,#osram sylvania inc ma,,MA,US
106,#osteria giotto montclair nj,,NJ,US
Any help and guidance will be appreciated. Thanks.
Here is a simple example that adds two columns from your CSV file together into a new column:
import pandas as pd
df = pd.read_csv("yourpath/yourfile.csv")
df['newcol'] = df['col1'] + df['col2']
Create a DataFrame and write it to CSV:
import pandas as pd
df = pd.DataFrame(dict(A=[1, 2], B=[3, 4]))
df.to_csv('test_add_column.csv')
Read the CSV into dfromcsv:
dfromcsv = pd.read_csv('test_add_column.csv', index_col=0)
Create the new column:
dfromcsv['C'] = dfromcsv['A'] * dfromcsv['B']
dfromcsv
Write the CSV:
dfromcsv.to_csv('test_add_column.csv')
Read it again:
dfromcsv2 = pd.read_csv('test_add_column.csv', index_col=0)
dfromcsv2
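Neither answer applies a lookup function or deals with the file size, so here is a rough sketch of how both could be handled with pandas. It reads the CSV in chunks, maps a function over the second field (index 1, as in the question's pseudocode), and writes each chunk out with the new column appended. The `lookup_function` body is a hypothetical stand-in (it just returns the last word in upper case), and the sample rows are taken from the question; in-memory buffers replace real files so the example is self-contained.

```python
import io

import pandas as pd


def lookup_function(text):
    # Hypothetical stand-in for the real lookup: last word, upper-cased
    return text.split()[-1].upper()


def add_lookup_column(src, dst, chunksize=100_000):
    """Read src in chunks, append the lookup result as a new column, write to dst."""
    # header=None because the sample data has no header row
    for chunk in pd.read_csv(src, header=None, chunksize=chunksize):
        chunk["lookup"] = chunk[1].map(lookup_function)
        chunk.to_csv(dst, header=False, index=False)


# Sample rows from the question
sample = io.StringIO(
    "77,#oshkosh # tannersville pa,,PA,US\n"
    "82,#osithesakcom ca,,CA,US\n"
    "88,#osp open records or,,OR,US\n"
)
out = io.StringIO()
add_lookup_column(sample, out, chunksize=2)
result_lines = out.getvalue().splitlines()
```

With real files, `src` would be the input path and `dst` an output file opened once for writing. Writing to a new file rather than the same CSV avoids overwriting data that has not been read yet.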
How do I prevent Python from automatically writing objects to a CSV in a different format than the original? For example, I have a list object such as the following:
row = ['APR16', '100.00000']
I want to write this row as is. However, when I use the writerow function of the csv writer, it ends up in the CSV file as 16-Apr and just 10. I want to keep the original formatting.
EDIT:
Here is the code:
import pandas as pd
dates = ['APR16', 'MAY16', 'JUN16']
numbers = [100.00000, 200.00000, 300.00000]
for i in range(3):
    row = []
    row.append(dates[i])
    row.append(numbers[i])
    prow = pd.DataFrame(row)
    prow.to_csv('test.csv', index=False, header=False)
And result:
Using pandas:
import pandas as pd
dates = ['APR16', 'MAY16', 'JUN16']
numbers = [100.00000, 200.00000, 300.00000]
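data = list(zip(dates, numbers))  # materialize the pairs; some pandas versions reject a bare zip object
fd = pd.DataFrame(data)
fd.to_csv('test.csv', index=False, header=False) # csv-file
fd.to_excel("test.xlsx", header=False, index=False) # or xlsx-file (recent pandas versions no longer write .xls)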
Result in my terminal:
➜ ~ cat test.csv
APR16
100.00000
Result in LibreOffice:
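If the goal is to keep the trailing zeros when the numbers are stored as floats, one option is the `float_format` argument of `to_csv`. The sketch below uses the values from the question; the column names `date` and `number` are made up, and the output goes to an in-memory buffer so the resulting text can be inspected directly.

```python
import io

import pandas as pd

dates = ["APR16", "MAY16", "JUN16"]
numbers = [100.00000, 200.00000, 300.00000]

# Hypothetical column names; they are not written because header=False
df = pd.DataFrame({"date": dates, "number": numbers})

buf = io.StringIO()
# float_format controls how floats are rendered in the CSV text
df.to_csv(buf, index=False, header=False, float_format="%.5f")
csv_text = buf.getvalue()
print(csv_text)
```

The CSV text itself then contains `APR16` and `100.00000` as written. A spreadsheet program may still reinterpret those values as a date and a plain number when it opens the file, which is a display decision of that program rather than a change in the file.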