Adding data frame to excel sheet - python

I am trying to write a dataframe to Excel using pandas.ExcelWriter after reading it from a huge CSV file.
This code updates the Excel sheet, but it doesn't append the data the way I want:
import pandas as pd

reader = pd.read_csv("H:/ram/temp/1.csv", delimiter='\t', chunksize=10000,
                     names=['neo_user_id',
                            'gender',
                            'age_range',
                            'main_geolocation',  # (user identifier of the client)
                            'interest_category_1',
                            'interest_category_2',
                            'interest_category_3',
                            'first_day_identifier'],
                     encoding="utf-8")
ew = pd.ExcelWriter('H:/ram/Formatted/SynthExport.xlsx', engine='xlsxwriter', options={'encoding': 'utf-8'})
for chunks in reader:
    chunks.to_excel(ew, 'Sheet1', encoding='utf-8')
    print len(chunks)
    ew.save()
I also tried data.append() and data.to_excel(), but that results in a memory error. Since I am reading the data in chunks, is there any way to write it to Excel?
I got it working with this code:
import pandas as pd
import xlsxwriter

reader = pd.read_csv("H:/ram/user_action_export.2014.01.csv", delimiter='\t', chunksize=1000,
                     names=['day_identifier',
                            'user_id',
                            'site_id',
                            'device',  # (user identifier of the client)
                            'geolocation',
                            'referrer',
                            'pageviews'],
                     encoding="utf-8")
startrows = 0
ew = pd.ExcelWriter('H:/ram/Formatted/ActionExport.xlsx', engine='xlsxwriter', options={'encoding': 'utf-8'})
for chunks in reader:
    chunks.to_excel(ew, 'Sheet1', encoding='utf-8', startrow=startrows)
    startrows = startrows + len(chunks)
    print startrows
ew.save()
But it still takes a lot of time.

I don't know if it is causing the main issue, but you shouldn't be calling save() between chunks, since a single call to save() closes an xlsxwriter file.
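Under that constraint, a minimal sketch (file paths, chunk size, and the helper name are placeholders, not the asker's originals) keeps one writer open, tracks startrow across chunks, and saves only once when the context manager exits:

```python
import pandas as pd

def csv_to_excel_chunked(csv_path, xlsx_path, chunksize=10000):
    """Stream a large CSV into a single Excel sheet, saving only once."""
    reader = pd.read_csv(csv_path, chunksize=chunksize)
    # The writer is saved/closed exactly once, when the with-block exits.
    with pd.ExcelWriter(xlsx_path) as writer:
        startrow = 0
        for chunk in reader:
            first = (startrow == 0)
            chunk.to_excel(writer, sheet_name='Sheet1',
                           startrow=startrow, header=first, index=False)
            # The first chunk also wrote a header row, so advance one extra.
            startrow += len(chunk) + (1 if first else 0)
```

Writing the header only for the first chunk keeps the column names from being repeated in the middle of the sheet.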


xlsx to txt column data formatting

I am trying to turn a series of Excel Sheets into .txt files. The data I'm working with has some specific formatting I want to keep (decimal places and scientific notation specifically), but I can't seem to get it to work. Am I missing something with .format? The code below works for the most part (except for the final 3 lines, the ones I'm working on).
import pandas as pd

file_names = ["Example.xlsx"]
for xl_file in file_names:
    xl = pd.ExcelFile("Example.xlsx")
    sheet_names = xl.sheet_names
    for k in range(len(sheet_names)):
        txt_name = xl_file.split(".")[0] + str(sheet_names[k]) + ".txt"
        df = pd.read_excel("Example.xlsx", sheet_name=sheet_names[k])
        with open(txt_name, 'w', encoding="utf-8") as outfile:
            df.to_string(outfile, index=False)
col0 = [0]
df0 = pd.read_excel("Example.xlsx", usecols=col0)
"El": "{:<}".format(df0)'''

Pandas to_excel has no data

I have an issue with my code, and I'm not sure what goes wrong.
The background of my issue:
I use pandas to query share-price data from the web (multiple stocks).
Then I export the data into an existing Excel file.
The data frame does contain data.
But the file has no data after completion (I tried both ExcelWriter and itertuples, without success).
Please help, much appreciated. Please see the code below:
wb = op.load_workbook(file_location)
full_path = os.path.abspath(file_location)
for stock in stocklist:
    if stock in avail_sheets:
        # Delete existing tabs for a fresh start.
        wb.remove(wb[stock])
    wb.create_sheet(stock)
    symbol = stock + ".AX"  # to specify ASX stock
    url = get_url(symbol, start_date, end_date)
    stock_data = pd.read_csv(url)
    writer = pd.ExcelWriter(full_path)
    stock_data.to_excel(writer, sheet_name=stock, index=False, header=True)
    writer.save()
    # current_sheet = wb[stock]
    # for row in stock_data.itertuples(index=False):
    #     current_sheet.append(row)
wb.save(file_location)
As per the pandas documentation, you should use a context manager with the ExcelWriter object, especially if you want to save to multiple sheets, and you have to specify the mode for writing the file:
'w' = write (the default, overwrites the file).
'a' = append (adds sheets to an existing file; openpyxl engine only).
If you only need one sheet, just pass the output .xlsx path to the .to_excel() method and specify the sheet name:
# for a single sheet
stock_data.to_excel('output.xlsx', sheet_name=stock, index=False, header=True)

# for multiple sheets (or even a single sheet)
with pd.ExcelWriter('output.xlsx', mode='a') as writer:
    stock_data.to_excel(writer, sheet_name=stock, index=False, header=True)
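To show the context-manager pattern end to end, here is a sketch that writes several DataFrames to separate sheets of one workbook (the helper name and sample frames are made up for illustration; note that mode='a' additionally requires the file to already exist and the openpyxl engine):

```python
import pandas as pd

def write_sheets(xlsx_path, frames):
    """Write {sheet_name: DataFrame} pairs into a single workbook.

    One ExcelWriter context covers every sheet, so nothing overwrites
    the file between iterations and it is saved once on exit.
    """
    with pd.ExcelWriter(xlsx_path) as writer:
        for name, df in frames.items():
            df.to_excel(writer, sheet_name=name, index=False)
```

This avoids the bug in the question, where a fresh ExcelWriter is created and saved inside the loop, clobbering the file on every stock.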

How to write on existing excel files without losing previous information using python?

I need to write a program to scrape a daily quote from a certain web page and collect the quotes into a single Excel file. I wrote something that finds the next empty row and starts writing new quotes there, but it deletes the previous rows too:
wb = openpyxl.load_workbook('gold_quote.xlsx')
sheet = wb.get_sheet_by_name('Sheet1')
...
z = 1
x = sheet['A{}'.format(z)].value
while x != None:
    x = sheet['A{}'.format(z)].value
    z += 1
writer = pd.ExcelWriter('quote.xlsx')
df.to_excel(writer, sheet_name='Sheet1', na_rep='', float_format=None, columns=['Date', 'Time', 'Price'], header=True, index=False, index_label=None, startrow=z-1, startcol=0, engine=None, merge_cells=True, encoding=None, inf_rep='inf', verbose=True, freeze_panes=None)
writer.save()
Question: How to write to existing Excel files without losing previous information?
openpyxl's append() writes after the last used row:
wb = openpyxl.load_workbook('gold_quote.xlsx')
sheet = wb.get_sheet_by_name('Sheet1')
rowData = ['2017-08-01', '16:31', 1.23]
sheet.append(rowData)
wb.save('gold_quote.xlsx')
If you write with pandas' ExcelWriter instead, attach the loaded workbook and its existing sheets to the writer first, so pandas writes into them rather than replacing them:
writer.book = wb
writer.sheets = dict((ws.title, ws) for ws in wb.worksheets)
I figured it out: first define a reader to load the existing data from the Excel file, then concatenate the recently extracted web data with a defined writer, and drop duplicates (otherwise every run of the program would add duplicated data). Then write the previous and new data back together:
excel_reader = pd.ExcelFile('gold_quote.xlsx')
to_update = {"Sheet1": df}
excel_writer = pd.ExcelWriter('gold_quote.xlsx')
for sheet in excel_reader.sheet_names:
    sheet_df = excel_reader.parse(sheet)
    append_df = to_update.get(sheet)
    if append_df is not None:
        sheet_df = pd.concat([sheet_df, append_df]).drop_duplicates()
    sheet_df.to_excel(excel_writer, sheet, index=False)
excel_writer.save()

Stripping data from two XLSX cells to csv

I'm having an issue where I'm trying to take data from two cells in an Excel spreadsheet and put them into a CSV file. The data is lat and lon coordinates, so they have to be side by side to be read by the program. Here is what I have:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import xlwt
import xlrd
import csv
import os, openpyxl, glob
from openpyxl import Workbook

with open('test.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=',')
    for file in glob.glob("/test"):
        wb = openpyxl.load_workbook('test-data.xlsx')
        ws = wb.active
        def lat():
            for row in ws.iter_rows('Q2:Q65536'):
                for cell in row:
                    lat = cell.value
                    return lat
        def lon():
            for row in ws.iter_rows('R2:R65536'):
                for cell in row:
                    lon = cell.value
                    return lon
        cord = lat() + "," + lon()
        print (lat() + "," + lon())  # just to see if it's working
        # spamwriter.writerow([cord])  uncomment to write to file
However, it only gives me the first row of data, not the rest of the rows (test-data has around 1500 rows). How would I make it finish going through the file?
This may not be the most dynamic way, but I would use pandas for this task. It has built-in pd.read_excel() and pd.to_csv() functions.
import pandas as pd
import string

latColumn = string.lowercase.index('q')   # index that corresponds to the Excel column letter (use lower case)
longColumn = string.lowercase.index('r')  # does not work for AA, BB, ...
data = pd.read_excel('test-data.xlsx', 'Sheet1', parse_cols=[latColumn, longColumn])
# Total number of rows being read in: 65536 - 2 = 65534
csvOut = "foo.csv"
data[:65534].to_csv(csvOut, index=False, header=False)
If you need to append to the file rather than replace it, change data[:65534].to_csv(...) to:
with open(csvOut, 'a') as f:  # append to the .csv file of your liking
    data[:65534].to_csv(f, index=False, header=False)
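For completeness, the original openpyxl approach can also be made to walk every row: the asker's functions return from inside the loop, so they stop after the first cell. A sketch (the helper name is invented; columns Q and R are assumed from the question, i.e. column indices 17 and 18) that iterates both columns in lockstep and stops at the first empty row:

```python
import csv
import openpyxl

def export_coords(xlsx_path, csv_path, min_row=2):
    """Write lat,lon pairs from columns Q and R of a workbook to a CSV."""
    ws = openpyxl.load_workbook(xlsx_path).active
    with open(csv_path, 'w', newline='') as f:
        writer = csv.writer(f)
        # Each yielded row is a (Q-cell, R-cell) tuple.
        for lat, lon in ws.iter_rows(min_row=min_row, min_col=17, max_col=18):
            if lat.value is None:
                break  # stop at the first empty row instead of scanning to 65536
            writer.writerow([lat.value, lon.value])
```

Iterating row by row this way writes every coordinate pair instead of returning after the first one.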

csv Writer using Datafields returned by Pandas

Hello, I'm working on a project that reads an Excel worksheet, collects columns of data based on header title, and then writes that data to a much leaner CSV file, which I'll be using for more fun later.
I'm getting a syntax error while trying to write my new CSV file; I think it has something to do with the datafields I'm using to get my columns in pandas.
I'm new to Python, so any help you can provide would be great. Thanks!
import pandas
import xlrd
import csv

def csv_from_excel():
    wb = xlrd.open_workbook("C:\\Python27\\Work\\spreadsheet.xlsx")
    sh = wb.sheet_by_name('Sheet1')
    spoofingFile = open('spoofing.csv', 'wb')
    wr = csv.writer(spoofingFile, quoting=csv.QUOTE_ALL)
    for rownum in xrange(sh.nrows):
        wr.writerow(sh.row_values(rownum))
    spoofingFile.close()

csv_from_excel()

df = pandas.read_csv('C:\\Python27\\Work\\spoofing.csv')
time = df["InviteTime (Oracle)"]
orignum = df["Orig Number"]
origip = df["Orig IP Address"]
destnum = df["Dest Number"]
sheet0bj = csv.writer(open("complete.csv", "wb")
sheet0bj.writerow([time, orignum, origip, destnum])
The syntax error is thus:
  File "c:\python27\work\formatsheettest.py", line 36
    sheet0bj.writerow([time, orignum, origip, destnum])
                                                      ^
SyntaxError: invalid syntax
You're missing a closing paren on the second to last line.
sheet0bj = csv.writer(open("complete.csv", "wb")
should be
sheet0bj = csv.writer(open("complete.csv", "wb"))
I assume you've figured that out by now, though.
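As a side note, even with the paren fixed, writerow([time, orignum, origip, destnum]) stuffs four whole Series into a single CSV row. If the goal is a leaner file containing those columns, a pandas-only sketch (the helper name is invented; the column names are taken from the question) does it directly:

```python
import pandas as pd

def export_columns(src_csv, dst_csv, columns):
    """Copy only the named columns of a CSV into a leaner CSV."""
    df = pd.read_csv(src_csv)
    df[columns].to_csv(dst_csv, index=False)

# e.g. export_columns('spoofing.csv', 'complete.csv',
#                     ['InviteTime (Oracle)', 'Orig Number',
#                      'Orig IP Address', 'Dest Number'])
```

Here each selected column stays a column in the output, one value per row, rather than a Series dumped into one cell.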
