Delete an entire row in an Excel file using Python

I am currently using work_sheet.delete_rows() from the openpyxl module, but it only seems to delete the values in the row, not the row itself.
How can I delete the entire row?
This is the actual Excel data I am currently accessing:
This is the output after deleting the rows. The process only deletes the values of each row, not the row itself.

I'm guessing there's something wrong with the way you're calling it. While you've stated what you're using, you haven't given us the actual code that you're using.
The source code for openpyxl shows that delete_rows() does move the rows below the deleted area up to the start of that area.
You can see this with the following simple spreadsheet:
  |   A   |   B   |
--+-------+-------+
1 |   1   |   a   |
2 |   3   |   b   |
3 |   5   |   c   |
4 |   7   |   d   |
5 |   9   |   e   |
The following script deletes the second, third, and fourth rows:
import openpyxl

def dumpWs(desc, ws):
    # Print every row of the sheet as <value, value, ...> on one line
    print(f'{desc}: ', end = '')
    for row in ws.values:
        print(end = '<')
        sep = ''
        for value in row:
            print(f'{sep}{value}', end = '')
            sep = ', '
        print(end = '> ')
    print()

wb = openpyxl.load_workbook('version1.xlsx')
ws = wb.active
dumpWs('As loaded ', ws)
ws.delete_rows(2, 3)  # delete 3 rows starting at row 2 (rows 2, 3 and 4)
dumpWs('After delete', ws)
wb.save('version2.xlsx')
Running this script results in:
As loaded : <1, a> <3, b> <5, c> <7, d> <9, e>
After delete: <1, a> <9, e>
The version saved with the wb.save line is as expected:
  |   A   |   B   |
--+-------+-------+
1 |   1   |   a   |
2 |   9   |   e   |
One thing you may want to check is the version of openpyxl you're using, since there have been several bug reports about delete_rows in various releases. I'm using version 3.0.3 (with Python 3.6.8, for completeness).
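If you want to check your own openpyxl install without any files on disk, here is a minimal in-memory sketch. It assumes openpyxl 2.6 or later (for iter_rows(values_only=True)); the helper name rows_after_delete is hypothetical. It rebuilds the five-row sheet above and returns what is left after the deletion:

```python
import openpyxl
from openpyxl import Workbook


def rows_after_delete(idx, amount):
    """Build the 5-row example sheet in memory and delete `amount` rows from row `idx`."""
    wb = Workbook()
    ws = wb.active
    for row in [(1, 'a'), (3, 'b'), (5, 'c'), (7, 'd'), (9, 'e')]:
        ws.append(row)
    ws.delete_rows(idx, amount)  # rows below the deleted block move up
    return list(ws.iter_rows(values_only=True))


print(openpyxl.__version__)
print(rows_after_delete(2, 3))  # [(1, 'a'), (9, 'e')]
```

If your version prints something with empty rows left in the middle, that points at the version rather than your calling code.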

Related

Using data from one column to update another column with pandas in a very large CSV file

I have a very large CSV file in which I generate a URL from the data in one column [c] and write it into the corresponding cell of another column [f]. Although I program a lot in Python, I don't use pandas that often, so I am unsure how to approach this problem.
Column f is the final output: I use the value in column c as the image name, and the rest of the URL is the same for every row.
| c | f |
| ------- | -------------------------- |
| 2134 | http://url.com/2134.jpg |
| 3e32 | http://url.com/3e32.jpg |
| jhknh | http://url.com/jhknh.jpg |
| 12.12.3 | http://url.com/12.12.3.jpg |
I have searched but have not been able to find a solution I can implement. I know I will probably have to use chunksize for this, as there could be upwards of 20,000 records.
Any assistance with this would be greatly appreciated. I have looked and tried a few things but I am unable to come up with a solution.
Thank you in advance
~ E
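Load your CSV file into a dataframe and update column 'f'. Reading column 'c' as strings stops values such as 2134 or 3e32 from being parsed as numbers, which would change them or make the string concatenation fail:
import pandas as pd

df = pd.read_csv('yourdatafile.csv', dtype={'c': str})
df['f'] = 'http://url.com/' + df['c'] + '.jpg'
df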
Output
c f
0 2134 http://url.com/2134.jpg
1 3e32 http://url.com/3e32.jpg
2 jhnkhk http://url.com/jhnkhk.jpg
3 12.12.1 http://url.com/12.12.1.jpg
If your records don't fit in memory, you can process the file in chunks and append every chunk to a new file:
header = True
for chunk in pd.read_csv('yourdatafile.csv', chunksize=1000, dtype={'c': str}):
    chunk['f'] = 'http://url.com/' + chunk['c'] + '.jpg'
    # append to newdata.csv (which should not exist yet); header only for the first chunk
    chunk.to_csv('newdata.csv', mode='a', index=False, header=header)
    header = False
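If pandas is not a requirement, the same update can also be streamed row by row with the standard-library csv module, which keeps memory use flat regardless of file size. This is a swapped-in technique rather than the pandas approach above; the function name add_url_column and its defaults are illustrative assumptions:

```python
import csv
import io


def add_url_column(src, dst, base_url="http://url.com/", key="c", target="f"):
    """Copy CSV rows from src to dst, filling column `target` from column `key`."""
    reader = csv.DictReader(src)
    fieldnames = list(reader.fieldnames)
    if target not in fieldnames:
        fieldnames.append(target)
    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:  # only one row is held in memory at a time
        row[target] = f"{base_url}{row[key]}.jpg"
        writer.writerow(row)


# With real files (hypothetical names), open both with newline='':
# with open('yourdatafile.csv', newline='') as src, \
#         open('newdata.csv', 'w', newline='') as dst:
#     add_url_column(src, dst)
src = io.StringIO("c,f\n2134,\n3e32,\njhknh,\n12.12.3,\n")
dst = io.StringIO()
add_url_column(src, dst)
print(dst.getvalue())
```

Because every value stays a string, names like 3e32 or 12.12.3 are never reinterpreted as numbers.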

Pandas: Why are my headers being inserted into the first row of my dataframe?

I have a script that collates sets of tags from other dataframes, converts them into a comma-separated string, and adds all of this to a new dataframe. If I use pd.read_csv to generate the dataframe, the first entry is what I expect it to be. However, if I use the df_empty function (below), I get a copy of the headers in the first row instead of the data I want. The only change I have made is generating a new dataframe instead of loading one.
The resultData = pd.read_csv() reads a .csv file with the following headers and no additional information:
Sheet, Cause, Initiator, Group, Effects
The df_empty function is as follows:
def df_empty(columns, dtypes, index=None):
    assert len(columns)==len(dtypes)
    df = pd.DataFrame(index=index)
    for c,d in zip(columns, dtypes):
        df[c] = pd.Series(dtype=d)
    return df
# https://stackoverflow.com/a/48374031
# Usage: df = df_empty(['a', 'b'], dtypes=[np.int64, np.int64])
My script contains the following line to create the dataframe:
resultData = df_empty(['Sheet','Cause','Initiator','Group','Effects'],[str,np.int64,str,str,str])
I've also used the following with no differences:
resultData = df_empty(['Sheet','Cause','Initiator','Group','Effects'],['object','int64','object','object','object'])
My script to collate the data and add it to my dataframe is as follows:
data = {'Sheet': sheetNum, 'Cause': causeNum, 'Initiator': initTag, 'Group': grp, 'Effects': effectStr}
count = len(resultData)
resultData.at[count,:] = data
When I run display(data), I get the following in Jupyter:
{'Sheet': '0001',
'Cause': 1,
'Initiator': 'Tag_I1',
'Group': 'DIG',
'Effects': 'Tag_O1, Tag_O2,...'}
What I want to see with both options / what I get when reading the csv:
+-------+-------+-----------+-------+--------------------+
| Sheet | Cause | Initiator | Group | Effects |
+-------+-------+-----------+-------+--------------------+
| 0001 | 1 | Tag_I1 | DIG | Tag_O1, Tag_O2,... |
| 0001 | 2 | Tag_I2 | DIG | Tag_O2, Tag_04,... |
+-------+-------+-----------+-------+--------------------+
What I see when generating a dataframe with df_empty:
+-------+-------+-----------+-------+--------------------+
| Sheet | Cause | Initiator | Group | Effects |
+-------+-------+-----------+-------+--------------------+
| Sheet | Cause | Initiator | Group | Effects |
| 0001 | 2 | Tag_I2 | DIG | Tag_O2, Tag_04,... |
+-------+-------+-----------+-------+--------------------+
Any ideas on what might be causing the generated dataframe to copy my headers into the first row and if it possible for me to not have to read an otherwise empty csv?
Thanks!
Why? Because you've inserted the first row as data. The magic behaviour of using the first row as the header lives in read_csv(); if you create your dataframe without read_csv(), the first row is not treated specially.
Solution? Skip the first row when inserting into the dataframe generated by df_empty.
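A different approach, not the one described above, is to avoid row-by-row assignment into an empty frame altogether: collect each record as a dict in a list and build the dataframe once at the end. A minimal sketch, using the column names from the question; the helper name build_result is hypothetical:

```python
import pandas as pd

COLUMNS = ['Sheet', 'Cause', 'Initiator', 'Group', 'Effects']


def build_result(records):
    """Build the result dataframe from a list of dicts keyed by COLUMNS."""
    # columns= fixes the column order and still gives the right columns
    # when records is empty
    df = pd.DataFrame(records, columns=COLUMNS)
    return df.astype({'Cause': 'int64'})


records = []
records.append({'Sheet': '0001', 'Cause': 1, 'Initiator': 'Tag_I1',
                'Group': 'DIG', 'Effects': 'Tag_O1, Tag_O2'})
records.append({'Sheet': '0001', 'Cause': 2, 'Initiator': 'Tag_I2',
                'Group': 'DIG', 'Effects': 'Tag_O2, Tag_04'})
resultData = build_result(records)
print(resultData)
```

Building the frame once is also faster than enlarging it one row at a time when there are many records.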

How to generate table using Python

I am really struggling with this: I tried many libraries to print a table without success, so I thought I would post here and ask.
My data is in a text file (resource.txt) which looks like this (the exact same way it prints)
pipelined 8 8 0 17 0 0
nonpipelined 2 2 0 10 0 0
I want my data printed in the following format:
Design name   LUT  Lut as m  Lut as I  FF  DSP  BRAM
----------------------------------------------------
pipelined     8    8         0         17  0    0
nonpipelined  2    2         0         10  0    0
Sometimes there may be more lines: the columns stay the same, but the number of rows may increase.
(I am using Python 2.7.)
The rest of my Python code works, but I can't print the data I extracted to the text file in tabular form. I can't use the pandas library because it doesn't support Python 2.7, but I can use tabulate and similar libraries. Can anyone please help me?
I tried tabulate and others, but I keep getting errors.
As a last resort I tried a simple loop to print the file, but it doesn't work (the same code works if I put it at the top of my script, but not at the end). Does anyone have any idea why?
q11 = open("resource.txt", "r")
for line in q11:
    print(line)
Here's a self contained function that makes a left-justified, technical paper styled table.
def makeTable(headerRow,columnizedData,columnSpacing=2):
    """Creates a technical paper style, left justified table
    Author: Christopher Collett
    Date: 6/1/2019"""
    # Use plain Python strings: a NumPy str array has a fixed width and
    # silently truncates cells once they are padded.
    cols = [[str(v) for v in col] for col in columnizedData]
    # Each column is as wide as its header or its longest entry
    colSizes = [max([len(headerRow[i])] + [len(v) for v in cols[i]])
                for i in range(len(headerRow))]
    gap = ' '*columnSpacing
    header = gap.join(h.ljust(w) for h, w in zip(headerRow, colSizes))
    rows = [gap.join(col[j].ljust(w) for col, w in zip(cols, colSizes))
            for j in range(len(cols[0]))]
    line = '-'*len(header)
    print(line)
    print(header)
    print(line)
    for row in rows:
        print(row)
    print(line)
And here's an example using this function.
>>> header = ['Name','Age']
>>> names = ['George','Alberta','Frank']
>>> ages = [8,9,11]
>>> makeTable(header,[names,ages])
------------
Name     Age
------------
George   8
Alberta  9
Frank    11
------------
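Applied to the question's resource.txt, the same idea can be written without NumPy or a third-party package. A sketch assuming the file is whitespace-separated with one design per line; the header names are copied from the question and the function name format_resource_table is hypothetical:

```python
HEADERS = ["Design name", "LUT", "Lut as m", "Lut as I", "FF", "DSP", "BRAM"]


def format_resource_table(lines, headers=HEADERS, spacing=2):
    """Return the rows of a whitespace-separated file as an aligned text table."""
    rows = [line.split() for line in lines if line.strip()]  # skip blank lines
    # each column is as wide as its header or its longest value
    widths = [len(h) for h in headers]
    for row in rows:
        for i, cell in enumerate(row[:len(headers)]):
            widths[i] = max(widths[i], len(cell))
    gap = " " * spacing

    def fmt(cells):
        return gap.join(c.ljust(w) for c, w in zip(cells, widths)).rstrip()

    rule = "-" * (sum(widths) + spacing * (len(widths) - 1))
    out = [fmt(headers), rule]
    out.extend(fmt(row) for row in rows)
    return "\n".join(out)


# Real use (any number of rows):
# with open("resource.txt") as f:
#     print(format_resource_table(f))
sample = ["pipelined 8 8 0 17 0 0\n", "nonpipelined 2 2 0 10 0 0\n"]
print(format_resource_table(sample))
```

The code only uses str.split, str.ljust and str.join, so it should also run unchanged under Python 2.7.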
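Since the number of columns stays the same, you could simply print the header line with enough spaces between the names, for example:
from __future__ import print_function  # lets print() take sep= in Python 2.7
import csv

print("Design name", "LUT", "Lut as m", "Lut as I", "FF", "DSP", "BRAM", sep='   ')
Then read the data file and print each row the same way:
with open('resource.txt', 'r') as datafile:
    # resource.txt is space-separated, so split on spaces instead of commas
    reader = csv.reader(datafile, delimiter=' ', skipinitialspace=True)
    for row in reader:
        print(*row, sep='   ')  # prints every column, however many there are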
This is not an optimized solution, but since it looks like you are new, it should help you understand the idea. Alternatively, you can use a row_format string with str.format, which also works in Python 2.7.
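The row_format idea usually means building one format string and reusing it for every row. A sketch with hand-picked widths (14 for the name column and 10 for the rest are assumptions that fit the sample data); str.format with automatic field numbering works in both Python 2.7 and 3:

```python
headers = ["Design name", "LUT", "Lut as m", "Lut as I", "FF", "DSP", "BRAM"]
# In practice: rows = [line.split() for line in open("resource.txt") if line.strip()]
rows = [
    ["pipelined", "8", "8", "0", "17", "0", "0"],
    ["nonpipelined", "2", "2", "0", "10", "0", "0"],
]

# first column left-aligned in 14 characters, the remaining six
# columns right-aligned in 10 characters each
row_format = "{:<14}" + "{:>10}" * (len(headers) - 1)

print(row_format.format(*headers))
for row in rows:
    print(row_format.format(*row))
```

Fixed widths are simple but will break alignment if a value outgrows its column; computing the widths from the data avoids that.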
Here is code that prints your data as a nicely bordered table with the beautifultable package. Put each row of your data into a list and append it, or read each line of the text file into a list and append that:
from beautifultable import BeautifulTable
h0=["jkgjkg"]
h1=[2,3]
h2=[2,3]
h3=[2,3]
h4=[2,3]
h5=[2,3]
h0.append("FPGA resources")
table = BeautifulTable()
table.column_headers = h0
table.append_row(h1)
table.append_row(h2)
table.append_row(h3)
table.append_row(h4)
table.append_row(h5)
print(table)
Output:
+--------+----------------+
| jkgjkg | FPGA resources |
+--------+----------------+
| 2 | 3 |
+--------+----------------+
| 2 | 3 |
+--------+----------------+
| 2 | 3 |
+--------+----------------+
| 2 | 3 |
+--------+----------------+
| 2 | 3 |
+--------+----------------+

Making Table without using Texttable

I am writing Python code to show the items in a store. As I am still learning, I want to know how to make a table that looks exactly like one made with Texttable.
My code is
Goods = ['Book','Gold']
Itemid= [711001,711002]
Price= [200,50000]
Count= [100,2]
Category= ['Books','Jewelry']
titles = ['', 'Item Id', 'Price', 'Count','Category']
data = [titles] + list(zip(Goods, Itemid, Price, Count, Category))
for i, d in enumerate(data):
    line = '|'.join(str(x).ljust(12) for x in d)
    print(line)
    if i == 0:
        print('=' * len(line))
My Output:
            |Item Id     |Price       |Count       |Category
================================================================
Book        |711001      |200         |100         |Books
Gold        |711002      |50000       |2           |Jewelry
Output I want:
+------+---------+-------+-------+-----------+
| | Item Id | Price | Count | Category |
+======+=========+=======+=======+===========+
| Book | 711001 | 200 | 100 | Books |
+------+---------+-------+-------+-----------+
| Gold | 711002 | 50000 | 2 | Jewelry |
+------+---------+-------+-------+-----------+
Your code builds the output by hand using str.join(). You can do it that way, but it is very tedious. Use string formatting instead.
To help you along here is one line:
content_format = "| {Goods:4.4s} | {ItemId:<7d} | {Price:<5d} | {Count:<5d} | {Category:9s} |"
output_line = content_format.format(Goods="Book",ItemId=711001,Price=200,Count=100,Category="Books")
Texttable adjusts its cell widths to fit the data. If you want to do the same, then you will have to put computed field widths in content_format instead of using numeric literals the way I have done in the example above. Again, here is one example to get you going:
content_format = "| {Goods:4.4s} | {ItemId:<7d} | {Price:<5d} | {Count:<5d} | {Category:{CategoryWidth}s} |"
output_line = content_format.format(Goods="Book",ItemId=711001,Price=200,Count=100,Category="Books",CategoryWidth=9)
But if you already know how to do this using Texttable, why not use that? Your comment says it's not available in Python: not true, I just downloaded version 0.9.0 using pip.
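Putting the computed-width idea together, here is a sketch of a function that draws Texttable-style borders using only str.join and str.ljust. It is not Texttable itself, and its column widths may differ slightly from Texttable's output; the function name texttable_style is hypothetical:

```python
def texttable_style(rows):
    """Draw rows (header first) with Texttable-like borders; widths fit the data."""
    cells = [[str(v) for v in row] for row in rows]
    # width of each column = length of its longest cell
    widths = [max(len(row[i]) for row in cells) for i in range(len(cells[0]))]

    def border(ch):
        return "+" + "+".join(ch * (w + 2) for w in widths) + "+"

    def line(row):
        return "|" + "|".join(" " + c.ljust(w) + " " for c, w in zip(row, widths)) + "|"

    out = [border("-"), line(cells[0]), border("=")]
    for row in cells[1:]:
        out.append(line(row))
        out.append(border("-"))
    return "\n".join(out)


Goods = ['Book', 'Gold']
Itemid = [711001, 711002]
Price = [200, 50000]
Count = [100, 2]
Category = ['Books', 'Jewelry']
titles = ['', 'Item Id', 'Price', 'Count', 'Category']
data = [titles] + list(zip(Goods, Itemid, Price, Count, Category))
print(texttable_style(data))
```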

Python Output Length

I'm attempting to output my database table data, which works except for long rows. Each column needs to be as wide as its longest value. I'm having trouble calculating the widths so that the table is printed proportionally instead of as a huge mess when long values are output (without using a third-party library; see e.g. "Print results in MySQL format with Python"). Please let me know if you need more information.
Database connection:
import sqlite3

connection = sqlite3.connect("test_.db")
c = connection.cursor()
c.execute("SELECT * FROM MyTable")
results = c.fetchall()
formatResults(results)
Table formatting:
def formatResults(x):
    try:
        widths = []
        columns = []
        tavnit = '|'
        separator = '+'
        for cd in c.description:
            widths.append(max(cd[2], len(cd[0])))
            columns.append(cd[0])
        for w in widths:
            tavnit += " %-"+"%ss |" % (w,)
            separator += '-'*w + '--+'
        print(separator)
        print(tavnit % tuple(columns))
        print(separator)
        for row in x:
            print(tavnit % row)
        print(separator)
        print ""
    except:
        showMainMenu()
        pass
Output problem example:
+------+------+---------+
| Date | Name | LinkOrFile |
+------+------+---------+
| 03-17-2016 | hi.com | Locky |
| 03-18-2016 | thisisitsqq.com | None |
| 03-19-2016 | http://ohiyoungbuyff.com\69.exe?1 | None |
| 03-20-2016 | http://thisisitsqq..com\69.exe?1 | None |
| 03-21-2016 | %Temp%\zgHRNzy\69.exe | None |
| 03-22-2016 | | None |
| 03-23-2016 | E52219D0DA33FDD856B2433D79D71AD6 | Downloader |
| 03-24-2016 | microsoft.com | None |
| 03-25-2016 | 89.248.166.132 | None |
| 03-26-2016 | http://89.248.166.131/55KB5js9dwPtx4= | None |
If your main problem is making column widths consistent across all the lines, this python package could do the job: https://pypi.python.org/pypi/tabulate
Below you find a very simple example of a possible formatting approach.
The key point is to find the maximum length of each column and then use the format method of the string object:
#!/usr/bin/python
import random
import string

def randomString(minLen = 1, maxLen = 10):
    """ Random string of length between minLen and maxLen """
    l = random.randint(minLen, maxLen)
    return ''.join(random.choice(string.ascii_uppercase + string.digits) for _ in range(l))

COLUMNS = 4

def randomTable():
    table = []
    for i in range(10):
        table.append( [randomString() for j in range(COLUMNS)] )
    return table

def findMaxColumnLengs(table):
    """ Returns list of max column lengths """
    maxLens = [0] * COLUMNS
    for l in table:
        lens = [len(s) for s in l]
        maxLens = [max(maxLens[e[0]], e[1]) for e in enumerate(lens)]
    return maxLens

if __name__ == '__main__':
    ll = randomTable()
    ml = findMaxColumnLengs(ll)
    # one left-aligned format spec per column, see the str.format docs
    formatStrings = ["{:<%s}" % str(m) for m in ml ]
    fmtStr = "|".join(formatStrings)
    # print(x) with a single argument works in both Python 2 and 3
    print("==================================")
    for l in ll:
        print(l)
    print("==================================")
    for l in ll:
        print(fmtStr.format(*l))
This prints the initial table packed in the list of lists and the formatted output.
==================================
['2U7Q', 'DZK8Z5XT', '7ZI0W', 'A9SH3V3U']
['P7SOY3RSZ1', 'X', 'Z2W', 'KF6']
['NO8IEY9A', '4FVGQHG', 'UGMJ', 'TT02X']
['9S43YM', 'JCUT0', 'W', 'KB']
['P43T', 'QG', '0VT9OZ0W', 'PF91F']
['2TEQG0H6A6', 'A4A', '4NZERXV', '6KMV22WVP0']
['JXOT', 'AK7', 'FNKUEL', 'P59DKB8']
['BTHJ', 'XVLZZ1Q3H', 'NQM16', 'IZBAF']
['G0EF21S', 'A0G', '8K9', 'RGOJJYH2P9']
['IJ', 'SRKL8TXXI', 'R', 'PSUZRR4LR']
==================================
2U7Q      |DZK8Z5XT |7ZI0W   |A9SH3V3U
P7SOY3RSZ1|X        |Z2W     |KF6
NO8IEY9A  |4FVGQHG  |UGMJ    |TT02X
9S43YM    |JCUT0    |W       |KB
P43T      |QG       |0VT9OZ0W|PF91F
2TEQG0H6A6|A4A      |4NZERXV |6KMV22WVP0
JXOT      |AK7      |FNKUEL  |P59DKB8
BTHJ      |XVLZZ1Q3H|NQM16   |IZBAF
G0EF21S   |A0G      |8K9     |RGOJJYH2P9
IJ        |SRKL8TXXI|R       |PSUZRR4LR
The code that you used is for MySQL. The critical part is the line widths.append(max(cd[2], len(cd[0]))) where cd[2] gives the length of the longest data in that column. This works for MySQLdb.
However, you are using sqlite3, for which the value cd[2] is set to None:
https://docs.python.org/2/library/sqlite3.html#sqlite3.Cursor.description
Thus, you will need to replace the following logic:
for cd in c.description:
    widths.append(max(cd[2], len(cd[0])))
    columns.append(cd[0])
with your own. The rest of the code should be fine as long as widths is computed correctly.
The easiest way to get the widths variable right is to traverse each row of the result, find the maximum width of each column, and store it in widths. Here is a sketch:
for cd in c.description:
    columns.append(cd[0])  # Get column headers
# Start each width at the length of its column header
widths = [len(name) for name in columns]
for row in x:
    for i in range(len(row)):  # row is a tuple with one entry per column
        v = row[i]  # Take value of ith column
        widths[i] = max(len(str(v)), widths[i])  # str() also handles ints and None
At the end of this, widths should contain the maximum length of each column.
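Combining the two parts, here is a sketch of the whole function that computes the widths from the data instead of cursor.description[2]. It assumes the cursor passed in has just executed the SELECT; the name format_results and the in-memory demo table are assumptions for illustration:

```python
import sqlite3


def format_results(cursor, rows):
    """Return rows as a MySQL-style text table sized to the data."""
    columns = [cd[0] for cd in cursor.description]  # column names
    # start from the header lengths, then widen to fit every value
    widths = [len(name) for name in columns]
    for row in rows:
        for i, value in enumerate(row):
            widths[i] = max(widths[i], len(str(value)))
    tavnit = '|' + ''.join(' %%-%ds |' % w for w in widths)  # e.g. "| %-10s |"
    separator = '+' + ''.join('-' * w + '--+' for w in widths)
    lines = [separator, tavnit % tuple(columns), separator]
    lines.extend(tavnit % tuple(row) for row in rows)
    lines.append(separator)
    return '\n'.join(lines)


connection = sqlite3.connect(':memory:')
c = connection.cursor()
c.execute("CREATE TABLE MyTable (Date TEXT, Name TEXT, LinkOrFile TEXT)")
c.executemany("INSERT INTO MyTable VALUES (?, ?, ?)",
              [("03-17-2016", "hi.com", "Locky"),
               ("03-18-2016", "thisisitsqq.com", None)])
c.execute("SELECT * FROM MyTable")
print(format_results(c, c.fetchall()))
```

Returning a string instead of printing makes the function easy to test, and the bare except from the question is dropped so that errors are not silently hidden.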
