How to generate a table using Python

I have been struggling with this: I have tried many libraries to print a table, with no success, so I thought I would post here and ask.
My data is in a text file (resource.txt) which looks like this (it prints exactly this way):
pipelined 8 8 0 17 0 0
nonpipelined 2 2 0 10 0 0
I want my data printed in the following manner:
Design name    LUT   Lut as m   Lut as I   FF   DSP   BRAM
-----------------------------------------------------------
pipelined      8     8          0          17   0     0
Non pipelined  2     2          0          10   0     0
Sometimes there may be more lines of data; the columns remain the same, but the number of rows may increase.
(I have Python 2.7.)
I am using this part in my Python code. Everything else works, but I couldn't print the data I extracted to the text file in tabular form. I can't use the pandas library, as it doesn't support Python 2.7, but I can use tabulate and similar libraries. Can anyone please help me?
I tried using tabulate and others, but I keep getting errors.
At the end I tried a simple method to print, but it's not working (the same code works if I put it at the top of my code, but at the end it won't). Does anyone have any idea?
q11 = open("resource.txt", "r")
for line in q11:
    print(line)

Here's a self-contained function that makes a left-justified, technical-paper-style table.
def makeTable(headerRow, columnizedData, columnSpacing=2):
    """Creates a technical paper style, left justified table

    Author: Christopher Collett
    Date: 6/1/2019"""
    from numpy import array, max, vectorize

    cols = array(columnizedData, dtype=str)
    colSizes = [max(vectorize(len)(col)) for col in cols]

    header = ''
    rows = ['' for i in cols[0]]

    for i in range(0, len(headerRow)):
        if len(headerRow[i]) > colSizes[i]: colSizes[i] = len(headerRow[i])
        headerRow[i] += ' ' * (colSizes[i] - len(headerRow[i]))
        header += headerRow[i]
        if not i == len(headerRow) - 1: header += ' ' * columnSpacing

        for j in range(0, len(cols[i])):
            if len(cols[i][j]) < colSizes[i]:
                cols[i][j] += ' ' * (colSizes[i] - len(cols[i][j]))
            rows[j] += cols[i][j]
            if not i == len(headerRow) - 1: rows[j] += ' ' * columnSpacing

    line = '-' * len(header)
    print(line)
    print(header)
    print(line)
    for row in rows: print(row)
    print(line)
And here's an example using this function.
>>> header = ['Name','Age']
>>> names = ['George','Alberta','Frank']
>>> ages = [8,9,11]
>>> makeTable(header,[names,ages])
------------
Name     Age
------------
George   8
Alberta  9
Frank    11
------------
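For the question's resource.txt specifically, a minimal sketch (assuming the file is whitespace-separated exactly as shown, with the seven headers hard-coded) could be:

# Sketch: read resource.txt and feed it to makeTable.
header = ['Design name', 'LUT', 'Lut as m', 'Lut as I', 'FF', 'DSP', 'BRAM']
with open('resource.txt') as f:
    rows = [line.split() for line in f if line.strip()]
# makeTable expects column-wise data, so transpose the row-wise file
columns = [list(col) for col in zip(*rows)]
makeTable(header, columns)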

Since the number of columns remains the same, you could just print out the first line with ample spaces as required. For example:
print("Design name", '  ', "LUT", '  ', "Lut as m", '  ')  # ...and continue like that
Then read the CSV file:
import csv

datafile = open('resource.csv', 'r')
reader = csv.reader(datafile)
for col in reader:
    print(col[0], '  ', col[1], '  ', col[2])  # ...and continue depending on the number of columns
This is not the optimized solution, but since it looks like you are new, it will help you understand better. Alternatively, you can use row_format print formatting in Python 2.7.
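The row_format option mentioned above might look something like this (a sketch; the widths are guesses to be tuned to your data):

import csv

# One format string reused for the header line and every data row.
row_format = "{:<14}{:>5}{:>10}{:>10}{:>5}{:>5}{:>6}"
print(row_format.format("Design name", "LUT", "Lut as m", "Lut as I", "FF", "DSP", "BRAM"))
with open('resource.csv', 'r') as datafile:
    for col in csv.reader(datafile):
        print(row_format.format(*col))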

Here is code to print a nice table. You transfer all your data to lists and append each list as a row, or you can transfer each line of the text file into one list and print it:
from beautifultable import BeautifulTable
h0=["jkgjkg"]
h1=[2,3]
h2=[2,3]
h3=[2,3]
h4=[2,3]
h5=[2,3]
h0.append("FPGA resources")
table = BeautifulTable()
table.column_headers = h0
table.append_row(h1)
table.append_row(h2)
table.append_row(h3)
table.append_row(h4)
table.append_row(h5)
print(table)
Output:
+--------+----------------+
| jkgjkg | FPGA resources |
+--------+----------------+
| 2 | 3 |
+--------+----------------+
| 2 | 3 |
+--------+----------------+
| 2 | 3 |
+--------+----------------+
| 2 | 3 |
+--------+----------------+
| 2 | 3 |
+--------+----------------+
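To load the question's resource.txt instead of the dummy h1..h5 rows, a sketch along these lines should work (it assumes whitespace-separated lines and uses the same older BeautifulTable API as above):

from beautifultable import BeautifulTable

table = BeautifulTable()
table.column_headers = ["Design name", "LUT", "Lut as m",
                        "Lut as I", "FF", "DSP", "BRAM"]
with open("resource.txt") as f:
    for line in f:
        if line.strip():                  # skip blank lines
            table.append_row(line.split())
print(table)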

Related

Importing CSV formatted for Excel into a dataframe

I am receiving datafiles from 2 different people and the files are coming through with different formats despite both users using the same system and the same browser.
I would like to be able to make my code smart enough to read either format but so far I have been unsuccessful.
The data coming through that I am having issues with looks like this:
+----------------+---------------+--------------+
| Customer Name | Customer code | File Ref |
+----------------+---------------+--------------+
| ACCOUNT SET UP | ="35" | R2I0025715 |
+----------------+---------------+--------------+
| Xenox | ="4298" | ="913500999" |
+----------------+---------------+--------------+
and the data that imports cleanly looks like this:
+----------------+---------------+------------+
| Customer Name | Customer code | File Ref |
+----------------+---------------+------------+
| ACCOUNT SET UP | 35 | R2I0025715 |
+----------------+---------------+------------+
| Xenox | 4298 | 913500999 |
+----------------+---------------+------------+
I am trying to import the data with the following code: pd.read_csv(f, encoding='utf-8', dtype={"Customer Name": "string", "Customer code": "string", "File Ref": "string"})
A workaround I am using is opening each CSV in Excel and saving it. But when this involves hundreds of files, it isn't really a workaround.
Can anyone help?
You could use the standard strip() function to remove leading and trailing = and " characters on all of your columns.
For example:
import pandas as pd
data = {
'Customer Name' : ['ACCOUNT SET UP', 'Xenox', 'ACCOUNT SET UP', 'Xenox'],
'Customer Code': ['="35"', '="4298"', '35', '4298'],
'File Ref': ['R2I0025715', '="913500999"', 'R2I0025715', '913500999']
}
df = pd.DataFrame(data)
for col in df.columns:
    df[col] = df[col].str.strip('="')
print(df)
Giving you:
    Customer Name Customer Code    File Ref
0  ACCOUNT SET UP            35  R2I0025715
1           Xenox          4298   913500999
2  ACCOUNT SET UP            35  R2I0025715
3           Xenox          4298   913500999
If you just want to apply it to specific columns, use:
for col in ['Customer Code', 'File Ref']:
    df[col] = df[col].str.strip('="')
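Alternatively, the stripping can be done at load time via read_csv's converters parameter (a sketch, using the column names from the question's read_csv call; 'yourfile.csv' is a placeholder):

import pandas as pd

strip_eq = lambda s: s.strip('="')  # drop leading/trailing '=' and '"' characters
df = pd.read_csv('yourfile.csv', encoding='utf-8',
                 converters={'Customer Name': strip_eq,
                             'Customer code': strip_eq,
                             'File Ref': strip_eq})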
My Solution:
import re
import pandas as pd

def removechar(x):
    x = str(x)
    out = re.sub('="', '', x)
    return out

def removechar2(x):
    x = str(x)
    out = re.sub('"', '', x)
    out = int(out)  # could use float(), depends on what you want
    return out

# then use applymap from pandas
Example:
datas = {'feature1': ['="23"', '="24"', '="23"', '="83"'], 'feature2': ['="23"', '="2"', '="3"', '="23"']}
test = pd.DataFrame(datas) # Example dataframe
test
Out[1]:
  feature1 feature2
0    ="23"    ="23"
1    ="24"     ="2"
2    ="23"     ="3"
3    ="83"    ="23"
#applymap my functions
test = test.applymap(removechar)
test = test.applymap(removechar2)
test
Out[2]:
   feature1  feature2
0        23        23
1        24         2
2        23         3
3        83        23
#fixed
Note that you could probably do it with just one applymap call and one function running re.sub; try reading the documentation for re.sub. This was something quick I whipped up.
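For instance, the one-liner alluded to above might look like this (a sketch; it strips every '=' and '"' character and casts to int in one pass):

# Single applymap: drop all '=' and '"' characters, then convert to int
test = test.applymap(lambda x: int(re.sub(r'[="]', '', str(x))))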

using data from one column as part of an update for another column using Pandas from a very large csv file

I have a very large CSV file, where I generate a URL by using data from one column [c] and update the corresponding column cell [f] with the new information. Although I program a lot in Python, I don't use Pandas that often, so I am unsure as to where to handle this problem.
f is the final output: I am using the c column as the image name, and the rest of the URL is the same.
| c | f |
| ------- | -------------------------- |
| 2134 | http://url.com/2134.jpg |
| 3e32 | http://url.com/3e32.jpg |
| jhknh | http://url.com/jhknh.jpg |
| 12.12.3 | http://url.com/12.12.3.jpg |
I have searched but I have not been able to find an implementable solution. I know, I probably would have to use chunksize for this, as there could be upward of 20000 records.
Any assistance with this would be greatly appreciated. I have looked and tried a few things but I am unable to come up with a solution.
Thank you in advance
~ E
Load your CSV file into a dataframe and update column 'f'
df = pd.read_csv('yourdatafile.csv')
df['f'] = 'http://url.com/' + df.c + '.jpg'
df
Output
         c                           f
0     2134     http://url.com/2134.jpg
1     3e32     http://url.com/3e32.jpg
2   jhnkhk   http://url.com/jhnkhk.jpg
3  12.12.1  http://url.com/12.12.1.jpg
If your records don't fit in memory you can chunk your data and append every chunk to a new file.
header = True
for chunk in pd.read_csv('yourdatafile.csv', chunksize=1000):
    chunk['f'] = 'http://newurl.com/' + chunk.c + '.jpg'
    chunk.to_csv('newdata.csv', mode='a+', index=False, header=header)
    header = False

Delete an entire row in an excel file using Python

I am currently using work_sheet.delete_rows() from the openpyxl module, but it only seems to delete the values in the row, not the actual row itself.
How can I delete the entire row?
This is the actual Excel data I am currently accessing, and this is the output after deleting the rows: the process only deletes the values of each row, not the entire row itself.
I'm guessing there's something wrong with the way you're calling it. While you've stated what you're using, you haven't given us the actual code that you're using.
The actual source code for OpenPyXl shows that it does actually move the rows below the area being deleted up to the start of that area.
You can actually see this by using the following simple spreadsheet:
    A       B
--+-------+-------+
1 | 1     | a     |
2 | 3     | b     |
3 | 5     | c     |
4 | 7     | d     |
5 | 9     | e     |
The following script shows the deletion of the second, third, and fourth row:
import openpyxl

def dumpWs(desc, ws):
    print(f'{desc}: ', end='')
    for row in ws.values:
        print(end='<')
        sep = ''
        for value in row:
            print(f'{sep}{value}', end='')
            sep = ', '
        print(end='> ')
    print()

wb = openpyxl.load_workbook('version1.xlsx')
ws = wb.active
dumpWs('As loaded   ', ws)
ws.delete_rows(2, 3)
dumpWs('After delete', ws)
wb.save('version2.xlsx')
Running this script results in:
As loaded   : <1, a> <3, b> <5, c> <7, d> <9, e>
After delete: <1, a> <9, e>
The version saved with the wb.save line is as expected:
    A       B
--+-------+-------+
1 | 1     | a     |
2 | 9     | e     |
One thing you may want to check is the version of OpenPyXl that you're using, since there are several bug reports on various iterations of delete_rows. I'm using version 3.0.3 (with Python 3.6.8, for completeness).
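For reference, checking the installed version is a quick one-liner:

import openpyxl
print(openpyxl.__version__)  # e.g. '3.0.3'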

Finding the row number for the header row in a CSV file / Pandas Dataframe

I am trying to get an index or row number for the row that holds the headers in my CSV file.
The issue is, the header row can move up and down depending on the output of the report from our system (I have no control over this).
code:
ht = pd.read_csv(file.csv)
test = ht.get_loc('Code')  # 'Code' being the header I'm using to locate the header row
csv1 = read_csv(file.csv, header=test)
df1 = df1.append(csv1)  # appending, as I have many files
If I were to print test, I would expect a number around 4 or 5, and that's what I am feeding into the second read_csv.
The error I'm getting is that it's expecting 1 header column, but I have 26 columns. I am just trying to use the first header string to get the row number.
Thanks
:-)
Edit:
CSV format
This file contains the data around the volume of items blablalbla
the deadlines for delivery of items a - z is 5 days
the deadlines for delivery of items aa through zz are 3 days
the deadlines for delivery of items aaa through zzz are 1 days
code,type,arrived_date,est_del_date
a/wrwgwr12/001,kids,12-dec-18,17-dec-18
aa/gjghgj35/030,pet,15-dec-18,18-dec-18
As you will see, the "the deadlines" rows are similar; there can be 3 or 5 of them based on the code IDs, thus the header row can move up or down.
I also did not write out all 26 column headers; not sure that matters.
Wanted DF format
index | code | type | arrived_date | est_del_date
1 | a/wrwgwr12/001 | kids | 12-dec-18 | 17-dec-18
2 | aa/gjghgj35/030 | Pet | 15-dec-18 | 18-dec-18
Hope this makes sense..
Thanks,
You can use the csv module to find the first row which contains a delimiter, then feed the index of this row as the skiprows parameter to pd.read_csv:
from io import StringIO
import csv
import pandas as pd
x = """This file contains the data around the volume of items blablalbla
the deadlines for delivery of items a - z is 5 days
the deadlines for delivery of items aa through zz are 3 days
the deadlines for delivery of items aaa through zzz are 1 days
code,type,arrived_date,est_del_date
a/wrwgwr12/001,kids,12-dec-18,17-dec-18
aa/gjghgj35/030,pet,15-dec-18,18-dec-18"""
# replace StringIO(x) with open('file.csv', 'r')
with StringIO(x) as fin:
    reader = csv.reader(fin)
    idx = next(idx for idx, row in enumerate(reader) if len(row) > 1)  # 4

# replace StringIO(x) with 'file.csv'
df = pd.read_csv(StringIO(x), skiprows=idx)
print(df)
              code type arrived_date est_del_date
0   a/wrwgwr12/001 kids    12-dec-18    17-dec-18
1  aa/gjghgj35/030  pet    15-dec-18    18-dec-18

Returning Max value grouping by N attributes

I am coming from a Java background and learning Python by applying it in my work environment whenever possible. I have a piece of functioning code that I would really like to improve.
Essentially I have a list of namedtuples with 3 numerical values and 1 time value.
from collections import namedtuple

complete = []
uniqueComplete = set()
screenedPartitions = namedtuple('screenedPartitions',
                                ['feedID', 'partition', 'date', 'screeeningMode'])
I parse a log and after this is populated, I want to create a reduced set that is essentially the most recently dated member where feedID, partition and screeningMode are identical. So far I can only get it out by using a nasty nested loop.
for a in complete:
    max = a
    for b in complete:
        if a.feedID == b.feedID and a.partition == b.partition and \
           a.screeeningMode == b.screeeningMode and a.date < b.date:
            max = b
    uniqueComplete.add(max)
Could anyone give me advice on how to improve this? It would be great to work it out with what's available in the stdlib, as I guess my main task here is to get myself thinking about it with the map/filter functionality.
The data looks akin to
FeedID | Partition | Date             | ScreeningMode
    68 |         5 | 10/04/2017 12:40 | EPEP
   164 |         1 | 09/04/2017 19:53 | ISCION
   164 |         1 | 09/04/2017 20:50 | ISCION
   180 |         1 | 10/04/2017 06:11 | ISAN
   128 |         1 | 09/04/2017 21:16 | ESAN
So after the code is run, line 2 would be removed, as line 3 is a more recent version.
TL;DR: what would this SQL be in Python?
SELECT feedID, partition, screeeningMode, max(date)
FROM Complete
GROUP BY feedID, partition, screeeningMode
Try something like this:
import pandas as pd

df = pd.DataFrame(complete, columns=screenedPartitions._fields)
df = df.groupby(['feedID', 'partition', 'screeeningMode']).max()
It really depends on how your date is represented, but if you provide data I think we can work something out.
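Since the question mentioned the stdlib: the same SELECT ... max(date) ... GROUP BY can be done in one pass with a plain dict keyed on the three grouping fields (a sketch; it assumes date is stored as something comparable such as a datetime, not a 'dd/mm/yyyy hh:mm' string):

# Keep, for each (feedID, partition, screeeningMode) key, the most recent row.
latest = {}
for p in complete:
    key = (p.feedID, p.partition, p.screeeningMode)
    if key not in latest or p.date > latest[key].date:
        latest[key] = p
uniqueComplete = set(latest.values())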
