Save Pandas describe for human readability [duplicate] - python

This question already has answers here:
How to save a pandas DataFrame table as a png
(13 answers)
Closed 5 years ago.
I am working with the pandas describe function. The code is simple enough:
df['Revenue'].describe()
The output is:
Perfect. My issue is that I want to save this data as either a png or a table so that I can place it on a single page. This is for my EDA (exploratory data analysis): I have 6 main charts or pieces of information that I want to evaluate for each feature. Each chart will be a separate png file, and I will then combine them into one pdf file. I iterate over 300+ features, so doing one at a time is not an option, especially since this is done monthly.
If you know of a way to save this table as a png or another similar file format, that would be great. Thanks for taking a look.

Saving as a csv or xlsx file
You may use the to_csv("filename.csv") or to_excel("filename.xlsx") methods to save the result as a comma-separated file or an Excel workbook, and then manipulate/format it in Excel however you want.
Example:
df['Revenue'].describe().to_csv("my_description.csv")
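Since the question mentions 300+ features, it may be easier to write one summary table for all of them instead of one file per column. This is only a sketch: the DataFrame below and the file name all_descriptions.csv are made-up stand-ins for the real data.

```python
import pandas as pd

# Hypothetical stand-in for the real data; the column names are assumptions.
df = pd.DataFrame({
    "Revenue": [10.0, 20.0, 30.0, 40.0],
    "Cost": [1.0, 2.0, 3.0, 4.0],
})

# describe() on the whole frame summarises every numeric column at once.
# Transposing gives one row per feature, which is easier to read
# when there are hundreds of columns.
summary = df.describe().T
summary.to_csv("all_descriptions.csv")
```

Each row of the resulting file is one feature and each column is one statistic (count, mean, std, and so on).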
Saving as a png file
As mentioned in the comments, the linked post explains how to save a pandas DataFrame to a png file via matplotlib. In your case, this should work:
import matplotlib.pyplot as plt
from pandas.plotting import table
desc = df['Revenue'].describe()
#create a subplot without frame
plot = plt.subplot(111, frame_on=False)
#remove axis
plot.xaxis.set_visible(False)
plot.yaxis.set_visible(False)
#create the table plot and position it in the upper right corner
table(plot, desc, loc='upper right')
#save the plot as a png file
plt.savefig('desc_plot.png')
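To run this for every feature in a loop, the same idea can be wrapped in a function. This is a rough sketch, not a tested pipeline: save_describe_png, the example DataFrame and the desc_<name>.png file names are all hypothetical. The Agg backend is chosen so that it also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import table


def save_describe_png(series, path):
    """Render series.describe() as a table and save it to `path`."""
    desc = series.describe().round(2)
    fig, ax = plt.subplots(figsize=(3, 2.5))
    ax.axis("off")  # hide the frame and both axes
    table(ax, desc, loc="center")
    fig.savefig(path, bbox_inches="tight", dpi=150)
    plt.close(fig)  # free memory when looping over many features


# Hypothetical example data; replace with the real DataFrame.
df = pd.DataFrame({"Revenue": [10.0, 20.0, 30.0], "Cost": [1.0, 2.0, 3.0]})
saved = []
for name in df.select_dtypes("number").columns:
    out = f"desc_{name}.png"
    save_describe_png(df[name], out)
    saved.append(out)
```

Closing each figure after saving matters here: with several hundred features, open figures would otherwise pile up in memory.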

Related

Plotting multiple graphs from multiple text files in python

I have multiple text files in a directory. The first line of each text file is a header line; the remaining lines contain columns of data. I have to plot the 7th column against the 5th column for each text file, and I want to plot all the graphs with a loop in a single script. Can anyone please help me do this? Thank you in advance.
You can use pandas and matplotlib.pyplot:
import matplotlib.pyplot as plt
import pandas as pd
# sep= takes the separator used in your data, e.g. ' ' (space) or ',' (comma)
table = pd.read_csv('your_file_name.txt', sep=' ')
table.plot(x='header_of_5th_col', y='header_of_7th_col')
I also suggest checking the pandas documentation on loading and plotting data.
You can then loop over the table.plot line to plot every graph you need.
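A loop over all the files could look roughly like this. It is a sketch under a few assumptions: the files end in .txt, the columns are separated by whitespace, and one png is saved next to each input file. plot_directory is a made-up helper name. The columns are picked by position with iloc, so the header names do not need to be known.

```python
import glob
import os

import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import pandas as pd


def plot_directory(directory, sep=r"\s+"):
    """Plot the 7th column against the 5th for every .txt file in `directory`."""
    outputs = []
    for path in sorted(glob.glob(os.path.join(directory, "*.txt"))):
        data = pd.read_csv(path, sep=sep)  # the first line is used as the header
        x = data.iloc[:, 4]  # 5th column (0-based index 4)
        y = data.iloc[:, 6]  # 7th column (0-based index 6)
        fig, ax = plt.subplots()
        ax.plot(x, y)
        ax.set_xlabel(x.name)
        ax.set_ylabel(y.name)
        ax.set_title(os.path.basename(path))
        out = os.path.splitext(path)[0] + ".png"
        fig.savefig(out)
        plt.close(fig)  # avoid keeping one open figure per file
        outputs.append(out)
    return outputs
```

Calling plot_directory("path/to/directory") returns the list of png files it wrote.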
Code for getting all files in a specified directory:
import os
files = os.listdir("path/to/directory")
print(files)
For reading the files I would suggest the pandas library, and for plotting, matplotlib.
A more detailed solution needs more information about the exact data and the expected output, for example the first few lines of one of the files and a rough sketch (drawn in Paint or similar) of what the result should look like.

Can I save a table/dataframe to a file (like png/jpg) in python? [duplicate]

This question already has answers here:
How to save a pandas DataFrame table as a png
(13 answers)
Closed 1 year ago.
I found several ways to PRINT tables in nicer formatting but can I also SAVE those outputs to a file (not csv, excel etc.)? They don't even need to be changeable, an image-like representation would be great. I get presentation-ready dataframes that I have to reformat since I'm saving them in excel files at the moment.
Assuming this table is a pandas DataFrame, this library might help:
www.dexplo.org/dataframe_image/
This library exports pandas DataFrames as images styled the way they appear in a Jupyter notebook.
Example usage:
import pandas as pd
import dataframe_image as dfi
df = pd.DataFrame({'key':[1,2,3],'val':['a','b','c']})
dfi.export(df, 'dataframe.png')

Can we access and replace a image in a xlsx

I am writing a script to automate a process that requires updating an Excel file, plotting a graph based on some data in that file, and then inserting the graph into the same Excel file.
I have used openpyxl for reading and writing the Excel file and matplotlib for drawing the graph, and then inserted the graph into the same Excel file. The data in the Excel file is updated once or twice a week. Every time the data is updated I need to plot an updated graph and insert it into the Excel file. Right now my script updates the values in the Excel file automatically and plots the graph for the updated data, but when I insert the graph it does not overwrite the previous graph. Instead, it places the new graph on top of the previous one every time, so the size of the Excel file keeps increasing.
The code that I am currently using for plotting and inserting the graph into the Excel file is:
import matplotlib.pyplot as plt
import openpyxl
from openpyxl import load_workbook

fig = plt.figure(figsize=(8, 4))
PLT = fig.add_axes([0.04, 0.08, 0.79, 0.8])
plt.xlabel("WORKING WEEK (WW)", fontsize=7)
plt.ylabel("UTILIZATION [%]", fontsize=7)
plt.title("PATCH UTILIZATION", fontsize=9)
#PLT.subplots_adjust(right=0.8)
for i in range(len(p)):
    PLT.plot(x, p[i], label='%s' % row[0], marker="s", markersize=2)
PLT.legend(bbox_to_anchor=(1.21, 1), borderaxespad=0, prop={'size': 6})
PLT.tick_params(axis='both', which='major', labelsize=4)
plt.savefig("myplot.png", dpi=160)
wb = load_workbook('Excel.xlsm', read_only=False, keep_vba=True)
ws = wb['Patch Util']
img = openpyxl.drawing.image.Image("myplot.png")
img.anchor = 'D50'
ws.add_image(img)
wb.save('Excel.xlsm')
"x" and "p" are two lists (p is list of lists) which are containing data and will be updated when the data in the Excel file is updated.
What I want is to plot a graph and insert it once. Then, whenever the data is updated, I want to access the same graph in the Excel file, plot it for the updated data and re-insert it, instead of inserting a new graph on top of the previous one every time, so that the size of the Excel file stays the same.
It would be a great help if anyone could help me with this.
Comment: No, I am using version 2.5.6, and in my case every graph and chart is retained.
Show me the output of the following:
from openpyxl import load_workbook
wb = load_workbook(<your file path>)
ws = wb.worksheets[<your sheet index>]
for image in ws._images:
    print("_id:{}, img.path:{}".format(image._id, image.path))
Comment: The output I got is: _id:1, img.path:/xl/media/image1.png
Question: Can we access and replace a image in a xlsx
You can do it by replacing the Image object in ws._images.
The first time, you have to initialise the ref data as usual with ws.add_image(...). If an image already exists (len(ws._images) == 1), you can replace it with a new Image object.
For example:
if len(ws._images) == 0:
    # Initialise the `ref` data, do ws.add_image(...)
    img = openpyxl.drawing.image.Image("myplot.png")
    img.anchor = 'D50'
    ws.add_image(img)
elif len(ws._images) == 1:
    # Replace the first image; do **only** the following:
    ws._images[0] = openpyxl.drawing.image.Image("myplot.png")
    # Update the default anchor `A1` to your needs
    ws._images[0].anchor = 'D50'
else:
    raise ValueError("Found more than 1 Image!")
Note: You are using a private attribute of the class; this could result in unexpected side effects.
Working with openpyxl Version 2.5.6

plotting using pandas in python

What I am trying to do is fairly basic, but I am very new to Python and am having trouble.
Goal: plot the yellow highlighted row (I highlighted it myself; it will not be highlighted when I need to read the data) on the Y-axis and the "Time" column on the X-axis.
Here is a photo of the data, followed by the code that I have tried along with its error.
Code
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import style
style.use('ggplot')
#Reading CSV and converting it to a df(Data_Frame)
df1 = pd.read_csv('Test_Sheet_1.csv', skiprows = 8)
#Creating a list from df1 and labeling it 'Time'
Time = df1['Time']
print(Time)
#Reading CSV and converting it to a df(Data_Frame)
df2 = pd.read_csv('Test_Sheet_1.csv').T
#From here I need to know how to skip 4 lines.
#I need to skip 4 lines AFTER the transposition and then we can plot DID and Time
DID = df2['Parameters']
print(DID)
Error
As you can see from the code, right now I am just trying to print the data so that I can see it, and then I would like to put it onto a graph.
I think I need to use the 'skiplines' function after the transposition, so that Python knows where to read the "column" labeled Parameters (it is only a column after the transposition). However, I do not know how to skip lines after the transposition unless I transpose the data into a new Excel document, which is not an option.
Any help is very much appreciated.
Thank you!
Update
This is the output I get when I add print(df2.columns.tolist())

use python to generate graph in excel

I have been trying to generate data in Excel.
I generated .CSV file.
So up to that point it's easy.
But generating graph is quite hard in Excel...
I am wondering, is Python able to generate data AND a graph in Excel?
If there are examples or code snippets, feel free to post them :)
A workaround would be to use Python to generate the graph in an image format such as .jpg, or as a .pdf file. That is also OK, as long as the workaround does not need extra dependencies such as installing the boost library.
Yes, XlsxWriter has a lot of utility for creating Excel charts in Python. However, you must use the xlsx file format, it gives little feedback on incorrect parameters, and it cannot read files back (it only writes them).
import xlsxwriter
import random
# Example data
# Try to do as much processing as possible before creating the workbook
# Everything between Workbook() and close() gets trapped in an exception
random_data = [random.random() for _ in range(10)]
# Data location inside excel
data_start_loc = [0, 0] # xlsxwriter requires a list, not a tuple
# The last row of the range is inclusive, hence the - 1
data_end_loc = [data_start_loc[0] + len(random_data) - 1, data_start_loc[1]]
workbook = xlsxwriter.Workbook('file.xlsx')
# Charts are independent of worksheets
chart = workbook.add_chart({'type': 'line'})
chart.set_y_axis({'name': 'Random jiggly bit values'})
chart.set_x_axis({'name': 'Sequential order'})
chart.set_title({'name': 'Insecure randomly jiggly bits'})
worksheet = workbook.add_worksheet()
# A chart can only plot data that is stored inside the worksheet
worksheet.write_column(*data_start_loc, data=random_data)
# The chart needs to explicitly reference data
chart.add_series({
'values': [worksheet.name] + data_start_loc + data_end_loc,
'name': "Random data",
})
worksheet.insert_chart('B1', chart)
workbook.close() # Write to file
You have 2 options:
If you are on Windows, you can use the pywin32 library (included in ActivePython) to automate Excel through OLE automation.
from win32com.client import Dispatch
ex = Dispatch("Excel.Application")
# you can use the ex object to invoke Excel methods etc.
If all you want is to generate basic plots, you can use matplotlib.
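For the matplotlib option, something along these lines might be enough. It is only a sketch: plot_csv_column is a made-up helper, and it assumes the CSV has a header row and that the chosen column holds numbers. The Agg backend lets it run without a display.

```python
import csv

import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt


def plot_csv_column(csv_path, column, out_path):
    """Plot one numeric CSV column against its row order and save the figure."""
    with open(csv_path, newline="") as f:
        values = [float(row[column]) for row in csv.DictReader(f)]
    fig, ax = plt.subplots()
    ax.plot(range(len(values)), values, marker="o")
    ax.set_xlabel("Row")
    ax.set_ylabel(column)
    # The output format is inferred from the extension, e.g. .png or .pdf
    fig.savefig(out_path)
    plt.close(fig)
    return values
```

This covers the "graphical format or .pdf" workaround from the question without touching Excel at all.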
I suggest trying gnuplot for drawing graphs from data files.
If you do decide to use matplotlib, check out my Excel-to-Python class PyWorkbooks to get the data. It lets you retrieve data efficiently and easily as numpy arrays (the native datatype for matplotlib).
https://sourceforge.net/projects/pyworkbooks/
@David Gao, I am looking at doing something similar. Currently I am looking at using the raw csv, or converting it to json, and just dropping it in a folder that is read by jqPlot, a jQuery plotting and graphing library. Then all I need to do is have the user or myself display the plot in any web browser.
