plotting using pandas in python - python

What I am trying to do is fairly basic; however, I am very new to Python and am having trouble.
Goal: plot the yellow highlighted row (I highlighted it for this post; it will not be highlighted when I need to read the data) on the Y-axis and the "Time" column on the X-axis.
Here is a photo of the data, followed by the code that I have tried along with its error.
Code
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import style
style.use('ggplot')
#Reading CSV and converting it to a df(Data_Frame)
df1 = pd.read_csv('Test_Sheet_1.csv', skiprows = 8)
#Creating a list from df1 and labeling it 'Time'
Time = df1['Time']
print(Time)
#Reading CSV and converting it to a df(Data_Frame)
df2 = pd.read_csv('Test_Sheet_1.csv').T
#From here i need to know how to skip 4 lines.
#I need to skip 4 lines AFTER the transposition and then we can plot DID and Time
DID = df2['Parameters']
print(DID)
Error
As you can see from the code, right now I am just trying to print the data so that I can see it, and then I would like to put it onto a graph.
I think I need to use the 'skiplines' function after the transposition so that Python knows where to read the "column" labeled Parameters (it is only a column after the transposition). However, I do not know how to skip lines after the transposition unless I transpose the data into a new Excel document, and that is not an option.
Any help is very much appreciated,
Thank you!
Update
This is the output I get when I add print(df2.columns.tolist())

Related

Creating a table in python and printing to a PDF

I know some similar questions have been asked, but none have answered my question, or maybe my Python programming skills are not that great (they are not). Ultimately I'm trying to create a table that looks like the one below. All of the "Some values" cells will be filled with JSON data, which I know how to import; creating the table and exporting it to a PDF using FPDF is what is stumping me. I've tried PrettyTable and wasn't able to achieve this. I also tried HTML, but I don't know enough HTML to build a table like this from scratch. If someone could help or point me in the right direction, it would be appreciated.
I would recommend using both the pandas library and Matplotlib.
Firstly, with pandas you can load data from JSON, either a JSON file or a string, with the read_json(..) function documented here.
Something like:
import pandas as pd
df = pd.read_json("/path/to/my/file.json")
There is plenty of functionality within the pandas library to manipulate your dataframe (your table) however you need.
Once you're done, you can use Matplotlib to generate your PDF without needing any HTML.
This would then become something like:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
df = pd.read_json("/path/to/my/file.json")
# Manipulate your dataframe 'df' here as required.
# Now use Matplotlib to write to a PDF
# Lets create a figure first
fig, ax = plt.subplots(figsize=(20, 10))
ax.axis('off')
the_table = ax.table(
    cellText=df.values,
    colLabels=df.columns,
    loc='center'
)
# Now lets create the PDF and write the table to it
pdf_pages = PdfPages("/path/to/new/table.pdf")
pdf_pages.savefig(fig)
pdf_pages.close()
Hope this helps.

Plotting multiple graphs from multiple text files in python

I have multiple text files in a directory. The first line of each text file is the header line. The rest of the lines are columns containing different data. I have to plot the 7th column against the 5th column for each text file, and I want to plot all the graphs in a loop with a single script. Can anyone please help me do this? Thank you in advance.
You can use pandas and matplotlib.pyplot
import matplotlib.pyplot as plt
import pandas as pd
# sep= accepts the separator of your data i.e. ' ' space ',' comma etc
table = pd.read_csv('your_file_name.txt', sep=' ')
table.plot(x='header_of_5th_col', y='header_of_7th_col')
I also suggest checking the pandas documentation on loading and plotting data.
You can then loop over your files, repeating the read_csv and table.plot lines, to plot every graph you need.
Code for getting all files in a specified directory:
import os
files = os.listdir("path/to/directory")
print(files)
For reading the files I would suggest the pandas library (here) and for plotting matplotlib (here).
For a more detailed solution, more information is needed about exactly what data is given and what output is expected.
For example, share the first few lines of one of the files and a basic image, created in Paint or similar, showing roughly what things should look like.
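Putting the two answers together, a minimal sketch of the loop might look like the following. It assumes the files end in .txt, are whitespace-separated, and that saving one PNG next to each input file is acceptable; the function name plot_all is hypothetical, and the separator should be adjusted to match the real data.

```python
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd


def plot_all(directory, sep=r"\s+"):
    """Plot the 7th column against the 5th for every .txt file in `directory`.

    Assumption: each file has a single header line and at least 7 columns.
    Returns the list of image paths that were written.
    """
    saved = []
    for path in sorted(Path(directory).glob("*.txt")):
        table = pd.read_csv(path, sep=sep)  # the first line becomes the header
        # Columns are selected by position: index 4 is the 5th, index 6 the 7th.
        x_name, y_name = table.columns[4], table.columns[6]
        ax = table.plot(x=x_name, y=y_name, title=path.name)
        ax.set_ylabel(y_name)
        out = path.with_suffix(".png")
        ax.figure.savefig(out)
        plt.close(ax.figure)  # free the figure before moving to the next file
        saved.append(out)
    return saved
```

Calling plot_all("path/to/directory") would then write one image per text file; selecting columns by position avoids having to know the header names in advance.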

Saving columns from csv

I am trying to write code that reads a CSV file and saves each column as a specific variable. I am having difficulty because the header is 7 lines long (something I can control, but I would like to just ignore it if I can handle it in code), and my data is full of important decimal places, so it cannot be converted to int (or maybe string?). I've also tried saving each column by its position in the file but am struggling to get it to run. Any ideas?
The image shows my current code, slimmed down to show the important parts, with the data that prints in my console circled.
save each columns as a specific variable
import pandas as pd
df = pd.read_csv('file.csv')
x_col = df['X']
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html
If what you are looking for is how to iterate through the columns, no matter how many there are (which is what I think you are asking), then this code should do the trick:
import pandas as pd
data = pd.read_csv('optitest.csv', skiprows=6)
for column in data.columns:
    # You will need to define what this save() method is.
    # Just placing it here as an example.
    save(data[column])
The part of your question about storing the data as a number or a string was a little vague, but if it's decimal data, then you need to use float. See #9637665.
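As a small illustration of both points, here is a sketch that skips the leading lines and checks that the decimals arrive as floats. The file contents are made up, and it assumes the 7-line header consists of six lines of metadata followed by one line of column names, which is what skiprows=6 implies.

```python
import io

import pandas as pd

# Hypothetical file contents: six metadata lines, then the column names, then data.
raw = "\n".join(
    [f"metadata line {i}" for i in range(1, 7)]
    + ["X,Y", "0.12345,1.5", "2.71828,3.25"]
)

# skiprows=6 drops the metadata, so the 7th line is used as the column names.
data = pd.read_csv(io.StringIO(raw), skiprows=6)

# One entry per column; a dict avoids hard-coding how many columns there are.
columns = {name: data[name] for name in data.columns}
x_col = columns["X"]

print(data.dtypes)  # decimal columns are parsed as float64, not int or str
```

With a real file, io.StringIO(raw) would simply be replaced by the file path.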

Plotting a geopandas dataframe using plotly

I have a geopandas dataframe, which consists of the region name(District), the geometry column, and the amount column. My goal is to plot a choropleth map using the method mentioned below
https://plotly.com/python/choropleth-maps/#using-geopandas-data-frames
Here’s a snippet of my dataframe
I also checked that my columns were in the right format/type.
And here's the code I used to plot the map
fig = px.choropleth(merged,
                    geojson=merged.geometry,
                    locations=merged.index,
                    color="Amount")
fig.update_geos(fitbounds="locations", visible=False)
fig.show()
It produced the below figure
which is obviously not the right figure. For some reason it doesn't show the map; instead it shows a line, and when I zoom in I am able to see the map, but it has lines running through it. Like this
Has anyone run into a similar problem? If so, how were you able to resolve it?
The Plotly version I am using is 4.7.0. I have tried upgrading to a more recent version, but it still didn't work.
Any help is greatly appreciated. Please find my code and the data on my github.
I'm posting @tgrandje's comment, which solved the problem, as an answer: the shapefile needs to be reprojected to EPSG:4326 (longitude/latitude) before plotting. Thanks to @Poopah and @tgrandje for the opportunity to post it.
import pandas as pd
import plotly.express as px
import geopandas as gpd
import pyproj
# reading in the shapefile
fp = "./data/"
map_df = gpd.read_file(fp)
map_df.to_crs(pyproj.CRS.from_epsg(4326), inplace=True)
df = pd.read_csv("./data/loans_amount.csv")
# join the geodataframe with the cleaned up csv dataframe
merged = map_df.set_index('District').join(df.set_index('District'))
#merged = merged.reset_index()
merged.head()
fig = px.choropleth(merged, geojson=merged.geometry, locations=merged.index, color="Amount")
fig.update_geos(fitbounds="locations", visible=False)
fig.show()
Another possible source of the problem (when using Plotly graph_objects) is mentioned in this answer over at gis.stackexchange.com:
The locations argument has to point to a column that matches GeoJSON's 'id's.
The geojson argument expects a dictionary.
To solve your problem, you should: (i) point locations to the dataframe's index, and (ii) turn your GeoJSON string to a dictionary.
It's not exactly the answer to your question, but I thought my problem was the same as yours and this helped me. So I am including the answer here.
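To make those two points concrete, here is a minimal sketch using only the standard library. The GeoJSON text and its "id" values are invented for illustration; with a GeoDataFrame the string would typically come from its to_json() method.

```python
import json

# Hypothetical GeoJSON text with two features; the "id" values are made up.
geojson_text = """
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "id": "North", "properties": {},
   "geometry": {"type": "Point", "coordinates": [0.0, 1.0]}},
  {"type": "Feature", "id": "South", "properties": {},
   "geometry": {"type": "Point", "coordinates": [0.0, -1.0]}}
]}
"""

# (ii) Turn the GeoJSON string into a dictionary.
geojson = json.loads(geojson_text)

# (i) Check that every value passed as `locations` matches a feature "id".
feature_ids = {feature["id"] for feature in geojson["features"]}
locations = ["North", "South", "East"]  # stand-in for the dataframe's index
unmatched = [loc for loc in locations if loc not in feature_ids]

print(unmatched)  # prints ['East']
```

Any entry reported as unmatched has no geometry to draw, so a check like this can help narrow down why regions are missing from the map.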

How do I make the plot show in my df analysis

I have a dataframe of emails that has three columns: From, Message and Received (which is a date format).
I've written the below script to show how many messages there are per month in a bar plot.
But the plot doesn't show and I can't work out why, it's no doubt very simple. Any help understanding why is much appreciated!
Thanks!
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('XXX')
df = df[df['Message'].notna()]
df['Received'] = pd.to_datetime(df['Received'], format='%d/%m/%Y')
df['Received'].groupby(df['Received'].dt.month).count().plot
A pyplot object (commonly plt) is not shown until you call plt.show(). It is designed that way so you can create your plot and then modify it as needed before showing or saving.
Also check out plt.savefig().
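For reference, a sketch of the corrected script follows. Note that in the question the last line ends in .plot without parentheses, so the plotting method is never actually called; that needs fixing in addition to calling plt.show(). The function name, its show and save_path parameters, and the sample data in the usage note are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_messages_per_month(df, show=True, save_path=None):
    """Draw a bar plot of message counts per month and return the counts."""
    df = df[df["Message"].notna()].copy()
    df["Received"] = pd.to_datetime(df["Received"], format="%d/%m/%Y")
    per_month = df["Received"].groupby(df["Received"].dt.month).count()

    ax = per_month.plot(kind="bar")  # note the call: `.plot` alone draws nothing
    ax.set_xlabel("Month")
    ax.set_ylabel("Messages")

    if save_path is not None:
        ax.figure.savefig(save_path)  # save before showing
    if show:
        plt.show()  # nothing appears on screen until this is called
    return per_month
```

It could be used as plot_messages_per_month(pd.read_csv('XXX')), where 'XXX' is the placeholder path from the question.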
