I have multiple text files in a directory. The first line of each text file is a header line. The remaining lines hold columns of data. I need to plot the 7th column against the 5th column for each text file, and I want to plot all of the graphs with a loop in a single script. Can anyone please help me do this? Thank you in advance.
You can use pandas and matplotlib.pyplot
import matplotlib.pyplot as plt
import pandas as pd
# sep= accepts the separator of your data i.e. ' ' space ',' comma etc
table = pd.read_csv('your_file_name.txt', sep=' ')
table.plot(x='header_of_5th_col', y='header_of_7th_col')
I also suggest checking the pandas documentation on loading and plotting data.
You can then loop the table.plot line of code to plot every graph you need
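The loop described above could look roughly like the following sketch. It assumes the files end in .txt, are whitespace-separated, and should each be saved as a PNG next to the source file; the function names plot_col7_vs_col5 and plot_all are made up for illustration. Columns are picked by position with iloc, so the header names do not need to be known in advance.

```python
import glob
import os

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script also runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_col7_vs_col5(path, sep=r"\s+"):
    """Read one file (first line is the header) and plot its 7th column against its 5th."""
    table = pd.read_csv(path, sep=sep)  # assumption: whitespace-separated data
    x = table.iloc[:, 4]  # 5th column (positions are zero-based)
    y = table.iloc[:, 6]  # 7th column
    fig, ax = plt.subplots()
    ax.plot(x, y)
    ax.set_xlabel(x.name)
    ax.set_ylabel(y.name)
    ax.set_title(os.path.basename(path))
    out_path = os.path.splitext(path)[0] + ".png"  # assumption: save beside the input file
    fig.savefig(out_path)
    plt.close(fig)  # free the figure before moving on to the next file
    return out_path


def plot_all(directory, pattern="*.txt"):
    """Plot every matching file in the directory and return the list of saved PNG paths."""
    paths = sorted(glob.glob(os.path.join(directory, pattern)))
    return [plot_col7_vs_col5(path) for path in paths]
```

Calling plot_all("path/to/directory") would then produce one image per text file. If the data uses commas or another delimiter, pass a different sep value.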
Code for getting all files in a specified directory:
import os
files = os.listdir("path/to/directory")
print(files)
For reading the files I would suggest the pandas library, and for plotting, matplotlib.
A more detailed solution would need more information about the exact data given and the expected output,
for example the first few lines of one of the files and a basic image, created in Paint or similar, showing roughly what the result should look like.
I am trying to write code that reads a CSV file and saves each column as a specific variable. I am having difficulty because the header is 7 lines long (something I can control, but would like to just ignore if I can handle it in code), and my data is full of important decimal places, so it cannot be converted to int (or maybe string?). I've also tried saving each column by its position in the file, but I am struggling to get that to run. Any ideas?
Image shows my current code that I have slimmed to show important parts and circles data that prints in my console.
save each column as a specific variable
import pandas as pd
df = pd.read_csv('file.csv')
x_col = df['X']
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html
If what you are looking for is how to iterate through the columns, no matter how many there are (which is what I think you are asking), then this code should do the trick:
import pandas as pd
data = pd.read_csv('optitest.csv', skiprows=6)
for column in data.columns:
    # You will need to define what this save() method is.
    # Just placing it here as an example.
    save(data[column])
The line about formatting your data as a number or a string was a little vague. But if it's decimal data, then you need to use float. See #9637665.
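As a minimal sketch of the two points above (skipping the long header and keeping decimals as floats), the example below uses a made-up in-memory file in place of optitest.csv. It assumes the column names sit on the 7th line, which is what skiprows=6 implies. pandas infers float64 for decimal columns on its own, so no conversion to int or string is needed.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: 6 preamble lines, then the header row, then data.
raw = "\n".join(
    ["preamble line %d" % i for i in range(1, 7)]
    + ["X,Y", "0.12345,1.5", "0.67891,2.25"]
)

# skiprows=6 drops the preamble; the next line is used as the column header.
data = pd.read_csv(io.StringIO(raw), skiprows=6)

# One entry per column, keyed by its header text, instead of one variable per column.
columns = {name: data[name] for name in data.columns}

print(columns["X"].dtype)
```

A dictionary keyed by column name tends to be easier to work with than a separate variable for every column, especially when the number of columns is not fixed.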
I'm new on this site, so please be indulgent if I make a mistake :)
I recently imported a CSV file into my Jupyter notebook for a student project. I want to use some of the data from specific columns of this file. The problem is that after the import, the file appears as a table with 5286 lines (which represent the dates and hours of the measurements) in a single column (which holds all the variables I want to use, separated by ;).
I don't know how to turn this into a regular table.
I used this code to import my CSV from my board:
import pandas as pd
data = pd.read_csv('/work/Weather_data/data 1998-2003.csv','error_bad_lines = false')
Output:
Desired output: the same data in multiple columns, separated on ;.
You can try this:
import pandas as pd
data = pd.read_csv('<location>', sep=';')
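To illustrate why the separator matters, here is a small sketch using made-up weather rows held in memory (the column names and values are invented, not taken from the asker's file). With the default comma separator each line lands in one wide column; with sep=';' the same text splits into proper columns.

```python
import io

import pandas as pd

# Invented sample in the same shape as the problem: fields separated by semicolons.
raw = (
    "date;temperature;humidity\n"
    "1998-01-01 00:00;3.5;81\n"
    "1998-01-01 01:00;3.1;83\n"
)

# Default separator is ',' so every line is read as a single field.
one_column = pd.read_csv(io.StringIO(raw))

# Telling pandas the real separator yields one column per variable.
split = pd.read_csv(io.StringIO(raw), sep=";")

print(one_column.shape, split.shape)
```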
I am importing a CSV file that contains data which is all in a single column (the TXT file has the data separated by ";").
Is there any way to get the data to load into Anaconda (using pandas) so that it is in separate columns, or can it be manipulated afterwards into columns?
The data can be found at the following web-address (this is data about sunspots):
http://www.sidc.be/silso/INFO/snmtotcsv.php
From this website http://www.sidc.be/silso/datafiles
I have managed to do this so far:
Start the code by loading the pandas command set
from pandas import *
#Initial setup commands
import warnings
warnings.simplefilter('ignore', FutureWarning)
import matplotlib
matplotlib.rcParams['axes.grid'] = True # show gridlines by default
%matplotlib inline
from scipy.stats import spearmanr
#load data from CSV file
startdata = read_csv('SN_m_tot_V2.0.csv',header=None)
startdata = startdata.reset_index()
I received an answer elsewhere; the lines of code that take into account the lack of column headings AND the separator being a semicolon are:
colnames=['Year','Month','Year (fraction)','Sunspot number','Std dev.','N obs.','Provisional']
ssdata=read_csv('SN_m_tot_V2.0.csv',sep=';',header=None,names=colnames)
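On the second half of the question (whether the data can be split into columns after loading), one possible approach is to split the single text column with str.split. The sketch below uses invented rows with only four fields rather than the real sunspot file, so the values and the shortened column list are assumptions for illustration.

```python
import io

import pandas as pd

# Invented rows, not copied from the real file; four fields separated by semicolons.
raw = "2000;01;2000.042;10.5\n2000;02;2000.123;20.25\n"

# Loaded without sep=';', everything ends up as text in a single column labelled 0.
startdata = pd.read_csv(io.StringIO(raw), header=None)

# Split that column on ';' into separate columns, then name them.
parts = startdata[0].str.split(";", expand=True)
parts.columns = ["Year", "Month", "Year (fraction)", "Sunspot number"]

# The pieces are still strings, so convert each column to a numeric dtype.
parts = parts.apply(pd.to_numeric)

print(parts.dtypes)
```

Passing sep=';' at load time, as in the answer above, is simpler; splitting afterwards is mainly useful when the frame has already been loaded and cannot easily be re-read.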
This question already has answers here:
How to save a pandas DataFrame table as a png
I am working with the pandas describe function. The code is simple enough:
df['Revenue'].describe()
The output is:
Perfect. My issue is that I want to save this output as either a PNG or a table so that I can place it on a single page. This is for my EDA (exploratory data analysis): I have 6 main charts or pieces of information that I want to evaluate for each feature. Each chart will be a separate PNG file, and I will then combine them into one PDF file. I iterate over 300+ features, so doing one at a time is not an option, especially since this is done monthly.
If you know of a way to save this table as a PNG or another similar file format, that would be great. Thanks for taking a look.
Saving as a csv or xlsx file
You can use the to_csv("filename.csv") or to_excel("filename.xlsx") methods to save the output as a comma-separated file or an Excel workbook, and then manipulate or format it in Excel however you want.
Example:
df['Revenue'].describe().to_csv("my_description.csv")
Saving as a png file
As mentioned in the comments, this post explains how to save a pandas DataFrame to a PNG file via matplotlib. In your case, this should work:
import matplotlib.pyplot as plt
from pandas.plotting import table
desc = df['Revenue'].describe()
#create a subplot without frame
plot = plt.subplot(111, frame_on=False)
#remove axis
plot.xaxis.set_visible(False)
plot.yaxis.set_visible(False)
#create the table plot and position it in the upper right corner
table(plot, desc,loc='upper right')
#save the plot as a png file
plt.savefig('desc_plot.png')
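Since the question mentions iterating over 300+ features, the same idea can be wrapped in a loop. The following is a sketch only: the helper name save_descriptions, the output file naming, and the figure size are all assumptions, and it only handles numeric columns.

```python
import os

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, suitable for batch jobs
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import table


def save_descriptions(df, out_dir):
    """Save describe() of every numeric column as its own PNG; return the file paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for column in df.select_dtypes("number").columns:
        desc = df[column].describe().round(2)
        fig, ax = plt.subplots(figsize=(3, 2.5))  # assumption: small figure fits the table
        ax.axis("off")  # hide the axes so only the table is drawn
        table(ax, desc, loc="center")
        # Assumption: column names are safe to use as file names.
        path = os.path.join(out_dir, f"{column}_describe.png")
        fig.savefig(path)
        plt.close(fig)  # avoid accumulating hundreds of open figures
        paths.append(path)
    return paths
```

For example, save_descriptions(df, "eda_tables") would write one PNG per numeric feature into the eda_tables folder. Closing each figure inside the loop matters when there are hundreds of features, because matplotlib otherwise keeps every figure in memory.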
What I am trying to do is fairly basic, but I am very new to Python and am having trouble.
Goal: plot the yellow highlighted row (which I have highlighted here, but it will not be highlighted when I need to read the data) on the Y-axis and the "Time" column on the X-axis.
Here is a photo of the data, followed by the code that I have tried along with its error.
Code
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import style
style.use('ggplot')
#Reading CSV and converting it to a df(Data_Frame)
df1 = pd.read_csv('Test_Sheet_1.csv', skiprows = 8)
#Creating a list from df1 and labeling it 'Time'
Time = df1['Time']
print(Time)
#Reading CSV and converting it to a df(Data_Frame)
df2 = pd.read_csv('Test_Sheet_1.csv').T
#From here i need to know how to skip 4 lines.
#I need to skip 4 lines AFTER the transposition and then we can plot DID and Time
DID = df2['Parameters']
print(DID)
Error
As you can see from the code, right now I am just trying to print the data so that I can see it, and then I would like to put it onto a graph.
I think I need to use something like a 'skiplines' function after the transposition, so that Python knows where to read the "column" labeled Parameters (it is only a column after the transposition). However, I do not know how to skip lines after the transposition unless I transpose the data into a new Excel document, which is not an option.
Any help is very much appreciated,
Thank you!
Update
This is the output I get when I add print(df2.columns.tolist())
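One way to skip rows after a transposition is to slice the transposed DataFrame by position with iloc, which needs no intermediate file. The sketch below is built on an invented layout (each row of the file is one series and its first cell is the label), because the real sheet is only visible in the screenshot; the names Time and Parameters come from the question, while the values are made up.

```python
import io

import pandas as pd

# Invented layout: one series per row, with the label in the first cell.
raw = "Time,0,1,2,3,4,5\nParameters,10,11,12,13,14,15\n"

# header=None keeps every row as data; index_col=0 turns the labels into the index.
wide = pd.read_csv(io.StringIO(raw), header=None, index_col=0)

# After transposing, the row labels become column names.
tall = wide.T

# Skip the first 4 rows of the transposed table by position.
rows_to_skip = 4
tall = tall.iloc[rows_to_skip:]

print(tall)
```

With the data in this shape, tall.plot(x="Time", y="Parameters") would draw the requested graph. If the real file has extra lines above the data, skiprows in read_csv can drop those before the transposition.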