Hello, I cannot understand why this code does not select the rows between two dates. It still shows me the whole dataset, starting from the first date in 2004. Here is my code:
import pandas as pd
from pandas import DataFrame
import datetime
from matplotlib import pyplot as plt
df1 = pd.read_csv('time_series_15min_singleindex.csv',header=0,index_col=0,parse_dates=True)
df=DataFrame(df1,columns=['utc_timestamp','DE_solar_generation_actual','DE_wind_onshore_generation_actual'])
df['utc_timestamp'] = pd.to_datetime(df['utc_timestamp'],utc=True)
start_date=pd.to_datetime('2008-12-31',utc=True)
end_date=pd.to_datetime('2009-01-01',utc=True)
df[df['utc_timestamp'].between(start_date,end_date)]
df.plot()
You forgot to assign the result back. Boolean indexing returns a new DataFrame and leaves df unchanged, so use:
df = df[df['utc_timestamp'].between(start_date,end_date)]
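To illustrate the point, here is a minimal sketch on synthetic data. The generated timestamps and the name filtered are assumptions made for the example, not part of the original CSV.

```python
import pandas as pd

# Synthetic stand-in for the CSV: 15-minute UTC timestamps over four days (assumed data).
df = pd.DataFrame({
    "utc_timestamp": pd.date_range(
        "2008-12-30", periods=4 * 24 * 4, freq="15min", tz="UTC"
    ),
})
df["DE_solar_generation_actual"] = range(len(df))

start_date = pd.to_datetime("2008-12-31", utc=True)
end_date = pd.to_datetime("2009-01-01", utc=True)

# The mask on its own changes nothing; the selection only survives
# if the result is bound to a name.
filtered = df[df["utc_timestamp"].between(start_date, end_date)]

print(len(df), len(filtered))
```

Series.between is inclusive on both ends by default, so the row stamped exactly at end_date is kept as well.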
Related
I am using a dataset that can be found on Kaggle (https://www.kaggle.com/claytonmiller/lbnl-automated-fault-detection-for-buildings-data).
I am trying to write code that selects rows based on their Timestamp (in this dataset, the times between 10:01 PM and 6:59 AM) and fills every column in those rows with zero.
I have tried the following code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import datetime as dt
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
%matplotlib inline
df = pd.read_csv('RTU.csv')
def fill_na(row):
    if dt.time(22, 1) <= pd.to_datetime(row['Timestamp']).time() <= dt.time(6, 59):
        row.fillna(0)
### df = df.apply(fill_na, axis=1) ###
df= df.apply(lambda row : fill_na(row), axis=1)
#### df.fillna(0, inplace=True) ###
df.head(2000)
However, after changing the axis of the dataset, it no longer seems to work as intended.
I don't think you need a function to do that. Just filter the rows using a condition and then fillna.
import datetime as dt
import pandas as pd
df = pd.read_csv('RTU.csv',parse_dates=['Timestamp'])
df.head()
cond = (df.Timestamp.dt.time > dt.time(22,0)) | ((df.Timestamp.dt.time < dt.time(7,0)))
df[cond] = df[cond].fillna(0,axis=1)
The output shows that the NaN values before 7 AM are filled with 0.
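A small self-contained sketch of the same idea follows. The sample timestamps and the temp column are made up for illustration. Note that a single chained comparison such as 22:01 <= t <= 06:59 can never be true, because the window crosses midnight; that is why two conditions are joined with |.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Hypothetical readings around the boundaries of the night window.
df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2021-01-01 06:59", "2021-01-01 07:00", "2021-01-01 12:00",
        "2021-01-01 22:00", "2021-01-01 22:01", "2021-01-02 03:00",
    ]),
    "temp": [np.nan, np.nan, 21.5, np.nan, np.nan, np.nan],
})

# Night is "after 22:00 OR before 07:00" because the window wraps past midnight.
t = df["Timestamp"].dt.time
night = (t > dt.time(22, 0)) | (t < dt.time(7, 0))

# Fill missing values only in the night rows; daytime NaNs stay untouched.
value_cols = df.columns.drop("Timestamp")
df.loc[night, value_cols] = df.loc[night, value_cols].fillna(0)

print(df)
```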
How can I pick out just the July values from this time series? My time series runs from 1985 to 2018, with the runoff values in the right-hand column. I need help with further code to select the July values and then plot them.
my code:
from pandas import read_csv
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
import cartopy
from datetime import date,datetime
dir1 = "mystiations/"
files = os.listdir(dir1)
files = np.sort(files)
files_txt = [i for i in files if i.endswith('.txt_')]
df = pd.read_csv(dir1+files_txt[0],skiprows=6,header=None, index_col=0,sep=" ",na_values=-9999)
df.index = pd.to_datetime(df.index,format="%Y%m%d/%H%M")
parse_dates=True
index_col=0
myperiod = df["1985":"2018"]
myperiod
runoff
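One possible approach, sketched on synthetic data: once the index is a DatetimeIndex, its month attribute can be used as a boolean mask. The column name runoff and the generated values are assumptions here; in the code above the file is read with header=None, so the real column is labelled by an integer instead.

```python
import numpy as np
import pandas as pd

# Synthetic daily series standing in for the station file (assumed data).
idx = pd.date_range("1985-01-01", "1986-12-31", freq="D")
df = pd.DataFrame({"runoff": np.arange(len(idx), dtype=float)}, index=idx)

# DatetimeIndex exposes .month, so this mask keeps only the July rows.
july = df[df.index.month == 7]

# Optional: one mean July value per year.
july_mean_by_year = july.groupby(july.index.year)["runoff"].mean()

print(july.head())
print(july_mean_by_year)
```

Calling july["runoff"].plot() followed by plt.show() should then draw only the July values.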
I'm trying to make a graph of the first column ('Time') of a csv file plotted against the second column ('Bid').
Here's what I have so far.
import pandas as pd
import datetime
import csv
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
headers = ['Time','Bid','Ask']
df = pd.read_csv('quotes_format.csv')
x = df['Time']
y = df['Bid']
plt.plot(x,y)
plt.gcf().autofmt_xdate()
plt.show()
The csv file looks something like this
This fails and returns exit code 1. How would I fix this so it would generate the graph I'm looking for?
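You can set the column names with the names parameter of read_csv. If the file has no header row, pandas otherwise takes the first data row as the column names, so df['Time'] raises a KeyError.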
headers = ['Time','Bid','Ask']
df = pd.read_csv('quotes_format.csv', names=headers)
Here is the documentation for the pandas read_csv function.
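The effect of names can be seen in a short sketch. The file contents below are hypothetical, since the real quotes_format.csv is not shown; the sketch assumes it has no header row.

```python
import io

import pandas as pd

# Hypothetical contents of a headerless quotes file.
raw = "09:30:00,1.1012,1.1014\n09:30:01,1.1013,1.1015\n"

headers = ["Time", "Bid", "Ask"]

# Without names, the first data row is consumed as the header.
no_names = pd.read_csv(io.StringIO(raw))

# With names, every row is data and the columns get the given labels.
df = pd.read_csv(io.StringIO(raw), names=headers)

print(no_names.columns.tolist())
print(df.columns.tolist())
```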
import sys
import ConfigParser
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import datetime as DT
import bokeh
sys.path.extend(['..\..\myProj\SOURCE'])
fullfilepath = "../../myProj/SOURCE/" + 'myparts.txt'
ohg_df = pd.read_csv(fullfilepath, sep="\t" )
temp_df = temp_df[['as_on_date', 'ohg_qty']]
temp_df = temp_df.sort(['as_on_date'], ascending=[1])
temp_df.set_index('as_on_date')
plt.plot(temp_df.index, temp_df.ohg_qty)
plt.show()
This is my dataframe after importing.
I am trying to plot the line graph with x axis as date mentioned in the dataframe.
Can someone guide me... I am new to pandas.
[Image: dataframe picture]
[Image: output picture]
Easier:
# Set index directly
ohg_df = pd.read_csv(fullfilepath, sep="\t", index_col='as_on_date')
# Convert string index to dates
ohg_df.index = pd.to_datetime(ohg_df.index)
# Get a column and plot it (taking a column keeps the index)
plt.plot(ohg_df.ohg_qty)
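As a rough sketch of the same steps without the real file: the tab-separated text below is invented for the example. The read_csv parameter that selects the index column is index_col, and parse_dates=True asks pandas to parse that index as dates in the same call.

```python
import io

import pandas as pd

# Hypothetical tab-separated contents standing in for myparts.txt.
raw = "as_on_date\tohg_qty\n2015-03-02\t7\n2015-03-01\t5\n2015-03-03\t9\n"

# index_col picks the index column; parse_dates=True parses the index as dates.
ohg_df = pd.read_csv(
    io.StringIO(raw), sep="\t", index_col="as_on_date", parse_dates=True
)

# sort_index returns a new DataFrame, so the result is assigned back.
ohg_df = ohg_df.sort_index()

print(ohg_df)
```

With a DatetimeIndex in place, plt.plot(ohg_df.index, ohg_df["ohg_qty"]) should put the dates on the x axis.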
sample code:
import pandas as pd
import numpy as np
sample = pd.DataFrame({"a":[1,2,3,1,2,3,1,2,3], "b":np.random.uniform(0,1,9)})
sample.boxplot(column="b", by=pd.cut(sample.a, bins=2))
Apart from the box plot picture, some text appears around the plot. How can I remove the text from the plot?
You can create a new column c with cut and pass its name instead, because the by parameter of DataFrame.boxplot accepts a column name:
by : string or sequence
Column in the DataFrame to group by
import pandas as pd
import numpy as np
sample = pd.DataFrame({"a":[1,2,3,1,2,3,1,2,3], "b":np.random.uniform(0,1,9)})
sample['c'] = pd.cut(sample.a, bins=2)
sample.boxplot(column="b", by='c')