How to plot dates on the x-axis - Python

So I have this df with the first column called "Week":
0 2018-01-07
1 2018-01-14
2 2018-01-21
3 2018-01-28
4 2018-02-04
5 2018-02-11
6 2018-02-18
7 2018-02-25
8 2018-03-04
9 2018-03-11
10 2018-03-18
11 2018-03-25
12 2018-04-01
13 2018-04-08
14 2018-04-15
15 2018-04-22
16 2018-04-29
17 2018-05-06
Name: Week, dtype: object
And three other columns with different names and integers as values.
My idea is to plot these dates on the X axis and the other 3 columns of ints on the Y axis.
I've tried everything I found, but nothing has worked yet.
I did:
df.set_index('Week')
df.plot()
plt.show()
Which ran fine, but the X axis is still a float in range(0, 17)...
I also tried:
df['Week'] = pd.to_datetime(df['Week'])
df.set_index('Week')
df.plot()
plt.show()
But I got this error:
Traceback (most recent call last):
File "C:\Users\mar\Desktop\Web Dev\PJ E\EZA.py", line 33, in <module>
df.plot()
File "C:\Users\mar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\plotting\_core.py", line 2677, in __call__
sort_columns=sort_columns, **kwds)
File "C:\Users\mar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\plotting\_core.py", line 1902, in plot_frame
**kwds)
File "C:\Users\mar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\plotting\_core.py", line 1729, in _plot
plot_obj.generate()
File "C:\Users\mar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\plotting\_core.py", line 258, in generate
self._post_plot_logic_common(ax, self.data)
File "C:\Users\mar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\plotting\_core.py", line 397, in _post_plot_logic_common
self._apply_axis_properties(ax.yaxis, fontsize=self.fontsize)
File "C:\Users\mar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\plotting\_core.py", line 470, in _apply_axis_properties
labels = axis.get_majorticklabels() + axis.get_minorticklabels()
File "C:\Users\mar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\matplotlib\axis.py", line 1188, in get_majorticklabels
ticks = self.get_major_ticks()
File "C:\Users\mar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\matplotlib\axis.py", line 1339, in get_major_ticks
numticks = len(self.get_major_locator()())
File "C:\Users\mar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\matplotlib\dates.py", line 1054, in __call__
self.refresh()
File "C:\Users\mar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\matplotlib\dates.py", line 1074, in refresh
dmin, dmax = self.viewlim_to_dt()
File "C:\Users\mar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\matplotlib\dates.py", line 832, in viewlim_to_dt
return num2date(vmin, self.tz), num2date(vmax, self.tz)
File "C:\Users\mar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\matplotlib\dates.py", line 441, in num2date
return _from_ordinalf(x, tz)
File "C:\Users\mar\AppData\Local\Programs\Python\Python36-32\lib\site-packages\matplotlib\dates.py", line 256, in _from_ordinalf
dt = datetime.datetime.fromordinal(ix).replace(tzinfo=UTC)
ValueError: ordinal must be >= 1
Thanks in advance.

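set_index returns a new DataFrame instead of modifying df, so in your code its result was being discarded. Either assign the result back to df or pass inplace=True, like this: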
df['Week'] = pd.to_datetime(df['Week'])
df.set_index('Week', inplace=True)
df.plot()
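To make the difference visible, here is a minimal sketch with a few made-up rows (the column name "a" and its values are illustrative, not the question's data). It only inspects the index type and does no plotting:

```python
import pandas as pd

# Illustrative data: three of the question's dates plus a made-up integer column.
df = pd.DataFrame({
    "Week": ["2018-01-07", "2018-01-14", "2018-01-21"],
    "a": [1, 2, 3],
})
df["Week"] = pd.to_datetime(df["Week"])

# Without inplace=True (or reassignment) the result is thrown away.
df.set_index("Week")
before = type(df.index).__name__
print(before)  # RangeIndex

# Reassigning keeps the new index.
df = df.set_index("Week")
after = type(df.index).__name__
print(after)  # DatetimeIndex
```

Once the index is a DatetimeIndex, df.plot() should take the dates for the x-axis.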

Xarray drop sel with MultiIndex

I want to calculate the anomaly of climate data. The code is as follows:
import pandas as pd
import numpy as np
import xarray as xr
date = pd.date_range('2000-01-01','2010-12-31') #4018 days
data = np.random.rand(len(date))
da = xr.DataArray(data=data,
dims='date',
coords=dict(date=date))
monthday = pd.MultiIndex.from_arrays([da['date.month'].values, da['date.day'].values])
da = da.assign_coords(monthday=('date',monthday)).groupby('monthday').mean(dim='date')
print(da)
<xarray.DataArray (monthday: 366)>
array([0.38151556, 0.46306277, 0.46148326, 0.35894069, 0.48318011,
0.44736969, 0.46828286, 0.44927365, 0.59294693, 0.61940206,
0.54264219, 0.51797117, 0.46200014, 0.50356122, 0.49371135,
...
0.44668478, 0.32583885, 0.36537256, 0.64087588, 0.56546472,
0.5021695 , 0.42450777, 0.49071572, 0.39639316, 0.53538823,
0.48345995, 0.46290486, 0.75160507, 0.4945804 , 0.52283262,
0.45320128])
Coordinates:
* monthday (monthday) MultiIndex
- monthday_level_0 (monthday) int64 1 1 1 1 1 1 1 1 ... 12 12 12 12 12 12 12
- monthday_level_1 (monthday) int64 1 2 3 4 5 6 7 8 ... 25 26 27 28 29 30 31
monthday contains (2, 29), i.e., the leap day. How can I drop the leap day? I have tried the following, but it does not work:
da.drop_sel(monthday=(2,29))
Traceback (most recent call last):
File "/Users/osamuyuubu/anaconda3/envs/xesmf_env/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3441, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-65-caf7267f29a4>", line 11, in <module>
da.drop_sel(monthday=(2,29))
File "/Users/osamuyuubu/anaconda3/envs/xesmf_env/lib/python3.7/site-packages/xarray/core/dataarray.py", line 2374, in drop_sel
ds = self._to_temp_dataset().drop_sel(labels, errors=errors)
File "/Users/osamuyuubu/anaconda3/envs/xesmf_env/lib/python3.7/site-packages/xarray/core/dataset.py", line 4457, in drop_sel
new_index = index.drop(labels_for_dim, errors=errors)
File "/Users/osamuyuubu/anaconda3/envs/xesmf_env/lib/python3.7/site-packages/pandas/core/indexes/multi.py", line 2201, in drop
loc = self.get_loc(level_codes)
File "/Users/osamuyuubu/anaconda3/envs/xesmf_env/lib/python3.7/site-packages/pandas/core/indexes/multi.py", line 2922, in get_loc
loc = self._get_level_indexer(key, level=0)
File "/Users/osamuyuubu/anaconda3/envs/xesmf_env/lib/python3.7/site-packages/pandas/core/indexes/multi.py", line 3204, in _get_level_indexer
idx = self._get_loc_single_level_index(level_index, key)
File "/Users/osamuyuubu/anaconda3/envs/xesmf_env/lib/python3.7/site-packages/pandas/core/indexes/multi.py", line 2855, in _get_loc_single_level_index
return level_index.get_loc(key)
File "/Users/osamuyuubu/anaconda3/envs/xesmf_env/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
raise KeyError(key) from err
KeyError: 29
So, how could I achieve this using xr.drop_sel()?
Thanks in advance!
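With drop_sel you need to pass a label exactly as it appears in the index, for example on a dayofyear coordinate:
da.drop_sel(dayofyear=60)
But in a non-leap year day 60 is the 1st of March, so this would drop the wrong day.
To safely drop every 29th of February, I would probably build a mask from the time coordinate instead, something like: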
mask = np.logical_and(da.time.dt.is_leap_year, da.time.dt.dayofyear==60)
result = da.where(~mask, drop=True)
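The snippet above assumes a coordinate named time, which the question's array does not have. A minimal sketch of the same masking idea applied to the question's daily date coordinate, assuming it is acceptable to remove the leap days before grouping by month and day:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Same setup as the question: 4018 daily values, three of which fall on 29 February.
date = pd.date_range("2000-01-01", "2010-12-31")
da = xr.DataArray(np.random.rand(len(date)), dims="date", coords={"date": date})

# True only on 29 February; testing month and day needs no leap-year check.
is_leap_day = (da["date"].dt.month == 2) & (da["date"].dt.day == 29)

# Keep everything else; drop=True removes the masked-out dates entirely.
no_leap = da.where(~is_leap_day, drop=True)
print(no_leap.sizes["date"])
```

Grouping no_leap by month and day afterwards would then never produce a (2, 29) group.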

ValueError: Length of values (1) does not match length of index (12797) - Indexes are the same length

So this is driving me crazy now, because I really don't see the problem.
I have the following code:
dataframe.to_csv(f"user_data/candle_data.csv")
print (dataframe)
st12 = self.supertrend(dataframe, 3, 12)
st12['ST'].to_csv(f"user_data/st12.csv")
print (st12)
print(dataframe.index.difference(st12.index))
dataframe.loc[:, 'st_12'] = st12['ST'],
Checking the CSV files, I can see that the first index is 0 and the last index is 12796, and that the last row is on line 12798. This is true for both files.
The output from the three print calls is as follows:
date open high low close volume
0 2020-12-29 21:45:00+00:00 723.33 726.14 723.26 725.05 3540.48612
1 2020-12-29 22:00:00+00:00 725.17 728.77 723.78 726.94 3983.90892
2 2020-12-29 22:15:00+00:00 726.94 727.30 724.72 724.75 3166.57435
3 2020-12-29 22:30:00+00:00 724.94 725.99 723.80 725.91 2848.08122
4 2020-12-29 22:45:00+00:00 725.99 730.30 725.95 729.64 6288.69499
... ... ... ... ... ... ...
12792 2021-05-12 03:45:00+00:00 4292.42 4351.85 4292.35 4332.81 24410.30155
12793 2021-05-12 04:00:00+00:00 4332.12 4347.60 4300.07 4343.05 16545.66776
12794 2021-05-12 04:15:00+00:00 4342.84 4348.00 4305.87 4313.82 10048.32828
12795 2021-05-12 04:30:00+00:00 4313.82 4320.68 4273.35 4287.49 13201.88547
12796 2021-05-12 04:45:00+00:00 4287.49 4306.79 4276.87 4300.80 9663.73327
[12797 rows x 6 columns]
ST STX
0 0.000000 nan
1 0.000000 nan
2 0.000000 nan
3 0.000000 nan
4 0.000000 nan
... ... ...
12792 4217.075684 up
12793 4217.075684 up
12794 4217.260609 up
12795 4217.260609 up
12796 4217.260609 up
[12797 rows x 2 columns]
RangeIndex(start=0, stop=0, step=1)
Full Error Traceback:
Traceback (most recent call last):
File "/freqtrade/freqtrade/main.py", line 37, in main
return_code = args['func'](args)
File "/freqtrade/freqtrade/commands/optimize_commands.py", line 53, in start_backtesting
backtesting.start()
File "/freqtrade/freqtrade/optimize/backtesting.py", line 479, in start
min_date, max_date = self.backtest_one_strategy(strat, data, timerange)
File "/freqtrade/freqtrade/optimize/backtesting.py", line 437, in backtest_one_strategy
preprocessed = self.strategy.ohlcvdata_to_dataframe(data)
File "/freqtrade/freqtrade/strategy/interface.py", line 670, in ohlcvdata_to_dataframe
return {pair: self.advise_indicators(pair_data.copy(), {'pair': pair})
File "/freqtrade/freqtrade/strategy/interface.py", line 670, in <dictcomp>
return {pair: self.advise_indicators(pair_data.copy(), {'pair': pair})
File "/freqtrade/freqtrade/strategy/interface.py", line 687, in advise_indicators
return self.populate_indicators(dataframe, metadata)
File "/freqtrade/user_data/strategies/TrippleSuperTrendStrategy.py", line 94, in populate_indicators
dataframe.loc[:, 'st_12'] = st12['ST'],
File "/home/ftuser/.local/lib/python3.9/site-packages/pandas/core/indexing.py", line 692, in __setitem__
iloc._setitem_with_indexer(indexer, value, self.name)
File "/home/ftuser/.local/lib/python3.9/site-packages/pandas/core/indexing.py", line 1597, in _setitem_with_indexer
self.obj[key] = value
File "/home/ftuser/.local/lib/python3.9/site-packages/pandas/core/frame.py", line 3163, in __setitem__
self._set_item(key, value)
File "/home/ftuser/.local/lib/python3.9/site-packages/pandas/core/frame.py", line 3242, in _set_item
value = self._sanitize_column(key, value)
File "/home/ftuser/.local/lib/python3.9/site-packages/pandas/core/frame.py", line 3899, in _sanitize_column
value = sanitize_index(value, self.index)
File "/home/ftuser/.local/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 751, in sanitize_index
raise ValueError(
ValueError: Length of values (1) does not match length of index (12797)
ERROR: 1
So if both data frames have exactly the same amount of rows and the indexes are exactly the same, why am I getting this error?
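There is a typo: the trailing comma turns the right-hand side into a one-element tuple, which is why pandas reports a length of 1. Remove the comma:
dataframe.loc[:, 'st_12'] = st12['ST']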
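To see where the length of 1 comes from, here is a minimal sketch with small made-up frames (the values are illustrative, not the question's candle data):

```python
import pandas as pd

# Illustrative stand-ins for the question's dataframe and supertrend result.
dataframe = pd.DataFrame({"close": [1.0, 2.0, 3.0]})
st12 = pd.DataFrame({"ST": [0.0, 1.5, 2.5]})

# A trailing comma builds a tuple that contains the Series as its only element.
value = st12["ST"],
print(type(value).__name__, len(value))  # tuple 1

# Without the comma the Series itself is assigned and aligns on the index.
dataframe.loc[:, "st_12"] = st12["ST"]
print(dataframe)
```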

Group by multiple columns and convert the result to a DataFrame/array

Hi, I have a DataFrame like this:
Value day hour min
Time
2015-12-19 10:08:52 1805 2015-12-19 10 8
2015-12-19 10:09:52 1794 2015-12-19 10 9
2015-12-19 10:19:51 1796 2015-12-19 10 19
2015-12-19 10:20:51 1806 2015-12-19 10 20
2015-12-19 10:29:52 1802 2015-12-19 10 29
2015-12-19 10:30:52 1800 2015-12-19 10 30
2015-12-19 10:40:51 1804 2015-12-19 10 40
2015-12-19 10:41:51 1798 2015-12-19 10 41
2015-12-19 10:50:51 1790 2015-12-19 10 50
2015-12-19 10:51:52 1811 2015-12-19 10 51
2015-12-19 11:00:51 1803 2015-12-19 11 0
2015-12-19 11:01:52 1784 2015-12-19 11 1
... ... ... ... ...
2016-07-15 17:30:13 1811 2016-07-15 17 30
2016-07-15 17:31:13 1787 2016-07-15 17 31
2016-07-15 17:41:13 1800 2016-07-15 17 41
2016-07-15 17:42:13 1795 2016-07-15 17 42
I want to group it by day and hour, and finally turn the "Value" column into a multi-dimensional array.
Grouping by day and hour, I need something like this for each hour:
2015-12-19 10 [1805, 1794, 1796, 1806, 1802, 1800, 1804, 179... ]
2015-12-20 11 [1803, 1793, 1795, 1801, 1796, 1796, 1788, 180... ]
...
2016-07-15 17 [1794, 1792, 1788, 1799, 1811, 1803, 1808, 179... ]
In the end, I would like to have a DataFrame like this:
Time_index hour value1 value2 value3 ........value20
2015-12-19 10 1805, 1794, 1796, 1806 ... 1804, 1791, 1788, 1812
2015-12-20 11 1803, 1793, 1795, 1801 ... 1796, 1796, 1788, 1800
...
2016-07-15 17 1794, 1792, 1788, 1799 ... 1811, 1803, 1808, 1790
Or an array like this:
[[1805, 1794, 1796, 1806, 1802, 1800, 1804, 179... ],[1803, 1793, 1795, 1801, 1796, 1796, 1788, 180... ]....[1794, 1792, 1788, 1799, 1811, 1803, 1808, 179... ]]
I was able to get groupby working with one column:
grouped_0 = train_df.groupby(['day'])
grouped = grouped_0.aggregate(lambda x: list(x))
grouped['grouped'] = grouped['Value']
The 'grouped' column of the resulting DataFrame looks like this:
2015-12-19 [1805, 1794, 1796, 1806, 1802, 1800, 1804, 179...
2015-12-20 [1790, 1809, 1809, 1789, 1807, 1804, 1790, 179...
2015-12-21 [1794, 1792, 1788, 1799, 1811, 1803, 1808, 179...
2015-12-22 [1815, 1812, 1798, 1808, 1802, 1788, 1808, 179...
2015-12-23 [1803, 1800, 1799, 1803, 1802, 1804, 1788, 179...
2015-12-24 [1803, 1795, 1801, 1798, 1799, 1802, 1799, 179...
However, when I tried this:
grouped_0 = train_df.groupby(['day', 'hour'])
grouped = grouped_0.aggregate(lambda x: list(x))
grouped['grouped'] = grouped['Value']
it threw this error:
Traceback (most recent call last):
File "<input>", line 3, in <module>
File "C:\Apps\Continuum\Anaconda2\envs\python36\lib\site-packages\pandas\core\groupby.py", line 4036, in aggregate
return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
File "C:\Apps\Continuum\Anaconda2\envs\python36\lib\site-packages\pandas\core\groupby.py", line 3476, in aggregate
return self._python_agg_general(arg, *args, **kwargs)
File "C:\Apps\Continuum\Anaconda2\envs\python36\lib\site-packages\pandas\core\groupby.py", line 848, in _python_agg_general
result, counts = self.grouper.agg_series(obj, f)
File "C:\Apps\Continuum\Anaconda2\envs\python36\lib\site-packages\pandas\core\groupby.py", line 2180, in agg_series
return self._aggregate_series_pure_python(obj, func)
File "C:\Apps\Continuum\Anaconda2\envs\python36\lib\site-packages\pandas\core\groupby.py", line 2215, in _aggregate_series_pure_python
raise ValueError('Function does not reduce')
ValueError: Function does not reduce
My pandas version:
pd.__version__
'0.20.3'
Yes, using agg for this isn't the best idea: agg expects the function to reduce each group to a single value, so a list result is not considered valid here.
You can use groupby + apply for this (df below stands for your train_df).
g = df.groupby(['day', 'hour']).Value.apply(lambda x: x.values.tolist())
g
day hour
2015-12-19 10 [1805, 1794, 1796, 1806, 1802, 1800, 1804, 179...
11 [1803, 1784]
2016-07-15 17 [1811, 1787, 1800, 1795]
Name: Value, dtype: object
If you want each element in its own column, you'd do it like this:
v = pd.DataFrame(g.values.tolist(), index=g.index)\
.rename(columns=lambda x: 'value{}'.format(x + 1)).reset_index()
v is your final result.
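A small self-contained sketch of the same approach, using a few made-up rows in the question's layout. It also builds the nested-list form the question asks about. Hours with fewer readings are padded with NaN when spread across columns:

```python
import pandas as pd

# Made-up sample in the question's layout: three readings at hour 10, two at hour 11.
df = pd.DataFrame({
    "Value": [1805, 1794, 1796, 1803, 1784],
    "day": ["2015-12-19"] * 5,
    "hour": [10, 10, 10, 11, 11],
})

# One list of values per (day, hour) group.
g = df.groupby(["day", "hour"])["Value"].apply(list)

# Nested-list form: one inner list per group.
nested = g.tolist()

# Wide form: one column per position; shorter groups are padded with NaN.
wide = (
    pd.DataFrame(nested, index=g.index)
    .rename(columns=lambda i: "value{}".format(i + 1))
    .reset_index()
)
print(nested)
print(wide)
```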

Error finding the minimum of the last column of a pandas DataFrame in Python

I'm using read_csv() to read data from an external .csv file, and that works fine. But whenever I try to find the minimum of the last column of that DataFrame using np.min(...), I get an error. Interestingly, the same procedure works for all the other columns of the DataFrame.
I'm attaching the code here.
import numpy as np
import pandas as pd
import os
data = pd.read_csv("test_data_v4.csv", sep = ",")
print(data)
The output looks like this:
LINK_CAPACITY_KBPS THROUGHPUT_KBPS HOP_COUNT PACKET_LOSS JITTER_MS \
0 25 15.0 50 0.25 20
1 20 10.5 70 0.45 3
2 17 12.0 49 0.75 7
3 18 11.0 65 0.30 11
4 14 14.0 55 0.50 33
5 15 8.0 62 0.25 31
RSSI
0 -30
1 -11
2 -26
3 -39
4 -25
5 -65
np.min(data['RSSI'])
This raises the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/koushik_k/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py", line 1914, in __getitem__
return self._getitem_column(key)
File "/home/koushik_k/anaconda3/lib/python3.5/site-packages/pandas/core/frame.py", line 1921, in _getitem_column
return self._get_item_cache(key)
File "/home/koushik_k/anaconda3/lib/python3.5/site-packages/pandas/core/generic.py", line 1090, in _get_item_cache
values = self._data.get(item)
File "/home/koushik_k/anaconda3/lib/python3.5/site-packages/pandas/core/internals.py", line 3102, in get
loc = self.items.get_loc(item)
File "/home/koushik_k/anaconda3/lib/python3.5/site-packages/pandas/core/index.py", line 1692, in get_loc
return self._engine.get_loc(_values_from_object(key))
File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas/index.c:3979)
File "pandas/index.pyx", line 157, in pandas.index.IndexEngine.get_loc (pandas/index.c:3843)
File "pandas/hashtable.pyx", line 668, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12265)
File "pandas/hashtable.pyx", line 676, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12216)
KeyError: 'RSSI'
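Following on DSM's comment: the KeyError suggests the column name in the file is not exactly 'RSSI', most likely because of stray whitespace in the header. Try stripping the column names:
data.columns = data.columns.str.strip()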
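A minimal sketch of that situation, assuming the header of the last column carries a trailing space (the in-memory CSV below is made up for illustration and is not the question's file):

```python
import io

import pandas as pd

# Hypothetical CSV whose last header is "RSSI " with a trailing space.
csv_text = "JITTER_MS,RSSI \n20,-30\n3,-11\n7,-26\n"
data = pd.read_csv(io.StringIO(csv_text))

# repr() makes the stray whitespace visible.
raw_name = data.columns[-1]
print(repr(raw_name))

# Strip whitespace from every column name, then the plain lookup works.
data.columns = data.columns.str.strip()
rssi_min = data["RSSI"].min()
print(rssi_min)
```

Printing repr() of the column names is a quick way to check whether whitespace is really the cause before changing anything.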

Plot a data frame

I have a data frame like this:
ReviewDate_month,ProductId,Reviewer
01,185,185
02,155,155
03,130,130
04,111,111
05,110,110
06,98,98
07,101,92
08,71,71
09,73,73
10,76,76
11,105,105
12,189,189
I want to plot it, ideally with ReviewDate_month on the X axis and ProductId and Reviewer on the Y axis, but I will start with a single line for either ProductId or Reviewer.
So I tried:
df_no_monthlycount.plot.line
which gave this error:
01 185 185
02 155 155
03 130 130
04 111 111
05 110 110
06 98 98
07 101 92
08 71 71
09 73 73
10 76 76
11 105 105
12 189 189
File "C:/Users/user/PycharmProjects/Assign2/Main.py", line 59, in <module>
df_no_monthlycount.plot.line
AttributeError: 'function' object has no attribute 'line'
Process finished with exit code 1
I also tried this:
df_no_monthlycount.plot(x=df_helful_monthlymean['ReviewDate_month'],y=df_helful_monthlymean['ProductId'],style='o')
This gave the following error:
Traceback (most recent call last):
File "C:/Users/user/PycharmProjects/Assign2/Main.py", line 52, in <module>
df_no_monthlycount.plot(x=df_helful_monthlymean['ReviewDate_month'],y=df_helful_monthlymean['ProductId'],style='o')
File "C:\Python34\lib\site-packages\pandas\core\frame.py", line 1797, in __getitem__
return self._getitem_column(key)
File "C:\Python34\lib\site-packages\pandas\core\frame.py", line 1804, in _getitem_column
return self._get_item_cache(key)
File "C:\Python34\lib\site-packages\pandas\core\generic.py", line 1084, in _get_item_cache
values = self._data.get(item)
File "C:\Python34\lib\site-packages\pandas\core\internals.py", line 2851, in get
loc = self.items.get_loc(item)
File "C:\Python34\lib\site-packages\pandas\core\index.py", line 1572, in get_loc
return self._engine.get_loc(_values_from_object(key))
File "pandas\index.pyx", line 134, in pandas.index.IndexEngine.get_loc (pandas\index.c:3838)
File "pandas\index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas\index.c:3718)
File "pandas\hashtable.pyx", line 686, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12294)
File "pandas\hashtable.pyx", line 694, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12245)
KeyError: 'ReviewDate_month'
Pass the column names to x and y rather than the Series themselves, and call plot like this:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data.csv')
print(df)
df.plot(x ='ReviewDate_month',y=['ProductId', 'Reviewer'] ,kind='line')
plt.show()
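This draws ProductId and Reviewer as two lines against ReviewDate_month.
If you want to plot ReviewDate_month on X and ProductId and Reviewer on Y, you can do it this way, using the column names exactly as they appear in the data:
df_no_monthlycount.plot(x='ReviewDate_month', y=['ProductId', 'Reviewer'])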
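The KeyError in the question is raised by the column lookup itself, before plot is called, so ReviewDate_month is apparently not a column of that frame. One possible cause, offered only as an assumption, is that the monthly frame came from a groupby, which moves the key into the index. A minimal sketch with made-up review rows (the reviews and monthly names are hypothetical):

```python
import pandas as pd

# Hypothetical raw reviews; the question's real data is not shown.
reviews = pd.DataFrame({
    "ReviewDate_month": ["01", "01", "02"],
    "ProductId": [10, 11, 12],
    "Reviewer": ["a", "b", "c"],
})

# A groupby aggregation leaves the grouping key in the index, not in the columns.
monthly = reviews.groupby("ReviewDate_month").count()
print("ReviewDate_month" in monthly.columns)  # False
print(monthly.index.name)  # ReviewDate_month

# reset_index() turns the key back into an ordinary column.
flat = monthly.reset_index()
print(list(flat.columns))
```

With the key restored as a column, plot(x='ReviewDate_month', y=['ProductId', 'Reviewer']) can look up all three names.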
