dataframe data transfer with selected values to another dataframe - python

My goal is to take the column Sabah from dataframe prdt and write each of its values into the repeating rows labelled Sabah in dataframe prcal.
prcal
Vakit Start_Date End_Date Start_Time End_Time
0 Sabah 2022-01-01 2022-01-01 NaN NaN
1 Güneş 2022-01-01 2022-01-01 NaN NaN
2 Öğle 2022-01-01 2022-01-01 NaN NaN
3 İkindi 2022-01-01 2022-01-01 NaN NaN
4 Akşam 2022-01-01 2022-01-01 NaN NaN
..........................................................
2184 Sabah 2022-12-31 2022-12-31 NaN NaN
2185 Güneş 2022-12-31 2022-12-31 NaN NaN
2186 Öğle 2022-12-31 2022-12-31 NaN NaN
2187 İkindi 2022-12-31 2022-12-31 NaN NaN
2188 Akşam 2022-12-31 2022-12-31 NaN NaN
2189 rows × 5 columns
prdt
Day Sabah Güneş Öğle İkindi Akşam Yatsı
0 2022-01-01 06:51:00 08:29:00 13:08:00 15:29:00 17:47:00 19:20:00
1 2022-01-02 06:51:00 08:29:00 13:09:00 15:30:00 17:48:00 19:21:00
2 2022-01-03 06:51:00 08:29:00 13:09:00 15:30:00 17:48:00 19:22:00
3 2022-01-04 06:51:00 08:29:00 13:09:00 15:31:00 17:49:00 19:22:00
4 2022-01-05 06:51:00 08:29:00 13:10:00 15:32:00 17:50:00 19:23:00
...........................................................................
360 2022-12-27 06:49:00 08:27:00 13:06:00 15:25:00 17:43:00 19:16:00
361 2022-12-28 06:50:00 08:28:00 13:06:00 15:26:00 17:43:00 19:17:00
362 2022-12-29 06:50:00 08:28:00 13:07:00 15:26:00 17:44:00 19:18:00
363 2022-12-30 06:50:00 08:28:00 13:07:00 15:27:00 17:45:00 19:18:00
364 2022-12-31 06:50:00 08:28:00 13:07:00 15:28:00 17:46:00 19:19:00
365 rows × 7 columns
I selected every row labelled Sabah with prcal.iloc[::6,:]
I made a list from prdt['Sabah'].
When I assign prcal.iloc[::6,:] = prdt['Sabah'][0:365], I get a ValueError:
ValueError: Must have equal len keys and value when setting with an iterable
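One likely cause is that prcal.iloc[::6,:] selects every column of those rows, so pandas tries to spread the 365 values across all five columns. Below is a minimal sketch of one way around it. It assumes the times belong in Start_Time and that the Sabah rows of prcal are in the same day order as prdt. The two small frames are made-up stand-ins for the real data.

```python
import pandas as pd

# Made-up miniature versions of the two frames from the question.
prdt = pd.DataFrame({
    "Day": pd.to_datetime(["2022-01-01", "2022-01-02"]),
    "Sabah": ["06:51:00", "06:52:00"],
    "Güneş": ["08:29:00", "08:30:00"],
})
prcal = pd.DataFrame({
    "Vakit": ["Sabah", "Güneş", "Sabah", "Güneş"],
    "Start_Date": pd.to_datetime(
        ["2022-01-01", "2022-01-01", "2022-01-02", "2022-01-02"]
    ),
    # object dtype so that time strings can be stored without upcasting
    "Start_Time": [None] * 4,
})

# Select the Sabah rows by label and target a single column.
mask = prcal["Vakit"] == "Sabah"

# to_numpy() drops the index of prdt, so values are assigned by position.
# Assumption: both frames list the days in the same order.
prcal.loc[mask, "Start_Time"] = prdt["Sabah"].to_numpy()

print(prcal)
```

Selecting by label instead of iloc[::6] also keeps working if a row is missing somewhere in prcal, as long as there is still one Sabah row per day.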

Related

Compare current row timestamp with previous row with condition in function python

I have a df sample with one of the columns named date_code, dtype datetime64[ns]:
date_code
2022-03-28
2022-03-29
2022-03-30
2022-03-31
2022-04-01
2022-04-07
2022-04-07
2022-04-08
2022-04-12
2022-04-12
2022-04-14
2022-04-14
2022-04-15
2022-04-16
2022-04-16
2022-04-17
2022-04-18
2022-04-19
2022-04-20
2022-04-20
2022-04-21
2022-04-22
2022-04-25
2022-04-25
2022-04-26
I would like to create a column based on some conditions comparing the current row with the previous one. I am trying to create a function like:
def start_date(row):
    if (row['date_code'] - row['date_code'].shift(-1)).days >1:
        val = row['date_code'].shift(-1)
    elif row['date_code'] == row['date_code'].shift(-1):
        val = row['date_code']
    else:
        val = np.nan()
    return val
But once I apply
sample['date_zero_recorded'] = sample.apply(start_date, axis=1)
I get the error:
AttributeError: 'Timestamp' object has no attribute 'shift'
How should I compare the current row with the previous one under these conditions?
Edited: expected output
if the current row is later than the previous one by 2 or more days, get the previous date
if the current row equals the previous one, get the current date
else, return NaN (including when the current row is exactly 1 day after the previous one)
date_code date_zero_recorded
2022-03-28 NaN
2022-03-29 NaN
2022-03-30 NaN
2022-03-31 NaN
2022-04-01 NaN
2022-04-07 2022-04-01
2022-04-07 2022-04-07
2022-04-08 NaN
2022-04-12 2022-04-08
2022-04-12 2022-04-12
2022-04-14 2022-04-12
2022-04-14 2022-04-14
2022-04-15 NaN
2022-04-16 NaN
2022-04-16 2022-04-16
2022-04-17 NaN
2022-04-18 NaN
2022-04-19 NaN
2022-04-20 NaN
2022-04-20 2022-04-20
2022-04-21 NaN
2022-04-22 NaN
2022-04-25 2022-04-22
2022-04-25 2022-04-25
2022-04-26 NaN
You shouldn't iterate row by row (iterrows, or apply with axis=1); use vectorized code instead.
For example:
sample['date_code'] = pd.to_datetime(sample['date_code'])
sample['date_zero_recorded'] = (
    sample['date_code'].shift()
    .where(sample['date_code'].diff().ne('1d'))
)
output:
date_code date_zero_recorded
0 2022-03-28 NaT
1 2022-03-29 NaT
2 2022-03-30 NaT
3 2022-03-31 NaT
4 2022-04-01 NaT
5 2022-04-07 2022-04-01
6 2022-04-07 2022-04-07
7 2022-04-08 NaT
8 2022-04-12 2022-04-08
9 2022-04-12 2022-04-12
10 2022-04-14 2022-04-12
11 2022-04-14 2022-04-14
12 2022-04-15 NaT
13 2022-04-16 NaT
14 2022-04-16 2022-04-16
15 2022-04-17 NaT
16 2022-04-18 NaT
17 2022-04-19 NaT
18 2022-04-20 NaT
19 2022-04-20 2022-04-20
20 2022-04-21 NaT
21 2022-04-22 NaT
22 2022-04-25 2022-04-22
23 2022-04-25 2022-04-25
24 2022-04-26 NaT
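If you prefer the three rules from the question spelled out one by one, here is a sketch of an equivalent formulation. The names prev and gap are introduced only for illustration, and the sample is a shortened slice of the question's data.

```python
import pandas as pd

# Shortened slice of the dates from the question.
sample = pd.DataFrame({"date_code": pd.to_datetime([
    "2022-03-31", "2022-04-01", "2022-04-07", "2022-04-07", "2022-04-08",
])})

prev = sample["date_code"].shift()   # previous row's date (NaT on the first row)
gap = sample["date_code"] - prev     # Timedelta between current and previous row

# Rule 2: gap of zero -> current date; everything else -> NaT.
same_day = sample["date_code"].where(gap == pd.Timedelta(0))

# Rule 1: gap of two or more days -> previous date; otherwise fall back to rule 2.
sample["date_zero_recorded"] = prev.where(gap >= pd.Timedelta(days=2), same_day)

print(sample)
```

Comparisons against NaT evaluate to False, so the first row falls through both rules and stays NaT.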

Group by column and resampled date and get rolling sum of other column

I have the following data:
(Pdb) df1 = pd.DataFrame({'id': ['SE0000195570','SE0000195570','SE0000195570','SE0000195570','SE0000191827','SE0000191827','SE0000191827','SE0000191827', 'SE0000191827'],'val': ['1','2','3','4','5','6','7','8', '9'],'date': pd.to_datetime(['2014-10-23','2014-07-16','2014-04-29','2014-01-31','2018-10-19','2018-07-11','2018-04-20','2018-02-16','2018-12-29'])})
(Pdb) df1
id val date
0 SE0000195570 1 2014-10-23
1 SE0000195570 2 2014-07-16
2 SE0000195570 3 2014-04-29
3 SE0000195570 4 2014-01-31
4 SE0000191827 5 2018-10-19
5 SE0000191827 6 2018-07-11
6 SE0000191827 7 2018-04-20
7 SE0000191827 8 2018-02-16
8 SE0000191827 9 2018-12-29
UPDATE:
Following the suggestions of @user3483203 I have gotten a bit further, but I'm not quite there. I've amended the example data above with a new row to illustrate the problem better.
(Pdb) df2.assign(calc=(df2.dropna()['val'].groupby(level=0).rolling(4).sum().shift(-3).reset_index(0, drop=True)))
id val date calc
id date
SE0000191827 2018-02-28 SE0000191827 8 2018-02-16 26.0
2018-03-31 NaN NaN NaT NaN
2018-04-30 SE0000191827 7 2018-04-20 27.0
2018-05-31 NaN NaN NaT NaN
2018-06-30 NaN NaN NaT NaN
2018-07-31 SE0000191827 6 2018-07-11 NaN
2018-08-31 NaN NaN NaT NaN
2018-09-30 NaN NaN NaT NaN
2018-10-31 SE0000191827 5 2018-10-19 NaN
2018-11-30 NaN NaN NaT NaN
2018-12-31 SE0000191827 9 2018-12-29 NaN
SE0000195570 2014-01-31 SE0000195570 4 2014-01-31 10.0
2014-02-28 NaN NaN NaT NaN
2014-03-31 NaN NaN NaT NaN
2014-04-30 SE0000195570 3 2014-04-29 NaN
2014-05-31 NaN NaN NaT NaN
2014-06-30 NaN NaN NaT NaN
2014-07-31 SE0000195570 2 2014-07-16 NaN
2014-08-31 NaN NaN NaT NaN
2014-09-30 NaN NaN NaT NaN
2014-10-31 SE0000195570 1 2014-10-23 NaN
For my requirements, the row (SE0000191827, 2018-03-31) should have a calc value since it has four consecutive rows with a value. Currently the row is being removed with the dropna call and I can't figure out how to solve that problem.
What I need
Calculations: The dates in my initial data are quarterly. However, I need to transform this data into monthly rows spanning the first to the last date of each id, and for each month calculate the sum of the four closest consecutive rows of the input data within that id. That's a mouthful, and it is what led me to resample. See the expected output below. I need the data to be grouped by both id and the monthly dates.
Performance: The data I'm testing on now is just for benchmarking, but the solution needs to be performant. I expect to run this on upwards of 100k unique ids, which may result in around 10 million rows (100k ids, dates going back up to 10 years, 10 years * 12 months = 120 months per id, 100k * 120 = 12 million rows).
What I've tried
(Pdb) res = df.groupby('id').resample('M',on='date')
(Pdb) res.first()
id val date
id date
SE0000191827 2018-02-28 SE0000191827 8 2018-02-16
2018-03-31 NaN NaN NaT
2018-04-30 SE0000191827 7 2018-04-20
2018-05-31 NaN NaN NaT
2018-06-30 NaN NaN NaT
2018-07-31 SE0000191827 6 2018-07-11
2018-08-31 NaN NaN NaT
2018-09-30 NaN NaN NaT
2018-10-31 SE0000191827 5 2018-10-19
SE0000195570 2014-01-31 SE0000195570 4 2014-01-31
2014-02-28 NaN NaN NaT
2014-03-31 NaN NaN NaT
2014-04-30 SE0000195570 3 2014-04-29
2014-05-31 NaN NaN NaT
2014-06-30 NaN NaN NaT
2014-07-31 SE0000195570 2 2014-07-16
2014-08-31 NaN NaN NaT
2014-09-30 NaN NaN NaT
2014-10-31 SE0000195570 1 2014-10-23
This data looks very nice for my case since it's grouped by id and has the dates lined up by month. It seems like I could use something like df['val'].rolling(4), make sure it skips NaN values, and put the result in a new column.
Expected output (new column calc):
id val date calc
id date
SE0000191827 2018-02-28 SE0000191827 8 2018-02-16 26
2018-03-31 NaN NaN NaT
2018-04-30 SE0000191827 7 2018-04-20 NaN
2018-05-31 NaN NaN NaT
2018-06-30 NaN NaN NaT
2018-07-31 SE0000191827 6 2018-07-11 NaN
2018-08-31 NaN NaN NaT
2018-09-30 NaN NaN NaT
2018-10-31 SE0000191827 5 2018-10-19 NaN
SE0000195570 2014-01-31 SE0000195570 4 2014-01-31 10
2014-02-28 NaN NaN NaT
2014-03-31 NaN NaN NaT
2014-04-30 SE0000195570 3 2014-04-29 NaN
2014-05-31 NaN NaN NaT
2014-06-30 NaN NaN NaT
2014-07-31 SE0000195570 2 2014-07-16 NaN
2014-08-31 NaN NaN NaT
2014-09-30 NaN NaN NaT
2014-10-31 SE0000195570 1 2014-10-23 NaN
2014-11-30 NaN NaN NaT
2014-12-31 SE0000195570 1 2014-10-23 NaN
Here the result in calc is 26 for the first date since it sums that row's value with the three values that follow it in date order (8+7+6+5). The rest for that id is NaN since four values are not available.
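The arithmetic described here (a row's value plus the next three values in date order within the same id) can be sketched on the original, unresampled rows. This is only an illustration of that sum under the assumption that val should be numeric. It does not cover the expansion to monthly rows, and the name forward_sum is made up for this example.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": ["SE0000195570"] * 4 + ["SE0000191827"] * 5,
    "val": ["1", "2", "3", "4", "5", "6", "7", "8", "9"],
    "date": pd.to_datetime([
        "2014-10-23", "2014-07-16", "2014-04-29", "2014-01-31",
        "2018-10-19", "2018-07-11", "2018-04-20", "2018-02-16", "2018-12-29",
    ]),
})

# val is stored as strings in the question; sums need numbers.
df1["val"] = pd.to_numeric(df1["val"])
df1 = df1.sort_values(["id", "date"])

# rolling() looks backwards, so reverse the rows first to get a
# forward-looking window of four rows within each id.
forward_sum = (
    df1.iloc[::-1]
    .groupby("id")["val"]
    .rolling(4)
    .sum()
    .reset_index(level=0, drop=True)   # drop the id level, keep the row labels
)

# Assignment aligns on the original row labels.
df1["calc"] = forward_sum
print(df1)
```

With the amended data this gives 26 for the 2018-02-16 row, 27 for 2018-04-20, 10 for 2014-01-31, and NaN where fewer than four rows remain.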
The problems
While it may look like the data is grouped by id and date, it seems like it's actually grouped by date. I'm not sure how this works. I need the data to be grouped by id and date.
(Pdb) res['val'].get_group(datetime.date(2018,2,28))
7 6.730000e+08
Name: val, dtype: object
The result of the resample above returns a DatetimeIndexResamplerGroupby which doesn't have rolling...
(Pdb) res['val'].rolling(4)
*** AttributeError: 'DatetimeIndexResamplerGroupby' object has no attribute 'rolling'
What to do? My guess is that my approach is wrong but after scouring the documentation I'm not sure where to start.

Select pandas dataframe rows between dates and set column value

In the dataframe below, I want to set row values in the column p50 to NaN if they are below 2.0 between the dates May 15th and August 15th 2018.
date p50
2018-03-02 2018-03-02 NaN
2018-03-03 2018-03-03 NaN
2018-03-04 2018-03-04 0.022590
2018-03-05 2018-03-05 NaN
2018-03-06 2018-03-06 -0.042227
2018-03-07 2018-03-07 NaN
2018-03-08 2018-03-08 NaN
2018-03-09 2018-03-09 -0.028646
2018-03-10 2018-03-10 NaN
2018-03-11 2018-03-11 -0.045244
2018-03-12 2018-03-12 NaN
2018-03-13 2018-03-13 NaN
2018-03-14 2018-03-14 -0.020590
2018-03-15 2018-03-15 NaN
2018-03-16 2018-03-16 -0.028317
2018-03-17 2018-03-17 NaN
2018-03-18 2018-03-18 NaN
2018-03-19 2018-03-19 NaN
2018-03-20 2018-03-20 NaN
2018-03-21 2018-03-21 NaN
2018-03-22 2018-03-22 NaN
2018-03-23 2018-03-23 NaN
2018-03-24 2018-03-24 -0.066800
2018-03-25 2018-03-25 NaN
2018-03-26 2018-03-26 -0.104135
2018-03-27 2018-03-27 NaN
2018-03-28 2018-03-28 NaN
2018-03-29 2018-03-29 -0.115200
2018-03-30 2018-03-30 NaN
2018-03-31 2018-03-31 -0.000455
... ...
2018-07-03 2018-07-03 NaN
2018-07-04 2018-07-04 2.313035
2018-07-05 2018-07-05 NaN
2018-07-06 2018-07-06 NaN
2018-07-07 2018-07-07 NaN
2018-07-08 2018-07-08 NaN
2018-07-09 2018-07-09 0.054513
2018-07-10 2018-07-10 NaN
2018-07-11 2018-07-11 NaN
2018-07-12 2018-07-12 3.711159
2018-07-13 2018-07-13 NaN
2018-07-14 2018-07-14 6.583810
2018-07-15 2018-07-15 NaN
2018-07-16 2018-07-16 NaN
2018-07-17 2018-07-17 0.070182
2018-07-18 2018-07-18 NaN
2018-07-19 2018-07-19 3.688812
2018-07-20 2018-07-20 NaN
2018-07-21 2018-07-21 NaN
2018-07-22 2018-07-22 0.876552
2018-07-23 2018-07-23 NaN
2018-07-24 2018-07-24 1.077895
2018-07-25 2018-07-25 NaN
2018-07-26 2018-07-26 NaN
2018-07-27 2018-07-27 3.802159
2018-07-28 2018-07-28 NaN
2018-07-29 2018-07-29 0.077402
2018-07-30 2018-07-30 NaN
2018-07-31 2018-07-31 NaN
2018-08-01 2018-08-01 3.202214
The dataframe has a datetime index. I do the following:
mask = (group['date'] > '2018-5-15') & (group['date'] <= '2018-8-15')
group[mask].loc[group[mask]['p50'] < 2.]['p50'] = np.NaN
However, this does not update the dataframe. How can I fix this?
I think you should use .loc with a single combined mask, like this:
mask = (group['date'] > '2018-5-15') & (group['date'] <= '2018-8-15')
group.loc[mask & (group['p50'] < 2), 'p50'] = np.nan
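Here is a small self-contained run of that idea, using a handful of rows copied from the question's data. The chained version in the question assigns into a temporary copy, whereas a single .loc call with a row mask and a column label writes into group itself.

```python
import numpy as np
import pandas as pd

# A few rows taken from the question, with the same datetime index.
dates = pd.to_datetime(["2018-03-04", "2018-07-04", "2018-07-09", "2018-07-12"])
group = pd.DataFrame(
    {"date": dates, "p50": [0.022590, 2.313035, 0.054513, 3.711159]},
    index=dates,
)

# Rows inside the date window.
mask = (group["date"] > "2018-05-15") & (group["date"] <= "2018-08-15")

# One .loc call: row mask plus column label, so the original frame is modified.
group.loc[mask & (group["p50"] < 2), "p50"] = np.nan

print(group)
```

Only the 2018-07-09 value is replaced; the March value is below 2 but lies outside the date window, so it is left alone.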

Resample python list with pandas

Fairly new to python and pandas here.
I make a query that's giving me back a timeseries. I'm never sure how many data points I receive from the query (run for a single day), but what I do know is that I need to resample them to contain 24 points (one for each hour in the day).
Printing m3hstream gives
[(1479218009000L, 109), (1479287368000L, 84)]
Then I try to make a dataframe df with
df = pd.DataFrame(data = list(m3hstream), columns=['Timestamp', 'Value'])
and this gives me an output of
Timestamp Value
0 1479218009000 109
1 1479287368000 84
Next I do this:
daily_summary = pd.DataFrame()
daily_summary['value'] = df['Value'].resample('H').mean()
daily_summary = daily_summary.truncate(before=start, after=end)
print "Now daily summary"
print daily_summary
But this is giving me a TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'
Could anyone please let me know how to resample it so I have 1 point for each hour in the 24 hour period that I'm querying for?
Thanks.
The first thing you need to do is convert that 'Timestamp' column to actual pd.Timestamp values. It looks like those are milliseconds.
Then resample with the on parameter set to 'Timestamp':
df = df.assign(
    Timestamp=pd.to_datetime(df.Timestamp, unit='ms')
).resample('H', on='Timestamp').mean().reset_index()
Timestamp Value
0 2016-11-15 13:00:00 109.0
1 2016-11-15 14:00:00 NaN
2 2016-11-15 15:00:00 NaN
3 2016-11-15 16:00:00 NaN
4 2016-11-15 17:00:00 NaN
5 2016-11-15 18:00:00 NaN
6 2016-11-15 19:00:00 NaN
7 2016-11-15 20:00:00 NaN
8 2016-11-15 21:00:00 NaN
9 2016-11-15 22:00:00 NaN
10 2016-11-15 23:00:00 NaN
11 2016-11-16 00:00:00 NaN
12 2016-11-16 01:00:00 NaN
13 2016-11-16 02:00:00 NaN
14 2016-11-16 03:00:00 NaN
15 2016-11-16 04:00:00 NaN
16 2016-11-16 05:00:00 NaN
17 2016-11-16 06:00:00 NaN
18 2016-11-16 07:00:00 NaN
19 2016-11-16 08:00:00 NaN
20 2016-11-16 09:00:00 84.0
If you want to fill those NaN values, use ffill, bfill, or interpolate:
df.assign(
    Timestamp=pd.to_datetime(df.Timestamp, unit='ms')
).resample('H', on='Timestamp').mean().reset_index().interpolate()
Timestamp Value
0 2016-11-15 13:00:00 109.00
1 2016-11-15 14:00:00 107.75
2 2016-11-15 15:00:00 106.50
3 2016-11-15 16:00:00 105.25
4 2016-11-15 17:00:00 104.00
5 2016-11-15 18:00:00 102.75
6 2016-11-15 19:00:00 101.50
7 2016-11-15 20:00:00 100.25
8 2016-11-15 21:00:00 99.00
9 2016-11-15 22:00:00 97.75
10 2016-11-15 23:00:00 96.50
11 2016-11-16 00:00:00 95.25
12 2016-11-16 01:00:00 94.00
13 2016-11-16 02:00:00 92.75
14 2016-11-16 03:00:00 91.50
15 2016-11-16 04:00:00 90.25
16 2016-11-16 05:00:00 89.00
17 2016-11-16 06:00:00 87.75
18 2016-11-16 07:00:00 86.50
19 2016-11-16 08:00:00 85.25
20 2016-11-16 09:00:00 84.00
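The resampled frame above only spans the hours between the first and last data point. If you need exactly 24 rows for the queried day, one option is to reindex onto a fixed hourly range. This is a sketch under assumptions: day_start is a made-up variable standing for midnight of the day you queried, the timestamps are treated as timezone-naive, and the lowercase alias 'h' is used because newer pandas releases prefer it over 'H'.

```python
import pandas as pd

# The two points from the question (Python 3 has no L suffix on integers).
m3hstream = [(1479218009000, 109), (1479287368000, 84)]

df = pd.DataFrame(m3hstream, columns=["Timestamp", "Value"])
df["Timestamp"] = pd.to_datetime(df["Timestamp"], unit="ms")

# Hourly means over the span covered by the data.
hourly = df.set_index("Timestamp")["Value"].resample("h").mean()

# Assumption: the query was for this calendar day.
day_start = pd.Timestamp("2016-11-15")
full_day = pd.date_range(day_start, periods=24, freq="h")

# Exactly one row per hour of the day; hours without data are NaN.
daily_summary = hourly.reindex(full_day)

print(daily_summary)
```

You can chain ffill, bfill, or interpolate after the reindex in the same way as shown above if the gaps should be filled.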
Let's try:
daily_summary = daily_summary.set_index('Timestamp')
daily_summary.index = pd.to_datetime(daily_summary.index, unit='ms')
For once an hour:
daily_summary.resample('H').mean()
or for once a day:
daily_summary.resample('D').mean()

AttributeError: Unknown property color_cycle

I am learning pandas and trying to plot the id column, but I get the error AttributeError: Unknown property color_cycle and an empty graph. The empty graph only appears in the interactive shell. When I run the same code as a script, I get the same error but no graph appears at all.
Below is the log:
>>> import pandas as pd
>>> pd.set_option('display.mpl_style', 'default')
>>> df = pd.read_csv('2015.csv', parse_dates=['log_date'])
>>> employee_198 = df[df['employee_id'] == 198]
>>> print(employee_198)
id version company_id early_minutes employee_id late_minutes \
90724 91635 0 1 NaN 198 NaN
90725 91636 0 1 NaN 198 0:20:00
90726 91637 0 1 0:20:00 198 NaN
90727 91638 0 1 0:05:00 198 NaN
90728 91639 0 1 0:25:00 198 NaN
90729 91640 0 1 0:15:00 198 0:20:00
90730 91641 0 1 NaN 198 0:15:00
90731 91642 0 1 NaN 198 NaN
90732 91643 0 1 NaN 198 NaN
90733 91644 0 1 NaN 198 NaN
90734 91645 0 1 NaN 198 NaN
90735 91646 0 1 NaN 198 NaN
90736 91647 0 1 NaN 198 NaN
90737 91648 0 1 NaN 198 NaN
90738 91649 0 1 NaN 198 NaN
90739 91650 0 1 NaN 198 0:10:00
90740 91651 0 1 NaN 198 NaN
90741 91652 0 1 NaN 198 NaN
90742 91653 0 1 NaN 198 NaN
90743 91654 0 1 NaN 198 NaN
90744 91655 0 1 NaN 198 NaN
90745 91656 0 1 NaN 198 NaN
90746 91657 0 1 1:30:00 198 NaN
90747 91658 0 1 0:04:25 198 NaN
90748 91659 0 1 NaN 198 NaN
90749 91660 0 1 NaN 198 NaN
90750 91661 0 1 NaN 198 NaN
90751 91662 0 1 NaN 198 NaN
90752 91663 0 1 NaN 198 NaN
90753 91664 0 1 NaN 198 NaN
90897 91808 0 1 NaN 198 0:04:14
91024 91935 0 1 NaN 198 0:21:43
91151 92062 0 1 NaN 198 0:42:07
91278 92189 0 1 NaN 198 0:16:36
91500 92411 0 1 NaN 198 0:07:12
91532 92443 0 1 NaN 198 NaN
91659 92570 0 1 NaN 198 0:53:03
91786 92697 0 1 NaN 198 NaN
91913 92824 0 1 NaN 198 NaN
92040 92951 0 1 NaN 198 NaN
92121 93032 0 1 4:22:35 198 NaN
92420 93331 0 1 NaN 198 NaN
92421 93332 0 1 NaN 198 3:51:15
log_date log_in_time log_out_time over_time remarks \
90724 2015-11-15 No In No Out NaN [Absent]
90725 2015-10-18 10:00:00 17:40:00 NaN NaN
90726 2015-10-19 9:20:00 17:10:00 NaN NaN
90727 2015-10-25 9:30:00 17:25:00 NaN NaN
90728 2015-10-26 9:34:00 17:05:00 NaN NaN
90729 2015-10-27 10:00:00 17:15:00 NaN NaN
90730 2015-10-28 9:55:00 17:30:00 NaN NaN
90731 2015-10-29 9:40:00 17:30:00 NaN NaN
90732 2015-10-30 9:00:00 17:30:00 0:30:00 NaN
90733 2015-10-20 No In No Out NaN [Absent]
90734 2015-10-21 No In No Out NaN [Maha Asthami]
90735 2015-10-22 No In No Out NaN [Nawami/Dashami]
90736 2015-10-23 No In No Out NaN [Absent]
90737 2015-10-24 No In No Out NaN [Off]
90738 2015-11-01 9:15:00 17:30:00 0:15:00 NaN
90739 2015-11-02 9:50:00 17:30:00 NaN NaN
90740 2015-11-03 9:30:00 17:30:00 NaN NaN
90741 2015-11-04 9:40:00 17:30:00 NaN NaN
90742 2015-11-05 9:38:00 17:30:00 NaN NaN
90743 2015-11-06 9:30:00 17:30:00 NaN NaN
90744 2015-11-08 9:30:00 17:30:00 NaN NaN
90745 2015-11-09 9:30:00 17:30:00 NaN NaN
90746 2015-11-10 9:30:00 16:00:00 NaN NaN
90747 2015-11-16 9:30:00 17:25:35 NaN NaN
90748 2015-11-07 No In No Out NaN [Off]
90749 2015-11-11 No In No Out NaN [Laxmi Puja]
90750 2015-11-12 No In No Out NaN [Govardhan Puja]
90751 2015-11-13 No In No Out NaN [Bhai Tika]
90752 2015-11-14 No In No Out NaN [Off]
90753 2015-10-31 No In No Out NaN [Off]
90897 2015-11-17 9:44:14 17:35:01 NaN NaN
91024 2015-11-18 10:01:43 17:36:29 NaN NaN
91151 2015-11-19 10:22:07 17:43:47 NaN NaN
91278 2015-11-20 9:56:36 17:37:00 NaN NaN
91500 2015-11-22 9:47:12 17:46:44 NaN NaN
91532 2015-11-21 No In No Out NaN [Off]
91659 2015-11-23 10:33:03 17:30:00 NaN NaN
91786 2015-11-24 9:34:11 17:32:24 NaN NaN
91913 2015-11-25 9:36:05 17:35:00 NaN NaN
92040 2015-11-26 9:35:39 17:58:05 0:22:26 NaN
92121 2015-11-27 9:08:45 13:07:25 NaN NaN
92420 2015-11-28 No In No Out NaN [Off]
92421 2015-11-29 13:31:15 17:34:44 NaN NaN
shift_in_time shift_out_time work_time under_time
90724 9:30:00 17:30:00 NaN NaN
90725 9:30:00 17:30:00 7:40:00 0:20:00
90726 9:30:00 17:30:00 7:50:00 0:10:00
90727 9:30:00 17:30:00 7:55:00 0:05:00
90728 9:30:00 17:30:00 7:31:00 0:29:00
90729 9:30:00 17:30:00 7:15:00 0:45:00
90730 9:30:00 17:30:00 7:35:00 0:25:00
90731 9:30:00 17:30:00 7:50:00 0:10:00
90732 9:30:00 17:30:00 8:30:00 NaN
90733 9:30:00 17:30:00 NaN NaN
90734 9:30:00 17:30:00 NaN NaN
90735 9:30:00 17:30:00 NaN NaN
90736 9:30:00 17:30:00 NaN NaN
90737 9:30:00 17:30:00 NaN NaN
90738 9:30:00 17:30:00 8:15:00 NaN
90739 9:30:00 17:30:00 7:40:00 0:20:00
90740 9:30:00 17:30:00 8:00:00 NaN
90741 9:30:00 17:30:00 7:50:00 0:10:00
90742 9:30:00 17:30:00 7:52:00 0:08:00
90743 9:30:00 17:30:00 8:00:00 NaN
90744 9:30:00 17:30:00 8:00:00 NaN
90745 9:30:00 17:30:00 8:00:00 NaN
90746 9:30:00 17:30:00 6:30:00 1:30:00
90747 9:30:00 17:30:00 7:55:35 0:04:25
90748 9:30:00 17:30:00 NaN NaN
90749 9:30:00 17:30:00 NaN NaN
90750 9:30:00 17:30:00 NaN NaN
90751 9:30:00 17:30:00 NaN NaN
90752 9:30:00 17:30:00 NaN NaN
90753 9:30:00 17:30:00 NaN NaN
90897 9:30:00 17:30:00 7:50:47 0:09:13
91024 9:30:00 17:30:00 7:34:46 0:25:14
91151 9:30:00 17:30:00 7:21:40 0:38:20
91278 9:30:00 17:30:00 7:40:24 0:19:36
91500 9:30:00 17:30:00 7:59:32 0:00:28
91532 9:30:00 17:30:00 NaN NaN
91659 9:30:00 17:30:00 6:56:57 1:03:03
91786 9:30:00 17:30:00 7:58:13 0:01:47
91913 9:30:00 17:30:00 7:58:55 0:01:05
92040 9:30:00 17:30:00 8:22:26 NaN
92121 9:30:00 17:30:00 3:58:40 4:01:20
92420 9:30:00 17:30:00 NaN NaN
92421 9:30:00 17:30:00 4:03:29 3:56:31
>>> employee_198['id'].plot()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-packages\pandas\tools\plotting.py", line 3497, in __call__
**kwds)
File "C:\Python27\lib\site-packages\pandas\tools\plotting.py", line 2587, in plot_series
**kwds)
File "C:\Python27\lib\site-packages\pandas\tools\plotting.py", line 2384, in _plot
plot_obj.generate()
File "C:\Python27\lib\site-packages\pandas\tools\plotting.py", line 987, in generate
self._make_plot()
File "C:\Python27\lib\site-packages\pandas\tools\plotting.py", line 1664, in _make_plot
**kwds)
File "C:\Python27\lib\site-packages\pandas\tools\plotting.py", line 1678, in _plot
lines = MPLPlot._plot(ax, x, y_values, style=style, **kwds)
File "C:\Python27\lib\site-packages\pandas\tools\plotting.py", line 1300, in _plot
return ax.plot(*args, **kwds)
File "C:\Python27\lib\site-packages\matplotlib\__init__.py", line 1811, in inner
return func(ax, *args, **kwargs)
File "C:\Python27\lib\site-packages\matplotlib\axes\_axes.py", line 1427, in plot
for line in self._get_lines(*args, **kwargs):
File "C:\Python27\lib\site-packages\matplotlib\axes\_base.py", line 386, in _grab_next_args
for seg in self._plot_args(remaining, kwargs):
File "C:\Python27\lib\site-packages\matplotlib\axes\_base.py", line 374, in _plot_args
seg = func(x[:, j % ncx], y[:, j % ncy], kw, kwargs)
File "C:\Python27\lib\site-packages\matplotlib\axes\_base.py", line 280, in _makeline
seg = mlines.Line2D(x, y, **kw)
File "C:\Python27\lib\site-packages\matplotlib\lines.py", line 366, in __init__
self.update(kwargs)
File "C:\Python27\lib\site-packages\matplotlib\artist.py", line 856, in update
raise AttributeError('Unknown property %s' % k)
AttributeError: Unknown property color_cycle
>>>
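There's currently a bug in pandas 0.17.1 when it is used with Matplotlib 1.5.0. You can check which versions you have installed:
import pandas
import matplotlib
print(pandas.__version__)
print(matplotlib.__version__)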
Instead of using
import pandas as pd
pd.set_option('display.mpl_style', 'default')
Use:
import matplotlib
matplotlib.style.use('ggplot')
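For reference, a minimal script along those lines might look like the sketch below. The short Series is a made-up stand-in for employee_198['id'], the Agg backend is chosen here only so the script also runs without a display, and the output file name is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend, assumed for scripts
import matplotlib.pyplot as plt
import pandas as pd

# Set the style through matplotlib instead of the pandas display option.
plt.style.use("ggplot")

# Made-up stand-in for employee_198['id'].
ids = pd.Series([91635, 91636, 91637], name="id")

ax = ids.plot()
ax.figure.savefig("employee_198_id.png")   # hypothetical output file name
```

When running interactively you would drop the Agg line and call plt.show() instead of saving the figure.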
