Dates seem to be a tricky thing in Python, and I am having a lot of trouble simply stripping the date out of a pandas Timestamp. I would like to get from 2013-09-29 02:34:44 to simply 09-29-2013.
I have a dataframe with a column Created_date:
Name: Created_Date, Length: 1162549, dtype: datetime64[ns]
I have tried applying the .date() method on this Series, e.g. df.Created_Date.date(), but I get the error AttributeError: 'Series' object has no attribute 'date'.
Can someone help me out?
map over the elements:
In [239]: from operator import methodcaller
In [240]: s = Series(date_range(Timestamp('now'), periods=2))
In [241]: s
Out[241]:
0 2013-10-01 00:24:16
1 2013-10-02 00:24:16
dtype: datetime64[ns]
In [238]: s.map(lambda x: x.strftime('%d-%m-%Y'))
Out[238]:
0 01-10-2013
1 02-10-2013
dtype: object
In [242]: s.map(methodcaller('strftime', '%d-%m-%Y'))
Out[242]:
0 01-10-2013
1 02-10-2013
dtype: object
You can get the raw datetime.date objects by calling the date() method of the Timestamp elements that make up the Series:
In [249]: s.map(methodcaller('date'))
Out[249]:
0 2013-10-01
1 2013-10-02
dtype: object
In [250]: s.map(methodcaller('date')).values
Out[250]:
array([datetime.date(2013, 10, 1), datetime.date(2013, 10, 2)], dtype=object)
Yet another way you can do this is by calling the unbound Timestamp.date method:
In [273]: s.map(Timestamp.date)
Out[273]:
0 2013-10-01
1 2013-10-02
dtype: object
This method is the fastest, and IMHO the most readable. Timestamp is accessible in the top-level pandas module, like so: pandas.Timestamp. I've imported it directly for expository purposes.
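For reference, the same call as a tiny self-contained snippet (just restating the session above with explicit imports; nothing new here):
import pandas as pd
s = pd.Series(pd.date_range('2013-10-01', periods=2))
dates = s.map(pd.Timestamp.date)  # object-dtype Series of datetime.date values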
The date attribute of DatetimeIndex objects does something similar, but returns a numpy object array instead:
In [243]: index = DatetimeIndex(s)
In [244]: index
Out[244]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-10-01 00:24:16, 2013-10-02 00:24:16]
Length: 2, Freq: None, Timezone: None
In [246]: index.date
Out[246]:
array([datetime.date(2013, 10, 1), datetime.date(2013, 10, 2)], dtype=object)
For larger datetime64[ns] Series objects, calling Timestamp.date is faster than operator.methodcaller which is slightly faster than a lambda:
In [263]: f = methodcaller('date')
In [264]: flam = lambda x: x.date()
In [265]: fmeth = Timestamp.date
In [266]: s2 = Series(date_range('20010101', periods=1000000, freq='T'))
In [267]: s2
Out[267]:
0 2001-01-01 00:00:00
1 2001-01-01 00:01:00
2 2001-01-01 00:02:00
3 2001-01-01 00:03:00
4 2001-01-01 00:04:00
5 2001-01-01 00:05:00
6 2001-01-01 00:06:00
7 2001-01-01 00:07:00
8 2001-01-01 00:08:00
9 2001-01-01 00:09:00
10 2001-01-01 00:10:00
11 2001-01-01 00:11:00
12 2001-01-01 00:12:00
13 2001-01-01 00:13:00
14 2001-01-01 00:14:00
...
999985 2002-11-26 10:25:00
999986 2002-11-26 10:26:00
999987 2002-11-26 10:27:00
999988 2002-11-26 10:28:00
999989 2002-11-26 10:29:00
999990 2002-11-26 10:30:00
999991 2002-11-26 10:31:00
999992 2002-11-26 10:32:00
999993 2002-11-26 10:33:00
999994 2002-11-26 10:34:00
999995 2002-11-26 10:35:00
999996 2002-11-26 10:36:00
999997 2002-11-26 10:37:00
999998 2002-11-26 10:38:00
999999 2002-11-26 10:39:00
Length: 1000000, dtype: datetime64[ns]
In [269]: timeit s2.map(f)
1 loops, best of 3: 1.04 s per loop
In [270]: timeit s2.map(flam)
1 loops, best of 3: 1.1 s per loop
In [271]: timeit s2.map(fmeth)
1 loops, best of 3: 968 ms per loop
Keep in mind that one of the goals of pandas is to provide a layer on top of numpy so that (most of the time) you don't have to deal with the low level details of the ndarray. So getting the raw datetime.date objects in an array is of limited use since they don't correspond to any numpy.dtype that is supported by pandas (pandas only supports datetime64[ns] [that's nanoseconds] dtypes). That said, sometimes you need to do this.
Maybe this only came in recently, but there are built-in methods for this. Try:
In [27]: s = pd.Series(pd.date_range(pd.Timestamp('now'), periods=2))
In [28]: s
Out[28]:
0 2016-02-11 19:11:43.386016
1 2016-02-12 19:11:43.386016
dtype: datetime64[ns]
In [29]: s.dt.to_pydatetime()
Out[29]:
array([datetime.datetime(2016, 2, 11, 19, 11, 43, 386016),
datetime.datetime(2016, 2, 12, 19, 11, 43, 386016)], dtype=object)
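And if what you actually want is the formatted string from the original question, the same .dt accessor also exposes strftime (a side note of mine; this assumes a reasonably recent pandas where Series.dt.strftime exists):
s.dt.strftime('%m-%d-%Y')  # object-dtype Series of strings like '02-11-2016'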
You can try using .dt.date on a datetime64[ns] column of the dataframe.
For example: df['Created_date'] = df['Created_date'].dt.date
Input dataframe, named test_df:
print(test_df)
Result:
Created_date
0 2015-03-04 15:39:16
1 2015-03-22 17:36:49
2 2015-03-25 22:08:45
3 2015-03-16 13:45:20
4 2015-03-19 18:53:50
Checking dtypes:
print(test_df.dtypes)
Result:
Created_date datetime64[ns]
dtype: object
Extracting date and updating Created_date column:
test_df['Created_date'] = test_df['Created_date'].dt.date
print(test_df)
Result:
Created_date
0 2015-03-04
1 2015-03-22
2 2015-03-25
3 2015-03-16
4 2015-03-19
Well, I would do it this way:
pdTime = pd.date_range(timeStamp, periods=len(years), freq="D")
pdTime[i].strftime('%m-%d-%Y')
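A self-contained sketch of the same idea (timeStamp, years and i in the snippet above come from the answer's own context; the concrete values below are just for illustration):
import pandas as pd
pdTime = pd.date_range('2013-09-29', periods=3, freq='D')
pdTime[0].strftime('%m-%d-%Y')   # '09-29-2013', one element at a time
pdTime.strftime('%m-%d-%Y')      # in newer pandas, the whole index at once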
Related
I have a data frame created with Pandas. It has 3 columns. One of them has the date in the format %Y%m%d%H. I need to find the rows that match a date with the format %Y%m%d.
I tried
df.loc[df["MESS_DATUM"] == 20170807]
which doesn't work. Only when I do
df.loc[df["MESS_DATUM"] == 2017080723]
it works for that single line. But I need all the rows that match the date alone (without the hour). I know there is something like .str.contains(""). Is there something similar for numeric values, or a way to use wildcards in the lines above?
We can "integer divide" MESS_DATUM column by 100:
df.loc[df["MESS_DATUM"]//100 == 20170807]
Demo:
In [29]: df
Out[29]:
MESS_DATUM
0 2017080719
1 2017080720
2 2017080721
3 2017080722
4 2017080723
In [30]: df.dtypes
Out[30]:
MESS_DATUM int64
dtype: object
In [31]: df["MESS_DATUM"]//100
Out[31]:
0 20170807
1 20170807
2 20170807
3 20170807
4 20170807
Name: MESS_DATUM, dtype: int64
But I would consider converting it to datetime dtype:
df["MESS_DATUM"] = pd.to_datetime(df["MESS_DATUM"].astype(str), format='%Y%m%d%H')
If df["MESS_DATUM"] is of float dtype, then we can use the following trick:
In [41]: pd.to_datetime(df["MESS_DATUM"].astype(str).str.split('.').str[0],
format='%Y%m%d%H')
Out[41]:
0 2017-08-07 19:00:00
1 2017-08-07 20:00:00
2 2017-08-07 21:00:00
3 2017-08-07 22:00:00
4 2017-08-07 23:00:00
Name: MESS_DATUM, dtype: datetime64[ns]
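Once the column is datetime64, matching a whole day becomes straightforward; a possible follow-up (my addition, not part of the original answer):
df.loc[df["MESS_DATUM"].dt.normalize() == "2017-08-07"]   # timestamps floored to midnight
# or, equivalently
df.loc[df["MESS_DATUM"].dt.date == pd.Timestamp("2017-08-07").date()]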
I'm trying to generate the Last_Payment_Date field in my pandas dataframe, and would need to find the closest Payment_Date before the given Order_Date for each customer (i.e. groupby).
Payment_Date will always occur after Order_Date, but the lag between them varies, which makes it hard to use sorting and shift to find the nearest date.
Masking seems like a possible way but I've not been able to figure a way on how to use it.
Appreciate all the help I could get!
Cust_No Order_Date Payment_Date Last_Payment_Date
A 5/8/2014 6/8/2014 Nat
B 6/8/2014 1/5/2015 Nat
B 7/8/2014 7/8/2014 Nat
A 8/8/2014 1/5/2015 6/8/2014
A 9/8/2014 10/8/2014 6/8/2014
A 10/11/2014 12/11/2014 10/8/2014
B 11/12/2014 1/1/2015 7/8/2014
B 1/2/2015 2/2/2015 1/1/2015
A 2/5/2015 5/5/2015 1/5/2015
B 3/5/2015 4/5/2015 2/2/2015
Series.searchsorted largely does what you want -- it can be used to find where the Order_Dates fit inside Payment_Dates. In particular, it returns the ordinal indices corresponding to where each Order_Date would need to be inserted in order to keep the Payment_Dates sorted. For example, suppose
In [266]: df['Payment_Date']
Out[266]:
0 2014-06-08
2 2014-07-08
4 2014-10-08
5 2014-12-11
6 2015-01-01
1 2015-01-05
3 2015-01-05
7 2015-02-02
9 2015-04-05
8 2015-05-05
Name: Payment_Date, dtype: datetime64[ns]
In [267]: df['Order_Date']
Out[267]:
0 2014-05-08
2 2014-07-08
4 2014-09-08
5 2014-10-11
6 2014-11-12
1 2014-06-08
3 2014-08-08
7 2015-01-02
9 2015-03-05
8 2015-02-05
Name: Order_Date, dtype: datetime64[ns]
then searchsorted returns
In [268]: df['Payment_Date'].searchsorted(df['Order_Date'])
Out[268]: array([0, 1, 2, 3, 3, 0, 2, 5, 8, 8])
The first value, 0, for example, indicates that the Order_Date, 2014-05-08, would have to be inserted at ordinal index 0 (before the Payment_Date 2014-06-08) to keep the Payment_Dates in sorted order. The second value, 1, indicates that the Order_Date, 2014-07-08, would have to be inserted at ordinal index 1 (after the Payment_Date 2014-06-08 and before 2014-07-08) to keep the Payment_Dates in sorted order. And so on for the other indices.
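If the searchsorted semantics are still unclear, a tiny numeric example shows the same left-insertion behaviour (my illustration, not from the original answer):
pd.Series([10, 20, 30]).searchsorted([5, 20, 25])
# array([0, 1, 2]): 5 goes before 10, 20 goes before the existing 20, 25 before 30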
Now, of course, there are some complications:
The Payment_Dates need to be in sorted order for searchsorted to return a meaningful result:
df = df.sort_values(by=['Payment_Date'])
We need to group by the Cust_No
grouped = df.groupby('Cust_No')
We want the index of the Payment_Date which comes before the Order_Date. Thus, we really need to decrease the index by one:
idx = grp['Payment_Date'].searchsorted(grp['Order_Date'])
result = grp['Payment_Date'].iloc[idx-1]
So that grp['Payment_Date'].iloc[idx-1] would grab the prior Payment_Date.
When searchsorted returns 0, the Order_Date is less than all Payment_Dates. We want a NaT in this case.
result[idx == 0] = pd.NaT
So putting it all together,
import pandas as pd
NaT = pd.NaT
T = pd.Timestamp
df = pd.DataFrame({
'Cust_No': ['A', 'B', 'B', 'A', 'A', 'A', 'B', 'B', 'A', 'B'],
'expected': [
NaT, NaT, NaT, T('2014-06-08'), T('2014-06-08'), T('2014-10-08'),
T('2014-07-08'), T('2015-01-01'), T('2015-01-05'), T('2015-02-02')],
'Order_Date': [
T('2014-05-08'), T('2014-06-08'), T('2014-07-08'), T('2014-08-08'),
T('2014-09-08'), T('2014-10-11'), T('2014-11-12'), T('2015-01-02'),
T('2015-02-05'), T('2015-03-05')],
'Payment_Date': [
T('2014-06-08'), T('2015-01-05'), T('2014-07-08'), T('2015-01-05'),
T('2014-10-08'), T('2014-12-11'), T('2015-01-01'), T('2015-02-02'),
T('2015-05-05'), T('2015-04-05')]})
def last_payment_date(s, df):
    grp = df.loc[s.index]
    idx = grp['Payment_Date'].searchsorted(grp['Order_Date'])
    result = grp['Payment_Date'].iloc[idx-1]
    result[idx == 0] = pd.NaT
    return result
df = df.sort_values(by=['Payment_Date'])
grouped = df.groupby('Cust_No')
df['Last_Payment_Date'] = grouped['Payment_Date'].transform(last_payment_date, df)
print(df)
yields
Cust_No Order_Date Payment_Date expected Last_Payment_Date
0 A 2014-05-08 2014-06-08 NaT NaT
2 B 2014-07-08 2014-07-08 NaT NaT
4 A 2014-09-08 2014-10-08 2014-06-08 2014-06-08
5 A 2014-10-11 2014-12-11 2014-10-08 2014-10-08
6 B 2014-11-12 2015-01-01 2014-07-08 2014-07-08
1 B 2014-06-08 2015-01-05 NaT NaT
3 A 2014-08-08 2015-01-05 2014-06-08 2014-06-08
7 B 2015-01-02 2015-02-02 2015-01-01 2015-01-01
9 B 2015-03-05 2015-04-05 2015-02-02 2015-02-02
8 A 2015-02-05 2015-05-05 2015-01-05 2015-01-05
I am writing a process which takes a semi-large file as input (~4 million rows, 5 columns) and performs a few operations on it.
Columns:
- CARD_NO
- ID
- CREATED_DATE
- STATUS
- FLAG2
I need to create a file which contains 1 copy of each CARD_NO where STATUS = '1' and CREATED_DATE is the maximum of all CREATED_DATEs for that CARD_NO.
I succeeded but my solution is very slow (3h and counting as of right now.)
Here is my code:
file = 'input.csv'
input = pd.read_csv(file)
input = input.drop_duplicates()
card_groups = input.groupby('CARD_NO', as_index=False, sort=False).filter(lambda x: x['STATUS'] == 1)
def important(x):
    latest_date = x['CREATED_DATE'].values[x['CREATED_DATE'].values.argmax()]
    return x[x.CREATED_DATE == latest_date]
#where the major slowdown occurs
group_2 = card_groups.groupby('CARD_NO', as_index=False, sort=False).apply(important)
path = 'result.csv'
group_2.to_csv(path, sep=',', index=False)
# ~4 minutes for the 154k rows file
# 3+ hours for ~4m rows
I was wondering if you had any advice on how to improve the running time of this little process.
Thank you and have a good day.
Setup (FYI, make sure that you use parse_dates=True when reading your csv)
In [6]: n_groups = 10000
In [7]: N = 4000000
In [8]: dates = date_range('20130101',periods=100)
In [9]: df = DataFrame(dict(id = np.random.randint(0,n_groups,size=N), status = np.random.randint(0,10,size=N), date=np.random.choice(dates,size=N,replace=True)))
In [10]: pd.set_option('max_rows',10)
In [13]: df = DataFrame(dict(card_no = np.random.randint(0,n_groups,size=N), status = np.random.randint(0,10,size=N), date=np.random.choice(dates,size=N,replace=True)))
In [14]: df
Out[14]:
card_no date status
0 5790 2013-02-11 6
1 6572 2013-03-17 6
2 7764 2013-02-06 3
3 4905 2013-04-01 3
4 3871 2013-04-08 1
... ... ... ...
3999995 1891 2013-02-16 5
3999996 9048 2013-01-11 9
3999997 1443 2013-02-23 1
3999998 2845 2013-01-28 0
3999999 5645 2013-02-05 8
[4000000 rows x 3 columns]
In [15]: df.dtypes
Out[15]:
card_no int64
date datetime64[ns]
status int64
dtype: object
Take only status == 1, group by card_no, then return the max date for each group:
In [18]: df[df.status==1].groupby('card_no')['date'].max()
Out[18]:
card_no
0 2013-04-06
1 2013-03-30
2 2013-04-09
...
9997 2013-04-07
9998 2013-04-07
9999 2013-04-09
Name: date, Length: 10000, dtype: datetime64[ns]
In [19]: %timeit df[df.status==1].groupby('card_no')['date'].max()
1 loops, best of 3: 934 ms per loop
If you need a transform of this (i.e. the same value repeated for every row of each group), note that with pandas < 0.14.1 (releasing this week) you will need a workaround, otherwise this will be pretty slow:
In [20]: df[df.status==1].groupby('card_no')['date'].transform('max')
Out[20]:
4 2013-04-10
13 2013-04-10
25 2013-04-10
...
3999973 2013-04-10
3999979 2013-04-10
3999997 2013-04-09
Name: date, Length: 399724, dtype: datetime64[ns]
In [21]: %timeit df[df.status==1].groupby('card_no')['date'].transform('max')
1 loops, best of 3: 1.8 s per loop
I suspect you probably want to merge the final transform back into the original frame:
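Here res in the join below is just the transform from In [20] assigned to a name (that assignment is mine, it does not appear in the session above):
res = df[df.status==1].groupby('card_no')['date'].transform('max')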
In [24]: df.join(res.to_frame('max_date'))
Out[24]:
card_no date status max_date
0 5790 2013-02-11 6 NaT
1 6572 2013-03-17 6 NaT
2 7764 2013-02-06 3 NaT
3 4905 2013-04-01 3 NaT
4 3871 2013-04-08 1 2013-04-10
... ... ... ... ...
3999995 1891 2013-02-16 5 NaT
3999996 9048 2013-01-11 9 NaT
3999997 1443 2013-02-23 1 2013-04-09
3999998 2845 2013-01-28 0 NaT
3999999 5645 2013-02-05 8 NaT
[4000000 rows x 4 columns]
In [25]: %timeit df.join(res.to_frame('max_date'))
10 loops, best of 3: 58.8 ms per loop
The csv writing will actually take a fair amount of time relative to this. I use HDF5 for things like this; it's MUCH faster.
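A rough sketch of the HDF5 route, using the OP's group_2 frame as an example (this needs the optional PyTables dependency; the file name and key are placeholders of mine):
group_2.to_hdf('result.h5', key='result', mode='w')   # instead of to_csv
pd.read_hdf('result.h5', 'result')                    # read it back later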
I use pandas to import a csv file (about a million rows, 5 columns) that contains one column of timestamps (increasing row-by-row) in the format Hour:Min:Sec.Millsecs, e.g.
11:52:55.162
and some other columns with floats. I need to transform the timestamp column into floats (say in seconds). So far I'm using
pandas.read_csv
to get a dataframe df and then transform it into a numpy array
df=np.array(df)
All the above works great and is quite fast. However, then I use datetime.strptime (the 0th column holds the timestamps)
df[:,0]=[(datetime.strptime(str(d),'%H:%M:%S.%f')).total_seconds() for d in df[:,0]]
to transform the timestamps into seconds, and unfortunately this turns out to be very slow. It's not the iteration over all the rows that is so slow, but
datetime.strptime
is the bottleneck. Is there a better way to do it?
Here, using timedeltas
Create a sample series
In [21]: s = pd.to_timedelta(np.arange(100000),unit='s')
In [22]: s
Out[22]:
0 00:00:00
1 00:00:01
2 00:00:02
3 00:00:03
4 00:00:04
5 00:00:05
6 00:00:06
7 00:00:07
8 00:00:08
9 00:00:09
10 00:00:10
11 00:00:11
12 00:00:12
13 00:00:13
14 00:00:14
...
99985 1 days, 03:46:25
99986 1 days, 03:46:26
99987 1 days, 03:46:27
99988 1 days, 03:46:28
99989 1 days, 03:46:29
99990 1 days, 03:46:30
99991 1 days, 03:46:31
99992 1 days, 03:46:32
99993 1 days, 03:46:33
99994 1 days, 03:46:34
99995 1 days, 03:46:35
99996 1 days, 03:46:36
99997 1 days, 03:46:37
99998 1 days, 03:46:38
99999 1 days, 03:46:39
Length: 100000, dtype: timedelta64[ns]
Convert to string for testing purposes
In [23]: t = s.apply(pd.tslib.repr_timedelta64)
These are strings
In [24]: t.iloc[-1]
Out[24]: '1 days, 03:46:39'
Dividing by a timedelta64 converts this to seconds
In [25]: pd.to_timedelta(t.iloc[-1])/np.timedelta64(1,'s')
Out[25]: 99999.0
This is currently matching using a reg-ex, so not very fast from a string directly.
In [27]: %timeit pd.to_timedelta(t)/np.timedelta64(1,'s')
1 loops, best of 3: 1.84 s per loop
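Tying this back to the question's column of 'HH:MM:SS.mmm' strings (the column name 'ts' below is a placeholder of mine), the same division idiom gives float seconds directly, at the cost of the reg-ex parsing just mentioned:
import numpy as np
import pandas as pd
df = pd.DataFrame({'ts': ['11:52:55.162', '11:52:56.162']})
seconds = pd.to_timedelta(df['ts']) / np.timedelta64(1, 's')   # float seconds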
This is a datetime-based solution.
Since datetimes are already stored as int64s under the hood, this is very easy and fast.
Create a sample series
In [7]: s = Series(date_range('20130101',periods=1000,freq='ms'))
In [8]: s
Out[8]:
0 2013-01-01 00:00:00
1 2013-01-01 00:00:00.001000
2 2013-01-01 00:00:00.002000
3 2013-01-01 00:00:00.003000
4 2013-01-01 00:00:00.004000
5 2013-01-01 00:00:00.005000
6 2013-01-01 00:00:00.006000
7 2013-01-01 00:00:00.007000
8 2013-01-01 00:00:00.008000
9 2013-01-01 00:00:00.009000
10 2013-01-01 00:00:00.010000
11 2013-01-01 00:00:00.011000
12 2013-01-01 00:00:00.012000
13 2013-01-01 00:00:00.013000
14 2013-01-01 00:00:00.014000
...
985 2013-01-01 00:00:00.985000
986 2013-01-01 00:00:00.986000
987 2013-01-01 00:00:00.987000
988 2013-01-01 00:00:00.988000
989 2013-01-01 00:00:00.989000
990 2013-01-01 00:00:00.990000
991 2013-01-01 00:00:00.991000
992 2013-01-01 00:00:00.992000
993 2013-01-01 00:00:00.993000
994 2013-01-01 00:00:00.994000
995 2013-01-01 00:00:00.995000
996 2013-01-01 00:00:00.996000
997 2013-01-01 00:00:00.997000
998 2013-01-01 00:00:00.998000
999 2013-01-01 00:00:00.999000
Length: 1000, dtype: datetime64[ns]
Convert to ns since epoch / divide to get ms since epoch (if you want seconds, divide by 10**9)
In [9]: pd.DatetimeIndex(s).asi8/10**6
Out[9]:
array([1356998400000, 1356998400001, 1356998400002, 1356998400003,
1356998400004, 1356998400005, 1356998400006, 1356998400007,
1356998400008, 1356998400009, 1356998400010, 1356998400011,
...
1356998400992, 1356998400993, 1356998400994, 1356998400995,
1356998400996, 1356998400997, 1356998400998, 1356998400999])
Pretty fast
In [12]: s = Series(date_range('20130101',periods=1000000,freq='ms'))
In [13]: %timeit pd.DatetimeIndex(s).asi8/10**6
100 loops, best of 3: 11 ms per loop
I'm guessing that the datetime object has a lot of overhead - it may be easier to do it by hand:
def to_seconds(s):
    hr, min, sec = [float(x) for x in s.split(':')]
    return hr*3600 + min*60 + sec
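Applied to a pandas column this would look something like the following (the column name is a placeholder of mine; map just calls the function on each string):
df['seconds'] = df['timestamp'].map(to_seconds)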
Using sum() and enumerate():
>>> ts = '11:52:55.162'
>>> ts1 = map(float, ts.split(':'))
>>> ts1
[11.0, 52.0, 55.162]
>>> ts2 = [60**(2-i)*n for i, n in enumerate(ts1)]
>>> ts2
[39600.0, 3120.0, 55.162]
>>> ts3 = sum(ts2)
>>> ts3
42775.162
>>> seconds = sum(60**(2-i)*n for i, n in enumerate(map(float, ts.split(':'))))
>>> seconds
42775.162
>>>
I'm constructing a dictionary using a dictionary comprehension which has read_csv embedded within it. This constructs the dictionary fine, but when I then push it into a DataFrame all of my data goes to null and the dates get very wacky as well. Here's sample code and output:
In [129]: a= {x.split(".")[0] : read_csv(x, parse_dates=True, index_col=[0])["Settle"] for x in t[:2]}
In [130]: a
Out[130]:
{'SPH2010': Date
2010-03-19 1172.95
2010-03-18 1166.10
2010-03-17 1165.70
2010-03-16 1159.50
2010-03-15 1150.30
2010-03-12 1151.30
2010-03-11 1150.60
2010-03-10 1145.70
2010-03-09 1140.50
2010-03-08 1137.10
2010-03-05 1136.50
2010-03-04 1122.30
2010-03-03 1118.60
2010-03-02 1117.40
2010-03-01 1114.60
...
2008-04-10 1370.4
2008-04-09 1367.7
2008-04-08 1378.7
2008-04-07 1378.4
2008-04-04 1377.8
2008-04-03 1379.9
2008-04-02 1377.7
2008-04-01 1376.6
2008-03-31 1329.1
2008-03-28 1324.0
2008-03-27 1334.7
2008-03-26 1340.7
2008-03-25 1357.0
2008-03-24 1357.3
2008-03-20 1329.8
Name: Settle, Length: 495,
'SPM2011': Date
2011-06-17 1279.4
2011-06-16 1269.0
2011-06-15 1265.4
2011-06-14 1289.9
2011-06-13 1271.6
2011-06-10 1269.2
2011-06-09 1287.4
2011-06-08 1277.0
2011-06-07 1284.8
2011-06-06 1285.0
2011-06-03 1296.3
2011-06-02 1312.4
2011-06-01 1312.1
2011-05-31 1343.9
2011-05-27 1329.9
...
2009-07-10 856.6
2009-07-09 861.2
2009-07-08 856.0
2009-07-07 861.7
2009-07-06 877.9
2009-07-02 875.8
2009-07-01 902.6
2009-06-30 900.3
2009-06-29 908.0
2009-06-26 901.1
2009-06-25 903.8
2009-06-24 885.2
2009-06-23 877.6
2009-06-22 876.0
2009-06-19 903.4
Name: Settle, Length: 497}
In [131]: DataFrame(a)
Out[131]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 806 entries, 2189-09-10 03:33:28.879144 to 1924-01-20 06:06:06.621835
Data columns:
SPH2010 0 non-null values
SPM2011 0 non-null values
dtypes: float64(2)
Thanks!
EDIT:
I've also tried doing this with concat and I get the same results.
You should be able to use concat and unstack. Here's an example:
s1 = pd.Series([1, 2], name='a')
s2 = pd.Series([3, 4], index=[1, 2], name='b')
d = {'A': s1, 'B': s2} # a dict of Series
In [4]: pd.concat(d)
Out[4]:
A 0 1
1 2
B 1 3
2 4
In [5]: pd.concat(d).unstack().T
Out[5]:
A B
0 1 NaN
1 2 3
2 NaN 4
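Applied to the question's dict it is the same pattern; here is a minimal stand-in for a (the dates and prices are made up, only the shape matters):
import pandas as pd
a = {'SPH2010': pd.Series([1172.95, 1166.10],
                          index=pd.to_datetime(['2010-03-19', '2010-03-18']),
                          name='Settle'),
     'SPM2011': pd.Series([1279.40, 1269.00],
                          index=pd.to_datetime(['2011-06-17', '2010-03-19']),
                          name='Settle')}
wide = pd.concat(a).unstack().T   # one column per contract, aligned on the union of dates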