Creating repeating values in a pandas Dataframe

I have 3 lists -
Name = ["ABC", "DEF", "GHI"]
Year = [2016,2017]
Month = ["Aug","Jul","Jun"]
I want to create a dataframe from these lists as follows -
df -
Name Year Month
ABC 2016 Aug
ABC 2016 Jul
ABC 2016 Jun
ABC 2017 Aug
ABC 2017 Jul
ABC 2017 Jun
DEF 2016 Aug
DEF 2016 Jul
DEF 2016 Jun
DEF 2017 Aug
DEF 2017 Jul
DEF 2017 Jun
... and so on
for all values in the lists. Is there a method in Python (pandas, NumPy or SciPy) to do this, or is looping the only way?

Use itertools.product:
import itertools
import pandas as pd
pd.DataFrame(list(itertools.product(Name, Year, Month)),
             columns=['Name', 'Year', 'Month'])
Name Year Month
0 ABC 2016 Aug
1 ABC 2016 Jul
2 ABC 2016 Jun
3 ABC 2017 Aug
4 ABC 2017 Jul
5 ABC 2017 Jun
6 DEF 2016 Aug
7 DEF 2016 Jul
8 DEF 2016 Jun
9 DEF 2017 Aug
10 DEF 2017 Jul
11 DEF 2017 Jun
12 GHI 2016 Aug
13 GHI 2016 Jul
14 GHI 2016 Jun
15 GHI 2017 Aug
16 GHI 2017 Jul
17 GHI 2017 Jun
If you want a fast NumPy cartesian product, I'd suggest looking at the question "Numpy: cartesian product of x and y array points into single array of 2D points".
Substituting a NumPy alternative for product should be simple. All that's left to do is to call the pd.DataFrame constructor.
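As a rough sketch of what such a NumPy substitute could look like (this is not the code from the linked question, and the helper name cartesian_frame is made up for illustration), one option is to build index grids with np.meshgrid and use them to pick values from each list. Indexing by position keeps the Year column numeric instead of coercing everything to strings.

```python
import numpy as np
import pandas as pd

Name = ["ABC", "DEF", "GHI"]
Year = [2016, 2017]
Month = ["Aug", "Jul", "Jun"]


def cartesian_frame(columns):
    """Hypothetical helper: cartesian product of the given lists as a DataFrame.

    `columns` maps column name -> list of values. The first column varies
    slowest, matching the order produced by itertools.product.
    """
    names = list(columns)
    values = [np.asarray(columns[name]) for name in names]
    # One integer index grid per input list; "ij" keeps the input order.
    grids = np.meshgrid(*[np.arange(len(v)) for v in values], indexing="ij")
    # Flatten each grid and use it to select values from the matching list.
    return pd.DataFrame(
        {name: v[g.ravel()] for name, v, g in zip(names, values, grids)}
    )


df = cartesian_frame({"Name": Name, "Year": Year, "Month": Month})
print(df)
```

For three short lists like these, itertools.product is likely simpler and fast enough; the grid approach mainly pays off for large inputs.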

Related

How to split one row into multiple and apply datetime on dataframe column?

I have one dataframe which looks like below:
Date_1 Date_2
0 5 Dec 2017 5 Dec 2017
1 14 Dec 2017 14 Dec 2017
2 15 Dec 2017 15 Dec 2017
3 18 Dec 2017 21 Dec 2017 18 Dec 2017 21 Dec 2017
4 22 Dec 2017 22 Dec 2017
Conditions to be checked:
Check whether any row contains two dates, like the row at index 3. If so, split it into two separate rows.
Apply the datetime conversion to both columns.
I tried the conversion like this:
df['Date_1'] = pd.to_datetime(df['Date_1'], format='%d %b %Y')
But getting below error:
ValueError: unconverted data remains:
Expected Output:
Date_1 Date_2
0 5 Dec 2017 5 Dec 2017
1 14 Dec 2017 14 Dec 2017
2 15 Dec 2017 15 Dec 2017
3 18 Dec 2017 18 Dec 2017
4 21 Dec 2017 21 Dec 2017
5 22 Dec 2017 22 Dec 2017

After using a regex with findall to extract the dates, your problem becomes an unnesting problem (unnesting is a user-defined helper function whose definition is not shown here):
s=df.apply(lambda x : x.str.findall(r'((?:\d{,2}\s)?(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*(?:-|\.|\s|,)\s?\d{,2}[a-z]*(?:-|,|\s)?\s?\d{,4})'))
unnesting(s,['Date_1','Date_2']).apply(pd.to_datetime)
Out[82]:
Date_1 Date_2
0 2017-12-05 2017-12-05
1 2017-12-14 2017-12-14
2 2017-12-15 2017-12-15
3 2017-12-18 2017-12-18
3 2017-12-21 2017-12-21
4 2017-12-22 2017-12-22
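Since the unnesting helper is not defined in the answer, here is a minimal alternative sketch using Series.explode (available in pandas 0.25 and later). It is not the answerer's method. It assumes every date looks like "18 Dec 2017" and that both columns hold the same number of dates in each row; the simplified pattern and the sample frame are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Date_1": ["5 Dec 2017", "14 Dec 2017", "18 Dec 2017 21 Dec 2017"],
    "Date_2": ["5 Dec 2017", "14 Dec 2017", "18 Dec 2017 21 Dec 2017"],
})

# Simplified pattern (an assumption): day, 3-letter month, 4-digit year.
pattern = r"\d{1,2} [A-Za-z]{3} \d{4}"

out = pd.DataFrame({
    col: pd.to_datetime(
        # findall gives a list of dates per cell; explode makes one row per date.
        df[col].str.findall(pattern).explode(),
        format="%d %b %Y",
    ).reset_index(drop=True)  # renumber rows so both columns line up by position
    for col in df.columns
})
print(out)
```

Unlike the output above, this renumbers the index, so the split row becomes rows 2 and 3 rather than two rows sharing one label.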

Add column of repeating sequential values

I have a dataframe that contains stacked monthly values and looks like:
Value Month
0 0.09187 Jan
1 0.72878 Feb
2 0.92052 Mar
3 -1.86845 Apr
4 -1.16489 May
5 -0.61433 Jun
6 0.68008 Jul
7 -1.50555 Aug
8 -0.18985 Sep
9 -1.11380 Oct
10 -0.63838 Nov
11 0.37527 Dec
12 0.234216 Jan
I would like to add a column of years, using a known range, so that the df looks like:
Value Month Year
0 0.09187 Jan 1950
1 0.72878 Feb 1950
2 0.92052 Mar 1950
3 -1.86845 Apr 1950
4 -1.16489 May 1950
5 -0.61433 Jun 1950
6 0.68008 Jul 1950
7 -1.50555 Aug 1950
8 -0.18985 Sep 1950
9 -1.11380 Oct 1950
10 -0.63838 Nov 1950
11 0.37527 Dec 1950
12 0.234216 Jan 1951
I tried initializing a years list to apply to the column as:
years = list(range(1950, 2000))
df['Year'] = years * 12
But this produced
Value Month Year
0 0.09187 Jan 1950
1 0.72878 Feb 1951
2 0.92052 Mar 1952
And so on. I've been unable to come up with any other approach

As long as you know that you have Jan data for all your years, you could do:
df['Year'] = df['Month'].eq('Jan').cumsum()+1949
>>> df
Value Month Year
0 0.091870 Jan 1950
1 0.728780 Feb 1950
2 0.920520 Mar 1950
3 -1.868450 Apr 1950
4 -1.164890 May 1950
5 -0.614330 Jun 1950
6 0.680080 Jul 1950
7 -1.505550 Aug 1950
8 -0.189850 Sep 1950
9 -1.113800 Oct 1950
10 -0.638380 Nov 1950
11 0.375270 Dec 1950
12 0.234216 Jan 1951
Or, you could follow your original logic, but use np.repeat, which repeats each element in place (slice to len(df) in case the last year is incomplete):
import numpy as np
years = list(range(1950, 2000))
df['Year'] = np.repeat(years, 12)[:len(df)]
Or another alternative:
df['Year'] = pd.date_range('1950-01-01', periods=len(df), freq='MS').year
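A further option, sketched here as an assumption rather than something from the answer, derives the year purely from the row position with integer division. It does not look at the Month column at all, so it only holds if the data starts in January 1950 and no months are missing.

```python
import numpy as np
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
# 13 rows, like the sample in the question: one full year plus one January.
df = pd.DataFrame({"Month": (months * 2)[:13]})

# Rows 0-11 map to 1950, rows 12-23 to 1951, and so on.
df["Year"] = 1950 + np.arange(len(df)) // 12
print(df)
```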

Change date format in pandas dataframe

I have this dataframe:
date value
1 Thu 17th Nov 2016 385.943800
2 Fri 18th Nov 2016 1074.160340
3 Sat 19th Nov 2016 2980.857860
4 Sun 20th Nov 2016 1919.723960
5 Mon 21st Nov 2016 884.279340
6 Tue 22nd Nov 2016 869.071070
7 Wed 23rd Nov 2016 760.289260
8 Thu 24th Nov 2016 2481.689270
9 Fri 25th Nov 2016 2745.990070
10 Sat 26th Nov 2016 2273.413250
11 Sun 27th Nov 2016 2630.414900
12 Mon 28th Nov 2016 817.322310
13 Tue 29th Nov 2016 1766.876030
14 Wed 30th Nov 2016 469.388420
I would like to change the format of the date column to this format YYYY-MM-DD. The dataframe consists of more than 200 rows, and every day new rows will be added, so I need to find a way to do this automatically.
This link is not helping because it hard-codes the dates, like dates = ['30th November 2009', '31st March 2010', '30th September 2010'], and I can't do that for every row. Does anyone know a way to solve this?

dateutil will do this job:
from dateutil import parser
print(df)
df2 = df.copy()
df2.date = df2.date.apply(parser.parse)
df2
The parsed column holds datetime values, which pandas displays as YYYY-MM-DD.
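If the goal is a column of YYYY-MM-DD strings and you would rather stay within pandas, one possible sketch is to strip the ordinal suffix (st, nd, rd, th) first and then parse with an explicit format. The regex and the four-row sample are assumptions for illustration, and the format string expects English day and month abbreviations.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "date": ["Thu 17th Nov 2016", "Mon 21st Nov 2016",
             "Tue 22nd Nov 2016", "Wed 23rd Nov 2016"],
})

# Remove "st", "nd", "rd" or "th" when it directly follows the day number.
ORDINAL = re.compile(r"(\d+)(?:st|nd|rd|th)\b")
cleaned = df["date"].map(lambda s: ORDINAL.sub(r"\1", s))

# Parse with an explicit format, then render as YYYY-MM-DD strings.
df["date"] = pd.to_datetime(cleaned, format="%a %d %b %Y").dt.strftime("%Y-%m-%d")
print(df)
```

Because the conversion is applied to the whole column, rows appended later are handled by rerunning the same lines.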

comparing date using dd/mm/yy format in python

I want to write a program that compares the current date with a number of dates that I have.
My data is:
12 JUN 2016
21 MAR 1989
15 MAR 1958
15 SEP 1958
23 OCT 1930
15 SEP 1928
10 MAR 2010
23 JAN 1928
15 NOV 1925
26 AUG 2009
29 APR 1987
20 JUL 1962
10 MAY 1960
13 FEB 1955
10 MAR 1956
3 MAR 2010
14 NOV 1958
4 AUG 1985
24 AUG 1956
15 FEB 1955
19 MAY 1987
30 APR 1990
8 SEP 2014
18 JAN 2012
14 DEC 1960
1 AUG 1998
7 SEP 1963
9 MAR 2012
1 MAY 1990
14 MAY 1985
15 JUN 1945
5 APR 1995
26 FEB 1987
13 DEC 1983
15 AUG 2009
16 SEP 1980
16 JAN 2005
19 JUN 2011
How can I compare these to the current date (i.e. 13 JUN 2016) to check that a date does not exceed it?
Please help me! Thank you.

You have to create a datetime object from the string data. You can do that by parsing the date string with the strptime method.
from datetime import datetime
mydate = datetime.strptime("19 JUN 2011", "%d %b %Y")
Then compare the object with today's date.
print(mydate < datetime.today())
True
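To apply the same check to a whole list, a short sketch might look like this. The "current date" is pinned to the 13 JUN 2016 mentioned in the question so the result is reproducible, and "14 JUN 2016" is a made-up entry added only to show a failing case; in practice you would use datetime.today().

```python
from datetime import datetime

raw_dates = ["12 JUN 2016", "21 MAR 1989", "3 MAR 2010", "14 JUN 2016"]

# Fixed reference date from the question; swap in datetime.today() for real use.
today = datetime(2016, 6, 13)

# %b matches month abbreviations case-insensitively, so "JUN" parses fine.
parsed = [datetime.strptime(s, "%d %b %Y") for s in raw_dates]

# True where the date does not exceed the reference date.
not_after_today = [d <= today for d in parsed]
print(not_after_today)
```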

Python list index error - out of range, basically I want to loop through each element in all 3 lists

day=[15,27,3]
month=['Jan','Dec','Jun']
year=[2013,2002,2010]
for d,m,y in [day,month,year]:
    myDatefunction(d,m,y)

As you will have seen, you are iterating the wrong way across those values:
>>> for d, m, y in [day, month, year]:
...     print(d, m, y)
15 27 3
Jan Dec Jun
2013 2002 2010
This is because, on each iteration, you are unpacking a single source list into d, m and y. That would normally fail; it only works here because each list happens to contain exactly three elements, one per target name. To transpose the lists, use zip:
>>> for d, m, y in zip(day, month, year):
...     print(d, m, y)
15 Jan 2013
27 Dec 2002
3 Jun 2010
If you want all combinations, you can do this simply and efficiently with itertools.product:
>>> from itertools import product
>>> for d, m, y in product(day, month, year):
...     print(d, m, y)
15 Jan 2013
15 Jan 2002
15 Jan 2010
15 Dec 2013
15 Dec 2002
15 Dec 2010
15 Jun 2013
15 Jun 2002
15 Jun 2010
27 Jan 2013
27 Jan 2002
27 Jan 2010
27 Dec 2013
27 Dec 2002
27 Dec 2010
27 Jun 2013
27 Jun 2002
27 Jun 2010
3 Jan 2013
3 Jan 2002
3 Jan 2010
3 Dec 2013
3 Dec 2002
3 Dec 2010
3 Jun 2013
3 Jun 2002
3 Jun 2010

This will iterate over all elements in all lists, provided you rename the lists by adding an s to their names:
for year in years:
    for month in months:
        for day in days:
            myDatefunction(day, month, year)
zip() stops at the shortest list, so if the lists can differ in length, or you want every combination rather than element-wise pairs, a plain nested loop is the way to go.
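A small sketch of how zip actually behaves with lists of different lengths: it does not raise an error but silently stops at the shortest input, while itertools.zip_longest pads the shorter ones. The shortened month list is a made-up example.

```python
from itertools import zip_longest

day = [15, 27, 3]
month = ["Jan", "Dec"]  # deliberately one element short (hypothetical data)

# zip stops as soon as the shortest list is exhausted.
pairs = list(zip(day, month))

# zip_longest keeps going and fills the gaps with the given value.
padded = list(zip_longest(day, month, fillvalue=None))

print(pairs)
print(padded)
```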

If you want to combine the entries use zip.
>>> day=[15,27,3]
>>> month=['Jan','Dec','Jun']
>>> year=[2013,2002,2010]
>>> for d, m, y in zip(day, month, year):
...     print(d, m, y)
15 Jan 2013
27 Dec 2002
3 Jun 2010
