String slicing in a Python UDF

I am trying to write a UDF in Python that will be called from a Pig script. The UDF needs to accept a date as a string in DD-MMM-YYYY format and return it in DD-MM-YYYY format. Here MMM is one of JAN, FEB, ..., DEC, and the returned MM is 01, 02, ..., 12.
Below is my Python UDF:
#!/usr/bin/python
#outputSchema("newdate:chararray")
def GetMonthMM(inputString):
    print inputString
    #monthstring = inputString[3:6]
    sl = slice(3,6)
    monthstring = inputString[sl]
    monthdigit = ""
    if ( monthstring == "JAN" ):
        monthdigit = "01"
    elif ( monthstring == "FEB"):
        monthdigit = "02"
    elif(monthstring == "MAR"):
        monthdigit = "03"
    elif(monthstring == "APR"):
        monthdigit = "04"
    elif(monthstring == "MAY"):
        monthdigit = "05"
    elif (monthstring == "JUN"):
        monthdigit = "06"
    elif (monthstring == "JUL"):
        monthdigit = "07"
    elif (monthstring == "AUG"):
        monthdigit = "08"
    elif (monthstring == "SEP"):
        monthdigit = "09"
    elif (monthstring == "OCT"):
        monthdigit = "10"
    elif (monthstring == "NOV"):
        monthdigit = "11"
    elif (monthstring == "DEC"):
        monthdigit = "12"
    sl1 = slice(0,3)
    sl2 = slice(6,11)
    str1 = inputString[sl1]
    str2 = inputString[sl2]
    newdate = str1 + monthdigit + str2
    return monthstring;
I did some debugging, and the issue seems to be that after slicing, the strings are treated as arrays. I get the following error message:
TypeError: unsupported operand type(s) for +: 'array.array' and 'str'
The same thing happens when the string is compared to another string, as in if (monthstring == "DEC"):.
Even when monthstring holds DEC, the condition is never satisfied.
Has anybody faced this issue before? Any ideas on how to fix it?

I recently used the calendar module for this. It may be more useful in other cases, but here it is anyway.
import calendar
m_dict = {}
# month_abbr[0] is an empty string so that January sits at index 1; skip it
for i, month in enumerate(calendar.month_abbr[1:]):
    m_dict[month.lower()] = '{:02}'.format(i+1)
def GetMonthMM(inputStr):
    day, month, year = inputStr.split('-')
    return '-'.join([day, m_dict[month.lower()], year])
print(GetMonthMM('01-JAN-2015'))
# prints 01-01-2015

I would write the function like this:
#!/usr/bin/python
#outputSchema("newdate:chararray")
def GetMonthMM(inputString):
    monthArray = {'JAN':'01','FEB':'02','MAR':'03','APR':'04','MAY':'05','JUN':'06','JUL':'07','AUG':'08','SEP':'09','OCT':'10','NOV':'11','DEC':'12'}
    print inputString
    #monthstring = inputString[3:6]
    # Assumption: the argument arrived as an array.array (as the error message suggests), so convert it to a str first
    if not isinstance(inputString, str):
        inputString = inputString.tostring()
    dateparts = inputString.split('-') #assuming the date is always separated by -
    dateparts[1] = monthArray[dateparts[1]]
    return '-'.join(dateparts)
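The error message in the question suggests the UDF received an array.array rather than a str, which would explain why both the concatenation and the comparisons fail. A minimal sketch, assuming that is the cause: convert the argument to text first, then look the month up in a dictionary. The names as_text, MONTHS, and get_month_mm are illustrative only, and the sketch is written to run under plain Python 3 rather than inside Pig.

```python
import array

# Month abbreviations as they appear in DD-MMM-YYYY input.
MONTHS = {
    "JAN": "01", "FEB": "02", "MAR": "03", "APR": "04",
    "MAY": "05", "JUN": "06", "JUL": "07", "AUG": "08",
    "SEP": "09", "OCT": "10", "NOV": "11", "DEC": "12",
}


def as_text(value):
    """Hypothetical helper: return value as a plain string."""
    if isinstance(value, array.array):
        # Python 3 spells this tobytes(); Python 2 and Jython use tostring().
        raw = value.tobytes() if hasattr(value, "tobytes") else value.tostring()
        return raw.decode("ascii")
    return value


def get_month_mm(input_string):
    """Convert 'DD-MMM-YYYY' to 'DD-MM-YYYY'."""
    day, month, year = as_text(input_string).split("-")
    return "-".join([day, MONTHS[month.upper()], year])


print(get_month_mm("15-DEC-2014"))
print(get_month_mm(array.array("b", b"15-DEC-2014")))
```

Splitting on the separator avoids hard-coded slice positions, and an unknown month abbreviation raises a KeyError instead of silently returning an empty month.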

Related

Number not increasing during while loop: Division by zero, calculating factors of weather

I've been having quite a bit of trouble with some code I've been working on and I'm at the point now where I cannot figure out what is making it go wrong. The code in question is
brok = []
Jan = []
Feb = []
Mar = []
Apr = []
May = []
Jun = []
Jul = []
Aug = []
Sep = []
Oct = []
Nov = []
Dec = []
with open('CLLWeatherData.csv', 'r') as inputFile:
    for current_line in inputFile:
        brok.append(current_line.split(","))
maxe = 0
r=1
precip = 0
for r in range(1,len(brok)):
    if int(brok[r][4]) >= maxe:
        maxe = int(brok[r][4])
print("3-year maximum temperature:",maxe)
mine = maxe
for r in range(1,len(brok)):
    if int(brok[r][4]) <= mine:
        mine = int(brok[r][4])
print("3-year minimum temperature:",mine)
for z in range (1, len(brok)):
    precip += float(brok[z][2])
avgperc = precip / len(brok)
print("3-year average precipitation:","{:.3f}".format(avgperc))
# print(brok[1][0][:2])
for I in range(len(brok)):
    if (brok[I][0][:2]) == "1/":
        Jan.append(brok[I])
    if (brok[I][0][:2]) == "2/":
        Feb.append(brok[I])
    if (brok[I][0][:2]) == "3/":
        Mar.append(brok[I])
    if (brok[I][0][:2]) == "4/":
        Apr.append(brok[I])
    if (brok[I][0][:2]) == "5/":
        May.append(brok[I])
    if (brok[I][0][:2]) == "6/":
        Jun.append(brok[I])
    if (brok[I][0][:2]) == "7/":
        Jul.append(brok[I])
    if (brok[I][0][:2]) == "8/":
        Aug.append(brok[I])
    if (brok[I][0][:2]) == "9/":
        Sep.append(brok[I])
    if (brok[I][0][:2]) == "10":
        Oct.append(brok[I])
    if (brok[I][0][:2]) == "11":
        Nov.append(brok[I])
    if (brok[I][0][:2]) == "12":
        Dec.append(brok[I])
# print(Oct[0][0][-4:])
month = input("Please enter a month:")
date = int(input("Please enter a year:"))
numdays = 0
meantemp = 0.0
percentdaygood = 0.0
percentdaybad = 0.0
meanprecip = 0
munshort = month[0:3]
print("for",month,str(date) +":")
# print(month[0:3])
for I in range(len(munshort)):
    if munshort[I][0] == date :
        numdays += 1
print(numdays)
for I in range(numdays):
    if int(munshort[I][0]) == date :
        meantemp += int(munshort[I][3])
        if int(munshort[I][1]) >= 10:
            percentdaygood += 1
        else: percentdaybad += 1
        meanprecip += float(munshort[I][2])
avgtemp = meantemp/numdays
(percentotal) = (percentdaygood/percentdaybad)*100
avgprecip = meanprecip/numdays
print("")
print("Mean daily temperature:")
Currently the problem is the numdays counter: I have no idea why it does not increase inside the loop, and the rest of the program then fails with a divide-by-zero error.
The data table that this code pulls from
Your code:-
month = input("Please enter a month:")
munshort = month[0:3]
for I in range(len(munshort)):
    if munshort[I][0] == date :
        numdays += 1
print(numdays)
Errors:-
You seem to expect len(munshort) to give the length of the list Jan when you enter 'January', but munshort is the string 'Jan', so it gives 3.
Because of the first error, munshort[0][0] is 'J', and 'J' is never equal to date.
Replacement:-
You could try month_list = globals()[munshort], which looks up the list named Jan (or whichever month was entered) among the module-level variables. The year also has to be compared as text, because date is an int while each row holds strings.
month_list = globals()[munshort]
for dates in month_list:
    # assumes the first column is a date string ending in the four-digit year
    if dates[0][-4:] == str(date):
        numdays += 1
print(numdays)
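As an alternative to looking variables up through globals(), the rows can be grouped in a dictionary keyed by month abbreviation. The sketch below is a hedged illustration, not the asker's program: it assumes the first column is a date of the form M/D/YYYY (inferred from the "1/" prefix checks and the [-4:] comment in the question), the sample rows are made up, and group_by_month and count_days are hypothetical names.

```python
from collections import defaultdict

MONTH_NAMES = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def group_by_month(rows):
    """Map 'Jan'..'Dec' to the rows whose date column falls in that month."""
    by_month = defaultdict(list)
    for row in rows:
        month_number = int(row[0].split("/")[0])  # '10/3/2016' -> 10
        by_month[MONTH_NAMES[month_number - 1]].append(row)
    return by_month


def count_days(by_month, month_name, year):
    """Count the rows recorded for the given month name and year."""
    key = month_name[:3].capitalize()
    return sum(1 for row in by_month[key] if row[0].split("/")[2] == str(year))


# Made-up sample rows standing in for CLLWeatherData.csv (header already skipped).
rows = [
    ["1/1/2015", "5", "0.0", "40", "55"],
    ["1/2/2015", "12", "0.1", "42", "57"],
    ["1/1/2016", "8", "0.0", "39", "50"],
    ["10/3/2016", "7", "0.3", "60", "75"],
]
by_month = group_by_month(rows)
print(count_days(by_month, "January", 2015))
```

Whatever approach is used, checking that the count is non-zero before dividing by it avoids the divide-by-zero error when no rows match.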

Change time from 12-h to 24-h format

I want to convert a time from 12-h format to 24-h format
This is my code:
def change_time(time):
    import datetime as dt
    FMT12 = '%H:%M:%S %p'
    FMT24 = '%H:%M:%S'
    # time is a string
    if time.find('PM') != -1: # if exists
        t1 = dt.datetime.strptime(time, FMT12)
        t2 = dt.datetime.strptime('12:00:00', FMT24)
        time_zero = dt.datetime.strptime('00:00:00', FMT24)
        return (t1 - time_zero + t2).time()
    else:
        return dt.datetime.strptime(time, FMT12).time()
This is the output :
print(change_time('09:52:08 PM')) # -> 21:52:08
So, this code is working, but I want a better version of it.
Here is a shorter method. Parsing the hour with %I (12-hour clock) lets %p take effect, so no manual arithmetic is needed:
from datetime import datetime
def change_time(time):
    in_time = datetime.strptime(time, "%I:%M:%S %p")
    new_time = datetime.strftime(in_time, "%H:%M:%S")
    print(new_time)
change_time('09:52:08 PM')
Output:
21:52:08
A version without datetime, using plain string slicing:
def t12_to_24(time):
    # time is a string such as '09:52:08 PM'
    am_or_pm = time[-2:].upper()
    hour = int(time[0:2])
    if am_or_pm == 'AM' and hour == 12:
        hour = 0  # 12:xx AM is 00:xx
    elif am_or_pm == 'PM' and hour != 12:
        hour += 12  # 12:xx PM stays 12:xx
    time_update = '{:02d}'.format(hour) + time[2:-3]
    print(time_update)
time = input()
t12_to_24(time)
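The two edge cases that manual arithmetic tends to get wrong are midnight (12:xx AM) and noon (12:xx PM). A small sketch of the strptime/strftime approach that returns the string instead of printing it, so those cases can be checked directly; to_24h is a hypothetical name, and the sketch assumes the default locale with English AM/PM markers.

```python
from datetime import datetime


def to_24h(time_text):
    """Convert 'hh:mm:ss AM/PM' to 'HH:MM:SS'."""
    parsed = datetime.strptime(time_text.strip().upper(), "%I:%M:%S %p")
    return parsed.strftime("%H:%M:%S")


for sample in ("09:52:08 PM", "12:00:00 AM", "12:30:00 PM", "07:05:45 am"):
    print(sample, "->", to_24h(sample))
```

Input that does not match the expected pattern raises a ValueError from strptime, which is usually preferable to producing a wrong time silently.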

Can anyone give me a function to get the date and time in Python? I'm on Windows 10

Can anyone provide functions for the time, date, and day in Python? Something like
time = time()
so that time = "##:##:##" holds the current time on my laptop,
date = date()
so that date = "####/##/##",
and day = day()
so that day = "sun/mon/tue/wed/etc".
import subprocess as sp
import datetime as dt
def time():
    t = dt.datetime.now()
    h = int(t.strftime("%H"))
    ampm = "AM"
    if h >= 12:
        ampm = "PM"  # noon and later
    if h > 12:
        h = h - 12
    elif h == 0:
        h = 12  # midnight is 12 AM
    return str(h) + ":" + t.strftime("%M:%S") + " " + ampm
def out(command,he):
    # run a shell command and return the first line of its output as text
    result = he.run(command, stdout=he.PIPE, stderr=he.PIPE, universal_newlines=False, shell=True)
    d = result.stdout
    outp = d
    f = d.find(b'\r')
    if f>0:
        outp = d[0:f]
    outp = str(outp)
    outp = outp[2:len(outp)-1]  # strip the b'...' wrapper left by str()
    return outp
def date(han):
    # the %date% format depends on the Windows regional settings;
    # this assumes a three-letter day name followed by the date
    myot = out("echo %date%",han)
    myot = myot[4:len(myot)]
    return myot
def day(he):
    myot = out("echo %date%",he)
    myot = myot[0:3]
    if myot == "Wed":
        myot = myot + "nes"
    elif myot == "Thu":
        myot = myot + "rs"
    elif myot == "Sat":
        myot = myot + "ur"
    elif myot == "Tue":
        myot = myot + "s"
    myot = myot + "day"
    return myot
date = date(sp)
day = day(sp)
time = time()
To get a timestamp with the current time and date you can use:
import datetime
timestamp = datetime.datetime.now()
You can then use that timestamp object to get the information you need with:
timestamp.year
timestamp.month
timestamp.day
timestamp.hour
timestamp.minute
timestamp.second
timestamp.weekday()  # a method, so it needs the parentheses; Monday is 0
To understand more about what is going on, read the Python datetime documentation: https://docs.python.org/3/library/datetime.html
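The three values the question asks for can be produced from a single datetime object without calling out to the shell. A minimal sketch follows; the function names and the optional now parameter are illustrative, and the day names are spelled out in a tuple so the result does not depend on the locale.

```python
from datetime import datetime

# weekday() returns 0 for Monday through 6 for Sunday.
DAY_NAMES = ("mon", "tue", "wed", "thu", "fri", "sat", "sun")


def current_time(now=None):
    """Return the time as '##:##:##' (24-hour clock)."""
    now = now or datetime.now()
    return now.strftime("%H:%M:%S")


def current_date(now=None):
    """Return the date as '####/##/##' (year/month/day)."""
    now = now or datetime.now()
    return now.strftime("%Y/%m/%d")


def current_day(now=None):
    """Return the three-letter day name, e.g. 'sun'."""
    now = now or datetime.now()
    return DAY_NAMES[now.weekday()]


print(current_time(), current_date(), current_day())
```

Passing a fixed datetime as now makes the functions easy to check, and giving them names other than time, date, and day avoids shadowing the values they return.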

Removing HTML tags in Python: why does this code fail?

import time
with open('C:/Users/LQ/Downloads/test.txt',encoding = 'utf-8') as f:
    s = f.read()
start = end = 0
while start <= end:
    if s[start] == '<' and s[end] == '>':
        s.replace(s[start:end + 1],'')
        start = end + 1
        end = start
    elif s[start] == '<' and s[end] != '>':
        end += 1
    elif s[start] != '<':
        start += 1
        end = start
The error is "string index out of range", but I don't know how to fix it.
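Two things stand out in the loop above: neither start nor end is ever checked against len(s), so indexing eventually runs past the end of the string, and str.replace returns a new string, so its result is discarded here. A minimal sketch of a single-pass alternative that never indexes out of range; strip_tags is a hypothetical name, and this is plain character filtering, not a real HTML parser (it does not handle comments, scripts, or '>' inside attribute values).

```python
def strip_tags(text):
    """Remove every '<...>' span from text in a single pass."""
    kept = []
    inside_tag = False
    for char in text:
        if char == "<":
            inside_tag = True
        elif char == ">" and inside_tag:
            inside_tag = False
        elif not inside_tag:
            kept.append(char)
    return "".join(kept)


print(strip_tags("<b>hi</b> there"))
```

Iterating over the characters directly removes the need for index bookkeeping, which is where the original error comes from.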

A question about Python script!

m = raw_input("Please enter a date(format:mm/dd/yyyy): ")
def main():
    if '01' in m:
        n = m.replace('01','January')
        print n
    elif '02' in m:
        n = m.replace('02','February')
        print n
    elif '03' in m:
        n = m.replace('03','March')
        print n
    elif '04' in m:
        n = m.replace('04','April')
        print n
    elif '05' in m:
        n = m.replace('05','May')
        print n
    elif '06' in m:
        n = m.replace('06','June')
        print n
    elif '07' in m:
        n = m.replace('07','July')
        print n
    elif '08' in m:
        n = m.replace('08','August')
        print n
    elif '09' in m:
        n = m.replace('09','September')
        print n
    elif '10' in m:
        n = m.replace('10','October')
        print n
    elif '11' in m:
        n = m.replace('11','November')
        print n
    elif '12' in m:
        n = m.replace('12','December')
        print n
main()
For example, this script turns 01/29/1991 into January/29/1991, but I want it to output January,29,1991. How do I replace the "/" with ","?
Please don't do it this way; it's already wrong, and can't be fixed without a lot of work. Use datetime.strptime() to turn it into a datetime, and then datetime.strftime() to output it in the correct format.
Take advantage of the datetime module:
from datetime import datetime
m = raw_input('Please enter a date(format:mm/dd/yyyy)')
# First convert to a datetime object
dt = datetime.strptime(m, '%m/%d/%Y')
# Then print it out how you want it
print dt.strftime('%B,%d,%Y')
Just like you replace all of the other strings - replace('/',',').
You might find a dictionary helpful here; it would be simpler. You could try something like the following.
m = raw_input("Please enter a date(format:mm/dd/yyyy): ")
month_dict = {"01" : "January", "02" : "February", "03" : "March", ...}
# then when printing you could do the following
date_list = m.split("/") # This gives you a list like ["01", "21", "2010"]
print(month_dict[date_list[0]] + "," + date_list[1] + "," + date_list[2])
That gets you the same result in four lines of code.
I have just rewritten your code to be more compact:
m = '01/15/2001'
d = {'01' : 'Jan', '02' : 'Feb'}
# Replace only the month field; a plain m.replace() would also hit the '01' in '2001'
month, rest = m.split('/', 1)
m = d.get(month, month) + '/' + rest
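The answers in this thread use Python 2 syntax (raw_input, the print statement). For reference, a hedged Python 3 sketch of the strptime/strftime approach recommended above; reformat_date is a hypothetical name, and the month name produced by %B assumes the default English locale.

```python
from datetime import datetime


def reformat_date(text):
    """Turn 'mm/dd/yyyy' into 'MonthName,dd,yyyy'."""
    parsed = datetime.strptime(text, "%m/%d/%Y")
    return parsed.strftime("%B,%d,%Y")


print(reformat_date("01/29/1991"))
```

Because strptime validates the input, a date such as 13/01/2000 raises a ValueError rather than being partially replaced.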
