I'm trying to remove Columbus Day from pandas.tseries.holiday.USFederalHolidayCalendar.
This seems to be possible, as a one-time operation, with
from pandas.tseries.holiday import USFederalHolidayCalendar
cal = USFederalHolidayCalendar()
cal.rules.pop(6)  # removes Columbus Day (index 6 in this list)
However, if this code is within a function that gets called repeatedly (in a loop) to generate several independent outputs, I get the following error:
IndexError: pop index out of range
It looks as if the object stays in its initially loaded state: as the loop progresses, it keeps popping the holiday at index 6 until none are left, and then throws the error.
I tried reloading via importlib.reload to no avail.
Any idea what I'm doing wrong?
# Import the base calendar
from pandas.tseries.holiday import USFederalHolidayCalendar
# Create your own class that inherits from USFederalHolidayCalendar
# (a new name avoids shadowing the original class)
class NoColumbusCalendar(USFederalHolidayCalendar):
    # Build a new list without the Columbus Day entry;
    # the parent's shared list is left untouched
    rules = [rule for rule in USFederalHolidayCalendar.rules
             if rule.name != 'Columbus Day']
# Create an object from your class
cal = NoColumbusCalendar()
print(cal.rules)
[Holiday: New Years Day (month=1, day=1, observance=<function nearest_workday at 0x7f6afad571f0>),
Holiday: Martin Luther King Jr. Day (month=1, day=1, offset=<DateOffset: weekday=MO(+3)>),
Holiday: Presidents Day (month=2, day=1, offset=<DateOffset: weekday=MO(+3)>),
Holiday: Memorial Day (month=5, day=31, offset=<DateOffset: weekday=MO(-1)>),
Holiday: July 4th (month=7, day=4, observance=<function nearest_workday at 0x7f6afad571f0>),
Holiday: Labor Day (month=9, day=1, offset=<DateOffset: weekday=MO(+1)>),
Holiday: Veterans Day (month=11, day=11, observance=<function nearest_workday at 0x7f6afad571f0>),
Holiday: Thanksgiving (month=11, day=1, offset=<DateOffset: weekday=TH(+4)>),
Holiday: Christmas (month=12, day=25, observance=<function nearest_workday at 0x7f6afad571f0>)]
The problem here is that rules is a class attribute (a list of Holiday objects). Here is the class definition from the pandas source (pandas/tseries/holiday.py):
class USFederalHolidayCalendar(AbstractHolidayCalendar):
    """
    US Federal Government Holiday Calendar based on rules specified by:
    https://www.opm.gov/policy-data-oversight/
       snow-dismissal-procedures/federal-holidays/
    """
    rules = [
        Holiday("New Years Day", month=1, day=1, observance=nearest_workday),
        USMartinLutherKingJr,
        USPresidentsDay,
        USMemorialDay,
        Holiday("July 4th", month=7, day=4, observance=nearest_workday),
        USLaborDay,
        USColumbusDay,
        Holiday("Veterans Day", month=11, day=11, observance=nearest_workday),
        USThanksgivingDay,
        Holiday("Christmas", month=12, day=25, observance=nearest_workday),
    ]
Since the attribute is defined on the class, there is only one underlying list, shared by every instance (including instances created later). Each pop through any instance removes an element from that one list, so every call of your function shrinks the same list until there is nothing left at index 6. Here is an example that shows what's going on:
>>> class A:
...     rules = [0, 1, 2]
...
>>> a1 = A()
>>> a2 = A()
>>> a1.rules.pop()
2
>>> a1.rules.pop()
1
>>> a2.rules.pop()
0
>>> a2.rules.pop()
IndexError: pop from empty list
>>> a3 = A()
>>> a3.rules
[]
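The usual fix, sketched here with the same toy class A, is to create the list inside __init__ so that each instance gets its own copy instead of sharing one class-level list:

```python
class A:
    def __init__(self):
        # Created anew for every instance, so instances no longer share one list
        self.rules = [0, 1, 2]

a1 = A()
a2 = A()
a1.rules.pop()
a1.rules.pop()
print(a1.rules)  # [0]
print(a2.rules)  # [0, 1, 2]
```

New instances always start with the full list, no matter how often earlier instances were modified.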
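Also, each Python module is executed only once per process: later imports reuse the cached module from sys.modules, so importing it again does not restore the list. importlib.reload does re-execute the module and creates a new class, but names you already bound with "from pandas.tseries.holiday import USFederalHolidayCalendar" still refer to the old class and its already-modified list.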
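For the original loop, one option is to leave the class attribute alone and hand each calendar a filtered copy. This is a sketch assuming the calendar constructor accepts a rules argument (AbstractHolidayCalendar's __init__ stores it on the instance) and that the holiday's name is "Columbus Day"; calendar_without is a hypothetical helper name. Filtering by name also avoids relying on index 6, since the contents of the rules list can differ between pandas versions.

```python
from pandas.tseries.holiday import USFederalHolidayCalendar


def calendar_without(holiday_name):
    # Filter into a *new* list; the shared class attribute is left untouched
    rules = [rule for rule in USFederalHolidayCalendar.rules
             if rule.name != holiday_name]
    # Passing rules= stores this list on the instance only
    return USFederalHolidayCalendar(rules=rules)


# Safe to call repeatedly, e.g. inside a loop
for year in (2021, 2022):
    cal = calendar_without("Columbus Day")
    days = cal.holidays(start=f"{year}-01-01", end=f"{year}-12-31")
    print(year, len(days))
```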
I have two APIs:
Australia API: works only for the years 1985 to 2024.
USA API: should be used only for years before 1985.
The script takes four inputs from the user:
- Start year
- End year
- Latitude
- Longitude
Sample command: python test.py -latitude '' -longitude '' -startYear '' -endYear ''
The user can enter input in three ways:
Case 1. Start year before 1985, end year after 1985 ----> both the Australia and USA APIs run.
Case 2. Start year 1985 or later, end year after 1985 ----> only the Australia API runs.
Case 3. Start year before 1985, end year before 1985 ----> only the USA API runs.
The problem is that I can't figure out how to write the code for Case 1, having already written the code for Case 2 (Australia API) and Case 3 (USA API).
import requests
import json
import argparse
import time
import pandas as pd
import warnings
warnings.filterwarnings("ignore")
parser = argparse.ArgumentParser(description="Process some integers.")
parser.add_argument("-latitude", help="Latitude (degrees)")
parser.add_argument("-longitude", help="Longitude (degrees)")
# type=int so that the years can be compared and passed to range()
parser.add_argument("-startYear", type=int, help="Start year")
parser.add_argument("-endYear", type=int, help="End year")
parser.add_argument("--verbose", help="display processing information")
start = time.time()
def main(latitude, longitude, startYear, endYear, verbose):
    parameters = {
        "latd": latitude,  # [deg]
        "latm": 00,        # [deg]
        "lats": 00,        # [deg]
        "lond": longitude, # [deg]
        "lonm": 00,        # [deg]
        "lons": 00,        # [deg]
        "elev": 00,        # [km]
        "year": None,      # [YYYY]
        "month": '07',     # [MM]
        "day": '01',       # [DD]
        "Ein": 'D'         # [Model]
    }
    hostname = "https://api.geomagnetism.ga.gov.au/agrf"
    hostname1 = "http://www.ngdc.noaa.gov/geomag-web/calculators/calculateDeclination?%s"
    df_1 = pd.DataFrame()
    for year in range(startYear, endYear):
        if (startYear >= 1985 and endYear > 1985):
            ...  # Case 2: Australia API (code elided in the question)
        elif (startYear < 1985 and endYear < 1985):
            ...  # Case 3: USA API (code elided in the question)
if endYear < 1985:
    if startYear < 1985:
        pass  # Case 3: only the USA API
    else:
        pass  # startYear >= 1985 but endYear < 1985: probably an input error
elif startYear < 1985:
    pass  # Case 1: both APIs
else:
    pass  # Case 2: only the Australia API
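if endYear >= 1985:
    australia_api()  # hypothetical helper that queries the Australia API
    if startYear < 1985:
        usa_api()    # hypothetical helper that queries the USA API
else:
    usa_api()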
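Another way to cover all three cases at once is to split the requested years into the part each API handles and simply skip an empty part. A sketch, assuming the end year is inclusive and 1985 is the first year the Australia API serves; split_years is a hypothetical helper name and the request code stays elided:

```python
CUTOFF = 1985  # first year served by the Australia API (from the question)


def split_years(start_year, end_year, cutoff=CUTOFF):
    """Return (usa_years, australia_years) as ranges; end_year is inclusive."""
    if start_year > end_year:
        raise ValueError("start year must not be after end year")
    usa_years = range(start_year, min(end_year + 1, cutoff))
    australia_years = range(max(start_year, cutoff), end_year + 1)
    return usa_years, australia_years


usa, aus = split_years(1980, 1990)
for year in usa:
    ...  # call the USA API for this year
for year in aus:
    ...  # call the Australia API for this year
```

An empty range makes its loop a no-op, so Case 2 and Case 3 fall out automatically and Case 1 runs both loops.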
Is there any way to print the numbers in real time instead of printing them one by one? I have 6 different countries:
china = 1399746872
india = 1368138206
USA = 327826334
Japan = 12649000
Russia = 146804372
Sweden = 10379295
I change these numbers in the script, but how do I print them so that I can see them change?
EDIT:
I want to overwrite this list every time it prints, so that I see the numbers go up.
Countries = []
china = 1399746872
india = 1368138206
USA = 327826334
Japan = 12649000
Russia = 146804372
Sweden = 10379295
Countries.append(china)
Countries.append(india)
Countries.append(USA)
Countries.append(Japan)
Countries.append(Russia)
Countries.append(Sweden)
print(Countries)
You could use os.system("cls") to clear the console between updates (cls is the Windows command; on Linux and macOS the equivalent is clear).
I made a little demo:
import os
import time
from random import randint
vals = {
    "china": 1399746872,
    "india": 1368138206,
    "USA": 327826334,
    "Japan": 12649000,
    "Russia": 146804372,
    "Sweden": 10379295
}
for _ in range(100):
    # clear console ("cls" on Windows, "clear" elsewhere)
    os.system("cls" if os.name == "nt" else "clear")
    # print values
    for k, v in vals.items():
        print(f"{k}: {v}")
    # renew values with random generated integers
    vals = {k: randint(0, 1000000) for k in vals}
    # sleep 5s
    time.sleep(5)
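If a single line is enough, you can also overwrite it in place without clearing the whole console: a carriage return ("\r") moves the cursor back to the start of the line. A sketch under that approach; format_counts and show_live are hypothetical names, and adding a random amount each step is only a stand-in for however your numbers really change:

```python
import sys
import time
from random import randint


def format_counts(vals):
    # One line such as "china: 1,399,746,872 | Sweden: 10,379,295"
    return " | ".join(f"{name}: {count:,}" for name, count in vals.items())


def show_live(vals, steps=3, delay=0.5):
    for _ in range(steps):
        # "\r" returns to the start of the line, so this write overwrites the
        # previous one instead of starting a new line
        sys.stdout.write("\r" + format_counts(vals))
        sys.stdout.flush()
        # Stand-in update rule: every count grows by a random amount
        vals = {name: count + randint(0, 1000) for name, count in vals.items()}
        time.sleep(delay)
    sys.stdout.write("\n")


if __name__ == "__main__":
    show_live({"china": 1399746872, "india": 1368138206, "USA": 327826334})
```

"\r" only rewinds the current line, and a shorter new line would leave old characters behind; that is harmless here because the counts only grow. For a multi-line table, clearing the screen as above is simpler.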
I have three branch locations and compute the current time in each city, but I don't know how to print the name of each city in front of its time. Could someone please help me?
Thanks
from datetime import datetime
from pytz import timezone
import pytz
portland_time = datetime.now(tz=pytz.UTC).replace(microsecond=0)
Portland = portland_time.astimezone(pytz.timezone('US/Pacific'))
new_york_time = portland_time.astimezone(timezone('US/Eastern'))
Ny = new_york_time
london_time = portland_time.astimezone(timezone('Europe/London'))
London = london_time
cities = {'Portland': Portland,
          'Ny': Ny,
          'London': London}
def branches():
    for city in cities:
        Branchtime = int(cities[city].strftime('%H'))
        if Branchtime >= 9 and Branchtime < 21:
            print(city, cities[city], 'OPEN')
        else:
            print(city, cities[city], 'CLOSED')
branches()
Do you mind using a dict instead of a list for your cities? If not, you can do this:
from datetime import datetime
from pytz import timezone
import pytz
portland_time = datetime.now(tz=pytz.UTC).replace(microsecond=0)
Portland = portland_time.astimezone(pytz.timezone('US/Pacific'))
new_york_time = portland_time.astimezone(timezone('US/Eastern'))
Ny = new_york_time
london_time = portland_time.astimezone(timezone('Europe/London'))
London = london_time
cities = {'Portland': Portland,
          'Ny': Ny,
          'London': London}
for city in cities:
    Branchtime = int(cities[city].strftime('%H'))
    if Branchtime >= 9 and Branchtime < 21:
        print(city, cities[city], 'OPEN')
    else:
        print(city, cities[city], 'CLOSED')
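Gives you (this output comes from a Python version older than 3.7; on 3.7+ dicts keep insertion order, so Portland prints first):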
Ny 2017-06-10 02:22:55-04:00 CLOSED
Portland 2017-06-09 23:22:55-07:00 CLOSED
London 2017-06-10 07:22:55+01:00 CLOSED
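On Python 3.9 or later, the standard-library zoneinfo module can replace pytz. A sketch under that assumption (zoneinfo needs the system time zone database, or the tzdata package on Windows); America/Los_Angeles and America/New_York are the canonical IANA names behind US/Pacific and US/Eastern, and branch_status and is_open are hypothetical helper names. Passing a fixed UTC time makes the output reproducible:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Branch name -> IANA time zone
BRANCHES = {
    "Portland": "America/Los_Angeles",
    "New York": "America/New_York",
    "London": "Europe/London",
}


def is_open(hour, opening=9, closing=21):
    # Open from 09:00 up to (but not including) 21:00 local time
    return opening <= hour < closing


def branch_status(now_utc=None):
    if now_utc is None:
        now_utc = datetime.now(timezone.utc).replace(microsecond=0)
    rows = []
    for city, tz_name in BRANCHES.items():
        local = now_utc.astimezone(ZoneInfo(tz_name))
        status = "OPEN" if is_open(local.hour) else "CLOSED"
        rows.append((city, local.strftime("%Y-%m-%d %H:%M:%S %Z"), status))
    return rows


for city, local_time, status in branch_status():
    # The city name comes first, then its local time and status
    print(f"{city:<10} {local_time}  {status}")
```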
I wanted to parse a text file that contains unstructured text. I need to get the address, date of birth, name, sex, and ID.
. 55 MORILLO ZONE VIII,
BARANGAY ZONE VIII
(POB.), LUISIANA, LAGROS
F
01/16/1952
ALOMO, TERESITA CABALLES
3412-00000-A1652TCA2
12
. 22 FABRICANTE ST. ZONE
VIII LUISIANA LAGROS,
BARANGAY ZONE VIII
(POB.), LUISIANA, LAGROS
M
10/14/1967
AMURAO, CALIXTO MANALO13
In the example above, the first 3 lines are the address, the line with just an "F" is the sex, the DOB is the line after "F", the name follows the DOB, the ID follows the name, and the 12 under the ID is the index/record no.
However, the format is not consistent. In the second group, the address is 4 lines instead of 3 and the index/record no. is appended after the name (if the person doesn't have an ID field).
I wanted to rewrite the text into the following format:
name, ID, address, sex, DOB
Here is a first stab at a pyparsing solution. Walk through the separate parts, following the interleaved comments.
data = """\
. 55 MORILLO ZONE VIII,
BARANGAY ZONE VIII
(POB.), LUISIANA, LAGROS
F
01/16/1952
ALOMO, TERESITA CABALLES
3412-00000-A1652TCA2
12
. 22 FABRICANTE ST. ZONE
VIII LUISIANA LAGROS,
BARANGAY ZONE VIII
(POB.), LUISIANA, LAGROS
M
10/14/1967
AMURAO, CALIXTO MANALO13
"""
from pyparsing import LineEnd, oneOf, Word, nums, Combine, restOfLine, \
    alphanums, Suppress, empty, originalTextFor, OneOrMore, alphas, \
    Group, ZeroOrMore
NL = LineEnd().suppress()
gender = oneOf("M F")
integer = Word(nums)
date = Combine(integer + '/' + integer + '/' + integer)
# define the simple line definitions
gender_line = gender("sex") + NL
dob_line = date("DOB") + NL
name_line = restOfLine("name") + NL
id_line = Word(alphanums+"-")("ID") + NL
recnum_line = integer("recnum") + NL
# define forms of address lines
first_addr_line = Suppress('.') + empty + restOfLine + NL
# a subsequent address line is any line that is not a gender definition
subsq_addr_line = ~(gender_line) + restOfLine + NL
# a line with a name and a recnum combined, if there is no ID
name_recnum_line = originalTextFor(OneOrMore(Word(alphas+',')))("name") + \
    integer("recnum") + NL
# defining the form of an overall record, either with or without an ID
record = Group((first_addr_line + ZeroOrMore(subsq_addr_line))("address") +
               gender_line +
               dob_line +
               ((name_line +
                 id_line +
                 recnum_line) |
                name_recnum_line))
# parse data
records = OneOrMore(record).parseString(data)
# output the desired results (note that address is actually a list of lines)
for rec in records:
    if rec.ID:
        print("%(name)s, %(ID)s, %(address)s, %(sex)s, %(DOB)s" % rec)
    else:
        print("%(name)s, , %(address)s, %(sex)s, %(DOB)s" % rec)
print()
# how to access the individual fields of the parsed record
for rec in records:
    print(rec.dump())
    print(rec.name, 'is', rec.sex)
    print()
Prints:
ALOMO, TERESITA CABALLES, 3412-00000-A1652TCA2, ['55 MORILLO ZONE VIII,', 'BARANGAY ZONE VIII', '(POB.), LUISIANA, LAGROS'], F, 01/16/1952
AMURAO, CALIXTO MANALO, , ['22 FABRICANTE ST. ZONE', 'VIII LUISIANA LAGROS,', 'BARANGAY ZONE VIII', '(POB.), LUISIANA, LAGROS'], M, 10/14/1967
['55 MORILLO ZONE VIII,', 'BARANGAY ZONE VIII', '(POB.), LUISIANA, LAGROS', 'F', '01/16/1952', 'ALOMO, TERESITA CABALLES', '3412-00000-A1652TCA2', '12']
- DOB: 01/16/1952
- ID: 3412-00000-A1652TCA2
- address: ['55 MORILLO ZONE VIII,', 'BARANGAY ZONE VIII', '(POB.), LUISIANA, LAGROS']
- name: ALOMO, TERESITA CABALLES
- recnum: 12
- sex: F
ALOMO, TERESITA CABALLES is F
['22 FABRICANTE ST. ZONE', 'VIII LUISIANA LAGROS,', 'BARANGAY ZONE VIII', '(POB.), LUISIANA, LAGROS', 'M', '10/14/1967', 'AMURAO, CALIXTO MANALO', '13']
- DOB: 10/14/1967
- address: ['22 FABRICANTE ST. ZONE', 'VIII LUISIANA LAGROS,', 'BARANGAY ZONE VIII', '(POB.), LUISIANA, LAGROS']
- name: AMURAO, CALIXTO MANALO
- recnum: 13
- sex: M
AMURAO, CALIXTO MANALO is M
You have to exploit whatever regularity and structure the text does have.
I suggest you read one line at a time and match it against regular expressions to determine its type, then fill in the appropriate field of a person object. Write out that object and start a new one whenever you get a field that you have already filled in.
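A sketch of that line-by-line approach, assuming the line patterns below (derived only from the two sample records) and a plain dict standing in for the person object; classify, parse_records and to_row are hypothetical names:

```python
import re

# Line classifiers (assumed patterns, based on the two sample records only)
LINE_TYPES = [
    ("address_start", re.compile(r"^\.\s*(.+)$")),          # ". 55 MORILLO ..."
    ("sex", re.compile(r"^([MF])$")),                       # "F" / "M"
    ("dob", re.compile(r"^(\d{2}/\d{2}/\d{4})$")),          # "01/16/1952"
    ("recnum", re.compile(r"^(\d+)$")),                     # "12"
    ("id", re.compile(r"^([0-9A-Z]+(?:-[0-9A-Z]+)+)$")),    # "3412-00000-A1652TCA2"
]
# A name with the record number glued on, e.g. "AMURAO, CALIXTO MANALO13"
NAME_WITH_RECNUM = re.compile(r"^(.*?[A-Z])(\d+)$")


def classify(line):
    for kind, pattern in LINE_TYPES:
        match = pattern.match(line)
        if match:
            return kind, match.group(1)
    return None, line


def parse_records(lines):
    records, person = [], {}

    def flush():
        # Write out the current person and start a new one
        if person:
            records.append(dict(person))
            person.clear()

    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        kind, value = classify(line)
        if kind == "address_start":
            flush()  # a "." line always begins a new person
            person["address"] = [value]
        elif kind is None and "dob" in person and "name" not in person:
            # An unclassified line after the DOB is the name (maybe + recnum)
            match = NAME_WITH_RECNUM.match(line)
            if match:
                person["name"], person["recnum"] = match.groups()
            else:
                person["name"] = line
        elif kind is None:
            person.setdefault("address", []).append(line)  # address continues
        else:
            if kind in person:
                flush()  # field already filled: this line starts a new person
            person[kind] = value
    flush()
    return records


def to_row(rec):
    # name, ID, address, sex, DOB (address lines joined with spaces)
    return ", ".join([rec.get("name", ""), rec.get("id", ""),
                      " ".join(rec.get("address", [])),
                      rec.get("sex", ""), rec.get("dob", "")])


SAMPLE = """\
. 55 MORILLO ZONE VIII,
BARANGAY ZONE VIII
(POB.), LUISIANA, LAGROS
F
01/16/1952
ALOMO, TERESITA CABALLES
3412-00000-A1652TCA2
12
. 22 FABRICANTE ST. ZONE
VIII LUISIANA LAGROS,
BARANGAY ZONE VIII
(POB.), LUISIANA, LAGROS
M
10/14/1967
AMURAO, CALIXTO MANALO13
"""

for rec in parse_records(SAMPLE.splitlines()):
    print(to_row(rec))
```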
It may be overkill, but the leading-edge machine learning algorithms for this type of problem are based on conditional random fields. For example, see "Accurate Information Extraction from Research Papers using Conditional Random Fields".
There is software out there that makes training these models relatively easy. See Mallet or CRF++.
You can probably do this with regular expressions without too much difficulty. If you have never used them before, check out the Python re module documentation, then fire up redemo.py, an interactive regex tester that shipped with older Python versions (on my computer, it's in c:\python26\Tools\scripts).
The first task is to split the flat file into a list of entities (one chunk of text per record). From the snippet of text you gave, you could split the file with a pattern matching the beginning of a line, where the first character is a dot:
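import re
# re.MULTILINE makes ^ match at the start of every line, not just the start of the file
re_entity_splitter = re.compile(r'^\.', re.MULTILINE)
with open(textfile) as f:  # textfile: path to your data file
    # drop the empty chunk that precedes the first record
    entities = [e for e in re_entity_splitter.split(f.read()) if e.strip()]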
Note that the dot must be escaped (it's a wildcard character by default). Note also the r before the pattern: it denotes a raw string, which spares you from escaping the backslashes themselves and so avoids the so-called 'backslash plague'.
Once you have the file split into individual people, picking out the gender and birthdate is a snap. Use these:
re_gender = re.compile(r'^[MF]$', re.MULTILINE)  # a line holding only M or F
re_birth_date = re.compile(r'\b\d\d/\d\d/\d{4}\b')  # MM/DD/YYYY with a 4-digit year
And away you go. You can paste the flat file into the redemo GUI and experiment with creating patterns to match what you need. You'll have it parsed in no time. Once you get good at this, you can use named groups, written (?P<name>...) (see the docs), to pick out individual elements quickly and cleanly.
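Combining the record split with named groups might look like this. It is a sketch that assumes the sex line is always immediately followed by the date-of-birth line, as in both sample records; SEX_AND_DOB and sex_and_dob are hypothetical names:

```python
import re

# The two sample records from the question
SAMPLE = """\
. 55 MORILLO ZONE VIII,
BARANGAY ZONE VIII
(POB.), LUISIANA, LAGROS
F
01/16/1952
ALOMO, TERESITA CABALLES
3412-00000-A1652TCA2
12
. 22 FABRICANTE ST. ZONE
VIII LUISIANA LAGROS,
BARANGAY ZONE VIII
(POB.), LUISIANA, LAGROS
M
10/14/1967
AMURAO, CALIXTO MANALO13
"""

# A new record starts at every line that begins with a dot
ENTITY_SPLITTER = re.compile(r"^\.", re.MULTILINE)
# Named groups: the sex line immediately followed by the date-of-birth line
SEX_AND_DOB = re.compile(r"^(?P<sex>[MF])\n(?P<dob>\d\d/\d\d/\d{4})$", re.MULTILINE)


def sex_and_dob(text):
    results = []
    for entity in ENTITY_SPLITTER.split(text):
        if not entity.strip():
            continue  # skip the empty chunk before the first dot
        match = SEX_AND_DOB.search(entity)
        if match:
            # groupdict() maps each group name to the text it matched
            results.append(match.groupdict())
    return results


for fields in sex_and_dob(SAMPLE):
    print(fields["sex"], fields["dob"])
```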
Here's a quick hack job.
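import re

def process(file):
    """Read and print one record; return False once the file is exhausted."""
    address = []
    sex = None
    for line in file:
        line = line.rstrip()  # to ignore \n
        if not line:
            continue
        if line in ('M', 'F'):
            sex = line
            break
        address.append(line.lstrip('. '))  # drop the leading ". " marker
    if sex is None:
        return False  # end of file: no more records
    DOB = file.readline().rstrip()
    name = file.readline().rstrip()
    if name[-1].isdigit():
        # no ID line: the record number is glued to the end of the name
        name = re.match(r'^(\D+)\d+$', name).group(1)
        ID = ''
    else:
        ID = file.readline().rstrip()
        file.readline()  # ignore the record #
    print(name, ID, ' '.join(address), sex, DOB, sep=', ')
    return True

with open('data.txt') as f:
    while process(f):
        pass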