Parsing a fixed-width file in Python with Big Decimals - python

I have to parse the following file in python:
20100322;232400;1.355800;1.355900;1.355800;1.355900;0
20100322;232500;1.355800;1.355900;1.355800;1.355900;0
20100322;232600;1.355800;1.355800;1.355800;1.355800;0
I need to end up with the following variables (the first line is parsed as an example):
year = 2010
month = 03
day = 22
hour = 23
minute = 24
p1 = Decimal('1.355800')
p2 = Decimal('1.355900')
p3 = Decimal('1.355800')
p4 = Decimal('1.355900')
I have tried:
from decimal import Decimal
line = '20100322;232400;1.355800;1.355900;1.355800;1.355900;0'
year = line[:4]
month = line[4:6]
day = line[6:8]
hour = line[9:11]
minute = line[11:13]
p1 = Decimal(line[16:24])
p2 = Decimal(line[25:33])
p3 = Decimal(line[34:42])
p4 = Decimal(line[43:51])
print(year)
print(month)
print(day)
print(hour)
print(minute)
print(p1)
print(p2)
print(p3)
print(p4)
Which works fine, but I am wondering if there is an easier way to parse this (maybe using struct) to avoid having to count each position manually.

from decimal import Decimal
from datetime import datetime
line = "20100322;232400;1.355800;1.355900;1.355800;1.355900;0"
tokens = line.split(";")
dt = datetime.strptime(tokens[0] + tokens[1], "%Y%m%d%H%M%S")
decimals = [Decimal(string) for string in tokens[2:6]]
# datetime objects also have some useful attributes: dt.year, dt.month, etc.
print(dt, *decimals, sep="\n")
Output:
2010-03-22 23:24:00
1.355800
1.355900
1.355800
1.355900
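For a whole file rather than a single line, the same split-and-convert idea can be handed to the csv module. The following is only a sketch: the io.StringIO object stands in for an open file, and parse_rows is a hypothetical helper name, not something from the question.

```python
import csv
import io
from datetime import datetime
from decimal import Decimal

# Sample rows from the question; io.StringIO stands in for an open file handle.
sample = io.StringIO(
    "20100322;232400;1.355800;1.355900;1.355800;1.355900;0\n"
    "20100322;232500;1.355800;1.355900;1.355800;1.355900;0\n"
    "20100322;232600;1.355800;1.355800;1.355800;1.355800;0\n"
)

def parse_rows(handle):
    """Yield (datetime, list of four Decimals) for each non-empty row."""
    for row in csv.reader(handle, delimiter=";"):
        if not row:  # skip blank lines
            continue
        stamp = datetime.strptime(row[0] + row[1], "%Y%m%d%H%M%S")
        prices = [Decimal(field) for field in row[2:6]]
        yield stamp, prices

rows = list(parse_rows(sample))
```

Because Decimal is built from the original string, trailing zeros such as 1.355800 are preserved.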

You could use regex:
import re
to_parse = """
20100322;232400;1.355800;1.355900;1.355800;1.355900;0
20100322;232500;1.355800;1.355900;1.355800;1.355900;0
20100322;232600;1.355800;1.355800;1.355800;1.355800;0
"""
stx = re.compile(
r'(?P<date>(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2}));'
r'(?P<time>(?P<hour>\d{2})(?P<minute>\d{2})(?P<second>\d{2}));'
r'(?P<p1>[\.\-\d]*);(?P<p2>[\.\-\d]*);(?P<p3>[\.\-\d]*);(?P<p4>[\.\-\d]*)'
)
f = [{k:float(v) if 'p' in k else int(v) for k,v in a.groupdict().items()} for a in stx.finditer(to_parse)]
print(f)
Output:
[{'date': 20100322,
'day': 22,
'hour': 23,
'minute': 24,
'month': 3,
'p1': 1.3558,
'p2': 1.3559,
'p3': 1.3558,
'p4': 1.3559,
'second': 0,
'time': 232400,
'year': 2010},
{'date': 20100322,
'day': 22,
'hour': 23,
'minute': 25,
'month': 3,
'p1': 1.3558,
'p2': 1.3559,
'p3': 1.3558,
'p4': 1.3559,
'second': 0,
'time': 232500,
'year': 2010},
{'date': 20100322,
'day': 22,
'hour': 23,
'minute': 26,
'month': 3,
'p1': 1.3558,
'p2': 1.3558,
'p3': 1.3558,
'p4': 1.3558,
'second': 0,
'time': 232600,
'year': 2010}]
Here I stored everything in a list, but you could go through the results of finditer line by line if you don't want to keep everything in memory.
You can also replace float and/or int with Decimal if needed.
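A sketch of that variant, using a generator so matches are consumed one at a time and converting the price fields with Decimal instead of float. The names iter_records and PRICE_KEYS are made up for this illustration, and the price pattern is slightly stricter than the one above (it assumes an optional minus sign followed by digits, a dot, and more digits).

```python
import re
from decimal import Decimal

text = """
20100322;232400;1.355800;1.355900;1.355800;1.355900;0
20100322;232500;1.355800;1.355900;1.355800;1.355900;0
20100322;232600;1.355800;1.355800;1.355800;1.355800;0
"""

pattern = re.compile(
    r'(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2});'
    r'(?P<hour>\d{2})(?P<minute>\d{2})(?P<second>\d{2});'
    r'(?P<p1>-?\d+\.\d+);(?P<p2>-?\d+\.\d+);'
    r'(?P<p3>-?\d+\.\d+);(?P<p4>-?\d+\.\d+)'
)

# Hypothetical helper set: the groups that hold prices.
PRICE_KEYS = {"p1", "p2", "p3", "p4"}

def iter_records(source):
    """Yield one dict per matching line, without building a list."""
    for match in pattern.finditer(source):
        yield {
            key: Decimal(value) if key in PRICE_KEYS else int(value)
            for key, value in match.groupdict().items()
        }

first = next(iter_records(text))
```

Checking membership in an explicit set of price keys avoids relying on the letter 'p' appearing only in the price group names.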

Related

How to sum up rows within a dictionary?

I have a dictionary:
{
"account": "x*", 'amount': 300, 'day': 3, 'month': 'June',
"account": "y*", 'amount': 550, 'day': 9, 'month': 'May',
"account": 'z*', 'amount': -200, 'day': 21, 'month': 'June',
"account": "g", "amount": 80, "day": 10, "month": "May"
}
How do I find the total amount for each month June and May separately?
dictionary= sum(d["amount"] for d in my_dict)
You can filter which elements to sum, by adding an if statement at the end of the one-liner for-loop:
sum(d['amount'] for d in my_dict if d['month'] == month)
Then, we can wrap this line of code inside a small function to compute the results for May and June:
my_dict = [{'account': 'x*', 'amount': 300, 'day': 3, 'month': 'June'},
{'account': 'y*', 'amount': 550, 'day': 9, 'month': 'May' },
{'account': 'z*', 'amount': -200, 'day': 21, 'month': 'June'},
{'account': 'g' , 'amount': 80, 'day': 10, 'month': 'May' }]
get_sum = lambda my_dict, month: sum(d['amount'] for d in my_dict if d['month'] == month)
sum_June = get_sum(my_dict, 'June')
sum_May = get_sum(my_dict, 'May' )
print('sum_June:', sum_June)
# sum_June: 100
print('sum_May :', sum_May)
# sum_May : 630
PS. Initially, the dictionary my_dict was overwriting data, because every row reused the same keys in a single object. In the code above, my_dict is split into a list with one dictionary per row to avoid this issue. Consider storing your data this way in your project; it is a very common pattern.
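If totals are needed for every month that appears in the data, not just May and June, one pass with a defaultdict avoids calling the filter once per month. This is a sketch that assumes the same list-of-rows layout as above; rows and totals are names chosen for the example.

```python
from collections import defaultdict

rows = [
    {'account': 'x*', 'amount': 300, 'day': 3, 'month': 'June'},
    {'account': 'y*', 'amount': 550, 'day': 9, 'month': 'May'},
    {'account': 'z*', 'amount': -200, 'day': 21, 'month': 'June'},
    {'account': 'g', 'amount': 80, 'day': 10, 'month': 'May'},
]

# defaultdict(int) starts every unseen month at 0.
totals = defaultdict(int)
for row in rows:
    totals[row['month']] += row['amount']

print(dict(totals))
```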

How to find a total year sales from a dictionary?

I have this dictionary, and when I code for it, I only have the answer for June, May, September. How would I code for the months that are not given in the dictionary? Obviously, I have zero for them.
{'account': 'Amazon', 'amount': 300, 'day': 3, 'month': 'June'}
{'account': 'Facebook', 'amount': 550, 'day': 5, 'month': 'May'}
{'account': 'Google', 'amount': -200, 'day': 21, 'month': 'June'}
{'account': 'Amazon', 'amount': -300, 'day': 12, 'month': 'June'}
{'account': 'Facebook', 'amount': 130, 'day': 7, 'month': 'September'}
{'account': 'Google', 'amount': 250, 'day': 27, 'month': 'September'}
{'account': 'Amazon', 'amount': 200, 'day': 5, 'month': 'May'}
The method I used for months mentioned in the dictionary:
year_balance = sum(d["amount"] for d in my_dict)
print(f"The total year balance is {year_balance} $.")
import calendar
months = calendar.month_name[1:]
results = dict(zip(months, [0]*len(months)))
for d in data:
    results[d["month"]] += d["amount"]
# then you have results dict with monthly amounts
# sum everything to get yearly total
total = sum(results.values())
This might help:
from collections import defaultdict
mydict = defaultdict(lambda: 0)
print(mydict["January"])
Also, given the comments you have written, is this what you are looking for?
your_list_of_dicts = [
{"January": 3, "March": 5},
{"January": 3, "April": 5}
]
import calendar
months = calendar.month_name[1:]
month_totals = dict()
for month in months:
    month_totals[month] = 0
    for d in your_list_of_dicts:
        month_totals[month] += d[month] if month in d else 0
print(month_totals)
{'January': 6, 'February': 0, 'March': 5, 'April': 5, 'May': 0, 'June': 0, 'July': 0, 'August': 0, 'September': 0, 'October': 0, 'November': 0, 'December': 0}
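The defaultdict hint can also be applied directly to the rows from the question. The sketch below assumes those rows are collected in a list called data (the question does not show the variable) and that calendar.month_name returns English names, which holds under the default locale.

```python
import calendar
from collections import defaultdict

# Rows from the question, assumed to live in a list named `data`.
data = [
    {'account': 'Amazon', 'amount': 300, 'day': 3, 'month': 'June'},
    {'account': 'Facebook', 'amount': 550, 'day': 5, 'month': 'May'},
    {'account': 'Google', 'amount': -200, 'day': 21, 'month': 'June'},
    {'account': 'Amazon', 'amount': -300, 'day': 12, 'month': 'June'},
    {'account': 'Facebook', 'amount': 130, 'day': 7, 'month': 'September'},
    {'account': 'Google', 'amount': 250, 'day': 27, 'month': 'September'},
    {'account': 'Amazon', 'amount': 200, 'day': 5, 'month': 'May'},
]

totals = defaultdict(int)
for row in data:
    totals[row['month']] += row['amount']

# Reading a missing key from a defaultdict(int) yields 0,
# so months without any rows are reported as zero.
by_month = {month: totals[month] for month in calendar.month_name[1:]}
year_balance = sum(by_month.values())
print(by_month)
print(year_balance)
```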
You can read the following blog regarding the usage of dictionaries and how to perform calculations.
5 best ways to sum dictionary values in python
This is one of the examples given in the blog.
wages = {'01': 910.56, '02': 1298.68, '03': 1433.99, '04': 1050.14, '05': 877.67}
total = sum(wages.values())
print('Total Wages: ${0:,.2f}'.format(total))
Here is the result with 100,000 records.

consecutive days of login count python

I am getting my time in day and month format like this:
final =[{'day': 29, 'month': 5},{'day': 30, 'month': 5},{'day': 1, 'month': 6},{'day': 2, 'month': 6},{'day': 3, 'month': 6},{'day': 4, 'month': 6},{'day': 5, 'month': 6},{'day': 6, 'month': 6}, {'day': 7, 'month': 6}, {'day': 8, 'month': 6}, {'day': 9, 'month': 6}]
I want to count the consecutive days in this array, going back from today, to keep track of how many days in a row the user was last online. If the previous day exists, it adds 1 to the total count. For example, given {'day': 5, 'month': 6},{'day': 8, 'month': 6}, {'day': 9, 'month': 6}
there is a gap after day 5, so my count will be 2.
The issue is what happens when the streak runs into the previous month: if that month is month 5 and its last logged day is 30, how do I add those days to my count?
For now, I am doing this:
import time
import datetime
from calendar import monthrange
# get today's day, month and year
today_time = int(time.time())
today_time_day = datetime.datetime.fromtimestamp(today_time)
# used to check where the previous month ends
month = monthrange(today_time_day.year, today_time_day.month)
print(month)
i = 0
streak = 0
for x in reversed(final):
    if today_time_day.day - i == x['day']:
        streak += 1
    else:
        streak = 1
        break
    i += 1
print(streak)
I am trying to calculate the streak, but the answer is wrong, and I am not sure how to include the days from the previous month.
So the answer is that we need to keep track of the previous month's day count and reset the loop counter:
import time
import datetime
from calendar import monthrange
today_time = int(time.time())
today_time_day = datetime.datetime.fromtimestamp(today_time)
final =[{'day': 29, 'month': 5},{'day': 28, 'month': 5},{'day': 1, 'month': 6},{'day': 2, 'month': 6},{'day': 3, 'month': 6},{'day': 4, 'month': 6},{'day': 5, 'month': 6},{'day': 6, 'month': 6}, {'day': 7, 'month': 6}, {'day': 8, 'month': 6}, {'day': 9, 'month': 6}]
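month = monthrange(today_time_day.year, today_time_day.month)
i = 0
current_day = today_time_day.day
streak = 0
for x in reversed(final):
    # once the countdown passes day 1, continue from the end of the month
    # (note: month[1] is the length of the current month; the previous
    # month may have a different number of days)
    if current_day - i == 0 and today_time_day.month - 1 == x["month"]:
        current_day = month[1]
        i = 0
    if current_day - i == x['day']:
        streak += 1
    else:
        break
    i += 1
print(streak)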
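An alternative that sidesteps the month-length bookkeeping is to turn each entry into a datetime.date and step backwards one day at a time with timedelta. This is only a sketch under stated assumptions: the entries carry no year, so it assumes they all fall in the same year as today (a streak reaching back from January into December is not handled), login_streak is a hypothetical helper name, and the date 9 June 2021 is made up for the demonstration.

```python
from datetime import date, timedelta

def login_streak(entries, today):
    """Count consecutive logged days ending at `today`.

    Assumes every entry belongs to the same year as `today`.
    """
    seen = {date(today.year, e["month"], e["day"]) for e in entries}
    streak = 0
    current = today
    while current in seen:
        streak += 1
        current -= timedelta(days=1)  # handles month boundaries automatically
    return streak

final = [{'day': 29, 'month': 5}, {'day': 30, 'month': 5}, {'day': 1, 'month': 6},
         {'day': 2, 'month': 6}, {'day': 3, 'month': 6}, {'day': 4, 'month': 6},
         {'day': 5, 'month': 6}, {'day': 6, 'month': 6}, {'day': 7, 'month': 6},
         {'day': 8, 'month': 6}, {'day': 9, 'month': 6}]

# Hypothetical "today" so the result does not depend on when this runs.
streak = login_streak(final, date(2021, 6, 9))
print(streak)
```

With this data the streak stops at 1 June because 31 May is not in the list. Using a set for the lookup also means the input does not need to be sorted.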

Lowercase dictionary items within a list python

I am trying to lowercase all the keys in a dictionary(s) that are within a list. I actually have a code that prints the lowercase output I want within a for loop. I'm using a dictionary comprehension to lowercase, but I'm not sure how to append the changed values to my list.
amdardict = [{'1031': 98, '1032': 1, '33007': 70, 'AIRCRAFT_FLIGHT_NUMBER': 'CNFNXQ', 'DAY': 5, 'HEIGHT_OR_ALTITUDE': 1490.0, 'HOUR': 0, 'LATITUDE': 39.71, 'LONGITUDE': -41.79, 'MINUTE': 0, 'MONTH': 10, 'PHASE_OF_AIRCRAFT_FLIGHT': 5, 'TEMPERATURE_DRY_BULB_TEMPERATURE': 289.0, 'WIND_DIRECTION': 219, 'WIND_SPEED': 3.0, 'YEAR': 2019},
{'12101': 248.75, '4006': 55, '7010': 6135, '8009': 3, 'aircraft_flight_number': '????????', 'aircraft_registration_number_or_other_identification': 'AU0155', 'aircraft_tail_number': '??????', 'day': 5, 'destination_airport': '???', 'hour': 0, 'latitude': -34.3166, 'longitude': 151.9333, 'minute': 8, 'month': 10, 'observation_sequence_number': 64, 'origination_airport': '???', 'wind_direction': 208, 'wind_speed': 23.0, 'year': 2019}
]
for d in amdardict: print(dict((k.lower(), v) for k, v in d.items()))
Why modify the original list? Can you create a new empty list and slightly modify your code to append to that new list instead of printing:
new_list = []
for d in amdardict:
    new_list.append(dict((k.lower(), v) for k, v in d.items()))
To change the keys in-place, you can use the dict.pop method.
>>> # Copy the list in case we make a mistake
>>> import copy
>>> backup = copy.deepcopy(amdardict)
>>> for d in amdardict:
...     # Make a list of keys() because we can't loop over keys()
...     # and change keys simultaneously
...     for k in list(d.keys()):
...         if not k.islower():
...             # pop removes the key from the dict and returns the value
...             d[k.lower()] = d.pop(k)
...
>>> amdardict
[{'aircraft_flight_number': 'CNFNXQ', 'day': 5, 'height_or_altitude': 1490.0, 'temperature_dry_bulb_temperature': 289.0, 'wind_direction': 219, 'wind_speed': 3.0, 'year': 2019, 'hour': 0, 'latitude': 39.71, 'longitude': -41.79, 'minute': 0, 'month': 10, 'phase_of_aircraft_flight': 5, '1031': 98, '1032': 1, '33007': 70}, {'aircraft_flight_number': '????????', 'aircraft_registration_number_or_other_identification': 'AU0155', 'aircraft_tail_number': '??????', 'day': 5, 'destination_airport': '???', 'hour': 0, 'latitude': -34.3166, 'longitude': 151.9333, 'minute': 8, 'month': 10, 'observation_sequence_number': 64, 'origination_airport': '???', 'wind_direction': 208, 'wind_speed': 23.0, 'year': 2019, '12101': 248.75, '4006': 55, '7010': 6135, '8009': 3}]
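If a new list is acceptable, the loop-and-append version can be collapsed into a nested comprehension. The sketch below uses a small made-up sample (records) instead of the full AMDAR data, just to keep it short.

```python
# Small hypothetical sample standing in for amdardict.
records = [
    {"YEAR": 2019, "DAY": 5, "1031": 98},
    {"year": 2019, "Wind_Speed": 23.0},
]

# One new dict per record, with every key lowercased; values are untouched.
lowered = [
    {key.lower(): value for key, value in record.items()}
    for record in records
]

print(lowered)
```

The original dictionaries are left unchanged. If two keys in one dictionary differ only by case, the one that comes later wins.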

Python list of dictionaries: Get last item of each day

I have a list of dictionaries, ordered by the key date:
d = [{'date': datetime.strptime('2016-01-01 07:00', "%Y-%m-%d %H:%M"), 'val': 1},
{'date': datetime.strptime('2016-01-01 23:00', "%Y-%m-%d %H:%M"), 'val': 3},
{'date': datetime.strptime('2016-01-02 07:00', "%Y-%m-%d %H:%M"), 'val': 5},
{'date': datetime.strptime('2016-01-02 22:13', "%Y-%m-%d %H:%M"), 'val': 7},
{'date': datetime.strptime('2016-01-02 23:00', "%Y-%m-%d %H:%M"), 'val': 9},
{'date': datetime.strptime('2016-01-03 00:10', "%Y-%m-%d %H:%M"), 'val': 17},
{'date': datetime.strptime('2016-01-03 09:12', "%Y-%m-%d %H:%M"), 'val': 25},
{'date': datetime.strptime('2016-01-03 21:52', "%Y-%m-%d %H:%M"), 'val': 37}]
I want to get the last (latest) item of each day, so in this case it would be:
{'date': datetime.strptime('2016-01-01 23:00', "%Y-%m-%d %H:%M"), 'val': 3},
{'date': datetime.strptime('2016-01-02 23:00', "%Y-%m-%d %H:%M"), 'val': 9},
{'date': datetime.strptime('2016-01-03 21:52', "%Y-%m-%d %H:%M"), 'val': 37},
I have the following piece of code which does the trick:
previous_item = None
wanted_data = []
for index, entry in enumerate(d):
    if not previous_item:
        previous_item = entry
        continue
    if entry['date'].date() != previous_item['date'].date():
        wanted_data.append(previous_item)
    previous_item = entry
    # Add the last item as well
    if index + 1 == len(d):
        wanted_data.append(entry)
But I believe there are better and faster ways to do it. Besides, that's pretty ugly.
Is there a more Pythonic way to achieve this?
Thanks!
Assuming that the data is already sorted by 'date' (it seems to be in your case), you can use itertools.groupby to group by the date(), and then get the last item from each group.
>>> import itertools
>>> d = sorted(d, key=lambda x: x["date"]) # only if not already sorted
>>> groups = itertools.groupby(d, lambda x: x["date"].date())
>>> wanted_data = [list(grp)[-1] for key, grp in groups]
>>> wanted_data
[{'date': datetime.datetime(2016, 1, 1, 23, 0), 'val': 3},
{'date': datetime.datetime(2016, 1, 2, 23, 0), 'val': 9},
{'date': datetime.datetime(2016, 1, 3, 21, 52), 'val': 37}]
Note that this will expand each of the groups into a list. If this is too expensive, because there are very many entries for each date, you could create a function to get the last entry from an iterator, e.g. using reduce (or functools.reduce in Python 3):
>>> import functools
>>> last = lambda x: functools.reduce(lambda x, y: y, x)
>>> # groupby returns a single-use iterator, so build it again before reuse
>>> groups = itertools.groupby(d, lambda x: x["date"].date())
>>> wanted_data = [last(grp) for key, grp in groups]
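Another option, sketched here rather than taken from the answer above, is a plain dict keyed by calendar date: since the list is sorted, each later entry for the same day overwrites the earlier one. It relies on dicts preserving insertion order (Python 3.7+), and last_per_day is a name chosen for the example.

```python
from datetime import datetime

fmt = "%Y-%m-%d %H:%M"
d = [{'date': datetime.strptime('2016-01-01 07:00', fmt), 'val': 1},
     {'date': datetime.strptime('2016-01-01 23:00', fmt), 'val': 3},
     {'date': datetime.strptime('2016-01-02 07:00', fmt), 'val': 5},
     {'date': datetime.strptime('2016-01-02 22:13', fmt), 'val': 7},
     {'date': datetime.strptime('2016-01-02 23:00', fmt), 'val': 9},
     {'date': datetime.strptime('2016-01-03 00:10', fmt), 'val': 17},
     {'date': datetime.strptime('2016-01-03 09:12', fmt), 'val': 25},
     {'date': datetime.strptime('2016-01-03 21:52', fmt), 'val': 37}]

# Assumes d is sorted by 'date': the last assignment for each day wins.
last_per_day = {}
for entry in d:
    last_per_day[entry['date'].date()] = entry

wanted_data = list(last_per_day.values())
print(wanted_data)
```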
