histogram indexed by month and day [duplicate] - python

This question already has an answer here:
Plotting histogram of list of tuplets matplotlib
(1 answer)
Closed 4 years ago.
I'm trying to create a histogram counting how often each month/day pair occurs. I have a list of (month, day, year) tuples:
date_patterns = [(12,1,1992), (1,4,1993), (1,5,1993),
(1,6,1993), (1,4,1994), (1,5,1994),
(2,9,1995), (3,4,1995), (1,4,1996)]
I'd like this histogram indexed by just the month and day so:
(12,1) = 1
(1,4) = 3
(1,5) = 2
(1,6) = 1
(2,9) = 1
(3,4) = 1

import itertools
date_patterns = [(12,1,1992), (1,4,1993), (1,5,1993),
                 (1,6,1993), (1,4,1994), (1,5,1994),
                 (2,9,1995), (3,4,1995), (1,4,1996)]
# sort so equal (month, day) pairs are adjacent, group by (month, day), then count the length of each group
groups = [(k, len(list(g))) for k, g in itertools.groupby(sorted(date_patterns), lambda x: (x[0], x[1]))]
print(groups)
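As an alternative sketch (not part of the original answer), collections.Counter can count the (month, day) keys directly, without sorting first; the variable names below are illustrative.

```python
from collections import Counter

date_patterns = [(12, 1, 1992), (1, 4, 1993), (1, 5, 1993),
                 (1, 6, 1993), (1, 4, 1994), (1, 5, 1994),
                 (2, 9, 1995), (3, 4, 1995), (1, 4, 1996)]

# Drop the year and count each (month, day) key; no pre-sorting is needed.
counts = Counter((month, day) for month, day, _year in date_patterns)

# Print the histogram in sorted key order, e.g. "(1, 4) = 3".
for key, n in sorted(counts.items()):
    print(key, "=", n)
```

Counter is a dict subclass, so counts[(1, 4)] gives the bin value directly and the result can be passed to a plotting library as keys and values.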


I'm getting a different output than expected when using df.loc to change some values of the df

I have a DataFrame, and I want to assign a quartile number based on the quartile variable, which holds the cut-off values I later use in the for loop. The problem is that instead of just changing the quartile number, it is creating n new rows (n being the length of the DataFrame), and then using the row number for the loop.
quartile = numpy.quantile(pivot['AHT'], [0.25,0.5,0.75])
pivot['Quartile'] = 0
for i in range(0,len(pivot)-1):
    if i <= quartile[0]:
        pivot.loc[i,'Quartile'] = 1
    elif i <= quartile[1]:
        pivot.loc[i,'Quartile'] = 2
    elif i <= quartile[2]:
        pivot.loc[i,'Quartile'] = 3
    else:
        pivot.loc[i,'Quartile'] = 4
Use pd.qcut with labels=False and add 1, or pass the label values as a list:
pivot['Quartile'] = pd.qcut(pivot['AHT'], 4, labels=False) + 1
pivot['Quartile'] = pd.qcut(pivot['AHT'], 4, labels=[1,2,3,4])
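The loop in the question compares the row position i, not the row's AHT value, against the cut-offs; in addition, pivot.loc[i, ...] with a label that is not in the index appends a new row instead of updating one. A minimal NumPy-only sketch of the corrected comparison, using made-up AHT values as a stand-in for pivot['AHT']:

```python
import numpy as np

# Hypothetical AHT values standing in for pivot['AHT'].
aht = np.array([120, 340, 95, 410, 220, 180, 505, 60])

# Same cut-offs the question computes.
quartile = np.quantile(aht, [0.25, 0.5, 0.75])

# Compare each row's AHT *value* (not its position) with the cut-offs.
# With right=True, digitize returns 0 for v <= q25, 1 for q25 < v <= q50,
# 2 for q50 < v <= q75 and 3 above q75, matching the if/elif chain; +1 gives 1..4.
labels = np.digitize(aht, quartile, right=True) + 1
print(labels)
```

Assigning such an array with pivot['Quartile'] = labels writes by position, so no rows are added; pd.qcut from the answer should produce the same labels here, since it also uses right-closed bins at these quantiles.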

Python regex to find connected digits [duplicate]

This question already has answers here:
How to efficiently parse fixed width files?
(11 answers)
Closed 1 year ago.
I have raw .txt files and need to use a regex to extract each space-separated number.
The data format looks like this:
   6   3   1   0
   7   3   1   0
   8   35002   0
   9   34104   0
My regex is:
(?P<COORD>\d+)
The matches for the first two lines are (6, 3, 1, 0) and (7, 3, 1, 0), which are correct.
However, it doesn't work for the last two lines: their output is (8, 35002, 0) and (9, 34104, 0), but the correct grouping should be (8, 3, 5002, 0) and (9, 3, 4104, 0). How can I solve this?
If the numbers are aligned and the columns have a fixed width, you can slice each line into fixed-size chunks instead:
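width = 4  # each column is 4 characters wide
for line in lines:  # lines: the raw lines of the file, e.g. from f.readlines()
    line = line.rstrip("\n")  # drop the newline so it doesn't become an empty column
    columns = [line[j:j + width] for j in range(0, len(line), width)]
    numbers = [int(col.strip()) for col in columns]
    print(numbers)
    # or, as a one-liner:
    # print([int(line[j:j + width].strip()) for j in range(0, len(line), width)])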
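Since the question asks for a regex, here is a sketch that keeps the fixed-width idea but lets the re module do the slicing. It assumes every field is exactly 4 characters wide and right-aligned; WIDTH and parse_fixed_width are hypothetical names.

```python
import re

WIDTH = 4  # assumed fixed column width
# Greedy ".{1,4}" consumes the line 4 characters at a time.
field_re = re.compile(r".{1,%d}" % WIDTH)

def parse_fixed_width(line):
    """Split one fixed-width line into integers."""
    line = line.rstrip("\n")
    # int() tolerates the leading padding spaces in each field.
    return [int(field) for field in field_re.findall(line)]

sample = ["   6   3   1   0\n", "   8   35002   0\n", "   9   34104   0\n"]
for line in sample:
    print(parse_fixed_width(line))
```

A pattern such as \d+ cannot work here, because 35002 contains no separator between the two fields; only the column positions carry that information.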

Accumulated sum of 2D array [duplicate]

This question already has answers here:
Multidimensional cumulative sum in numpy
(3 answers)
Closed 2 years ago.
Suppose I have a 2D NumPy array like the one below:
dat = np.array([[1,2],[3,4],[5,6],[7,8]])
I want to get a new array in which each row equals the sum of itself and all previous rows, like the following:
first row: [1,2]
second row: [1,2] + [3,4] = [4,6]
third row: [4,6] + [5,6] = [9,12]
fourth row: [9,12] + [7,8] = [16,20]
So the array would be:
dat = np.array([[1,2],[4,6],[9,12],[16,20]])
np.cumsum is what you are looking for:
dat = np.array([[1,2],[3,4],[5,6],[7,8]])
result = np.cumsum(dat, axis=0)
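To make the role of axis explicit, this sketch compares np.cumsum(..., axis=0) with an explicit loop implementing the same recurrence (row i becomes row i-1 plus the original row i), and shows what axis=1 would do instead:

```python
import numpy as np

dat = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])

# axis=0 accumulates down the rows.
by_rows = np.cumsum(dat, axis=0)

# Explicit loop for the same recurrence: manual[i] = manual[i-1] + dat[i].
manual = dat.copy()
for i in range(1, len(manual)):
    manual[i] += manual[i - 1]

# axis=1 would instead accumulate across the columns of each row.
by_cols = np.cumsum(dat, axis=1)

print(by_rows)
```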

Counting string occurrences in list

I'm trying to work on a simple Python problem hosted on HackerRank, but I'm having difficulty with the count method for lists in Python. I've tried multiple test cases, but my count always returns 0.
Objective: count the number of segments of month consecutive squares whose sum equals day.
Is this an issue with the type of list? Is there an easier way for me to count the values in one line instead of having to check the values pairwise and then count the sums?
import sys
def solve(size, squares, day, month):
    check = [sum(squares[nums:nums+month]) == day for nums in range(0,len(squares))]
    print (check) #Test list output
    count = check.count('True')
    return count
#Test Cases 1
# size = 6
# squares = [1,1,1,1,1,1]
# day, month = (3,2)
#Output 0
#Test Cases 2
# size = 1
# squares = [4]
# day, month = (4,1)
#Output 1
#Test Cases 3
size = 5
squares = [1,2,1,3,2]
day, month = [3,2]
#Output 2
#Custom User Input:
# size = int(input().strip())
# squares = list(map(int, input().strip().split(' ')))
# day, month = input().strip().split(' ')
# day, month = [int(day), int(month)]
result = solve(size, squares, day, month)
print(result)
check.count('True')
This code counts occurrences of the string 'True', but check contains booleans (True/False), so the count is always 0.
It should instead be:
check.count(True)
You could also simply use this, since True counts as 1 when summed:
sum(check)
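For the one-line version the question asks about, here is a minimal sketch. It assumes the intended rule is "windows of exactly month squares"; the original range(0, len(squares)) also checks shorter windows at the end of the list, which can produce false matches. count_segments is a hypothetical name, and it drops the unused size parameter.

```python
def count_segments(squares, day, month):
    # Sum the booleans directly; each True contributes 1.
    # Stopping at len(squares) - month + 1 skips partial windows at the end.
    return sum(sum(squares[i:i + month]) == day
               for i in range(len(squares) - month + 1))

print(count_segments([1, 2, 1, 3, 2], 3, 2))
```

If month is larger than the list, the range is empty and the function returns 0.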

How to reduce a collection of ranges to a minimal set of ranges [duplicate]

This question already has answers here:
Union of multiple ranges
(5 answers)
Closed 7 years ago.
I'm trying to remove overlapping values from a collection of ranges.
The ranges are represented by a string like this:
499-505 100-115 80-119 113-140 500-550
I want the above to be reduced to two ranges: 80-140 499-550. That covers all the values without overlap.
Currently I have the following code.
cr = "100-115 115-119 113-125 80-114 180-185 500-550 109-120 95-114 200-250".split(" ")
ar = []
br = []
for i in cr:
    (left,right) = i.split("-")
    ar.append(left);
    br.append(right);
inc = 0
for f in br:
    i = int(f)
    vac = []
    jnc = 0
    for g in ar:
        j = int(g)
        if(i >= j):
            vac.append(j)
            del br[jnc]
        jnc += jnc
    print(vac)
    inc += inc
I split each range on - and store the lower and upper limits in ar and br. I iterate over these limits pairwise, and if i is at least as great as j, I want to delete the element. But the program doesn't work. For the input in my code, I expect it to produce this result: 80-125 500-550 200-250 180-185
For a quick and short solution:
from operator import itemgetter
from itertools import groupby
cr = "499-505 100-115 80-119 113-140 500-550".split(" ")
fullNumbers = []
for i in cr:
    a = int(i.split("-")[0])
    b = int(i.split("-")[1])
    fullNumbers += range(a, b + 1)
# Remove duplicates and sort it
fullNumbers = sorted(set(fullNumbers))
# Taken From http://stackoverflow.com/questions/2154249
def convertToRanges(data):
    result = []
    # consecutive integers share the same (index - value), so they fall into one group
    for k, g in groupby(enumerate(data), lambda ix: ix[0] - ix[1]):
        group = list(map(itemgetter(1), g))
        result.append(str(group[0]) + "-" + str(group[-1]))
    return result
print(convertToRanges(fullNumbers))
#Output: ['80-140', '499-550']
For the input in your program, the output is ['80-125', '180-185', '200-250', '500-550'].
The main drawback of this solution is that it may not scale: it builds a list of every integer covered by every range, so time and memory grow with the total size of the ranges.
Let me offer another solution whose running time doesn't depend on the sizes of the ranges, only on how many there are: sorting the n ranges takes O(n log n), and merging them is a single linear pass.
def reduce(range_text):
    parts = range_text.split()
    if parts == []:
        return ''
    ranges = [tuple(map(int, part.split('-'))) for part in parts]
    ranges.sort()
    new_ranges = []
    left, right = ranges[0]
    for r in ranges[1:]:
        next_left, next_right = r
        if right + 1 < next_left:  # Is the next range to the right?
            new_ranges.append((left, right))  # Close the current range.
            left, right = r  # Start a new range.
        else:
            right = max(right, next_right)  # Extend the current range.
    new_ranges.append((left, right))  # Close the last range.
    return ' '.join(['-'.join(map(str, r)) for r in new_ranges])
This function works by sorting the ranges, then scanning them in order and merging each range into the current one when they overlap or are adjacent (so 1-5 and 6-10 would become 1-10).
Examples:
print(reduce('499-505 100-115 80-119 113-140 500-550'))
# => 80-140 499-550
print(reduce('100-115 115-119 113-125 80-114 180-185 500-550 109-120 95-114 200-250'))
# => 80-125 180-185 200-250 500-550
