Python regex to find connected digits [duplicate] - python

This question already has answers here:
How to efficiently parse fixed width files?
(11 answers)
Closed 1 year ago.
I have raw text files and need a regex that extracts each number, where the numbers are separated by spaces.
The data format looks like this:
6 3 1 0
7 3 1 0
8 35002 0
9 34104 0
My regex is:
(?P<COORD>\d+)
The matches for the first two lines are (6, 3, 1, 0) and (7, 3, 1, 0), which are correct.
However, it doesn't work for the last two lines, where the output is (8, 35002, 0) and (9, 34104, 0). The correct groupings should be (8, 3, 5002, 0) and (9, 3, 4104, 0). How can I solve this?

If the numbers are aligned and the column width is fixed, you can use:
width = 4
for line in lines:
    # slice the line into fixed-width chunks, then convert each chunk
    columns = [line[j:j + width] for j in range(0, len(line), width)]
    numbers = list(map(lambda x: int(x.strip()), columns))
    # or a one liner
    print(list(int(line[j:j+width].strip()) for j in range(0, len(line), width)))
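Here is a minimal, self-contained sketch of the same slicing idea. It assumes the original file has right-aligned columns that are each 4 characters wide (the spacing in the sample data is my reconstruction, not something stated in the question), and the helper name parse_fixed_width is made up for illustration.

```python
def parse_fixed_width(line, width=4):
    """Split a line into fixed-width fields and convert each to int."""
    # Drop the trailing newline so it does not form an empty last field.
    line = line.rstrip("\n")
    # int() ignores the padding spaces inside each field.
    return [int(line[start:start + width]) for start in range(0, len(line), width)]


# Assumed layout: every column is right-aligned in a 4-character field.
sample = [
    "   6   3   1   0",
    "   7   3   1   0",
    "   8   35002   0",
    "   9   34104   0",
]

rows = [parse_fixed_width(line) for line in sample]
for row in rows:
    print(row)
```

Slicing by position works where \d+ cannot, because adjacent fields such as "3" and "5002" have no separator between them once a value fills its whole column.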

Related

How to print a rectangle pattern using numbers in python

I am having trouble solving the following question:
Write a program that draws “modular rectangles” like the ones below. The user specifies the width and height of the rectangle, and the entries start at 0 and increase typewriter fashion from left to right and top to bottom, but are all done mod 10. Example: Below is an example of a 3 x 5 rectangle:
The following code is what I have tried to solve the problem:
I know it's bad but I still don't know how to print the 5 till 9 numbers on top of each other.
width = int(input("Enter the width of the rectangle:"))
height = int(input("Enter the height of the rectangle:"))
for x in range(0, width, 1):
    for y in range(0, height, 1):
        print(y, end = ' ')
    print()
Thank you all in advance.
You're on the right track, but you have a few issues. Firstly, you need to iterate over y before x, since you process the columns within each row, not the rows within each column. Secondly, you need to compute how many values you have already output, which you can do with (y*width+x). To output a single digit, take that value modulo 10. Finally, range(0, width, 1) is just the same as range(width). Putting it all together:
width = 5
height = 3
for y in range(height):
    for x in range(width):
        print((y*width+x)%10, end=' ')
    print()
Output:
0 1 2 3 4
5 6 7 8 9
0 1 2 3 4
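The same idea can be wrapped in a function so it is easy to try other sizes. This is just a sketch: the name modular_rectangle and the modulus parameter are my own additions, not part of the exercise.

```python
def modular_rectangle(width, height, modulus=10):
    """Return the rectangle as text, one row per line."""
    rows = []
    for y in range(height):
        # y * width + x is the count of entries written before this one.
        row = [(y * width + x) % modulus for x in range(width)]
        rows.append(" ".join(str(value) for value in row))
    return "\n".join(rows)


print(modular_rectangle(5, 3))
```

Returning a string instead of printing inside the loops makes the result easy to check or write to a file.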

histogram indexed by month and day [duplicate]

This question already has an answer here:
Plotting histogram of list of tuplets matplotlib
(1 answer)
Closed 4 years ago.
I'm trying to create a histogram of the number of month / day pairs. So, I have an array which consists of the following:
date_patterns = [(12,1,1992), (1,4,1993), (1,5,1993),
(1,6,1993), (1,4,1994), (1,5,1994),
(2,9,1995), (3,4,1995), (1,4,1996)]
I'd like this histogram indexed by just the month and day so:
(12,1) = 1
(1,4) = 3
(1,5) = 2
(1,6) = 1
(2,9) = 1
(3, 4) = 1
import itertools
date_patterns = [(12,1,1992), (1,4,1993), (1,5,1993),
                 (1,6,1993), (1,4,1994), (1,5,1994),
                 (2,9,1995), (3,4,1995), (1,4,1996)]
# sort the date patterns so equal (month, day) pairs are adjacent,
# group them by (month, day), then count the length of each group
groups = [(k, len(list(g))) for k, g in itertools.groupby(sorted(date_patterns), lambda x:(x[0], x[1]))]
print(groups)
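As an alternative sketch (not what the answer above uses), collections.Counter does the counting without needing the input sorted first:

```python
from collections import Counter

date_patterns = [(12, 1, 1992), (1, 4, 1993), (1, 5, 1993),
                 (1, 6, 1993), (1, 4, 1994), (1, 5, 1994),
                 (2, 9, 1995), (3, 4, 1995), (1, 4, 1996)]

# Key each date by (month, day) only; the year is ignored.
counts = Counter((month, day) for month, day, _year in date_patterns)

for (month, day), n in counts.items():
    print((month, day), "=", n)
```

groupby only merges adjacent equal keys, which is why the answer above has to sort first. Counter has no such requirement.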

Memory efficient way to read an array of integers from single line of input in python2.7

I want to read a single line of input containing integers separated by spaces.
Currently I use the following.
A = map(int, raw_input().split())
But now N is around 10^5, and I don't need the whole array of integers at once. I just need to read them one at a time, in the same order as the input.
Can you suggest an efficient way to do this in Python 2.7?
Use generators:
numbers = '1 2 5 18 10 12 16 17 22 50'
gen = (int(x) for x in numbers.split())
for g in gen:
    print g
1
2
5
18
10
12
16
17
22
50
The generator yields one integer at a time and does not build a list of integers. Note that numbers.split() still builds the full list of substrings first.
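One caveat with the snippet above is that split() creates every substring before the generator starts. A possible way around that is re.finditer, which scans the string lazily. The sketch below is written for Python 3, the function name iter_ints is made up, and the optional minus sign in the pattern is my assumption about the input.

```python
import re


def iter_ints(line):
    """Yield the integers in line one at a time, left to right."""
    # finditer returns matches lazily, so no list of tokens is built.
    for match in re.finditer(r"-?\d+", line):
        yield int(match.group())


for value in iter_ints("1 2 5 18 10 12 16 17 22 50"):
    print(value)
```

The input line itself still has to be held in memory. Only the intermediate list of tokens is avoided.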
You could parse the data a character at a time, which would reduce memory usage:
data = "1 50 30 1000 20 4 1 2"
number = []
numbers = []
for c in data:
    if c == ' ':
        # a space ends the current number, if there is one
        if number:
            numbers.append(int(''.join(number)))
            number = []
    else:
        number.append(c)
# flush the last number, which has no trailing space
if number:
    numbers.append(int(''.join(number)))
print numbers
Giving you:
[1, 50, 30, 1000, 20, 4, 1, 2]
Probably quite a bit slower though.
Alternatively, you could use itertools.groupby() to read groups of digits as follows:
from itertools import groupby
data = "1 50 30 1000 20 4 1 2"
numbers = []
for k, g in groupby(data, lambda c: c.isdigit()):
    if k:
        numbers.append(int(''.join(g)))
print numbers
If you're able to destroy the original string, str.split accepts a maxsplit parameter for the maximum number of splits, so you can peel off one number at a time.
See the docs for more details and examples.
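A rough sketch of what the maxsplit approach could look like, in Python 3 and with a made-up function name. Each split copies the remainder of the string, so it is probably slow for long inputs. Treat it as an illustration of the parameter rather than a recommendation.

```python
def iter_ints_maxsplit(line):
    """Yield integers by repeatedly splitting off the first token."""
    rest = line
    while True:
        # split(None, 1) splits on any whitespace, at most once.
        parts = rest.split(None, 1)
        if not parts:
            return  # nothing but whitespace is left
        yield int(parts[0])
        rest = parts[1] if len(parts) > 1 else ""


for value in iter_ints_maxsplit("1 50 30 1000 20 4 1 2"):
    print(value)
```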

what are the possible permutations of 8 digits

I need to know what are the possible permutations of 8 digits following the rules of my python code:
import itertools
import time
import string
numbers = set(range(10))
letters = set(string.ascii_letters)
mylist=[]
start=time.time()
comb = ([x for x in itertools.combinations([0,1,2,3,4,5,6,7,8,9,'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z'], 8)
if set(x) & letters and set(x) & numbers])
f=open("data.txt","w")
f.write("%s" % comb)
f.close()
end=time.time()
diff=end-start
print ("Got {} combinations.".format(len(comb)))
print ("Total time:",diff,"seconds")
There's a lot of them. To be clear:
Combinations of 123 for 2 digits are 12, 13, 23.
Permutations of 123 for 2 digits are 12, 13, 21, 23, 31, 32.
The number of combinations is smaller because order doesn't matter. Your code looks like it requires at least one digit and at least one letter in each 8-character combination, so you need the sum of:
Combinations of 1 digit times combinations of 7 letters.
Combinations of 2 digits times combinations of 6 letters.
etc...
Combinations of 7 digits times combinations of 1 letter.
Permutations should be 62 letters/numbers taken 8 at a time, minus the all-letter permutations of 52 letters taken 8 at a time, minus the all-number permutations of 10 numbers taken 8 at a time.
from math import factorial as f
def P(n,k):
    return f(n)//f(n-k)
def C(n,k):
    return f(n)//f(n-k)//f(k)
letters = 52
numbers = 10
length = 8
combinations = sum(C(numbers,i) * C(letters,length-i) for i in range(1,length))
print('Combinations: {:20,}'.format(combinations))
permutations = P(letters+numbers,length) - P(letters,length) - P(numbers,length)
print('Permutations: {:20,}'.format(permutations))
Output:
Combinations: 2,628,560,350
Permutations: 105,983,553,312,000
Trying to generate all those combinations or permutations in an in-memory list as your code is doing is not a good idea.
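If you want to convince yourself the counting formulas are right, one option is to brute-force a much smaller case and compare. The sketch below assumes a toy alphabet of 3 digits and 4 letters with length 3. Those sizes and the helper is_mixed are my own choices for illustration.

```python
from itertools import combinations, permutations
from math import factorial as f


def P(n, k):
    return f(n) // f(n - k)


def C(n, k):
    return f(n) // f(n - k) // f(k)


digits = "012"    # toy stand-in for the 10 digits
letters = "abcd"  # toy stand-in for the 52 letters
length = 3
alphabet = digits + letters


def is_mixed(items):
    """True if items contain at least one digit and at least one letter."""
    return any(c in digits for c in items) and any(c in letters for c in items)


# Brute force: enumerate everything lazily and count the mixed ones.
brute_combinations = sum(1 for c in combinations(alphabet, length) if is_mixed(c))
brute_permutations = sum(1 for p in permutations(alphabet, length) if is_mixed(p))

# Same formulas as above, with the toy sizes plugged in.
formula_combinations = sum(C(len(digits), i) * C(len(letters), length - i)
                           for i in range(1, length))
formula_permutations = (P(len(alphabet), length)
                        - P(len(letters), length)
                        - P(len(digits), length))

print(brute_combinations, formula_combinations)
print(brute_permutations, formula_permutations)
```

Both pairs should agree, which is some evidence that the formulas count the same thing the filter in the question selects.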
For the record, I don't think you are asking the right question. You say permutations, but your code uses combinations, and those are different things.
I will not give you the complete answer, because it would take forever to compute. To put in perspective just how big this number is: the number of permutations of 8 digits from 0~9 is 1,814,400,
starting with (0, 1, 2, 3, 4, 5, 6, 7) and ending with (9, 8, 7, 6, 5, 4, 3, 2).
You can count how many permutations of 8 there are over all the ASCII letters plus the digits 0~9 using this:
from itertools import permutations
from string import ascii_letters
mylist = list(range(10))
mylist.extend(ascii_letters)
i = 0
for n in permutations(mylist, 8):
    i += 1
But this will take a VERY long time. Just to show how big this number is:
I ran it for a couple of minutes and the count was already over 1,500,000,000 (1.5 billion).
Also, your code doesn't make much sense. Why do you need to calculate such a big number? Why do you need to write it to a file (it will probably take forever and/or run out of memory or disk space)? Try elaborating on what you want.
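If the goal is only to inspect a few permutations or to know how many there are, neither needs the full enumeration. A small sketch, assuming the same 62-symbol pool of digits 0~9 plus ASCII letters (the variable names are mine):

```python
from itertools import islice, permutations
from math import factorial
from string import ascii_letters

symbols = list(range(10)) + list(ascii_letters)  # 62 symbols in total

# The count comes straight from the formula n! / (n - k)!.
total = factorial(len(symbols)) // factorial(len(symbols) - 8)
print("Total permutations of 8:", total)

# permutations() is lazy, so islice can take a few without building them all.
first_three = list(islice(permutations(symbols, 8), 3))
for p in first_three:
    print(p)
```

This total counts every permutation over the 62 symbols, including the all-digit and all-letter ones.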

Python, divide string into several substrings

I have an RNA string, e.g.:
AUGGCCAUA
I would like to generate all substrings in the following way:
#starting from 0 character
AUG, GCC, AUA
#starting from 1 character
UGG, CCA
#starting from 2 character
GGC, CAU
I wrote code that solves the first sub-problem:
for i in range(0,len(rna)):
    if fmod(i,3)==0:
        print rna[i:i+3]
I have tried to change the starting position, i.e.:
for i in range(1,len(rna)):
But it gives me incorrect results:
GCC, UA #instead of UGG, CCA
Could you please give me a hint as to where my mistake is?
The problem with your code is that you always extract substrings starting at indices divisible by 3, regardless of the starting position. Instead, try this:
a = 'AUGGCCAUA'
def getSubStrings(RNA, position):
    return [RNA[i:i+3] for i in range(position, len(RNA) - 2, 3)]
print getSubStrings(a, 0)
print getSubStrings(a, 1)
print getSubStrings(a, 2)
Output
['AUG', 'GCC', 'AUA']
['UGG', 'CCA']
['GGC', 'CAU']
Explanation
range(position, len(RNA) - 2, 3) generates a list of numbers with a common difference of 3, starting from position and stopping before len(RNA) - 2. For example,
print range(1, 8, 3)
1 is the starting number, 8 is the exclusive upper bound, and 3 is the common difference, so it gives
[1, 4, 7]
These are our starting indices. We then use a list comprehension to generate the new list like this:
[RNA[i:i+3] for i in range(position, len(RNA) - 2, 3)]
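To see all three starting positions at once, the same comprehension can be run in a loop. This is a Python 3 sketch. The name reading_frames is my own and not from the answer above.

```python
def reading_frames(rna, size=3):
    """Return one list of complete size-length chunks per starting offset."""
    frames = []
    for offset in range(size):
        # Stop early enough that every chunk has exactly `size` characters.
        starts = range(offset, len(rna) - size + 1, size)
        frames.append([rna[i:i + size] for i in starts])
    return frames


frames = reading_frames("AUGGCCAUA")
for offset, frame in enumerate(frames):
    print(offset, frame)
```

With size=3 the stop value len(rna) - size + 1 equals the len(RNA) - 2 used above, so incomplete chunks at the end are dropped in the same way.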
Is this what you're looking for?
for i in range(len(rna)):
    if rna[i+3:]:
        print(rna[i:i+3])
outputs:
AUG
UGG
GGC
GCC
CCA
CAU
I thought of this oneliner:
a = 'AUGGCCAUA'
[a[x:x+3] for x in range(len(a))][:-2]
def generate(str, index):
    for i in range(index, len(str), 3):
        if len(str[i:i+3]) == 3:
            print str[i:i+3]
Example:
In [29]: generate(str, 1)
UGG
CCA
In [30]: generate(str, 0)
AUG
GCC
AUA
