How to split string with different phones using re?

For example, I have a string of phone numbers like this:
phones = '+35(123) 456 78 90 (123) 555 55 55 (908)985 88 89 (593)592 56 95'
I need to get:
phones_list = ['+35(123) 456 78 90', '(123) 555 55 55', '(908)985 88 89', '(593)592 56 95']
I am trying to solve this with re, but it is quite a hard task for me.

This approach uses + or ( to signal the beginning of a phone number. It does not require multiple spaces between numbers:
>>> import re
>>> phones = '+35(123) 456 78 90 (123) 555 55 55 (908)985 88 89 (593)592 56 95'
>>> re.split(r' +(?=[(+])', phones)
['+35(123) 456 78 90', '(123) 555 55 55', '(908)985 88 89', '(593)592 56 95']
This splits the string on one or more spaces that are followed by either ( or +.
In the regular expression, a space followed by + matches one or more spaces. (?=[(+]) is a look-ahead. It requires that the spaces be followed by either ( or + but does not consume the ( or +. Because we are using a look-ahead instead of a plain match, the leading ( and + remain part of the phone number.
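To make the effect of the look-ahead concrete, here is a minimal sketch that runs the same split twice, once with the look-ahead and once with a plain character class. The variable names with_lookahead and without_lookahead are only illustrative.

```python
import re

phones = '+35(123) 456 78 90 (123) 555 55 55 (908)985 88 89 (593)592 56 95'

# Look-ahead: only the spaces are consumed, the following ( or + is kept.
with_lookahead = re.split(r' +(?=[(+])', phones)

# Plain match: the ( or + is consumed together with the spaces and is lost.
without_lookahead = re.split(r' +[(+]', phones)

print(with_lookahead)
print(without_lookahead)
```

In the second result every number after the first is missing its opening parenthesis, which is why the look-ahead form is the one to use here.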

Related

How to iterate and print each element in list (Python)

arr = ['36 36 30','47 96 90','86 86 86']
I want to store and print the values like this,
36
36
30
47
...
How do I do this using python?

The simplest way is to use a for loop and str.split():
arr = ['36 36 30', '47 96 90', '86 86 86']
for block in arr:
    cells = block.split()
    for cell in cells:
        print(cell)
prints
36
36
30
47
96
90
86
86
86
You can also use a list comprehension, which prints the same result:
print("\n".join([ele for block in arr for ele in block.split()]))

You can use lists and split in Python. Try it this way:
arr = ['36 36 30','47 96 90','86 86 86']
for i in arr:
    elems = i.split()
    for elem in elems:
        print(elem)

We can try the following approach. Build a single string of space separated numbers, split it, then join on newline.
inp = ['36 36 30', '47 96 90', '86 86 86']
output = '\n'.join(' '.join(inp).split())
print(output)
This prints:
36
36
30
47
96
90
86
86
86

Protect one specific case in regex in python

I need to replace German phone numbers in Python. The formats are well explained here:
Regexp for german phone number format
Possible formats are:
06442) 3933023
(02852) 5996-0
(042) 1818 87 9919
06442 / 3893023
06442 / 38 93 02 3
06442/3839023
042/ 88 17 890 0
+49 221 549144 – 79
+49 221 - 542194 79
+49 (221) - 542944 79
0 52 22 - 9 50 93 10
+49(0)121-79536 - 77
+49(0)2221-39938-113
+49 (0) 1739 906-44
+49 (173) 1799 806-44
0173173990644
0214154914479
02141 54 91 44 79
01517953677
+491517953677
015777953677
02162 - 54 91 44 79
(02162) 54 91 44 79
I am using the following code:
df['A'] = df['A'].replace(r'(\(?([\d \-\)\–\+\/\(]+)\)?([ .\-–\/]?)([\d]+))', 'TEL', regex=True)
The problem is that I have dates in the text:
df['A']
2017-03-07 13:48:39 Dear Sear Madam...
This is necessary to keep. How can I exclude the formats 2017-03-07 and 13:48:39 from my regex replacement?
Short Example:
df['A']
2017-03-077
2017-03-07
0211 11112244
desired output:
df['A']
TEL
2017-03-07
TEL

Any way you slice it, you are not dealing with regular data, and regular expressions work best with regular data. You are always going to run into "false positives" in your situation.
Your best bet is to write out each pattern individually as a giant OR. Below is the pattern for the first three phone numbers; the rest can be added the same way.
\d{5}\) \d{7}|\(\d{5}\) \d{4}-\d|\(\d{3}\) \d{4} \d{2} \d{4}
https://regex101.com/r/6NPzup/1
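As a rough sketch of how that alternation could be applied, the snippet below compiles the three alternatives from the pattern above and runs them over the first three formats from the question plus a line containing a date and time. PHONE_PATTERN, lines and replaced are illustrative names, and the date line is made-up sample text; only the three listed formats are covered.

```python
import re

# One alternative per phone format, joined with | (the "giant OR").
PHONE_PATTERN = re.compile(
    r'\d{5}\) \d{7}'                 # 06442) 3933023
    r'|\(\d{5}\) \d{4}-\d'           # (02852) 5996-0
    r'|\(\d{3}\) \d{4} \d{2} \d{4}'  # (042) 1818 87 9919
)

lines = [
    '06442) 3933023',
    '(02852) 5996-0',
    '(042) 1818 87 9919',
    '2017-03-07 13:48:39 Dear Sir or Madam...',
]

# Phone numbers become TEL; the date line matches no alternative.
replaced = [PHONE_PATTERN.sub('TEL', line) for line in lines]
print(replaced)
```

Because each alternative is strict about parentheses and digit counts, the date and time are left untouched. With pandas, the same pattern string could presumably be passed to Series.str.replace with regex=True.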

Third line of input not read

Why doesn't it read the third line of input?
This is the code, written in Python 3. Not much explanation is required as it's very basic programming.
n, x = list(map(int, input().split(" ")))
s = []
print(x)
for i in range(0,3):
    s.append(input())
    print(s)
print("hello")
Input is :
5 3
89 90 78 93 80
90 91 85 88 86
91 92 83 89 90.5
Output I got:
3
['89 90 78 93 80']
['89 90 78 93 80', '90 91 85 88 86']

You need an additional newline character at the end of your input so that input() would recognize the last line as a line.

I changed the .split(" ") to .split() and got the output.
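If the missing trailing newline is indeed the cause, one possible workaround is to read the whole input at once and split it into lines, which does not depend on how the last line ends. The sketch below assumes that diagnosis; read_scores and sample are hypothetical names, and io.StringIO stands in for sys.stdin so the example can run on its own.

```python
import io

def read_scores(stream):
    """Read a header line 'n x' followed by x lines of scores."""
    lines = stream.read().splitlines()
    n, x = map(int, lines[0].split())
    # Take exactly x rows after the header; each row is a list of strings.
    rows = [line.split() for line in lines[1:1 + x]]
    return n, x, rows

# Note: the last line has no trailing newline.
sample = io.StringIO("5 3\n89 90 78 93 80\n90 91 85 88 86\n91 92 83 89 90.5")
n, x, rows = read_scores(sample)
print(n, x)
print(rows)
```

In a real script, read_scores(sys.stdin) would be called instead (after import sys).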

Find all matches of permutations within allotted time

I'm writing a program that takes 9 characters, creates all possible permutations, grabs a dictionary file for each character, and then creates a set of all possible words. What I need to do is compare all permutations to the words and return the matches.
import os, itertools

def parsed(choices):
    mySet = set()
    location = os.getcwd()
    for item in choices:
        filename = location + "\\dicts\\%s.txt" % (item)
        mySet.update(open(filename).read().splitlines())
    return mySet

def permutations(input):
    possibilities = []
    pospos = []
    for x in range(3,9):
        pospos.append([''.join(i) for i in itertools.permutations(input, x)])
    for pos in pospos:
        for i in pos:
            possibilities.append(i)
    return possibilities
The problematic function is this one:
def return_matches():
    matches = []
    words = parsed(['s','m','o','k','e', 'j', 'a', 'c', 'k'])
    pos = permutations(['s','m','o','k','e', 'j', 'a', 'c', 'k'])
    for item in pos:
        if item in words:
            matches.append(item)
    return matches
This code should return:
matches = ['a', 'om', 'ja', 'jo', ..., 'jacks', 'cokes', 'kecks', 'jokes', 'cakes', 'smoke', 'comes', 'makes', 'cameos']
When I get this code to work properly, it takes 10 to 15 minutes to complete. On the other hand, every attempt at making it finish within the allotted time either works only with 5 or fewer characters or returns the wrong result.
So my question is how to optimize this code so that it returns the right result within 30 seconds.
Edit
http://www.mso.anu.edu.au/~ralph/OPTED/v003 this is the website I'm scraping the dictionary files from.

It wastes RAM and time storing all the permutations in a list before you test if they're valid. Instead, test the permutations as you generate them, and save the valid ones into a set to eliminate duplicates.
Duplicates are possible because of the way itertools.permutations works:
Elements are treated as unique based on their position, not on their
value. So if the input elements are unique, there will be no repeat
values in each permutation.
Your input word "SMOKEJACK" contains 2 Ks, so every permutation containing K gets generated twice.
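A small sketch of that duplication, using the three letters "KEK" as a made-up input with a repeated letter (all_pairs and unique_pairs are illustrative names):

```python
from itertools import permutations

letters = "KEK"  # two Ks, like SMOKEJACK

# permutations treats the two Ks as distinct positions,
# so each 2-letter string shows up twice.
all_pairs = ["".join(p) for p in permutations(letters, 2)]

# Collecting into a set removes the repeats.
unique_pairs = set(all_pairs)

print(all_pairs)
print(sorted(unique_pairs))
```

Six permutations are generated, but only three distinct strings remain once they are collected into a set.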
Anyway, here's some code that uses the SOWPODS Scrabble word list for English.
from itertools import permutations

# Get all the words from the SOWPODS file
all_words = set('AI')
fname = 'scrabble_wordlist_sowpods.txt'
with open(fname) as f:
    all_words.update(f.read().splitlines())
print(len(all_words))

choices = 'SMOKEJACK'

# Generate all permutations of `choices` from length 3 to 8
# and save them in a set to eliminate duplicates.
matches = set()
for n in range(3, 9):
    for t in permutations(choices, n):
        s = ''.join(t)
        if s in all_words:
            matches.add(s)

for i, s in enumerate(sorted(matches)):
    print('{:3} {}'.format(i, s))
output
216555
0 ACE
1 ACES
2 ACME
3 ACMES
4 AESC
5 AKE
6 AKES
7 AMOK
8 AMOKS
9 ASK
10 CAKE
11 CAKES
12 CAM
13 CAME
14 CAMEO
15 CAMEOS
16 CAMES
17 CAMS
18 CASE
19 CASK
20 CEAS
21 COKE
22 COKES
23 COMA
24 COMAE
25 COMAKE
26 COMAKES
27 COMAS
28 COME
29 COMES
30 COMS
31 COS
32 COSE
33 COSMEA
34 EAS
35 EKKA
36 EKKAS
37 EMS
38 JACK
39 JACKS
40 JAK
41 JAKE
42 JAKES
43 JAKS
44 JAM
45 JAMES
46 JAMS
47 JOCK
48 JOCKS
49 JOE
50 JOES
51 JOKE
52 JOKES
53 KAE
54 KAES
55 KAM
56 KAME
57 KAMES
58 KAS
59 KEA
60 KEAS
61 KECK
62 KECKS
63 KEKS
64 KOA
65 KOAS
66 KOS
67 MAC
68 MACE
69 MACES
70 MACK
71 MACKS
72 MACS
73 MAE
74 MAES
75 MAK
76 MAKE
77 MAKES
78 MAKO
79 MAKOS
80 MAKS
81 MAS
82 MASE
83 MASK
84 MES
85 MESA
86 MOA
87 MOAS
88 MOC
89 MOCK
90 MOCKS
91 MOCS
92 MOE
93 MOES
94 MOKE
95 MOKES
96 MOS
97 MOSE
98 MOSK
99 OAK
100 OAKS
101 OCA
102 OCAS
103 OES
104 OKA
105 OKAS
106 OKE
107 OKES
108 OMS
109 OSE
110 SAC
111 SACK
112 SAE
113 SAKE
114 SAM
115 SAME
116 SAMEK
117 SCAM
118 SEA
119 SEAM
120 SEC
121 SECO
122 SKA
123 SKEO
124 SMA
125 SMACK
126 SMOCK
127 SMOKE
128 SOAK
129 SOC
130 SOCA
131 SOCK
132 SOJA
133 SOKE
134 SOMA
135 SOME
This code runs in around 2.5 seconds on my rather ancient 32 bit 2GHz machine running Python 3.6.0 on Linux. It's slightly faster on Python 2 (since Python2 strings are ASCII, not Unicode).

Instead of generating all the permutations of your letters, you should use a Prefix Tree, or Trie, to keep track of all the prefixes to valid words.
def make_trie(words):
    res = {}
    for word in words:
        d = res
        for c in word:
            d = d.setdefault(c, {})
        d["."] = None
    return res
We are using d["."] = None here to signify where a prefix actually becomes a valid word. Creating the tree can take a few seconds, but you only have to do this once.
Now, we can go through our letters in a recursive function, checking for each letter whether it contributes to a valid prefix in the current stage of the recursion: (That rest = letters[:i] + letters[i+1:] part is not very efficient, but as we will see it does not matter much.)
def find_words(trie, letters, prefix=""):
    if "." in trie: # found a full valid word
        yield prefix
    for i, c in enumerate(letters):
        if c in trie: # contributes to valid prefix
            rest = letters[:i] + letters[i+1:]
            for res in find_words(trie[c], rest, prefix + c):
                yield res # all words starting with that prefix
Minimal example:
>>> trie = make_trie(["cat", "cats", "act", "car", "carts", "cash"])
>>> trie
{'a': {'c': {'t': {'.': None}}}, 'c': {'a': {'r': {'t': {'s':
{'.': None}}, '.': None}, 's': {'h': {'.': None}}, 't':
{'s': {'.': None}, '.': None}}}}
>>> set(find_words(trie, "acst"))
{'cat', 'act', 'cats'}
Or with your 9 letters and the words from sowpods.txt:
with open("sowpods.txt") as words:
    trie = make_trie(map(str.strip, words)) # ~1.3 s on my system, only once
res = set(find_words(trie, "SMOKEJACK")) # ~2 ms on my system
You have to pipe the result through a set because you have duplicate letters. This yields 153 words, after a total of 623 recursive calls to find_words (measured with a counter variable). Compare that to 216,555 words in the sowpods.txt file and a total of 986,409 permutations of all the 1-9 letter combinations that could make up a valid word. Thus, once the trie is initially generated, res = set(find_words(...)) takes only a few milliseconds.
You could also change the find_words function to use a mutable dictionary of letter counts instead of a string or list of letters. This way, no duplicates are generated and the function is called fewer times, but the overall running time does not change much.
def find_words(trie, letters, prefix=""):
    if "." in trie:
        yield prefix
    for c in letters:
        if letters[c] and c in trie:
            letters[c] -= 1
            for res in find_words(trie[c], letters, prefix + c):
                yield res
            letters[c] += 1
Then call it like this: find_words(trie, collections.Counter("SMOKEJACK"))
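For a quick check without a word list file, here is a self-contained sketch of the letter-count variant run on the small word list from the minimal example. It restates both functions so it can run on its own; yield from is used as a shorthand for the inner loop, and found is an illustrative name.

```python
import collections

def make_trie(words):
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["."] = None  # marks the end of a valid word
    return root

def find_words(trie, letters, prefix=""):
    if "." in trie:
        yield prefix
    for ch in letters:
        # Use a letter only if copies remain and it extends a valid prefix.
        if letters[ch] and ch in trie:
            letters[ch] -= 1
            yield from find_words(trie[ch], letters, prefix + ch)
            letters[ch] += 1  # restore the count when backtracking

trie = make_trie(["cat", "cats", "act", "car", "carts", "cash"])
found = sorted(find_words(trie, collections.Counter("acst")))
print(found)
```

Only the values of the Counter change during the recursion, never its keys, so iterating over it while recursing is safe.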

How do I convert a string of ASCII values to their original characters/numbers in Python

I have a string of numbers that I previously produced with my encoder, and now I am trying to decode it. I've searched around, but none of the answers seem to work.
If you have any idea how to do this, please let me know.
string = "91 39 65 97 66 98 67 99 32 49 50 51 39 93"
outcome = ABCabc 123

outcome = "".join([your_decoder.decode(x) for x in string.split(" ")])
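The answer leaves the decoder itself open. Assuming the encoder simply wrote each character's code point as a decimal number (an assumption, since the encoder is not shown), a minimal sketch of the decoding step could use the built-in chr; the name decoded is illustrative.

```python
string = "91 39 65 97 66 98 67 99 32 49 50 51 39 93"

# Turn each number back into its character and join the pieces.
decoded = "".join(chr(int(code)) for code in string.split())

print(decoded)
```

For these exact numbers the result is ['AaBbCc 123'], brackets and quotes included, which differs slightly from the outcome quoted in the question.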
