Write a Python formatted generator


To generate a Tecplot file I use:
import numpy as np
x, y = np.genfromtxt('./files.dat', unpack=True)
nb_value = x.size
# np.array_split (unlike np.split) also accepts lengths that don't divide evenly
x_splitted = np.array_split(x, nb_value // 1000 + 1)
y_splitted = np.array_split(y, nb_value // 1000 + 1)
with open('./test.dat', 'w') as f:
    f.write('TITLE = \" YOUPI \" \n')
    f.write('VARIABLES = \"x\" \"Y\" \n')
    f.write('ZONE T = \"zone1 \" , I=' + str(nb_value) + ', F=BLOCK \n')
    for idx in range(len(x_splitted)):
        string_list = ["%.7E" % val for val in x_splitted[idx]]
        f.write('\t'.join(string_list)+'\n')
    for idx in range(len(y_splitted)):
        string_list = ["%.7E" % val for val in y_splitted[idx]]
        f.write('\t'.join(string_list)+'\n')
Here is an example of files.dat:
-6.491083147394967334e-02 6.917197804459292456e+02
-6.489978349202699115e-02 6.871829941905543819e+02
-6.481115367048655151e-02 6.707292800160890920e+02
-6.479991205404790622e-02 6.756112033303363660e+02
-6.471117816968344205e-02 7.666798999627604871e+02
-6.469995628177811764e-02 7.819675271405360490e+02
This code works, but I have read that I should use .format() instead of %. This runs: string_list = ["{}".format(list(val for val in y_splitted[idx]))] but its output won't work with Tecplot, because Tecplot needs the .7E format.
If I try string_list = ["{.7E}".format(list(val for val in y_splitted[idx]))] it doesn't work at all. I get: AttributeError: 'list' object has no attribute '7E'
What would be the best way to do what I am trying to do?

Format specifiers come after a colon (:):
["{:.7E}".format(val) for val in y_splitted[idx]]
Note that I had to adjust your list comprehension syntax as well: you want to pass each individual val to str.format(), not the whole loop. In essence, you only needed to replace the "%.7E" % val part.
See the Format String Syntax documentation:
replacement_field ::= "{" [field_name] ["!" conversion] [":" format_spec] "}"
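To see why "{.7E}" raised an AttributeError in the question, here is a small sketch of how str.format() reads the part before and after the colon. The sample values (2.8 and 3 + 4j) are arbitrary illustrations, not from the question.

```python
# "{:.7E}": empty field_name (next positional argument), format_spec ".7E"
print("{:.7E}".format(2.8))        # 2.8000000E+00

# "{.real}": field_name with an attribute lookup (.real) and no format_spec
print("{.real}".format(3 + 4j))    # 3.0

# So "{.7E}" asks for an attribute literally named "7E", which lists don't have
try:
    "{.7E}".format([2.8])
except AttributeError as exc:
    print(exc)                     # 'list' object has no attribute '7E'
```

Without the colon, everything after the dot is treated as an attribute name rather than a format specification.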
Demo:
>>> ["%.7E" % val for val in (2.8, 4.2e5)]
['2.8000000E+00', '4.2000000E+05']
>>> ["{:.7E}".format(val) for val in (2.8, 4.2e5)]
['2.8000000E+00', '4.2000000E+05']
Not that you really need str.format() here, since there are no other parts to the string; if all you have is "{:<formatspec>}", just use the format() function and pass in the <formatspec> as the second argument:
[format(val, ".7E") for val in y_splitted[idx]]
Note that in Python, you generally don't loop over a range() then use the index to get a list value. Just loop over the list directly:
for xsplit in x_splitted:
    string_list = [format(val, ".7E") for val in xsplit]
    f.write('\t'.join(string_list) + '\n')
for ysplit in y_splitted:
    string_list = [format(val, ".7E") for val in ysplit]
    f.write('\t'.join(string_list) + '\n')
You also don't have to escape the " characters in your strings; you only need to do that when the string delimiters are also " characters; you are using ' instead. You can use str.format() to insert the nb_value there too:
f.write('TITLE = " YOUPI " \n')
f.write('VARIABLES = "x" "Y" \n')
f.write('ZONE T = "zone1 " , I={}, F=BLOCK \n'.format(nb_value))
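Putting these suggestions together, a minimal sketch of the whole writer might look like this. The function name write_tecplot_block and the per_line parameter are hypothetical; it writes to any file-like object and slices each column into lines of at most per_line values instead of calling np.split.

```python
import io

def write_tecplot_block(f, x, y, per_line=1000):
    """Hypothetical helper: write x and y as one BLOCK-format Tecplot zone.

    x and y can be lists or 1-D NumPy arrays of the same length; at most
    `per_line` values are written on each line.
    """
    nb_value = len(x)
    f.write('TITLE = " YOUPI " \n')
    f.write('VARIABLES = "x" "Y" \n')
    f.write('ZONE T = "zone1 " , I={}, F=BLOCK \n'.format(nb_value))
    # BLOCK format: all x values first, then all y values
    for column in (x, y):
        for start in range(0, nb_value, per_line):
            chunk = column[start:start + per_line]
            f.write('\t'.join(format(val, '.7E') for val in chunk) + '\n')

# Demo: write to an in-memory buffer instead of a file
buf = io.StringIO()
write_tecplot_block(buf, [2.8, 4.2e5, 1.0], [1.0, -0.5, 0.0], per_line=2)
print(buf.getvalue())
```

With the question's data it would be called as write_tecplot_block(f, x, y) inside with open('./test.dat', 'w') as f:. Slicing in fixed steps avoids the ValueError that np.split raises when the array length isn't divisible by the number of sections.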

Related

Replace ip partially with x in python

I have several IP addresses like:
162.1.10.15
160.15.20.222
145.155.222.1
I am trying to mask them like this:
162.x.xx.xx
160.xx.xx.xxx
145.xxx.xxx.x
How can I achieve this in Python?

Here’s a slightly simpler solution
import re
txt = "192.1.2.3"
x = txt.split(".", 1) # ['192', '1.2.3']
y = x[0] + "." + re.sub(r"\d", "x", x[1])
print(y) # 192.x.x.x

We can use re.sub with a callback function here:
import re
def repl(m):
    return m.group(1) + '.' + re.sub(r'.', 'x', m.group(2)) + '.' + re.sub(r'.', 'x', m.group(3)) + '.' + re.sub(r'.', 'x', m.group(4))
inp = "160.15.20.222"
output = re.sub(r'\b(\d+)\.(\d+)\.(\d+)\.(\d+)\b', repl, inp)
print(output)  # 160.xx.xx.xxx
In the callback, the idea is to use re.sub to surgically replace each digit by x. This keeps the same width of each original number.
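The same callback idea can be condensed into a single substitution. This is a sketch assuming the inputs are plain dotted-quad strings; mask_ip is a hypothetical helper name.

```python
import re

def mask_ip(ip):
    """Hypothetical helper: mask every octet after the first, keeping its width."""
    # (?<=\.) only lets a match start right after a dot, so the first octet is skipped;
    # the callback returns one 'x' per digit in the matched octet.
    return re.sub(r'(?<=\.)\d+', lambda m: 'x' * len(m.group()), ip)

for ip in ['162.1.10.15', '160.15.20.222', '145.155.222.1']:
    print(mask_ip(ip))
```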

This is not the most optimized solution, but it works for me.
import re
Ip_string = "160.15.20.222"
Ip_string = Ip_string.split('.')
Ip_String_x = ""
flag = False
for num in Ip_string:
    if flag:
        num = re.sub(r'\d', 'x', num)
        Ip_String_x = Ip_String_x + '.' + num
    else:
        flag = True
        Ip_String_x = num
print(Ip_String_x)  # 160.xx.xx.xxx

Solution 1
Other answers are good, and this single regex works, too:
import re
strings = [
'162.1.10.15',
'160.15.20.222',
'145.155.222.1',
]
for string in strings:
    print(re.sub(r'(?:(?<=\.)|(?<=\.\d)|(?<=\.\d\d))\d', 'x', string))
output:
162.x.xx.xx
160.xx.xx.xxx
145.xxx.xxx.x
Explanation
(?<=\.) means "preceded by a single dot".
(?<=\.\d) means "preceded by a dot and a single digit".
(?<=\.\d\d) means "preceded by a dot and two digits".
\d means a digit.
So every digit that is preceded by a dot followed by zero, one, or two digits is replaced with 'x'.
(?<=\.\d{0,2}) and similar patterns are not allowed, because a look-behind ((?<=...)) must be fixed-width.
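A quick way to check the fixed-width rule is to compile both forms. This sketch only uses the standard re module; the exact wording of the error message may vary between Python versions, so it is printed rather than asserted.

```python
import re

# Separate fixed-width look-behinds combined with | compile fine...
ok = re.compile(r'(?:(?<=\.)|(?<=\.\d)|(?<=\.\d\d))\d')

# ...but a single variable-width look-behind is rejected by the re module
try:
    re.compile(r'(?<=\.\d{0,2})\d')
    variable_width_allowed = True
except re.error as exc:
    variable_width_allowed = False
    print(exc)

print(ok.sub('x', '145.155.222.1'))
```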
Solution 2
Without the re module and regex:
for string in strings:
    first, *rest = string.split('.')
    print('.'.join([first, *map(lambda x: 'x' * len(x), rest)]))
The code above gives the same result.

There are multiple ways to go about this. Regex is the most versatile way to write string-manipulation code, but you can also do it with plain for loops and the split and join methods.
ip = "162.1.10.15"
# Split the IPv4 address using '.' as the delimiter
ip = ip.split(".")
# Replace every octet except the first with x's
for i, val in enumerate(ip[1:]):
    cnt = 0
    for x in val:
        cnt += 1
    ip[i+1] = "x" * cnt
# Join the octets back into an address
ip = ".".join(ip)
print(ip)
I highly recommend learning regex, but this is also a valid way to approach the task.
Hope you find this useful!

Pass a list of IPs to this function:
def replace_ips(ip_list):
    r_list = []
    for i in ip_list:
        first, *other = i.split(".", 3)
        r_item = []
        r_item.append(first)
        for i2 in other:
            r_item.append("x" * len(i2))
        r_list.append(".".join(r_item))
    return r_list
For your example:
print(replace_ips(["162.1.10.15", "160.15.20.222", "145.155.222.1"]))  # ==> expected output: ['162.x.xx.xx', '160.xx.xx.xxx', '145.xxx.xxx.x']

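One-liner, FYI (reverse the string, mask the last three octets, reverse it back):
import re
ips = ['162.1.10.15', '160.15.20.222', '145.155.222.1']
pattern = r'\d{1,3}'
replacement_sign = 'x'
# The callback keeps each octet's width; count=3 limits it to the last three octets
res = [re.sub(pattern, lambda m: replacement_sign * len(m.group()), ip[::-1], count=3)[::-1] for ip in ips]
print(res)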

Regex with python dictionary

I am trying to do some "batch" find and replace.
I have the following string:
abc123 = abc122 + V[2] + V[3]
I would like to find every instance of abc{someNumber} = and replace the instance's abc portion with int ijk{someNumber} =, and also replace V[3] with a keyword in a dictionary.
dictToReplace={"[1]": "_i", "[2]":"_j", "[3]":"_k"}
The expected end result would be:
int ijk123 = ijk122 + V_j + V_k
What is the best way to achieve this? RegEx for the first part? Can it also be used for the second?

I'd split the logic into two steps:
1.) First replace the keyword abc\d+
2.) Replace the keys found in dictionary with their respective values
import re
dictToReplace = {"[1]": "_i", "[2]": "_j", "[3]": "_k"}
s = "abc123 = abc122 + V[2] + V[3]"
pat1 = re.compile(r"abc(\d+)")
pat2 = re.compile("|".join(map(re.escape, dictToReplace)))
s = pat1.sub(r"ijk\1", s)
s = pat2.sub(lambda g: dictToReplace[g.group(0)], s)
print(s)
Prints:
ijk123 = ijk122 + V_j + V_k

Use a function as the replacement value in re.sub(). It can then look up the matched value in the dictionary to get the replacement.
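import re
dictToReplace = {"[1]": "_i", "[2]": "_j", "[3]": "_k"}
string = 'abc123 = abc122 + V[2] + V[3]'
# change abc### to ijk###
result = re.sub(r'abc(\d+)', r'ijk\1', string)
# replace any V[###] with V_xxx from the dict (indices not in the dict are left as-is)
result = re.sub(r'V(\[\d+\])', lambda m: 'V' + dictToReplace.get(m.group(1), m.group(1)), result)
print(result)  # ijk123 = ijk122 + V_j + V_k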
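Neither answer adds the int prefix shown in the expected output. A sketch that does, assuming the declared name is the abc<digits> at the start of the line followed by a single = (transform is a hypothetical helper name):

```python
import re

dictToReplace = {"[1]": "_i", "[2]": "_j", "[3]": "_k"}

def transform(line):
    """Hypothetical helper: declare the assigned name as int and rename abc -> ijk."""
    # Only an abc<digits> at the start of the line, followed by a single "=", gets "int "
    line = re.sub(r'^(\s*)abc(\d+)(?=\s*=(?!=))', r'\1int ijk\2', line)
    # Every other abc<digits> reference is just renamed
    line = re.sub(r'\babc(\d+)', r'ijk\1', line)
    # V[n] -> V + dictionary value; indices missing from the dict are left unchanged
    line = re.sub(r'V(\[\d+\])', lambda m: 'V' + dictToReplace.get(m.group(1), m.group(1)), line)
    return line

print(transform('abc123 = abc122 + V[2] + V[3]'))
```

The (?!=) guard is there so that a comparison such as abc1 == 2 is not mistaken for an assignment.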

Is there a regex script that can be used to extract text by defining a start and an end in a text file [duplicate]

Let's say I have a string 'gfgfdAAA1234ZZZuijjk' and I want to extract just the '1234' part.
I only know what will be the few characters directly before AAA, and after ZZZ the part I am interested in 1234.
With sed it is possible to do something like this with a string:
echo "$STRING" | sed -e "s|.*AAA\(.*\)ZZZ.*|\1|"
And this will give me 1234 as a result.
How to do the same thing in Python?

Using regular expressions - documentation for further reference
import re
text = 'gfgfdAAA1234ZZZuijjk'
m = re.search('AAA(.+?)ZZZ', text)
if m:
    found = m.group(1)
# found: 1234
or:
import re
text = 'gfgfdAAA1234ZZZuijjk'
try:
    found = re.search('AAA(.+?)ZZZ', text).group(1)
except AttributeError:
    # AAA, ZZZ not found in the original string
    found = ''  # apply your error handling
# found: 1234

>>> s = 'gfgfdAAA1234ZZZuijjk'
>>> start = s.find('AAA') + 3
>>> end = s.find('ZZZ', start)
>>> s[start:end]
'1234'
Then you can use regexps with the re module as well, if you want, but that's not necessary in your case.
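One caveat with the find() version: str.find returns -1 when a marker is missing, so s.find('AAA') + 3 silently becomes 2 and the slice returns the wrong text. A minimal sketch that checks both results first (between is a hypothetical helper name):

```python
def between(s, left, right):
    """Hypothetical helper: text between the first `left` and the next `right`, or None."""
    start = s.find(left)
    if start == -1:
        return None
    start += len(left)
    end = s.find(right, start)  # only look for `right` after `left`
    if end == -1:
        return None
    return s[start:end]

print(between('gfgfdAAA1234ZZZuijjk', 'AAA', 'ZZZ'))
```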

regular expression
import re
re.search(r"(?<=AAA).*?(?=ZZZ)", your_text).group(0)
The above as-is will fail with an AttributeError if there are no "AAA" and "ZZZ" in your_text
string methods
your_text.partition("AAA")[2].partition("ZZZ")[0]
The above will return an empty string if either "AAA" or "ZZZ" don't exist in your_text.
PS Python Challenge?

Surprised that nobody has mentioned this, which is my quick version for one-off scripts:
>>> x = 'gfgfdAAA1234ZZZuijjk'
>>> x.split('AAA')[1].split('ZZZ')[0]
'1234'

You can do it with just one line of code:
>>> import re
>>> re.findall(r'\d{1,5}','gfgfdAAA1234ZZZuijjk')
['1234']
The result is a list. Note that this pattern ignores AAA and ZZZ entirely and simply matches runs of one to five digits.

import re
print(re.search('AAA(.*?)ZZZ', 'gfgfdAAA1234ZZZuijjk').group(1))

You can use the re module for that:
>>> import re
>>> re.compile(".*AAA(.*)ZZZ.*").match("gfgfdAAA1234ZZZuijjk").groups()
('1234',)

In Python, extracting a substring from a string can be done with the findall function of the regular expression (re) module.
>>> import re
>>> s = 'gfgfdAAA1234ZZZuijjk'
>>> ss = re.findall('AAA(.+)ZZZ', s)
>>> print(ss)
['1234']
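The pattern above uses a greedy .+, which matters once a string contains more than one AAA...ZZZ pair. A small sketch with a made-up sample string shows the difference between greedy and non-greedy matching:

```python
import re

s = 'xAAA1ZZZyAAA2ZZZz'
# Greedy: .+ runs to the last ZZZ, swallowing the marker pair in between
greedy = re.findall('AAA(.+)ZZZ', s)
# Non-greedy: .+? stops at the first ZZZ after each AAA
lazy = re.findall('AAA(.+?)ZZZ', s)
print(greedy)  # ['1ZZZyAAA2']
print(lazy)    # ['1', '2']
```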

text = 'I want to find a string between two substrings'
left = 'find a '
right = 'between two'
print(text[text.index(left)+len(left):text.index(right)])
Gives
string

>>> s = '/tmp/10508.constantstring'
>>> s.split('/tmp/')[1].split('constantstring')[0].strip('.')
'10508'

With sed it is possible to do something like this with a string:
echo "$STRING" | sed -e "s|.*AAA\(.*\)ZZZ.*|\1|"
And this will give me 1234 as a result.
You could do the same with the re.sub function, using the same regex:
>>> import re
>>> re.sub(r'.*AAA(.*)ZZZ.*', r'\1', 'gfgfdAAA1234ZZZuijjk')
'1234'
In basic sed, capturing groups are written as \(..\), but in Python they are written as (..).

You can find the first occurrence of a substring (by character index) with this function. You can also get what comes after a substring.
def FindSubString(strText, strSubString, Offset=None):
    try:
        Start = strText.find(strSubString)
        if Start == -1:
            return -1  # Not Found
        else:
            if Offset is None:
                Result = strText[Start+len(strSubString):]
            elif Offset == 0:
                return Start
            else:
                AfterSubString = Start+len(strSubString)
                Result = strText[AfterSubString:AfterSubString + int(Offset)]
            return Result
    except Exception:
        return -1
# Example:
Text = "Thanks for contributing an answer to Stack Overflow!"
subText = "to"
print("Start of first substring in a text:")
start = FindSubString(Text, subText, 0)
print(start); print("")
print("Exact substring in a text:")
print(Text[start:start+len(subText)]); print("")
print("What is after substring \"%s\"?" %(subText))
print(FindSubString(Text, subText))
# Your answer:
Text = "gfgfdAAA1234ZZZuijjk"
subText1 = "AAA"
subText2 = "ZZZ"
AfterText1 = FindSubString(Text, subText1, 0) + len(subText1)
BeforeText2 = FindSubString(Text, subText2, 0)
print("\nYour answer:\n%s" %(Text[AfterText1:BeforeText2]))

Using PyParsing
import pyparsing as pp
word = pp.Word(pp.alphanums)
s = 'gfgfdAAA1234ZZZuijjk'
rule = pp.nestedExpr('AAA', 'ZZZ')
for match in rule.searchString(s):
    print(match)
which yields:
[['1234']]

A one-liner for Python 3.8+ (it uses the := operator), if the text is guaranteed to contain both markers:
text[text.find(start:='AAA')+len(start):text.find('ZZZ')]

Just in case somebody needs to do the same thing I did: I had to extract everything inside parentheses in a line. For example, given a line like 'US president (Barack Obama) met with ...', I wanted only 'Barack Obama'. This is my solution:
import re
line = 'US president (Barack Obama) met with ...'
regex = r'.*\((.*?)\).*'
matches = re.search(regex, line)
line = matches.group(1) + '\n'
That is, you need to escape the parentheses with a backslash (\). This is more a regular-expression issue than a Python one.
Also, in some cases you may see an r prefix before the regex string. Without the r prefix, backslashes are processed as string escape sequences (as in C) before the regex engine sees them, so you have to double them.
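Because the leading .* in that pattern is greedy, re.search captures only the last parenthesized group on a line. If a line can contain several, a sketch using re.findall with a character class (the sample line is made up) returns all of them:

```python
import re

line = 'US president (Barack Obama) met with (someone) today'
# Raw string: \( and \) reach the regex engine as escaped literal parentheses.
# [^)]* stops at the first closing parenthesis, so each pair is captured separately.
names = re.findall(r'\(([^)]*)\)', line)
print(names)  # ['Barack Obama', 'someone']
```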

You can also find all combinations with the functions below:
s = 'Part 1. Part 2. Part 3 then more text'
def find_all_places(text, word):
    word_places = []
    i = 0
    while True:
        word_place = text.find(word, i)
        if word_place < 0:
            break
        word_places.append(word_place)
        # resume searching just after this occurrence
        i = word_place + len(word)
    return word_places
def find_all_combination(text, start, end):
    start_places = find_all_places(text, start)
    end_places = find_all_places(text, end)
    combination_list = []
    for start_place in start_places:
        for end_place in end_places:
            if start_place >= end_place:
                continue
            combination_list.append(text[start_place:end_place])
    return combination_list
print(find_all_combination(s, "Part", "Part"))
result:
['Part 1. ', 'Part 1. Part 2. ', 'Part 2. ']

In case you want to find multiple occurrences:
content = "Prefix_helloworld_Suffix_stuff_Prefix_42_Suffix_andsoon"
strings = []
for c in content.split('Prefix_'):
    spos = c.find('_Suffix')
    if spos != -1:
        strings.append(c[:spos])
print(strings)
Or, more concisely:
strings = [c[:c.find('_Suffix')] for c in content.split('Prefix_') if c.find('_Suffix') != -1]
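For comparison, the same multiple-occurrence extraction can be sketched with a single non-greedy regex, using the content string from the answer:

```python
import re

content = "Prefix_helloworld_Suffix_stuff_Prefix_42_Suffix_andsoon"
# Non-greedy group between the two markers; findall returns every capture in order
strings = re.findall(r'Prefix_(.*?)_Suffix', content)
print(strings)  # ['helloworld', '42']
```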

Here's a solution without regex that also accounts for scenarios where the first substring contains the second substring. This function will only find a substring if the second marker is after the first marker.
def find_substring(string, start, end):
    len_until_end_of_first_match = string.find(start) + len(start)
    after_start = string[len_until_end_of_first_match:]
    return string[string.find(start) + len(start):len_until_end_of_first_match + after_start.find(end)]

Another way is to use a list of digit characters (assuming the substring you are looking for is made of digits only):
string = 'gfgfdAAA1234ZZZuijjk'
numbersList = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
output = []
for char in string:
if char in numbersList: output.append(char)
print(f"output: {''.join(output)}")
### output: 1234

TypeScript: gets the string between two other strings.
It searches for the shortest string between the prefixes and postfixes (the method relies on a this.indexOf helper that is not shown).
prefixes - string / array of strings / null (null means search from the start).
postfixes - string / array of strings / null (null means search until the end).
public getStringInBetween(str: string, prefixes: string | string[] | null,
                          postfixes: string | string[] | null): string {
    if (typeof prefixes === 'string') {
        prefixes = [prefixes];
    }
    if (typeof postfixes === 'string') {
        postfixes = [postfixes];
    }
    if (!str || str.length < 1) {
        throw new Error(str + ' should contain ' + prefixes);
    }
    let start = prefixes === null ? { pos: 0, sub: '' } : this.indexOf(str, prefixes);
    const end = postfixes === null ? { pos: str.length, sub: '' } : this.indexOf(str, postfixes, start.pos + start.sub.length);
    let value = str.substring(start.pos + start.sub.length, end.pos);
    if (!value || value.length < 1) {
        throw new Error(str + ' should contain string in between ' + prefixes + ' and ' + postfixes);
    }
    while (true) {
        try {
            start = this.indexOf(value, prefixes);
        } catch (e) {
            break;
        }
        value = value.substring(start.pos + start.sub.length);
        if (!value || value.length < 1) {
            throw new Error(str + ' should contain string in between ' + prefixes + ' and ' + postfixes);
        }
    }
    return value;
}

A simple approach could be the following:
string_to_search_in = 'could be anything'
start = string_to_search_in.find("sub string you want to identify")
length = len("sub string you want to identify")
First_part_removed = string_to_search_in[start:]
end_coord = length
Extracted_substring = First_part_removed[:end_coord]

One-liners that return a different string if there is no match.
Edit: the improved version uses the next function; replace "not-found" with something else if needed:
import re
res = next( (m.group(1) for m in [re.search("AAA(.*?)ZZZ", "gfgfdAAA1234ZZZuijjk" ),] if m), "not-found" )
My other method is less optimal because it runs a second regex search, but I still haven't found a shorter way:
import re
res = ( ( re.search("AAA(.*?)ZZZ", "gfgfdAAA1234ZZZuijjk") or re.search("()","") ).group(1) )

Python - How to split string by non-alphanumerical, but keep any non-alphanumerical as well

string = "Tes.t / &hi-&"
Expected Output - ["Tes" , "." , "t" , " " ," /" , "&" , "hi" ,"-", "&"]
or
Expected Output - ["Tes" , "." , "t" , " / &" , "hi" , "-&"]
The latter output would be preferable, but either would work.

Code
def splitnonalpha(s):
    """Split whenever the type of the following character is different (i.e. alpha or non-alpha)"""
    current = s[0]
    result = []
    for pos in range(1, len(s)):
        if s[pos].isalpha() and current[-1].isalpha():
            current += s[pos]  # same type as previous
        elif not s[pos].isalpha() and not current[-1].isalpha():
            current += s[pos]  # same type as previous
        else:
            # Different type-->store current, and reset to current character
            result.append(current)
            current = s[pos]
    if current:
        result.append(current)
    return result
Test
s = "Tes.t / &hi-&"
print(splitnonalpha(s))
Output
['Tes', '.', 't', ' / &', 'hi', '-&']
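The same "start a new piece whenever the character type changes" logic can be sketched with itertools.groupby, which also copes with an empty string (split_alpha_runs is a hypothetical helper name):

```python
from itertools import groupby

def split_alpha_runs(s):
    """Hypothetical helper: split s into runs of alphabetic / non-alphabetic characters."""
    # groupby starts a new group whenever str.isalpha flips between True and False
    return [''.join(group) for _, group in groupby(s, key=str.isalpha)]

print(split_alpha_runs("Tes.t / &hi-&"))  # ['Tes', '.', 't', ' / &', 'hi', '-&']
```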

You could check whether each character is in ascii_letters and add it either to the current string or to a new one, depending on whether its type matches the previous character's. This could look like:
from string import ascii_letters
import sys
from typing import List
def main(input_string: str) -> List[str]:
    output = []
    sub_string = ''
    last_was_ascii = None
    for char in input_string:
        char_is_ascii = char in ascii_letters
        if last_was_ascii is None or char_is_ascii == last_was_ascii:
            sub_string += char
        else:
            output.append(sub_string)
            sub_string = char
        last_was_ascii = char_is_ascii
    output.append(sub_string)
    print(output)
    return output
if __name__ == "__main__":
    main(*sys.argv[1:])
Given the command-line input python example_file.py "Tes.t / &hi-&", this prints ['Tes', '.', 't', ' / &', 'hi', '-&'], i.e. the second output you listed.
It's a little verbose, but it does the trick.

One solution is to use regex:
import re
find all alphanumerical runs:
an = re.findall("[a-zA-Z0-9]+", s)
find all non-alphanumerical runs:
non_an = re.findall("[^a-zA-Z0-9]+", s)
zip them:
zipped = zip(an, non_an)
flatten the zip:
flat = sum(zipped, ())
or in a one-liner:
sum(zip(re.findall("[a-zA-Z0-9]+", s), re.findall("[^a-zA-Z0-9]+", s)), ())
This assumes the string starts with an alphanumerical character; otherwise the pieces come out in the wrong order. To cover cases with more alphanumerical than non-alphanumerical runs (or vice versa), use itertools.zip_longest() and drop the None fill values (note that \w also matches the underscore):
from itertools import zip_longest
[x for x in sum(zip_longest(re.findall(r"\w+", s), re.findall(r"\W+", s)), ()) if x]
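Pairing two separate findall lists relies on the runs alternating in a known order. A single findall with an alternation keeps the original order regardless of which kind of run comes first; a minimal sketch:

```python
import re

s = "Tes.t / &hi-&"
# Each match is either an alphanumeric run or a non-alphanumeric run,
# and findall returns them in the order they appear in the string.
parts = re.findall(r'[a-zA-Z0-9]+|[^a-zA-Z0-9]+', s)
print(parts)  # ['Tes', '.', 't', ' / &', 'hi', '-&']
```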

How to parse a "here document" in Python?

I want to write a Python method that reads a text file with key-values:
FOO=BAR
BUZ=BLEH
I also want to support newlines either through quoting and \n, and by supporting here-docs:
MULTILINE1="This\nis a test"
MULTILINE2= <<DOC
This
is a test
DOC
While the first one is easy to implement, I'm struggling with the second. Is there something in Python's stdlib (e.g. shlex) that I could already use?

"test.txt" content:
FOO=BAR
BUZ=BLEH
MULTILINE1="This\nis a test"
MULTILINE2= <<DOC
This
is a test
DOC
Function:
def read_strange_file(filename):
    with open(filename) as f:
        file_content = f.read().splitlines()
    res = {}
    key, value, delim = "", "", ""
    for line in file_content:
        if "=" in line and not delim:
            key, value = line.split("=", 1)  # split on the first "=" only
            if value.strip(" ").startswith("<<"):
                delim = value.strip(" ")[2:]  # extracting delimiter keyword
                value = ""
                continue
        if not delim or (delim and line == delim):
            if value.startswith("\"") and value.endswith("\""):
                # [1: -1] delete quotes
                value = bytes(value[1: -1], "utf-8").decode("unicode_escape")
            if delim:
                value = value[:-1]  # delete "\n"
            res[key] = value
            delim = ""
        if delim:
            value += line + "\n"
    return res
Usage:
result = read_strange_file("test.txt")
print(result)
Output:
{'FOO': 'BAR', 'BUZ': 'BLEH', 'MULTILINE1': 'This\nis a test', 'MULTILINE2': 'This\nis a test'}
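A variation on the same idea, sketched under the assumption that each here-doc body ends with a line that is exactly the delimiter, reads the here-doc body from the same line iterator. parse_env_with_heredocs is a hypothetical name, and it only handles the \n escape inside quoted values.

```python
def parse_env_with_heredocs(text):
    """Hypothetical helper: parse KEY=VALUE lines with quoted values and <<DELIM here-docs."""
    result = {}
    lines = iter(text.splitlines())
    for line in lines:
        if "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        if value.startswith("<<"):
            delim = value[2:]
            body = []
            # Keep reading from the same iterator until the delimiter line
            for body_line in lines:
                if body_line == delim:
                    break
                body.append(body_line)
            value = "\n".join(body)
        elif len(value) >= 2 and value[0] == value[-1] == '"':
            # Only the backslash-n escape is translated here
            value = value[1:-1].replace("\\n", "\n")
        result[key] = value
    return result

sample = 'FOO=BAR\nBUZ=BLEH\nMULTILINE1="This\\nis a test"\nMULTILINE2= <<DOC\nThis\nis a test\nDOC\n'
print(parse_env_with_heredocs(sample))
```

Because the inner loop consumes the here-doc lines directly, a body line that happens to contain = is not mistaken for a new key.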

I'm assuming that this is the test string (i.e., there are unseen \n characters at the end of each line):
s = ''
s += 'MULTILINE1="This\nis a test"\n'
s += 'MULTILINE2= <<DOC\n'
s += 'This\n'
s += 'is a test\n'
s += 'DOC\n'
The best I can do is to cheat using NumPy:
import numpy as np
A = np.asarray([ss.rsplit('\n', 1) for ss in ('\n'+s).split('=')])
keys = A[:-1,1].tolist()
values = A[1:,0].tolist()
#optionally parse here-documents
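di = 'DOC' #delimiting identifier
# removeprefix/removesuffix (Python 3.9+) drop an exact prefix/suffix; lstrip/rstrip would strip any of those characters
values = [v.strip().removeprefix('<<%s\n' % di).removesuffix('\n%s' % di) for v in values]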
print('Keys: ', keys)
print('Values: ', values)
#if you want a dictionary:
d = dict( zip(keys, values) )
This results in:
Keys: ['MULTILINE1', 'MULTILINE2']
Values: ['"This\nis a test"', 'This\nis a test']
It works by sneakily adding a \n character to the beginning of the string, splitting the whole string on = characters, and finally using rsplit to keep everything to the right of each =, even when those values contain multiple \n characters. Printing the array A makes this clearer:
[['', 'MULTILINE1'],
['"This\nis a test"', 'MULTILINE2'],
[' <<DOC\nThis\nis a test\nDOC', '' ]]
