Say we have an numpy.ndarray with numpy.str_ elements. For example, below arr is the numpy.ndarray with two numpy.str_ elements like this:
arr = ['12345"""ABCDEFG' '1A2B3C"""']
Trying to perform string slicing on each numpy element.
For example, how can we slice the first element '12345"""ABCDEFG' so that we replace its 10 last characters with the string REPL, i.e.
arr = ['12345REPL' '1A2B3C"""']
Also, is it possible to perform string substitutions, e.g. substitute all characters after a specific symbol?
Strings are immutable, so you should either create slices and manually recombine or use regular expressions. For example, to replace the last 10 characters of the first element in your array, arr, you could do:
import numpy as np
import re
arr = np.array(['12345"""ABCDEFG', '1A2B3C"""'])
arr[0] = re.sub(arr[0][-10:], 'REPL', arr[0])
print(arr)
#['12345REPL' '1A2B3C"""']
If you want to replace all characters after a specific character you could use a regular expression or find the index of that character in the string and use that as the slicing index.
EDIT: Your comment is more about regular expressions than simply Python slicing, but this is how you could replace everything after the triple quote:
re.sub('["]{3}(.+)', 'REPL', arr[0])
This line essentially says, "Find the triple quote and everything after it, but only replace every character after the triple quotes."
In python, strings are immutable. Also, in NumPy, array scalars are immutable; your string is therefore immutable.
What you would want to do in order to slice is to treat your string like a list and access the elements.
Say we had a string where we wanted to slice at the 3rd letter, excluding the third letter:
my_str = 'purple'
sliced_str = my_str[:3]
Now that we have the part of the string, say we wanted to substitute z's for every letter following where we sliced. We would have to work with the new string that pulled out the letters we wanted, and create an additional string with the desired string that we want to create:
# say I want to replace the end of 'my_str', from where we sliced, with a string named 's'
s = 'dandylion'
new_string = sliced_str + s # returns 'pudandylion'
Because string types are immutable, you have to store elements you want to keep, then combine the stored elements with the elements you would like to add in a new variable.
np.char has replace function, which applies the corresponding string method to each element of the array:
In [598]: arr = np.array(['12345"""ABCDEFG', '1A2B3C"""'])
In [599]: np.char.replace(arr,'"""ABCDEFG',"REPL")
Out[599]:
array(['12345REPL', '1A2B3C"""'],
dtype='<U9')
In this particular example it can be made to work, but it isn't nearly as general purpose as re.sub. Also these char functions are only modestly faster than iterating on the array. There are some good examples of that in #Divakar's link.
Related
I'm trying to get this string into list, how can i do that pleas ?
My string :
x = "[(['xyz1'], 'COM95'), (['xyz2'], 'COM96'), (['xyz3'], 'COM97'), (['xyz4'], 'COM98'), (['xyz5'], 'COM99'), (['xyz6'], 'COM100')]"
I want to convert it to a list, so that:
print(list[0])
Output : (['xyz1'], 'COM95')
If you have this string instead of a list, that presumes it is coming from somewhere outside your control (otherwise you'd just make a proper list). If the string is coming from a source outside your program eval() is dangerous. It will gladly run any code passed to it. In this case you can use ast.liter_eval() which is safer (but make sure you understand the warning on the docs):
import ast
x = "[(['xyz1'], 'COM95'), (['xyz2'], 'COM96'), (['xyz3'], 'COM97'), (['xyz4'], 'COM98'), (['xyz5'], 'COM99'), (['xyz6'], 'COM100')]"
l = ast.literal_eval(x)
Which gives an l of:
[(['xyz1'], 'COM95'),
(['xyz2'], 'COM96'),
(['xyz3'], 'COM97'),
(['xyz4'], 'COM98'),
(['xyz5'], 'COM99'),
(['xyz6'], 'COM100')]
If the structure is uniformly a list of tuples with a one-element list of strings and an individual string, you can manually parse it using the single quote as a separator. This will give you one string value every other component of the split (which you can access using a striding subscript). You can then build the actual tuple from pairing of two values:
tuples = [([a],s) for a,s in zip(*[iter(x.split("'")[1::2])]*2)]
print(tuples[0])
(['xyz1'], 'COM95')
Note that this does not cover the case where an individual string contains a single quote that needed escaping
You mean convert list like string into list? Maybe you can use eval().
For example
a="[1,2,3,4]"
a=eval(a)
Then a become a list
to convert as list use x = eval(x)
print(list[0]) will give you an error because list is a python builtin function
you should do print(x[0]) to get what you want
For example, is it possible to convert the input
x = 10hr
into something like
y = 10
z = hr
I considering slicing, but the individual parts of the string will never be of a fixed length -- for example, the base string could also be something like 365d or 9minutes.
I'm aware of split() and re.match which can separate items into a list/group based on delimitation. But I'm curious what the shortest way to split a string containing a string and an integer into two separate variables is, without having to reassign the elements of the list.
You could use list comprehension and join it as a string
x='10hr'
digits="".join([i for i in x if not i.isalpha()])
letters="".join([i for i in x if i.isalpha()])
You don't need some fancy function or regex for your use case
x = '10hr'
i=0
while x[i].isdigit():
i+=1
The solution assumes that the string is going to be in format you have mentioned: 10hr, 365d, 9minutes, etc..
Above loop will get you the first index value i for the string part
>>i
2
>>x[:i]
'10'
>>x[i:]
'hr'
I have the following array:
a =['1','2']
I want to convert this array into the below format :
a=[1,2]
How can I do that?
You can do it like that. You change each element of a (which are strings) in an integer.
a=[int(x) for x in a]
This single inverted comma you are talking about is the difference between str and int. This is pretty basic python stuff.
A string is a characters, displayed with the inverted comma's around it. 'Hello' is a string, but '1' can be a string too.
In you case ['1','2'] is a list of strings, and [1,2] is a list of numbers.
To convert a string to an int, you can do what is called casting. This is converting one type to another (They have to be compatible though.) Casting 'hello' to a number doesn't make sense and won't work.
Casting '1' to a number is possible by calling int('1') which will result in 1
In your case you can cast all elements in you list by calling a = [int(x) for x in a].
For more info on types see this article.
For information on list comprehensions (What I used to change your list) see this article.
I have an array with 4 integer elements for example [1,0,1,0]
I want to convert it into string '1010'
How do that?
I've tried this
b=''.join(str(syndrome_noised.T))
print(b)
but I got '[1,0,1,0]'.
How this string without brackets.
The reason this fails is because you apply str(..) to the matrix. This will generate a single string. This string is however iterable, so you ''.join(..) the characters of that string back together, turning it into the original string again.
What you probably need to do, is convert every single element into a string, and then join these together, like:
b = ''.join(str(x) for x in syndrome_noised.T)
We thus iterate over the elements x in the syndrome_noised.T array, and we each time map it to a str(..), we then join these together.
We can shorten the code a bit, but still have the same semantics, with map:
b = ''.join(map(str, syndrome_noised.T))
syndrome_noised = [1,0,1,0]
''.join(str(x) for x in syndrome_noised)
you could do this by using a for loop like below:
text = str()
for i in array:
text += str(i)
print(text)
and that would return 1010
I have a string like this that I need to parse into a 2D array:
str = "'813702104[813702106]','813702141[813702143]','813702172[813702174]'"
the array equiv would be:
arr[0][0] = 813702104
arr[0][1] = 813702106
arr[1][0] = 813702141
arr[1][1] = 813702143
#... etc ...
I'm trying to do this by REGEX. The string above is buried in an HTML page but I can be certain it's the only string in that pattern on the page. I'm not sure if this is the best way, but it's all I've got right now.
imgRegex = re.compile(r"(?:'(?P<main>\d+)\[(?P<thumb>\d+)\]',?)+")
If I run imgRegex.match(str).groups() I only get one result (the first couplet). How do I either get multiple matches back or a 2d match object (if such a thing exists!)?
Note: Contrary to how it might look, this is not homework
Note part deux: The real string is embedded in a large HTML file and therefore splitting does not appear to be an option.
I'm still getting answers for this, so I thought I better edit it to show why I'm not changing the accepted answer. Splitting, though more efficient on this test string, isn't going to extract the parts from a whole HTML file. I could combine a regex and splitting but that seems silly.
If you do have a better way to find the parts from a load of HTML (the pattern \d+\[\d+\] is unique to this string in the source), I'll happily change accepted answers. Anything else is academic.
I would try findall or finditer instead of match.
Edit by Oli: Yeah findall work brilliantly but I had to simplify the regex to:
r"'(?P<main>\d+)\[(?P<thumb>\d+)\]',?"
I think I will not go for regex for this task. Python list comprehension is quite powerful for this
In [27]: s = "'813702104[813702106]','813702141[813702143]','813702172[813702174]'"
In [28]: d=[[int(each1.strip(']\'')) for each1 in each.split('[')] for each in s.split(',')]
In [29]: d[0][1]
Out[29]: 813702106
In [30]: d[1][0]
Out[30]: 813702141
In [31]: d
Out[31]: [[813702104, 813702106], [813702141, 813702143], [813702172, 813702174]]
Modifying your regexp a little,
>>> str = "'813702104[813702106]','813702141[813702143]','813702172[813702174]"
>>> imgRegex = re.compile(r"'(?P<main>\d+)\[(?P<thumb>\d+)\]',?")
>>> print imgRegex.findall(str)
[('813702104', '813702106'), ('813702141', '813702143')]
Which is a "2 dimensional array" - in Python, "a list of 2-tuples".
I've got something that seems to work on your data set:
In [19]: str = "'813702104[813702106]','813702141[813702143]','813702172[813702174]'"
In [20]: ptr = re.compile( r"'(?P<one>\d+)\[(?P<two>\d+)\]'" )
In [21]: ptr.findall( str )
Out [23]:
[('813702104', '813702106'),
('813702141', '813702143'),
('813702172', '813702174')]
Alternatively, you could use Python's [statement for item in list] syntax for building lists. You should find this to be considerably faster than a regex, particularly for small data sets. Larger data sets will show a less marked difference (it only has to load the regular expressions engine once no matter the size), but the listmaker should always be faster.
Start by splitting the string on commas:
>>> str = "'813702104[813702106]','813702141[813702143]','813702172[813702174]'"
>>> arr = [pair for pair in str.split(",")]
>>> arr
["'813702104[813702106]'", "'813702141[813702143]'", "'813702172[813702174]'"]
Right now, this returns the same thing as just str.split(","), so isn't very useful, but you should be able to see how the listmaker works — it iterates through list, assigning each value to item, executing statement, and appending the resulting value to the newly-built list.
In order to get something useful accomplished, we need to put a real statement in, so we get a slice of each pair which removes the single quotes and closing square bracket, then further split on that conveniently-placed opening square bracket:
>>> arr = [pair[1:-2].split("[") for pair in str.split(",")]
>>> arr
>>> [['813702104', '813702106'], ['813702141', '813702143'], ['813702172', '813702174']]
This returns a two-dimensional array like you describe, but the items are all strings rather than integers. If you're simply going to use them as strings, that's far enough. If you need them to be actual integers, you simply use an "inner" listmaker as the statement for the "outer" listmaker:
>>> arr = [[int(x) for x in pair[1:-2].split("[")] for pair in str.split(",")]
>>> arr
>>> [[813702104, 813702106], [813702141, 813702143], [813702172, 813702174]]
This returns a two-dimensional array of the integers representing in a string like the one you provided, without ever needing to load the regular expressions engine.