What is the best way to create a string array in python? - python

I'm relatively new to Python and it's libraries and I was wondering how I might create a string array with a preset size. It's easy in java but I was wondering how I might do this in python.
So far all I can think of is
strs = ['']*size
And some how when I try to call string methods on it, the debugger gives me an error X operation does not exist in object tuple.
And if it was in java this is what I would want to do.
String[] ar = new String[size];
Arrays.fill(ar,"");
Please help.
Error code
strs[sum-1] = strs[sum-1].strip('\(\)')
AttributeError: 'tuple' object has no attribute 'strip'
Question: How might I do what I can normally do in Java in Python while still keeping the code clean.

In python, you wouldn't normally do what you are trying to do. But, the below code will do it:
strs = ["" for x in range(size)]

In Python, the tendency is usually that one would use a non-fixed size list (that is to say items can be appended/removed to it dynamically). If you followed this, there would be no need to allocate a fixed-size collection ahead of time and fill it in with empty values. Rather, as you get or create strings, you simply add them to the list. When it comes time to remove values, you simply remove the appropriate value from the string. I would imagine you can probably use this technique for this. For example (in Python 2.x syntax):
>>> temp_list = []
>>> print temp_list
[]
>>>
>>> temp_list.append("one")
>>> temp_list.append("two")
>>> print temp_list
['one', 'two']
>>>
>>> temp_list.append("three")
>>> print temp_list
['one', 'two', 'three']
>>>
Of course, some situations might call for something more specific. In your case, a good idea may be to use a deque. Check out the post here: Python, forcing a list to a fixed size. With this, you can create a deque which has a fixed size. If a new value is appended to the end, the first element (head of the deque) is removed and the new item is appended onto the deque. This may work for what you need, but I don't believe this is considered the "norm" for Python.

The simple answer is, "You don't." At the point where you need something to be of fixed length, you're either stuck on old habits or writing for a very specific problem with its own unique set of constraints.

The best and most convenient method for creating a string array in python is with the help of NumPy library.
Example:
import numpy as np
arr = np.chararray((rows, columns))
This will create an array having all the entries as empty strings. You can then initialize the array using either indexing or slicing.

Are you trying to do something like this?
>>> strs = [s.strip('\(\)') for s in ['some\\', '(list)', 'of', 'strings']]
>>> strs
['some', 'list', 'of', 'strings']

But what is a reason to use fixed size? There is no actual need in python to use fixed size arrays(lists) so you always have ability to increase it's size using append, extend or decrease using pop, or at least you can use slicing.
x = ['' for x in xrange(10)]

strlist =[{}]*10
strlist[0] = set()
strlist[0].add("Beef")
strlist[0].add("Fish")
strlist[1] = {"Apple", "Banana"}
strlist[1].add("Cherry")
print(strlist[0])
print(strlist[1])
print(strlist[2])
print("Array size:", len(strlist))
print(strlist)

The error message says it all: strs[sum-1] is a tuple, not a string. If you show more of your code someone will probably be able to help you. Without that we can only guess.

Sometimes I need a empty char array. You cannot do "np.empty(size)" because error will be reported if you fill in char later. Then I usually do something quite clumsy but it is still one way to do it:
# Suppose you want a size N char array
charlist = [' ']*N # other preset character is fine as well, like 'x'
chararray = np.array(charlist)
# Then you change the content of the array
chararray[somecondition1] = 'a'
chararray[somecondition2] = 'b'
The bad part of this is that your array has default values (if you forget to change them).

def _remove_regex(input_text, regex_pattern):
findregs = re.finditer(regex_pattern, input_text)
for i in findregs:
input_text = re.sub(i.group().strip(), '', input_text)
return input_text
regex_pattern = r"\buntil\b|\bcan\b|\bboat\b"
_remove_regex("row and row and row your boat until you can row no more", regex_pattern)
\w means that it matches word characters, a|b means match either a or b, \b represents a word boundary

If you want to take input from user here is the code
If each string is given in new line:
strs = [input() for i in range(size)]
If the strings are separated by spaces:
strs = list(input().split())

Related

Remove duplicates of set of characters in string - Python

I have a string '1a1b1c1d3e3e3e1f1g2h2h1i1j1k1l1m1n4o4o4o4o1p1q2r2r1s2t2t2u2u1v1w1x1y1z'
and I want to remove all of the duplicates of these charterers: 3e, 4o, 2r etc.
How can I do that in Python?
str_='1a1b1c1d3e3e3e1f1g2h2h1i1j1k1l1m1n4o4o4o4o1p1q2r2r1s2t2t2u2u1v1w1x1y1z'
seen = set()
result = []
n=2
for i in range(0,len(str_),n):
item=str_[i:i+n]
if item not in seen:
seen.add(item)
result.append(item)
This is a pretty crude way of doing it.
But it seams to do the job without begin to complicated.
This also assumes that it's known character compositions you need to remove. You didn't mention that you need to remove all duplicates, only a set of known ones.
x = '1a1b1c1d3e3e3e1f1g2h2h1i1j1k1l1m1n4o4o4o4o1p1q2r2r1s2t2t2u2u1v1w1x1y1z'
for y in ['3e', '4o', '2r']:
x = x[:x.find(y)+len(y)] + x[x.find(y)+len(y):].replace(y, '')
print(x)
Finds the first occurance of your desired object (3e for instance) and builds a new version of the string up to and including that object, and prepends the string with the rest of the original string but with replacing your object with a empty string.
This is a bit slow, but again, gets the job done.
No error handling here tho so be wary of -1 positions etc.
You can use list comprehension and set to do this in the following way:
s = '1a1b1c1d3e3e3e1f1g2h2h1i1j1k1l1m1n4o4o4o4o1p1q2r2r1s2t2t2u2u1v1w1x1y1z'
s = [s[i:i+2] for i in range(0, len(s) - 1, 2)]
s = set(s)
Hope it helps

Check if an int/str is in the list and its location. Python 3.3.2

Say that I have a list that looks something like the this :
MyList = [1,2,3,4,5,"z","x","c","v","b"]
Now the users inputs : "5z1b3". How would you replace each int/str with its location in the list. I'm thinking of using something like this:
for x in MyList.... if located in list, replace with letter/number with its location.
Not entirely sure how to do it though. Help would be much appreciated.
edit::::: It's something I'm working on and I must use both ints and strs in the list. Also I lied about the output I need. Thanks for mentioning it avarnert. commas between each letter/number in the output would make it work for me. Any ideas how to do it ?
Use a list comprehension:
[MyList.index(c) for c in inputstring]
This'll have to scan through MyList for each entry; you could optimize that quite a bit by using a dictionary indexing from character to position; this has the added advantage we can ensure we only have strings as well:
index = {str(c): i for i, c in enumerate(MyList)}
[index[c] for c in inputstring]
If you then need a formatted string, turn the indices to strings and join the final output:
index = {str(c): str(i) for i, c in enumerate(MyList)}
','.join([index[c] for c in inputstring])
I would go about using the list.index() method. See below for example:
MyList = [MyList.index(chr) for chr in user_input]
EDIT:
This however assumes that each character from the user input will be found in MyList, and also that each character in MyList will appear only once.

Python text encryption: rot13

I am currently doing an assignment that encrypts text by using rot 13, but some of my text wont register.
# cgi is to escape html
# import cgi
def rot13(s):
#string encrypted
scrypt=''
alph='abcdefghijklmonpqrstuvwxyz'
for c in s:
# check if char is in alphabet
if c.lower() in alph:
#find c in alph and return its place
i = alph.find(c.lower())
#encrypt char = c incremented by 13
ccrypt = alph[ i+13 : i+14 ]
#add encrypted char to string
if c==c.lower():
scrypt+=ccrypt
if c==c.upper():
scrypt+=ccrypt.upper()
#dont encrypt special chars or spaces
else:
scrypt+=c
return scrypt
# return cgi.escape(scrypt, quote = True)
given_string = 'Rot13 Test'
print rot13(given_string)
OUTPUT:
13 r
[Finished in 0.0s]
Hmmm, seems like a bunch of things are not working.
Main problem should be in ccrypt = alph[ i+13 : i+14 ]: you're missing a % len(alph) otherwise if, for example, i is equal to 18, then you'll end out of the list boundary.
In your output, in fact, only e is encoded to r because it's the only letter in your test string which, moved by 13, doesn't end out of boundary.
The rest of this answer are just tips to clean the code a little bit:
instead of alph='abc.. you can declare an import string at the beginning of the script and use a string.lowercase
instead of using string slicing, for just one character it's better to use string[i], gets the work done
instead of c == c.upper(), you can use builtin function if c.isupper() ....
The trouble you're having is with your slice. It will be empty if your character is in the second half of the alphabet, because i+13 will be off the end. There are a few ways you could fix it.
The simplest might be to simply double your alphabet string (literally: alph = alph * 2). This means you can access values up to 52, rather than just up to 26. This is a pretty crude solution though, and it would be better to just fix the indexing.
A better option would be to subtract 13 from your index, rather than adding 13. Rot13 is symmetric, so both will have the same effect, and it will work because negative indexes are legal in Python (they refer to positions counted backwards from the end).
In either case, it's not actually necessary to do a slice at all. You can simply grab a single value (unlike C, there's no char type in Python, so single characters are strings too). If you were to make only this change, it would probably make it clear why your current code is failing, as trying to access a single value off the end of a string will raise an exception.
Edit: Actually, after thinking about what solution is really best, I'm inclined to suggest avoiding index-math based solutions entirely. A better approach is to use Python's fantastic dictionaries to do your mapping from original characters to encrypted ones. You can build and use a Rot13 dictionary like this:
alph="abcdefghijklmnopqrstuvwxyz"
rot13_table = dict(zip(alph, alph[13:]+alph[:13])) # lowercase character mappings
rot13_table.update((c.upper(),rot13_table[c].upper()) for c in alph) # upppercase
def rot13(s):
return "".join(rot13_table.get(c, c) for c in s) # non-letters are ignored
First thing that may have caused you some problems - your string list has the n and the o switched, so you'll want to adjust that :) As for the algorithm, when you run:
ccrypt = alph[ i+13 : i+14 ]
Think of what happens when you get 25 back from the first iteration (for z). You are now looking for the index position alph[38:39] (side note: you can actually just say alph[38]), which is far past the bounds of the 26-character string, which will return '':
In [1]: s = 'abcde'
In [2]: s[2]
Out[2]: 'c'
In [3]: s[2:3]
Out[3]: 'c'
In [4]: s[49:50]
Out[4]: ''
As for how to fix it, there are a number of interesting methods. Your code functions just fine with a few modifications. One thing you could do is create a mapping of characters that are already 'rotated' 13 positions:
alph = 'abcdefghijklmnopqrstuvwxyz'
coded = 'nopqrstuvwxyzabcdefghijklm'
All we did here is split the original list into halves of 13 and then swap them - we now know that if we take a letter like a and get its position (0), the same position in the coded list will be the rot13 value. As this is for an assignment I won't spell out how to do it, but see if that gets you on the right track (and #Makoto's suggestion is a perfect way to check your results).
This line
ccrypt = alph[ i+13 : i+14 ]
does not do what you think it does - it returns a string slice from i+13 to i+14, but if these indices are greater than the length of the string, the slice will be empty:
"abc"[5:6] #returns ''
This means your solution turns everything from n onward into an empty string, which produces your observed output.
The correct way of implementing this would be (1.) using a modulo operation to constrain the index to a valid number and (2.) using simple character access instead of string slices, which is easier to read, faster, and throws an IndexError for invalid indices, meaning your error would have been obvious.
ccrypt = alph[(i+13) % 26]
If you're doing this as an exercise for a course in Python, ignore this, but just saying...
>>> import codecs
>>> codecs.encode('Some text', 'rot13')
'Fbzr grkg'
>>>

Python: removing specific lines from an object

I have a bit of a weird question here.
I am using iperf to test performance between a device and a server. I get the results of this test over SSH, which I then want to parse into values using a parser that has already been made. However, there are several lines at the top of the results (which I read into an object of lines) that I don't want to go into the parser. I know exactly how many lines I need to remove from the top each time though. Is there any way to drop specific entries out of a list? Something like this in psuedo-python
print list
["line1","line2","line3","line4"]
list = list.drop([0 - 1])
print list
["line3","line4"]
If anyone knows anything I could use I would really appreciate you helping me out. The only thing I can think of is writing a loop to iterate through and make a new list only putting in what I need. Anyway, thanlks!
Michael
Slices:
l = ["line1","line2","line3","line4"]
print l[2:] # print from 2nd element (including) onwards
["line3","line4"]
Slices syntax is [from(included):to(excluded):step]. Each part is optional. So you can write [:] to get the whole list (or any iterable for that matter -- string and tuple as an example from the built-ins). You can also use negative indexes, so [:-2] means from beginning to the second last element. You can also step backwards, [::-1] means get all, but in reversed order.
Also, don't use list as a variable name. It overrides the built-in list class.
This is what the slice operator is for:
>>> before = [1,2,3,4]
>>> after = before[2:]
>>> print after
[3, 4]
In this instance, before[2:] says 'give me the elements of the list before, starting at element 2 and all the way until the end.'
(also -- don't use reserved words like list or dict as variable names -- doing that can lead to confusing bugs)
You can use slices for that:
>>> l = ["line1","line2","line3","line4"] # don't use "list" as variable name, it's a built-in.
>>> print l[2:] # to discard items up to some point, specify a starting index and no stop point.
['line3', 'line4']
>>> print l[:1] + l[3:] # to drop items "in the middle", join two slices.
['line1', 'line4']
why not use a basic list slice? something like:
list = list[3:] #everything from the 3 position to the end
You want del for that
del list[:2]
You can use "del" statment to remove specific entries :
del(list[0]) # remove entry 0
del(list[0:2]) # remove entries 0 and 1

Python regex to parse into a 2D array

I have a string like this that I need to parse into a 2D array:
str = "'813702104[813702106]','813702141[813702143]','813702172[813702174]'"
the array equiv would be:
arr[0][0] = 813702104
arr[0][1] = 813702106
arr[1][0] = 813702141
arr[1][1] = 813702143
#... etc ...
I'm trying to do this by REGEX. The string above is buried in an HTML page but I can be certain it's the only string in that pattern on the page. I'm not sure if this is the best way, but it's all I've got right now.
imgRegex = re.compile(r"(?:'(?P<main>\d+)\[(?P<thumb>\d+)\]',?)+")
If I run imgRegex.match(str).groups() I only get one result (the first couplet). How do I either get multiple matches back or a 2d match object (if such a thing exists!)?
Note: Contrary to how it might look, this is not homework
Note part deux: The real string is embedded in a large HTML file and therefore splitting does not appear to be an option.
I'm still getting answers for this, so I thought I better edit it to show why I'm not changing the accepted answer. Splitting, though more efficient on this test string, isn't going to extract the parts from a whole HTML file. I could combine a regex and splitting but that seems silly.
If you do have a better way to find the parts from a load of HTML (the pattern \d+\[\d+\] is unique to this string in the source), I'll happily change accepted answers. Anything else is academic.
I would try findall or finditer instead of match.
Edit by Oli: Yeah findall work brilliantly but I had to simplify the regex to:
r"'(?P<main>\d+)\[(?P<thumb>\d+)\]',?"
I think I will not go for regex for this task. Python list comprehension is quite powerful for this
In [27]: s = "'813702104[813702106]','813702141[813702143]','813702172[813702174]'"
In [28]: d=[[int(each1.strip(']\'')) for each1 in each.split('[')] for each in s.split(',')]
In [29]: d[0][1]
Out[29]: 813702106
In [30]: d[1][0]
Out[30]: 813702141
In [31]: d
Out[31]: [[813702104, 813702106], [813702141, 813702143], [813702172, 813702174]]
Modifying your regexp a little,
>>> str = "'813702104[813702106]','813702141[813702143]','813702172[813702174]"
>>> imgRegex = re.compile(r"'(?P<main>\d+)\[(?P<thumb>\d+)\]',?")
>>> print imgRegex.findall(str)
[('813702104', '813702106'), ('813702141', '813702143')]
Which is a "2 dimensional array" - in Python, "a list of 2-tuples".
I've got something that seems to work on your data set:
In [19]: str = "'813702104[813702106]','813702141[813702143]','813702172[813702174]'"
In [20]: ptr = re.compile( r"'(?P<one>\d+)\[(?P<two>\d+)\]'" )
In [21]: ptr.findall( str )
Out [23]:
[('813702104', '813702106'),
('813702141', '813702143'),
('813702172', '813702174')]
Alternatively, you could use Python's [statement for item in list] syntax for building lists. You should find this to be considerably faster than a regex, particularly for small data sets. Larger data sets will show a less marked difference (it only has to load the regular expressions engine once no matter the size), but the listmaker should always be faster.
Start by splitting the string on commas:
>>> str = "'813702104[813702106]','813702141[813702143]','813702172[813702174]'"
>>> arr = [pair for pair in str.split(",")]
>>> arr
["'813702104[813702106]'", "'813702141[813702143]'", "'813702172[813702174]'"]
Right now, this returns the same thing as just str.split(","), so isn't very useful, but you should be able to see how the listmaker works — it iterates through list, assigning each value to item, executing statement, and appending the resulting value to the newly-built list.
In order to get something useful accomplished, we need to put a real statement in, so we get a slice of each pair which removes the single quotes and closing square bracket, then further split on that conveniently-placed opening square bracket:
>>> arr = [pair[1:-2].split("[") for pair in str.split(",")]
>>> arr
>>> [['813702104', '813702106'], ['813702141', '813702143'], ['813702172', '813702174']]
This returns a two-dimensional array like you describe, but the items are all strings rather than integers. If you're simply going to use them as strings, that's far enough. If you need them to be actual integers, you simply use an "inner" listmaker as the statement for the "outer" listmaker:
>>> arr = [[int(x) for x in pair[1:-2].split("[")] for pair in str.split(",")]
>>> arr
>>> [[813702104, 813702106], [813702141, 813702143], [813702172, 813702174]]
This returns a two-dimensional array of the integers representing in a string like the one you provided, without ever needing to load the regular expressions engine.

Categories