Replace Strings in Python 3 - python

I'm trying to replace a string in python like this:
private_ips.replace("{",'')
The error I get back is this:
Traceback (most recent call last):
File ".\aws_ec2_list_instances.py", line 39, in <module>
private_ips.replace("{",'')
AttributeError: 'set' object has no attribute 'replace'
What am I doing wrong?

private_ips is a set object. You can use replace only on strings.
To represent the set as a string, use this snippet:
private_ips_as_string = '{' + ', '.join(str(elem) for elem in private_ips) + '}'
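If the goal is simply a string of addresses with no braces at all, the braces never need to be added in the first place. A minimal sketch, assuming private_ips holds plain address strings (the sample values below are made up for illustration):

```python
# Hypothetical sample data standing in for the asker's private_ips set.
private_ips = {"10.0.0.12", "10.0.0.7"}

# Sets are unordered, so sort first to get a stable, repeatable string.
ips_as_string = ", ".join(sorted(str(ip) for ip in private_ips))
print(ips_as_string)
```

Sorting here is plain string sorting, which is enough to make the output deterministic but is not numeric IP ordering.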

Let's back up a little ...
tree = objectpath.Tree(instance)
private_ips = set(tree.execute('$..PrivateIpAddress'))
Your initial problem is that you specifically converted the return value into a set. If you don't want a set, then don't convert it to one, or convert it back to something more useful to you. Since you've failed to provide a Minimal, complete, verifiable example, we can't fix everything, but I'll use an intuitive leap here ...
tree.execute returns a list of IP addresses.
You're using set to remove duplicate addresses in a list.
If so, you're fine up to this point. To get the address as a string, I think you want to iterate through the items in the set:
for ip_addr in private_ips:
    # Handle ip_addr, a single IP address seen as a str.
    print(ip_addr)
If you need the addresses lined up, you can always convert back to a list with
private_ips = list(private_ips)
... and if you know there is exactly one addr that you want as a string, you can grab it in one step with
single_ip = list(private_ips)[0]
... or just grab it directly from your function's return value:
single_ip = tree.execute('$..PrivateIpAddress')[0]
To explain what did happen to you:
You called a function that returns a sequence of some sort.
You converted that sequence to a set, a common technique for removing duplicates.
You tried to remove braces from the set, as if it were a string.
The problem is that a set does not have braces. Those braces are a notational convenience; they exist only in the __repr__ (output string representation) of the data type, not in the set itself. You cannot manipulate that representation. This would be something like trying to remove the up-vote and down-vote arrows from this question by editing the question text: you can't do it, because those are part of the delivery framework.
Similarly, you cannot remove the quotation marks from the ends of a string, because they're not part of the string.
To get rid of the braces, you quit using a set: reach inside and pull out the contents as an individual element.
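The difference between the set's display form and its contents can be seen in a short sketch. The one-element set below is a made-up stand-in for the asker's data:

```python
# Hypothetical one-element set; the real data comes from tree.execute(...).
private_ips = {"10.0.0.12"}

# The braces appear only in the display form produced by repr().
shown = repr(private_ips)
print(shown)

# Pull the element out of the set; the element itself is a plain str.
single_ip = next(iter(private_ips))
print(single_ip)
```

next(iter(...)) is one way to grab an element without building a list first; list(private_ips)[0] gives the same result for a one-element set.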

Related

In python, is there a way to remove all text following the last instance of a delimiter?

I'm trying to create a random text generator in python. I'm using Markovify to produce the required text, with a filter so that it does not start generating text unless the first word is capitalized. To prevent it from ending "mid sentence", I want the program to search from the back of the output to the front and remove all text after the last (for instance) period. I want it to ignore all other instances of the selected delimiter(s). I have no idea how many instances of the delimiter will occur in the generated text, nor do I have any way to know in advance.
While looking into this I found rsplit(), and tried using that, but ran into a problem.
tweet = buff.rsplit('.')[-1]
The above is what I tried first, and I thought it was working until I noticed that all of the lines printed with that had only a single sentence in them. Never more than that. The problem seems to be that the text is being dumped into an array of strings, and the [-1] bit is calling just one entry from that array.
tweet = buff.rsplit('.') - buff.rsplit('.')[-1]
Next I tried the above. The thinking was that it would remove the last entry in the array, and then I could just print what remained. It... didn't go to plan. I get an "unsupported operand type" error, specifically tied to the attempt to subtract. Not sure what I'm missing at this point.
.rsplit has a second, optional argument, maxsplit: the maximum number of splits to perform. You could use it the following way:
txt = 'some.text.with.dots'
all_but_last = txt.rsplit('.', 1)[0]
print(all_but_last)
Output:
some.text.with
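Note that rsplit('.', 1)[0] drops the final period as well. If the sentence-ending period should stay, str.rpartition is an alternative to the rsplit approach above. This is a sketch only; the function name trim_to_last_sentence is hypothetical:

```python
def trim_to_last_sentence(text, delimiter="."):
    """Return text up to and including the last delimiter.

    If the delimiter does not occur at all, the text is returned unchanged.
    """
    # rpartition splits once, at the last occurrence of the delimiter.
    head, sep, _tail = text.rpartition(delimiter)
    return head + sep if sep else text


print(trim_to_last_sentence("First one. Second one. Trailing frag"))
```

When the delimiter is missing, rpartition returns two empty strings followed by the original text, which is why the function checks sep before joining.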

unable to convert python type str into a list array

i'm new to python, and i am developing a tool/script for ssh and sftp. i noticed some of the code i'm using creates what i thought was a string array
channel_data = str()
to hold the console output from an ssh session. if i check "type" on channel_data it comes back as class 'str' ,
but yet if i perform for loop to read each item in channel_data , and channel_data contains what appears to be 30 lines from an ssh console
for line in channel_data:
    if "my text" in line:
        found = True
each iteration of "line" shows a single character, as if the whole ssh console output of 30 lines of text is broken down into single character array. i do have \n within all the text.
for example channel_data would contain "Cisco Nexus Operation System (NX-OS) Software\r\nCopyright (c) 2002-2016\r\n ..... etc. etc.. ", but again would read in my for loop and print out "C" then "i" then "s" etc..
i'm trying to understand do i have a char array here or a string array here that is made up of single string characters and how to convert it into a string list based on \n within Python?
You can iterate a string just like a list in Python. So, yes, as expected, your string type channel_data will in fact give you every character.
Python does not have a separate char type, so there is no char array. A list of characters is a list of strings, where each item is a one-character string:
>>> type(['a', 'b'])
<class 'list'>
Also, just for the sake of adding some extra information for your own knowledge when it comes to usage of terminology, there is a difference between array and list in Python: Python List vs. Array - when to use?
So, what you are actually looking to do here is take the channel_data string and make it a list by calling the split method on it.
The split method will, by default, split on whitespace. Check the documentation. So, decide exactly what you want to split on and pass that separator to the method.
You can take a look at splitlines to see if that works for you.
As specified in the documentation for splitlines:
Line breaks are not included in the resulting list unless keepends is
given and true.
Your result will then be a list of strings as you expect. So, as an example you can do:
your_new_list_of_str = channel_data.split('\n')
or
your_new_list_of_str = channel_data.splitlines()
string_list = channel_data.splitlines()
See docs at https://docs.python.org/3.6/library/stdtypes.html#str.splitlines
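Since the console output in the question uses \r\n line endings, the two approaches differ slightly. A small sketch, using a shortened made-up sample of the console text:

```python
# Shortened, hypothetical sample of the SSH console output.
channel_data = "Cisco Nexus Software\r\nCopyright (c) 2002-2016\r\n"

# splitlines() understands \r\n and drops the line endings entirely.
lines = channel_data.splitlines()

# split("\n") leaves a trailing "\r" on each line and a final empty string.
by_newline = channel_data.split("\n")

# Searching now works per line instead of per character.
found = any("Copyright" in line for line in lines)
print(lines, by_newline, found)
```

For text that may mix \n and \r\n, splitlines() is usually the less surprising choice.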

Remove duplicates from a list in Python

I have a python script which parses an xml file and then gives me the required information. My output looks like this, and is 100% correct:
output = ['77:275,77:424,77:425,77:426,77:427,77:412,77:413,77:414,77:412,77:413,77:414,77:412,77:413,77:414,77:412,77:413,77:414,77:431,77:432,77:433,77:435,77:467,77:470,77:471,77:484,77:485,77:475,77:476,77:437,77:438,77:439,77:440,77:442,77:443,77:444,77:445,77:446,77:447,77:449,77:450,77:451,77:454,77:455,77:456,77:305,77:309,77:496,77:497,77:500,77:504,77:506,77:507,77:508,77:513,77:515,77:514,77:517,77:518,77:519,77:521,77:522,77:523,77:403,77:406,77:404,77:405,77:403,77:406,77:404,77:405,77:526,77:496,77:497,77:500,77:504,77:506,77:507,77:508,77:513,77:515,77:514,77:517,77:518,77:519,77:521,77:522,77:523,77:403,77:406,77:404,77:405,77:403,77:406,77:404,77:405,77:526,77:317,77:321,77:346,77:349,77:350,77:351,77:496,77:497,77:500,77:504,77:506,77:507,77:508,77:513,77:515,77:514,77:517,77:518,77:519,77:521,77:522,77:523,77:403,77:406,77:404,77:405,77:403,77:406,77:404,77:405,77:526,77:496,77:497,77:500,77:504,77:506,77:507,77:508,77:513,77:515,77:514,77:517,77:518,77:519,77:521,77:522,77:523,77:403,77:406,77:404,77:405,77:403,77:406,77:404,77:405,77:526,77:362,77:367,77:369,77:374,77:370,77:372,77:373,77:387,77:388,77:389,77:392,77:393,77:394,77:328,77:283,77:284,77:285,77:288,77:289,77:290,77:292,']
It is all fine, but I want to remove the duplicate elements in an element, like in the case above. I tried using the OrderedDict package or just simple list(set(output)), but obviously they both didn't work. Does anyone have a tip for me on how to solve this problem?
You have one element in a list. If you expected it to be treated as separate elements, you need to explicitly split it.
You could split the string on the ',' comma character into a list with str.split():
separate_elements = output[0].split(',')
after which you can use set() (unordered) or OrderedDict (maintaining order) and re-join the string if you still need just the one string object:
','.join(set(separate_elements))
You can put that back into a list with just one element, but there is little point if all you ever handle is that one string.
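For an order-preserving version, dict.fromkeys can stand in for OrderedDict on Python 3.7 or later, where plain dicts keep insertion order. A sketch with a shortened sample of the data; the variable names are illustrative:

```python
# Shortened sample of the asker's one-element list.
output = ["77:275,77:424,77:412,77:413,77:412,77:413,"]

# Split into separate items; skip the empty string left by the trailing comma.
parts = [p for p in output[0].split(",") if p]

# dict keys are unique and keep first-seen order (Python 3.7+).
unique_in_order = list(dict.fromkeys(parts))

# Re-join into a single string inside a one-element list, as in the original.
deduplicated = [",".join(unique_in_order)]
print(deduplicated)
```

This drops the trailing comma of the original string; append one back if downstream code depends on it.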

Searching two things

I am using re and would like to search for a string between two strings. My problem is that the string I am searching for may end with either a newline (\n) or another string. So whether it ends with a newline or with the other string, I want to get the string back. The reason is that some of my documents were created incorrectly, without the newline, so I have to get the text up to the newline and then check whether it contains the corresponding string.
I have tried this:
recipients = re.search('Recipients:(.*)\n', body)
reciBody = re.search('(.*)Notes', recipients.group(1).encode("utf-8"))
Later on I am trying to split this by using:
recipientsList = reciBody.group(1).encode("utf-8").split(',')
The problem is I am getting this error if there is no corresponding string:
recipientsList = reciBody.group(1).encode("utf-8").split(',')
AttributeError: 'NoneType' object has no attribute 'group'
What other ways can I use? Or how can I handle this error?
I'm assuming nothing needs to be done if the group isn't found. Simplest is to just skip the error.
try:
    recipientsList = reciBody.group(1).encode("utf-8").split(',')
except AttributeError:
    pass  # nothing needs to be done
Instead of pass, you may need to set recipientsList to something else.
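An alternative to catching the exception is to test the match object for None before calling group. The sketch below is written for Python 3 and also folds the two searches into one pattern that stops at "Notes", a newline, or the end of the text, whichever comes first. The function name extract_recipients and the sample addresses are hypothetical:

```python
import re


def extract_recipients(body):
    """Return the comma-separated recipients as a list, or [] if none found."""
    # Lazy match: stop at "Notes", a newline, or end of text, whichever is first.
    match = re.search(r"Recipients:(.*?)(?:Notes|\n|$)", body)
    if match is None:
        return []  # no "Recipients:" section at all
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


print(extract_recipients("Recipients: a@x.com, b@x.com Notes: hi\n"))
print(extract_recipients("no recipients section"))
```

re.search returns None when nothing matches, which is exactly what causes the AttributeError in the question, so the explicit check makes that case visible in the code.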

How to compare unicode strings with entity ref to non-unicode string

I am evaluating hundreds of thousands of html files. I am looking for particular parts of the files. There can be small variations in the way the files were created.
For example, in one file I can have a section heading (after I converted it to upper and split then joined the text to get rid of possibly inconsistent white space):
u'KEY1A\x97RISKFACTORS'
In another file I could have:
'KEY1ARISKFACTORS'
I am trying to create a dictionary of possible responses and I want to compare these two and conclude that they are equal. But every substitution I run on the first string to remove the '\x97' does not seem to work.
There are a fair number of variations of keys with various representations of entities so I would really like to create a dictionary more or less automatically so I have something like:
key_dict={u'KEY1A\x97RISKFACTORS':'KEY1ARISKFACTORS','KEY1ARISKFACTORS':'KEY1ARISKFACTORS',. . .}
I am assuming that since when I run
S1='A'
S2=u'A'
S1==S2
I get
True
I should be able to compare these once the html entities are handled
What I specifically tried to do is
new_string=u'KEY1A\x97RISKFACTORS'.replace('|','')
I got an error
Sorry, I have been at this since last night. SLott pointed out something and I see I used the wrong label. I hope this makes more sense.
You are correct that if S1='A' and S2 = u'A', then S1 == S2. Instead of assuming this though, you can do a simple test:
key_dict= {u'A':'Value1',
'A':'Value2'}
print key_dict
print u'A' == 'A'
This outputs:
{u'A': 'Value2'}
True
That resolved, let's look at:
new_string=u'KEY1A\x97DEMOGRAPHICRESPONSES'.replace('|','')
There's a problem here: \x97 is the value you're trying to remove from the target string, but your search string is '|', which is hex value 0x7C (in both ASCII and Unicode) and clearly not the value you need to replace. Even if the target and search string were both ASCII or both Unicode, you still would not find the '\x97'. The second problem is that you are searching for a non-unicode string in a unicode string. The easiest solution, and the one that makes the most sense, is to simply search for u'\x97':
print u'KEY1A\x97DEMOGRAPHICRESPONSES'
print u'KEY1A\x97DEMOGRAPHICRESPONSES'.replace(u'\x97', u'')
Outputs:
KEY1A\x97DEMOGRAPHICRESPONSES
KEY1ADEMOGRAPHICRESPONSES
Why not the obvious .replace(u'\x97','')? Where does the idea of that '|' come from?
>>> s = u'KEY1A\x97DEMOGRAPHICRESPONSES'
>>> s.replace(u'\x97', '')
u'KEY1ADEMOGRAPHICRESPONSES'
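Since the question mentions many variations of stray characters, one way to build the lookup dictionary automatically is to normalize every heading instead of replacing one character at a time. This is a Python 3 sketch (str.isascii needs 3.7 or later), and it assumes that only ASCII letters and digits matter for the comparison; the function name normalize is hypothetical:

```python
# The two heading variants from the question.
headings = ["KEY1A\x97RISKFACTORS", "KEY1ARISKFACTORS"]


def normalize(heading):
    """Keep only ASCII letters and digits, dropping characters such as \\x97."""
    return "".join(ch for ch in heading if ch.isascii() and ch.isalnum())


# Map every raw variant to its normalized form.
key_dict = {raw: normalize(raw) for raw in headings}
print(key_dict)
```

Both variants map to the same value, so they can be compared through key_dict without listing every possible stray character by hand.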
