Trouble formatting serial data from Python enumeration

So I get a list of mentions from Twitter and need to know the index of each mention. To do this, I use:
for idx, result in enumerate(mentions, start = 48):
    message = result['text']
    print idx, message
which returns
48 " first message"
as expected. However, I need to use this index for some serial data that requires me to convert the index into hex, so I use:
hidx = hex(idx)
which then returns the string '0x30'. However, I need to somehow have this result in the exact format of "\x30" so that I can use serial to write:
serial.write("\x30")
What is the best way to accomplish this? If I keep my hex-conversion code the way it is, I get that pesky extra 0 and no backslash, so the serial code actually writes serial.write(0x30), which is not what I need. I'm hoping to find a way so that, because of the for loop, I will get:
serial.write("\x30")
serial.write("\x31")
serial.write("\x32")
etc., for as many mentions as I need.
Is there a way to strip the first zero and add the \? Maybe a better way? I'm new to Python and serial communication, so any help will be much appreciated.

(Assuming Python 2) "\x30" is a 1-byte string and the byte in question is the one at ASCII code 48:
>>> print repr('\x30')
'0'
So all you need to do to emit it is serial.write(chr(idx)); there is no need to mess with hex!
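In Python 3, str and bytes are separate types, so chr(idx) gives a text character rather than a raw byte. A minimal sketch, assuming Python 3 and pyserial (whose Serial.write expects a bytes-like object): build the single byte with bytes([idx]). The mentions list is hypothetical sample data, and the port object `ser` is an assumption, so the real write is left commented out.

```python
# Hypothetical sample data standing in for the Twitter mentions.
mentions = [{"text": "first message"}, {"text": "second message"}, {"text": "third message"}]

payloads = []
for idx, result in enumerate(mentions, start=48):
    # bytes([idx]) is a 1-byte bytes object holding the value idx,
    # so idx == 48 gives b'\x30' (which Python displays as b'0').
    payloads.append(bytes([idx]))
    # With a real pyserial port (assumed to be named `ser`):
    # ser.write(bytes([idx]))

print(payloads)  # [b'0', b'1', b'2']

# The Python 2 answer in Python 3 terms: chr(48) is the 1-character string '\x30'.
print(chr(48) == "\x30")  # True
```

Note that bytes([idx]) only accepts values from 0 to 255; a larger index would need a multi-byte encoding such as int.to_bytes.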

Related

Is there a way to strip the end of a string until a certain character is reached?

I'm working on a side project for myself and have stumbled on an issue that I'm not sure how to solve. I have a URL, for argument's sake let's say https://stackoverflow.com/xyz/abc. I'm attempting to strip the end of the URL so that I am only left with https://stackoverflow.com/xyz/.
Initially I tried to use the strip function and specify a length/position to remove up to, but realized that the other URLs I'm working with are not all the same length (i.e. URL 1 = /xyz/abc, URL 2 = /xyz/abcd).
Is there any advice for achieving this? I looked into using the regular expression operations in Python, but was unsure how to apply them to this use case. Ideally, I would like to write a function that starts from the end of the string and strips away all characters until the first '/' is reached. Any advice would be appreciated.
Thanks
Why not just use rfind, which starts from the end?
>>> string = 'https://stackoverflow.com/xyz/abc'
>>> string = string[:string.rfind('/')+1]
>>> print(string)
https://stackoverflow.com/xyz/
And if you don't want the character either (the / in this case), simply remove the +1.
Keep in mind, however, that this only works if the string actually contains the character you are looking for.
If you want to protect against this, you will have to use the following:
string = 'https://stackoverflow.com/xyz/abc'
idx = string.rfind('/')
if idx != -1:
    string = string[:idx+1]
Unless, obviously, you do want to end up with an empty string in case the character is not found.
Then the first example works just fine.
If you don't want to use regex, you can combine split() and join():
lol = 'https://stackoverflow.com/xyz/abc'
splt = lol.split('/')[:-1]
'/'.join(splt)
Output (note that the trailing / is dropped):
'https://stackoverflow.com/xyz'
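The answers above can be wrapped in a small helper. A minimal sketch using str.rpartition, which splits on the last occurrence of a separator and returns ('', '', s) when the separator is absent, so the not-found case can be detected without checking for -1. The function name strip_after_last and its keep_sep flag are hypothetical. Since the question also mentioned regular expressions, an equivalent re.sub one-liner is included.

```python
import re


def strip_after_last(s, sep="/", keep_sep=True):
    """Drop everything after the last `sep` in s (hypothetical helper).

    If `sep` does not occur in s, s is returned unchanged.
    """
    head, found, _tail = s.rpartition(sep)
    if not found:  # separator absent: rpartition returned ('', '', s)
        return s
    return head + found if keep_sep else head


url = "https://stackoverflow.com/xyz/abc"
print(strip_after_last(url))                  # https://stackoverflow.com/xyz/
print(strip_after_last(url, keep_sep=False))  # https://stackoverflow.com/xyz

# Regex alternative: delete the run of non-'/' characters at the end of the string.
print(re.sub(r"[^/]*$", "", url, count=1))    # https://stackoverflow.com/xyz/
```

Unlike the plain rfind slice, the helper leaves a string without any '/' untouched instead of returning an empty string; the regex version still returns an empty string in that case.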

Python strip only single specific characters from text/json

I'm currently trying to scrape data from a website and want to save it automatically. I want to format the data first so I can use it as CSV or similar. The JSON is:
{"counts":{"default":"27","quick_mode1":"48","quick_mode2":"13","custom":"281","quick_mode3":"0","total":369}}
My code is:
import json
x = '{"counts":{"default":"27","quick_mode1":"48","quick_mode2":"13","custom":"281","quick_mode3":"0","total":369}}'
y = json.loads(x)
print(y["total"])
But because of the {"counts": at the beginning and the corresponding } at the end, I can't just use it as a normal JSON file: the formatting breaks and it just prints an error message (json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)). When I remove the characters manually, it works again.
How can I get rid of only those 2 parts?
I think the json library should cope with this, but if it really is an issue you can't get rid of, you could remove all occurrences of the { and } characters in the string you are receiving with the .replace() method.
This requires chaining one call for every character you want to remove and is not an optimal solution, but as long as the strings you are processing are not long and/or you are not concerned about efficiency, this should do just fine.
Example:
my_var.replace('{', '').replace('}', '')
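Stripping the braces should not actually be necessary: json.loads parses the whole document, including the outer {"counts": ...} wrapper, into nested dicts. As written, print(y["total"]) would raise a KeyError rather than a JSONDecodeError, because "total" sits inside the "counts" dict; an "Expecting value: line 1 column 1 (char 0)" error usually means the string handed to json.loads was empty or not JSON at all. A minimal sketch of the nested lookup, plus a CSV export with the standard csv module since the goal was CSV; an in-memory io.StringIO stands in for the real output file.

```python
import csv
import io
import json

x = '{"counts":{"default":"27","quick_mode1":"48","quick_mode2":"13","custom":"281","quick_mode3":"0","total":369}}'
y = json.loads(x)

# "total" lives one level down, inside the "counts" dict.
counts = y["counts"]
print(counts["total"])  # 369

# Write the counts as a header row plus one row of values.
# io.StringIO stands in for a real file opened with open(..., "w", newline="").
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(counts))
writer.writeheader()
writer.writerow(counts)
print(buf.getvalue())
```

Note that most counts arrive as strings ("27") while "total" is a number (369), so convert with int() if you need arithmetic on them.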

Python: Separating string into individual letters/words to be converted into ascii-hex or ascii-dec

As the title suggests, I want to take a string, split it into individual characters to pass to something like ord(), and get a value for each character in that string. I'm still learning Python, so things like this get super confusing :P. Furthermore, the encryption process for each of the codes will just be to shift the alphabet's decimal number by a specified value and decrypt into the shifted value, plus state that value for each character. How would I go about doing this? Any and all help would be greatly appreciated!
message = input("Enter message here: ")
shift = int(input("Enter Shift....explained shift: "))
for c in message:
    a = ord(c)
    print(c, a)
This is the very basic idea of what I was doing (there was more code, but it was similar), but obviously it didn't work :C
UPDATE: It works (kind of)! Using the loop and tweaking it according to the comments, I got a list of the ASCII decimal value for every character in the string. I'll try to use Hugh Bothwell's suggestion within the loop and hopefully get some work done.
mystring = "this is a test"
shift = 3
encoded = ''.join(chr(ord(ch) + shift) for ch in mystring)
You'll have to do a little more if you want your alphabet to wrap around, i.e., encode('y') == 'b', but this should give you the gist of it.
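The question also asked for each character's value in decimal or hex, and the answer leaves the wrap-around as an exercise. A minimal sketch covering both, assuming Python 3 and assuming that only the ASCII letters a-z and A-Z should be shifted (everything else passes through unchanged). The function names char_codes and caesar_shift are hypothetical.

```python
def char_codes(text):
    """Return (character, decimal code, hex code) for every character in text."""
    return [(ch, ord(ch), format(ord(ch), "02x")) for ch in text]


def caesar_shift(text, shift):
    """Shift ASCII letters by `shift` places, wrapping around the alphabet."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            base = ord("a")
        elif "A" <= ch <= "Z":
            base = ord("A")
        else:
            out.append(ch)  # leave spaces, digits and punctuation alone
            continue
        # % 26 wraps past 'z' back to 'a' (and also handles negative shifts for decoding)
        out.append(chr((ord(ch) - base + shift) % 26 + base))
    return "".join(out)


print(char_codes("hi"))           # [('h', 104, '68'), ('i', 105, '69')]
encoded = caesar_shift("this is a test", 3)
print(encoded)                    # wklv lv d whvw
print(caesar_shift(encoded, -3))  # this is a test
```

Decoding is just shifting by the negative amount, which works because Python's % always returns a non-negative result for a positive modulus.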

How to compare unicode strings with entity ref to non-unicode string

I am evaluating hundreds of thousands of HTML files. I am looking for particular parts of the files. There can be small variations in the way the files were created.
For example, in one file I can have a section heading (after I converted it to upper case and split, then joined, the text to get rid of possibly inconsistent white space):
u'KEY1A\x97RISKFACTORS'
In another file I could have:
'KEY1ARISKFACTORS'
I am trying to create a dictionary of possible responses, and I want to compare these two and conclude that they are equal. But every substitution I try on the first string to remove the '\x97' does not seem to work.
There are a fair number of variations of keys with various representations of entities so I would really like to create a dictionary more or less automatically so I have something like:
key_dict = {u'KEY1A\x97RISKFACTORS': 'KEY1ARISKFACTORS', 'KEY1ARISKFACTORS': 'KEY1ARISKFACTORS', . . .}
I am assuming that since when I run
S1='A'
S2=u'A'
S1==S2
I get
True
I should be able to compare these once the HTML entities are handled.
What I specifically tried to do is
new_string=u'KEY1A\x97RISKFACTORS'.replace('|','')
I got an error
Sorry, I have been at this since last night. SLott pointed out something, and I see I used the wrong label. I hope this makes more sense.
You are correct that if S1='A' and S2 = u'A', then S1 == S2. Instead of assuming this though, you can do a simple test:
key_dict= {u'A':'Value1',
'A':'Value2'}
print key_dict
print u'A' == 'A'
This outputs:
{u'A': 'Value2'}
True
That resolved, let's look at:
new_string=u'KEY1A\x97DEMOGRAPHICRESPONSES'.replace('|','')
There's a problem here: \x97 is the value you're trying to replace in the target string. However, your search string is '|', which is hex value 0x7C (ASCII and Unicode) and clearly not the value you need to replace. Even if the target and search strings were both ASCII or both Unicode, you still wouldn't find the '\x97'. The second problem is that you are trying to search for a non-Unicode string in a Unicode string. The easiest solution, and the one that makes the most sense, is to simply search for u'\x97':
print u'KEY1A\x97DEMOGRAPHICRESPONSES'
print u'KEY1A\x97DEMOGRAPHICRESPONSES'.replace(u'\x97', u'')
Outputs:
KEY1A\x97DEMOGRAPHICRESPONSES
KEY1ADEMOGRAPHICRESPONSES
Why not the obvious .replace(u'\x97','')? Where does the idea of that '|' come from?
>>> s = u'KEY1A\x97DEMOGRAPHICRESPONSES'
>>> s.replace(u'\x97', '')
u'KEY1ADEMOGRAPHICRESPONSES'
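Since the underlying goal is mapping many heading variants onto one canonical key, another option is to normalize every heading the same way before looking it up, rather than enumerating variants in key_dict by hand. A minimal sketch, assuming Python 3 (where every str is Unicode, so the u'' vs '' distinction disappears) and assuming that only the letters A-Z and digits 0-9 matter for matching; html.unescape decodes entity references such as &#151; first. The function name canonical_key is hypothetical.

```python
import html
import re


def canonical_key(heading):
    """Reduce a section heading to upper-case A-Z/0-9 only (hypothetical helper)."""
    text = html.unescape(heading)  # turn entity references like &#151; into characters
    # Drop everything that is not an ASCII letter or digit: whitespace, dashes, \x97, ...
    return re.sub(r"[^A-Z0-9]", "", text.upper())


print(canonical_key("KEY1A\x97RISKFACTORS"))       # KEY1ARISKFACTORS
print(canonical_key("Key 1A&#151;Risk  Factors"))  # KEY1ARISKFACTORS
print(canonical_key("KEY1ARISKFACTORS"))          # KEY1ARISKFACTORS
```

The trade-off is that headings differing only in punctuation or spacing collapse onto the same key, which is the intent here but may be too aggressive for other sections.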

How to work with very long strings in Python?

I'm tackling Project Euler's problem 220 (it looked easy in comparison to some of the others, so I thought I'd try a higher-numbered one for a change!)
So far I have:
D = "Fa"
def iterate(D, num):
    for i in range(num):
        D = D.replace("a", "A")
        D = D.replace("b", "B")
        D = D.replace("A", "aRbFR")
        D = D.replace("B", "LFaLb")
    return D
instructions = iterate("Fa",50)
print instructions
Now, this works fine for low values, but when you make it repeat more times, you just get a MemoryError. Can anyone suggest a way to overcome this? I really want a string/file that contains instructions for the next step.
The trick is in noticing which patterns emerge as you run the string through each iteration. Try evaluating iterate(D,n) for n between 1 and 10 and see if you can spot them. Also feed the string through a function that calculates the end position and the number of steps, and look for patterns there too.
You can then use this knowledge to simplify the algorithm to something that doesn't use these strings at all.
Python strings are not going to be the answer to this one. Strings are stored as immutable arrays, so each one of those replacements creates an entirely new string in memory. Not to mention, the set of instructions after 10^12 steps will be at least 1TB in size if you store them as characters (and that's with some minor compressions).
Ideally, there should be a way to mathematically (hint, there is) generate the answer on the fly, so that you never need to store the sequence.
Just use the string as a guide to determine a method which creates your path.
If you think about how many "a" and "b" characters there are in D(0), D(1), etc, you'll see that the string gets very long very quickly. Calculate how many characters there are in D(50), and then maybe think again about where you would store that much data. I make it 4.5*10^15 characters, which is 4500 TB at one byte per char.
Come to think of it, you don't have to calculate: the problem tells you there are at least 10^12 steps, which is a terabyte of data at one byte per character, or a quarter of that if you use tricks to get down to 2 bits per character. I think this would cause problems with the one-minute time limit on any kind of storage medium I have access to :-)
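The 4.5*10^15 estimate can be checked without building D(50). Each rewrite turns every a or b into five characters that again contain exactly one a and one b, so D(n) contains 2^n of them and grows by 4 * 2^n characters per step; starting from len("Fa") = 2, this gives len(D(n)) = 4 * 2^n - 2. A small sketch (Python 3) comparing that closed form against brute-force expansion for small n; str.translate with a dict applies both replacements in a single simultaneous pass. The names RULES, expand_brute and length are hypothetical.

```python
RULES = str.maketrans({"a": "aRbFR", "b": "LFaLb"})


def expand_brute(n, seed="Fa"):
    """Build D(n) explicitly; only feasible for small n."""
    d = seed
    for _ in range(n):
        d = d.translate(RULES)  # replaces every a and b simultaneously
    return d


def length(n):
    """Closed form: len(D(n)) = 4 * 2**n - 2."""
    return 4 * 2**n - 2


for n in range(10):
    assert len(expand_brute(n)) == length(n)

print(length(50))           # 4503599627370494
print(f"{length(50):.2e}")  # 4.50e+15
```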
Since you can't materialize the string, you must generate it. If you yield the individual characters instead of returning the whole string, you might get it to work.
def repl220(string):
    for c in string:
        if c == 'a':
            yield "aRbFR"
        elif c == 'b':
            yield "LFaLb"
        else:
            yield c
Something like that will do replacement without creating a new string.
Now, of course, you need to call it recursively, and to the appropriate depth. So, each yield isn't just a yield, it's something a bit more complex.
Trying not to solve this for you, so I'll leave it at that.
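One way to make such a generator recursive, as a minimal sketch (Python 3, using yield from): each a or b is expanded lazily to the remaining depth, so D(50) is never held in memory and you can stream, say, its first few characters. This only removes the memory problem; walking all 10^12+ steps one character at a time is still far too slow, which is why the other answers point toward spotting the pattern instead. The names RULES and expand are hypothetical.

```python
from itertools import islice

RULES = {"a": "aRbFR", "b": "LFaLb"}


def expand(s, depth):
    """Lazily yield the characters of s after `depth` rewriting steps."""
    for c in s:
        if depth > 0 and c in RULES:
            # Recurse one level down instead of building the replacement string.
            yield from expand(RULES[c], depth - 1)
        else:
            yield c


print("".join(expand("Fa", 2)))               # FaRbFRRLFaLbFR
print("".join(islice(expand("Fa", 50), 10)))  # first 10 characters of D(50)
```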
Just as a word of warning, be careful when using the replace() function. If your strings are very large (in my case ~5e6 chars), the replace function would return a subset of the string (~4e6 chars) without throwing any errors.
You could treat D as a byte stream file.
Something like:
seedfile = open('D1.txt', 'w')
seedfile.write("Fa")
seedfile.close()
n = 0
while (n
Warning: totally untested.
