Compare format string to actual string in Python

I have the following format, stored as a string variable:
format = '$<amount> has been received from <user>'
How would I check if another string fits the format, for example:
message = '$50 has been received from Hugh'
I'd like to check that the message exactly fits the format, and save the data (in this case 50 and Hugh) in two separate variables.
I checked RegEx on a few websites such as W3Schools and PyPI, but couldn't find anything that fits what I'm trying to do.

import re
message = '$50 has been received from Hugh'
regex = re.compile(r'\$(\d+) has been received from (\w+)')
match = regex.fullmatch(message)  # None unless the whole message fits the format
if match:
    amount, user = match.groups()
    print(amount, user)  # 50 Hugh
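If the format itself is only known at run time (the question keeps it in a variable), the regex can be generated from it instead of being written by hand. A minimal sketch, assuming every placeholder looks like <name> with a word-character name, each name appears only once, and each value is non-empty; the helper names format_to_regex and parse_message are made up for illustration:

```python
import re


def format_to_regex(fmt):
    """Compile a format like '$<amount> has been received from <user>' into a regex."""
    # With a capturing group, re.split alternates literal text and placeholder names:
    # ['$', 'amount', ' has been received from ', 'user', '']
    parts = re.split(r'<(\w+)>', fmt)
    pattern = ''
    for i, part in enumerate(parts):
        if i % 2 == 0:
            # Literal text: escape it so '$' is matched as a character, not an anchor
            pattern += re.escape(part)
        else:
            # Placeholder: a non-greedy named group
            pattern += '(?P<%s>.+?)' % part
    return re.compile(pattern)


def parse_message(fmt, message):
    """Return a dict of placeholder values, or None if message does not fit fmt exactly."""
    match = format_to_regex(fmt).fullmatch(message)
    return match.groupdict() if match else None


fmt = '$<amount> has been received from <user>'
print(parse_message(fmt, '$50 has been received from Hugh'))  # {'amount': '50', 'user': 'Hugh'}
print(parse_message(fmt, '50 was sent to Hugh'))  # None
```

Using fullmatch rather than findall or search is what enforces "exactly fits": any extra text before or after the format makes the match fail. The groups accept any characters, so add your own checks (for example that amount is all digits) if you need them.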

Related

Overflow error when reading json file

I am trying to read a JSON file that contains a number of tweets, but I get the following error:
OverflowError: int too large to convert
The script filters multiple JSON files to get specific tweets, and it crashes when it reaches one specific file.
The line that causes the error is this one:
df_temp = pd.read_json(path_or_buf=json_path, lines=True)
Here is the error in the cmd
Just store the user id as a string, and treat it as one (this is actually what you should do with this kind of id). If you can't change the JSON input format, you can always read it as plain text before parsing it as JSON, and add quotes around the id value, for instance with regular expressions (see Regex in python).
I don't know which library you are using to parse the JSON, but a conversion may also work: either ask the parser for the value as a string (a "getString"-style accessor) instead of as an integer ("getInt"), if your library offers one, or convert it explicitly with something like x = str(value).
Note that Python will not do this implicitly: it is strongly typed, so "" + 123 raises a TypeError.
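The "parse the id as a string" idea can be done without regexes: Python's json module has a parse_int hook that receives the digits of every JSON integer, and it has no size limit of its own. A minimal sketch, assuming the file is JSON Lines (one object per line, as lines=True implies); read_json_lines is a hypothetical helper name:

```python
import json


def read_json_lines(lines):
    """Parse JSON Lines, keeping every integer as its original digit string."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # parse_int is called with the digit string of each JSON integer;
        # passing str keeps large ids exactly as they appear in the file.
        records.append(json.loads(line, parse_int=str))
    return records


sample = ['{"id": 123456789012345678901234567890, "text": "hello"}\n']
print(read_json_lines(sample)[0]['id'])  # 123456789012345678901234567890
```

With an open file f you could then build the frame with pd.DataFrame(read_json_lines(f)). Be aware that parse_int applies to every integer field, not only the ids, so counts and similar columns also come back as strings; convert those with pd.to_numeric where needed.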

Extract information in a line of text with a format from user input

I am trying to write a program that takes song files and a format as input, and writes metatags into each file. Here are a few examples of the call:
./parser '%n_-_%t.mp3' 01_-_Respect.mp3 gives me track=01; title=Respect
./parser '%b._%n.%t.mp3' The_Queen_of_Soul._01.Respect.mp3 gives me album=The_Queen_of_Soul; track=01; title=Respect
./parser '%a-%b._%n.%t.mp3' Aretha_Franklin-The_Queen_of_Soul._01.Respect.mp3 gives me artist=Aretha_Franklin; track=01; title=Respect
./parser '%a_-_%b_-_%n_-_%t.mp3' Aretha_Franklin_-_The_Queen_of_Soul_-_01_-_Respect.mp3 gives me artist=Aretha_Franklin; track=01; title=Respect
For a call on the file 01_-_Respect.mp3, I'd like to have a variable containing 01, and the other Respect.
Here %n and %t represent, respectively, the number and the title of the song. The problem is that I don't know how to extract this information in bash (or possibly in Python).
My biggest problem is that I don't know the format in advance!
Note: there are more placeholders than these, for example %b for the album, %a for the artist, etc.
Well, you can use the string method split to split the string on "_-_",
and to take input from the command line, you can use sys.argv.
Here's an example:
import sys
number,title = sys.argv[1].split("_-_")
Update:
Surely you can pass the separator as the first argument and the file name as the second argument, like this:
import sys
separator = sys.argv[1]  # e.g. "_-_"
number, title = sys.argv[2].split(separator)
Now if you need more complex and dynamic processing, regex is your winning card!
To write a good regex, you have to understand your data and your problem, or you'll end up with a glitchy one.
Here is a very simple example that you can build on:
import re
# Raw string avoids invalid-escape warnings; \d+ accepts any track number, not just 00-11
p = re.compile(r'(\d+)_-_(.*)\.mp3')
title = '01_-_Respect.mp3'
print(p.findall(title))
Output
[('01', 'Respect')]
I use this page to play with regex.
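To handle a format that is only known at run time, which is what the question asks for, the regex can be built from the format string itself. A minimal sketch, assuming the placeholders are exactly %a, %b, %n and %t (the mapping to tag names is my own choice) and that neighbouring fields are always separated by some literal text; format_to_regex and parse_filename are hypothetical names:

```python
import re

# Hypothetical mapping from placeholders to tag names; extend it as needed.
PLACEHOLDERS = {'%a': 'artist', '%b': 'album', '%n': 'track', '%t': 'title'}


def format_to_regex(fmt):
    """Turn a format such as '%n_-_%t.mp3' into a compiled regex with named groups."""
    pattern = ''
    # re.split with a capturing group keeps the placeholders in the result list
    for part in re.split(r'(%[a-z])', fmt):
        if part in PLACEHOLDERS:
            # Non-greedy, so each field stops at the next literal separator
            pattern += '(?P<%s>.+?)' % PLACEHOLDERS[part]
        else:
            # Literal text such as '_-_' or '.mp3' must match exactly
            pattern += re.escape(part)
    return re.compile(pattern)


def parse_filename(fmt, filename):
    """Return a dict of tags, or None if filename does not fit fmt."""
    match = format_to_regex(fmt).fullmatch(filename)
    return match.groupdict() if match else None


print(parse_filename('%n_-_%t.mp3', '01_-_Respect.mp3'))
# {'track': '01', 'title': 'Respect'}
print(parse_filename('%a-%b._%n.%t.mp3', 'Aretha_Franklin-The_Queen_of_Soul._01.Respect.mp3'))
```

From the command line this would be parse_filename(sys.argv[1], sys.argv[2]). Because each field is non-greedy, a field that itself contains its separator (for example an artist name with "-" in it when the format uses "%a-") is cut at the first occurrence, so ambiguous formats need a stricter pattern per field.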
Update
Since the format is given, you could also use string slicing, although this is limited to this specific case:
>>> number = title[:title.find('_')]
>>> number
'01'
>>> track = title[len(number) + 3:-4]  # skip "_-_" and drop ".mp3"
>>> track
'Respect'
Try this code
(assuming the file name is passed as the first argument):
tmp=$1
num=${tmp%%_*}
title=$(echo "${tmp##*_}" | cut -d. -f1)
The variables num and title will then hold the two parts of the file name.

Python splitting values from urllib in string

I'm trying to get IP location and other stuff from ipinfodb.com, but I'm stuck.
I want to split all of the values into new strings that I can format how I want later. What I wrote so far is:
resp = urllib2.urlopen('http://api.ipinfodb.com/v3/ip-city/?key=mykey&ip=someip').read()
out = resp.replace(";", " ")
print out
Before I replaced the semicolons, the output was:
OK;;someip;somecountry;somecountrycode;somecity;somecity;-;42.1975;23.3342;+05:00
After the replacement it shows:
OK someip somecountry somecountrycode somecity somecity - 42.1975;23.3342 +05:00
But this isn't very useful, because I want the values in separate strings rather than one. Right now I print out and get the line above; instead I want to do something like print country, print city and get the country, the city, etc. I checked their site and there is a class for this, but it's for a different API version (v2; mine is v3), so I can't use it. Does anyone have an idea how to do that?
P.S. Sorry if the answer is obvious or I'm mistaken; I'm new to Python.
You need to split the resp text on ';':
out = resp.split(';')
Now out is a list of values instead; use indexes to access the individual items:
print 'Country: {}'.format(out[3])
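Building on the split, the values can be given names, so that printing the country or the city no longer depends on remembering indexes. A minimal sketch; the field names are hypothetical, inferred from the order of the sample response in the question, so check them against the API documentation before relying on them:

```python
# Hypothetical field names, in the order of the sample response in the question.
FIELDS = ['status', 'message', 'ip', 'country', 'country_code',
          'region', 'city', 'zip_code', 'latitude', 'longitude', 'time_zone']


def parse_response(resp):
    """Split a semicolon-separated response into a dict keyed by FIELDS."""
    values = resp.strip().split(';')
    if len(values) != len(FIELDS):
        raise ValueError('unexpected number of fields: %d' % len(values))
    return dict(zip(FIELDS, values))


sample = 'OK;;someip;somecountry;somecountrycode;somecity;somecity;-;42.1975;23.3342;+05:00'
info = parse_response(sample)
print(info['country'])  # somecountry
print(info['city'])  # somecity
```

The length check makes a change in the response layout fail loudly instead of silently shifting every value into the wrong name.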
Alternatively, add format=json to your query string and receive a JSON response from that API:
import json
resp = urllib2.urlopen('http://api.ipinfodb.com/v3/ip-city/?format=json&key=mykey&ip=someip')
data = json.load(resp)
print data['countryName']

Asking for corrections on parsed text before processing

I have written a parser in Python that takes a tracklist of played songs in (for example) a podcast, and formats the tracks correctly for scrobbling to the last.fm website.
Because some tracklists feature odd tracks or sometimes tracks may be parsed incorrectly I wish to ask a user to correct the parsed input. I know of the raw_input() function, but that doesn't let me print a default text (like the complete parsed tracklist), meaning users would have to copy/paste the entire list before correcting.
Is there a way to print a 'suggestion' to use in the raw_input()?
Not sure if this is exactly what you're trying to do, but if you want to get line-by-line input and have a default value, this is what I did for a similar problem:
def get_input(prompt, default):
    result = raw_input('%s [%s]:' % (prompt, default))
    # An empty answer (just pressing Enter) keeps the default
    result = result or default
    return result
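If you want the parsed text to appear already typed on the input line, so the user can edit it in place instead of retyping it, the standard readline module can pre-fill the buffer. A minimal sketch, assuming a Unix-like system (readline is not available in the stock Windows Python) and Python 3's input(); on Python 2 use raw_input() instead. The name input_with_default is made up:

```python
import readline  # GNU readline or libedit; not available on stock Windows Python


def input_with_default(prompt, default):
    """Read a line with `default` pre-filled as editable text."""
    # The startup hook runs just before readline reads the line,
    # and insert_text() places the default into the edit buffer.
    readline.set_startup_hook(lambda: readline.insert_text(default))
    try:
        return input(prompt)
    finally:
        # Remove the hook so later input() calls start with an empty line.
        readline.set_startup_hook()
```

Asking once per track, for example [input_with_default('Track: ', t) for t in tracks] with tracks being your parsed list, keeps each edit to a single line, which fits readline better than one huge pre-filled tracklist.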

Applying Regular Expression To An Instance - From Email

I'm using the imaplib module to log into my gmail account and retrieve emails.
This gives me a lot of information, as well as the to/from/subject/body text. According to type(msg), the object returned is an instance.
My regex won't work when I apply it to the msg object, as it expects a string, and msg is an instance.
Here is an example regex to identify the time, which works fine when I just give it a string:
match = re.search(r"Time:\s(([0-2]\d):([0-5]\d))", text) # validates hour and minute in a 24 hour clock
So three questions really:
1.) am I going about this the right way or is there a better way to do it?
2.) how can I apply my regex to this 'instance' information so I can identify the date/time etc.
3.) how can I just retrieve the email body?
import email
result, data = mail.fetch(latest_email_id, "(RFC822)")
raw_email = data[0][1]
msg = email.message_from_string(raw_email)
msg.get_payload()
Thank you again
I think that this problem might be really close to another question I had answered:
payload of an email in string format, python
The main problem for the other person was that get_payload() can return a list of message parts (for multipart messages), which you have to check for. It's not always just a string.
Here is the snippet from the other question about how to handle the object you get from get_payload():
if isinstance(payload, list):
    # multipart message: payload is a list of message parts
    for m in payload:
        print str(m).split()
else:
    print str(payload).split()
Also, you can review the actual extended conversation that I had with the OP of that question here: https://chat.stackoverflow.com/rooms/5963/discussion-between-jdi-and-puneet
It turns out that the body of the email can be accessed via payload[0], since payload is a list while msg is an instance. I then just converted it to a string:
payload = msg.get_payload()
body = payload[0]
str_body = str(body)
Thanks for your help again
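payload[0] only works when the message is multipart and its first part is the body, and str() on a part includes that part's headers. A more targeted approach is to walk the message, pick the text/plain part, decode it, and run the regex on the resulting string. A minimal sketch; get_text_body is a made-up helper and the message below is invented test data standing in for data[0][1]:

```python
import email
import re


def get_text_body(msg):
    """Return the first text/plain part of a message as str, or None if there is none."""
    # walk() yields the message itself first, then every sub-part of a multipart message
    for part in msg.walk():
        if part.get_content_type() == 'text/plain' and part.get_filename() is None:
            # decode=True undoes base64/quoted-printable and returns bytes
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or 'utf-8'
            return payload.decode(charset, errors='replace')
    return None


# Invented multipart message standing in for the raw email fetched with imaplib
raw_email = (
    'From: alice@example.com\n'
    'Subject: test\n'
    'MIME-Version: 1.0\n'
    'Content-Type: multipart/alternative; boundary="XYZ"\n'
    '\n'
    '--XYZ\n'
    'Content-Type: text/plain; charset=utf-8\n'
    '\n'
    'Time: 14:30\n'
    '--XYZ\n'
    'Content-Type: text/html; charset=utf-8\n'
    '\n'
    '<p>Time: 14:30</p>\n'
    '--XYZ--\n'
)

msg = email.message_from_string(raw_email)
body = get_text_body(msg)
match = re.search(r"Time:\s(([0-2]\d):([0-5]\d))", body)
print(match.group(1))  # 14:30
```

With Python 3's imaplib, data[0][1] is bytes, so parse it with email.message_from_bytes(raw_email) instead of message_from_string.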
