I am using a simple '.replace()' call on a string to replace some text with nothing, like so:
.replace("('ws-stage-stat', ", '')
I have also tried using a regex to do this, like so:
match3a = re.sub("\(\'ws-stage-stat\', ", "", match3a)
This string is extracted from the source code for the following webpage at line 684:
http://www.whoscored.com/Regions/252/Tournaments/26
I have extracted and cleaned up the rest of the code into some usable data, but this one last bit won't co-operate and stubbornly refuses to be replaced. This seems like a very straightforward problem, but it just won't work for me.
Any ideas?
Thanks
The first replacement should work. Make sure that you're assigning the result of the replacement somewhere, for example:
mystring = mystring.replace("('ws-stage-stat', ", '')
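To illustrate why the assignment matters, here is a minimal sketch. The sample string is made up for illustration and is not taken from the page. Strings are immutable, so replace() returns a new string and leaves the original untouched:

```python
# Hypothetical sample; the real value comes from the scraped page source.
raw = "('ws-stage-stat', 'some value')"

# The result of this call is thrown away, so raw does not change.
raw.replace("('ws-stage-stat', ", "")
unchanged = raw

# Keeping the return value is what actually gives you the cleaned string.
cleaned = raw.replace("('ws-stage-stat', ", "")

print(unchanged)
print(cleaned)
```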
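I think the escaping in your regex may be the problem. In a normal (non-raw) string literal it is safer to double the backslashes or to use a raw string.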
This is code my "Patterns" app spit out:
re.sub("\\(\\'ws-stage-stat\\', ", "", match3a)
A quick test showed me that it works correctly.
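If you would rather not escape by hand, one option is re.escape, which escapes a literal string for use as a pattern. A small sketch, assuming a made-up sample value for match3a:

```python
import re

# Hypothetical sample; the real string comes from the page source.
match3a = "('ws-stage-stat', 'some value')"

# re.escape backslash-escapes regex metacharacters such as "(".
pattern = re.escape("('ws-stage-stat', ")
result = re.sub(pattern, "", match3a)

# A hand-written raw string matches the same literal text.
result_raw = re.sub(r"\('ws-stage-stat', ", "", match3a)

print(result)
```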
I am parsing a bunch of HTML and am encountering a lot of "\n" and "\t" inside the code. So I am using
"something\t\n here".replace("\t","").replace("\n","")
This works, but I'm using it often. Is there a way to define a string function, along the lines of replace itself (or find, index, format, etc.), that will prettify my code a little, something like
"something\t\n here".noTabsOrNewlines()
I tried
class str:
    def noTabNewline(self):
        self.replace("\t","").replace("\n","")
but that was no good. Thanks for any help.
While you could do something along these lines (https://stackoverflow.com/a/4698550/1867876), the more Pythonic thing to do would be:
myString = "something\t\n here"
' '.join(myString.split())
You can see this thread for more information:
Strip spaces/tabs/newlines - python
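If the goal is mainly a reusable name, a plain helper function is one simple option, since new methods cannot be attached to the built-in str type. The sketch below uses hypothetical function names; note that the two helpers are not equivalent, because the split/join version also collapses runs of ordinary spaces:

```python
def no_tabs_or_newlines(text):
    """Remove tab and newline characters, leaving everything else alone."""
    return text.replace("\t", "").replace("\n", "")


def collapse_whitespace(text):
    """Collapse every run of whitespace into a single space."""
    return " ".join(text.split())


sample = "something\t\n here"
print(no_tabs_or_newlines(sample))
print(collapse_whitespace(sample))
```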
You can try encoding='utf-8'. Otherwise, in my opinion, there is no way other than replacing it. Python also shows non-breaking spaces as '\xa0', so either way you have to replace them. Or you can read the file line by line with readline() instead of reading it all at once with read().
I am new to programming and have already checked other people's questions to make sure that I am using a good method to replace tabs with spaces, that my regex is correct, and that I understand what exactly my error is ("unhashable type: 'list'"). But even so, I'm at a loss as to what to do. Any help would be great!
I have a large file that I have broken up into lines. Ultimately I will need to access the first 3 elements of each line. Currently when I print a line, without the additional re.sub line of code, I get something like this: ['blah\tblah\tblah'], when I want ['blah blah blah'].
My code to do this is
f = open("text.txt")
raw = f.read()
raw = raw.lower()
lines = raw.splitlines()
lines = re.sub(r'\t', lines, '\s')
print(lines[0:2]) #just to see the first few examples
f.close()
When I print the first few lines without the regex sub bit, it works fine. Then, when I add that line in an attempt to change the lines, I get the error. I understand that lists are mutable and thus can't be hashed... but I'm not trying to work with a hash. I'm just trying to replace \t with \s in a large text file to make the program easier to work with. I don't think there is a problem with how I am changing \t's to \s's, because according to this error, any way I change it will break my code. What do I do?! Any help is super appreciated. :')
You need to change the order of the arguments passed to re.sub: the signature is re.sub(pattern, replacement, string). Also note that \s is a regex character class, so it cannot be used as the replacement; use a literal space ' ' instead.
lines = raw.splitlines()
lines = [re.sub(r'\t', ' ', line) for line in lines]
raw.splitlines() returns a list, which is then assigned to the variable lines. re.sub works on a single string, not on a list, so you need to apply it to each item in the list.
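Since the pattern here is a single literal character, str.replace would also do the job without a regex. A sketch using a made-up in-memory sample instead of the file, which also picks out the first three elements of each line:

```python
# Hypothetical sample standing in for the contents of the text file.
raw = "Blah\tblah\tBLAH\tmore\nfoo\tbar\tbaz\tqux"

lines = raw.lower().splitlines()

# Replace each tab with a space, one line at a time.
lines = [line.replace("\t", " ") for line in lines]

# First three whitespace-separated elements of each line.
first_three = [line.split()[:3] for line in lines]

print(lines[0:2])
print(first_three)
```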
I have an html file that I am reading the below line from. I would like to grab only the number that appears after the ':' and before the ',' using REGEX... THANKS IN ADVANCE
"totalPages":15,"bloodhoundHtml"
"totalPages":([0-9]*),
You can see the Demo here
Then the python code is
import re
# Capture the digits between "totalPages": and the following comma
p = re.compile(r'"totalPages":([0-9]*),')
print(p.findall('"totalPages":15,"bloodhoundHtml"'))
You can try :\d+, to get ':15,'
and then trim the leading ':' and the trailing ',' to get just the number.
I don't know whether Python supports named groups in a regex. I'm a C# programmer, and in C# I can use :(?<id>\d+), to match this string and get the number directly via result.Groups["id"].
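For what it's worth, Python's re module does support named groups, written (?P<name>...). A minimal sketch on the sample line from the question, where the group name pages is just an illustrative choice:

```python
import re

line = '"totalPages":15,"bloodhoundHtml"'

# (?P<pages>\d+) captures the digits under the name "pages".
match = re.search(r'"totalPages":(?P<pages>\d+),', line)
if match:
    total_pages = int(match.group("pages"))
    print(total_pages)
```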
:\d{1,},
Also works for parsing the line you gave. According to this post, you might run into some trouble parsing the HTML
I'm making an app that uses the Google Places API.
This is the code snippet where I'm building a string for types parameter in the URL.
url = "https://maps.googleapis.com/maps/api/place/nearbysearch/json?"
#required params
requiredparams = "location="+str(lat)+","+str(lon)+"&radius="+str(radius)+"&sensor=true&rankby=distance&types="
place_types = "bakery|bar|beauty_salon|book_store|bowling_alley|cafe|car_dealer|car_rental|car_wash|car_repair|\
    clothing_store|convenience_store|department_store|electronics_store|florist|food|furniture_store|\
    grocery_or_supermarket|gym|hair_care|hardware_store|health|home_goods_store|jewelry_store|laundry|liquor_store|\
    locksmith|meal_delivery|meal_takeaway|night_club|moving_company|pet_store|pharmacy|plumber|restaurant|shoe_store|\
    shopping_mall|spa|store|taxi_stand|travel_agency"
When I print it (url+requiredparams+place_types), I'm getting gaps before words that start on a new line.
https://maps.googleapis.com/maps/api/place/nearbysearch/json?location=10,13&radius=500&sensor=true&rankby=distance&types=bakery|bar|beauty_salon|book_store|bowling_alley|cafe|car_dealer|car_rental|car_wash|car_repair| clothing_store|convenience_store|department_store|electronics_store|florist|food|furniture_store| grocery_or_supermarket|gym|hair_care|hardware_store|health|home_goods_store|jewelry_store|laundry|liquor_store| locksmith|meal_delivery|meal_takeaway|night_club|moving_company|pet_store|pharmacy|plumber|restaurant|shoe_store| shopping_mall|spa|store|taxi_stand|travel_agency
I don't get it. What am I doing wrong?
I tried this on the console:
>>> d = "word1|\
... word2|\
... word3"
>>> d
'word1|word2|word3'
That works fine. Why not my code snippet?
It is because of the indentation inside the string: the leading whitespace on each continuation line becomes part of the string.
Your snippet fixed:
pt = "bakery|bar|beauty_salon|book_store|bowling_alley|cafe|car_dealer|car_rental|car_wash|car_repair|\
clothing_store|convenience_store|department_store|electronics_store|florist|food|furniture_store|\
grocery_or_supermarket|gym|hair_care|hardware_store|health|home_goods_store|jewelry_store|laundry|liquor_store|\
locksmith|meal_delivery|meal_takeaway|night_club|moving_company|pet_store|pharmacy|plumber|restaurant|shoe_store|\
shopping_mall|spa|store|taxi_stand|travel_agency"
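or use a triple-quoted string """ (note that it keeps the newline characters, so they have to be removed before the string goes into a URL):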
pt = """bakery|bar|beauty_salon|book_store|bowling_alley|cafe|car_dealer|car_rental|car_wash|car_repair|
clothing_store|convenience_store|department_store|electronics_store|florist|food|furniture_store|
grocery_or_supermarket|gym|hair_care|hardware_store|health|home_goods_store|jewelry_store|laundry|liquor_store|
locksmith|meal_delivery|meal_takeaway|night_club|moving_company|pet_store|pharmacy|plumber|restaurant|shoe_store|
shopping_mall|spa|store|taxi_stand|travel_agency"""
You could do something like this:
place_types = (
    "bakery|bar|beauty_salon|book_store|bowling_alley|cafe|car_dealer|car_rental|car_wash|car_repair|"
    "clothing_store|convenience_store|department_store|electronics_store|florist|food|furniture_store|"
    "grocery_or_supermarket|gym|hair_care|hardware_store|health|home_goods_store|jewelry_store|laundry|liquor_store|"
    "locksmith|meal_delivery|meal_takeaway|night_club|moving_company|pet_store|pharmacy|plumber|restaurant|shoe_store|"
    "shopping_mall|spa|store|taxi_stand|travel_agency"
)
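Another way to sidestep the whitespace problem entirely, offered only as a sketch, is to keep the types in a list and join them with "|". The list below is deliberately shortened, and the name place_type_list is made up for illustration:

```python
# Shortened, illustrative list; the real one would hold every place type.
place_type_list = [
    "bakery",
    "bar",
    "beauty_salon",
    "book_store",
]

# Indentation of the list items never ends up inside the resulting string.
place_types = "|".join(place_type_list)
print(place_types)
```

A list is also easier to edit later, since adding or removing a type does not involve touching line continuations.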
Because, to Python, your string looks like "bakery|bar|beauty_salon|book_store|bowling_alley|cafe|car_dealer|car_rental|car_wash|car_repair| clothing_store|convenience_store|department_store|electronics_store|florist|food|furniture_store| ..." (without any line break): the backslash removes the newline, but the indentation at the start of the next line stays inside the string. Not indenting the continuation lines, or concatenating separate string literals, should solve the issue.
Try using only one newline between lines, looks like you have one extra (on my phone at least). Avoid extra white space in general after the backslash + newline.
So suppose I have a text file of the following contents:
Hello what is up. ^M
^M
What are you doing?
I want to remove the ^M and replace it with the line that follows. So my output would look like:
Hello what is up. What are you doing?
How do I do the above in Python? Or if there's any way to do this with unix commands then please let me know.
''.join(somestring.split('\r'))
or
somestring.replace('\r', '')
This assumes you have carriage return characters in your string, and not the literal "^M". Note that '\r' must not be written as a raw string: r'\r' is a backslash followed by the letter r, not a carriage return. If it is the literal string "^M", then substitute "^M" for '\r'.
If you want the newlines gone as well, then use '\r\n'.
This is very basic string manipulation in Python, and it is probably worth looking at some basic tutorials: http://mihirknows.blogspot.com.au/2008/05/string-manipulation-in-python.html
And as the first commenter said, it's always helpful to give some indication of what you have tried so far, and what you don't understand about the problem, rather than asking for a straight answer.
Try:
>>> mystring = mystring.replace("\r", "").replace("\n", "")
(where "mystring" contains your text)
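As a quick check of that approach, here is a small sketch. It assumes the ^M shown by the editor is a carriage return character and uses an in-memory sample in place of the file:

```python
# Sample text with Windows-style line endings (carriage return + newline).
text = "Hello what is up. \r\n\r\nWhat are you doing?"

# Strip carriage returns and newlines so everything ends up on one line.
joined = text.replace("\r", "").replace("\n", "")

# splitlines() understands \r\n as well, so joining its pieces is equivalent here.
joined_alt = "".join(text.splitlines())

print(joined)
```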
Use replace:
x = 'Hello what is up. ^M\
^M\
What are you doing?'
print(x.replace('^M', ''))  # the second argument is what you want to replace it with