Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 months ago.
I built a service that records data from a train company. Over the past few months, they have changed how train station names are stored. For instance, both 'Saint-Charles' and 'Saint - Charles' now appear, so queries against my database return incomplete results.
Is there a quick (and safe) way to unify the two spellings? I would like to change 'Saint - Charles' to 'Saint-Charles', but I don't know how to do it safely. I also have locations such as 'Saint James', so I don't want a rule that simply replaces every space in the name.
Maybe a regular expression would help, but I am not familiar with them.
I use Python for my service.
Thank you for your help.
Regards,
my_str = "Saint - Charles"
converted_string = "-".join([substring.strip() for substring in my_str.split("-")])
print(converted_string)
Saint-Charles
This works by splitting the original string on "-", calling .strip() on each substring to trim whitespace from both ends, and then joining the substrings back together with "-". The net effect is that any spaces to the left and right of each "-" are removed.
Strings without "-" like "Saint James" will be unaffected.
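Since the question mentions regular expressions, here is a minimal sketch of the same normalization with re.sub. The function name normalize_hyphens is made up for illustration; the pattern simply removes any whitespace around a hyphen and leaves everything else alone.

```python
import re

def normalize_hyphens(name):
    # Collapse optional whitespace on either side of a hyphen into a bare "-".
    # Names without a hyphen (e.g. "Saint James") are returned unchanged.
    return re.sub(r"\s*-\s*", "-", name)

print(normalize_hyphens("Saint - Charles"))
print(normalize_hyphens("Saint James"))
```

Whether to run this once over the stored names or on every incoming record depends on how the service is set up.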
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I need to write Python code that produces a list of tuples (searched word, list of occurrences).
The searched words are listed in a thesaurus and need to be looked up in a series of documents in a corpus.
Any suggestions or guidance?
After you read the file, you can simply split on whitespace to get a list of words. This, however, leaves punctuation attached to the words. To remove it, take the punctuation characters from the punctuation attribute of the string module and replace each occurrence in the word list with the empty string, "". Your words might also contain special symbols such as "/" to represent "or"; in that case you would need regular expressions to extract the words.
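A rough sketch of that approach follows. The function name find_occurrences, the dict-based corpus (document name mapped to text) and the choice to report occurrences as (document name, word index) pairs are all assumptions made for illustration; the question does not say how occurrences should be represented.

```python
import string

def find_occurrences(thesaurus, corpus):
    """Return [(word, [(doc_name, word_index), ...]), ...] for each thesaurus word."""
    # Translation table that deletes every character in string.punctuation.
    strip_punct = str.maketrans("", "", string.punctuation)
    # Tokenize each document once: lowercase, drop punctuation, split on whitespace.
    tokenized = {
        name: text.lower().translate(strip_punct).split()
        for name, text in corpus.items()
    }
    results = []
    for word in thesaurus:
        target = word.lower()
        hits = [
            (name, index)
            for name, tokens in tokenized.items()
            for index, token in enumerate(tokens)
            if token == target
        ]
        results.append((word, hits))
    return results

corpus = {"a.txt": "The cat sat. The dog, too!", "b.txt": "Dog days."}
print(find_occurrences(["dog", "bird"], corpus))
```

Note that deleting punctuation this way also turns "and/or" into "andor", which is the case where regular expressions become the better tool.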
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I have a string. For example :
"This is a string.Is this a question?What is the Question? I Dont know what the question is. Can you please list out the question?"
I want to extract the questions from this text using a regex.
What I tried:
re.findall(r'(how|can|what|where|describe|who|when)(.*?)\s*\?', message, re.I | re.M)
But it also returns other things, and when it does find a question it separates the question word (how, what, which, etc.) from the rest of the question.
For the above example my output is:
[('is', ' is a string.Is this a question'), ('What', ' is the Question'), ('what', ' the question is. Can you please list out the question')]
Whereas I want each question returned as a single string.
It's totally impractical to search for key words when determining whether a sentence is a question. Given your list: how|can|what|where|describe|who|when, I can easily write sentences containing one of those words, which are not questions!
There are many ways you could tackle matching a sentence. For example, taking this as a baseline:
^\s*[A-Za-z,;'"\s]+[.?!]$
We could first alter it to match multiple sentences in the same string:
(^|(?<=[.?!]))\s*[A-Za-z,;'"\s]+[.?!]
This uses a look-behind to ensure that a sentence has just finished (unless we're at the start of the string).
And then adjust it to match only sentences which end with ?:
(^|(?<=[.?!]))\s*[A-Za-z,;'"\s]+\?
Here is an online demo of my regex, on your original string.
To keep the entire question together, just enclose the whole pattern in parentheses.
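As a quick way to try the look-behind pattern in Python, here is a small sketch run against the string from the question. It swaps the capturing group for a non-capturing one so that re.finditer yields whole matches, and strips the leading whitespace that \s* pulls into each match; the helper name extract_questions is made up for illustration.

```python
import re

# Same pattern as above, with a non-capturing group around the start/look-behind test.
QUESTION_RE = re.compile(r'''(?:^|(?<=[.?!]))\s*[A-Za-z,;'"\s]+\?''')

def extract_questions(text):
    # group(0) is the full sentence; strip() drops whitespace matched by \s*.
    return [match.group(0).strip() for match in QUESTION_RE.finditer(text)]

message = ("This is a string.Is this a question?What is the Question? "
           "I Dont know what the question is. Can you please list out the question?")
print(extract_questions(message))
```

Because the character class only allows letters, a few punctuation marks and whitespace, sentences containing digits or other symbols would not match without extending it.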
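Here is another, simplified version (the character class also excludes "?" so that two adjacent questions are not merged into one match):
\b([A-Z][^.?!]*\?)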
Thank you for helping me out.
The answer was provided by @Fredrik and can be found here: https://regex101.com/r/rT1mQ0/2
\s*([^.?]*(?:how|can|what|where|describe|who|when)[^.?]*?\s*\?)
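For reference, this is roughly how that pattern could be used from Python. It is only a sketch: the function name keyword_questions is hypothetical, and re.I is assumed because the original attempt used it. Since the pattern has a single capturing group, re.findall returns each question as one string.

```python
import re

KEYWORD_QUESTION_RE = re.compile(
    r"\s*([^.?]*(?:how|can|what|where|describe|who|when)[^.?]*?\s*\?)",
    re.I,
)

def keyword_questions(text):
    # One capturing group, so findall returns a flat list of whole questions.
    return KEYWORD_QUESTION_RE.findall(text)

message = ("This is a string.Is this a question?What is the Question? "
           "I Dont know what the question is. Can you please list out the question?")
print(keyword_questions(message))
```

Questions that contain none of the listed keywords, such as "Is this a question?", are skipped by this pattern.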
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I am reading in information from a text file which has been ripped from a pdf, so everything is a mess.
Some example variables (columns) that I'm trying to separate include date, action type and summary.
For the date, the format is DD/MM/YY, so I know that the first field will always be an int. However, whenever I test the file (using type(xyz)), everything is reported as a str.
How do I get python to recognize what is, and what is not, a str vs. int vs. double... etc.?
Short answer: use regular expressions and recast the string sections.
Long answer: all of this is coming from a text file, so everything is a string. The date 23/10/90 isn't stored in a .txt file as a numerical value; it's a sequence of character codes. Depending on exactly what you are trying to get out of that file, your best bet is to extract the data you want with a regex and then convert it. So, for dates, try int(dayString), int(monthString), and so on.
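A minimal sketch of that idea for the DD/MM/YY dates, assuming the date can appear anywhere in a line. The function name parse_date and the sample line are made up for illustration.

```python
import re

# Three two-digit groups separated by slashes, e.g. 23/10/90.
DATE_RE = re.compile(r"\b(\d{2})/(\d{2})/(\d{2})\b")

def parse_date(line):
    """Return (day, month, year) as ints, or None if the line has no date."""
    match = DATE_RE.search(line)
    if match is None:
        return None
    # The captured groups are still strings; int() does the conversion.
    day, month, year = (int(part) for part in match.groups())
    return day, month, year

print(parse_date("23/10/90 Transfer Summary text"))
```

The same pattern of "match as text, then convert" applies to floats via float().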
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I have been using the following code to capitalize words:
with open("capitalize.txt") as f:
    for line in f:
        # end="" avoids doubling the newline already present in each line
        print(line.title(), end="")
It works fine, but I also want to be able to capitalize letters in the middle of a word, e.g. change javascript to JavaScript. How can I do this using Python?
It seems that you're not describing an algorithmic transformation (e.g. first letter, last letter, word boundaries, etc.) but rather an arbitrary capitalization scheme for known words.
As such, you'll probably want a variation of the following using replace:
with open("capitalize.txt") as f:
    for line in f:
        # end="" avoids doubling the newline already present in each line
        print(line.replace("javascript", "JavaScript"), end="")
If you've got a known set of words, then you can make it fancier, such as creating a dict {'javascript': 'JavaScript'} and then looping through the keys replacing each key with its value, but the basic approach will be more manual than you're envisioning.
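The dict idea could look something like the sketch below. Instead of looping over the keys with replace, it builds one regex from the keys so each line is scanned once and only whole words are replaced; the mapping contents, the name CANONICAL and the function fix_case are illustrative assumptions.

```python
import re

# Lowercase word -> preferred capitalization (example entries only).
CANONICAL = {"javascript": "JavaScript", "github": "GitHub"}

# One alternation of all known words, matched case-insensitively on word boundaries.
WORD_RE = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in CANONICAL) + r")\b",
    re.IGNORECASE,
)

def fix_case(text):
    # Look up each matched word by its lowercase form.
    return WORD_RE.sub(lambda m: CANONICAL[m.group(0).lower()], text)

print(fix_case("i like javascript and Github"))
```

The word boundaries keep a word like "javascripts" from being rewritten; drop them if partial matches are wanted.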
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I am having a problem with patterns.
I have a string like this:
string1 = "27.86.80.76.83.45.66.71.80.45.76.68.80.45.67.97.108.108.45.84.105.116.45.77.97.114.105.111"
Strings like this appear in the middle of a file and have different lengths.
I am reading the file line by line and need to know whether each line contains this pattern.
Can you guys point me in the right direction?
There are two ways to go about this:
Build a parser: more work, but very flexible and possibly the best performance (depending on the implementation).
Use a regular expression. In your case this could be something like (\d{2,3}\.)+\d{2,3} (the shortest string matched would be "11.11").
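The regular-expression option might be applied per line roughly as follows. This is a sketch only: the function name find_dotted_numbers is hypothetical, and the group is made non-capturing so that the whole run of numbers is returned.

```python
import re

# Two or three digits followed by a dot, repeated, then a final two or three digits.
DOTTED_RE = re.compile(r"(?:\d{2,3}\.)+\d{2,3}")

def find_dotted_numbers(line):
    """Return the first dotted-number run in the line, or None if there is none."""
    match = DOTTED_RE.search(line)
    return match.group(0) if match else None

print(find_dotted_numbers("header 27.86.80.76.83 trailer"))
print(find_dotted_numbers("nothing to see here"))
```

When reading a file, calling this on each line and checking for None tells you which lines contain the pattern.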