How to add spaces after a certain number of characters? - python

Currently the string I want to change looks like ABCDEFGHIJKL.
I'm looking to change it to AB CDEF GHIJ KL.
I checked around, but I was only able to find help on inserting spaces at regular intervals.

With no particular rules defined, I can only see the following approach at the moment:
string_with_spaces = f"{string[:2]} {string[2:6]} {string[6:10]} {string[10:]}"
Based on what you have said in your question, you are not interested in adding spaces at regular intervals.
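If the cut positions are known up front, the slicing above can be generalized. The following is only a minimal sketch: the helper name `insert_spaces` and the `cut_points` list are assumptions for illustration, not something given in the question.

```python
def insert_spaces(text, cut_points):
    """Insert a space at each index in cut_points (indices into the original text)."""
    # Build consecutive (start, end) slice bounds: 0..2, 2..6, 6..10, 10..len
    bounds = [0, *cut_points, len(text)]
    return " ".join(text[start:end] for start, end in zip(bounds, bounds[1:]))


print(insert_spaces("ABCDEFGHIJKL", [2, 6, 10]))  # AB CDEF GHIJ KL
```

Passing the cut positions as data keeps the irregular grouping in one place, so a different layout only needs a different list.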

Related

Python Conditional Regex

My program is given an object with parameters, and I need to get the parameters' values.
The object my program is given will look like this:
Object = """{{objectName|
parameter1=random text|
parameter2=that may or may not|
parameter3=contain any letter (well, almost)|
parameter4=this is some [[problem|problematic text]], Houston, we have a problem!|
otherParameters=(order of parameters is random, but their name is fixed)}}"""
(all parameters might or might not exist)
I am trying to get the properties values.
For the first 3 lines, it's pretty easy. A simple regex will find the values:
if "parameter1" in Object:
    parameter1 = re.split(r"parameter1=(.*?)[\|\}]", Object)[1]
if "parameter2" in Object:
    parameter2 = re.split(r"parameter2=(.*?)[\|\}]", Object)[1]
and so on.
The problem is with parameter4: the above regex (parameter4=(.*?)[\|\}]) will only return this is some [[problem, since the regex stops at the first vertical bar.
Now here is the thing: vertical bar will only appear as part of the text inside "[[]]".
For example, parameter1=a[[b|c]]d might appear, but parameter1=a|bc| will never appear.
I need a regex which will stop at vertical bar, unless it is inside double square brackets. So for example, for parameter4, I will get this is some [[problem|problematic text]], Houston, we have a problem!
It worked for me when I removed the "?":
parameter4 = re.split(r"parameter4=(.*)[\|\}]", object_)[1]
I also changed the name of the variable to "object_" because "object" is a built-in name in Python.
Best.
Apparently, there is no perfect solution.
For other readers who may come across this question in the future, the closest solution is, as pointed out by Wiktor Stribiżew in the comments, parameter4=([^[}|]*(?:\[\[.*?]][^[}|]*)*).
This regex only works if the parameter text does not contain any single [, } or |, but it may contain [[...]] substrings.
If you want to understand this regex better, you might want to have a look here: https://regex101.com/r/bWVvKg/2
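As a rough illustration, the pattern from the comments can be wrapped in a small helper. This is a sketch under assumptions: the function name `get_param` is made up here, and the `[` inside the character class is written as `\[`, which should be equivalent and avoids a possible "nested set" warning in newer Python versions.

```python
import re


def get_param(text, name):
    """Return the value of parameter `name` in `text`, or None if it is absent."""
    # Plain text without [, } or |, optionally interleaved with [[...]] blocks.
    pattern = re.escape(name) + r"=([^\[}|]*(?:\[\[.*?]][^\[}|]*)*)"
    match = re.search(pattern, text)
    return match.group(1) if match else None


object_ = """{{objectName|
parameter1=random text|
parameter4=this is some [[problem|problematic text]], Houston, we have a problem!|
otherParameters=(order of parameters is random, but their name is fixed)}}"""

print(get_param(object_, "parameter4"))
print(get_param(object_, "parameter2"))  # None, because it is not present
```

Returning None for a missing parameter replaces the separate `if "parameterN" in Object` checks from the question.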

Remove dynamic time and name combinations using regex

I am unsuccessfully trying to use regex to remove time stamps and names from the online conversations I am processing.
The pattern I am trying to remove looks like this: [08:03:16] Name:
It is randomly distributed throughout the conversation instances.
The Name portion of the pattern can be lower or uppercase and can contain multiple names, e.g. Dave, adam Jons, Wei-Xing.
I am using the following regex:
[A-Z]([a-z]+|\.)(?:\s+[A-Z]([a-z]+|\.))*(?:\s+[a-z][a-z\-]+){0,2}\s+[A-Z]([a-z]+|\.)
I took it from "Find names with Regular Expression", but it only removes names outside the timestamp pattern shown above (and only works for some of the names that follow a timestamp).
I have been looking through SO for a while now to find something that might help me but nothing has worked across all examples so far.
That looks a lot more complicated than it has to be - might be easier to match the timestamp format, then match characters up until the next : is found (assuming that names can't have :s in them):
\[(?:\d{2}:){2}\d{2}\] [^:]+:
https://regex101.com/r/5i4HId/1
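A short sketch of how that pattern could be applied with `re.sub`. The helper name `strip_speaker_tags`, the sample conversation, and the trailing `\s*` (added to swallow the space after the colon) are assumptions, not part of the answer above.

```python
import re

# Timestamp like [08:03:16], a space, then everything up to the next colon.
SPEAKER_TAG = re.compile(r"\[(?:\d{2}:){2}\d{2}\] [^:]+:\s*")


def strip_speaker_tags(text):
    """Remove every '[hh:mm:ss] Name:' prefix from the conversation text."""
    return SPEAKER_TAG.sub("", text)


conversation = "[08:03:16] Dave: hello there [08:03:20] adam Jons: hi"
print(strip_speaker_tags(conversation))  # hello there hi
```

As the answer notes, this relies on names never containing a colon.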

To separate words outside brackets as well as inside brackets and put into separate columns in Python?

I need a solution for the following string in my data set. It needs to be split into separate words to get meaningful insights.
a='(Barbecue)Cheese(earthyCamembert,Fontina,nuttyAsiago,Colby,Parmesan)General(Chocolate)Meat(Beef)'
Here the first word, (Barbecue), represents the cuisine
second word - Cheese(earthyCamembert,Fontina,nuttyAsiago,Colby,Parmesan)
third word - General(Chocolate)
fourth word - Meat(Beef)
As in the example above, I need to split it into 4 categories. Can anyone help me code this in Python? I am new to this. Thanks.
You could probably get what you need just using a.split(')'). This breaks the string up into a list at every ). The result would be ['(Barbecue', 'Cheese(earthyCamembert,Fontina,nuttyAsiago,Colby,Parmesan'…] if that's what you're looking for. You could also fairly easily iterate through the list if you want that final parenthesis back. If I had to guess, however, I'd say that what you want is a dictionary.
Barbecue = {'Cheese': ['earthyCamembert', 'Fontina', 'nuttyAsiago', 'Colby', 'Parmesan'],
'General': ['Chocolate'],
…}
Also, being fairly new to Python and coding myself, I'd recommend checking out Codecademy's introductory course to Python. It helped me out a lot. After completing it, I bet you could've solved this yourself.
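One way the dictionary idea could be sketched is with `re.findall`. The helper name `parse_categories` and the "Cuisine" key for the unlabeled first group are assumptions made for this example, and the resulting layout is flatter than the nested dictionary hinted at above.

```python
import re


def parse_categories(text):
    """Map each label to the comma-separated items inside its parentheses."""
    # Each pair is (label before the parenthesis, text inside the parentheses).
    pairs = re.findall(r"(\w*)\(([^)]*)\)", text)
    result = {}
    for label, items in pairs:
        key = label or "Cuisine"  # the first group has no label in the sample data
        result[key] = items.split(",")
    return result


a = '(Barbecue)Cheese(earthyCamembert,Fontina,nuttyAsiago,Colby,Parmesan)General(Chocolate)Meat(Beef)'
print(parse_categories(a))
```

From a dictionary like this, each key could become a column if the data ends up in a table.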

Regular expression for validation of inequality inputted by user

I searched for an answer but couldn't find a clear one. Please bear with me, as I'm kind of a noob in regex, and this is my first question too. I'm using Python 3, but will also need this for JavaScript.
What I'm trying to do is validate an input by the user. The input is an inequality (spaces removed), and the variables are named by the user and given beforehand.
For example, let's say I have this inequality:
x+y+6p<=z+1
The variables x, y, p, z will be given. The problem arises if the inequality looks like this:
xp+yp+6p<=z+1
The given variables are xp, yp, p, and z.
I'm trying to write a regular expression to match any inequality with such a format, given no spaces in the inequality. I cannot figure out how to check for alternative strings. For example I wrote the following expression:
^([\+\-]?[0-9]*([xpypz]|[0-9]+))+[<>]=([\+\-]?[0-9]*([xpypz]|[0-9]+))+$
I know this is completely wrong and that's not how the parentheses are used, but I don't have a feasible expression and I wanted to show you what I want to achieve. Now I need to know three things (at least, I hope) to fix it:
How to check specifically for xp and yp as literal strings, instead of any character in the set xypz?
How to make 0-9 after xpypz work as [0-9]+? Meaning that any number can occur instead of a variable?
How can I make the whole group repeat?
I'm trying to write this expression to check if the user is adding undeclared variables. I believe this can be done differently without using regex, but it would be nice to do it in a single line. Can you please help me figure out those three points? Thanks.
Try this pattern:
(^(?=.)(?:(?:[+-]?\d*(?:xp|yp|p|z)*)+)[<>]=(?=.)(?:(?:[+-]?\d*(?:xp|yp|p|z)*)+)$)
Demo
[0-9]*(xp|yp|p|z)*([+-][0-9]*(xp|yp|p|z)*)*(<|>|<=|>=)[0-9]*(xp|yp|p|z)*([+-][0-9]*(xp|yp|p|z)*)*
This is ugly and won't catch mistakes like 1++x<p nor does it allow for other functions like sin or exponents. It matches on xp+yp+6p<=z+1 but does not on xp+yp+6x<=z+1 if xp, yp, p, and z are the variables given.
As Greg Ball mentioned, though, the best thing would be to use parsing if possible. Then you could catch more syntax errors besides the use of wrong variables, and you could do so more reliably.
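Since the variable names are given beforehand, the alternation can also be built at runtime instead of being hard-coded. The sketch below is one possible take, not the patterns from the answers above: the helper names are hypothetical, each term is assumed to be a number, a variable, or a number followed by a variable, and `re.fullmatch` is used so the whole input must match.

```python
import re


def build_inequality_pattern(variables):
    """Compile a pattern for inequalities over the given variable names."""
    # Longest names first, so "xp" is tried before "p".
    names = "|".join(re.escape(v) for v in sorted(variables, key=len, reverse=True))
    term = rf"(?:\d+(?:{names})?|(?:{names}))"  # 6, 6p or p
    side = rf"[+-]?{term}(?:[+-]{term})*"       # terms joined by + or -
    return re.compile(rf"{side}(?:<=|>=|<|>){side}")


def is_valid_inequality(expr, variables):
    """True if expr (spaces already removed) uses only the declared variables."""
    return build_inequality_pattern(variables).fullmatch(expr) is not None


declared = ["xp", "yp", "p", "z"]
print(is_valid_inequality("xp+yp+6p<=z+1", declared))  # True
print(is_valid_inequality("xp+yp+6x<=z+1", declared))  # False: x is not declared
```

Because every term must be non-empty, an input such as 1++x<p is also rejected here. Exponents or functions like sin are still out of scope.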

Regex named conditional lookahead (in Python)

I'm hoping to match the beginning of a string differently based on whether a certain block of characters is present later in the string. A very simplified version of this is:
re.search("""^(?(pie)a|b)c.*(?P<pie>asda)$""", 'acaaasda')
Where, if <pie> is matched, I want to see a at the beginning of the string, and if it isn't then I'd rather see b.
I'd use normal numerical lookahead but there's no guarantee how many groups will or won't be matched between these two.
I'm currently getting error: unknown group name. The sinking feeling in my gut tells me that this is because what I want is impossible (look-ahead to named groups isn't exactly a feature of a regular language parser), but I really really really want this to work -- the alternative is scrapping 4 or 5 hours' worth of regex writing and redoing it all tomorrow as a recursive descent parser or something.
Thanks in advance for any help.
Unfortunately, I don't think there is a way to do what you want to do with named groups. If you don't mind duplication too much, you could duplicate the shared conditions and OR the expressions together:
^(ac.*asda|bc.*)$
If it is a complicated expression you could always use string formatting to share it (rather than copy-pasting the shared part):
common_regex = "c.*"
final_regex = "^(a{common}asda|b{common})$".format(common=common_regex)
You can use something like this:
^(?:a(?=c.*(?P<pie>asda)$)|b)c.*$
or without .*$ if you don't need it.
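A quick check of the lookahead-based pattern, as a sketch. The sample strings other than 'acaaasda' are made up for illustration.

```python
import re

# "a" is only accepted if the rest of the string ends in "asda"; otherwise "b" is required.
PATTERN = re.compile(r"^(?:a(?=c.*(?P<pie>asda)$)|b)c.*$")

for candidate in ("acaaasda", "bcxyz", "acxyz"):
    match = PATTERN.search(candidate)
    pie = match.group("pie") if match else None
    print(candidate, bool(match), pie)
```

The named group is captured inside the lookahead, so `pie` is set only when the `a` branch was taken. On the `b` branch it stays None.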
