This question already has answers here:
Split Strings into words with multiple word boundary delimiters
(31 answers)
Closed 3 years ago.
I have the following data in a csv file:
IDA/IDB/type/timestamp valueoftype
A1/B1/a/1575033906 4
A1/B1/b/1575033906 5
A1/B1/c/1575033906 3
A1/B2/a/1575033906 5
A2/B3/a/1575033906 6
A1/B2/b/1575033906 7
A1/B2/c/1575033906 85
A2/B3/b/1575033906 6
A2/B3/c/1575033906 4
.
.
.
A1/B1/a/1575033909 5
A1/B1/b/1575033909 6
A1/B1/c/1575033909 4
I want to use a regular expression to read each line of the file and split it on two delimiters. In my case those delimiters are " " and "/". In the end, I want to have this:
['A1','B1','a','1575033906','4']
Here is the code I used:
for line in f:
    print(line)
    x = re.split(r'[ /]+', line)
    print(x)
And the result it gives me is this:
['A1','B1','a','1575033906','4\n']
How could I exclude the "\n" character from getting into that last position?
Use strip or rstrip to remove it:
x = re.split(r'[ /]+', line.strip())
If there is leading whitespace that you need to keep, use rstrip to strip only from the right:
>>> ' w t\n'.rstrip()
' w t'
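Putting the pieces together, a minimal sketch of the whole loop might look like the following. It assumes an io.StringIO object as a stand-in for the open CSV file (so the example runs without a file on disk), and the helper name split_fields is hypothetical, not something from the question.

```python
import io
import re

# Stand-in for the opened file; in real code this would come from open(...).
f = io.StringIO(
    "A1/B1/a/1575033906 4\n"
    "A1/B1/b/1575033906 5\n"
)

def split_fields(line):
    """Strip surrounding whitespace, then split on runs of spaces or slashes."""
    return re.split(r'[ /]+', line.strip())

rows = [split_fields(line) for line in f]
print(rows[0])  # ['A1', 'B1', 'a', '1575033906', '4']
```

Stripping before splitting keeps the newline out of the last field, so no per-element cleanup is needed afterwards.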
This question already has answers here:
Split a string by a delimiter in python
(5 answers)
Closed 2 years ago.
I am trying to print a string broken into lines at the \n (newline) character. Here is an example:
line = "This is list\n 1.Cars\n2.Books\n3.Bikes"
I want to print the output as below:
This is list:
1.Cars
2.Books
3.Bikes
I used code as below:
line1 = line.split()
for x in line1:
    print(x)
But it prints each word on a separate line. How can I split the string only on "\n"?
Regards,
Arun Varma
The argument to split() specifies the string to use as a delimiter. If you don't give an argument it splits on any whitespace by default.
If you only want to split at newline characters, use
line1 = line.split('\n')
There's also a method specifically for this:
line1 = line.splitlines()
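As a quick comparison, here is a small sketch of the three calls on the string from the question. The variable names are only for illustration.

```python
line = "This is list\n 1.Cars\n2.Books\n3.Bikes"

by_whitespace = line.split()     # splits on every run of whitespace
by_newline = line.split('\n')    # splits only at '\n'
by_lines = line.splitlines()     # splits at line boundaries

print(by_whitespace)  # ['This', 'is', 'list', '1.Cars', '2.Books', '3.Bikes']
print(by_newline)     # ['This is list', ' 1.Cars', '2.Books', '3.Bikes']

# A trailing newline shows where the two newline-based calls differ.
trailing_split = "a\nb\n".split('\n')     # ['a', 'b', '']
trailing_lines = "a\nb\n".splitlines()    # ['a', 'b']
print(trailing_split, trailing_lines)
```

Note that the space before "1.Cars" in the input survives both newline-based splits; calling strip() on each piece would remove it if that is unwanted.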
This question already has answers here:
get index of character in python list
(4 answers)
Regular expression to match a dot
(7 answers)
Closed 3 years ago.
I want to find the position of '.', but when I run the code below:
text = 'Hello world.'
pattern = '.'
search = re.search(pattern,text)
print(search.start())
print(search.end())
Output is:
0
1
The '.' is not at position 0 to 1.
So why is it giving the wrong output?
You can use the find method for this task.
my_string = "test"
s_position = my_string.find('s')
print (s_position)
Output
2
If you really want to use RegEx be sure to escape the dot character or it will be interpreted as a special character.
The dot in RegEx matches any character except the newline symbol.
import re

text = 'Hello world.'
pattern = r'\.'  # raw string; the backslash makes the dot literal
search = re.search(pattern, text)
print(search.start())  # 11
print(search.end())    # 12
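If the text to search for comes from a variable, escaping it by hand is easy to get wrong. A minimal sketch, assuming the goal is simply to locate a literal substring, is to let re.escape build the pattern, or to skip regex entirely with str.find:

```python
import re

text = 'Hello world.'

# re.escape builds a pattern that matches the given text literally.
match = re.search(re.escape('.'), text)
print(match.start(), match.end())  # 11 12

# For a fixed substring, str.find gives the same start index without regex.
position = text.find('.')
print(position)  # 11

# Unescaped, the dot matches the first character it sees, here 'H'.
first_any = re.search('.', text).group()
print(first_any)  # H
```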
This question already has answers here:
Check string indentation?
(4 answers)
Closed 4 years ago.
How can I use regex to count the number of spaces at the beginning of a string? For example:
string = ' area border router'
The count_space variable would have a value of 1, since there is 1 whitespace character at the beginning of the string. If my string is:
string = '  router ospf 1'
The count_space variable would have a value of 2, since there are 2 whitespace characters at the beginning of the string. And so on.
I think the expression would be something like RE = '^\s', but I am not sure how to formulate it.
You don't need regex, you can just do this:
s = ' area border router'
print(len(s)-len(s.lstrip()))
Output:
1
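For completeness, since the question asked about regex, here is a minimal sketch of a regex version. The function name count_leading_spaces is hypothetical, and the pattern counts only space characters; using r'\s*' instead would also count tabs and other whitespace, which is closer to what lstrip() removes.

```python
import re

def count_leading_spaces(s):
    """Return the number of space characters at the start of s."""
    # ' *' always matches at position 0, possibly with zero length,
    # so re.match never returns None here.
    return len(re.match(r' *', s).group())

print(count_leading_spaces(' area border router'))  # 1
print(count_leading_spaces('  router ospf 1'))      # 2
print(count_leading_spaces('no indent'))            # 0
```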
This question already has answers here:
Python - Count number of words in a list strings
(8 answers)
Closed 8 years ago.
I want to count the total number of words in each line of a file and print them. I tried:
with codecs.open('v.out','r',encoding='utf-8') as f:
    for line in f.readlines():
        words = len(line.strip(' '))
        print words
the input file is:
hello
try it
who knows
it may work
the output that I get is:
6
7
10
12
but what I need is:
1
2
2
3
Is there any function that I can use? I also have to print the first word, the middle word, and the last word of each line into separate files.
You are stripping spaces from the ends, not splitting the words. You are counting the remaining characters now, not words.
Use str.split() instead:
words = len(line.split())
No arguments required, or use None; it'll strip whitespace from the ends, and split on arbitrary-width whitespace, giving you words:
>>> 'it may work'.split()
['it', 'may', 'work']
>>> len('it may work'.split())
3
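Applied to the input from the question, this might look like the sketch below. It uses io.StringIO as a stand-in for the open file, and it assumes that "middle word" means the word at index len(words) // 2; the question does not define it, so that choice is only an illustration.

```python
import io

# Stand-in for the open file from the question.
f = io.StringIO("hello\ntry it\nwho knows\nit may work\n")

# split() with no arguments drops the trailing newline and any extra spaces.
counts = [len(line.split()) for line in f]
print(counts)  # [1, 2, 2, 3]

# First, middle, and last word of one line (middle index is an assumption).
words = 'it may work'.split()
first, middle, last = words[0], words[len(words) // 2], words[-1]
print(first, middle, last)  # it may work
```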
You were so close. This line:
words = len(line.strip(' '))
should be:
words = len(line.split(' '))
strip removes characters from the start and end of the string, split breaks the string up into a list of strings.
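One caveat worth illustrating: split(' ') and split() agree only when words are separated by exactly one space. A small sketch with a made-up input line shows the difference:

```python
line = 'it  may work\n'   # two spaces between 'it' and 'may'

with_space_arg = line.split(' ')   # splits at every single space
with_no_arg = line.split()         # splits on runs of any whitespace

print(with_space_arg)  # ['it', '', 'may', 'work\n']
print(with_no_arg)     # ['it', 'may', 'work']
```

With the explicit ' ' argument, the double space yields an empty string and the newline stays attached to the last word, so the count would be 4 rather than 3.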
This question already has answers here:
Split a string by spaces -- preserving quoted substrings -- in Python
(16 answers)
Closed 9 years ago.
I have my data as below
string = ' streptococcus 7120 "File being analysed" rd873 '
I tried to split the line using n=string.split() which gives the below result:
[streptococcus,7120,File,being,analysed,rd873]
I would like to split the string while ignoring whitespace inside the double quotes.
# output expected :
[streptococcus,7120,File being analysed,rd873]
Use re.findall with a suitable regex. I'm not sure what your error cases look like (what if there is an odd number of quotes?), but:
import itertools as it
import re
s = ' streptococcus 7120 "File being analysed" rd873 "hello!" hi'
# findall returns (quoted, bare) tuples; flatten them and drop the empty strings
list(filter(None, it.chain(*re.findall(r'"([^"]*?)"|(\S+)', s))))
> ['streptococcus',
'7120',
'File being analysed',
'rd873',
'hello!',
'hi']
looks right.
You want shlex.split, which gives you the behavior you want with the quotes.
import shlex
string = ' streptococcus 7120 "File being analysed" rd873 '
items = shlex.split(string)
This won't strip extra spaces embedded in the strings, but you can do that with a list comprehension:
items = [" ".join(x.split()) for x in shlex.split(string)]
Look, ma, no regex!
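To make the results concrete, here is a short sketch of both steps. The input is a variant of the question's string with extra spaces inside the quotes (an assumption, added so the normalizing step has something to do), and the last part shows what happens with an unbalanced quote.

```python
import shlex

string = ' streptococcus 7120 "File  being   analysed" rd873 '

items = shlex.split(string)
print(items)  # ['streptococcus', '7120', 'File  being   analysed', 'rd873']

# Collapse runs of whitespace inside each item to a single space.
normalized = [" ".join(x.split()) for x in items]
print(normalized)  # ['streptococcus', '7120', 'File being analysed', 'rd873']

# An unbalanced quote is rejected with ValueError instead of being guessed at.
try:
    shlex.split('7120 "File being')
    unbalanced_ok = True
except ValueError:
    unbalanced_ok = False
print(unbalanced_ok)  # False
```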