I'm trying to read data from a text file into Python. The file consists of lines like this:
SAMPLE_0001 2000 57 1 103 51 0 NA
For ease of data management, I'd like to save that line as a list:
[SAMPLE_0001,2000,57,1,103,51,0,NA]
I wrote the following function to do that:
def line_breaker(line):
    words=[]
    if line[0]==' ':
        in_word=False
    else:
        in_word=True
    word=[]
    for i in range(len(line)):
        if in_word==True and line[i]!=' ':
            word.append(line[i])
        elif in_word==True and line[i]==' ':
            in_word=False
            words.append(word)
            word=[]
        elif in_word==False and line[i]!=' ':
            in_word=True
            word.append(line[i])
        if i==len(line)-1 and line[i]!=' ':
            word.append(line[i])
            words.append(word)
    return words
Unfortunately, this doesn't work as intended. When I apply it to the example above, I get the whole line as one long string. On closer inspection, this was because the condition line[i]==' ' failed to trigger on the blank spaces. I guess I should replace ' ' with something else.
When I ask Python to print the 11th position in the example, it displays nothing. That's totally unhelpful. I then asked it to print the type of the 11th position in the example; I got <class 'str'>.
So what should I use to detect spaces?
You can use split, as usual – you'll just have to remember to not explicitly split on spaces alone, as in:
myNaiveSplit = text.split(' ')
because that will absolutely fail if, as in your case, there may be some other whitespace character between the words.
Instead, don't provide any argument at all. After all, the official documentation on split tells us so:
If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator ...
(my emphasis)
and the 'whitespace' mentioned is everything which is considered "whitespace" by the function isspace (which is fully Unicode-compliant).
So all you need is
mySmartSplit = text.split()
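To make the difference concrete, here is a minimal sketch. It assumes the invisible separator in the file is a tab character; that is only a guess, since the question never identifies it, and the sample string below is made up to match. Printing `repr()` of a character reveals whitespace that a plain `print` shows as nothing.

```python
# Hypothetical line: tabs between some fields, spaces between others
line = "SAMPLE_0001\t2000\t57 1  103 51 0 NA\n"

# repr() makes an otherwise invisible character visible
print(repr(line[11]))   # '\t'

# Splitting on a single space misses the tabs and keeps the newline
print(line.split(" "))  # ['SAMPLE_0001\t2000\t57', '1', '', '103', '51', '0', 'NA\n']

# Splitting with no argument treats any run of whitespace as one separator
print(line.split())     # ['SAMPLE_0001', '2000', '57', '1', '103', '51', '0', 'NA']
```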
If you want to turn a string separated by whitespace into an array (a Python list), the best way (as others above have mentioned) is the built-in split() method. But if you don't want to use that, you could use isspace() and do it manually in a custom function like this:
def line_breaker():
    my_array = []
    string = list(input("Write Your string:\n"))
    last_whitespace = 0
    for index, element in enumerate(string):
        if element.isspace():
            # skip empty pieces produced by consecutive whitespace
            if index > last_whitespace:
                my_array.append("".join(string[last_whitespace:index]))
            last_whitespace = index + 1
    # the last word is not followed by whitespace, so add it after the loop
    if last_whitespace < len(string):
        my_array.append("".join(string[last_whitespace:]))
    print(my_array)
Why don't you use split?
line = "SAMPLE_0001 2000 57 1 103 51 0 NA"
print(line.split(' '))
['SAMPLE_0001', '2000', '57', '1', '103', '51', '0', 'NA']
Solution:
line = "SAMPLE_0001 2000 57 1 103 51 0 NA"
line = line.split(" ")
You'll get what you want.
split() works perfectly.
strr= "SAMPLE_0001 2000 57 1 103 51 0 NA"
print(strr.split(' '))
split() turns a string into a Python list, splitting on whatever separator you need. For example, strr.split(',') will split on commas.
Do it like this
delimiter = ' '
with open(file) as f:
    for line in f.readlines():
        split_line = line.split(delimiter)
        # do your thing with the list of words
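One thing to watch with an explicit `' '` delimiter is that the last field of every line keeps its trailing newline, and tabs are not split at all. A minimal sketch of the same loop using `split()` without arguments follows; the `io.StringIO` object and its contents are hypothetical stand-ins for a real file opened with `open(...)`.

```python
import io

# Stand-in for a real file (hypothetical contents)
fake_file = io.StringIO(
    "SAMPLE_0001\t2000 57 1 103 51 0 NA\n"
    "\n"
    "SAMPLE_0002 1999  60 0 98 47 1 NA\n"
)

rows = []
for line in fake_file:       # iterating a file object yields one line at a time
    fields = line.split()    # no argument: any run of whitespace separates fields
    if fields:               # skip blank lines
        rows.append(fields)

print(rows)
```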
Related
I have made a string without spaces: instead of spaces, I used 0000000, and there are no alphabet letters in it. So, for example, 000000020000000050000000190000000200000000 should equal "test". Sorry, I am very new to Python and not very good at it yet, so if someone can help me out, that would be awesome.
You should be able to achieve the desired effect using regular expressions and re.sub()
If you want to extract the literal word "test" from that string as mentioned in the comments, you'll need to account for the fact that if you have 8 0's, it will match the first 7 from left to right, so a number like 20 followed by 7 0's would cause a few issues. We can get around this by matching the string in reverse (right to left) and then reversing the finished string to undo the initial reverse.
Here's the solution I came up with as my revised answer:
import re
my_string = '000000020000000050000000190000000200000000'
# Substitute a space in place of 7 0's
# Reverse the string in the input, and then reverse the output
new_string = re.sub('0{7}', ' ', my_string[::-1])[::-1]
# >>> new_string
# ' 20 5 19 20 '
Then we can strip the leading and trailing whitespace from this answer and split it into an array
my_array = new_string.strip().split()
# >>> my_array
# ['20', '5', '19', '20']
After that, you can process the array in whatever way you see fit to get the word "test" out of it.
My solution to that would probably be the following:
import string
word = ''.join([string.ascii_lowercase[int(x) - 1] for x in my_array])
# >>> word
# 'test'
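To see why the reversal matters, here is a minimal sketch that first shows what plain left-to-right matching does to the sample string. It then tries one possible alternative to reversing, a negative lookahead `0{7}(?!0)`. That pattern is an assumption of this sketch rather than part of the answer, and it only works when no encoded number ends in more than one zero (true for letter positions 1 to 26).

```python
import re
import string

my_string = '000000020000000050000000190000000200000000'

# Naive left-to-right substitution: the trailing zero of each "20" is
# swallowed by the separator match, so the numbers come out wrong.
naive = re.sub('0{7}', ' ', my_string).split()
print(naive)  # ['2', '05', '19', '2', '0']

# Alternative sketch (an assumption, not from the answer): accept a run of
# seven zeros only when it is NOT followed by another zero.
parts = [p for p in re.split(r'0{7}(?!0)', my_string) if p]
print(parts)  # ['20', '5', '19', '20']

# Map each 1-based alphabet position to its letter
word = ''.join(string.ascii_lowercase[int(p) - 1] for p in parts)
print(word)   # test
```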
NOTE: This answer has been completely rewritten (v2).
I am reading a text file with high scores and trying to find which index of the string is where the name stops, and the score starts. This is the format of the file:
John 15
bob 27
mary 72
videogameplayer99 99
guest 71
How can I do this?
If you are looking to find the index to split the string into 2 separate parts, then you can just use [string].split() (where string is an individual line). If you need to find the index of the space for some other reason, use: [string].index(" ").
You can split the line on the space. The result is a list containing the 2 'words' in the line; in this case the words are the name and the score (as a string). You can get them using:
result = line.split()
name = result[0]
score = int(result[1])
In this case, for each line, you would be looking for the index of the first space character " ". In Python, you can find it with the find method on a string. For example, if you have a string s = "videogameplayer99 99", then s.find(" ") will return 17.
If you are using this method to split a name from a number, I would instead recommend the split method, which splits a string on a delimiter. For example, s.split(" ") returns ["videogameplayer99", "99"].
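Both approaches can be compared side by side in a short sketch. The helper name `parse_score` is made up for illustration, and the code assumes every line contains exactly one space between the name and the score, as in the sample file.

```python
lines = ["John 15", "bob 27", "mary 72", "videogameplayer99 99", "guest 71"]

def parse_score(line):
    """Return (index of the space, name, score) for a 'name score' line."""
    split_at = line.find(" ")           # index where the name stops
    name = line[:split_at]              # everything before the space
    score = int(line[split_at + 1:])    # everything after it, as an integer
    return split_at, name, score

for line in lines:
    print(parse_score(line))

# The same name/score pair without computing the index at all
name, score = "videogameplayer99 99".split()
print(name, int(score))
```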
I'm trying to iterate over a string and get all the numbers so that I can add them to a list, which I need for another task. I have multiple functions that recursively refer to each other, and the original input is a list of data. The problem is that when I print the string I get the right output, but if I iterate over it and print every index I get separate digits: 1, 1 instead of 11, or 9, 3 instead of 93. Does anyone have a simple solution to this problem? I'm not very experienced in programming, so it may seem like a simple task, but I can't figure it out at the moment. Here's my code for the problematic part.
numbers = names.split('\t')[1].split(' ')[1]
print(numbers)
some of the output:
8
44
46
86
now if I use the following code:
numbers = names.split('\t')[1].split(' ')[1]
for i in numbers:
    print(i)
I get the following output:
8
4
4
4
6
8
6
or when I convert to a list:
numbers = names.split('\t')[1].split(' ')[1]
print(list(numbers))
output:
['8']
['4', '4']
['4', '6']
['8', '6']
The input names is structured in the following way: Andy Gray\t 2807 53
where I have many more names, but they are all structured like this.
I then split by \t to remove the names and then split again by ' ' to get the numbers. I then have 2 numbers and take the second index to get the numbers I want, which are the second numbers next to the name.
My only goal for now is to get the 'complete' digits, so the output as it is like when I print it. I need to be able to get a list of those numbers as integers where every index is the complete digit, so [8,44,46,86] etc. I can then iterate over the numbers and use them. Once I can do that I know what to do, but I'm stuck at this point for now. Any help would be nice.
Link to complete input and python code I am using, in case it makes things more clear:
Demo
str.rsplit() works like str.split(), but starts from the right end.
s = "Andy Gray\t 2807 53"
_, number = s.rsplit(maxsplit=1)
print(number)
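Applied to a whole block of input, the same idea might look like the sketch below. The sample text and the helper name `last_numbers` are hypothetical; the code assumes one record per line, with the number of interest as the last whitespace-separated field.

```python
# Hypothetical sample input: one "name<TAB> number number" record per line
names = (
    "Andy Gray\t 2807 53\n"
    "Guillaume van Steen\t 5855 5\n"
    "Sven Silvis Cividjian\t 1539 88\n"
)

def last_numbers(text):
    """Return the final field of every line that has at least two fields, as ints."""
    result = []
    for line in text.splitlines():
        parts = line.rsplit(maxsplit=1)   # split once, starting from the right
        if len(parts) == 2:               # ignore blank or single-field lines
            result.append(int(parts[1]))
    return result

print(last_numbers(names))  # [53, 5, 88]
```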
If you know that all your input is structured the same way and you have the guarantee that the string ends with the 2 digits of interest, why not just do the following?
names_list = ['Andy gray\t2807 53', 'name surname\t2807 934']
for n in names_list:
    print(n[-2:])
On the other hand if you're not sure the last number only contains 2 digits, all the splitting on tab is unnecessary:
import re
names_list = ['Andy gray\t2807 53', 'name surname\t2807 94']
for n in names_list:
    try:
        if re.compile(r'.*\d+$').match(n) and ' ' in n:
            print(n.split()[-1])
    except:
        pass
EDIT after reading the code added by OP
The code looks good, but the problem is that my input(names) is not a list of strings. I have a string like this: Guillaume van Steen 5855 5 Sven Silvis Cividjian 1539 88 Jan Willem Swarttouw 3911 66 which goes in further. This is why I split at the tab and whitespace to get the final number.
Maybe this code helps:
from pathlib import Path
file = Path('text.txt')
text = file.read_text()
Here is where the file is split into lines:
lines = text.split('\n')
So you can use the function with a small added check:
def get_numbers(list_of_strings):
    numbers = list()
    for string in list_of_strings:
        # check whether the line has a "\t"; if so, assume it is a valid line
        if '\t' in string:
            numbers.append(int(string.split()[-1]))
    return numbers
numbers = get_numbers(lines)
print(numbers)
You can split the string by whitespaces again and convert each value to int.
For example,
numbers = names.split('\t')[1].split(' ')[1]
list_numbers = [int(x) for x in numbers.split(' ')]
Then you will have your list of 'complete' digits
def extractTemp():
    inputFile = open('P2text.txt','r+')
    line = inputFile.readline()
    for chr in line:
        if chr.isdigit():
            print(chr)
    inputFile.close()
extractTemp()
The text file has the number 95 in it, but it prints as:
9
5
I'm guessing because it is iterating over each character and makes 95 two separate characters.
So my question is, how do I combine them. Or, what can I do to make this program run better?
A big gap in my attempts to combine the digits is what happens when there are two separate numbers, like 95 and 90.
They would become 9590 if I joined everything together. So what can I do to make this work?
Basically, given a text file containing a sentence like "I have the number 95 and 90", I want to print just those two integers and ignore the rest. But the way I'm doing it, it prints 9, 5, 9, 0 on separate lines.
So I'm wondering how to print them together, so that 95 stays 95 (not 9, 5) and 90 stays 90 (not 9, 0). The end result I want from reading that sentence is: >> 95 90. Or, if I can only handle a sentence that says "I have the number 95" and have it print >> 95, I'd be happy with that too.
If you're expecting input like:
Here is 95 and there is 90
and want output of:
95
90
You should probably use regular expressions.
import re
with open('path/to/file.txt') as inf:
    text = inf.read() # generally bad practice, but...
numbers = re.findall(r"\d+", text) # ['95', '90']
for number in numbers:
    print(number)
But since you seem to be a new programmer, I wouldn't expect you to jump into the wild world of regular expressions just yet. They're massively powerful, but ultimately unnecessary here. You could instead do:
with open('path/to/file.txt') as inf:
    text = inf.read()
chars = [ch for ch in text if ch.isspace() or ch.isdigit()]
# [' ', ' ', '9', '5', ' ', ' ', ' ', ' ', '9', '0']
# every space and every digit
chars = ''.join(chars)
# "  95    90"
# join every element with the empty string
numbers = chars.strip().split()
# ['95', '90']
# strip off leading and trailing whitespace, then split on groups of whitespace
for number in numbers:
    print(number) # as before
Try this:
for num_str in line.split():
    try:
        print(int(num_str))
    except ValueError:
        pass
You should use the split() function as shown below:
inputFile = open('test.txt', 'r')
line = inputFile.readline().split()
for num in line:
    try:
        print(int(num))
    except ValueError:
        # not a number, skip this word
        pass
inputFile.close()
and the result will be:
90 95
You can make a list of the digits in the line and print them.
def extract_temp(n):
    temp=[]
    line=n.readline()
    for ch in line:
        if ch.isdigit():
            temp.append(ch)
    print("".join(temp))
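Note that joining every digit into one string gives 9590 when the line contains both 95 and 90. One way to keep the numbers separate without regular expressions is to group consecutive digit characters. The sketch below uses `itertools.groupby` for that; the function name `extract_numbers` is made up, and the code assumes plain ASCII digits.

```python
from itertools import groupby

def extract_numbers(text):
    """Return each run of consecutive digit characters in text as an int."""
    numbers = []
    # groupby yields (True, run) for digit runs and (False, run) for everything else
    for is_digit, run in groupby(text, key=str.isdigit):
        if is_digit:
            numbers.append(int("".join(run)))
    return numbers

for number in extract_numbers("I have the number 95 and 90"):
    print(number)
```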
I have a huge list of lines, each of which looks as follows
1 01 01 some random text
The 1 01 01 part is a reference number that changes from line to line. I want to remove the two whitespaces between the three reference numbers, so that the lines look as follows.
10101 some random text
Obviously, this calls for a for loop. The question is what I should write inside the loop. I can't use strip,
for i in my_list:
    i.strip()
because that, if anything, would remove all whitespaces, giving me
10101somerandomtext
which I don't want. But if I write
for i in my_list:
    i.remove(4)
    i.remove(1)
I get an error message 'str' object has no attribute 'remove'. What is the proper solution in this case?
Thanks in advance.
If the number is always at the beginning, you can use the fact that str.replace function takes an optional argument count:
for l in mylist:
    print(l.replace(' ', '', 2))
Note that I'm doing print here for a reason: you can't change the strings in-place, because strings are immutable (this is also why they don't have a remove method, and replace returns a modified string, but leaves the initial string intact). So if you need them in a list, it's cleaner to create another list like this:
newlist = [l.replace(' ', '', 2) for l in mylist]
It's also safe to overwrite the list like this:
mylist = [l.replace(' ', '', 2) for l in mylist]
Use the count argument for replace, to replace the first 2 spaces.
a = "1 01 01 some random text"
a.replace(" " , "", 2)
>>> '10101 some random text'
split takes a second argument: the maximum number of splits to make.
for i in my_list:
    components = i.split(" ", 3)
    refnum = ''.join(components[:3])
    text = components[3]
Or in Python 3:
for i in my_list:
    *components, text = i.split(" ", 3)
    refnum = ''.join(components)
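Putting the pieces back together, a minimal end-to-end sketch could look like this. The second sample line is invented for illustration, and the code assumes every line starts with exactly three space-separated reference parts.

```python
my_list = ["1 01 01 some random text", "2 13 07 another line"]

cleaned = []
for line in my_list:
    # At most 3 splits: three reference parts, then the untouched remainder
    *components, text = line.split(" ", 3)
    refnum = "".join(components)
    cleaned.append(f"{refnum} {text}")

print(cleaned)  # ['10101 some random text', '21307 another line']
```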