How to unify separate digits when iterating string - python

I'm trying to iterate over a string and get all the numbers so that I can add them to a list, which I need for another task. I have multiple functions that recursively refer to each other, and the original input is a list of data. The problem is that when I print the string I get the right output, but if I iterate over it and print each element I get separate digits, so 1,1 instead of 11 or 9,3 instead of 93. Does anyone have a simple solution to this problem? I'm not very experienced in programming, so it may seem like a simple task, but I can't figure it out at the moment. Here's my code for the problem part.
numbers = names.split('\t')[1].split(' ')[1]
print numbers
some of the output:
8
44
46
86
now if I use the following code:
numbers = names.split('\t')[1].split(' ')[1]
for i in numbers:
    print i
I get the following output:
8
4
4
4
6
8
6
or when I convert to a list:
numbers = names.split('\t')[1].split(' ')[1]
print list(numbers)
output:
['8']
['4', '4']
['4', '6']
['8', '6']
The input names is structured in the following way: Andy Gray\t 2807 53
where I have many more names, but they are all structured like this.
I then split by \t to remove the names and then split again by ' ' to get the numbers. I then have 2 numbers and take the second index to get the numbers I want, which are the second numbers next to the name.
My only goal for now is to get the 'complete' digits, so the output as it is like when I print it. I need to be able to get a list of those numbers as integers where every index is the complete digit, so [8,44,46,86] etc. I can then iterate over the numbers and use them. Once I can do that I know what to do, but I'm stuck at this point for now. Any help would be nice.
Link to complete input and python code I am using, in case it makes things more clear:
Demo

str.rsplit() works like str.split(), but starts from the right end.
s = "Andy Gray\t 2807 53"
_, number = s.rsplit(maxsplit=1)
print(number)
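Applied to every record, the same idea gives the list of integers the question asks for. This is only a sketch: it assumes each record sits on its own line and ends with the wanted number, and the `records` sample and the name `last_numbers` are made up for illustration.

```python
# Hypothetical sample in the question's layout: name, tab, two numbers.
records = "Andy Gray\t 2807 53\nJan Willem Swarttouw\t 3911 66\n"

last_numbers = []
for line in records.splitlines():
    if not line.strip():
        continue  # skip blank lines
    # rsplit(None, 1) splits once, from the right, on any whitespace.
    _, number = line.rsplit(None, 1)
    last_numbers.append(int(number))

print(last_numbers)  # [53, 66]
```

Passing `None` as the separator instead of using the `maxsplit=` keyword keeps the call valid on Python 2 as well.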

If you know that all your input is structured the same way and you are guaranteed that the string ends with the 2 digits you are interested in, why not just do the following?
names_list = ['Andy gray\t2807 53', 'name surname\t2807 934']
for n in names_list:
    print(n[-2:])
On the other hand, if you're not sure the last number contains exactly 2 digits, all the splitting on tab is unnecessary:
import re
names_list = ['Andy gray\t2807 53', 'name surname\t2807 94']
for n in names_list:
    try:
        # keep only lines that end in digits and contain a space
        if re.compile(r'.*\d+$').match(n) and ' ' in n:
            print(n.split()[-1])
    except Exception:
        pass

EDIT after reading the code added by OP
The code looks good, but the problem is that my input(names) is not a list of strings. I have a string like this: Guillaume van Steen 5855 5 Sven Silvis Cividjian 1539 88 Jan Willem Swarttouw 3911 66 which goes in further. This is why I split at the tab and whitespace to get the final number.
Maybe this code helps:
from pathlib import Path
file = Path('text.txt')
text = file.read_text()
Here is where the file is split into lines:
lines = text.split('\n')
Then you can use the function with a small added check:
def get_numbers(list_of_strings):
    numbers = list()
    for string in list_of_strings:
        # check whether the line has a "\t"; if so, assume it is a valid line
        if '\t' in string:
            numbers.append(int(string.split()[-1]))
    return numbers
numbers = get_numbers(lines)
print(numbers)

You can split the string by whitespaces again and convert each value to int.
For example,
numbers = names.split('\t')[1].split(' ')[1]
list_numbers = [int(x) for x in numbers.split(' ')]
Then you will have your list of 'complete' digits
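The separate digits in the question come from the fact that a `for` loop over a string visits one character at a time. A small sketch of the difference, assuming each value arrives as a whole token such as '44'; the `tokens` list is a stand-in for the values printed in the question, not the asker's real data.

```python
tokens = ["8", "44", "46", "86"]  # stand-in for the values printed in the question

# Iterating over a single string visits one character at a time.
per_character = [ch for ch in tokens[1]]

# Converting each whole token keeps multi-digit numbers intact.
complete = [int(token) for token in tokens]

print(per_character)  # ['4', '4']
print(complete)       # [8, 44, 46, 86]
```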

Related

how to remove all zeros that are 7 characters long python

I have made a string without spaces: instead of spaces I used 0000000, and there are no alphabet letters. For example, 000000020000000050000000190000000200000000 should equal "test". Sorry, I am very new to Python. If someone can help me out, that would be awesome.
You should be able to achieve the desired effect using regular expressions and re.sub().
If you want to extract the literal word "test" from that string as mentioned in the comments, you'll need to account for the fact that if you have 8 0's, it will match the first 7 from left to right, so a number like 20 followed by 7 0's would cause a few issues. We can get around this by matching the string in reverse (right to left) and then reversing the finished string to undo the initial reverse.
Here's the solution I came up with as my revised answer:
import re
my_string = '000000020000000050000000190000000200000000'
# Substitute a space in place of 7 0's
# Reverse the string in the input, and then reverse the output
new_string = re.sub('0{7}', ' ', my_string[::-1])[::-1]
# >>> new_string
# ' 20 5 19 20 '
Then we can strip the leading and trailing whitespace from this answer and split it into an array
my_array = new_string.strip().split()
# >>> my_array
# ['20', '5', '19', '20']
After that, you can process the array in whatever way you see fit to get the word "test" out of it.
My solution to that would probably be the following:
import string
word = ''.join([string.ascii_lowercase[int(x) - 1] for x in my_array])
# >>> word
# 'test'
NOTE: This answer has been completely rewritten (v2).
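The steps of this answer can be collected into one function. This is a sketch under the answer's own assumptions (separators are exactly seven zeros, and each number is a 1-based position in the lowercase alphabet); the name `decode` is hypothetical.

```python
import re
import string

def decode(encoded):
    """Turn a digit string with 7-zero separators into letters (1 = a, 2 = b, ...)."""
    # Reverse before and after substituting so a number ending in 0 keeps its zero.
    spaced = re.sub("0{7}", " ", encoded[::-1])[::-1]
    return "".join(string.ascii_lowercase[int(n) - 1] for n in spaced.split())

print(decode("000000020000000050000000190000000200000000"))  # test
```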

How to replace some numbers with other in from one line?

I have a line that includes some numbers separated by underscores, like this:
1_0_1_A2C_1A_2BE_DCAAFFC_0_0_0
I need code that checks the group (DCAAFFC) and, if its last 4 characters are not (0000), replaces the last 4 characters (AFFC) with (0000), giving (DCA0000).
So the line should become:
1_0_1_A2C_1A_2BE_DCA0000_0_0_0
I need the code to work on both Python 2 and 3, please!
P.S. The value of (DCAAFFC) is not fixed; it is always changing.
code=1_0_1_A2C_1A_2BE_DCAAFFC_0_0_0
I will assume that the format is strictly like this. Then you can get DCAAFFC with code.split('_')[-4]. Finally, you can replace its last four characters with 0000 using replace.
Here is the full code
>>> code="1_0_1_A2C_1A_2BE_DCAAFFC_0_0_0"
>>> frag=code.split("_")
['1', '0', '1', 'A2C', '1A', '2BE', 'DCAAFFC', '0', '0', '0']
>>> frag[-4]=frag[-4].replace(frag[-4][-4:],"0000") if frag[-4][-4:] != "0000" else frag[-4]
>>> final_code="_".join(frag)
>>> final_code
'1_0_1_A2C_1A_2BE_DCA0000_0_0_0'
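One caveat with `str.replace` is that it substitutes every occurrence of the four characters, not only the trailing one. A slicing variant avoids that. This is a sketch that assumes, like the answer, that the field of interest is the fourth from the end; the function name `zero_last_four` and its `position` parameter are made up here. It runs on both Python 2 and 3.

```python
def zero_last_four(code, position=-4):
    """Return code with the last four characters of one '_' field set to '0000'."""
    fields = code.split("_")
    field = fields[position]
    # Slicing touches only the tail; str.replace would also change an
    # earlier occurrence of the same four characters.
    fields[position] = field[:-4] + "0000"
    return "_".join(fields)

print(zero_last_four("1_0_1_A2C_1A_2BE_DCAAFFC_0_0_0"))
# 1_0_1_A2C_1A_2BE_DCA0000_0_0_0
```

No explicit check for an existing "0000" tail is needed, because overwriting "0000" with "0000" leaves the field unchanged.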
Try regular expressions, for example:
import re
old_string = '1_0_1_A2C_1A_2BE_DCAAFFC_0_0_0'
match = re.search('_([a-zA-Z]{7})_', old_string)
span = match.span()
new_string = old_string[:span[0]+4] + '0000_' + old_string[span[1]:]
print(new_string)
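The pattern above matches letters only, so a group containing digits would be missed. A possible variant does the replacement in a single `re.sub` call. It is a sketch that assumes the target is the first field made of exactly seven letters or digits; the name `zero_tail` is hypothetical.

```python
import re

def zero_tail(line):
    """Replace the last 4 characters of the first 7-character field with 0000."""
    # Lookbehind and lookahead require the field to sit between underscores.
    pattern = r"(?<=_)([0-9A-Za-z]{3})[0-9A-Za-z]{4}(?=_)"
    # \g<1> is used because \10000 would be read as a reference to group 10000.
    return re.sub(pattern, r"\g<1>0000", line, count=1)

print(zero_tail("1_0_1_A2C_1A_2BE_DCAAFFC_0_0_0"))
# 1_0_1_A2C_1A_2BE_DCA0000_0_0_0
```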
Is this a general string or just some hexadecimal representation of a number? For numbers in Python 3, '_' underscores are used just for adding readability and do not affect the number value in any way.
Say you have a general string like the one you've given and would like to replace the last 4 characters of every group between '_' underscores that is longer than 4 characters with '0000'. A simple one-liner for your hexadecimal_string would be:
hexadecimal_string = "1_0_1_A2C_1A_2BE_DCAAFFC_0_0_0"
hexadecimal_string = "_".join([ substring if len(substring)<=4 else substring[:-4]+'0'*4 for substring in hexadecimal_string.split('_')])
Here,
hexadecimal_string.split('_') separates all groups by '_' as separator,
substring if len(substring)<=4 else substring[:-4]+'0'*4 takes care of every such substring group having length more than 4 to have ending 4 characters replaced by '0'*4 or '0000',
such for loop usage is a list comprehension feature of Python.
'_'.join() joins the subgroups back into one main string using '_' as separator in string.
The other answers posted here work well for the specific string in the question. I'm sharing this answer to meet your one-liner requirement in Python 3.
If the length of the string is always the same, and the position of the part that needs to be replaced with zero is always the same, you can just do this,
txt = '1_0_1_A2C_1A_2BE_DCAAFFC_0_0_0'
new = txt[0:20]+'0000'+txt[-6:]
print(new)
The output will be
'1_0_1_A2C_1A_2BE_DCA0000_0_0_0'
It would help if you gave us some other examples of the strings.

Breaking line into a list, detecting spaces in Python

I'm trying to read data from a text file into Python. The file consists of lines like this:
SAMPLE_0001 2000 57 1 103 51 0 NA
For ease of data management, I'd like to save that line as a list:
[SAMPLE_0001,2000,57,1,103,51,0,NA]
I wrote the following function to do that:
def line_breaker(line):
    words=[]
    if line[0]==' ':
        in_word=False
    else:
        in_word=True
    word=[]
    for i in range(len(line)):
        if in_word==True and line[i]!=' ':
            word.append(line[i])
        elif in_word==True and line[i]==' ':
            in_word=False
            words.append(word)
            word=[]
        elif in_word==False and line[i]!=' ':
            in_word=True
            word.append(line[i])
        if i==len(line)-1 and line[i]!=' ':
            word.append(line[i])
            words.append(word)
    return words
Unfortunately, this doesn't work as intended. When I apply it to the example above, I get the whole line as one long string. On closer inspection, this was because the condition line[i]==' ' failed to trigger on the blank spaces. I guess I should replace ' ' with something else.
When I ask Python to print the 11th position in the example, it displays nothing. That's totally unhelpful. I then asked it to print the type of the 11th position in the example; I got <class 'str'>.
So what should I use to detect spaces?
You can use split, as usual – you'll just have to remember to not explicitly split on spaces alone, as in:
myNaiveSplit = text.split(' ')
because that will absolutely fail if, as in your case, there may be some other whitespace character between the words.
Instead, don't provide any argument at all. After all, the official documentation on split tells us so:
If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator ...
(my emphasis)
and the 'whitespace' mentioned is everything which is considered "whitespace" by the function isspace (which is fully Unicode-compliant).
So all you need is
mySmartSplit = text.split()
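A quick way to see what the invisible separators are, and how the two calls differ, is sketched below. It assumes the file is tab-separated, which the question does not confirm; `repr()` shows whichever whitespace characters are really there.

```python
line = "SAMPLE_0001\t2000\t57\t1\t103\t51\t0\tNA\n"  # assumed: tab-separated

print(repr(line))         # repr() shows tabs and newlines as \t and \n

naive = line.split(" ")   # no literal spaces, so the whole line stays in one piece
smart = line.split()      # splits on any run of whitespace and drops the newline

print(len(naive))  # 1
print(smart)       # ['SAMPLE_0001', '2000', '57', '1', '103', '51', '0', 'NA']
```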
If you want to turn a string separated by whitespace into a list, the best way (as others above have mentioned) is the built-in split() function. But if you don't want to use that, you could use isspace() and do it manually in a custom function like this:
def line_breaker():
    my_array = []
    string = list(input("Write Your string:\n"))
    last_whitespace = 0
    for index, element in enumerate(string):
        if element.isspace():
            my_array.append("".join(string[last_whitespace:index]))
            last_whitespace = index + 1
    # the text after the last whitespace is the final word
    my_array.append("".join(string[last_whitespace:]))
    print(my_array)
Why don't you use split?
line = "SAMPLE_0001 2000 57 1 103 51 0 NA"
print(line.split(' '))
['SAMPLE_0001', '2000', '57', '1', '103', '51', '0', 'NA']
Solution:
line = "SAMPLE_0001 2000 57 1 103 51 0 NA"
line = line.split(" ")
You'll get what you want.
split() works perfectly.
strr= "SAMPLE_0001 2000 57 1 103 51 0 NA"
print(strr.split(' '))
split() turns a sentence into a Python list according to your needs. For example, strr.split(',') will split on commas.
Do it like this:
delimiter = ' '
# file is the path to your text file
with open(file) as f:
    for line in f.readlines():
        split_line = line.split(delimiter)
        # do your thing with the list of words
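If the separators turn out to be tabs or mixed whitespace, the same loop works with `split()` and no delimiter. In this sketch `io.StringIO` stands in for an open file so that it runs without one, and the second sample row is invented for illustration.

```python
import io

# io.StringIO stands in for an open text file; SAMPLE_0002 is a made-up row.
fake_file = io.StringIO(
    "SAMPLE_0001\t2000\t57\t1\t103\t51\t0\tNA\n"
    "SAMPLE_0002 1999 60 0 98 47 1 NA\n"
)

rows = []
for line in fake_file:       # a file object yields one line per iteration
    fields = line.split()    # no argument: tabs, spaces and the newline all separate
    if fields:               # skip blank lines
        rows.append(fields)

print(rows[1][0])  # SAMPLE_0002
```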

Python program that is given a line read from a text file and displays the one number found

def extractTemp():
    inputFile = open('P2text.txt','r+')
    line = inputFile.readline()
    for chr in line:
        if chr.isdigit():
            print(chr)
    inputFile.close()
extractTemp()
The text file has the number 95 in it, but it prints as:
9
5
I'm guessing because it is iterating over each character and makes 95 two separate characters.
So my question is, how do I combine them. Or, what can I do to make this program run better?
A big gap in simply combining the digits is what happens when there are two separate numbers, like 95 and 90.
Then that would become 9590 if I linked everything together. So what can I do to make this work?
Basically, given a text file containing the sentence "I have the number 95 and 90", I want to print just those two integers and ignore the rest. But the way I'm doing it, it prints 9, 5, 9, 0 on separate lines.
So I'm wondering how to print them together, so that 95 is 95 and not 9,5, and 90 is 90 and not 9,0. The end result I want from reading that sentence is: >> 95 90. Or, if I can only get it to work for a sentence that says "I have the number 95" and print: >> 95, I'd be happy with that too.
If you're expecting input like:
Here is 95 and there is 90
and want output of:
95
90
You should probably use regular expressions.
import re
with open('path/to/file.txt') as inf:
    text = inf.read() # generally bad practice, but...
numbers = re.findall(r"\d+", text) # ['95', '90']
for number in numbers:
    print(number)
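If integers are wanted rather than strings, the same `re.findall` call can be wrapped in a small helper. A minimal sketch; the function name `extract_numbers` is made up, and the text is passed in directly instead of being read from a file.

```python
import re

def extract_numbers(text):
    """Return every run of digits in text as an int."""
    return [int(match) for match in re.findall(r"\d+", text)]

print(extract_numbers("I have the number 95 and 90"))  # [95, 90]
```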
But since you seem to be a new programmer, I wouldn't expect you to jump into the wild world of regular expressions just yet. They're massively powerful, but ultimately unnecessary here. You could instead do:
with open('path/to/file.txt') as inf:
    text = inf.read()
chars = [ch for ch in text if ch.isspace() or ch.isdigit()]
# [' ', ' ', '9', '5', ' ', ' ', ' ', ' ', '9', '0']
# every space and every digit
chars = ''.join(chars)
# " 95 90"
# join every element with the empty string
numbers = chars.strip().split()
# ['95', '90']
# strip off leading and trailing whitespace, then split on groups of whitespace
for number in numbers:
    print(number) # as before
Try this:
for num_str in line.split():
    try:
        print(int(num_str))
    except ValueError:
        pass
You should use the split() function as shown below:
inputFile=open('test.txt','r')
line=inputFile.readline().split()
for num in line:
    try:
        print(int(num))
    except ValueError:
        pass
inputFile.close()
and the result will be:
90 95
You can make a list of the digits in the line and print them.
def extract_temp(n):
    temp=[]
    line=n.readline()
    for ch in line:
        if ch.isdigit():
            temp.append(ch)
    print("".join(temp))
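Joining every digit would merge 95 and 90 into 9590, which is the concern raised in the question. One way around it without regular expressions is to close the current number whenever a non-digit appears. This is a sketch, not the answerer's code; the function name is hypothetical and it takes a string rather than a file object.

```python
def extract_numbers_manually(text):
    """Collect runs of consecutive digits as separate integers."""
    numbers = []
    current = ""                 # digits of the number being built
    for ch in text:
        if ch.isdigit():
            current += ch
        elif current:            # a non-digit ends the current number
            numbers.append(int(current))
            current = ""
    if current:                  # the text may end in the middle of a number
        numbers.append(int(current))
    return numbers

print(extract_numbers_manually("I have the number 95 and 90"))  # [95, 90]
```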

Summing elements of string in Python

I am a newcomer to Python. I started reading a book on Python by an MIT professor and got an exercise from it. I tried to solve it but could not.
Problem: Let s be a string that contains a sequence of decimal numbers
separated by commas, e.g., s = '1.23,2.4,3.123' . Write a program that prints
the sum of the numbers in s.
I have to find the sum of 1.23, 2.4, and 3.123.
So far I have written some code to solve this problem, which is the following:
s = '1.23,2.4,3.123'
total = 0
for i in s:
    print i
    if i == ',':
Please, can someone help me with how to go further?
All you need is to first split your string on ',' and then you'll have a list of number strings:
>>> s.split(',')
['1.23', '2.4', '3.123']
Then you need to convert these strings to float objects so you can calculate their sum. For that you have 2 choices:
The first is using the map function:
>>> sum(map(float, s.split(',')))
6.753
The second way is using a generator expression within the sum function:
>>> sum(float(i) for i in s.split(','))
6.753
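Real input often carries stray spaces or a trailing comma, which would make `float()` raise a `ValueError`. A slightly more forgiving sketch of the generator-expression version follows; the function name `sum_numbers` and the messy sample string are made up for illustration.

```python
def sum_numbers(s):
    """Sum comma-separated decimal numbers, ignoring blank fields."""
    fields = (field.strip() for field in s.split(","))
    return sum(float(field) for field in fields if field)

total = sum_numbers("1.23, 2.4,3.123,")
print(round(total, 3))  # 6.753
```

Rounding is only for display, since floating-point sums can carry a tiny representation error.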
It is much simpler to use str.split(), as in:
s = '1.23,2.4,3.123'
total = 0
for i in s.split(','):
    total += float(i)
It's much less Pythonic, and more effort, to walk through the string and build up the numbers as you go, but it's probably more in keeping with the spirit of a beginner working things out, and more of a continuation of your original code:
s = '1.23,2.4,3.123'
total = 0
number_holder_string = ''
for character in s:
    if character == ',': # found the end of a number
        number_holder_value = float(number_holder_string)
        total = total + number_holder_value
        number_holder_string = ''
    else:
        number_holder_string = number_holder_string + character
# the last number has no comma after it, so add it once the loop ends
total = total + float(number_holder_string)
print(total)
That way, number_holder_string goes:
''
'1'
'1.'
'1.2'
'1.23'
found a comma -> convert 1.23 from string to value and add it.
''
'2'
'2.'
etc.
