Best way to remove spaces between digits and sum python <class 'str'>? - python

I am a learner working on a task. I am almost finished, but I am stuck on the last step.
I have a <class 'str'> made up of single digits separated by spaces.
I need to remove the space between the two digits of each 2-digit number, so I can easily sum the whole numbers.
Below is the exact string:
9 7 9 7 9 0 9 0 8 8 8 7 8 7 8 0
7 9 7 9 7 8 7 6 7 6 7 2 7 2 6 6 6 6 6 5 6 5 6 4 6
1 6 1 5 9 5 8 5 7 5 7 5 4 5 1 4 9 4 7 4 0 3 8 3 7
3 6 3 6 3 2 2 5 2 4 2 2 2 1 1 9 1 8 1 8 1 4 1 2
1 2 9 7 3 2
I have checked similar answers here that use regex, but every solution I tried either removes all the spaces (leaving one long string of digits) or separates the string into single digits.
What is the best way to solve this problem?

You can use a regex to look for a digit '\d' that is optionally followed by a single space and then another digit '(?: \d)?'. Then, in a list comprehension, remove the inner space if there is one (this needs import re, with s holding your string):
>>> [i.replace(' ', '') for i in re.findall(r'(\d(?: \d)?)', s)]
['97', '97', '90', '90', '88', '87', '87', '80', '79', '79', '78', '76', '76', '72', '72', '66', '66', '65', '65', '64', '6', '1', '61', '59', '58', '57', '57', '54', '51', '49', '47', '40', '38', '37', '36', '36', '32', '25', '24', '22', '21', '19', '18', '18', '14', '12', '12', '9', '7', '3', '2']
To convert these into int types
>>> [int(i.replace(' ', '')) for i in re.findall(r'(\d(?: \d)?)', s)]
[97, 97, 90, 90, 88, 87, 87, 80, 79, 79, 78, 76, 76, 72, 72, 66, 66, 65, 65, 64, 6, 1, 61, 59, 58, 57, 57, 54, 51, 49, 47, 40, 38, 37, 36, 36, 32, 25, 24, 22, 21, 19, 18, 18, 14, 12, 12, 9, 7, 3, 2]
and to sum them
>>> sum(int(i.replace(' ', '')) for i in re.findall(r'(\d(?: \d)?)', s))
2499
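The same idea can be wrapped in a small function. This is only a sketch: the name sum_digit_pairs and the short sample string are made up for illustration, and the sample is not the string from the question.

```python
import re

def sum_digit_pairs(text):
    """Sum numbers written as 'd d' pairs (or lone digits) in text."""
    # A digit, optionally followed by exactly one space and another digit.
    tokens = re.findall(r'\d(?: \d)?', text)
    # Drop the inner space before converting each token to int.
    return sum(int(token.replace(' ', '')) for token in tokens)

sample = "9 7 9 0 8 8\n7 2 5"
print(sum_digit_pairs(sample))  # 97 + 90 + 88 + 72 + 5 = 352
```

Note that the pattern only pairs digits separated by a single space, so a digit at the end of one line is never joined with the first digit of the next line.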

Related

How to take average in a timeframe python?

I am a beginner in Python, so I kindly ask for your help. I would like to produce a document where the first column is the month (e.g. 2011.01), the second column is the number of ARD 'events' in that month, and the third column is the average of all ARD values in that month. If a month has no events, the row should read e.g. 2012.07 0 0.
I have already tried for 3 hours and I am getting nervous.
I would really appreciate your help.
import pandas as pd
from numpy import mean
from numpy import std
from numpy import cov
from matplotlib import pyplot
from scipy.stats import pearsonr
from scipy.stats import spearmanr
data = pd.read_csv('ARD.txt',delimiter= "\t")
month = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12']
day = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31']
year = ['2011', '2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021']
ertek = data[:1].iloc[0].values
print(ertek)
print(data.head)
def list_to_string ( y, m, d):
    str = ""
    s = [y, m, d]
    str.join(s)
    return str
for x in year:
    for y in month:
        for i in day:
            x = 1
            ertek = data[:x].iloc[0].values
            list_to_string(x, y, i)
            if ertek[0] == list_to_string[x, y, i]:
                print("")
                x += 1
            else:
                print("")
Result:
['2011.01.05.' 0.583333333]
<bound method NDFrame.head of Date ARB
0 2011.01.05. 0.583333
1 2011.01.06. 0.583333
2 2011.01.07. 0.590909
3 2011.01.09. 0.625000
4 2011.01.10. 0.142857
... ... ...
1284 2020.12.31. 0.900000
1285 2020.12.31. 0.900000
1286 2020.12.31. 0.900000
1287 2020.12.31. 0.900000
1288 2020.12.31. 0.900000
[1289 rows x 2 columns]>
Traceback (most recent call last):
File "C:\Users\Kókai Dávid\Desktop\python,java\python\stock-trading-ml-master\venv\Scripts\orosz\oroszpred.py", line 29, in <module>
list_to_string(x, y, i)
File "C:\Users\Kókai Dávid\Desktop\python,java\python\stock-trading-ml-master\venv\Scripts\orosz\oroszpred.py", line 21, in list_to_string
str.join(s)
TypeError: sequence item 0: expected str instance, int found
Process finished with exit code 1
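I'm not quite certain what list_to_string is meant to do. The TypeError itself comes from str.join(s): x has been reassigned to the integer 1, and join only accepts strings. If the function is there for comparing dates as strings, you can sidestep that entirely by parsing the dates and resampling by month (df is the DataFrame you read in as data):
df['Date'] = pd.to_datetime(df['Date'], format='%Y.%m.%d.')
df = df.set_index('Date')
# 'M' means month end; newer pandas versions spell this 'ME'
monthly = df['ARB'].resample('M').agg(['count', 'mean']).fillna(0)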
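If pandas is not an option, the monthly table the question asks for can also be sketched in plain Python. This is an illustration under assumptions, not the answer's method: the function name monthly_summary and the sample rows are made up, and dates are assumed to look like '2011.01.05.' as in the printed data.

```python
from collections import defaultdict

def monthly_summary(rows, first_year, last_year):
    """rows: iterable of (date_string, value), dates like '2011.01.05.'."""
    buckets = defaultdict(list)
    for date_string, value in rows:
        year, month = date_string.split(".")[:2]
        buckets[f"{year}.{month}"].append(value)
    summary = []
    # Walk every month in the range so empty months still get a row.
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            key = f"{year}.{month:02d}"
            values = buckets.get(key, [])
            average = sum(values) / len(values) if values else 0
            summary.append((key, len(values), average))
    return summary

rows = [("2011.01.05.", 0.5), ("2011.01.06.", 1.0), ("2011.03.01.", 0.25)]
result = monthly_summary(rows, 2011, 2011)
print(result[:3])
```

Months without any rows come out as (month, 0, 0), which matches the "2012.07 0 0" format requested in the question.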

set add() method sorts the set?

Using the Python set add method, I have noticed that it seems to sort the set's contents by value.
Based on the docstring the following method description is found:
Why is this happening? And is there a way to prevent it?
I am using Python 3.6.
Please don't count on this behavior:
>>> x = set()
>>> for i in range(10):
...     x.add(i)
...
>>> x
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
>>> for i in range(1000, 1020):
...     x.add(i)
...
>>> x
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011, 1012, 1013, 1014, 1015, 1016, 1017, 1018, 1019}
>>> x.remove(2)
>>> x
{0, 1, 3, 4, 5, 6, 7, 8, 9, 1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011, 1012, 1013, 1014, 1015, 1016, 1017, 1018, 1019}
>>> x.add(2)
>>> x
{0, 1, 3, 4, 5, 6, 7, 8, 9, 2, 1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011, 1012, 1013, 1014, 1015, 1016, 1017, 1018, 1019}
Even if you see this "ordered" behavior once, that does not mean it is always so.
Trivial example:
w = set()
for i in range(100):
    w.add(i)
    w.add(str(i))
print(w)
Output:
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, '20', 21, 22, 23, 24, 25, 26, 27, 28, 29,
30, 32, 33, 34, 35, 36, 37, '9', 38, 31, 39, 40, 41, 42,
43, 44, 45, 46, 47, 48, 49, 50, 51, 52, '52', 53, 54, 55,
56, 57, 58, 59, 60, 61, '61', 62, 63, 64, '26', 65, 66,
67, '58', '36', 68, '6', '68', 69, '18', 71, 72, '4', 74,
75, 76, 77, '77', 79, 80, 81, 82, '12', '46', 85, 86, 87,
'33', 89, 90, 91, 92, 93, 94, 95, '23', '24', 98, 99, '49',
'92', '30', '44', '7', '21', '93', '86', '2', '67', '57',
'13', '79', '80', '96', '38', '32', '15', '45', '64', '83',
'65', '54', '88', '48', '75', '99', '71', '5', '0', '28',
'87', '43', '94', '90', '72', '42', '37', '59', '35', '8',
'17', '10', 70, 73, '98', '22', '19', '11', '27', '34', '14',
'56', '55', '69', '66', 78, '3', '1', '53', '84', '16', '25',
'76', 83, '82', '29', 84, '95', '31', '70', 88, '97', '40',
'47', '51', '85', '91', '60', '81', '89', 96, '78', '62',
'73', '74', 97, '41', '39', '50', '63'}
If the set really sorted anything, you would expect one of the following: the ints and strings alternating (insertion order), all ints sorted first followed by all strings sorted, or some other "detectable" pattern.
Using a very small sample (range(10)) or very restricted values (all ints) can, depending on the set's internal hashing and bucketing strategy, lead to output that merely looks ordered.
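If a predictable order is actually needed, one option (not mentioned in the answers above, so treat it as a suggestion) is to stop relying on the set's display order. The sketch below uses dict.fromkeys to keep unique items in insertion order, and sorted() when a sorted result is wanted. The helper name unique_in_order is made up. Dict insertion order is a language guarantee from Python 3.7; in CPython 3.6 it is an implementation detail.

```python
def unique_in_order(items):
    """Drop duplicates while keeping first-seen order."""
    # dict keys are unique and remember the order in which they were inserted
    return list(dict.fromkeys(items))

values = [3, 1, 3, 2, 1]
print(unique_in_order(values))  # [3, 1, 2]
print(sorted(set(values)))      # [1, 2, 3]
```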

python - converting a list of 2 digit string numbers to a list of 2 digit integers

I have a list of 2-character strings of numbers.
I'm trying to write a function to convert this to a list of 2-digit integers without using int() or knowing the length of the list. This is my code so far:
intslist = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52,
53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69,
70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86,
87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]
numslist = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12',
'13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23',
'24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34',
'35', '36', '37', '38', '39', '40', '41', '42', '43', '44', '45',
'46', '47', '48', '49', '50', '51', '52', '53', '54', '55', '56',
'57', '58', '59', '60', '61', '62', '63', '64', '65', '66', '67',
'68', '69', '70', '71', '72', '73', '74', '75', '76', '77', '78',
'79', '80', '81', '82', '83', '84', '85', '86', '87', '88', '89',
'90', '91', '92', '93', '94', '95', '96', '97', '98', '99']
def convert_num(numlist,list1,list2):
    returnlist = []
    templist = []
    convertdict = {k:v for k,v in zip(list1,list2)}
    p = 0
    num = ''.join(numlist)
    for c in num:
        templist.append(convertdict[num[p]])
        p += 2
    for i in templist:
        if templist[i] % 2 == 0:
            returnlist.append()
    return returnlist
This works, but it only returns a list of the individual digits, not the 2-digit numbers I want.
I'm only a beginner and don't really know how to proceed.
Any help appreciated!
An integer is an integer. "Two digit integers" don't exist as a concept.
Without using int or len, to return an integer from a string, you can reverse the string, use ord instead of int, multiply each digit by 10**k (where k is its position counted from the right) and sum:
x = '84'
res = sum((ord(val)-48)*10**idx for idx, val in enumerate(reversed(x))) # 84
You can use map to apply the logic to every string in a list:
def str_to_int(x):
    return sum((ord(val)-48)*10**idx for idx, val in enumerate(reversed(x)))
res = list(map(str_to_int, numslist))
print(res)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
...
81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]
The core of your solution will be taking the string and converting it to an integer:
def str_to_int(number):
    return sum((ord(c) - 48) * (10 ** i) for i, c in enumerate(number[::-1]))
This method enumerates the characters of your number from the end. It converts each character's ASCII code to the digit it represents (ord('0') is 48) and multiplies it by the power of ten for its position, so each digit lands in the proper place in the overall number.
From there, you can use map to convert the entire list:
intslist = list(map(str_to_int, numslist))
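To make the place-value arithmetic concrete, here is a small sketch that exposes the individual terms before they are summed. The helper digit_terms is made up for illustration; str_to_int behaves like the version above.

```python
def digit_terms(number):
    """Return the place-value terms that str_to_int adds up."""
    # Reversing the string puts the ones digit at index 0, the tens at index 1, ...
    return [(ord(c) - ord('0')) * 10 ** i for i, c in enumerate(number[::-1])]

def str_to_int(number):
    return sum(digit_terms(number))

print(digit_terms('84'))  # [4, 80]
print(str_to_int('84'))   # 84
```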
The very simple solution:
dd={ str(i):i for i in range(10) } # {"0":0,"1":1,..."9":9}
rslt=[]
for ns in numslist:
    n=0
    for i in range(len(ns)):
        n=10*n+dd[ns[i]]
    rslt.append(n)

Shuffling text from file by group of data

I am looking for an approach in Python or a Unix command to shuffle a large text data set by group, where the group is determined by the first field of each line, as shown below.
Input Text:
"ABC", 21, 15, 45
"DEF", 35, 3, 35
"DEF", 124, 33, 5
"QQQ" , 43, 54, 35
"XZZ", 43, 35 , 32
"XZZ", 45 , 35, 32
The groups should be shuffled randomly, but the lines of each group should stay together, like below:
Output Sample-
"QQQ" , 43, 54, 35
"XZZ", 43, 35 , 32
"XZZ", 45 , 35, 32
"ABC", 21, 15, 45
"DEF", 35, 3, 35
"DEF", 124, 33, 5
I found solutions for plain shuffling, but I can't figure out how to keep the groups together while shuffling.
It is possible to do this with collections.defaultdict. By keying each line on its first field, you can collect the lines of each group together and then shuffle only the dictionary's keys, like so:
import random
from collections import defaultdict
# Read all the lines from the file, grouped by their first field
lines = defaultdict(list)
with open("/path/to/file", "r") as in_file:
    for line in in_file:
        s_line = line.split(",")
        # strip() so that '"QQQ" ' and '"QQQ"' land in the same group
        lines[s_line[0].strip()].append(line)
# Randomize the order of the groups (random.sample needs a sequence, not a keys view)
rnd_keys = random.sample(list(lines), len(lines))
# Write back to the file?
with open("/path/to/file", "w") as out_file:
    for k in rnd_keys:
        for line in lines[k]:
            out_file.write(line)
Hope this helps in your endeavor.
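For experimenting without touching any files, the same grouping idea can be sketched as a function over a list of lines. The name shuffle_groups and the optional rng parameter (used here only to make the run repeatable) are assumptions added for illustration.

```python
import random
from collections import defaultdict

def shuffle_groups(lines, rng=random):
    """Shuffle groups of lines, keyed by the first comma-separated field."""
    groups = defaultdict(list)
    for line in lines:
        key = line.split(",")[0].strip()
        groups[key].append(line)
    keys = list(groups)
    rng.shuffle(keys)  # only the group order is randomized
    # Lines inside each group keep their original relative order.
    return [line for key in keys for line in groups[key]]

lines = ['"ABC", 21', '"DEF", 35', '"DEF", 124', '"QQQ" , 43']
shuffled = shuffle_groups(lines, random.Random(0))
print(shuffled)
```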
You could also store each line from the file into a nested list:
lines = []
with open('input_text.txt') as in_file:
    for line in in_file.readlines():
        line = [x.strip() for x in line.strip().split(',')]
        lines.append(line)
Which gives:
[['"ABC"', '21', '15', '45'], ['"DEF"', '35', '3', '35'], ['"DEF"', '124', '33', '5'], ['"QQQ"', '43', '54', '35'], ['"XZZ"', '43', '35', '32'], ['"XZZ"', '45', '35', '32']]
Then you could group these lists by the first item with itertools.groupby(). This relies on rows with the same first item being adjacent, as they are in your input:
import itertools
from operator import itemgetter
grouped = [list(g) for _, g in itertools.groupby(lines, key = itemgetter(0))]
Which gives a list of your grouped items:
[[['"ABC"', '21', '15', '45']], [['"DEF"', '35', '3', '35'], ['"DEF"', '124', '33', '5']], [['"QQQ"', '43', '54', '35']], [['"XZZ"', '43', '35', '32'], ['"XZZ"', '45', '35', '32']]]
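One thing worth knowing about itertools.groupby() is that it only merges consecutive items with equal keys. The sketch below uses made-up rows to show the difference; if the input file is not already grouped, sorting by the key first is one way to handle it.

```python
from itertools import groupby
from operator import itemgetter

rows = [["DEF", "35"], ["ABC", "21"], ["DEF", "124"]]

# Without sorting, the two DEF rows are not adjacent, so they form two groups.
unsorted_keys = [key for key, _ in groupby(rows, key=itemgetter(0))]

# Sorting by the same key first brings equal keys together.
ordered = sorted(rows, key=itemgetter(0))
sorted_keys = [key for key, _ in groupby(ordered, key=itemgetter(0))]

print(unsorted_keys)  # ['DEF', 'ABC', 'DEF']
print(sorted_keys)    # ['ABC', 'DEF']
```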
Then you could shuffle this with random.shuffle():
import random
random.shuffle(grouped)
Which gives a randomized list of your grouped items intact:
[[['"QQQ"', '43', '54', '35']], [['"ABC"', '21', '15', '45']], [['"XZZ"', '43', '35', '32'], ['"XZZ"', '45', '35', '32']], [['"DEF"', '35', '3', '35'], ['"DEF"', '124', '33', '5']]]
And now all you have to do is flatten the final list and write it to a new file, which you can do with itertools.chain.from_iterable():
with open('output_text.txt', 'w') as out_file:
    for line in itertools.chain.from_iterable(grouped):
        out_file.write(', '.join(line) + '\n')
print(open('output_text.txt').read())
Which gives a new shuffled version of your file:
"QQQ", 43, 54, 35
"ABC", 21, 15, 45
"XZZ", 43, 35, 32
"XZZ", 45, 35, 32
"DEF", 35, 3, 35
"DEF", 124, 33, 5

How to put a text file into a 2d list in tuples?

I am having a problem reading a text file into a 2-dimensional list of tuples. Here is what my input file looks like; it's really big, so this is just part of it.
26 54 94 25 53 93 24 52 92 25 53 93 25 53 93 25 53 93 25 53 93 27 55 95 28 55 98 26 53 96 25 52 95 26 53 96 27 54 97 28 55 98 27 54 97 26 53 96 26 55 97 26 55 97 26 55 97 26 55 97 25 54 96 25 54 96 25 54 96 26 55 97 26 55 99 27 56 100 28 57 101 26 55 99 25 54 98 26 55 99 26 55 99 26 55 99 25 54 98 26 55 99 27 56 100 27 56 100 26 55 99 26 55 99 26 55 99 27 56 100 28 57 101 29 58 102 29 58 102
Here is the function that reads in the file and puts it in the 2D list:
def load_image_data(infile):
    '''
    Accepts an input file object
    Reads the data in from the input PPM image into a 2-dimensional list of RGB tuples
    Returns the 2-dimensional list
    '''
    print("Loading image data...\n")
    for line in infile.readlines():
        line = line.strip()
        values = line.split(" ")
        new_line = []
        for j in range(int(len(new_line) / 3) + 1):
            for i in range(len(new_line) // 3):
                r = new_line[0]
                g = new_line[1]
                b = new_line[2]
                t = (r, g, b)
                t_list.append(t)
                del new_line[0]
                del new_line[0]
                del new_line[0]
                new_line.append(t)
    print(new_line)
    print("done")
    return new_line
And here is main:
def main():
    '''
    Runs Program
    '''
    mods = ["vertical_flip", "horizontal_flip", "horizontal_blur", "negative",
            "high_contrast", "random_noise", "gray_scale", "remove_color"]
    # ** finish adding string modifications to this list
    for mod in mods:
        # get infile name
        #file = (input("Please enter the input file name: "))
        # get outfile name
        #out = (input("Please enter the output file name: "))
        infile = open("ny.ppm", "r") # ** get the filename from the user
        outfile = open("ny_negative.ppm", "w") # ** change to use mod and user-spec filename
        process_header(infile, outfile)
        load_image_data(infile)
        process_body(infile, outfile, mod)
        outfile.close()
        infile.close()
Read the lines from your file into inf, then split the first line to get a flat list of values. Iterate over range(number_of_items // 3), slice out three items at a time, append each triple to your list, and return it:
def load_image_data(infile):
    print("Loading image data...\n")
    inf = infile.readlines()
    inf = inf[0].split()
    new_line = []
    for i in range(len(inf) // 3):
        # each pixel is three consecutive values: red, green, blue
        r, g, b = inf[i*3:i*3+3]
        new_line.append([r, g, b])
    return new_line
And in your main():
infile = open("ny.ppm", "r") # ** get the filename from the user
print(load_image_data(infile))
infile.close()
Sample Output:
[['26', '54', '94'], ['25', '53', '93'], ['24', '52', '92'], ['25', '53', '93'], ['25', '53', '93'], ['25', '53', '93'], ['25', '53', '93'], ['27', '55', '95'], ['28', '55', '98'], ['26', '53', '96'], ['25', '52', '95'], ['26', '53', '96'], ['27', '54', '97'], ['28', '55', '98'], ['27', '54', '97'], ['26', '53', '96'], ['26', '55', '97'], ['26', '55', '97'], ['26', '55', '97'], ['26', '55', '97'], ['25', '54', '96'], ['25', '54', '96'], ['25', '54', '96'], ['26', '55', '97'], ['26', '55', '99'], ['27', '56', '100'], ['28', '57', '101'], ['26', '55', '99'], ['25', '54', '98'], ['26', '55', '99'], ['26', '55', '99'], ['26', '55', '99'], ['25', '54', '98'], ['26', '55', '99'], ['27', '56', '100'], ['27', '56', '100'], ['26', '55', '99'], ['26', '55', '99'], ['26', '55', '99'], ['27', '56', '100'], ['28', '57', '101'], ['29', '58', '102'], ['29', '58', '102']]
Hope it helps!
Here is how you could chunk into tuples:
In [8]: from itertools import islice
In [9]: with open("yourfile.DATA") as f:
   ...:     data = f.read().split()
   ...:     size = len(data)
   ...:     it = map(int, data)
   ...:     data = [tuple(islice(it,0,3)) for _ in range(0, size, 3)]
   ...:
The output:
In [10]: data
Out[10]:
[(26, 54, 94),
(25, 53, 93),
(24, 52, 92),
(25, 53, 93),
(25, 53, 93),
(25, 53, 93),
(25, 53, 93),
(27, 55, 95),
(28, 55, 98),
(26, 53, 96),
(25, 52, 95),
(26, 53, 96),
(27, 54, 97),
(28, 55, 98),
(27, 54, 97),
(26, 53, 96),
(26, 55, 97),
(26, 55, 97),
(26, 55, 97),
(26, 55, 97),
(25, 54, 96),
(25, 54, 96),
(25, 54, 96),
(26, 55, 97),
(26, 55, 99),
(27, 56, 100),
(28, 57, 101),
(26, 55, 99),
(25, 54, 98),
(26, 55, 99),
(26, 55, 99),
(26, 55, 99),
(25, 54, 98),
(26, 55, 99),
(27, 56, 100),
(27, 56, 100),
(26, 55, 99),
(26, 55, 99),
(26, 55, 99),
(27, 56, 100),
(28, 57, 101),
(29, 58, 102),
(29, 58, 102)]
That list comprehension could be written a little more verbosely as:
In [11]: with open('yourfile.DATA') as f:
    ...:     data = f.read().split()
    ...:     size = len(data)
    ...:     it = map(int, data)
    ...:     data = []
    ...:     for _ in range(0, size, 3):
    ...:         data.append(tuple(islice(it, 0, 3)))
    ...:
Note that I used a with block, which is advisable when dealing with files: it closes the file for you, and it does so even if an exception is raised inside the block.
One piece of advice, be careful passing file-handlers around. When you do stuff like this:
infile = open("ny.ppm", "r") # ** get the filename from the user
outfile = open("ny_negative.ppm", "w") # ** change to use mod and user-spec filename
process_header(infile, outfile)
load_image_data(infile)
process_body(infile, outfile, mod)
outfile.close()
infile.close()
Be aware that file handles like infile act somewhat like one-pass iterators: reading advances the file position, so .readlines() only returns data once. If you call infile.readlines() in process_header and then pass that same infile to process_body, subsequent calls to infile.readlines() will return an empty list unless you reset the file position explicitly with infile.seek(0), which is why I say they are only "sort of" like one-pass iterators. I suggest avoiding this altogether by passing around the path to the file as a string and opening the file with a with-block wherever you need it.
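The one-pass behavior is easy to see with an in-memory file. This sketch uses io.StringIO as a stand-in for an opened text file, and the header-like contents are made up.

```python
import io

handle = io.StringIO("P3\n2 1\n255\n")

first = handle.readlines()   # consumes everything up to the end
second = handle.readlines()  # nothing left to read: an empty list, no error
handle.seek(0)               # rewind to the start
third = handle.readlines()   # the full contents again

print(first)
print(second)
print(third)
```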
Something like this would read (and return) the image data as a list-of-lists-of-tuples:
try:
    from itertools import izip
except ImportError: # Python 3
    izip = zip
def load_image_data(infile):
    rows = []
    for line in infile:
        values = [int(v) for v in line.split()]
        tuples = [t for t in izip(*[iter(values)]*3)]
        rows.append(tuples)
    return rows
def main():
    with open("ny.ppm", "r") as infile, open("ny_negative.ppm", "w") as outfile:
        process_header(infile, outfile)
        image_data = load_image_data(infile)
        print(image_data)
        # etc ...
main()
Sample of output format:
[[(255, 0, 0), (0, 255, 0), (0, 0, 255), ...],
[(255, 255, 0), (255, 255, 255), (0, 0, 0), ...],
...
]
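The izip(*[iter(values)]*3) idiom works because the same iterator object is passed to zip three times, so each output tuple pulls three consecutive values. Here is a Python 3 sketch that spells this out; the names chunk_triples and load_rows are made up, and io.StringIO stands in for the opened image file after its header has been read.

```python
import io

def chunk_triples(values):
    """Group a flat sequence into (r, g, b) tuples."""
    it = iter(values)
    # zip draws from the same iterator three times per tuple
    return list(zip(it, it, it))

def load_rows(infile):
    """Return one list of RGB tuples per line of the file."""
    return [chunk_triples([int(v) for v in line.split()]) for line in infile]

sample = io.StringIO("26 54 94 25 53 93\n24 52 92 25 53 93\n")
rows = load_rows(sample)
print(rows)
```

Because zip stops at the shortest input, any leftover values at the end of a line (when the count is not a multiple of 3) are silently dropped.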
