set add() method sorts the set? - python

Using the Python set add method, I have noticed that the method appears to sort the contents of the set by value.
Based on the docstring the following method description is found:
Why is this happening? And is there a way to prevent it?
I am using Python 3.6.

Please don't count on this behavior:
>>> x = set()
>>> for i in range(10):
...     x.add(i)
...
>>> x
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
>>> for i in range(1000, 1020):
...     x.add(i)
...
>>> x
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011, 1012, 1013, 1014, 1015, 1016, 1017, 1018, 1019}
>>> x.remove(2)
>>> x
{0, 1, 3, 4, 5, 6, 7, 8, 9, 1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011, 1012, 1013, 1014, 1015, 1016, 1017, 1018, 1019}
>>> x.add(2)
>>> x
{0, 1, 3, 4, 5, 6, 7, 8, 9, 2, 1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011, 1012, 1013, 1014, 1015, 1016, 1017, 1018, 1019}

Even if you see this "ordered" behavior once, that does not mean it will always be so.
Trivial example:
w = set()
for i in range(100):
    w.add(i)
    w.add(str(i))
print(w)
Output:
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, '20', 21, 22, 23, 24, 25, 26, 27, 28, 29,
30, 32, 33, 34, 35, 36, 37, '9', 38, 31, 39, 40, 41, 42,
43, 44, 45, 46, 47, 48, 49, 50, 51, 52, '52', 53, 54, 55,
56, 57, 58, 59, 60, 61, '61', 62, 63, 64, '26', 65, 66,
67, '58', '36', 68, '6', '68', 69, '18', 71, 72, '4', 74,
75, 76, 77, '77', 79, 80, 81, 82, '12', '46', 85, 86, 87,
'33', 89, 90, 91, 92, 93, 94, 95, '23', '24', 98, 99, '49',
'92', '30', '44', '7', '21', '93', '86', '2', '67', '57',
'13', '79', '80', '96', '38', '32', '15', '45', '64', '83',
'65', '54', '88', '48', '75', '99', '71', '5', '0', '28',
'87', '43', '94', '90', '72', '42', '37', '59', '35', '8',
'17', '10', 70, 73, '98', '22', '19', '11', '27', '34', '14',
'56', '55', '69', '66', 78, '3', '1', '53', '84', '16', '25',
'76', 83, '82', '29', 84, '95', '31', '70', 88, '97', '40',
'47', '51', '85', '91', '60', '81', '89', 96, '78', '62',
'73', '74', 97, '41', '39', '50', '63'}
If it really sorted anything, the output should show one of the following:
alternating int and string values (insertion order),
all ints sorted first, then all strings sorted,
or some other kind of "detectable" pattern.
Using a very small sample set (range(10)) or very restricted values (all ints) can, depending on the set's internal bucketing strategy, lead to "ordered" output.
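If a predictable order is needed, one option is to stop relying on the set's iteration order altogether. The following is a minimal sketch, not the only approach: it assumes Python 3.7 or later, where dict insertion order is a language guarantee (in CPython 3.6 it is an implementation detail), and the variable names are illustrative only.

```python
values = [1000, 3, 2, 3, 1000, 7]

# A set removes duplicates but makes no promise about iteration order.
unique = set(values)

# Sort explicitly when sorted output is wanted.
in_sorted_order = sorted(unique)

# Use dict keys to deduplicate while keeping first-seen insertion order
# (assumption: Python 3.7+ semantics).
in_insertion_order = list(dict.fromkeys(values))

print(in_sorted_order)      # [2, 3, 7, 1000]
print(in_insertion_order)   # [1000, 3, 2, 7]
```

Either way the order comes from sorted() or from the dict, never from the set itself.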

Related

Best way to remove spaces between digits and sum python <class 'str'>?

I am a learner working on a task that is almost finished, but I am stuck on how to finish it.
I have a <class 'str'> that looks like:
I need to remove the spaces between the 2-digit numbers so I can easily sum the whole numbers.
Below is the exact string:
9 7 9 7 9 0 9 0 8 8 8 7 8 7 8 0
7 9 7 9 7 8 7 6 7 6 7 2 7 2 6 6 6 6 6 5 6 5 6 4 6
1 6 1 5 9 5 8 5 7 5 7 5 4 5 1 4 9 4 7 4 0 3 8 3 7
3 6 3 6 3 2 2 5 2 4 2 2 2 1 1 9 1 8 1 8 1 4 1 2
1 2 9 7 3 2
I have checked similar answers here that use regex, and every solution I tried either removes all the spaces (leaving one long string of digits) or separates them into single digits.
What is the best way to solve this problem?
You can use a regex to look for a digit, '\d', that is optionally followed by a single space and then another digit, '(?: \d)?'. Then, in a list comprehension, remove the middle space if there is one (s is the string from the question):
>>> import re
>>> [i.replace(' ', '') for i in re.findall(r'(\d(?: \d)?)', s)]
['97', '97', '90', '90', '88', '87', '87', '80', '79', '79', '78', '76', '76', '72', '72', '66', '66', '65', '65', '64', '6', '1', '61', '59', '58', '57', '57', '54', '51', '49', '47', '40', '38', '37', '36', '36', '32', '25', '24', '22', '21', '19', '18', '18', '14', '12', '12', '9', '7', '3', '2']
To convert these into int types
>>> [int(i.replace(' ', '')) for i in re.findall(r'(\d(?: \d)?)', s)]
[97, 97, 90, 90, 88, 87, 87, 80, 79, 79, 78, 76, 76, 72, 72, 66, 66, 65, 65, 64, 6, 1, 61, 59, 58, 57, 57, 54, 51, 49, 47, 40, 38, 37, 36, 36, 32, 25, 24, 22, 21, 19, 18, 18, 14, 12, 12, 9, 7, 3, 2]
and to sum them
>>> sum(int(i.replace(' ', '')) for i in re.findall(r'(\d(?: \d)?)', s))
2499
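To see how the pattern pairs digits, here is a small sketch on a made-up string (the sample value is an assumption, not the asker's data). It assumes the two digits of a number are separated by one space and separate numbers by two spaces; if every gap were a single space, the regex would simply pair digits greedily from the left.

```python
import re

# Hypothetical sample: one space inside a number, two spaces between numbers.
sample = "9 7  9 0  8  1 2"

# Match a digit, optionally followed by exactly one space and another digit.
pairs = re.findall(r"\d(?: \d)?", sample)

# Drop the inner space and convert each match to an int.
numbers = [int(p.replace(" ", "")) for p in pairs]

print(pairs)         # ['9 7', '9 0', '8', '1 2']
print(numbers)       # [97, 90, 8, 12]
print(sum(numbers))  # 207
```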

combine same of nested dictionaries with different values in python

I have a list of dictionaries with the same key but different values, like:
[{190: {'1': [113, 1, 1551076176, 2, '9', 1]}}, {190: {'2': [113, 1, 1551076176, 3, '13', 1]}}, {190: {'3': [113, 1, 1551076176, 5, '20', 1]}}]
What I require is this format:
[{190: {'1': [113, 1, 1551076176, 2, '9', 1]},{'2': [113, 1, 1551076176, 3, '13', 1]},{'3': [113, 1, 1551076176, 5, '20', 1]}}]
How to do this?
OutputObj = {}
InputObj = [{190: {'1': [113, 1, 1551076176, 2, '9', 1]}}, {190: {'2': [113, 1, 1551076176, 3, '13', 1]}}, {190: {'3': [113, 1, 1551076176, 5, '20', 1]}}]
for i in InputObj:
    for k, v in i.items():
        # Collect every inner dict that shares the same outer key
        if k in OutputObj:
            OutputObj[k].append(v)
        else:
            OutputObj[k] = [v]
print(OutputObj)
# {190: [{'1': [113, 1, 1551076176, 2, '9', 1]}, {'2': [113, 1, 1551076176, 3, '13', 1]}, {'3': [113, 1, 1551076176, 5, '20', 1]}]}
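The format requested in the question is not valid Python as written, so the answer above collects the inner dicts in a list. Another possible reading, sketched here as an assumption about what the asker may want, is to merge the inner dicts into one dict per outer key. The names input_obj and merged are illustrative only.

```python
from collections import defaultdict

input_obj = [
    {190: {'1': [113, 1, 1551076176, 2, '9', 1]}},
    {190: {'2': [113, 1, 1551076176, 3, '13', 1]}},
    {190: {'3': [113, 1, 1551076176, 5, '20', 1]}},
]

merged = defaultdict(dict)
for entry in input_obj:
    for outer_key, inner in entry.items():
        # If two entries share an inner key, the later one overwrites the earlier.
        merged[outer_key].update(inner)

merged = dict(merged)
print(merged)
```

This gives a single inner dict with the keys '1', '2' and '3' under 190, which may be easier to look up than a list of one-item dicts.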

python - converting a list of 2 digit string numbers to a list of 2 digit integers

I have a list of 2-character strings of numbers.
I'm trying to write a function to convert this to a list of 2-digit integers without using int() or knowing the length of the list. This is my code so far:
intslist = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52,
53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69,
70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86,
87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]
numslist = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12',
'13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23',
'24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34',
'35', '36', '37', '38', '39', '40', '41', '42', '43', '44', '45',
'46', '47', '48', '49', '50', '51', '52', '53', '54', '55', '56',
'57', '58', '59', '60', '61', '62', '63', '64', '65', '66', '67',
'68', '69', '70', '71', '72', '73', '74', '75', '76', '77', '78',
'79', '80', '81', '82', '83', '84', '85', '86', '87', '88', '89',
'90', '91', '92', '93', '94', '95', '96', '97', '98', '99']
def convert_num(numlist,list1,list2):
    returnlist = []
    templist = []
    convertdict = {k:v for k,v in zip(list1,list2)}
    p = 0
    num = ''.join(numlist)
    for c in num:
        templist.append(convertdict[num[p]])
        p += 2
    for i in templist:
        if templist[i] % 2 == 0:
            returnlist.append()
    return returnlist
This works but only returns a list of the individual digits, not the 2-digit numbers I want.
I'm only a beginner and don't really know how to proceed.
Any help appreciated!!
An integer is an integer. "Two digit integers" don't exist as a concept.
Without using int or len, you can convert a string to an integer by reversing the string, using ord instead of int, multiplying each digit by 10**k (where k is the digit's position counted from the right) and summing:
x = '84'
res = sum((ord(val)-48)*10**idx for idx, val in enumerate(reversed(x))) # 84
You can use map to apply the logic to every string in a list:
def str_to_int(x):
    return sum((ord(val)-48)*10**idx for idx, val in enumerate(reversed(x)))
res = list(map(str_to_int, numslist))
print(res)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
...
81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]
The core of your solution will be taking the string and converting it to an integer:
def str_to_int(number):
    return sum((ord(c) - 48) * (10 ** i) for i, c in enumerate(number[::-1]))
This function takes the number string, enumerates its characters from the end, converts the ASCII code of each character to its numeric value (ord('0') is 48), and multiplies it by the power of ten for its position in the overall number.
From there, you can use map to convert the entire list:
intslist = list(map(str_to_int, numslist))
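To make the positional arithmetic concrete, here is a small sketch that exposes each digit's contribution before summing. The helper name digit_contributions is hypothetical and only exists for illustration; it assumes the input contains ASCII digits only.

```python
def digit_contributions(number):
    """Return each digit's contribution, least significant digit first."""
    # ord('0') is 48, so ord(c) - 48 maps '0'..'9' to 0..9.
    return [(ord(c) - 48) * 10 ** i for i, c in enumerate(reversed(number))]


def str_to_int(number):
    return sum(digit_contributions(number))


print(digit_contributions('84'))  # [4, 80]
print(str_to_int('84'))           # 84
print(str_to_int('07'))           # 7
```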
The very simple solution:
dd = {str(i): i for i in range(10)}  # {"0":0,"1":1,..."9":9}
rslt = []
for ns in numslist:
    n = 0
    for i in range(len(ns)):
        n = 10*n + dd[ns[i]]
    rslt.append(n)

Shuffling text from file by group of data

I am looking for an approach in Python or a Unix command to shuffle a large text data set while keeping lines grouped by the value of their first field, like below:
Input Text:
"ABC", 21, 15, 45
"DEF", 35, 3, 35
"DEF", 124, 33, 5
"QQQ" , 43, 54, 35
"XZZ", 43, 35 , 32
"XZZ", 45 , 35, 32
So it would be randomly shuffled but keep the group together like below
Output Sample-
"QQQ" , 43, 54, 35
"XZZ", 43, 35 , 32
"XZZ", 45 , 35, 32
"ABC", 21, 15, 45
"DEF", 35, 3, 35
"DEF", 124, 33, 5
I found solutions for ordinary shuffling, but I can't figure out how to keep the groups together while shuffling.
It is possible to do this using collections.defaultdict. By keying each line on its first field, you can group the lines and then shuffle only the dictionary's keys, like so:
import random
from collections import defaultdict
# Read all the lines from the file, grouped by their first field
lines = defaultdict(list)
with open("/path/to/file", "r") as in_file:
    for line in in_file:
        s_line = line.split(",")
        # Strip whitespace so '"QQQ" ' and '"QQQ"' land in the same group
        lines[s_line[0].strip()].append(line)
# Randomize the order of the groups
# (random.sample needs a sequence, so convert the keys to a list)
rnd_keys = random.sample(list(lines), len(lines))
# Write back to the file?
with open("/path/to/file", "w") as out_file:
    for k in rnd_keys:
        for line in lines[k]:
            out_file.write(line)
Hope this helps in your endeavor.
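The grouping idea above can also be wrapped in a function and tried on in-memory lines before touching a file. This is only a sketch under assumptions: the function name shuffle_groups and the rng parameter are hypothetical, and the group key is taken to be the first comma-separated field with surrounding whitespace stripped.

```python
import random
from collections import defaultdict


def shuffle_groups(lines, rng=random):
    """Shuffle whole groups of lines; lines within a group keep their order."""
    groups = defaultdict(list)
    for line in lines:
        # Key on the first field, ignoring stray spaces around it.
        key = line.split(",", 1)[0].strip()
        groups[key].append(line)
    keys = list(groups)
    rng.shuffle(keys)
    return [line for key in keys for line in groups[key]]


sample = [
    '"ABC", 21, 15, 45',
    '"DEF", 35, 3, 35',
    '"DEF", 124, 33, 5',
    '"QQQ" , 43, 54, 35',
    '"XZZ", 43, 35 , 32',
    '"XZZ", 45 , 35, 32',
]

# A seeded Random instance makes the shuffle repeatable for testing.
shuffled = shuffle_groups(sample, random.Random(0))
print("\n".join(shuffled))
```

Passing a seeded random.Random is a design choice for reproducibility; omitting it falls back to the module-level generator.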
You could also store each line from the file into a nested list:
lines = []
with open('input_text.txt') as in_file:
    for line in in_file.readlines():
        line = [x.strip() for x in line.strip().split(',')]
        lines.append(line)
Which gives:
[['"ABC"', '21', '15', '45'], ['"DEF"', '35', '3', '35'], ['"DEF"', '124', '33', '5'], ['"QQQ"', '43', '54', '35'], ['"XZZ"', '43', '35', '32'], ['"XZZ"', '45', '35', '32']]
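Then you could group these lists by the first item with itertools.groupby(). Note that groupby() only merges adjacent items, so this relies on the input already being grouped, as in the example: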
import itertools
from operator import itemgetter
grouped = [list(g) for _, g in itertools.groupby(lines, key = itemgetter(0))]
Which gives a list of your grouped items:
[[['"ABC"', '21', '15', '45']], [['"DEF"', '35', '3', '35'], ['"DEF"', '124', '33', '5']], [['"QQQ"', '43', '54', '35']], [['"XZZ"', '43', '35', '32'], ['"XZZ"', '45', '35', '32']]]
Then you could shuffle this with random.shuffle():
import random
random.shuffle(grouped)
Which gives a randomized list with your groups intact:
[[['"QQQ"', '43', '54', '35']], [['"ABC"', '21', '15', '45']], [['"XZZ"', '43', '35', '32'], ['"XZZ"', '45', '35', '32']], [['"DEF"', '35', '3', '35'], ['"DEF"', '124', '33', '5']]]
And now all you have to do is flatten the final list and write it to a new file, which you can do with itertools.chain.from_iterable():
with open('output_text.txt', 'w') as out_file:
    for line in itertools.chain.from_iterable(grouped):
        out_file.write(', '.join(line) + '\n')
print(open('output_text.txt').read())
Which gives a new shuffled version of your file:
"QQQ", 43, 54, 35
"ABC", 21, 15, 45
"XZZ", 43, 35, 32
"XZZ", 45, 35, 32
"DEF", 35, 3, 35
"DEF", 124, 33, 5

parsing this csv file in python(pylab) and converting it into a dictionary

I have this code:
data = np.genfromtxt('csv_data.csv', dtype=None, names=True)
print(data)
It results in the following output
[('westin,390,291,70,43,19,215,27,813',)
('ramada,136,67,53,30,24,149,49,310',)
('sutton,489,293,106,39,20,299,24,947',)
('loden,681,134,17,5,0,199,4,837',) ('hampton,241,166,26,5,1,159,21,439',)
('shangrila,332,45,20,8,2,325,8,407',) ('mariott,22,15,5,0,0,179,35,42',)
('pan_pacific,475,262,86,29,16,249,15,868',)
('sheraton,277,346,150,80,26,249,45,879',)
('westin_bayshore,390,291,70,43,19,199,27,813',)]
It didn't copy the column headers:
Hotel,excellent,verygood,average,poor,terrible,cheapest,rank,reviews
from the file. What I'm trying to do is save the output to a dictionary data structure in Python. Is there a way to convert this output into a dictionary?
I can write a function to parse this, but I was wondering if there is a built-in function in Python.
Thanks
You didn't give a value to the delimiter parameter. Therefore, np.genfromtxt uses the default, None, and tries to separate the fields on whitespace.
You need to use
np.genfromtxt(your_file, dtype=None, delimiter=',', names=True)
Process the file yourself using the csv module.
The following reads the file and creates a dictionary called by_hotel whose keys are hotel names and whose values are dictionaries of fieldname->value for the original row (note that each row dictionary also includes the hotel name):
import csv
with open('csv_data.csv') as fin:
    csvin = csv.DictReader(fin)
    headers = csvin.fieldnames
    by_hotel = {row['Hotel']: row for row in csvin}
print(by_hotel['sutton']['excellent'])
# 489
If you wanted a list back in the original column order, then you could do:
print([by_hotel['sutton'][fname] for fname in headers])
NB: You may want to convert your values to integers for computation purposes though.
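As a sketch of that integer conversion, the following builds the same kind of lookup but converts every non-name field with int(). It assumes all columns other than Hotel hold whole numbers, uses io.StringIO with two rows from the question as a stand-in for the real file, and, unlike the snippet above, removes the Hotel field from the inner dictionaries.

```python
import csv
import io

# Two rows copied from the question; io.StringIO stands in for the real file.
csv_text = """Hotel,excellent,verygood,average,poor,terrible,cheapest,rank,reviews
westin,390,291,70,43,19,215,27,813
sutton,489,293,106,39,20,299,24,947
"""

by_hotel = {}
for row in csv.DictReader(io.StringIO(csv_text)):
    # Pull the name out, then convert the remaining fields to integers.
    name = row.pop('Hotel')
    by_hotel[name] = {field: int(value) for field, value in row.items()}

print(by_hotel['sutton']['excellent'])  # 489
print(by_hotel['westin']['reviews'])    # 813
```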
Simple version:
d = { item[0].split(',')[0] : item[0].split(',')[1:] for item in data }
returns:
{'sutton': ['489', '293', '106', '39', '20', '299', '24', '947'], 'hampton': ['241', '166', '26', '5', '1', '159', '21', '439'], 'westin_bayshore': ['390', '291', '70', '43', '19', '199', '27', '813'], 'sheraton': ['277', '346', '150', '80', '26', '249', '45', '879'], 'ramada': ['136', '67', '53', '30', '24', '149', '49', '310'], 'mariott': ['22', '15', '5', '0', '0', '179', '35', '42'], 'loden': ['681', '134', '17', '5', '0', '199', '4', "837'"], 'shangrila': ['332', '45', '20', '8', '2', '325', '8', '407'], 'pan_pacific': ['475', '262', '86', '29', '16', '249', '15', '868']}
and a more complicated version (dict of dicts), where headers is assumed to hold the column names that follow Hotel:
d = { item[0].split(',')[0] : { headers[i] : int( item[0].split(',')[i+1].strip("'") ) for i in range(len( item[0].split(',')[1:] ) ) } for item in data }
returns:
{'sutton': {'poor': 39, 'cheapest': 299, 'average': 106, 'terrible': 20, 'rank': 24, 'reviews': 947, 'excellent': 489, 'verygood': 293}, 'hampton': {'poor': 5, 'cheapest': 159, 'average': 26, 'terrible': 1, 'rank': 21, 'reviews': 439, 'excellent': 241, 'verygood': 166}, 'westin_bayshore': {'poor': 43, 'cheapest': 199, 'average': 70, 'terrible': 19, 'rank': 27, 'reviews': 813, 'excellent': 390, 'verygood': 291}, 'sheraton': {'poor': 80, 'cheapest': 249, 'average': 150, 'terrible': 26, 'rank': 45, 'reviews': 879, 'excellent': 277, 'verygood': 346}, 'ramada': {'poor': 30, 'cheapest': 149, 'average': 53, 'terrible': 24, 'rank': 49, 'reviews': 310, 'excellent': 136, 'verygood': 67}, 'mariott': {'poor': 0, 'cheapest': 179, 'average': 5, 'terrible': 0, 'rank': 35, 'reviews': 42, 'excellent': 22, 'verygood': 15}, 'loden': {'poor': 5, 'cheapest': 199, 'average': 17, 'terrible': 0, 'rank': 4, 'reviews': 837, 'excellent': 681, 'verygood': 134}, 'shangrila': {'poor': 8, 'cheapest': 325, 'average': 20, 'terrible': 2, 'rank': 8, 'reviews': 407, 'excellent': 332, 'verygood': 45}, 'pan_pacific': {'poor': 29, 'cheapest': 249, 'average': 86, 'terrible': 16, 'rank': 15, 'reviews': 868, 'excellent': 475, 'verygood': 262}}
import csv
data_dict = {}
headers = []
first_row = True
with open("csv_data", 'r') as f:
    holder = csv.reader(f, delimiter=',')
    for row in holder:
        if first_row:
            # The first row holds the column headers
            first_row = False
            for header in row:
                colname = str(header)
                headers.append(colname)
                data_dict[colname] = []
        else:
            for colnum, datapoint in enumerate(row):
                # The hotel name column is text; the other columns are numeric
                value = int(datapoint) if datapoint.isdigit() else datapoint
                data_dict[headers[colnum]].append(value)
This gives you a dictionary whose keys are the column headers (the first row of the CSV file) and whose values are lists holding the remaining data for each column.
Moreover, headers is a list of all the column headers.
