How to use linear search to arrange nested lists? - python

I'm currently working with a text file that is read into Python and must then be turned into lists within a list (nested, I guess?). So far I've tried using linear-search code, but it only checks one of the lists in the nested list:
def populationreader():
    with open("PopulationofAnnecy", "r") as in_file:
        # Split each line on commas and skip the first line of the file.
        nested = [line.strip().split(',') for line in in_file][1:]
    print(nested)
This yields the following nested list:
[['Alabama', '126', '79', '17'], ['Alaska', '21', '100', '10'], ['Arizona', '190', '59', '16'], ['Arkansas', '172', '49', '28'], ['California', '4964', '76', '22'] etc …. ]
But it should look something more like:
[[California 4964,76,22],[Texas 3979,62,23],[New York 1858,69,20],[Virginia 1655,60,19]etc …. ]
I've tried using something along the lines of this (pseudo):
for index in range(1, len(alist)):
    currentvalue = alist[index]
    position = index
    while position > 0 and alist[position-1] > currentvalue:
        alist[position] = alist[position-1]
        position = position - 1
    alist[position] = currentvalue
I'm trying to do it without using the built-in Python sort() or sorted() functions, but I'm having trouble sorting the items within a list.

Once you have your list read in from the file, you can use sort or sorted, but make sure you sort by the second element ([1]), converted to int, and in reverse order. Otherwise the default is to sort by the first element of each list (the state name), alphabetically, since it is a string.
l = [['Alabama', '126', '79', '17'],
['Alaska', '21', '100', '10'],
['Arizona', '190', '59', '16'],
['Arkansas', '172', '49', '28'],
['California', '4964', '76', '22'],
['Texas', '3979','62','23'],
['New York', '1858','69','20'],
['Virginia', '1655','60','19']]
sorted(l, key = lambda i: int(i[1]), reverse=True)
Output
[['California', '4964', '76', '22'],
['Texas', '3979', '62', '23'],
['New York', '1858', '69', '20'],
['Virginia', '1655', '60', '19'],
['Arizona', '190', '59', '16'],
['Arkansas', '172', '49', '28'],
['Alabama', '126', '79', '17'],
['Alaska', '21', '100', '10']]
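Since the question asks for a solution without sort() or sorted(), here is a minimal sketch of the insertion-sort loop from the question adapted to nested lists. It assumes that the second element of every inner list is a numeric string; the function name insertion_sort_by_population is made up for this example.

```python
def insertion_sort_by_population(rows):
    """Return a new list of rows ordered by int(row[1]), largest first."""
    result = list(rows)  # shallow copy so the caller's list is not reordered
    for index in range(1, len(result)):
        current = result[index]
        current_key = int(current[1])  # compare as numbers, not as strings
        position = index
        # Shift rows with a smaller value one slot to the right.
        while position > 0 and int(result[position - 1][1]) < current_key:
            result[position] = result[position - 1]
            position -= 1
        result[position] = current
    return result


rows = [['Alabama', '126', '79', '17'],
        ['Alaska', '21', '100', '10'],
        ['California', '4964', '76', '22'],
        ['Texas', '3979', '62', '23']]
print(insertion_sort_by_population(rows))
```

The comparison uses int(...) on both sides because comparing the raw strings would order them character by character rather than by value.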

Related

how can I combine list elements inside list based on element value?

How can I combine the lists inside a list based on an element value?
Suppose I have this list:
lis = [['steve','reporter','12','34','22','98'],['megan','arch','44','98','32','22'],['jack','doctor','80','32','65','20'],['steve','dancer','66','31','54','12']]
Here the list containing 'steve' appears twice, so I want to combine them as below:
new_lis = [['steve','reporter','12','34','22','98','dancer','66','31','54','12'],['megan','arch','44','98','32','22'],['jack','doctor','80','32','65','20']]
I tried the code below to achieve this:
new_dic = {}
for i in range(len(lis)):
    name = lis[i][0]
    if name in new_dic:
        new_dic[name].append([lis[i][1],lis[i][2],lis[i][3],lis[i][4],lis[i][5]])
    else:
        new_dic[name] = [lis[i][1],lis[i][2],lis[i][3],lis[i][4],lis[i][5]]
print(new_dic)
I ended up with a dictionary whose values contain nested lists, as below:
{'steve': ['reporter', '12', '34', '22', '98', ['dancer', '66', '31', '54', '12']], 'megan': ['arch', '44', '98', '32', '22'], 'jack': ['doctor', '80', '32', '65', '20']}
But I wanted each value to be a single flat list so that I can convert it into the format below:
new_lis = [['steve','reporter','12','34','22','98','dancer','66','31','54','12'],['megan','arch','44','98','32','22'],['jack','doctor','80','32','65','20']]
Is there a different way to tackle this?
There is a different way to do it, using the groupby function from itertools. There are also ways to convert your dict to a list. It depends on what you want.
from itertools import groupby
lis = [['steve','reporter','12','34','22','98'],['megan','arch','44','98','32','22'],['jack','doctor','80','32','65','20'],['steve','dancer','66','31','54','12']]
# groupby only groups adjacent items, so sort by name first
lis.sort(key = lambda x: x[0])
output = []
for name, groups in groupby(lis, key = lambda x: x[0]):
    temp_list = [name]
    for group in groups:
        temp_list.extend(group[1:])
    output.append(temp_list)
print(output)
OUTPUT
[['jack', 'doctor', '80', '32', '65', '20'], ['megan', 'arch', '44', '98', '32', '22'], ['steve', 'reporter', '12', '34', '22', '98', 'dancer', '66', '31', '54', '12']]
I'm not sure whether this snippet answers your question. It is not the fastest approach in terms of time complexity. I will update this answer if I find a better way.
lis = [['steve','reporter','12','34','22','98'],['megan','arch','44','98','32','22'],['jack','doctor','80','32','65','20'],['steve','dancer','66','31','54','12']]
new_lis = []
element_value = 'steve'
# Iterate over a copy, because lis is modified inside the loop.
for inner_lis in lis[:]:
    if element_value in inner_lis:
        if not new_lis:
            new_lis += inner_lis
        else:
            # Drop the repeated name before appending the remaining values.
            inner_lis.remove(element_value)
            new_lis += inner_lis
        lis.remove(inner_lis)
print([new_lis] + lis)
Output
[['steve', 'reporter', '12', '34', '22', '98', 'dancer', '66', '31', '54', '12'], ['megan', 'arch', '44', '98', '32', '22'], ['jack', 'doctor', '80', '32', '65', '20']]
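As a further sketch, the dictionary attempt from the question can be kept flat by using extend instead of append. This assumes Python 3.7 or later, where a dict keeps insertion order, so the first occurrence of each name decides its position in the result. The variable name merged is made up for this example.

```python
lis = [['steve', 'reporter', '12', '34', '22', '98'],
       ['megan', 'arch', '44', '98', '32', '22'],
       ['jack', 'doctor', '80', '32', '65', '20'],
       ['steve', 'dancer', '66', '31', '54', '12']]

merged = {}
for name, *rest in lis:
    # setdefault stores [name] the first time a name is seen;
    # extend (unlike append) adds the values one by one, keeping the list flat.
    merged.setdefault(name, [name]).extend(rest)

new_lis = list(merged.values())
print(new_lis)
```

This makes a single pass over the input and handles every repeated name, not only 'steve'.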

Iterate on the elements of nested list at particular index together with index of element of another normal list

I have these two lists:
date_list = ['01-01-2020', '01-02-2020', '01-03-2020', '01-04-2020']
values_list = [['00:00:00', '20', '1', '5000'],
['00:01:00', '23', '70', '6000', '00:02:00', '56', '48', '8000'],
['00:03:00', '32', '90', '5800', '00:04:00', '666', '486', '9000'],
['00:05:00', '776', '68', '950']]
I want the date to be prepended to every time value, like this:
[['01-01-2020 00:00:00', '20', '1', '5000'],
['01-02-2020 00:01:00', '23', '70', '6000', '01-02-2020 00:02:00', '56', '48', '8000'],
['01-03-2020 00:03:00', '32', '90', '5800', '01-03-2020 00:04:00', '666', '486', '9000'],
['01-04-2020 00:05:00', '776', '68', '950']]
In my original values list, a time value appears at every 27th index. What I am doing so far is adding the date to every element of the nested list and then using pandas to clean it up afterwards:
[ [date_list[i] + ' '+ j for j in sub] for i, sub in enumerate(values_list) ]
Is there a better way to do this?
I'd zip the two lists together (not combine them) and rebuild the list, altering an element if it is a time value (every 4th element) and keeping the others:
result = [["{} {}".format(dl,x) if i%4==0 else x for i,x in enumerate(vl)]
for dl,vl in zip(date_list,values_list)]
>>> result
[['01-01-2020 00:00:00', '20', '1', '5000'],
['01-02-2020 00:01:00',
'23',
'70',
'6000',
'01-02-2020 00:02:00',
'56',
'48',
'8000'],
['01-03-2020 00:03:00',
'32',
'90',
'5800',
'01-03-2020 00:04:00',
'666',
'486',
'9000'],
['01-04-2020 00:05:00', '776', '68', '950']]
I'd use a loop.
for date, values in zip(date_list, values_list):
    values[::4] = (date + ' ' + value for value in values[::4])
ans = [[v if ':' not in v else dl+' '+v for v in vl] for dl,vl in zip(date_list, values_list)]
The code prepends the date to every element that contains a ':', which only the time values have.
Output :
[['01-01-2020 00:00:00', '20', '1', '5000'],
['01-02-2020 00:01:00',
'23',
'70',
'6000',
'01-02-2020 00:02:00',
'56',
'48',
'8000'],
['01-03-2020 00:03:00',
'32',
'90',
'5800',
'01-03-2020 00:04:00',
'666',
'486',
'9000'],
['01-04-2020 00:05:00', '776', '68', '950']]
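Because the question mentions that the real data has a time value at every 27th index rather than every 4th, the index-based approach can be sketched with the spacing as a parameter. The helper name prefix_dates and its step parameter are made up for this example, and the sketch assumes a time value sits at index 0 of each inner list and then at every step-th index.

```python
def prefix_dates(dates, rows, step=4):
    """Return new rows with the matching date prepended to every step-th item."""
    return [
        [f"{date} {value}" if i % step == 0 else value
         for i, value in enumerate(row)]
        for date, row in zip(dates, rows)
    ]


date_list = ['01-01-2020', '01-02-2020']
values_list = [['00:00:00', '20', '1', '5000'],
               ['00:01:00', '23', '70', '6000', '00:02:00', '56', '48', '8000']]
print(prefix_dates(date_list, values_list))
```

Unlike the slice-assignment loop, this version builds new lists and leaves values_list unchanged.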

How to sort list of lists with alternate alphabetical value maintaining the association with adjacent numeric values

I have a python list of lists like this
[['CCND1', '67', 'FAS', '99', 'IRAK3', '92', 'ALG14', '86', 'ADRBK1', '10'], ['PTRX', '95', 'CCNA', '33']]
Each alphabetical value is associated with the numeric value that follows it, i.e. IRAK3 and 92 are associated (92 should appear after IRAK3) and PTRX and 95 are associated (95 should appear after PTRX). I want to sort each inner list alphabetically so that the result looks like this:
[['ADRBK1', '10', 'ALG14', '86', 'CCND1', '67', 'FAS', '99', 'IRAK3', '92' ], ['CCNA', '33', 'PTRX', '95' ]]
Note that in the sorted list the alphabetical values are sorted, but 92 still appears after IRAK3 and 95 after PTRX, i.e. the association is maintained.
How could I do that?
This is one approach.
Ex:
from itertools import chain
data = [['CCND1', '67', 'FAS', '99', 'IRAK3', '92', 'ALG14', '86', 'ADRBK1', '10'], ['PTRX', '95', 'CCNA', '33']]
#pair elements --> [('CCND1', '67'), ('FAS', '99')....
data = [zip(i[::2], i[1::2]) for i in data]
#sort and flatten
data = [list(chain.from_iterable(sorted(i, key=lambda x: x[0]))) for i in data]
print(data)
"Lists should generally be homogeneous. Use tuples and dictionaries for heterogeneous collections of related data." [1]
One approach would be to arrange them as tuples, which also makes sorting them easier.
mylist = [['CCND1', '67', 'FAS', '99', 'IRAK3', '92', 'ALG14', '86', 'ADRBK1', '10'], ['PTRX', '95', 'CCNA', '33']]
list_of_tuples = []
for l in mylist:
    # Every name needs a value, so the length must be even.
    if len(l) % 2 != 0:
        raise ValueError()
    list_of_tuples.append([(l[i], l[i+1]) for i in range(0, len(l), 2)])
for l in list_of_tuples:
    l.sort(key=lambda tup: tup[0])
print(list_of_tuples)
# [[('ADRBK1', '10'), ('ALG14', '86'), ('CCND1', '67'), ('FAS', '99'), ('IRAK3', '92')], [('CCNA', '33'), ('PTRX', '95')]]
You can try this:
>>> list_ = [['CCND1', '67', 'FAS', '99', 'IRAK3', '92', 'ALG14', '86', 'ADRBK1', '10'], ['PTRX', '95', 'CCNA', '33']]
>>> paired_list = (zip(*[iter(l)]*2) for l in list_)
>>> sorted_list = [list(sum(sorted(l, key=lambda x: x[0]),())) for l in paired_list]
>>> sorted_list
[['ADRBK1', '10', 'ALG14', '86', 'CCND1', '67', 'FAS', '99', 'IRAK3', '92'],
['CCNA', '33', 'PTRX', '95']]
References:
zip(*[iter(iterable)]*n)
sorted
I have used sum to concatenate the list of tuples. However, if you are prepared to use other modules, it is better to use itertools.chain.from_iterable.
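To make the pairing idiom less cryptic, here is a small sketch that spells out what zip(*[iter(l)]*2) does and flattens the sorted pairs with itertools.chain.from_iterable. The function name sort_pairs is made up for this example. Note that zip stops at the shorter input, so a trailing item without a partner would be dropped silently.

```python
from itertools import chain


def sort_pairs(flat):
    """Sort a flat [name, value, name, value, ...] list by name."""
    it = iter(flat)
    # zip pulls from the same iterator twice per step, so it yields
    # (flat[0], flat[1]), (flat[2], flat[3]), and so on.
    pairs = zip(it, it)
    ordered = sorted(pairs, key=lambda pair: pair[0])
    # Flatten the sorted pairs back into one list.
    return list(chain.from_iterable(ordered))


data = [['CCND1', '67', 'FAS', '99', 'IRAK3', '92', 'ALG14', '86', 'ADRBK1', '10'],
        ['PTRX', '95', 'CCNA', '33']]
print([sort_pairs(inner) for inner in data])
```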

Python + regex: How to extract values between two underscores in Python?

I am trying to extract values between two underscores. For that I have written this code:
import re

patient_ids = []
for file in files:
    print(file)
    patient_id = re.findall("_(.*?)_", file)
    patient_ids.append(patient_id)
print(patient_ids)
Output:
PT_112_NIM 26-04-2017_merged.csv
PT_114_NIM_merged.csv
PT_115_NIM_merged.csv
PT_116_NIM_merged.csv
PT_117_NIM_merged.csv
PT_118_NIM_merged.csv
PT_119_NIM_merged.csv
[['112'], ['114'], ['115'], ['116'], ['117'], ['118'], ['119'], ['120'], ['121'], ['122'], ['123'], ['124'], ['125'], ['126'], ['127'], ['128'], ['129'], ['130'], ['131'], ['132'], ['133'], ['134'], ['135'], ['136'], ['137'], ['138'], ['139'], ['140'], ['141'], ['142'], ['143'], ['144'], ['145'], ['146'], ['147'], ['150'], ['151'], ['152'], ['153'], ['154'], ['155'], ['156'], ['157'], ['158'], ['159'], ['160'], ['161'], ['162'], ['163'], ['165']]
So extracted values are in this form: ['121']. I want them in this form: 121 , i.e., just the number inside two underscores.
What change should I make to my code?
An easy way would be, instead of appending each result list to the outer list, to extend the outer list with the results:
patient_ids = []
for file in files:
    print(file)
    patient_ids.extend(re.findall("_(.*?)_", file))
print(patient_ids)
Just replace the last line of your for loop with:
patient_ids.extend(int(pid) for pid in patient_id)
extend flattens your results, and int(pid) converts each string to an int.
You need to flatten your results, e.g. like this:
flat_list = [item for sublist in patient_ids for item in sublist]
print(flat_list)
# => ['112', '114', '115', '116', '117', '118', '119', '120', '121', '122', '123', '124', '125', '126', '127', '128', '129', '130', '131', '132', '133', '134', '135', '136', '137', '138', '139', '140', '141', '142', '143', '144', '145', '146', '147', '150', '151', '152', '153', '154', '155', '156', '157', '158', '159', '160', '161', '162', '163', '165']
You have a list of findall results (which only ever is 1 result per file it seems) - you can either just convert the strings to integers or also flatten the result:
patient_ids= [['112'], ['114','4711'], ['115'], ['116'], ['117'], ['118'], ['119']]
# ^^^^^ ^^^^^^ modified to have 2 ids for demo-purposes
# if you want to keep the boxing
numms = [ list(map(int,m)) for m in patient_ids]
# converted and flattened
numms2 = [ x for y in [list(map(int,m)) for m in patient_ids] for x in y]
print(numms)
print(numms2)
Output:
# this keeps the findall results together in inner lists
[[112], [114, 4711], [115], [116], [117], [118], [119]]
# this flattens all results
[112, 114, 4711, 115, 116, 117, 118, 119]
Documentation:
You can find the documentation for map() and int() in the overview of built-in functions.
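If each file name is expected to contain one ID, another option is re.search, which returns only the first match instead of a list. The sketch below assumes the ID is a run of digits between two underscores. The helper name extract_patient_id and the file name notes.txt are made up for this example.

```python
import re

# Assumption: the ID is the first run of digits enclosed by underscores.
ID_PATTERN = re.compile(r"_(\d+)_")


def extract_patient_id(filename):
    """Return the ID as an int, or None when the name has no such part."""
    match = ID_PATTERN.search(filename)
    return int(match.group(1)) if match else None


files = ["PT_112_NIM 26-04-2017_merged.csv",
         "PT_114_NIM_merged.csv",
         "notes.txt"]  # hypothetical name without an ID
patient_ids = [pid for pid in map(extract_patient_id, files) if pid is not None]
print(patient_ids)
```

Returning None for non-matching names lets the caller decide whether to skip them, as done here, or to raise an error.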

Sorting a list which has numbers in the form of string in python [duplicate]

I have a list of numbers stored as strings, and I want to sort them numerically. How do I sort such numbers in Python?
Here is my list.
key_s=['115', '114', '117', '116', '111', '110', '113', '112', '68', '119', '118', '44', '45', '42', '43', '41', '76', '108', '109', '71', '107', '79', '13', '15', '14', '17', '16', '37']
I tried using key_s.sort(). It returns None instead of the sorted list. I also tried sorted(key_s), which does not give the order I want either. So how do I sort it?
Yes, list.sort() sorts in place, and returns None to indicate it is the list itself that has been sorted. The sorted() function returns a new sorted list, leaving the original list unchanged.
Use int as a key:
sorted(key_s, key=int)
This returns a new list, sorted by the numeric value of each string, but leaving the type of the values themselves unchanged.
Without a key, strings are sorted lexicographically instead, comparing character by character. Thus '9' is sorted after '10', because the character '1' comes before '9' in character sets, just like 'a' comes before 'z'.
The key argument lets you apply a Schwartzian transform, informing the sorting algorithm what to sort by. Each value in the list is sorted according to key(value) (so int(value) here) instead of the original value itself.
Demo:
>>> key_s = ['115', '114', '117', '116', '111', '110', '113', '112', '68', '119', '118', '44', '45', '42', '43', '41', '76', '108', '109', '71', '107', '79', '13', '15', '14', '17', '16', '37']
>>> sorted(key_s, key=int)
['13', '14', '15', '16', '17', '37', '41', '42', '43', '44', '45', '68', '71', '76', '79', '107', '108', '109', '110', '111', '112', '113', '114', '115', '116', '117', '118', '119']
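The difference between the two orderings, and between sort() and sorted(), can be seen on a short list. The values below are a made-up sample for illustration, not the list from the question.

```python
sample = ['115', '9', '10', '68']  # made-up sample values

# Without a key, strings are compared character by character.
lexicographic = sorted(sample)
print(lexicographic)

# With key=int, the order follows the numeric value; items stay strings.
numeric = sorted(sample, key=int)
print(numeric)

# list.sort() reorders the list in place and returns None.
returned = sample.sort(key=int)
print(returned)
print(sample)
```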
