I am new to Python and would appreciate a little help.
How does one do the following:
Having converted each line within a file to a nested list,
e.g. [['line 1', 'a'], ['line 2','b']], how do I flatten the list so that each line is associated with a variable? Assume that the first member in each list, i.e. i[:][0], is known.
Is it possible to associate more than one list with one variable, i.e. can x = [list1], [list2]?
Having used a for loop on a list, how does one associate aspects of that list with a variable? See example below.
Example:
for i in list_1:
    if i[:][0] == 'm':
        i[2] = a
        i[3] = b
        i[4] = c
The above raises NameError: name 'a' is not defined (and likewise for b and c). How does one define variables resulting from iterations in a for loop, or in loops in general?
Hope I was clear and succinct as I am perplexed!
Update:
To clarify:
I have a nested list, where each list within the nest holds strings. These strings are actually numbers. I wish to convert the strings to integers in order to perform arithmetic operations.
Example:
[['1', '2', '3'], ['4', '5', '6'], ['7', '8', '9']]
Now, to convert each string to an integer, is abs() appropriate? How should this be implemented?
Also, how do I sum the third item of each list within the nest and assign the total to a variable? Should I define a function for this?
Any suggestions on how to deal with this are much appreciated!
Also, the earlier suggestions made me realise that my thinking was creating the problem! Thanks!
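For the update, a minimal sketch using the example data: int() is the usual way to turn a numeric string into an integer (abs() expects a number and raises TypeError when given a str), and a generator expression can sum the third item of each inner list without a separate function.

```python
nested = [['1', '2', '3'], ['4', '5', '6'], ['7', '8', '9']]

# Convert every string in every inner list to an int
numbers = [[int(s) for s in row] for row in nested]

# Sum the third item (index 2) of each inner list and keep the total
third_total = sum(row[2] for row in numbers)

print(numbers)      # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(third_total)  # 18
```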
# Answer to question 1 - just use the built-in functionality of lists.
#
# There is no need to use variables when lists let you do so much more
# in a quick and organised fashion.
# (open_file is assumed to be a file object returned by open())
lines = []
for line in open_file:
    lines.append(line)
Since Li0liQ already answered questions 2 and 3, I'd just like to add a recommendation regarding question 3. You really don't need to make a copy of the list via i[:] since you're just testing a value in the list.
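A quick check of the point above, with a made-up row i: slicing with i[:] builds a new list, but indexing that copy returns the same element as indexing i directly, so the copy is wasted work.

```python
i = ['m', 'x', 10, 20, 30]  # hypothetical row

copy = i[:]             # a shallow copy: a new list object
print(copy is i)        # False
print(i[:][0] == i[0])  # True: same element, but a copy was built first
```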
No. 2: I can't see how that would be possible - surely you can only assign one value to a variable?
Why do you want to associate each item in a list with a variable? You cannot tell the number of list entries beforehand, so you do not know the exact number of variables to use.
You can use a tuple: x = ([list1], [list2])
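In fact, the comma alone builds a tuple, so the x = [list1], [list2] form from the question is also valid Python. A small sketch with made-up lists:

```python
list1 = [1, 2]
list2 = [3, 4]

x = list1, list2         # the comma creates a tuple; the parentheses are optional
print(type(x).__name__)  # tuple
print(x[1])              # [3, 4]

first, second = x        # unpack the tuple back into two names
```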
You should write the assignment the other way round:
for i in list_1:
    if i[:][0] == 'm':
        a = i[2]
        b = i[3]
        c = i[4]
Do you want:
a, b, c = i[2:5]
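Put together with the corrected loop, a sketch with hypothetical rows: the slice i[2:5] has exactly three items, so it unpacks into a, b and c.

```python
list_1 = [['m', 'x', 10, 20, 30], ['n', 'y', 1, 2, 3]]  # hypothetical data

for i in list_1:
    if i[0] == 'm':
        # Slice exactly three items so they unpack into three names
        a, b, c = i[2:5]

print(a, b, c)  # 10 20 30
```

If no row starts with 'm', the assignment never runs and using a, b or c afterwards still raises NameError.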
If I understand correctly, you have a list of lists, each of length 2, or of length 1 when the variable name is not known.
You would probably want to use a dict to store the lines.
Also note that i[:][0] does not mean what you intended: it is the same as i[0] (i[:] would be a copy of list i).
list_1 = [['line 1', 'a'], ['line 2','b'], ['line 3']]
d = {}
for i in list_1:
    if len(i) != 2:
        continue
    key = i[1]
    value = i[0]
    d[key] = value
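Then, for a, you would use d['a'].
If you eventually want to turn them into variables, you could call globals().update(d) at module level; updating the dict returned by locals() inside a function does not reliably create local variables, so keeping the dict is usually the better choice.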
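The filtering loop above can also be written as a dict comprehension. A sketch with the same list_1:

```python
list_1 = [['line 1', 'a'], ['line 2', 'b'], ['line 3']]

# Keep only two-item rows and map the name (second item) to the line (first item)
d = {i[1]: i[0] for i in list_1 if len(i) == 2}

print(d['a'])  # line 1
```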
Related
I need to check whether a given list is equal to the result of substituting some lists for some elements in another list. Concretely, I have a dictionary, say f = {'o': ['a', 'b'], 'l': ['z'], 'x': ['y']} and a list list1 = ['H', 'e', 'l', 'l', 'o'], so I want to check if some list2 is equal to ['H', 'e', 'z', 'z', 'a', 'b'].
Below, I first write a function apply to compute the image of list1 under f. Then, it suffices to write list2 == apply(list1, f). Since this function will be called thousands of times in my program, I need to make it very fast. Therefore, I thought of the second function below, which should be faster but turns out not to be. So my questions (detailed below) are: why? And: is there a faster method?
First function:
def apply(l, f):
    result = []
    for x in l:
        if x in f:
            result.extend(f[x])
        else:
            result.append(x)
    return result
Second function:
def apply_equal(list1, f, list2):
    i = 0
    for x in list1:
        if x in f:
            sublist = f[x]
            length = len(sublist)
            if list2[i:i + length] != sublist:
                return False
            i += length
        else:
            # Guard against list2 being shorter than the image of list1
            if i >= len(list2) or list2[i] != x:
                return False
            i += 1
    return i == len(list2)
I thought the second method would be faster since it does not construct the image of the first list under f and then compare it with the second list. Instead, it checks equality "on the fly" without constructing a new list. So I was surprised to see that it is not faster (it is even a bit slower). For the record: list1, list2, and the lists which are values in the dictionary are all small (typically under 50 elements), as is the number of keys in the dictionary.
So my questions are: why isn't the second method faster? And: are there ways to do this faster (possibly using other data structures)?
Edit in response to the comments: list1 and list2 will most often be different, but f may be common to some of them. Typically, there could be around 100,000 checks in batches of around 50 consecutive checks with a common f. The elements of the lists are short strings. It is expected that all checks return True (so the whole lists have to be iterated over).
Without proper data for benchmarking it's hard to say, so I tested various solutions with various "sizes" of data.
Replacing result.extend(f[x]) with result += f[x] always made it faster.
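This was faster for longer lists (using itertools.chain; zip(list1) supplies 1-tuples such as ('H',) as the default for f.get, so elements without a mapping pass through unchanged):
from itertools import chain
list2 == list(chain.from_iterable(map(f.get, list1, zip(list1))))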
If the data allows it, explicitly storing all possible keys and always accessing with f[x] would speed it up. That is, set f[k] = [k] for all "missing" keys in advance, so you don't have to check with in or use get.
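A sketch of that idea, assuming the set of possible elements is known up front (the alphabet list below is hypothetical). Each unmapped element gets an identity entry wrapped in a list, [k], so every value is a list and result += f[x] adds whole elements even when they are multi-character strings:

```python
f = {'o': ['a', 'b'], 'l': ['z'], 'x': ['y']}
alphabet = ['H', 'e', 'l', 'o', 'x']  # hypothetical: every element that can occur

# Fill in identity mappings once, ahead of the many checks that share this f
full_f = {k: f.get(k, [k]) for k in alphabet}

def apply_full(l, f):
    result = []
    for x in l:
        result += f[x]  # no membership test: every key is present
    return result

print(apply_full(['H', 'e', 'l', 'l', 'o'], full_f))  # ['H', 'e', 'z', 'z', 'a', 'b']
```

Since the question says f is shared by batches of around 50 checks, building full_f once per batch would spread its cost.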
Use a profiling tool such as Scalene to see what's slow in your code; don't try to guess.
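For comparing two small variants such as extend versus +=, the standard-library timeit module (swapped in here; it is not the Scalene profiler mentioned above) gives a quick measurement. A sketch using the question's example data; absolute timings depend on the machine and the data:

```python
import timeit

f = {'o': ['a', 'b'], 'l': ['z'], 'x': ['y']}
list1 = ['H', 'e', 'l', 'l', 'o']
list2 = ['H', 'e', 'z', 'z', 'a', 'b']

def apply_extend(l, f):
    result = []
    for x in l:
        if x in f:
            result.extend(f[x])
        else:
            result.append(x)
    return result

def apply_iadd(l, f):
    result = []
    for x in l:
        if x in f:
            result += f[x]  # in-place concatenation instead of a method call
        else:
            result.append(x)
    return result

# Time the full check (build the image, then compare) for each variant
for fn in (apply_extend, apply_iadd):
    seconds = timeit.timeit(lambda: list2 == fn(list1, f), number=100_000)
    print(fn.__name__, f"{seconds:.3f} s")
```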
In case you want to read it, I was able to produce an even slower version based on your idea of stopping as soon as possible, while keeping the first, readable apply implementation:
def apply(l, f):
    for x in l:
        if x in f:
            yield from f[x]
        else:
            yield x
def apply_equal(l1, f, l2):
    return all(left == right for left, right in zip(apply(l1, f), l2, strict=True))
Beware: it needs Python 3.10 for zip's strict=True, and when the lengths differ it raises ValueError rather than returning False.
As the comments said, speed depends heavily on your data here: constructing the whole list may be faster on small datasets, but stopping early may be faster on bigger lists.
I would like to create two parallel lists from my original list and then use sort_together.
original = ['4,d', '3,b']
The result should be two lists like this:
list1 = ['4', '3']
list2 = ['d', 'b']
I've tried using split but was only able to obtain a single list :(
[i.split(",", 1) for i in original]
You can use the zip(*...) trick together with .split:
list1, list2 = zip(*(x.split(",") for x in original))
Now, this actually gives you two tuples instead of lists, but that should be easy to fix if you really need lists.
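If you do need lists, one way is to map list over the tuples produced by zip. A sketch with the question's data:

```python
original = ['4,d', '3,b']

# zip(*...) transposes the split pairs; map(list, ...) turns each tuple into a list
list1, list2 = map(list, zip(*(x.split(",") for x in original)))

print(list1)  # ['4', '3']
print(list2)  # ['d', 'b']
```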
You can use map and zip:
list1, list2 = zip(*map(lambda x: x.split(","), original))
map applies the function passed as its first argument (in this case, it simply splits strings on the comma separator) to every element of the iterable passed as its second argument (here, a list). After this, you have a map object that yields ['4', 'd'] and ['3', 'b'].
zip takes two (or more) iterables and puts them side by side (like a physical zip would), producing a tuple of the elements at each position. For example, list(zip([1,2,3],[4,5,6])) is [(1, 4), (2, 5), (3, 6)].
The unpacking * is necessary because you want to pass the two sublists in the map object to zip as separate arguments.
ini_list = [[4,'d'], [3,'b']]
print ("initial list", str(ini_list))
res1, res2 = map(list, zip(*ini_list))
print("final lists", res1, "\n", res2)
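You can use this code to sort and get two lists (note that each list is sorted independently, so in general an item in list1 no longer lines up with its partner in list2):
list1, list2 = list(sorted(t) for t in zip(*(item.split(',') for item in original)))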
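If the two lists must stay aligned (which is what sort_together is for), one alternative is to sort the pairs as units before transposing them. A sketch, assuming the first field is always an integer so it can be compared numerically:

```python
original = ['4,d', '3,b']

# Sort the split pairs as units so each number stays next to its letter;
# key=int(p[0]) assumes the first field is always an integer
pairs = sorted((x.split(",") for x in original), key=lambda p: int(p[0]))
list1, list2 = map(list, zip(*pairs))

print(list1)  # ['3', '4']
print(list2)  # ['b', 'd']
```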
o = original
s = [None, None]
for i in range(2):
    s[i] = sorted([oi.split(',')[i] for oi in o])
I have a list
current_list = ['#','1','2','3','4','5','6','7','8','9']
I want to create a new list by indexing current_list such that
new_list = ['1','5','9']
I have tried
new_list = current_list[1] + current_list[5] + current_list[9]
but I get
>>> 159
and not
>>> ['1','5','9']
How do I create new_list from current_list such that
new_list = ['1','5','9'] ?
New to programming and appreciate your patience.
You are concatenating the list items with the + operator (they are strings, so you get '159'). Try:
new_list = [current_list[1] , current_list[5] , current_list[9]]
Your list must contain at least 10 items, otherwise you will get an IndexError (list index out of range).
You can do this to get your result. For the new list, the elements need to be written inside a list literal; the + sign is used mainly for string concatenation or simple addition. So:
current_list = ['#','1','2','3','4','5','6','7','8','9']
new_list=[current_list[1],current_list[5],current_list[9]]
Instead of hard-coding the indexes (e.g. [current_list[1],current_list[5],current_list[9]]), I would recommend inserting the indexes programmatically, so that they are easy to modify in the future or can be generated by a function:
indexes = [1, 5, 9]
current_list = ['#','1','2','3','4','5','6','7','8','9']
new_list = [current_list[i] for i in indexes]
## gives ['1','5','9']
Now, if you need to change the indexes, you can just modify the indexes line.
Or, if down the road a user needs to specify the indexes from a file, you can read those numbers from a file. Either way, the way you generate new_list from current_list stays the same. (As a new programmer, it is important that you learn early the importance of writing code so that it is easy to modify in the future.)
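A sketch of reading the indexes from a file, assuming a hypothetical format of one integer per line; io.StringIO stands in for a real file opened with open():

```python
import io

# Stand-in for a real file; assumes one index per line (hypothetical format)
index_file = io.StringIO("1\n5\n9\n")
indexes = [int(line) for line in index_file if line.strip()]

current_list = ['#', '1', '2', '3', '4', '5', '6', '7', '8', '9']
new_list = [current_list[i] for i in indexes]
print(new_list)  # ['1', '5', '9']
```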
from operator import itemgetter
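def make_list_from(*indices, lst):
    # Create a function that will get values from given indices
    values = itemgetter(*indices)
    # Get those values as a tuple and convert them into a list
    # (with a single index, itemgetter returns the bare value, not a 1-tuple)
    result = values(lst)
    return list(result) if len(indices) > 1 else [result]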
current_list = ['0', '1','2','3','4','5','6','7','8','9']
print(make_list_from(1, 5, 9, lst=current_list))
# ['1', '5', '9']
You can use itemgetter:
from operator import itemgetter
my_indices = [1, 5, 9]
new_list = list(itemgetter(*my_indices)(current_list))
or you can pick the elements by your indices using a list comprehension:
new_list = [current_list[i] for i in my_indices]
which is similar to:
new_list = []
for index in my_indices:
    new_list.append(current_list[index])
print(new_list)
output:
['1', '5', '9']
I have 2 lists. One is a list of words and their frequencies and the other is a list of words.
a = [('country',3478), ('island',2900),('river',5)]
b = ['river','mountain','bank']
There are thousands of entries in a but only hundreds in b.
How can I subset list a so that I return:
c=[('river',5)]
For loops would take too long given the number of entries, and I imagine a list comprehension is the solution, but I cannot get it right.
My main goal is to then create a word cloud from my final list. Any help would be appreciated.
Edit: I made a mistake, as pointed out by some commenters. I want to return
c=[('river',5)]
instead of
c=['river',5]
as I originally wrote. Apologies, and thanks for pointing it out.
I assume you actually want:
c = [('river',5)] # a list with one tuple
You had better first construct a set of the values in b:
bd = set(b)
then you can use list comprehension:
c = [(x,y) for x,y in a if x in bd]
That being said, if you want to look up the frequency of a word, I advise you not to construct a list of tuples but a dictionary. You can do this with a dictionary comprehension:
c = {x: y for x,y in a if x in bd} # dictionary variant
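You can try this (note that it returns only the first match, as a flat list rather than a list of tuples, and raises IndexError if nothing matches):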
a = [('country',3478), ('island',2900),('river',5)]
b = ['river','mountain','bank']
final_data = list([i for i in a if i[0] in b][0])
Output:
['river', 5]
For this problem I am dealing with a big list that was imported from a CSV file, but let's say I have a list like this:
[['name','score1','score2','score3','score4'],
['Mike','5','1','6','2'],
['Mike','1','1','1','1'],
['Mike','3','0','3','0'],
['jose','0','1','2','3'],
['jose','2','3','4','5'],
['lisa','4','4','4','4']]
and I want another list in this form (the sum of all scores for each student):
[['Mike','9','2','10','3'],
['jose','2','4','6','8'],
['lisa','4','4','4','4']]
Any ideas how this can be done?
I've been trying many ways, and I could not make it work.
I got stuck when there were more than two rows with the same name: my solution only added up the last two lines.
I am new to Python and programming in general.
If you are just learning Python, I always recommend trying to implement things without relying on external libraries. A good first step is to break the problem up into smaller components:
Remove the first entry (the column titles) from the input list. You don't need it for your result.
For each remaining entry:
Convert every entry except the first to an integer (so you can add them).
Determine if you have already encountered an entry with the same name (first column value). If not: add the entry to the output list. Otherwise: merge the entry with the one already in the output list (by adding values in the columns).
One possible implementation follows (untested):
input_list = [['name','score1','score2','score3','score4'],
              ['Mike','5','1','6','2'],
              ['Mike','1','1','1','1'],
              ['Mike','3','0','3','0'],
              ['jose','0','1','2','3'],
              ['jose','2','3','4','5'],
              ['lisa','4','4','4','4']]
print(input_list)
# Remove the first element
input_list = input_list[1:]
# Initialize an empty output list
output_list = []
# Iterate through each entry in the input
for val in input_list:
    # Determine if key is already in output list
    for ent in output_list:
        if ent[0] == val[0]:
            # The value is already in the output list (so merge them)
            for i in range(1, len(ent)):
                # We convert to int and back to str
                # This could be done elsewhere (or not at all...)
                ent[i] = str(int(ent[i]) + int(val[i]))
            break
    else:
        # The value wasn't in the output list (so add it)
        # This is a useful feature of the for loop, the following
        # is only executed if the break command wasn't reached above
        output_list.append(val)
#print(input_list)
print(output_list)
The above is not as efficient as using a dictionary or importing a library that can perform the same operation in a couple of lines; however, it demonstrates a few features of the language. Be careful when working with lists, though: the above modifies the input list (try un-commenting the print statement for the input list at the end).
Let us say you have
In [45]: temp
Out[45]:
[['Mike', '5', '1', '6', '2'],
['Mike', '1', '1', '1', '1'],
['Mike', '3', '0', '3', '0'],
['jose', '0', '1', '2', '3'],
['jose', '2', '3', '4', '5'],
['lisa', '4', '4', '4', '4']]
Then, you can use pandas:
import pandas as pd
temp = pd.DataFrame(temp)
def test(m):
    try:
        return int(m)
    except ValueError:
        return m
# applymap is deprecated since pandas 2.1 in favour of DataFrame.map
temp = temp.applymap(test)
print(temp.groupby(0).sum())
If you are importing it from a CSV file, you can read the file directly with pd.read_csv.
You could use a better solution as suggested, but if you'd like to implement it yourself and learn, you can follow along; I explain each step in the comments:
# utilities for iteration. groupby makes groups from a collection
from itertools import groupby
# implementation of common, simple operations such as
# multiplication, getting an item from a list
from operator import itemgetter
def my_sum(groups):
    return [
        ls[0] if i == 0 else str(sum(map(int, ls)))  # keep first one since it's name, sum otherwise
        for i, ls in enumerate(zip(*groups))  # transpose elements and give number to each
    ]
# list comprehension to make a list from another list
# group lists according to first element and apply our function on grouped elements
# groupby reveals group key and elements but key isn't needed so it's set to underscore
# (ls is your list of rows; groupby only groups consecutive rows with the same name)
result = [my_sum(g) for _, g in groupby(ls, key=itemgetter(0))]
To understand this code, you need to know about list comprehensions, the * operator, the built-ins int, enumerate, map, str and zip, and some handy modules: itertools and operator.
You edited the question to add a header row, which will break this code, so you need to pass ls[1:] to groupby instead of ls. Hope it helps.
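A self-contained sketch of the same approach with the header skipped. The rows below are a hypothetical shuffled version of the question's data: itertools.groupby only groups consecutive items, so the rows are sorted by name first (the sort is case-sensitive, so 'Mike' comes before 'jose').

```python
from itertools import groupby
from operator import itemgetter

ls = [['name', 'score1', 'score2', 'score3', 'score4'],
      ['Mike', '5', '1', '6', '2'],
      ['jose', '0', '1', '2', '3'],
      ['Mike', '1', '1', '1', '1'],
      ['lisa', '4', '4', '4', '4'],
      ['jose', '2', '3', '4', '5'],
      ['Mike', '3', '0', '3', '0']]

def my_sum(groups):
    # Transpose the group's rows into columns; keep the name, sum the rest
    return [col[0] if i == 0 else str(sum(map(int, col)))
            for i, col in enumerate(zip(*groups))]

# Skip the header row, then sort by name so equal names are adjacent
rows = sorted(ls[1:], key=itemgetter(0))
result = [my_sum(g) for _, g in groupby(rows, key=itemgetter(0))]
print(result)
```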
As a beginner, I would consider turning your data into a simpler structure like a dictionary, so that you are just summing a list of lists. Assuming you get rid of the header row, you can turn this into a dictionary:
>>> data_dict = {}
>>> for row in data:
...     data_dict.setdefault(row[0], []).append([int(i) for i in row[1:]])
>>> data_dict
{'Mike': [[5, 1, 6, 2], [1, 1, 1, 1], [3, 0, 3, 0]],
'jose': [[0, 1, 2, 3], [2, 3, 4, 5]],
'lisa': [[4, 4, 4, 4]]}
Now it should be relatively easy to loop over the dict and sum up the lists (you may want to look at sum and zip as a way to do that).
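One way to finish it, sketched with the data_dict shown above: zip(*rows) regroups a student's rows into columns, and sum adds each column.

```python
data_dict = {'Mike': [[5, 1, 6, 2], [1, 1, 1, 1], [3, 0, 3, 0]],
             'jose': [[0, 1, 2, 3], [2, 3, 4, 5]],
             'lisa': [[4, 4, 4, 4]]}

# For each student, turn rows into columns and total each column
totals = {name: [sum(col) for col in zip(*rows)]
          for name, rows in data_dict.items()}

print(totals)
# {'Mike': [9, 2, 10, 3], 'jose': [2, 4, 6, 8], 'lisa': [4, 4, 4, 4]}
```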
This is well suited to collections.Counter:
from collections import Counter, defaultdict
csvdata = [['name','score1','score2','score3','score4'],
['Mike','5','1','6','2'],
['Mike','1','1','1','1'],
['Mike','3','0','3','0'],
['jose','0','1','2','3'],
['jose','2','3','4','5'],
['lisa','4','4','4','4']]
student_scores = defaultdict(Counter)
score_titles = csvdata[0][1:]
for row in csvdata[1:]:
    student = row[0]
    scores = dict(zip(score_titles, map(int, row[1:])))
    # Counter addition drops keys whose running total is not positive
    student_scores[student] += Counter(scores)
print(student_scores["Mike"])
# >>> Counter({'score3':10, 'score1':9, 'score4':3, 'score2':2})
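To get back the list-of-lists shape from the question, a sketch that rebuilds student_scores as above and reads each score title from the Counter (a missing key reads as 0, so dropped zero totals still come out as '0'):

```python
from collections import Counter, defaultdict

csvdata = [['name', 'score1', 'score2', 'score3', 'score4'],
           ['Mike', '5', '1', '6', '2'],
           ['Mike', '1', '1', '1', '1'],
           ['Mike', '3', '0', '3', '0'],
           ['jose', '0', '1', '2', '3'],
           ['jose', '2', '3', '4', '5'],
           ['lisa', '4', '4', '4', '4']]

student_scores = defaultdict(Counter)
score_titles = csvdata[0][1:]
for row in csvdata[1:]:
    scores = dict(zip(score_titles, map(int, row[1:])))
    student_scores[row[0]] += Counter(scores)

# Rebuild one row per student, in first-seen order, with scores as strings
result = [[name] + [str(counts[t]) for t in score_titles]
          for name, counts in student_scores.items()]
print(result)
```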