I'm quite new to Python and I would like some help. I'm currently trying to store information from the second line of a txt file in a tuple and I am having some trouble. The txt file's second line is:
Water: 0 0 4 2 1 3
I want to store the numbers only so my current code is:
water = []
with open(file_name) as f:
    lines = f.readlines()
water_values = lines[1].strip()
splitted = water_values.split(" ")
splitted.remove("Water:")
water.append(splitted)
However, when I call for water[1], expecting to receive 0, I find that the index is out of range and that the len(water) is only 1. When I print it, it says:
[['0', '0', '4', '2', '1', '3']]
How can I change it so that I can call for each element?
When you call water.append(splitted), you add one new element to the end of the list. Because splitted is itself a list, you end up with a list of lists.
If you want to combine two lists, call water += splitted instead. The += operator extends the list on the left with the items on the right; here it gives the same result as water = water + splitted.
You should use .extend rather than .append, i.e. instead of
water.append(splitted)
do
water.extend(splitted)
Simple example to show difference:
a = []
b = []
a.append([1,2,3])
b.extend([1,2,3])
print(a)
print(b)
output:
[[1, 2, 3]]
[1, 2, 3]
If you want to know more about handling lists in Python, read "More on Lists" in the docs.
Your code water.append(splitted) just adds splitted (which is a list) as the first element of the water list. To add the values from splitted, you could do the following:
water += splitted
instead of
water.append(splitted)
Doing so, you will get water = ['0', '0', '4', '2', '1', '3'].
You can read more here: How do I concatenate two lists in Python?
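Putting the answers above together, here is a minimal sketch of the whole parse. It assumes the line always has the "Label: n n n" shape shown in the question; the helper name parse_labelled_line is hypothetical, and the conversion to int is an extra step the question did not explicitly ask for.

```python
def parse_labelled_line(line):
    """Split a line like 'Water: 0 0 4 2 1 3' into a label and a flat list of ints."""
    label, _, rest = line.partition(":")
    # split() with no argument copes with repeated spaces and the trailing newline
    return label.strip(), [int(tok) for tok in rest.split()]


label, water = parse_labelled_line("Water: 0 0 4 2 1 3\n")
print(label)       # Water
print(water[1])    # 0
print(len(water))  # 6
```

Because the values end up in one flat list, water[1] no longer raises an IndexError.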
I'm currently working on a program that takes a group of lists from a csv file, and groups them together. The program I came up with is:
List_one = []
with open("trees.csv") as f:
    skiplines = f.readline()
    for line in f:
        res = line.split(" ")
        List_one.append(res)
for i in List_one:
    (i[0]) = (i[0]).rstrip("\n")
print (List_one)
What I get now are a group of lists, but the problem is that these lists are strings and I want them as floats. The lists look like this:
[['1,8.3,70,10.3'], ['2,8.6,65,10.3'], ['3,8.8,63,10.2'], ['4,10.5,72,16.4'], ['5,10.7,81,18.8'], ['6,10.8,83,19.7'], ['7,11.0,66,15.6'], ['8,11.0,75,18.2'], ['9,11.1,80,22.6'], ['10,11.2,75,19.9'], ['11,11.3,79,24.2'], ['12,11.4,76,21.0'], ['13,11.4,76,21.4'], ['14,11.7,69,21.3'], ['15,12.0,75,19.1'], ['16,12.9,74,22.2'], ['17,12.9,85,33.8'], ['18,13.3,86,27.4'], ['19,13.7,71,25.7'], ['20,13.8,64,24.9'], ['21,14.0,78,34.5'], ['22,14.2,80,31.7'], ['23,14.5,74,36.3'], ['24,16.0,72,38.3'], ['25,16.3,77,42.6'], ['26,17.3,81,55.4'], ['27,17.5,82,55.7'], ['28,17.9,80,58.3'], ['29,18.0,80,51.5'], ['30,18.0,80,51.0'], ['31,20.6,87,77.0']]
As you can see, I can't use float() on List_one either, because each inner list holds one whole string. Is there a way I can split the strings so I get:
['1', '8.3', '70', '10.3'].....
Any help is welcome.
line.split(',') splits the string on "," and returns a list.
For the string '1,8.3,70,10.3' it returns ['1', '8.3', '70', '10.3']. The items are still strings, so convert each one with float().
You can split the strings by the commas if you want. You should probably do everything before you append them to List_one though.
res = [float(x) for x in line.split(" ")[0].split(",")]
List_one.append(res)
Does this work how you want it to? Sorry, I'm not sure what format the input is in, so I'm kind of guessing.
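If the file really is ordinary comma-separated text, the standard csv module can do the splitting for you. The following is only a sketch: it assumes a single header row and purely numeric cells, and the header names and the in-memory sample are placeholders standing in for the real trees.csv.

```python
import csv
import io

# Stand-in for open("trees.csv"); the header names are made up for illustration
sample = io.StringIO("Index,Girth,Height,Volume\n1,8.3,70,10.3\n2,8.6,65,10.3\n")

reader = csv.reader(sample)
next(reader)  # skip the header row
# Convert every cell to float, ignoring completely empty lines
rows = [[float(cell) for cell in row] for row in reader if row]
print(rows)  # [[1.0, 8.3, 70.0, 10.3], [2.0, 8.6, 65.0, 10.3]]
```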
You could say:
res = line.strip()
# map takes a function as the first arg and an iterable as the second
list_of_floats = list(map(float, res.split(",")))
# then you can
List_one.append(list_of_floats)
Which will still give you a nested list because you are pushing a list during each iteration of for line in f:, but each list would at least be floats as you've specified.
If you wanted just one flat list of floats, then instead of doing the initial line.split(' ') you could use a regex to split each line read from the csv, and extend the list rather than append to it:
import re # at the top of your file
res = re.split(r'[\s,]+', line.strip())
list_of_floats = list(map(float, res))
List_one.extend(list_of_floats)
This might help:
l = [['1,8.3,70,10.3'], ['2,8.6,65,10.3'], ['3,8.8,63,10.2'], ['4,10.5,72,16.4']]
l2 = []
for x in l:
    a = x[0].split(",")
    l2.append(a)
print(l2)
Enjoy!
For this problem I am dealing with a big list that was imported from a CSV file, but let's say I have a list like this:
[['name','score1','score2','score3','score4'],
['Mike','5','1','6','2'],
['Mike','1','1','1','1'],
['Mike','3','0','3','0'],
['jose','0','1','2','3'],
['jose','2','3','4','5'],
['lisa','4','4','4','4']]
and I want to have another list of this form (the sum of all scores for each student):
[['Mike','9','2','10','3'],
['jose','2','4','6','8'],
['lisa','4','4','4','4']]
Any ideas how this can be done?
I've been trying many ways, and I could not make it work.
I got stuck when there were more than 2 rows with the same name; my solution only added up the last 2 lines.
I am new to Python and programming in general.
If you are just learning Python, I always recommend trying to implement things without relying on external libraries. A good first step is to break the problem up into smaller components:
Remove the first entry (the column titles) from the input list. You don't need it for your result.
For each remaining entry:
Convert every entry except the first to an integer (so you can add them).
Determine if you have already encountered an entry with the same name (first column value). If not: add the entry to the output list. Otherwise: merge the entry with the one already in the output list (by adding values in the columns).
One possible implementation follows (untested):
input_list = [['name','score1','score2','score3','score4'],
              ['Mike','5','1','6','2'],
              ['Mike','1','1','1','1'],
              ['Mike','3','0','3','0'],
              ['jose','0','1','2','3'],
              ['jose','2','3','4','5'],
              ['lisa','4','4','4','4']]
print(input_list)
# Remove the first element
input_list = input_list[1:]
# Initialize an empty output list
output_list = []
# Iterate through each entry in the input
for val in input_list:
    # Determine if key is already in output list
    for ent in output_list:
        if ent[0] == val[0]:
            # The value is already in the output list (so merge them)
            for i in range(1, len(ent)):
                # We convert to int and back to str
                # This could be done elsewhere (or not at all...)
                ent[i] = str(int(ent[i]) + int(val[i]))
            break
    else:
        # The value wasn't in the output list (so add it)
        # This is a useful feature of the for loop, the following
        # is only executed if the break command wasn't reached above
        output_list.append(val)
#print(input_list)
print(output_list)
The above is not as efficient as using a dictionary or importing a library that can perform the same operation in a couple of lines; however, it demonstrates a few features of the language. Be careful when working with lists, though: the above modifies the input list (try un-commenting the print statement for the input list at the end).
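For comparison, here is a sketch of the dictionary-based variant mentioned above. It is not the answer's original code; the names rows, totals and result are made up for illustration, and it relies on dicts preserving insertion order (Python 3.7+). Unlike the nested-loop version, it builds new lists and leaves the input rows untouched.

```python
rows = [['name', 'score1', 'score2', 'score3', 'score4'],
        ['Mike', '5', '1', '6', '2'],
        ['Mike', '1', '1', '1', '1'],
        ['Mike', '3', '0', '3', '0'],
        ['jose', '0', '1', '2', '3'],
        ['jose', '2', '3', '4', '5'],
        ['lisa', '4', '4', '4', '4']]

totals = {}  # name -> running list of integer sums
for name, *scores in rows[1:]:  # rows[1:] skips the header row
    scores = [int(s) for s in scores]
    if name in totals:
        # Add the new scores column by column to the running totals
        totals[name] = [a + b for a, b in zip(totals[name], scores)]
    else:
        totals[name] = scores

# Convert back to the list-of-strings shape the question asked for
result = [[name] + [str(t) for t in sums] for name, sums in totals.items()]
print(result)
# [['Mike', '9', '2', '10', '3'], ['jose', '2', '4', '6', '8'], ['lisa', '4', '4', '4', '4']]
```

Each lookup in totals takes roughly constant time, so the work grows linearly with the number of rows instead of rescanning the output list for every entry.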
Let us say you have
In [45]: temp
Out[45]:
[['Mike', '5', '1', '6', '2'],
['Mike', '1', '1', '1', '1'],
['Mike', '3', '0', '3', '0'],
['jose', '0', '1', '2', '3'],
['jose', '2', '3', '4', '5'],
['lisa', '4', '4', '4', '4']]
Then, you can use Pandas ...
import pandas as pd
temp = pd.DataFrame(temp)
def test(m):
    # Convert to int where possible; leave the names untouched
    try:
        return int(m)
    except ValueError:
        return m
temp = temp.applymap(test)
print(temp.groupby(0).agg(sum))
If you are importing it from a CSV file, you can read the file directly using pd.read_csv.
You could use one of the better solutions already suggested, but if you'd like to implement it yourself and learn, you can follow along; I explain each step in the comments:
# utilities for iteration. groupby makes groups from a collection
from itertools import groupby
# implementation of common, simple operations such as
# multiplication, getting an item from a list
from operator import itemgetter
def my_sum(groups):
    return [
        ls[0] if i == 0 else str(sum(map(int, ls))) # keep first one since it's name, sum otherwise
        for i, ls in enumerate(zip(*groups)) # transpose elements and give number to each
    ]
# list comprehension to make a list from another list
# group lists according to first element and apply our function on grouped elements
# groupby reveals group key and elements but key isn't needed so it's set to underscore
result = [my_sum(g) for _, g in groupby(ls, key=itemgetter(0))]
To understand this code, you need to know about list comprehensions, the * operator, the built-ins int, enumerate, map, str and zip, and two handy modules, itertools and operator.
You edited the question to add a header row, which will break this code, so we need to skip it by passing ls[1:] to groupby instead of ls. Hope it helps.
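One caveat worth noting: itertools.groupby only groups consecutive items with equal keys, so the approach above relies on each student's rows being adjacent, as they are in the question. The small sketch below uses made-up rows to show what happens otherwise, and how sorting first avoids it.

```python
from itertools import groupby
from operator import itemgetter

# Made-up rows where Mike's entries are not adjacent
rows = [['Mike', '5', '1'], ['jose', '0', '1'], ['Mike', '1', '1']]

# Without sorting, 'Mike' shows up as two separate groups
unsorted_keys = [key for key, _ in groupby(rows, key=itemgetter(0))]
print(unsorted_keys)  # ['Mike', 'jose', 'Mike']

# sorted() is stable, so rows with the same name keep their original order
by_name = sorted(rows, key=itemgetter(0))
sorted_keys = [key for key, _ in groupby(by_name, key=itemgetter(0))]
print(sorted_keys)  # ['Mike', 'jose']
```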
As a beginner, I would consider turning your data into a simpler structure like a dictionary, so that you are just summing a list of lists. Assuming you get rid of the header row, you can turn the data into a dictionary:
>>> data_dict = {}
>>> for row in data:
...     data_dict.setdefault(row[0], []).append([int(i) for i in row[1:]])
...
>>> data_dict
{'Mike': [[5, 1, 6, 2], [1, 1, 1, 1], [3, 0, 3, 0]],
'jose': [[0, 1, 2, 3], [2, 3, 4, 5]],
'lisa': [[4, 4, 4, 4]]}
Now it should be relatively easy to loop over the dict and sum up the lists (you may want to look at sum and zip as a way to do that).
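One way the hint about sum and zip might play out is sketched below. It starts from the dictionary shown above; the name totals is made up for illustration.

```python
data_dict = {'Mike': [[5, 1, 6, 2], [1, 1, 1, 1], [3, 0, 3, 0]],
             'jose': [[0, 1, 2, 3], [2, 3, 4, 5]],
             'lisa': [[4, 4, 4, 4]]}

# zip(*rows) transposes the rows into columns; sum() then adds up each column
totals = {name: [sum(col) for col in zip(*rows)]
          for name, rows in data_dict.items()}
print(totals)
# {'Mike': [9, 2, 10, 3], 'jose': [2, 4, 6, 8], 'lisa': [4, 4, 4, 4]}
```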
This is well suited for collections.Counter
from collections import Counter, defaultdict
csvdata = [['name','score1','score2','score3','score4'],
['Mike','5','1','6','2'],
['Mike','1','1','1','1'],
['Mike','3','0','3','0'],
['jose','0','1','2','3'],
['jose','2','3','4','5'],
['lisa','4','4','4','4']]
student_scores = defaultdict(Counter)
score_titles = csvdata[0][1:]
for row in csvdata[1:]:
    student = row[0]
    scores = dict(zip(score_titles, map(int, row[1:])))
    student_scores[student] += Counter(scores)
print(student_scores["Mike"])
# >>> Counter({'score3':10, 'score1':9, 'score4':3, 'score2':2})
I apologise for the complexity of this question, but it is a massive challenge for me being very new to Python:
I have an external file that stores lines of text: input.txt
min: 1,2,3,5,6
max: 1,2,3,5,6
avg: 1,2,3,5,6
I read the content of the file into various lists in a new variable called input_data like this:
input_data = []
with open('input.txt') as inputfile:
    for line in inputfile:
        input_data.append(line.strip().split(','))
The result for input_data is as follows:
[['min: 1', '2', '3', '5', '6'], ['max: 1', '2', '3', '5', '6'], ['avg: 1', '2', '3', '5', '6']]
So I have one variable with 3 lists stored in it.
How do I remove the ":" after 'min', 'max' and 'avg'?
I have tried:
input_data = input_data.replace(":",",")
Also, how do I keep min, max and avg as strings, but change the numbers in the lists to integers? eg.
['min', 1, 2, 3, 5, 6]
'min' string and all numbers integers
Just split on the colon then map the rest to int after splitting on a comma:
with open("in.txt") as f:
    for line in f:
        a, rest = line.split(":", 1)
        # map returns an iterator in Python 3, so wrap it in list() before concatenating
        print([a] + list(map(int, rest.split(","))))
Output:
['min', 1, 2, 3, 5, 6]
['max', 1, 2, 3, 5, 6]
['avg', 1, 2, 3, 5, 6]
To start with, I'd suggest splitting it differently. To keep the word and first value separated, convert the space to a comma so it'll split correctly. In this case, you could convert ": " to a comma, which also removes the colon.
input_data = line.strip().replace(': ', ',').split(',')
Then to convert all necessary values to integers, you could do it in loads of ways, but here's two examples:
input_data = [input_data[0]] + [int(i) for i in input_data[1:]]
input_data = [int(i) if i.isdigit() else i for i in input_data]
Alternatively if you didn't do the bit at the start and still have a colon, this is how you could get rid of it with a tweak to one of the above methods:
input_data = [int(i) if i.isdigit() else i.replace(':', '') for i in input_data]
And finally, this should hopefully work with your code:
input_data = []
with open('input.txt') as inputfile:
    for line in inputfile:
        input_data.append([int(i) if i.isdigit() else i for i in line.strip().replace(': ', ',').split(',')])
Or if needed, a slightly shorter version:
with open('input.txt') as inputfile:
    input_data = [[int(i) if i.isdigit() else i for i in line.strip().replace(': ', ',').split(',')] for line in inputfile]
Symmitchry's answer is probably a bit better though; splitting it into the two sections didn't cross my mind.
input_data = []
with open('input.txt') as inputfile:
    for line in inputfile:
        row = []
        sections = line.strip().split(':') # First split out the title
        kind = sections[0]
        row.append(kind)
        data = sections[1].split(',')
        for entry in data:
            row.append(int(entry)) # Use int to convert to integer
        input_data.append(row)
Try that. First I just split the line using the colon :. The first part is the headers ('min', 'max' and 'avg'), which I add to my new 'row' of output data.
Then I split the second part (the numbers) the exact same way that you did. I then used the built in function int to convert string numbers into actual integer values.
I made the code very explicit so you should be able to understand every line!
If you actually wanted to make a list comprehension, the (very ugly) direct translation of my code above looks like this:
with open('input.txt') as f:
    result = [[line.split(':')[0]] + [int(x) for x in line.split(':')[1].split(',')] for line in f]
I have a file format like this:
9 8 1
3 4 1
...
...
Now, I want to get each line as three integers.
When I used
for line in f.readlines():
    print(line.split(" "))
The script printed this:
['9', '8', '1\r\n']
['3', '4', '1\r\n']
...
...
How can I get each line as three integers?
Using the code you have and addressing your specific question of how to convert your list to integers:
You can iterate through each line and convert the strings to int with the following example using list comprehension:
Given:
line =['3', '4', '1\r\n']
then:
int_list = [int(i) for i in line]
will yield a list of integers
[3, 4, 1]
that you can then access via subscripts (0 to 2). e.g.
int_list[0] contains 3,
int_list[1] contains 4,
etc.
A more streamlined version for your consideration:
with open('data.txt') as f:
    for line in f:
        int_list = [int(i) for i in line.split()]
        print(int_list)
The advantage of using with is that it will automatically close your file for you when you are done, or if you encounter an exception.
UPDATE:
Based on your comments below, if you want the numbers in 3 different variables, say a, b and c, you can do the following:
for line in f:
    a, b, c = [int(i) for i in line.split()]
    print('a = %d, b = %d, c = %d\n' % (a, b, c))
and get this:
a = 9, b = 8, c = 1
This counts on there being 3 numbers on each line.
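If some lines might not hold exactly three numbers, the unpacking raises a ValueError. A hedged sketch of one way to guard against that follows; the helper name parse_three and the choice to return None for such lines are assumptions for illustration, not part of the answer above.

```python
def parse_three(line):
    """Return three ints from a whitespace-separated line, or None if it doesn't hold exactly three fields."""
    parts = line.split()
    if len(parts) != 3:
        return None
    return tuple(int(p) for p in parts)


print(parse_three("9 8 1\r\n"))  # (9, 8, 1)
print(parse_three("3 4\n"))      # None
print(parse_three("\n"))         # None
```

A line containing non-numeric text would still raise ValueError from int(), which may or may not be what you want.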
Aside:
Note that in place of "list comprehension" (LC) you can also use a "generator expression" (GE) of this form:
a, b, c = (int(i) for i in line.split())
For your particular problem with 3 integers this doesn't make much difference, but I show it for completeness. For larger problems, an LC requires more memory, as it builds the complete list in memory at once, while a GE generates values one by one as needed. The SO question "Generator Expressions vs. List Comprehension" will give you more information if you are curious.
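As a rough illustration of where a generator expression pays off, the sketch below sums every integer in a file without building an intermediate list. The io.StringIO object is only a stand-in for a real open file.

```python
import io

# Stand-in for a real file of whitespace-separated integers
f = io.StringIO("9 8 1\n3 4 1\n")

# The generator expression yields one int at a time; no intermediate list is built
total = sum(int(tok) for line in f for tok in line.split())
print(total)  # 26
```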
with open("myfile.txt") as f:
    for line in f:
        int_list = [int(x) for x in line.split()]
You don't say what you want to do with the list of integers, there may be a better way to iterate over them, depending.
If you "need the values as three different variables," then:
a, b, c = int_list
though you could also use:
int_list[0]
int_list[1]
int_list[2]
as desired.
line.strip().split(" ")
would do.
More completely, with all lines read into one large string:
data = f.read().strip() # lose final \n
[[int(n) for n in x.split()] for x in data.split('\n')]
would give you a list with answers you want for each line.
If you want to store the integers in three variables:
with open('data1.txt') as f:
    for line in f:
        a, b, c = (int(x) for x in line.split())
        print(a, b, c)
output:
9 8 1
3 4 1
This block of code should solve your problem:
f = open(filepath)
for line in f:
    # list() is needed in Python 3, where map returns an iterator
    intList = list(map(int, line.strip().split()))
    print(intList)
f.close()