Convert first 2 elements of tuple to integer - python

I have a csv file with the following structure:
1234,5678,"text1"
983453,2141235,"text2"
I need to convert each line to a tuple and create a list. Here is what I did
with open('myfile.csv') as f1:
    mytuples = [tuple(line.strip().split(',')) for line in f1.readlines()]
However, I want the first 2 columns to be integers, not strings. I was not able to figure out how to continue with this, except by reading the file line by line once again and parsing it. Can I add something to the code above so that I transform str to int as I convert the file to list of tuples?

This is a csv file. Treat it as such.
import csv
with open("test.csv") as csvfile:
    reader = csv.reader(csvfile)
    result = [(int(a), int(b), c) for a, b, c in reader]
If there's a chance your input may not be what you think it is:
import csv
with open('test.csv') as csvfile:
    reader = csv.reader(csvfile)
    result = []
    for line in reader:
        this_line = []
        for col in line:
            # Convert to int where possible; leave other values as strings
            try:
                col = int(col)
            except ValueError:
                pass
            this_line.append(col)
        result.append(tuple(this_line))
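To see why the csv module matters here, the sketch below parses the two sample rows from the question both ways. It uses io.StringIO as a stand-in for the file (an assumption for illustration only), so it runs without test.csv on disk.

```python
import csv
import io

# Stand-in for the file from the question (same two sample rows).
sample = io.StringIO('1234,5678,"text1"\n983453,2141235,"text2"\n')

# csv.reader strips the surrounding quotes from quoted fields.
with_csv = [(int(a), int(b), c) for a, b, c in csv.reader(sample)]

# A plain split keeps the quote characters as part of the string.
sample.seek(0)
with_split = [tuple(line.strip().split(',')) for line in sample]

print(with_csv)
print(with_split)
```

With csv.reader the third column comes back as text1; with a plain split it still carries its double quotes, and a comma inside a quoted field would split the field in two.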

Instead of trying to cram all of the logic in a single line, just spread it out so that it is readable.
with open('myfile.csv') as f1:
    mytuples = []
    for line in f1:
        tokens = line.strip().split(',')
        mytuples.append((int(tokens[0]), int(tokens[1]), tokens[2]))
Real python programmers aren't afraid of using multiple lines.

You can use isdigit() to check whether every character in an element is a digit, and convert the element to int if so. Replace the following:
tuple(line.strip().split(','))
with:
tuple(int(i) if i.isdigit() else i for i in line.strip().split(','))
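One caveat with the isdigit() check is that it returns False for strings with a leading minus sign, so negative numbers stay as strings. A minimal sketch of the difference, assuming a hypothetical helper named to_int_if_possible (not from the original answer) that tries int() and falls back to the string:

```python
def to_int_if_possible(value):
    """Return int(value) if it parses as an integer, else the original string."""
    try:
        return int(value)
    except ValueError:
        return value

# Made-up sample line with a negative first column.
line = '-12,5678,"text1"\n'
fields = line.strip().split(',')

by_isdigit = tuple(int(i) if i.isdigit() else i for i in fields)
by_try = tuple(to_int_if_possible(i) for i in fields)

print(by_isdigit)
print(by_try)
```

Here by_isdigit keeps '-12' as a string while by_try converts it to -12. Which behavior is right depends on the data.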

You can cram this all into one line if you really want, but god help me I don't know why you'd want to. Try giving yourself room to breathe:
def get_tuple(token_list):
    return (int(token_list[0]), int(token_list[1]), token_list[2])

mytuples = []
with open('myfile.csv') as f1:
    for line in f1.readlines():
        token_list = line.strip().split(',')
        mytuples.append(get_tuple(token_list))
Isn't that way easier to read? I like list comprehension as much as the next guy, but I also like knowing what a block of code does when I sit down three weeks later and start reading it!

Related

Python3 - list index out of range - extracting data from file

I want to extract data from a file and change the value of an entry with a 'for-loop'.
f = open(r"C:\Users\Measurement\LOGGNSS.txt", "r")
x=0
content = [[],[]]
for line in f:
    actualline = line.strip()
    content.append(actualline.split(","))
    x+=1
f.close
print(x)
for z in range(x):
    print(z)
    print(content[z][1])
IndexError: list index out of range
Using a real value instead of the variable 'z' works fine. But I need to change all first entries in the whole 2D-Array.
Why does it not work?
Your code has several problems.
First of all, use the with statement to open/close files correctly.
Then, you don't need to use a variable like x to keep track of the number of lines, just use enumerate() instead!
Here is how I would refactor your code to make it slimmer and more readable.
input_file = r"C:\Users\Measurement\LOGGNSS.txt"
content = []
with open(input_file, 'r') as f:
    for line in f:
        clean_line = line.strip().split(",")
        content.append(clean_line)
for z, data in enumerate(content):
    print(z,'\n',data)
Note that you could print the content while reading the file in one single loop.
with open(input_file, 'r') as f:
    for z, line in enumerate(f):
        clean_line = line.strip().split(",")
        content.append(clean_line)
        print(z,'\n', clean_line)
Finally, if you are dealing with a plain and simple csv file, then use the csv module from the standard library.
import csv
with open(input_file, 'r') as f:
    # csv.reader is lazy: build the list while the file is still open
    content = list(csv.reader(f, delimiter=','))
You initialize content with two empty lists, so content[0] and content[1] are both empty and indexing either of them with [1] raises the IndexError. Initialize it as an empty list instead:
content = []
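A minimal reproduction of the error, using a made-up row in place of the file contents (the values "a" and "b" are illustrative only):

```python
content = [[], []]          # two empty rows are already in the list
content.append(["a", "b"])  # so the first real row lands at index 2

try:
    content[0][1]           # row 0 is one of the empty placeholders
except IndexError as exc:
    error = str(exc)

print(error)
print(content[2][1])
```

The lookup fails on the placeholder rows, not on the data read from the file, which is why starting from an empty list fixes it.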

Nested lists in python containing a single string and not single letters

I need to load text from a file which contains several lines, each line containing letters separated by commas, into a 2-dimensional list. When I run this, I get a 2-dimensional list, but the nested lists contain single strings instead of separate values, and I cannot iterate over them. How do I solve this?
def read_matrix_file(filename):
    matrix = []
    with open(filename, 'r') as matrix_letters:
        for line in matrix_letters:
            line = line.split()
            matrix.append(line)
    return matrix
result:
[['a,p,p,l,e'], ['a,g,o,d,o'], ['n,n,e,r,t'], ['g,a,T,A,C'], ['m,i,c,s,r'], ['P,o,P,o,P']]
I need each letter in the nested lists to be a single string so I can use them.
thanks in advance
The split() method splits on whitespace by default. You can fix this by passing the string you want to split on. In this case, that would be a comma. The code below should work.
def read_matrix_file(filename):
    matrix = []
    with open(filename, 'r') as matrix_letters:
        for line in matrix_letters:
            line = line.split(',')
            matrix.append(line)
    return matrix
The input format you described conforms to CSV format. Python has a library just for reading CSV files. If you just want to get the job done, you can use this library to do the work for you. Here's an example:
Input(test.csv):
a,string,here
more,strings,here
Code:
>>> import csv
>>> lines = []
>>> with open('test.csv') as file:
...     reader = csv.reader(file)
...     for row in reader:
...         lines.append(row)
...
>>>
Output:
>>> lines
[['a', 'string', 'here'], ['more', 'strings', 'here']]
Using the strip() function will get rid of the new line character as well:
def read_matrix_file(filename):
    matrix = []
    with open(filename, 'r') as matrix_letters:
        for line in matrix_letters:
            line = line.split(',')
            line[-1] = line[-1].strip()
            matrix.append(line)
    return matrix
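The effect of the strip() call can be seen in a short sketch. It uses io.StringIO with two of the rows from the question as a stand-in for the real file, which is an assumption made so the example runs on its own:

```python
import io

# Stand-in for the matrix file (first two rows from the question).
matrix_file = io.StringIO("a,p,p,l,e\na,g,o,d,o\n")

# Splitting only on commas leaves the newline on the last letter.
raw = [line.split(',') for line in matrix_file]

# Stripping the line first removes it.
matrix_file.seek(0)
clean = [line.strip().split(',') for line in matrix_file]

print(raw[0])
print(clean[0])
```

Stripping the whole line before splitting is equivalent here to stripping only the last element, since the newline can only appear at the end of the line.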

How to append from file into list in Python?

I have a sample file called 'scores.txt' which holds the following values:
10,0,6,3,7,4
I want to be able to somehow take each value from the line, and append it to a list so that it becomes sampleList = [10,0,6,3,7,4].
I have tried doing this using the following code below,
score_list = []
opener = open('scores.txt','r')
for i in opener:
    score_list.append(i)
print (score_list)
which partially works, but for some reason, it doesn't do it properly. It just sticks all the values into one index instead of separate indexes. How can I make it so all the values get put into their own separate index?
You have CSV data (comma separated). Easiest is to use the csv module:
import csv
all_values = []
with open('scores.txt', newline='') as infile:
    reader = csv.reader(infile)
    for row in reader:
        all_values.extend(row)
Otherwise, split the values. Each line you read is a string with the ',' character between the digits:
all_values = []
with open('scores.txt', newline='') as infile:
    for line in infile:
        all_values.extend(line.strip().split(','))
Either way, all_values ends up as a list of strings. If all your values consist only of digits, you could convert them to integers:
all_values.extend(map(int, row))
or
all_values.extend(map(int, line.strip().split(',')))
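Putting the csv variant and the integer conversion together gives the list the question asked for. This sketch reads from io.StringIO holding the sample line instead of scores.txt, an assumption made only so it runs without the file:

```python
import csv
import io

# Stand-in for scores.txt with the sample line from the question.
scores_file = io.StringIO("10,0,6,3,7,4\n")

all_values = []
for row in csv.reader(scores_file):
    # Convert each field of the row to int before adding it to the flat list.
    all_values.extend(map(int, row))

print(all_values)
```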
Here is an efficient way to do that without using any external package:
with open('tmp.txt','r') as f:
    score_list = f.readline().rstrip().split(",")
# Convert to list of int
score_list = [int(v) for v in score_list]
print(score_list)
Just use split on comma on each line and add the returned list to your score_list, like below:
opener = open('scores.txt','r')
score_list = []
for line in opener:
    score_list.extend(map(int,line.rstrip().split(',')))
print( score_list )

Filter a csv file with a list of search terms

I feel like I'm missing something. I have a csv file, and a list of search terms. I just want all rows of data in the csv file that meet a certain condition to be returned.
import csv
search = open("example.txt", "rb")
searchlist = []
for x in search:
    searchlist.append(x)
with open("test.csv", "rb") as f:
    reader = csv.reader(f)
    rows = [row for row in reader]
I create two lists, one containing the search terms, the other containing every row of data of the csv file in a list. I've tried for looping through them both but I feel like this isn't right:
for row in rows:
    for z in searchlist:
        if z not in row:
            print row
Feeling pretty stuck as to how to compare one list to another. If there's an easy way to do this/more pythonic way of writing it, I'd much appreciate an example, as well as why the code above doesn't work.
EDIT:
Okay all sorted thanks to all who inputted, adding finished code for reference:
import re
searchlist = []
with open("example.txt") as g:
    for line in g:
        searchlist.append(line.strip())
pattern = re.compile("|".join(searchlist))
with open("test.csv") as f:
    for line in f:
        if re.search(pattern,line):
            print line
            #line = line.split(",")
            #print line[5]
I would do this:
import re
...
pattern = re.compile("|".join(searchlist))
with open("your_file") as f:
    for line in f:
        if not re.search(pattern, line):
            print(line)
Since it worked for you, I'm adding it as an answer so you can mark it:
for row in rows:
    for z in searchlist:
        if z not in row:
            print row
            break
There are multiple ways to do "fuzzier" matches. It just depends on what you're going for.
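As a rough illustration of what "fuzzier" can mean: `term in row` on a list only matches when a whole field equals the term, whereas a substring or regex test matches inside fields. The sketch below uses made-up search terms and rows (not from the question), and shows re.escape, which keeps characters such as "." in a search term from being read as regex syntax:

```python
import re

# Hypothetical search terms and rows, for illustration only.
searchlist = ["foo", "a.b"]
rows = [["foo bar", "1"], ["axb", "2"], ["a.b", "3"], ["none", "4"]]

# Substring match: keep a row if any term occurs inside any field.
substring_hits = [row for row in rows
                  if any(term in field for term in searchlist for field in row)]

# Regex match with escaped terms, so "." only matches a literal dot.
escaped = re.compile("|".join(re.escape(term) for term in searchlist))
regex_hits = [row for row in rows if any(escaped.search(f) for f in row)]

# Without escaping, "a.b" also matches "axb".
unescaped = re.compile("|".join(searchlist))
unescaped_hits = [row for row in rows if any(unescaped.search(f) for f in row)]

print(substring_hits)
print(regex_hits)
print(unescaped_hits)
```

Whether the unescaped behavior is wanted depends on whether the search terms are meant to be plain text or patterns.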

TypeError in for loop

I'm having trouble with some code, where I have a text file with 633,986 tuples, each with 3 values (example: the first line is -0.70,0.34,1.05). I want to create an array where I take the magnitude of the 3 values in the tuple, so for elements a,b,c, I want magnitude = sqrt(a^2 + b^2 + c^2).
However, I'm getting an error in my code. Any advice?
import math
fname = '\\pathname\\GerrysTenHz.txt'
open(fname, 'r')
Magn1 = [];
for i in range(0, 633986):
    Magn1[i] = math.sqrt((fname[i,0])^2 + (fname[i,1])^2 + (fname[i,2])^2)
TypeError: string indices must be integers, not tuple
You need to open the file properly (use the open file object and the csv module to parse the comma-separated values), read each row and convert the strings into float numbers, then apply the correct formula:
import math, csv
fname = '\\pathname\\GerrysTenHz.txt'
magn1 = []
# Python 3: open in text mode with newline='' (Python 2 used 'rb' here)
with open(fname, newline='') as inputfile:
    reader = csv.reader(inputfile)
    for row in reader:
        magn1.append(math.sqrt(sum(float(c) ** 2 for c in row)))
which can be simplified with a list comprehension to:
import math, csv
fname = '\\pathname\\GerrysTenHz.txt'
# Python 3: open in text mode with newline='' (Python 2 used 'rb' here)
with open(fname, newline='') as inputfile:
    reader = csv.reader(inputfile)
    magn1 = [math.sqrt(sum(float(c) ** 2 for c in row)) for row in reader]
The with statement assigns the open file object to inputfile and makes sure it is closed again when the code block is done.
We add up the squares of the column values with sum(), which is fed a generator expression that converts each column to float() before squaring it.
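As a worked example of that expression, here is the calculation for the first line quoted in the question (-0.70,0.34,1.05), with the row written out by hand as the list of strings csv.reader would yield:

```python
import math

# The first row from the question, as csv.reader would return it.
row = ["-0.70", "0.34", "1.05"]

# Square each value after converting it from str to float.
squares = [float(c) ** 2 for c in row]

# 0.49 + 0.1156 + 1.1025 = 1.7081, so the magnitude is sqrt(1.7081).
magnitude = math.sqrt(sum(squares))

print(round(magnitude, 4))
```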
You need to use the lines of the file and the csv module (as Martijn Pieters points out) to examine each value. This can be done with a list comprehension and with:
with open(fname) as f:
    reader = csv.reader(f)
    magn1 = [math.sqrt(sum(float(i)**2 for i in row)) for row in reader]
Just make sure you import csv as well as math.
To explain the issues you're having (there are quite a few), I'll walk through a more drawn-out way to do this.
You need to use what open returns. open takes a string and returns a file object.
f = open(fname)
I'm assuming the range in your for loop is supposed to be the number of lines in the file. You can instead iterate over each line of the file one by one:
for line in f:
Then, to get the numbers on each line, use the str.split method to split the line on the commas:
    x, y, z = line.split(',')
Convert all three to floats so you can do math with them:
    x, y, z = float(x), float(y), float(z)
Then use the ** operator to raise to a power (in Python, ^ is bitwise XOR, not exponentiation), and take the sqrt of the sum of the three squares:
    n = math.sqrt(x**2 + y**2 + z**2)
Finally, use the append method to add to the back of the list:
    Magn1.append(n)
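The steps above can be sketched as one runnable loop. The file is replaced by io.StringIO holding the first line from the question plus one made-up line (3,4,0), an assumption so the example runs without the original data file:

```python
import io
import math

# Stand-in for open(fname): one line from the question, one made-up line.
f = io.StringIO("-0.70,0.34,1.05\n3,4,0\n")

Magn1 = []
for line in f:
    # Split the line on commas; float() ignores the trailing newline.
    x, y, z = line.split(',')
    x, y, z = float(x), float(y), float(z)
    # ** is exponentiation; ^ would be bitwise XOR and fails on floats.
    n = math.sqrt(x**2 + y**2 + z**2)
    Magn1.append(n)

print(Magn1)
```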
Let's look at fname. That's a string. So if you try to subscript it (i.e., fname[i, 0]), you should use an integer, and you'll get back the character at index i. Since you're using [i, 0] as the string indices, you're passing a tuple. That's no integer!
Really, you should be reading a line from the file, then doing things with that. So,
with open(fname, 'r') as f:  # You're also opening the file and doing nothing with it
    for line in f:
        print('doing something with %s' % line)
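The TypeError from the question can be reproduced without any file at all, since it comes from indexing the filename string. A minimal sketch (the exact wording of the message varies between Python versions, so only the exception type is checked):

```python
fname = '\\pathname\\GerrysTenHz.txt'

# An integer index returns a single character of the path string.
first_char = fname[0]

# fname[0, 0] passes the tuple (0, 0) as the index, which str rejects.
try:
    fname[0, 0]
    raised = False
except TypeError:
    raised = True

print(repr(first_char), raised)
```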
