convert csv string of numbers into values - python

I am trying to read a line of numbers in a csv file, then call on them individually to compute with them. Here is my code so far:
import sys
import os
import csv
import numpy as np
with open('data.csv') as csv_file:
    csv_reader = csv.reader(csv_file)
    for line in csv_reader:
        x = np.array(line[3])
        print(x)
Within this line of the CSV file there are numbers with decimals, like 4.65, that I need to use in the calculations. I tried things like print(5 + x[14]), but it doesn't work.
I assume I need to convert the line into a list of integers or something. Please help.
Thank you in advance.

According to your example line, you want to add delimiter=' ' to the csv.reader() call:
csv_data = csv.reader(csv_file, delimiter=' ')
Taking a guess at the structure of your csv, but under the assumption that it contains no headings and you want to keep the decimals:
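import csv
with open('new_data.txt') as csv_file:
    csv_data = csv.reader(csv_file, delimiter=' ')
    for line in csv_data:
        # csv.reader has already split the line into a list of strings;
        # skip empty fields produced by repeated spaces
        x = [float(n) for n in line if n]
        print(x)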
This will fail with a ValueError if the line contains non-numeric values, such as 'A string'.
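If it helps to see the conversion end to end, here is a minimal sketch that uses io.StringIO in place of the real file. The sample data and the parse_numbers name are made up for illustration, and it assumes the numbers are comma-separated; adjust the delimiter to match your file.

```python
import csv
import io

def parse_numbers(text):
    """Return one list of floats per non-empty CSV row (hypothetical helper)."""
    rows = []
    for row in csv.reader(io.StringIO(text)):
        if row:  # csv.reader yields [] for blank lines; skip them
            rows.append([float(field) for field in row])
    return rows

sample = "1.5,2.25,4.65\n3,7.5,0.1\n"  # made-up data, comma-separated
rows = parse_numbers(sample)
print(rows[0][2] + 5)  # arithmetic works now that the fields are floats
```

Using float() rather than int() keeps decimals like 4.65 intact.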

Here is an alternative to @GiantsLoveDeathMetal's solution using map (it also shows how to give us a copy/paste-able sample of your CSV file with io.StringIO):
EDIT: the StringIO now contains data in a column, with empty rows.
import csv
import io
f = io.StringIO("""
7.057
7.029
5.843
5.557
4.186
4.1
2.286""")
csv_reader = csv.reader(f, delimiter=' ')
for line in csv_reader:
    line = list(map(float, filter(None, line)))
    print(line)
In Python 2 (and in some Python 3 cases where your code can work with an iterator, which is not the case in this example), you don't need to convert the result of map to a list.
So line = list(map(float, line)) can be replaced by map(float, line), which can be considered cleaner and more explicit than a list comprehension (or a generator expression).
For instance, this works:
import csv
import io
f = io.StringIO("""7.057 7.029 5.843 5.557 4.186 4.1 2.286""")
csv_reader = csv.reader(f, delimiter=' ')
for line in csv_reader:
    line = map(float, line)
    print(sum(line))
# 36.05800000000001
If you're interested in the map vs list comprehension debate, here you go for more details.
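One thing to watch in Python 3: map() returns a lazy iterator, so it can only be consumed once. This small sketch with made-up values shows what happens if you reuse it, and why wrapping it in list() is needed when the converted row is used more than once.

```python
import csv
import io

f = io.StringIO("7.057 7.029 5.843\n")  # made-up one-line sample
row = next(csv.reader(f, delimiter=' '))

lazy = map(float, row)    # Python 3: a lazy iterator, not a list
first_total = sum(lazy)   # consumes the iterator
second_total = sum(lazy)  # nothing left, so this is 0

values = list(map(float, row))     # materialize once if you need it again
print(second_total)                # 0
print(sum(values) == first_total)  # True
```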

Related

Converting CSV into Array in Python

I have a CSV file like the one below. It is a small CSV file, and I have uploaded it here.
I am trying to convert the CSV values into an array.
My expected output is like this:
My solution
import csv
results = []
with open("Solutions10.csv") as csvfile:
    reader = csv.reader(csvfile, quoting=csv.QUOTE_NONNUMERIC)  # change contents to floats
    for row in reader:  # each row is a list
        results.append(row)
but I am getting a
ValueError: could not convert string to float: ' [1'
There is a problem with your CSV: it's not really CSV (comma-separated values). To parse it, you need some cleaning first:
import re
# if you expect only integers
pattern = re.compile(r'\d+')
# if you expect floats (uncomment below)
# pattern = re.compile(r'\d+\.*\d*')
result = []
with open(filepath) as csvfile:
    for row in csvfile:
        result.append([
            int(val.group(0))
            # float(val.group(0))
            for val in re.finditer(pattern, row)
        ])
print(result)
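A caveat on the float pattern above: it ignores minus signs, so -2.5 comes back as 2.5. The sketch below compares it with a signed variant, -?\d+(?:\.\d+)?, which is my own assumption rather than part of the answer (it still does not handle exponents like 1e-3). The sample lines are made up, since the real Solutions10.csv is not shown.

```python
import io
import re

# Float pattern suggested in the answer above (no sign handling).
unsigned = re.compile(r'\d+\.*\d*')
# Hypothetical variant that also keeps a leading minus sign.
signed = re.compile(r'-?\d+(?:\.\d+)?')

# Made-up stand-in for Solutions10.csv, one bracketed list per line.
sample = "[1, -2.5, 3]\n[4.5, 6, 7]\n"

def parse(text, pattern):
    return [[float(m.group(0)) for m in pattern.finditer(line)]
            for line in io.StringIO(text)]

print(parse(sample, unsigned))  # [[1.0, 2.5, 3.0], [4.5, 6.0, 7.0]]
print(parse(sample, signed))    # [[1.0, -2.5, 3.0], [4.5, 6.0, 7.0]]
```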
You can also solve this with substrings if it's easier for you and you know the format exactly.
Note: I also see an eval suggestion. Please be careful with it, as you can get into a lot of trouble if you evaluate unknown or untrusted files...
You can do this:
with open("Solutions10.csv") as csvfile:
    result = [eval(k) for k in csvfile.readlines()]
Edit: Karl is cranky and wants you to do this:
with open("Solutions10.csv") as csvfile:
    result = []
    for line in csvfile.readlines():
        line = line.replace("[", "").replace("]", "")
        result.append([int(k) for k in line.split(",")])
But you're the programmer, so you can do what you want. If you trust your input file, eval is fine.
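If the lines really are Python-style lists, a middle ground between eval and manual string cleaning is ast.literal_eval from the standard library, which only accepts literals (numbers, strings, lists, tuples, dicts, and so on) and refuses anything that would execute code. This is a swapped-in technique, not one of the answers above, and the sample data is hypothetical.

```python
import ast
import io

# Made-up stand-in for Solutions10.csv: one Python-style list per line.
csvfile = io.StringIO("[1, 2, 3]\n[4, 5, 6]\n")

# literal_eval parses literals only; blank lines are skipped.
result = [ast.literal_eval(line.strip()) for line in csvfile if line.strip()]
print(result)  # [[1, 2, 3], [4, 5, 6]]

# A line that tries to call a function is rejected instead of executed.
unsafe = "__import__('os').getcwd()"
try:
    ast.literal_eval(unsafe)
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```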

Store in array instead of printing

I am trying to build an array from a for loop with an if condition over data read from a CSV file. In the code below, I print the results. Instead of printing the results, I would like to store them in an array. How do I do this?
import csv
with open('_Stocks.csv') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for line in csv_reader:
        if float(line['Rel Volume']) > 2.5:
            print(line['tckr'], line['Rel Volume'])
Try it like this:
arr = []
with open('_Stocks.csv') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for line in csv_reader:
        if float(line['Rel Volume']) > 2.5:
            # append() takes a single argument, so store the pair as a tuple
            arr.append((line['tckr'], line['Rel Volume']))
If you want a nested list of lists (where each inner list contains two elements), do this:
my_list = []
with open('_Stocks.csv') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for line in csv_reader:
        if float(line['Rel Volume']) > 2.5:
            my_list.append([line['tckr'], line['Rel Volume']])
If you want a flat (one-dimensional) list, do this:
my_list = []
with open('_Stocks.csv') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for line in csv_reader:
        if float(line['Rel Volume']) > 2.5:
            my_list.extend([line['tckr'], line['Rel Volume']])
The only difference between the two examples is that one uses extend() and the other uses append(). Note that in each case, we pass a single list to whichever method we choose, by putting square brackets around line['tckr'],line['Rel Volume'].
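To see the difference concretely, here is a sketch that runs both loops over a small made-up sample in place of _Stocks.csv (the data is hypothetical; only the column names come from the question).

```python
import csv
import io

# Hypothetical stand-in for _Stocks.csv.
sample = "tckr,Rel Volume\nabcd,3.2\nefgh,1.1\nijkl,4.2\n"

nested, flat = [], []
for line in csv.DictReader(io.StringIO(sample)):
    if float(line['Rel Volume']) > 2.5:
        nested.append([line['tckr'], line['Rel Volume']])  # one inner list per row
        flat.extend([line['tckr'], line['Rel Volume']])    # items spliced in

print(nested)  # [['abcd', '3.2'], ['ijkl', '4.2']]
print(flat)    # ['abcd', '3.2', 'ijkl', '4.2']
```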
You can use a list comprehension to do this! My example below uses a namedtuple, but that part is optional.
import csv
from collections import namedtuple
CSVLine = namedtuple('CSVLine', ['tckr', 'Rel_Volume'])
with open('_Stocks.csv') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    # DictReader yields strings, so convert before comparing with 2.5
    csvfilter = [CSVLine(line['tckr'], float(line['Rel Volume'])) for line in csv_reader if float(line['Rel Volume']) > 2.5]
Using the namedtuple, your array data will look something like this:
CSVLine(tckr='abcd', Rel_Volume=3.2), CSVLine(tckr='efgh', Rel_Volume=3.0), CSVLine(tckr='ijkl', Rel_Volume=4.2)
Without the namedtuple, it will simply look like this:
with open('_Stocks.csv') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    csvfilter = [(line['tckr'], float(line['Rel Volume'])) for line in csv_reader if float(line['Rel Volume']) > 2.5]
I used a tuple in both examples, because I assumed you would want to pair the data from each line together, for later use.
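One detail worth knowing: csv.DictReader returns every value as a string, so Rel Volume has to go through float() before it is compared with 2.5 (in Python 3, comparing a str with a float raises TypeError). The sketch below assumes a small made-up sample and converts first, then filters, which produces namedtuples shaped like the example output above.

```python
import csv
import io
from collections import namedtuple

CSVLine = namedtuple('CSVLine', ['tckr', 'Rel_Volume'])

# Hypothetical stand-in for _Stocks.csv.
sample = "tckr,Rel Volume\nabcd,3.2\nefgh,3.0\nxyz,1.0\nijkl,4.2\n"

rows = csv.DictReader(io.StringIO(sample))
# Convert once, then filter on the numeric value.
converted = [CSVLine(r['tckr'], float(r['Rel Volume'])) for r in rows]
csvfilter = [c for c in converted if c.Rel_Volume > 2.5]
print(csvfilter)
# [CSVLine(tckr='abcd', Rel_Volume=3.2), CSVLine(tckr='efgh', Rel_Volume=3.0), CSVLine(tckr='ijkl', Rel_Volume=4.2)]
```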
The pandas module is great for data munging:
# -*- coding: utf-8 -*-
"""
Created on Fri Mar 09 17:21:57 2018
@author: soyab
"""
import pandas as pd
#%%
## Load as a pandas DataFrame; select rows based on column logic
new_df = pd.read_csv('_Stocks.csv')
df_over_two_five = new_df.loc[new_df['Rel Volume'] > 2.5].copy()
print(df_over_two_five)
It's good to post a chunk of your data with a question like this. Then I can make sure to catch silly errors.

How to read just the first column of each row of a CSV file [duplicate]

This question already has answers here:
Read in the first column of a CSV in Python
(5 answers)
Closed 3 years ago.
How to read just the first column of each row of a CSV file in Python?
My data is something like this:
1 abc
2 bcd
3 cde
and I only need to loop through the values of the first column.
Also, when I open the CSV file in Calc, the data in each row is all in the same cell. Is that normal?
import csv
with open(file) as f:
    reader = csv.reader(f, delimiter="\t")
    for i in reader:
        print(i[0])
Or change the delimiter to a space if necessary:
reader = csv.reader(f, delimiter=" ")
Without the csv module:
with open(file) as f:
    for line in f:
        print(line.split()[0])
You can use itertools.izip to create an iterator over the columns and use next() to get the first column. It can be more efficient if you have large data and want to avoid indexing every row yourself (note, though, that *spamreader still reads every row into memory to unpack them).
import csv
from itertools import izip  # Python 2 only
with open('ex.csv', 'rb') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=' ')
    print(next(izip(*spamreader)))
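izip only exists in Python 2. A Python 3 sketch of the same idea uses the built-in zip; the sample data below is made up to match the question's layout.

```python
import csv
import io

# Made-up space-delimited sample matching the question.
f = io.StringIO("1 abc\n2 bcd\n3 cde\n")
reader = csv.reader(f, delimiter=' ')

# zip(*reader) transposes rows into columns; next() takes the first column.
first_column = next(zip(*reader))
print(first_column)  # ('1', '2', '3')
```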
To get just the first column as a list:
with open('myFile.csv') as f:
    firstColumn = [line.split(',')[0] for line in f]
For the second part of your question:
When opening CSV documents in LibreOffice Calc (OpenOffice should work the same way), I get a dialog that asks a few things about the document, such as the character encoding and the type of separator. If you select "space", it should work. There is a preview at the bottom of this dialog.

Python and excel reading files problem

I am sorry if this is a silly question but I have been working on this for hours and I cannot make it work. Please help!
I have a .txt file that originated from Excel. The file contains strings and numbers but I am only interested in the numbers, which is why I skip the first line and I only read from column 2 on.
from numpy import *
I load it into Python by doing:
infile = open('europenewMatrix.txt','r')
infile.readline() # skip the first line
numbers = [line.split(',')[2:] for line in infile.readlines()]
infile.close()
because I need to do computations with this, I convert it into a matrix:
travelMat = array(numbers)
OK, but this didn't convert the strings into integers, so I do it manually:
for i in xrange(len(numbers)):
    for j in xrange(len(numbers)):
        travelMat[i,j] = int(self.travelMat[i,j])
#end for
At this point, I was hoping that all my entries would be integers
but if I do
print 'type is',type(self.travelMat[1,2])
the answer is:
type is <type 'numpy.string_'>
how can I really convert all my entries into integers?
thanks a lot!
Convert the numbers as you read them, before creating the array:
infile = open('europenewMatrix.txt','r')
infile.readline() # skip the first line
numbers = []
for line in infile:
    numbers.append([int(val) for val in line.split(',')[2:]])
infile.close()
travelMat = array(numbers)
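The reason the loop in the question doesn't help: np.array(numbers) creates an array with a string dtype, and assigning an int into a string array just stores it back as a string. Besides converting while reading, you can convert the whole array in one step with astype(int). A small sketch with made-up values:

```python
import numpy as np

# Made-up rows of string fields, as produced by line.split(',')[2:].
numbers = [['10', '20'], ['30', '40']]

as_strings = np.array(numbers)  # dtype is a string type such as '<U2'
as_strings[0, 0] = 99           # stored back as the string '99'
str_kind = as_strings.dtype.kind
print(str_kind)                 # U

travelMat = np.array(numbers).astype(int)  # convert the whole array at once
print(travelMat.dtype.kind)                # i
```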
If you're working with a csv or csv-like file, use the csv standard library module.
from numpy import *
import csv
infile = open('europenewMatrix.txt', 'r')
reader = csv.reader(infile)
next(reader)  # skip the first line
numbers = [[int(num) for num in row[2:]] for row in reader]
infile.close()
travelmat = array(numbers)
http://docs.python.org/library/csv.html
If someone has a question with the same title but uses real Excel (.xls) files, try this (using the xlrd module):
import xlrd
import numpy as np
sheet = xlrd.open_workbook('test_readxls.xls').sheet_by_name('sheet1')
n_rows, n_cols = 5,2
data = np.zeros((n_rows, n_cols))
for row in range(n_rows):
    for col in range(n_cols):
        data[row, col] = float(sheet.cell(row, col).value)

How can I get a specific field of a csv file?

I need a way to get a specific item (field) of a CSV. Say I have a CSV with 100 rows and 2 columns (comma-separated): the first column is emails, the second column passwords. For example, I want to get the password of the email in row 38, so I need only the item from the 2nd column of row 38...
Say I have a csv file:
aaaaa@aaa.com,bbbbb
ccccc@ccc.com,ddddd
How can I get only 'ddddd' for example?
I'm new to the language and tried some stuff with the csv module, but I don't get it...
import csv
mycsv = csv.reader(open(myfilepath))
for row in mycsv:
    text = row[1]
Following the comments on the SO question here, a better, more robust version would be:
import csv
# Python 2; in Python 3 open the file with open(myfilepath, newline='') instead
with open(myfilepath, 'rb') as f:
    mycsv = csv.reader(f)
    for row in mycsv:
        text = row[1]
        ............
Update: If what the OP actually wants is the last string in the last row of the CSV file, there are several approaches that don't necessarily need csv. For example:
fulltxt = open(myfilepath).read()
laststring = fulltxt.split(',')[-1]
This is not good for very big files because you load the complete text in memory but could be ok for small files. Note that laststring could include a newline character so strip it before use.
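If the file is too big to read in one go, a memory-friendly alternative (my own suggestion, not from the answer) is collections.deque with maxlen=1, which keeps only the most recent row while csv.reader walks the file. The sample below stands in for the real file.

```python
import csv
import io
from collections import deque

# Made-up stand-in for the question's file.
f = io.StringIO("aaaaa@aaa.com,bbbbb\nccccc@ccc.com,ddddd\n")

# deque(maxlen=1) discards older rows as new ones arrive,
# so only one row is held in memory at a time.
last_row = deque(csv.reader(f), maxlen=1)[0]
print(last_row[-1])  # ddddd
```

Because csv.reader strips the line ending, no extra strip() is needed here.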
And finally, if what the OP wants is the second string in line n (for n=2):
Update 2: This is now the same code as the one in the answer from J.F. Sebastian (the credit goes to him):
import csv
line_number = 2
with open(myfilepath, 'rb') as f:
    mycsv = csv.reader(f)
    mycsv = list(mycsv)
    text = mycsv[line_number][1]
    ............
#!/usr/bin/env python
"""Print a field specified by row, column numbers from given csv file.

USAGE:
    %prog csv_filename row_number column_number
"""
import csv
import sys

filename = sys.argv[1]
# convert the 1-based command-line numbers to 0-based indices
row_number, column_number = [int(arg, 10) - 1 for arg in sys.argv[2:]]
with open(filename, 'rb') as f:
    rows = list(csv.reader(f))
print(rows[row_number][column_number])
Example
$ python print-csv-field.py input.csv 2 2
ddddd
Note: list(csv.reader(f)) loads the whole file in memory. To avoid that you could use itertools:
import itertools
# ...
with open(filename, 'rb') as f:
    row = next(itertools.islice(csv.reader(f), row_number, row_number + 1))
print(row[column_number])
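The same islice idea as a small Python 3 function, assuming 0-based row and column indices. get_field is a hypothetical name, and io.StringIO stands in for the real file; with a real file in Python 3, open it with open(path, newline='') rather than 'rb'.

```python
import csv
import io
from itertools import islice

def get_field(f, row_number, column_number):
    """Return one field by 0-based row/column without loading the whole file."""
    row = next(islice(csv.reader(f), row_number, row_number + 1), None)
    if row is None:
        raise IndexError("row %d does not exist" % row_number)
    return row[column_number]

f = io.StringIO("aaaaa@aaa.com,bbbbb\nccccc@ccc.com,ddddd\n")
print(get_field(f, 1, 1))  # ddddd

try:
    get_field(io.StringIO("aaaaa@aaa.com,bbbbb\n"), 5, 0)
except IndexError as exc:
    print(exc)  # row 5 does not exist
```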
import csv

def read_cell(x, y):
    with open('file.csv', 'r') as f:
        reader = csv.reader(f)
        y_count = 0
        for n in reader:
            if y_count == y:
                cell = n[x]
                return cell
            y_count += 1

print(read_cell(4, 8))
This example prints the cell in column 4, row 8 (both 0-based) in Python 3.
There is an important point to note about the csv.reader() object: it is not a list, and it is not subscriptable.
This works:
for r in csv.reader(file_obj):  # file not closed
    print(r)
This does not:
r = csv.reader(file_obj)
print(r[0])
So you first have to convert it to a list to make the above code work:
r = list(csv.reader(file_obj))
print(r[0])
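A quick Python 3 check of the point above, using io.StringIO as a stand-in for file_obj:

```python
import csv
import io

reader = csv.reader(io.StringIO("a,b\nc,d\n"))
try:
    reader[0]  # a reader is an iterator, not a sequence
    subscript_failed = False
except TypeError as exc:
    subscript_failed = True
    print("not subscriptable:", exc)

rows = list(csv.reader(io.StringIO("a,b\nc,d\n")))
print(rows[0])  # ['a', 'b']
```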
Finally, I got it!
import csv

def select_index(index):
    with open('oscar_age_female.csv', 'r') as csv_file:
        csv_reader = csv.DictReader(csv_file)
        for line in csv_reader:
            l = line['Index']
            if l == index:
                print(line[' "Name"'])

select_index('11')
"Bette Davis"
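The odd ' "Name"' key comes from the space after each comma in the file. A sketch, assuming the file looks like the made-up sample below: passing skipinitialspace=True (a standard csv dialect option) makes the reader skip that space, so the quotes are treated as quoting and the key becomes plain Name.

```python
import csv
import io

# Made-up sample shaped like oscar_age_female.csv, with a space after each comma.
sample = 'Index, "Name"\n11, "Bette Davis"\n12, "Katharine Hepburn"\n'

plain = next(csv.DictReader(io.StringIO(sample)))
print(list(plain))  # ['Index', ' "Name"']

clean = next(csv.DictReader(io.StringIO(sample), skipinitialspace=True))
print(clean['Name'])  # Bette Davis
```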
The following may be what you are looking for:
import pandas as pd
df = pd.read_csv("table.csv")
print(df["Password"][row_number])
#where row_number is 38 maybe
import csv
inf = csv.reader(open('yourfile.csv', 'r'))
for row in inf:
    print(row[1])
