Python read csv single specific cell - python

I'm trying to read just a single cell so that I can use the date it contains elsewhere.
When I try this with pandas I get an error: the file can't be parsed because the cell I want sits above the actual table, which starts much further down, so pandas doesn't find a consistent dataframe. How can I get just that one cell, e.g. A2?
CSV example

Yes, pandas does not cope well with files that have an inconsistent format (such as a varying number of columns). For this I recommend the csv module from the standard library.
The code below should give you the desired value.
import csv
row_num, col_num = 2, 1  # A2 is row 2, column 1
with open("yourfile.csv", newline="") as f:
    reader = csv.reader(f)
    for _ in range(row_num):
        record = next(reader)  # after the loop, record holds row 2
value = record[col_num - 1]  # because Python indexes start at zero
print(value)
Demo (using a list of strings instead of a file):
import csv
row_num, col_num = 2, 1  # A2 is row 2, column 1
input_str = ["a,b", "c,d,e", "f,g,h,i"]
reader = csv.reader(input_str)
for _ in range(row_num):
    record = next(reader)  # after the loop, record holds row 2
value = record[col_num - 1]  # because Python indexes start at zero
print(value)
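The same idea can be wrapped in a small helper that stops reading as soon as the target row is reached. This is only a sketch and not part of the answer above: `read_cell` and its 1-based `row`/`col` parameters are hypothetical names, `itertools.islice` is swapped in for the explicit `next()` loop, and the sample text is made up.

```python
import csv
import io
from itertools import islice


def read_cell(lines, row, col):
    """Return the cell at 1-based (row, col) from an iterable of CSV lines.

    `lines` can be an open file or any iterable of strings.
    Returns None if the row or column does not exist.
    """
    reader = csv.reader(lines)
    # islice skips the first row-1 records without loading the whole file
    record = next(islice(reader, row - 1, row), None)
    if record is None or not 1 <= col <= len(record):
        return None
    return record[col - 1]


# Demo with an in-memory file; the ragged rows mimic a CSV that has
# a short preamble above the real table.
sample = io.StringIO("report date\n2020-01-31\nid,name,score\n1,Ann,9\n")
print(read_cell(sample, 2, 1))
```

With a real file you would pass the object returned by `open("yourfile.csv", newline="")` instead of the `io.StringIO` stand-in.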

To access a CSV file like a two-dimensional object, first convert it to a 2D Python list:
import csv
with open("imdb.csv", newline="") as f:
    csv_as_list = list(csv.reader(f, delimiter=","))
print(csv_as_list[3][2])
The format for accessing an element is csv_as_list[row_index][column_index], with both indexes starting at zero.
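Since the question asks for a spreadsheet-style cell such as A2, it may help to translate that notation into list indexes. The sketch below is an illustration only: `cell_to_indices` is a hypothetical helper, and it assumes the usual convention of column letters followed by a 1-based row number.

```python
import csv
import re


def cell_to_indices(ref):
    """Convert a reference such as "A2" or "AB10" into
    zero-based (row_index, column_index)."""
    match = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", ref)
    if match is None:
        raise ValueError(f"not a cell reference: {ref!r}")
    letters, digits = match.groups()
    col = 0
    for ch in letters.upper():
        # Column letters work like a base-26 number with A=1 ... Z=26
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits) - 1, col - 1


rows = list(csv.reader(["a,b", "c,d,e", "f,g,h,i"]))
r, c = cell_to_indices("A2")
print(rows[r][c])
```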

Related

How can I create an endless array?

I'm trying to create an array in Python, so I can access the last cell in it without defining how many cells there are in it.
Example:
from csv import reader
a = []
i = -1
with open("ccc.csv","r") as f:
    csv_reader = reader(f)
    for row in csv_reader:
        a[i] = row
        i = i-1
Here I'm trying to take the first row of the CSV file and put it in the last cell of the array, so that I can write the rows to another file in reverse order.
I don't know how many rows the CSV file has, so I can't size the array to the number of rows in the file.
I tried to use f.append(row), but it inserts the values into the first cell of the array, and I want it to insert them into the last cell.
Read all the rows in the normal order, and then reverse the list:
from csv import reader
with open('ccc.csv') as f:
    a = list(reader(f))
a.reverse()
First up, your current code is going to raise an IndexError because the list has no elements, so a[-1] points to nothing at all.
The method you're looking for is list.insert. It takes two arguments: the index to insert the value before and the value to be inserted.
To rewrite your current code for this, you'd end up with something like
from csv import reader
a = []
with open("ccc.csv", "r") as f:
    csv_reader = reader(f)
    for row in csv_reader:
        a.insert(0, row)  # each new row goes in front of the earlier ones
This reverses the contents of the CSV file, which you can then write to a new file or use as you need.
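Since the stated goal is to write the rows to another file in reverse order, the full round trip might look roughly like the sketch below. `reverse_rows` is a hypothetical name, and in-memory `io.StringIO` objects stand in for the real files. It reads everything first and then uses `reversed()`, which avoids the shifting of existing elements that `a.insert(0, row)` performs on every call.

```python
import csv
import io


def reverse_rows(src, dst):
    """Copy the CSV rows from src to dst in reverse order."""
    rows = list(csv.reader(src))
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerows(reversed(rows))


src = io.StringIO("1,a\n2,b\n3,c\n")
dst = io.StringIO()
reverse_rows(src, dst)
print(dst.getvalue())
```

With real files, both would be opened with `newline=""` as the csv documentation recommends, for example `open("ccc.csv", newline="")` for reading.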

Finding Row with No Empty Strings

I am trying to determine the type of data contained in each column of a .csv file so that I can make CREATE TABLE statements for MySQL. The program makes a list of all the column headers and then grabs the first row of data and determines each data type and appends it to the column header for proper syntax. For example:
ID  Number  Decimal  Word
0   17      4.8      Joe
That would produce something like CREATE TABLE table_name (ID int, Number int, Decimal float, Word varchar());.
The problem is that in some of the .csv files the first row contains a NULL value, which is read as an empty string and breaks this process. My goal is to search the rows until one is found that contains no NULL values and use that one when forming the statement. This is what I have so far, except that it sometimes still returns rows that contain empty strings:
def notNull(p): # where p is a .csv file that has been read in another function
    tempCol = next(p)
    tempRow = next(p)
    col = tempCol[:-1]
    row = tempRow[:-1]
    if any('' in row for row in p):
        tempRow = next(p)
        row = tempRow[:-1]
    else:
        rowNN = row
    return rowNN
Note: The .csv file reading is done in a different function, whereas this function simply uses the already read .csv file as input p. Also each row is ended with a , that is treated as an extra empty string so I slice the last value off of each row before checking it for empty strings.
Question: What is wrong with the function that I created that causes it to not always return a row without empty strings? I feel that it is because the loop is not repeating itself as necessary but I am not quite sure how to fix this issue.
I cannot really decipher your code. This is what I would do to get only the rows without an empty string.
import csv
def g(name):
    with open(name, 'r', newline='') as f:
        r = csv.reader(f)
        # Skip headers
        next(r)
        for row in r:
            if '' not in row:
                yield row
for row in g('file.csv'):
    print('row without empty values: {}'.format(row))
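One likely reason the function in the question misbehaves is that `any('' in row for row in p)` iterates over the reader itself, so it consumes further rows from `p` instead of testing the one row already fetched, and the `if` runs once rather than repeating. A minimal sketch of a loop that returns the first fully populated row is shown here; `first_complete_row` and `strip_trailing` are hypothetical names, and the slicing assumes every row ends with an extra empty field, as the question describes.

```python
import csv


def first_complete_row(reader, strip_trailing=True):
    """Return (header, row) for the first data row with no empty strings.

    Returns (header, None) when every data row has an empty field.
    """
    header = next(reader)
    if strip_trailing:
        header = header[:-1]  # drop the empty field after the trailing comma
    for row in reader:
        if strip_trailing:
            row = row[:-1]
        if row and "" not in row:
            return header, row
    return header, None


lines = [
    "ID,Number,Decimal,Word,",
    "0,,4.8,Joe,",
    "1,17,,Ann,",
    "2,5,1.5,Bob,",
]
header, row = first_complete_row(csv.reader(lines))
print(header, row)
```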

Python - CSV to Matrix

Can you help me with this problem?
I'm new to programming and want to find out how to create a matrix that looks like this:
matrix = {"hello":["one","two","three"],
          "world": ["five","six","seven"],
          "goodbye":["one","two","three"]}
I want to import a CSV file that contains all the strings (one, two three,...). I tried the split method, but I'm not getting there...
Another problem is the names of the categories (hello, world, goodbye).
Do you have any suggestions?
Have you looked into the csv module?
https://docs.python.org/2/library/csv.html
import csv
TEST_TEXT = """\
hello,one,two,three
world,four,five,six
goodbye,one,two,three"""
TEST_FILE = TEST_TEXT.split("\n")
# File objects iterate over lines anyway,
# so this behaves the same way an opened file would.
# This is the minimum needed to use the csv reader object:
for row in csv.reader(TEST_FILE):
    print(row)
# Or, to get a list of all the rows, you can use this:
as_list = list(csv.reader(TEST_FILE))
# Split off the first element and use it as the key in a dictionary
dict_I_call_matrix = {row[0]: row[1:] for row in csv.reader(TEST_FILE)}
print(dict_I_call_matrix)
# The same without the csv module (this does not handle quoted fields)
without_csv = [row.split(",") for row in TEST_FILE]  # ...row in TEST_TEXT.split("\n")]
matrix = [row[1:] for row in without_csv]
labels = [row[0] for row in without_csv]

Add to Values in An Array in a CSV File

I imported my CSV File and made the data into an array. Now I was wondering, what can I do so that I'm able to print a specific value in the array? For instance if I wanted the value in the 2nd row, 2nd column.
Also how would I go about adding the two values together? Thanks.
import csv
import numpy as np
f = open("Test.csv")
csv_f = csv.reader(f)
for row in csv_f:
    print(np.array(row))
f.close()
For a simple file without quoted fields there is no need to use the csv module.
This code reads the CSV file and prints the value of the cell in the second row and second column. I am assuming that fields are separated by commas.
with open("Test.csv") as fo:
    table = [row.split(",") for row in fo.read().replace("\r", "").split("\n")]
print(table[1][1])
So, I grabbed a dataset ("Company Funding Records") from here. Then, I just rewrote a little...
#!/usr/bin/python
import csv
#import numpy as np
csvaslist = []
f = open("TechCrunchcontinentalUSA.csv")
csv_f = csv.reader(f)
for row in csv_f:
    # print(np.array(row))
    csvaslist.append(row)
f.close()
# Now your data is in a list of lists. Everything past this point is just playing
# Add together a couple of arbitrary values...
print(int(csvaslist[2][7]) + int(csvaslist[11][7]))
# Add using a conditional...
print("\nNow let's see what Facebook has received...")
fbsum = 0
for sublist in csvaslist:
    if sublist[0] == "facebook":
        print(sublist)
        fbsum += int(sublist[7])
print("Facebook has received", fbsum)
I've commented lines at a couple points to show what's being used and what was unneeded. Notice at the end that referring to a particular datapoint is simply a matter of referencing what is, effectively, original_csv_file[line_number][field_on_that_line], and then recasting as int, float, whatever you need. This is because the csv file has been changed to a list of lists.
To get specific values within your array/file, and add them together:
import csv
with open("Test.csv", newline="") as f:
    csv_f = list(csv.reader(f))
# Value in the second row, second column of your file
print(csv_f[1][1])
# Sum of two specific values (here: second row, second column plus first row, first column)
total = int(csv_f[1][1]) + int(csv_f[0][0])
print(total)
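Values returned by csv.reader are always strings, so any arithmetic needs an explicit conversion, and a header or blank cell will make `int()` or `float()` raise ValueError. The sketch below sums one column for the rows that match a key and skips cells that do not convert. `sum_column_where` is a hypothetical helper and the sample rows are invented for illustration.

```python
import csv


def sum_column_where(rows, key_col, key, value_col):
    """Sum row[value_col] as float over rows where row[key_col] == key.

    Rows that are too short, and cells that are not numeric, are skipped.
    """
    total = 0.0
    for row in rows:
        if len(row) <= max(key_col, value_col) or row[key_col] != key:
            continue
        try:
            total += float(row[value_col])
        except ValueError:
            continue  # blank or non-numeric cell
    return total


data = list(csv.reader([
    "company,amount",
    "facebook,500",
    "other,100",
    "facebook,1500.5",
    "facebook,",
]))
print(sum_column_where(data, 0, "facebook", 1))
```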

Get Column from .txt File using CSV.Reader in Python

I've got a text file that looks like this:
162.8008 EXP Set primary_image image=stimulus/Faces/face046.jpg
162.8008 EXP Set secondary_image image=stimulus/Scenes/scene57.jpg
162.8008 EXP Set primary_image opacity=1.0
162.8008 EXP Set secondary_image opacity=0.0
162.8008 EXP Set stimulus_instr text=press for repeated faces
And I've read it in like this:
import csv
log_data = []
with open('../filename.log.txt', newline='') as f:
    reader = csv.reader(f, delimiter='\t')
    for row in reader:
        log_data.append(row)
I want to access JUST that third column. Right now, when I say:
print(log_data[2][:])
I get back all of a single row, like:
['8.8093', 'EXP', 'Started presenting text_2']
And when I switch and instead do:
print(log_data[:][2])
I get the exact same result! I've been trying to convert it to an array with numpy and to use a .split() function, with no luck. Any expertise would be greatly appreciated - thanks a lot!
How about
print([row[2] for row in log_data])
If you use numpy, the following should be OK
#!/usr/bin/env python
import numpy as np
dat = np.genfromtxt('data.txt', delimiter='\t', dtype=str)
print(dat[:, 2])
Result: ['Set' 'Set' 'Set' 'Set' 'Set']
The post "How to use numpy.genfromtxt when first column is string and the remaining columns are numbers?" might be of some help.
Writing log_data[2][:] is the equivalent of writing
n = len(log_data[2])
print(log_data[2][0:n])
That is, you are telling it to print every element within row 2. If you want to access only the third column of that row, you need to use index 2, because indexes start at zero:
print(log_data[2][2])
If you want to loop over the data
for row in log_data:
    # process row
    for col in row:
        pass  # process each column
The reverse case that you mention, log_data[:][2], first takes a slice that copies the whole list and then prints row 2 of that slice, which is the equivalent of
n = len(log_data)
print(log_data[0:n][2])
Numpy is not needed, and for the given data set I don't see why you would choose to use it.
def get_column(n, data):
    return [row[n] for row in data]
print(get_column(2, log_data)) # => ["Set", "Set", "Set", ...]
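If several columns are needed at once, the row-oriented list can be transposed so that each column becomes its own tuple. This is a sketch using `itertools.zip_longest`, which pads short rows with a fill value, whereas plain `zip` would stop at the shortest row. The sample rows, including the deliberately short one, are made up for illustration.

```python
from itertools import zip_longest

log_data = [
    ["162.8008", "EXP", "Set primary_image opacity=1.0"],
    ["162.8008", "EXP", "Set secondary_image opacity=0.0"],
    ["163.0000", "DATA"],  # a short row, to show the padding
]

# Transpose rows into columns; missing cells become None
columns = list(zip_longest(*log_data, fillvalue=None))

print(columns[2])
```

Here `columns[2]` plays the same role as `get_column(2, log_data)`, except that a row lacking a third field yields None instead of raising IndexError.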
