Python and excel reading files problem - python

I am sorry if this is a silly question but I have been working on this for hours and I cannot make it work. Please help!
I have a .txt file that originated from Excel. The file contains strings and numbers but I am only interested in the numbers, which is why I skip the first line and I only read from column 2 on.
I load it into Python doing
from numpy import *
infile = open('europenewMatrix.txt','r')
infile.readline() # skip the first line
numbers = [line.split(',')[2:] for line in infile.readlines()]
infile.close()
because I need to do computations with this, I convert it into a matrix:
travelMat = array(numbers)
ok, but this didn't convert the strings into integers, so I manually do it:
for i in xrange(len(numbers)):
    for j in xrange(len(numbers)):
        travelMat[i,j] = int(self.travelMat[i,j])
    #end for
At this point, I was hoping that all my entries would be integers
but if I do
print 'type is',type(self.travelMat[1,2])
the answer is:
type is <type 'numpy.string_'>
how can I really convert all my entries into integers?
thanks a lot!

Convert the numbers as you read them, before creating the array:
infile = open('europenewMatrix.txt','r')
infile.readline() # skip the first line
numbers = []
for line in infile:
    numbers.append([int(val) for val in line.split(',')[2:]])
infile.close()
travelMat = array(numbers)
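The loop in the question has no visible effect because a NumPy array has one fixed dtype: an int assigned into a string array is stored back as a string. A minimal sketch, using a small made-up list in place of the file contents, that converts the whole array at once with astype:

```python
import numpy as np

# Hypothetical stand-in for the rows read from the file (all strings).
numbers = [["1", "2", "3"], ["4", "5", "6"]]

string_mat = np.array(numbers)       # fixed-width string dtype
travel_mat = string_mat.astype(int)  # returns a new integer array

print(string_mat.dtype.kind)  # 'U' on Python 3 (unicode strings)
print(travel_mat.dtype.kind)  # 'i' (signed integer)
```

astype returns a new array rather than changing the original, so keep the result under the name you use for computations.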

If you're working with a csv or csv-like file, use the csv standard library module.
from numpy import *
import csv
infile = open('europenewMatrix.txt', 'r')
reader = csv.reader(infile)
next(reader) # skip the first line (works in Python 2.6+ and 3)
numbers = [[int(num) for num in row[2:]] for row in reader]
infile.close()
travelmat = array(numbers)
http://docs.python.org/library/csv.html

if someone has a question that could have the same title but uses real Excel (.xls) files, try this (using module xlrd):
import xlrd
import numpy as np
sheet = xlrd.open_workbook('test_readxls.xls').sheet_by_name('sheet1')
n_rows, n_cols = 5,2
data = np.zeros((n_rows, n_cols))
for row in range(n_rows):
    for col in range(n_cols):
        data[row,col] = float(sheet.cell(row,col).value)

Related

Converting CSV into Array in Python

I have a small csv file like the one below, and I have uploaded it here
I am trying to convert the csv values into an array.
My expected output is like
My solution
import csv
results = []
with open("Solutions10.csv") as csvfile:
    reader = csv.reader(csvfile, quoting=csv.QUOTE_NONNUMERIC) # change contents to floats
    for row in reader: # each row is a list
        results.append(row)
but I am getting a
ValueError: could not convert string to float: ' [1'
There is a problem with your CSV: it just isn't CSV (comma-separated values). To read it you need some cleaning:
import re
# if you expect only integers
pattern = re.compile(r'\d+')
# if you expect floats (uncomment below)
# pattern = re.compile(r'\d+\.?\d*')
result = []
with open(filepath) as csvfile:
    for row in csvfile:
        result.append([
            int(val.group(0))
            # float(val.group(0))
            for val in re.finditer(pattern, row)
        ])
print(result)
You can also solve this with substrings if it's easier for you and you know the format exactly.
Note: Also I see there is "eval" suggestion. Please, be careful with it as you can get into a lot of trouble if you scan unknown/not trusted files...
You can do this:
with open("Solutions10.csv") as csvfile:
    result = [eval(k) for k in csvfile.readlines()]
Edit: Karl is cranky and wants you to do this:
with open("Solutions10.csv") as csvfile:
    result = []
    for line in csvfile.readlines():
        line = line.replace("[","").replace("]","")
        result.append([int(k) for k in line.split(",")])
But you're the programmer so you can do what you want. If you trust your input file eval is fine.
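If each line really is a Python-style list literal such as [1, 2, 3] (an assumption, since the original file is not shown), ast.literal_eval from the standard library parses it without running arbitrary code, which avoids the risk of eval. A sketch using made-up in-memory lines:

```python
import ast
import io

# Made-up sample standing in for Solutions10.csv; the real layout is assumed.
sample = io.StringIO(" [1, 2, 3]\n [4, 5, 6]\n")

result = []
for line in sample:
    line = line.strip()
    if line:  # skip blank lines
        result.append(ast.literal_eval(line))

print(result)  # [[1, 2, 3], [4, 5, 6]]
```

literal_eval only accepts literals (numbers, strings, lists, tuples, dicts and so on), so a line containing a function call raises an error instead of being executed.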

Can I print lines randomly from a csv in Python?

I'm trying to print lines randomly from a csv.
Let's say the csv has the 10 lines below:
1,One
2,Two
3,Three
4,Four
5,Five
6,Six
7,Seven
8,Eight
9,Nine
10,Ten
If I write code like the below, it prints each line as a list, in the same order as in the CSV:
import csv
with open("MyCSV.csv") as f:
    reader = csv.reader(f)
    for row_num, row in enumerate(reader):
        print(row)
Instead, I'd like it to be random.
It's just a print for now. Later I'll pass each line as a list to a function.
This should work. You can reuse the lines list in your code as it is shuffled.
import random
with open("tmp.csv", "r") as f:
    lines = f.readlines()
random.shuffle(lines)
print(lines)
import csv
import random
csv_elems = []
with open("MyCSV.csv") as f:
    reader = csv.reader(f)
    for row_num, row in enumerate(reader):
        csv_elems.append(row)
random.shuffle(csv_elems)
print(csv_elems[0])
As you can see, I'm just printing the first element; you can iterate over the list, or keep shuffling and printing.
Well, you can define a list, append all the rows of the csv file to it, then shuffle it and print them. Assume that the name of this list is temp:
import csv
import random
temp = []
with open("your csv file.csv") as file:
    reader = csv.reader(file)
    for row_num, row in enumerate(reader):
        temp.append(row)
random.shuffle(temp)
for i in range(len(temp)):
    print(temp[i])
Why not use pandas to handle the csv?
import pandas as pd
data = pd.read_csv("MyCSV.csv", header=None) # header=None because the sample file has no header row
And to get the samples you are looking for just write:
data.sample() # one random row
data.sample(5) # five random rows
Also, if you want to pass each line to a function:
data_after_function = data.apply(function_name, axis=1) # axis=1 passes one row at a time
and inside the function you can cast the line into a list with list()
Hope this helps!
A couple of things to do:
1. Store the CSV in a sequence of some sort
2. Get the data randomly
For 1, it’s probably best to use some form of sequence comprehension (I’ve gone for nested tuple in a list as it seems you want the row numbers and we can’t use dictionaries for shuffle).
We can use the random module for number 2.
import random
import csv
with open("MyCSV.csv") as f:
    reader = csv.reader(f)
    my_csv = [(row_num, row) for row_num, row in enumerate(reader)]
# get only 1 item from the list at random
random_row = random.choice(my_csv)
# randomise the order of all the rows (shuffle works in place and returns None)
random.shuffle(my_csv)
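Because random.shuffle reorders the list in place and returns None, one option when you want a shuffled copy is random.sample with the full length. A sketch on a few of the sample rows, held in memory here instead of read from MyCSV.csv:

```python
import csv
import io
import random

# In-memory stand-in for MyCSV.csv (first three rows of the sample data).
csv_text = "1,One\n2,Two\n3,Three\n"
rows = list(csv.reader(io.StringIO(csv_text)))

# random.sample returns a new shuffled list and leaves `rows` untouched.
shuffled_rows = random.sample(rows, len(rows))

for row in shuffled_rows:
    print(row)  # each row is a list such as ['2', 'Two']
```

The original order stays available in rows, which can be handy if you later need both.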

Better way to parse CSV into list or array

Is there a better way to create a list or a numpy array from this csv file? What I'm asking is how to do it and parse more gracefully than I did in the code below.
fname = open("Computers discovered recently by discovery method.csv").readlines()
lst = [elt.strip().split(",")[8:] for elt in fname if elt != "\n"][4:]
lst2 = []
for row in lst:
    print(row)
    if row[0].startswith("SMZ-") or row[0].startswith("MTR-"):
        lst2.append(row)
print(*lst2, sep = "\n")
You can always use Pandas. As an example,
import pandas as pd
import numpy as np
df = pd.read_csv('pandas_dataframe_importing_csv/example.csv')
To get a NumPy array, take the values and convert them to your favorite numeric type. I guess you can write the whole thing in one line:
result = df.values.astype("float")
You can also do the following:
from numpy import genfromtxt
my_data = genfromtxt('my_file.csv', delimiter=',')
You can use pandas and specify the header row to make it work correctly on your sample file
import pandas as pd
df = pd.read_csv('Computers discovered recently by discovery method.csv', header=2)
You can check your content using:
>>> df.head()
You can check headers using
>>> df.columns
And to convert it to numpy array you can use
>>> np_arr = df.values
It comes with a lot of options to parse and read csv files. For more information please check the docs
I am not sure what you want but try this
import csv
with open("Computers discovered recently by discovery method.csv", 'r') as f:
    reader = csv.reader(f)
    ll = list(reader)
print (ll)
This should read the csv line by line and store it as a list of rows.
You should never parse CSV structures manually unless you want to tackle all possible exceptions and CSV format oddities. Python has you covered in that regard with its csv module.
The main problem, in your case, stems from your data - there seems to be two different CSV structures in a single file so you first need to find where your second structure begins. Plus, from your code, it seems you want to filter out all columns before Details_Table0_Netbios_Name0 and include only rows whose Details_Table0_Netbios_Name0 starts with SMZ- or MTR-. So something like:
import csv
with open("Computers discovered recently by discovery method.csv") as f:
    reader = csv.reader(f) # create a CSV reader
    for row in reader: # skip the lines until we encounter the second CSV structure/header
        if row and row[0] == "Header_Table0_Netbios_Name0":
            break
    index = row.index("Details_Table0_Netbios_Name0") # find where your columns begin
    result = [] # storage for the rows we're interested in
    for row in reader: # read the rest of the CSV row by row
        if row and row[index][:4] in {"SMZ-", "MTR-"}: # only include these rows
            result.append(row[index:]) # trim and append to the `result` list
print(result[10]) # etc.
# ['MTR-PC0BXQE6-LB', 'PR2', 'anisita', 'VALUEADDCO', 'VALUEADDCO', 'Heartbeat Discovery',
# '07.12.2017 17:47:51', '13']
should do the trick.
Sample Code
import csv
csv_file = 'sample.csv'
with open(csv_file) as fh:
    reader = csv.reader(fh)
    for row in reader:
        print(row)
sample.csv
name,age,salary
clado,20,25000
student,30,34000
sam,34,32000

How to import a csv-file into a data array?

I have a line of code in a script that imports data from a text file with lots of spaces between values into an array for use later.
textfile = open('file.txt')
data = []
for line in textfile:
    row_data = line.strip("\n").split()
    for i, item in enumerate(row_data):
        try:
            row_data[i] = float(item)
        except ValueError:
            pass
    data.append(row_data)
I need to change this from a text file to a csv file. I don't want to just change this text to split on commas (since some values can have commas if they're in quotes). Luckily I saw there is a csv library I can import that can handle this.
import csv
with open('file.csv', 'rb') as csvfile:
    ???
How can I load the csv file into the data array?
If it makes a difference, this is how the data will be used:
row = 0
for row_data in (data):
    worksheet.write_row(row, 0, row_data)
    row += 1
Assuming the CSV file is delimited with commas, the simplest way using the csv module in Python 3 would probably be:
import csv
with open('testfile.csv', newline='') as csvfile:
    data = list(csv.reader(csvfile))
print(data)
You can specify other delimiters, such as tab characters, by specifying them when creating the csv.reader:
data = list(csv.reader(csvfile, delimiter='\t'))
For Python 2, use open('testfile.csv', 'rb') to open the file.
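The loop in the question also turned numeric fields into floats, which csv.reader does not do by itself. A sketch that combines the two, using an in-memory sample with a quoted comma; the helper name to_number is made up for illustration:

```python
import csv
import io

def to_number(item):
    """Return item as a float when possible, otherwise unchanged."""
    try:
        return float(item)
    except ValueError:
        return item

# In-memory stand-in for file.csv; note the quoted field containing a comma.
sample = io.StringIO('name,value\n"Smith, J.",3.5\nJones,7\n')

data = [[to_number(item) for item in row] for row in csv.reader(sample)]
print(data)
# [['name', 'value'], ['Smith, J.', 3.5], ['Jones', 7.0]]
```

The resulting data is a list of row lists, the same shape the write_row loop in the question expects.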
You can use pandas library or numpy to read the CSV file. If your file is tab-separated then use '\t' in place of comma in both sep and delimiter arguments below.
import pandas as pd
myFile = pd.read_csv('filepath', sep=',')
Or
import numpy as np
myFile = np.genfromtxt('filepath', delimiter=',')
I think the simplest way to do this is via Pandas:
import pandas as pd
data = pd.read_csv(FILE).values
This returns a Numpy array of values from a DataFrame created from the CSV. See the documentation here.
This method also works for me.
Example: Having random data, and each data point starting on a newline like below:
'dog',5,2
'cat',5,7,1
'man',5,7,3,'banana'
'food',5,8,9,4,'girl'
import csv
with open('filePath.csv', 'r') as readData:
    readCsv = csv.reader(readData)
    data = list(readCsv)

How can I get a specific field of a csv file?

I need a way to get a specific item (field) of a CSV. Say I have a CSV with 100 rows and 2 columns (comma separated). The first column holds emails, the second column passwords. For example, I want to get the password of the email in row 38, so I need only the item from the 2nd column of row 38...
Say I have a csv file:
aaaaa#aaa.com,bbbbb
ccccc#ccc.com,ddddd
How can I get only 'ddddd' for example?
I'm new to the language and tried some stuff with the csv module, but I don't get it...
import csv
mycsv = csv.reader(open(myfilepath))
for row in mycsv:
    text = row[1]
Following the comments on the SO question here, a better, more robust version would be:
import csv
with open(myfilepath, 'rb') as f:
    mycsv = csv.reader(f)
    for row in mycsv:
        text = row[1]
............
Update: If what the OP actually wants is the last string in the last row of the csv file, there are several approaches that do not necessarily need csv. For example,
fulltxt = open(myfilepath, 'rb').read()
laststring = fulltxt.split(',')[-1]
This is not good for very big files because you load the complete text into memory, but it could be OK for small files. Note that laststring could include a newline character, so strip it before use.
And finally, if what the OP wants is the second string in line n (for n=2):
Update 2: This is now the same code as the one in the answer from J.F.Sebastian (the credit goes to him):
import csv
line_number = 2
with open(myfilepath, 'rb') as f:
    mycsv = csv.reader(f)
    mycsv = list(mycsv)
    text = mycsv[line_number][1]
............
#!/usr/bin/env python
"""Print a field specified by row, column numbers from given csv file.
USAGE:
%prog csv_filename row_number column_number
"""
import csv
import sys
filename = sys.argv[1]
row_number, column_number = [int(arg, 10)-1 for arg in sys.argv[2:]]
with open(filename, 'rb') as f:
    rows = list(csv.reader(f))
print rows[row_number][column_number]
Example
$ python print-csv-field.py input.csv 2 2
ddddd
Note: list(csv.reader(f)) loads the whole file in memory. To avoid that you could use itertools:
import itertools
# ...
with open(filename, 'rb') as f:
    row = next(itertools.islice(csv.reader(f), row_number, row_number+1))
    print row[column_number]
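The same islice idea in Python 3 syntax, sketched with an in-memory file so it can be tried directly; the helper name get_field is hypothetical and both indexes are zero-based:

```python
import csv
import io
import itertools

def get_field(fileobj, row_number, column_number):
    """Return one field without loading every row into memory."""
    reader = csv.reader(fileobj)
    # islice skips row_number rows, then yields exactly one row.
    row = next(itertools.islice(reader, row_number, row_number + 1))
    return row[column_number]

sample = io.StringIO("aaaaa,bbbbb\nccccc,ddddd\n")
print(get_field(sample, 1, 1))  # ddddd
```

If row_number is past the end of the file, next raises StopIteration, so wrap the call in a try block if that can happen with your data.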
import csv
def read_cell(x, y):
    with open('file.csv', 'r') as f:
        reader = csv.reader(f)
        y_count = 0
        for n in reader:
            if y_count == y:
                cell = n[x]
                return cell
            y_count += 1
print (read_cell(4, 8))
This example prints cell 4, 8 in Python 3.
There is an interesting point to note about the csv.reader() object: it is an iterator, not a list, so it is not subscriptable.
This works:
for r in csv.reader(file_obj): # file not closed
    print r
This does not:
r = csv.reader(file_obj)
print r[0]
So, you first have to convert to list type in order to make the above code work.
r = list( csv.reader(file_obj) )
print r[0]
Finally I got it!!!
import csv
def select_index(index):
    csv_file = open('oscar_age_female.csv', 'r')
    csv_reader = csv.DictReader(csv_file)
    for line in csv_reader:
        l = line['Index']
        if l == index:
            print(line[' "Name"'])
select_index('11')
"Bette Davis"
The following may be what you are looking for:
import pandas as pd
df = pd.read_csv("table.csv")
print(df["Password"][row_number])
#where row_number is 38 maybe
import csv
inf = csv.reader(open('yourfile.csv','r'))
for row in inf:
    print row[1]
