Load all rows in a csv file - Python

I want to load a csv file into Python. The csv file contains grades for a random number of students and a random number of assignments.
I want Python to skip the header row and the first column (the name of the student). This is my code:
with open("testgrades.csv") as f:
ncols = len(f.readline().split(','))
nrows = sum(1 for row in f)
grades = np.loadtxt("testgrades.csv", delimiter=',', skiprows=1, usecols=range(1,ncols+1))
print(document1)
The code works for the columns, but it fails when I add one or more rows to the csv file. Why?
My CSV file: (image of the spreadsheet)
And the output from Python: (image of the printed array)

Your csv image looks like a messed-up spreadsheet image. It isn't a copy of the csv file itself, which is plain text. You should be able to copy-and-paste that text into your question.
The Output image is an array, with numbers that correspond to the first 6 rows of the csv image.
Your question is not clear. I'm guessing you added the last 2 rows to the spreadsheet and are having problems loading those into numpy. I don't see anything wrong with those numbers in the spreadsheet image, but if you show the actual csv file content, we might identify the problem. Maybe you aren't actually writing those added rows to the csv file.
Your code sample, with corrected indentation, is:
with open("testgrades.csv") as f:
    ncols = len(f.readline().split(','))
    nrows = sum(1 for row in f)

grades = np.loadtxt("testgrades.csv", delimiter=',', skiprows=1, usecols=range(1,ncols+1))
print(grades)
I can see using ncols to determine the number of columns. The usecols parameter needs an explicit sequence of column indices, not some sort of all-but-first; note that valid indices run from 0 to ncols-1, so range(1, ncols+1) asks for a column that doesn't exist. You could also have gotten that number from a plain loadtxt (or genfromtxt).
But why calculate nrows? You don't appear to use it, and it isn't needed in the loadtxt. genfromtxt allows a max_rows parameter if you need to limit the number of rows read.
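A minimal sketch of the corrected call (assuming the first column holds names and the remaining cells are numeric grades):
import numpy as np

# Count columns from the header line; valid column indices are 0 .. ncols-1.
with open("testgrades.csv") as f:
    ncols = len(f.readline().split(','))

# Skip the header row and the name column; range(1, ncols) stops at the
# last valid index, avoiding the off-by-one of range(1, ncols + 1).
grades = np.loadtxt("testgrades.csv", delimiter=',', skiprows=1,
                    usecols=range(1, ncols))
print(grades)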

Python has a dedicated module for reading and writing CSV files: see the Python csv documentation.
Python 2:
import csv

with open('testgrades.csv', 'rb') as f:
    reader = csv.reader(f)
Python 3:
import csv

with open('testgrades.csv', newline='') as f:
    reader = csv.reader(f)
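A sketch of how the csv module could be applied to the original problem (Python 3, assuming the first column holds names and the rest are numeric):
import csv

grades = []
with open('testgrades.csv', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for row in reader:
        # Drop the name column and convert the remaining cells to numbers.
        grades.append([float(v) for v in row[1:]])
print(grades)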

Related

How to write multiple arrays into a csv file

I am trying to write code that builds a single data frame in pandas by appending rows one at a time. For example, I have a row of [1,2,3,4,5] and a next row of [6,7,8,9,10].
I want to print it as :
1,2,3,4,5
6,7,8,9,10
in a csv file. I want to write multiple rows like this into a single csv file, but all the examples I can find append a data frame column by column. Can I write this array row by row too?
Please help.
I tried using the pandas library but couldn't find a relevant command.
The following code snippet might help (newline='' prevents the csv module from writing blank lines between rows on Windows):
import csv

with open('file.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows([[1, 2, 3], [4, 5, 6]])
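Since the question mentions pandas, here is a small sketch of the same thing with a DataFrame (the row values are just the question's example data):
import pandas as pd

rows = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
df = pd.DataFrame(rows)
# Write without the index or a header row, matching the desired output.
df.to_csv('file.csv', index=False, header=False)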

What would be an effective way to merge/join a csv file and txt file based on a common key?

Let's say I have a csv file that is such that:
Dog
Cat
Bird
is a common column with a txt file I have that has two columns of interest:
Cat 8.9
Bird 12.2
Dog 2.1
One column holds the identifiers (species names); the other holds, let's say, speed in mph.
I want to parse through the rows of the csv file, look up each species' speed in my txt file, and add it as an extra column to the csv file. That is, I want to join these two files on the common species key, and I specifically JUST want to append the speed from the txt file (not any other columns that file may contain) to the csv file. The csv file also has superfluous columns that I would like to keep, but I don't want any of the superfluous columns from the txt file (say it also had heights, weights, and so on; I don't want to add those to my csv file).
Let's say that this file is very large; I have 1,000,000 species in both files.
What would be the shortest and cleanest way to do this?
Should I write some type of a nested loop, should I use a dictionary, or would Pandas be an effective method?
Note: let's say I also want to compute the mean speed of all the species, i.e. the mean of the speed column. Would numpy be the most effective method for this?
Additional note: all the species names are common to both files, but the order in the csv and txt files differs (as indicated in my example). How would I correct for this?
Additional note 2: This is a totally contrived example, since I've been learning about dictionaries and input/output files, and reviewing loops that I wrote in the past.
Note 3: The txt file is tab-separated, but the csv file is (obviously) comma-separated.
You could do everything needed with the built-in csv module. As I mentioned in a comment, it can be used to read the text file since it's tab-delimited (as well as to read the csv file and write an updated version of it).
You seem to indicate there are other fields besides "animal" and "speed" in both files, but the code below assumes the animal name is the first field in each file and that speed is the second field of the text file.
import csv

csv_filename = 'the_csv_file.csv'
updated_csv_filename = 'the_csv_file2.csv'
txt_filename = 'the_text_file.txt'

# Create a lookup table from the text file.
with open(txt_filename, 'r', newline='') as txt_file:
    # Use the csv module to read the tab-delimited text file.
    reader = csv.DictReader(txt_file, ('animal', 'speed'), delimiter='\t')
    lookup = {row['animal']: row['speed'] for row in reader}

# Read the csv file and create an updated version of it.
with open(csv_filename, 'r', newline='') as csv_file, \
     open(updated_csv_filename, 'w', newline='') as updated_csv_file:
    reader = csv.reader(csv_file)
    writer = csv.writer(updated_csv_file)
    for row in reader:
        # Insert the looked-up speed into the row following the first
        # column (animal) and copy any columns following that.
        row.insert(1, lookup[row[0]])
        writer.writerow(row)
Given the two input files in your question, here's the contents of the updated csv file (assuming there were no additional columns):
Dog,2.1
Cat,8.9
Bird,12.2
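The question also asks about the mean speed; building on the lookup dictionary above, plain Python is enough (a sketch; numpy only pays off for much heavier numeric work):
# Mean speed across all species, computed from the lookup table.
speeds = [float(s) for s in lookup.values()]
mean_speed = sum(speeds) / len(speeds)
print(mean_speed)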
This is probably most easily achieved with pandas DataFrames.
You can load both your CSV and text file using the read_csv function (just set the separator to tab for the text file) and use the join function to join the two DataFrames on the column you want to match. Note that join aligns against the other frame's index, so set that index first. Something like:
import pandas as pd

csv_data = pd.read_csv('data.csv')
txt_data = pd.read_csv('data.txt', sep='\t')
# join matches on the other frame's index, so index the text data by species.
joined = csv_data.join(txt_data.set_index('species'), on='species')
result = joined[['species', 'speed', 'other column you want to keep']]
If you want to conduct a more in-depth analysis of your data, or your files are too large for memory, you may want to look into importing your data into a dedicated database management system like PostgreSQL.
EDIT: If your files don't contain column names, you can load them with custom names using pd.read_csv(file_path, header=None, names=['name1','name2']), as described here. Also, columns can be renamed after loading using dataframe.rename(columns={'oldname': 'newname'}, inplace=True), as seen here.
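A minimal sketch combining both hints (the column names here are assumptions):
import pandas as pd

# Load a headerless tab-separated file with explicit column names.
txt_data = pd.read_csv('data.txt', sep='\t', header=None, names=['species', 'speed'])
# Rename a column after loading.
txt_data.rename(columns={'speed': 'speed_mph'}, inplace=True)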
You can just use the merge() method of pandas, like this:
import pandas as pd

df_csv = pd.read_csv('csv.csv', header=None)
df_txt = pd.read_fwf('txt.txt', header=None)
result = pd.merge(df_txt, df_csv)
print(result)
Gives the following output:
      0     1
0   Cat   8.9
1  Bird  12.2
2   Dog   2.1
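With header=None both frames get integer column labels, and pd.merge with no on= argument joins on the columns the frames share; here that is column 0, the species name. To make the key explicit (a sketch):
result = pd.merge(df_txt, df_csv, on=0)  # join explicitly on column 0 (species)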

Parsing a CSV file into Python into a contiguous block

I am trying to load time series data (Apple's stock price, 3000x5) into Python.
So: date, open, high, low, close. I am running the following code in Python Spyder.
import matplotlib.pyplot as plt
import csv

datafile = open('C:\Users\Riemmman\Desktop\SAMPLE_AAPL_DATA_FOR_Python.csv')
datareader = csv.reader(datafile)
data = []
for row in datareader:
    data.append(row)
But 'data' still remains a plain list. I want it separated into a contiguous block with the headers on top and the data in its respective columns, with the date in the leftmost column, as one would see the data in R/Matlab. What am I missing? Thank you for your help.
You want to transpose the data; rows to columns. The zip() function, when applied to all rows, does this for you. Use *datareader to have Python pull all rows in and apply them as separate arguments to the zip() function:
import csv

filename = 'C:\Users\Riemmman\Desktop\SAMPLE_AAPL_DATA_FOR_Python.csv'
with open(filename, 'rb') as datafile:
    datareader = csv.reader(datafile)
    columns = zip(*datareader)
This also uses some more best practices:
Using the file as a context manager with the with statement ensures it is closed automatically.
Opening the file in binary mode lets the csv module manage line endings correctly (this is the Python 2 convention; in Python 3 you would open with newline='' instead).
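If the goal is an R-style data frame with labeled columns rather than transposed tuples, pandas (an alternative sketch, not part of the original answer) gives that directly:
import pandas as pd

# A raw string avoids backslash-escape surprises in the Windows path.
df = pd.read_csv(r'C:\Users\Riemmman\Desktop\SAMPLE_AAPL_DATA_FOR_Python.csv')
print(df.head())  # date, open, high, low, close as labeled columns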

Extract designated data from one csv file and write it to another csv file using Python

I have a csv file containing data in this form,
and I want to extract the data from column C and write it into a new csv file, like this,
So I need to do 2 things:
write 'node' and the numbers from 1 to 22 into the first row and column (since, in this case, column A of the input csv repeats in cycles of 22);
I have extracted the data in column C and written it to the output csv, like this,
and I need to transpose the data, 22 rows at a time, and fill it into rows starting from cell B2, then B3, B4, ...etc.
It's clear that I must loop through every row to do this efficiently, but I don't know how to apply the csv module in Python.
Should I download the xlrd package, or can I handle this using only the built-in csv module?
I am working with Python 2.7.6 and PyScripter under Windows 8.1 x64. Feel free to give me any suggestions, thanks a lot!
Read the Python csv documentation.
The simple way to iterate through rows with the csv reader (note that csv.reader takes a file object, not a filename):
import csv

X = []
with open('path_to_file/filename.csv', 'rb') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',')
    for row in spamreader:
        X.append(row)
This creates a variable with all the csv data. The structure of your file will make it difficult to read, because the cell separator is ',' but there are also commas within each cell, and because of the parentheses there will be a mixture of string and numerical data that will require some cleaning. If you can reformat the csv, it would be easier if each cell looked like 1,2,0.01 instead of (1,2,0.01); also consider a different delimiter between cells, such as ';'.
If not, expect some tedious data cleaning, and definitely read through the documentation linked above.
Edit: Try the following:
import csv

X = []
with open('path_to_file/filename.csv', 'rb') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',')
    for row in spamreader:
        rowTemp = []
        for i in range(len(row)):
            if (i + 1) % 3 == 0:  # gets every third cell
                rowTemp.append(row[i])
        X.append(rowTemp)
This gives a matrix of all the distance values. Then try:
with open('path_to_output_file/output_file.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=',')
    for sublist in X:
        spamwriter.writerow(sublist)
Not sure if this is exactly what you're looking for, but it should be close. It outputs a csv file that is stripped of all the node pairs.

Selecting certain rows from a set of data files in Python

I am trying to manipulate some data with Python, but I'm having quite a bit of difficulty (given that I'm still a rookie). I have taken some code from other questions/sites but still can't quite get what I want.
Basically what I need is to take a set of data files and select the data from 1 particular row of each of those files, then put it into a new file so I can plot it.
So, to get the data into Python in the first place I'm trying to use:
import glob
import os
import numpy

data = []
path = 'C:/path/to/file'
for files in glob.glob(os.path.join(path, '*.*')):
    data.append(list(numpy.loadtxt(files, skiprows=34)))  # first 34 rows aren't used
This has worked great for me once before, but for some reason it won't work now. Any possible reasons why that might be the case?
Anyway, carrying on, this should give me a 2D list containing all the data.
Next I want to select a certain row from each data set, and can do so using:
x = list(xrange(30)) #since there are 30 files
Then:
rowdata = list(data[i][some particular row] for i in x)
Which gives me a list containing the value for that particular row from each imported file. This part seems to work quite nicely.
Lastly, I want to write this to a file. I have been trying:
f = open('path/to/file', 'w')
for item in rowdata:
    f.write(item)
f.close()
But I keep getting an error. Is there a better approach here?
You are already using numpy to load the text; you can use it to manipulate the data as well.
import glob
import os

import numpy as np

path = 'C:/path/to/file'
mydata = np.array([np.loadtxt(f) for f in glob.glob(os.path.join(path, '*.*'))])
This will load all your data into one 3d array:
mydata.ndim
#3
where the first dimension (axis) runs over the files, the second over rows, the third over columns:
mydata.shape
#(number of files, number of rows in each file, number of columns in each file)
So, you can access the first file by
mydata[0,...] # equivalent to: mydata[0,:,:]
or specific parts of all files:
mydata[0,34,:] #the 35th row of the first file
mydata[:,34,:] #the 35th row of all files
mydata[:,34,1] #the second value in the 35th row of all files
To write to a file:
Say you want to write a new file with just the 35th row from all files:
np.savetxt(os.path.join(path, 'outfile.txt'), mydata[:,34,:])
If you just have to read from one file and write to another, you can use open().
For a better fit here, you can use the linecache module, which fetches a specific line from a file by line number.
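A minimal sketch of that idea (the path, output name, and line number are assumptions):
import glob
import linecache
import os

path = 'C:/path/to/file'
# linecache.getline is 1-based, so line 35 is the row after the 34 skipped ones.
rowdata = [linecache.getline(fname, 35).rstrip('\n')
           for fname in glob.glob(os.path.join(path, '*.*'))]

with open('C:/path/to/outfile.txt', 'w') as out:
    out.write('\n'.join(rowdata))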
