Parsing CSV File into Python into a contiguous block - python

I am trying to load time series data (Apple's stock price, 3000x5) into Python.
So: date, open, high, low, close. I am running the following code in Python (Spyder).
import matplotlib.pyplot as plt
import csv
datafile = open('C:\Users\Riemmman\Desktop\SAMPLE_AAPL_DATA_FOR_Python.csv')
datareader = csv.reader(datafile)
data = []
for row in datareader:
    data.append(row)
But 'data' still remains a plain list. I want it separated into a continuous block, with the headers on top and the data in its respective columns, with the date column at the far left, the way one would see the data in R/MATLAB. What am I missing? Thank you for your help.

You want to transpose the data; rows to columns. The zip() function, when applied to all rows, does this for you. Use *datareader to have Python pull all rows in and apply them as separate arguments to the zip() function:
filename = 'C:\Users\Riemmman\Desktop\SAMPLE_AAPL_DATA_FOR_Python.csv'
with open(filename, 'rb') as datafile:
    datareader = csv.reader(datafile)
    columns = zip(*datareader)
This also uses some more best practices:
Using the file as a context manager with the with statement ensures it is closed automatically
Opening the file in binary mode lets the csv module manage line endings correctly
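For example, once the rows are transposed, each element of columns is one whole column. Assuming the five columns from the question come out in date/open/high/low/close order (and that the first cell of each column is the header), you could unpack them like this:
dates, opens, highs, lows, closes = columns
# each column tuple still has its header string as the first element,
# so skip it before converting to numbers
closing_prices = [float(value) for value in closes[1:]]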

Related

Read very huge csv file in chunks using generators and pandas in python

I have a very huge CSV of 40-ish GB. How can I read it chunk by chunk and add a column with the value of today's date?
The approach I tried was reading it directly, and my system crashed.
Then I used chunks in pd.read_csv, which is one solution to it.
I was wondering if someone could suggest a way to use generators to do this and add the column to every chunk?
I think using pd.read_csv with chunksize is already quite like using a generator.
This will add a new column at the end and assign a value of 1 to each row for that column:
with open('test.csv', 'r') as fin, open('test_output.csv', 'w') as fout:
    line = fin.readline()
    fout.write(line.rstrip('\n') + ',new_column\n')
    while True:
        line = fin.readline()
        if line:
            fout.write(line.rstrip('\n') + ',1\n')
        else:
            break
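For the pd.read_csv chunksize route mentioned above, a minimal sketch could look like the following (the file names and chunk size are placeholders):
import pandas as pd
from datetime import date

today = date.today().strftime("%Y-%m-%d")

# read_csv with chunksize returns an iterator of DataFrames,
# so the whole 40 GB file is never held in memory at once
for i, chunk in enumerate(pd.read_csv("input.csv", chunksize=100000)):
    chunk["date_added"] = today
    # write the header only for the first chunk, then append the rest
    chunk.to_csv("output.csv", mode="w" if i == 0 else "a",
                 header=(i == 0), index=False)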
Here is a convtools-based example:
from datetime import date
from convtools import conversion as c
from convtools.contrib.tables import Table
Table.from_csv("input.csv", header=True).update(
    **{"name of the new column": date.today().strftime("%Y-%m-%d")}
).into_csv("output.csv")
It will process the stream from one file to another, applying the transformation in the middle.

How to fetch input from the csv file in python

I have a csv (input.csv) file as shown below:
VM IP Naa_Dev Datastore
vm1 xx.xx.xx.x1 naa.ab1234 ds1
vm2 xx.xx.xx.x2 naa.ac1234 ds1
vm3 xx.xx.xx.x3 naa.ad1234 ds2
I want to use this csv file as an input file for my Python script. In this file the first line, i.e. (VM IP Naa_Dev Datastore), is the column heading, and each value is separated by a space.
So my question is: how can I use this csv file for input values in Python, so that if I search for the IP of vm1 the script picks up xx.xx.xx.x1, and likewise if I look for the VM whose Naa_Dev is naa.ac1234 it returns vm2?
I am using Python version 2.7.8
Any help is much appreciated.
Thanks
When working with tabular data like this, the best approach is to use pandas.
Something like:
import pandas
# the file is space-delimited, so pass sep=' '
dataframe = pandas.read_csv('csv_file.csv', sep=' ')
# finding IP by VM
print(dataframe[dataframe.VM == 'vm1'].IP)
# OUTPUT: xx.xx.xx.x1
# or find the VM by Naa_Dev
print(dataframe[dataframe.Naa_Dev == 'naa.ac1234'].VM)
# OUTPUT: vm2
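The filtered result is a pandas Series; if you just want the bare string, one option is to combine .loc with .iloc[0]:
# pull out the single matching value as a plain string
vm1_ip = dataframe.loc[dataframe.VM == 'vm1', 'IP'].iloc[0]
print(vm1_ip)  # xx.xx.xx.x1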
For importing a csv into Python you can use pandas; in your case the code would look like:
import pandas as pd
df = pd.read_csv('input.csv', sep=' ')
and for locating certain rows in the created dataframe you have multiple options (which you can easily find in the pandas docs or just by googling 'filter data python'), for example:
df['VM'].where(df['Naa_Dev'] == 'naa.ac1234')
Use the pandas module to read the file into a DataFrame. There are a lot of parameters for reading csv files with pandas.read_csv. The dataframe.to_string() function is extremely useful.
Solution:
# import module with alias 'pd'
import pandas as pd

# Open the CSV file; the delimiter is a single space, and the existing
# header row is replaced by the column names we specify (header=0).
dframe = pd.read_csv("file.csv",
                     delimiter=" ",
                     header=0,
                     names=["VM", "IP", "Naa_Dev", "Datastore"])

# print will output the table
print(dframe)

# to_string will allow you to align and adjust content,
# e.g. justify="left" left-aligns the column headers.
print(dframe.to_string(justify="left"))
Pandas is probably the best answer but you can also:
import csv

your_list = []
with open('dummy.csv') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=' ')
    for row in reader:
        your_list.append(row)

print(your_list)
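Each row then comes back as a dict keyed by the header names, so the lookups from the question could be written like this (a sketch using the column names shown above):
# IP address of vm1
ip_of_vm1 = next(row['IP'] for row in your_list if row['VM'] == 'vm1')
# VM that owns a given Naa_Dev
vm_for_dev = next(row['VM'] for row in your_list if row['Naa_Dev'] == 'naa.ac1234')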

Writing value to given field in csv file using pandas or csv module

Is there any way to write a value to a specific place in a given .csv file using pandas or the csv module?
I have tried using csv_reader to read the file and find a line which fits my requirements, though I couldn't figure out a way to replace the value that is in the file with mine.
What I am trying to achieve here is that I have a spreadsheet of names and values. I am using JSON to update the values from the server, and after that I want to update my spreadsheet as well.
The latest solution I came up with was to create a separate sheet from which I will get the updated data, but this one is not working, since the dict is not written to the file in any particular order.
def updateSheet(fileName, aValues):
    with open(fileName + ".csv") as workingSheet:
        writer = csv.DictWriter(workingSheet, aValues.keys())
        writer.writeheader()
        writer.writerow(aValues)
I will appreciate any guidance and tips.
You can try this way of writing out the specified csv file:
import pandas as pd
a = ['one','two','three']
b = [1,2,3]
english_column = pd.Series(a, name='english')
number_column = pd.Series(b, name='number')
predictions = pd.concat([english_column, number_column], axis=1)
save = pd.DataFrame({'english':a,'number':b})
save.to_csv('b.csv',index=False,sep=',')
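If the goal is to change one specific value in an existing file, a minimal pandas sketch (the file and column names here are placeholders) is to read the file, update the cell with .loc, and write it back:
import pandas as pd

df = pd.read_csv('spreadsheet.csv')
# update the 'value' cell of the row whose 'name' matches, then save
df.loc[df['name'] == 'some_name', 'value'] = 42
df.to_csv('spreadsheet.csv', index=False)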

Load all rows in csv file - Python

I want to load a csv file into python. The csv file contains grades for a random number of students and a random number of assignments.
I want Python to drop the header and the first column (the student's name), and this is my code:
with open("testgrades.csv") as f:
ncols = len(f.readline().split(','))
nrows = sum(1 for row in f)
grades = np.loadtxt("testgrades.csv", delimiter=',', skiprows=1, usecols=range(1,ncols+1))
print(document1)
The code works for the columns, but it can't cope when I add one or more rows to the csv file.
My CSV file:
[image: csv]
And output from Python:
[image: Output]
Your csv image looks like a messed-up spreadsheet image. It isn't a copy of the csv file itself, which is plain text. You should be able to copy-and-paste that text into your question.
The Output image is an array, with numbers that correspond to the first 6 rows of the csv image.
Your question is not clear. I'm guessing you added the last 2 rows to the spreadsheet and are having problems loading those into numpy. I don't see anything wrong with those numbers in the spreadsheet image. But if you show the actual csv file content, we might identify the problem. Maybe you aren't actually writing those added rows to the csv file.
Your code sample, with corrected indentation, is:
with open("testgrades.csv") as f:
ncols = len(f.readline().split(','))
nrows = sum(1 for row in f)
grades = np.loadtxt("testgrades.csv", delimiter=',', skiprows=1, usecols=range(1,ncols+1))
print(grades)
I can see using the ncols to determine the number of columns. The usecols parameter needs an explicit list of columns, not some sort of all-but-first. You could have also gotten that number from a plain loadtxt (or genfromtxt).
But why calculate nrows? You don't appear to use it. And it isn't needed in the loadtxt. genfromtxt allows a max_rows parameter if you need to limit the number of rows read.
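As a sketch of what that could look like (assuming every column after the name column holds numeric grades), the usecols range just needs to stop at ncols rather than ncols+1, and nrows can be dropped entirely:
import numpy as np

with open("testgrades.csv") as f:
    ncols = len(f.readline().split(','))

# skip the header row and the first (name) column; loadtxt reads every
# remaining row, so no row count is needed
grades = np.loadtxt("testgrades.csv", delimiter=',',
                    skiprows=1, usecols=range(1, ncols))
print(grades)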
Python has a special module for reading and writing CSV files: the csv module.
Python 2
import csv
with open('testgrades.csv', 'rb') as f:
    reader = csv.reader(f)
    rows = list(reader)
Python 3
import csv
with open('testgrades.csv', newline='') as f:
    reader = csv.reader(f)
    rows = list(reader)

Extract designated data from one csv file then assign to another csv file using python

I got a csv file containing data in this form,
I want to extract data from column C and write them into a new csv file, like this,
So I need to do 2 things:
write 'node' and the numbers from 1 to 22 into the first row and first column (since in this case there are 22 rows in one repeated cycle in column A of the input csv)
I have got the data in column C extracted and written to the output csv, like this,
I need to transpose those data once every 22 rows and fill them into the rows starting from position B2 in Excel, then B3, B4, etc.
It's clear that I must loop through every row to do this efficiently, but I don't know how to apply the csv module in Python.
Should I download the xlrd package, or can I handle this using only the built-in csv module?
I am working with python 2.7.6 and pyscripter under Windows 8.1 x64. Feel free to give me any suggestion, thanks a lot!
Read the csv python documentation.
The simple way to iterate through rows with csv reader:
import csv

X = []
with open('path_to_file/filename.csv') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',')
    for row in spamreader:
        X.append(row)
This creates a variable with all the csv data. The structure of your file will make it difficult to read, because the cell separator is ',' but there are also commas within each cell, and because of the parentheses there will be a mixture of string and numerical data that will require some cleaning. If you have access to reformatting the csv, it might be easier if each cell looked like 1,2,0.01 instead of (1,2,0.01); also consider using a different delimiter between cells, such as ';'.
If not, expect some tedious data cleaning, and definitely read through the documentation linked above.
Edit: Try the following
import csv

X = []
with open('path_to_file/filename.csv', 'rb') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',')
    for row in spamreader:
        rowTemp = []
        for i in range(len(row)):
            if (i+1) % 3 == 0:  # gets every third cell
                rowTemp.append(row[i])
        X.append(rowTemp)
This is a matrix of all the distance values. Then try:
with open('path_to_output_file/output_file.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=',')
    for sublist in X:
        spamwriter.writerow(sublist)
Not sure if this is exactly what you're looking for, but it should be close. It outputs a csv file that is stripped of all the node pairs.
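If the extracted values also need to be regrouped into rows of 22, as the question describes, one possible sketch building on the X produced above (assuming one extracted value per input row) is:
# flatten X and regroup into blocks of 22, so each block becomes one output row
flat = [cell for row in X for cell in row]
blocks = [flat[i:i + 22] for i in range(0, len(flat), 22)]

with open('path_to_output_file/output_file.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=',')
    spamwriter.writerow(['node'] + [str(n) for n in range(1, 23)])  # header row
    for block in blocks:
        spamwriter.writerow(block)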
