How to read rows of a column from a CSV file in Python? - python

import csv
import numpy as np

activity = csv.reader(open('activity(delimited).csv'))
data = np.array([activity])
url_data = data[:, 70]
I am trying to extract a column from the CSV file. This column contains a list of URLs that I would like to read. However, every time I run these few lines, I get:
IndexError: too many indices for array

There are a bunch of very similar issues on StackOverflow [post 1], [post 2].
For your reference, there is a much cleaner way of achieving this.
genfromtxt will suit your requirements pretty well. From the documentation,
Load data from a text file, with missing values handled as specified.
Each line past the first skip_header lines is split at the delimiter
character, and characters following the comments character are
discarded.
You would need to do something like this.
from numpy import genfromtxt
my_data = genfromtxt('activity(delimited).csv', delimiter=',')
Given that yours is a csv, the delimiter is a comma (,).
my_data should now hold a numpy ndarray.
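The IndexError in the question comes from np.array([activity]): it wraps the csv.reader object itself in a one-element array instead of loading the rows, so there is no second axis to index. Note also that genfromtxt parses fields as floats by default, so a column of URLs comes back as nan unless dtype=str is passed. As an alternative for text columns, here is a minimal sketch using only the csv module. The helper name read_column, the sample file, and the column index 1 are assumptions for illustration; the question's file would use index 70.

```python
import csv
import os
import tempfile


def read_column(path, index):
    """Return the field at position `index` from every row that has one."""
    with open(path, newline='') as f:
        return [row[index] for row in csv.reader(f) if len(row) > index]


# Build a tiny stand-in for the real file so the sketch runs on its own.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "activity_sample.csv")
    with open(path, "w", newline='') as f:
        csv.writer(f).writerows([
            ["1", "http://example.com/a"],
            ["2", "http://example.com/b#top"],
        ])
    url_data = read_column(path, 1)

print(url_data)
```

Rows that are too short are skipped rather than raising an error, which is a design choice you may want to change if a missing URL should be treated as a problem.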

Related

Unsuccessful Importing .txt file containing 11 rows info and then real data starts

I am trying to import a large number of text files (around 100) like the one in the picture. I tried read_csv, but it did not work because the files have a somewhat complex layout that needs to be separated properly. The real data (31 columns, with time in the first column) starts at the 12th row. However, I would like to keep the first 11 rows as well so that, for example, I can match the measurement labels with each column later. Finally, I will need a for loop that imports all 100 txt files and reads the 31 columns and the first 11 info rows from each.
DATA VIEW
I tried read_csv with many options, including skiprows, but it did not work. I also wrote the following code, but it did not quite give me what I wanted.
One of the things I've tried is:
with open('zzzAD1.TXT', 'r') as the_file:
    all_data = [line.split() for line in the_file.readlines()]
height_line = all_data[:11]
data = all_data[11:]
So, could anyone help me please?
If you're trying to get this into pandas, the only problem is that you need to convert the strings to floats, and you'll have to figure out what column headings to use, but you're basically on the right track here.
import pandas

with open('zzzAD1.TXT', 'r') as the_file:
    all_data = [line.split() for line in the_file.readlines()]
height_line = all_data[:11]
data = all_data[11:]
data = [[float(x) for x in row] for row in data]  # convert strings to floats
df = pandas.DataFrame(data)
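The question also asks about looping over roughly 100 files. A minimal sketch of that loop follows, assuming every file has exactly 11 info rows followed by whitespace-separated numeric rows. The function name load_measurement_file, the "*.TXT" pattern, and the generated sample files are hypothetical stand-ins. Each returned data list can be passed to pandas.DataFrame as in the answer above.

```python
import glob
import os
import tempfile


def load_measurement_file(path, n_info_rows=11):
    """Split a file into its leading info rows and its numeric data rows."""
    with open(path) as f:
        rows = [line.split() for line in f]
    info_rows = rows[:n_info_rows]
    # Skip blank lines and convert every remaining field to float.
    data = [[float(x) for x in row] for row in rows[n_info_rows:] if row]
    return info_rows, data


# Create two small sample files so the loop below can run on its own.
with tempfile.TemporaryDirectory() as folder:
    for name in ("zzzAD1.TXT", "zzzAD2.TXT"):
        with open(os.path.join(folder, name), "w") as f:
            for i in range(11):
                f.write(f"info line {i}\n")
            f.write("0.0 1.0 2.0\n0.5 1.5 2.5\n")

    results = {}
    for path in sorted(glob.glob(os.path.join(folder, "*.TXT"))):
        results[os.path.basename(path)] = load_measurement_file(path)

print(sorted(results))
```

Keeping the info rows alongside the data in one dictionary entry per file makes it easy to match measurement labels to columns later.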

Creating a dataframe from a csv file in pandas: column issue

I have a messy text file that I need to sort into columns in a dataframe so I
can do the data analysis I need to do. Here is the messy looking file:
Messy text
I can read it in as a csv file, that looks a bit nicer using:
import pandas as pd
data = pd.read_csv('phx_30kV_indepth_0_0_outfile.txt')
print(data)
And this prints out the data aligned, but the issue is that the output is [640 rows x 1 column]. And I need to separate it into multiple columns and manipulate it as a dataframe.
I have tried a number of solutions using StringIO that have worked here before, but nothing seems to be doing the trick.
However, when I do this, there is the issue that the
Try delim_whitespace=True (see the pandas read_csv docs):
df = pd.read_csv('phx_30kV_indepth_0_0_outfile.txt', delim_whitespace=True)
Your input file is not actually in CSV format.
Because you provided only a .png picture, it is not even clear whether the file is divided into rows.
If it is not, you first have to cut the content into individual lines and then read the file that results from this cutting.
I think this is the first step before you can use either read_csv or read_table (with delim_whitespace=True, of course).
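For a file that is already split into lines with columns separated by runs of spaces, a minimal sketch looks like this. The column names and values are invented for illustration, since the real file is only shown as an image. Note that delim_whitespace=True is deprecated as of pandas 2.2 and sep=r'\s+' is the documented replacement, so the sketch uses the latter.

```python
import io

import pandas as pd

# Hypothetical whitespace-aligned content standing in for the real file.
sample = io.StringIO(
    "depth   dose   error\n"
    "0.0     1.25   0.01\n"
    "0.5     1.10   0.02\n"
)

# Any run of whitespace is treated as a single column separator.
df = pd.read_csv(sample, sep=r'\s+')
print(df.shape)
```

If the result is still a single column, the separator in the real file is probably something other than whitespace, and inspecting the first few raw lines is the quickest way to find out.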

Load all rows in csv file - Python

I want to load a csv file into python. The csv file contains grades for a random number of students and a random number of assignments.
I want python to delete the header and the first column (Name of student) and this is my code:
with open("testgrades.csv") as f:
ncols = len(f.readline().split(','))
nrows = sum(1 for row in f)
grades = np.loadtxt("testgrades.csv", delimiter=',', skiprows=1, usecols=range(1,ncols+1))
print(document1)
The code works when I add columns, but it can't handle one or more rows being added to the csv file.
My CSV file:
csv
And output from Python:
Output
Your csv image looks like a messed-up spreadsheet image. It isn't a copy of the csv file itself, which is plain text. You should be able to copy and paste that text into your question.
The Output image is an array, with numbers that correspond to the first 6 rows of the csv image.
Your question is not clear. I'm guessing you added the last 2 rows to the spreadsheet and are having problems loading those into numpy. I don't see anything wrong with those numbers in the spreadsheet image. But if you show the actual csv file content, we might identify the problem. Maybe you aren't actually writing those added rows to the csv file.
Your code sample, with corrected indentation is:
with open("testgrades.csv") as f:
    ncols = len(f.readline().split(','))
    nrows = sum(1 for row in f)
grades = np.loadtxt("testgrades.csv", delimiter=',', skiprows=1, usecols=range(1,ncols+1))
print(grades)
I can see using the ncols to determine the number of columns. The usecols parameter needs an explicit list of columns, not some sort of all-but-first. You could have also gotten that number from a plain loadtxt (or genfromtxt).
But why calculate nrows? You don't appear to use it. And it isn't needed in the loadtxt. genfromtxt allows a max_rows parameter if you need to limit the number of rows read.
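A minimal sketch of the loadtxt call discussed above, assuming a small made-up grade table (the names and numbers are illustrative, not the asker's file). Column indices run from 0 to ncols - 1, so range(1, ncols) selects every column after the first, whereas range(1, ncols + 1) asks for one index more than the header line has. No row count is needed, because loadtxt reads to the end of the input.

```python
import io

import numpy as np

# Hypothetical file content: a header line, then one row per student.
csv_text = (
    "Name,A1,A2,A3\n"
    "Ann,7,10,4\n"
    "Bob,12,2,7\n"
    "Cy,0,4,10\n"
)

# Count the columns from the header line.
ncols = len(csv_text.splitlines()[0].split(','))

# Skip the header row and the first (name) column.
grades = np.loadtxt(
    io.StringIO(csv_text),
    delimiter=',',
    skiprows=1,
    usecols=range(1, ncols),
)
print(grades.shape)
```

Adding more student rows to csv_text changes only the first dimension of grades, with no change to the code.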
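Python has a dedicated module for reading and writing CSV files, the csv module.
Python 2
import csv
with open('testgrades.csv', 'rb') as f:
    rows = list(csv.reader(f))  # each row is a list of strings
Python 3
import csv
with open('testgrades.csv', newline='') as f:
    rows = list(csv.reader(f))  # each row is a list of strings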

Creating Arrays from csv files in python

I have a data file from which I must extract specific data, using:
import numpy
x = 15  # need a way for the code to work out how many lines to skip
maxcol = 2000  # need a way to find the final row in the data
data = numpy.genfromtxt('data.dat.csv', skip_header=x, delimiter=',')
column_one = data[0:maxcol, 0]
column_two = data[0:maxcol, 1]
This gives me an array for the specific case where there are (x=)15 lines of metadata above the required data and where the number of rows of data is (maxcol=)2000. How do I go about changing the code to work for any value of x and maxcol?
Use pandas. Its read_csv function does all that you want (I don't include its equivalent of delimiter, sep=',', because comma-delimited is the default):
import pandas as pd
data = pd.read_csv('data.dat.csv', skiprows=x, nrows=maxcol)
If you really want that as a numpy array, you can do this:
data = data.values
But you can probably just leave it as a pandas DataFrame.
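Neither call above works out x on its own. One possible approach, sketched here under the assumption that the data block begins at the first line whose fields all parse as numbers, is to scan for that line and pass its index as skip_header. The helper count_header_lines and the sample text are hypothetical. A metadata line consisting only of numbers would fool this heuristic. The row count needs no detection at all, because data[:, 0] already takes every row.

```python
import io

import numpy as np


def count_header_lines(lines, delimiter=','):
    """Return the index of the first line whose fields all parse as floats."""
    for i, line in enumerate(lines):
        fields = line.strip().split(delimiter)
        try:
            [float(field) for field in fields]
        except ValueError:
            continue  # not a numeric row, keep scanning
        return i
    raise ValueError("no numeric rows found")


# Hypothetical file content: four metadata lines, then the data.
sample_text = (
    "instrument: demo\n"
    "run,1\n"
    "\n"
    "time,value\n"
    "0.0,1.5\n"
    "0.5,2.5\n"
    "1.0,3.5\n"
)

n_skip = count_header_lines(sample_text.splitlines())
data = np.genfromtxt(io.StringIO(sample_text), skip_header=n_skip, delimiter=',')

column_one = data[:, 0]  # all rows, first column
column_two = data[:, 1]  # all rows, second column
print(n_skip, data.shape)
```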

Exporting a list to a CSV/space separated and each sublist in its own column

I'm sure there is an easy way to do this, so here goes. I'm trying to export my lists into CSV in columns. (Basically, it's how another program will be able to use the data I've generated.) I have the group called [frames] which contains [frame001], [frame002], [frame003], etc. I would like the CSV file that's generated to have all the values for [frame001] in the first column, [frame002] in the second column, and so on. I thought if I could save the file as CSV I could manipulate it in Excel, however, I figure there is a solution that I can program to skip that step.
This is the code that I have tried using so far:
import csv
data = [frames]
out = csv.writer(open(filename,"w"), delimiter=',',quoting=csv.QUOTE_ALL)
out.writerow(data)
I have also tried:
import csv
myfile = open(..., 'wb')
wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
wr.writerow(mylist)
If there's a way to do this so that all the values are space separated, that would be ideal, but at this point I've been trying this for hours and can't get my head around the right solution.
What you're describing is that you want to transpose a two-dimensional array of data. In Python you can achieve this easily with the zip function as long as the inner lists are all the same length.
out.writerows(zip(*data))
If they are not all the same length, you can use itertools.izip_longest (itertools.zip_longest in Python 3) to fill the remaining fields with some default value (even '').
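Putting the pieces together for Python 3, here is a minimal sketch that writes each frame as its own space-separated column. The frame contents and the 'NA' fill value are invented for illustration, and an in-memory buffer stands in for the output file. With a real file you would use open(filename, 'w', newline='') instead.

```python
import csv
import io
from itertools import zip_longest

# Hypothetical frames of unequal length.
frame001 = [1, 2, 3]
frame002 = [4, 5]
frame003 = [6, 7, 8]
frames = [frame001, frame002, frame003]

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=' ', lineterminator='\n')

# zip_longest(*frames) turns the list of columns into rows,
# padding the shorter frames with the fill value.
writer.writerows(zip_longest(*frames, fillvalue='NA'))

output = buffer.getvalue()
print(output)
```

A visible placeholder such as 'NA' is used instead of '' because an empty field is hard to spot in a space-separated file.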
