Selecting certain rows from a set of data files in Python

I am trying to manipulate some data with Python, but I'm having quite a bit of difficulty (given that I'm still a rookie). I have taken some code from other questions/sites, but still can't quite get what I want.
Basically what I need is to take a set of data files and select the data from 1 particular row of each of those files, then put it into a new file so I can plot it.
So, to get the data into Python in the first place I'm trying to use:
import glob
import os
import numpy

data = []
path = 'C:/path/to/file'
for fname in glob.glob(os.path.join(path, '*.*')):
    data.append(list(numpy.loadtxt(fname, skiprows=34)))  # first 34 rows aren't used
This has worked great for me once before, but for some reason it won't work now. Any possible reasons why that might be the case?
Anyway, carrying on, this should give me a 2D list containing all the data.
Next I want to select a certain row from each data set, and can do so using:
x = list(xrange(30)) #since there are 30 files
Then:
rowdata = list(data[i][some particular row] for i in x)
Which gives me a list containing the value for that particular row from each imported file. This part seems to work quite nicely.
Lastly, I want to write this to a file. I have been trying:
f = open('path/to/file', 'w')
for item in rowdata:
    f.write(item)
f.close()
But I keep getting an error. Is there another method of approach here?

You are already using numpy to load the text; you can use it to manipulate the data as well.
import glob
import os
import numpy as np

path = 'C:/path/to/file'
# skiprows=34 skips the unused header rows, as in the question
mydata = np.array([np.loadtxt(f, skiprows=34) for f in glob.glob(os.path.join(path, '*.*'))])
This will load all your data into one 3d array:
mydata.ndim
#3
where the first dimension (axis) runs over the files, the second over rows, the third over columns:
mydata.shape
#(number of files, number of rows in each file, number of columns in each file)
So, you can access the first file by
mydata[0,...] # equivalent to: mydata[0,:,:]
or specific parts of all files:
mydata[0,34,:] # the 35th row of the first file
mydata[:,34,:] # the 35th row in all files
mydata[:,34,1] # the second value in the 35th row of all files
To write to file:
Say you want to write a new file with just the 35th row from all files:
np.savetxt(os.path.join(path, 'outfile.txt'), mydata[:,34,:])

If you just have to read from one file and write to another, you can use open().
For a better solution, you can use the linecache module, which lets you grab a specific line from a file by number.
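For example, here is a minimal sketch of the linecache approach (the path, the output filename, and the choice of line 35 as the wanted row are placeholders based on the question):

import glob
import linecache
import os

path = 'C:/path/to/file'
with open('C:/path/to/outfile.txt', 'w') as out:
    for fname in glob.glob(os.path.join(path, '*.*')):
        # linecache numbers lines from 1; getline returns '' if the line is missing
        line = linecache.getline(fname, 35)
        out.write(line)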

Related

Unsuccessfully importing a .txt file containing 11 info rows before the real data starts

I am trying to import a large number of text files (probably around 100), as you see in the pic. Basically, I tried to use read_csv to import them, but it did not work. The files are in a somewhat complex form, so I need to separate them in a proper way. The real data (31 columns, including time in the first column) starts at the 12th row. However, I'd like to keep the first 11 rows as well, so that I can, e.g., match the Measurements labels with each column in the future. Lastly, I will need to write a for loop to import the 100 txt files and read the 31 columns and the first 11 info rows from each.
[screenshot: DATA VIEW]
I tried read_csv with a lot of options, including skiprows, but it did not work out. Then I also implemented the following code, but it did not quite give me what I wanted.
One of the things I've tried is:
with open('zzzAD1.TXT', 'r') as the_file:
    all_data = [line.split() for line in the_file.readlines()]
height_line = all_data[:11]
data = all_data[11:]
So, could anyone help me please?
If you're trying to get this into pandas, the only problem is that you need to convert the strings to floats, and you'll have to figure out what column headings to use, but you're basically on the right track here.
import pandas

with open('zzzAD1.TXT', 'r') as the_file:
    all_data = [line.split() for line in the_file.readlines()]
height_line = all_data[:11]
data = all_data[11:]
data = [[float(x) for x in row] for row in data]
df = pandas.DataFrame(data)
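To extend this to all ~100 files, one possible sketch (the '*.TXT' glob pattern is an assumption; the 11 header rows are kept per file so the Measurements labels can be matched up later):

import glob
import pandas

frames = []
headers = {}
for fname in glob.glob('*.TXT'):
    with open(fname, 'r') as the_file:
        all_data = [line.split() for line in the_file.readlines()]
    headers[fname] = all_data[:11]  # keep the 11 info rows for later matching
    rows = [[float(x) for x in row] for row in all_data[11:]]
    frames.append(pandas.DataFrame(rows))
# one DataFrame holding the data rows of every file
combined = pandas.concat(frames, ignore_index=True)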

I want to combine csv's, dropping rows while only keeping certain columns

This is the code I have so far:
import pandas as pd
import glob, os
os.chdir("L:/FMData/")
results = pd.DataFrame([])
for counter, file in enumerate(glob.glob("F5331_FM001**")):
    namedf = pd.read_csv(file, skiprows=[1,2,3,4,5,6,7], index_col=[1], usecols=[1,2])
    results = results.append(namedf)
results.to_csv('L:/FMData/FM001_D/FM5331_FM001_D.csv')
This however is producing a new document as instructed but isn't copying any data into it. I'm wanting to look up files in a certain location, with names along the lines of FM001, combine them, skip the first 7 rows in each csv, and only keep columns 1 and 2 in the new file. Can anyone help with my code?
Thanks in advance!!!
To combine multiple csv files, you should create a list of dataframes. Then combine the dataframes within your list via pd.concat in a single step. This is much more efficient than appending to an existing dataframe.
In addition, you need to write your result to a file outside your for loop.
For example:
results = []
for counter, file in enumerate(glob.glob("F5331_FM001**")):
    namedf = pd.read_csv(file, skiprows=[1,2,3,4,5,6,7], index_col=[1], usecols=[1,2])
    results.append(namedf)  # list.append works in place; no reassignment needed
df = pd.concat(results, axis=0)
df.to_csv('L:/FMData/FM001_D/FM5331_FM001_D.csv')
This code works on my side (using Linux and Python 3); it populates a csv file with data.
Add a print just after the read_csv to see if your csv file actually reads any data, else nothing will be written, like this:
namedf = pd.read_csv(file)
print(namedf)
results = results.append(namedf)
It adds row 1 (probably because it is considered the header), then skips 7 rows and continues. This is my result for a csv file that just contains the words one to eleven, one per row:
F5331_FM001.csv
one
0 nine
1 ten
2 eleven
Addition:
If print(namedf) shows nothing, then check your input files.
The python program is looking in L:/FMData/ for your files. Are you sure your files are located in that directory? You can change the directory by adding the correct path with the os.chdir command.

How to loop through multiple csv files and output their contents into one array?

I'm working in Python and trying to take x, y, z coordinates from multiple LAZ files and put them into one array that can be used for another analysis. I am trying to automate this task as I have about 2000 files to turn into one or even 10 arrays. The example involves two files, but I can't get the loop to work properly. I think I am not correctly naming my variables. Below is an example of the code I have been trying to write (note that I am extremely new to programming, so apologies if this is horrible code).
# Create list of las files, then turn them into an array -- attempt at better automation
import numpy as np
from laspy.file import File
import glob

# create list of vegetation files to be opened
VegList = sorted(glob.glob('/Users/sophiathompson/Desktop/copys/Clips/*.las'))
for f in VegList:
    print(f)
    Veg = File(filename=f, mode="r")  # Open the file
    points = Veg.get_points()  # Grab all of the points from the file.
    print points  # this is a check that the number of rows changes at the end
    print("array shape:")
    print points.shape
    VegListCoords = np.vstack((Veg.x, Veg.y, Veg.z)).transpose()
    print VegListCoords
This block reads both files but fills VegListCoords with the results of the second file in the list. I need it to hold the records from both. If this is a horrible way to go about it, I am very open to a new way.
You keep overwriting VegListCoords by assigning it the values from the last opened file.
Instead, initialize it before the loop:
VegListCoords = []
and inside the loop do:
VegListCoords.append(np.vstack((Veg.x, Veg.y, Veg.z)).transpose())
If you want them in one numpy array at the end, use np.concatenate
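Putting the pieces together, a corrected version of the loop might look like this (a sketch built from the code above; laspy's File API is used exactly as in the question):

import glob
import numpy as np
from laspy.file import File

VegList = sorted(glob.glob('/Users/sophiathompson/Desktop/copys/Clips/*.las'))
VegListCoords = []  # initialize once, before the loop
for f in VegList:
    Veg = File(filename=f, mode="r")  # open each .las file
    # stack x, y, z as columns: one (n_points, 3) array per file
    VegListCoords.append(np.vstack((Veg.x, Veg.y, Veg.z)).transpose())
# combine into a single (total_points, 3) array
all_coords = np.concatenate(VegListCoords, axis=0)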

Extract designated data from one csv file then assign to another csv file using python

I got a csv file containing data in this form,
I want to extract data from column C and write them into a new csv file, like this,
So I need to do 2 things:
write 'node' and the numbers from 1 to 22 into the first row and first column (since, in this case, column A of the input csv repeats in cycles of 22)
I have extracted the data in column C and written it to the output csv, like this,
I need to transpose those data every 22 rows and fill them into rows starting from position B2 in Excel, then B3, B4, etc.
It's clear that I must loop through every row to do this efficiently, but I don't know how to apply the csv module in python.
Should I download the xlrd package, or can I handle this only use the built-in csv module?
I am working with python 2.7.6 and pyscripter under Windows 8.1 x64. Feel free to give me any suggestion, thanks a lot!
Read the csv python documentation.
The simple way to iterate through rows with csv reader:
import csv

X = []
# csv.reader needs a file object, not a filename string
with open('path_to_file/filename.csv', 'rb') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',')
    for row in spamreader:
        X.append(row)
This creates a variable with all the csv data. The structure of your file will make it difficult to read: the cell separator is ',', but there are also commas within each cell, and because of the parentheses there will be a mixture of string and numerical data that will require some cleaning. If you can reformat the csv, it might be easier if each cell looked like 1,2,0.01 instead of (1,2,0.01); also consider using a different delimiter between cells, such as ';'.
If not, expect some tedious data cleaning, and definitely read through the documentation linked above.
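If reformatting is not possible, one way to recover the numbers is to rejoin each row and pull the parenthesized triples out with a regular expression. This is only a sketch, and it assumes every cell really does look like (1,2,0.01):

import re

def clean_row(row):
    # csv.reader split inside the parentheses, so glue the row back together
    line = ','.join(row)
    # find every '(...)' group, then split its contents on commas
    triples = re.findall(r'\(([^)]*)\)', line)
    return [[float(x) for x in t.split(',')] for t in triples]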
Edit: Try the following
import csv

X = []
with open('path_to_file/filename.csv', 'rb') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',')
    for row in spamreader:
        rowTemp = []
        for i in range(len(row)):
            if (i + 1) % 3 == 0:  # gets every third cell
                rowTemp.append(row[i])
        X.append(rowTemp)
This is a matrix of all the distance values. Then try:
with open('path_to_output_file/output_file.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=',')
    for sublist in X:
        spamwriter.writerow(sublist)
Not sure if this is exactly what you're looking for, but it should be close. It outputs a csv file that is stripped of all the node pairs.
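To get the layout the question describes ('node' plus 1 to 22 across the first row, one 22-value cycle per output row), one possible sketch (it assumes the extracted values in X form a flat sequence whose length is a multiple of 22):

import csv

values = [cell for row in X for cell in row]  # flatten the extracted cells
with open('path_to_output_file/transposed.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=',')
    spamwriter.writerow(['node'] + list(range(1, 23)))  # header row
    # every cycle of 22 rows from column C becomes one output row
    for i in range(0, len(values), 22):
        spamwriter.writerow([i // 22 + 1] + values[i:i + 22])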

Exporting a list to a CSV/space separated and each sublist in its own column

I'm sure there is an easy way to do this, so here goes. I'm trying to export my lists into CSV in columns. (Basically, it's how another program will be able to use the data I've generated.) I have the group called [frames] which contains [frame001], [frame002], [frame003], etc. I would like the CSV file that's generated to have all the values for [frame001] in the first column, [frame002] in the second column, and so on. I thought if I could save the file as CSV I could manipulate it in Excel, however, I figure there is a solution that I can program to skip that step.
This is the code that I have tried using so far:
import csv
data = [frames]
out = csv.writer(open(filename,"w"), delimiter=',',quoting=csv.QUOTE_ALL)
out.writerow(data)
I have also tried:
import csv
myfile = open(..., 'wb')
wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
wr.writerow(mylist)
If there's a way to do this so that all the values are space separated, that would be ideal, but at this point I've been trying this for hours and can't get my head around the right solution.
What you're describing is that you want to transpose a 2-dimensional array of data. In Python you can achieve this easily with the zip function, as long as the inner lists are all the same length.
out.writerows(zip(*data))
If they are not all the same length, you can use itertools.izip_longest to fill the remaining fields with some default value (even '').
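A minimal sketch of both variants (the frame lists are stand-ins for your actual data; the space delimiter matches the "space separated" wish in the question):

import csv
from itertools import izip_longest  # Python 2; on Python 3 use itertools.zip_longest

frame001 = [1, 2, 3]
frame002 = [4, 5]
frame003 = [6, 7, 8]
data = [frame001, frame002, frame003]

with open('frames.csv', 'wb') as myfile:
    out = csv.writer(myfile, delimiter=' ', quoting=csv.QUOTE_ALL)
    # izip_longest pads the shorter frames so each one stays in its own column
    out.writerows(izip_longest(*data, fillvalue=''))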
