Transposing data from automatically created lists

Transposing data from automatically created lists - python

I am trying to transpose data from lists that are automatically created for each text file. Each text file gets its own version of the listsR list. I then place the lists into another list, listlist so that I can manage a list of lists. I know how to do this using declared lists but this code needs the flexibility to utilize any number of text files, transpose the lists and take the average of each index from among all the lists. This is hopefully being used to create baselines from the text files.
import os
import csv
import numpy as np
os.chdir('////Users////thomaswolff////Desktop////baseline2')
def listNew():
listlist = []
for data in os.listdir(os.getcwd()):
if data.endswith('.TXT') and 'baseline' in data:
with open(data,'rU') as file:
listsR = [[] for i in xrange(0)]
for row in csv.DictReader(file):
listsR.append(float(row[' IRI R e']))
listlist.append(listsR)
This works fine for placing the data into lists of lists but i need to transpose the data according to the index. So listlist[0:][0] would be the first index from each list within listlist. Text files of the data I'm using can be found here at Github using the baseline.txt files:
As you can see from this code I've written in the past, I know how to do this using declared lists, but this is different.

To transpose listlist you can use zip()
list_list_transpose = zip(*listlist)

Related

Rebase csv file by merging a base csv file with another new csv file

I am currently working with two csv files, base.csv and another csv file, output_20170503.csv which will be produced everyday, so my aim here is to rebase every output so that they have the same data as the base.csv.
My base.csv:
ID,Name,Number,Shape,Sound
1,John,45,Round,Meow
2,Jimmy,78,Sphere,Woof
3,Marc,,Triangle,Quack
4,Yun,50,Triangle,Meow
5,Nyancat,,Round,Quack
My output_20170503.csv
ID,Name,Number,Shape,Sound
1,John,,Round,Meow
2,Jimmy,,Sphere,Woof
3,Marc,,Triangle,Quack
4,Yun,,Triangle,
5,Nyancat,,Round,Quack
6,Marc,,Square,Woof
7,Jonnn,,Hexagon,Chirp
The objective here is to rebase the data (ID from 1-5) from base.csv with the output_20170503.csv
What I want to achieve:
ID,Name,Number,Shape,Sound
1,John,45,Round,Meow
2,Jimmy,78,Sphere,Woof
3,Marc,,Triangle,Quack
4,Yun,50,Triangle,Meow
5,Nyancat,,Round,Quack
6,Marc,,Square,Woof
7,Jonnn,,Hexagon,Chirp
I already searched for the solution but what I got;
Merge two csv files (both of csv files have different columns, won't work for me)
Remove duplicates from a csv files (Appending base.csv with the output_20170503.csv and then remove the duplicates, won't work because they have different values for column Number)
Any help would be appreciated, thank you.

You can try this, I use first two item as key and generate a dict and then iterate the new dict update the base dict if the key not in base:
new = {"".join(i.split(',')[:2]): i[:-1].split(',') for i in open('output_20170503.csv')}
base = {"".join(i.split(',')[:2]): i[:-1].split(',') for i in open('base.csv')}
base.update({i: new[i] for i in new if i not in base})
f=open("out.csv","w")
for i in sorted(base.values(), key=lambda x: x[0]):
if i[0]!="ID":
f.write(",".join(i)+"\n")
Output:
1,John,45,Round,Meow
2,Jimmy,78,Sphere,Woof
3,Marc,,Triangle,Quack
4,Yun,50,Triangle,Meow
5,Nyancat,,Round,Quac
6,Marc,,Square,Woof
7,Jonnn,,Hexagon,Chir
Python2.7+ supports the syntactical extension called the "dictionary comprehension" or "dict comprehension", so if you're using Python2.6, you need to replace the first three lines with:
new = dict(("".join(i.split(',')[:2]),i[:-1].split(',')) for i in open('output_20170503.csv'))
base = dict(("".join(i.split(',')[:2]),i[:-1].split(',')) for i in open('base.csv'))
base.update(dict((i,new[i]) for i in new if i not in base))

You should try to use pandas library which is excellent for data manipulation. you can read easily csv files and do merge operation. Your solution might look like the following :
import pandas as pd
base_df = pd.read_csv('base.csv')
output_df = pd.read_csv('My output_20170503.csv')
output_df.update(base_df)
output_df.write_csv('My output_20170503.csv')
The missing values on output_df has now been updated with the one from base_df.

How to loop through multiple csv files and output their contents into one array?

I working in python and trying to take x, y, z coordinates from multiple LAZ files and put them into one array that can be used for another analysis. I am trying to automate this task as I have about 2000 files to turn into one or even 10 arrays.The example involves two files but I can't get the loop to work properly. I think I am not correctly naming my variables. below is an example code I have been trying to write (note that I am extremely new to programming so apologize if this is a horrible code).
Create list of las files, then turn them into an array--attempt at better automation
import numpy as np
from laspy.file import File
import glob as glob
# create list of vegetation files to be opened
VegList = sorted(glob.glob('/Users/sophiathompson/Desktop/copys/Clips/*.las'))
for f in VegList:
print(f)
Veg = File(filename = f, mode = "r") # Open the file
points = Veg.get_points() # Grab all of the points from the file.
print points #this is a check that the number of rows changes at the end
print ("array shape:")
print points.shape
VegListCoords = np.vstack((Veg.x, Veg.y, Veg.z)).transpose()
print VegListCoords
This block reads both files but fills VegListCoords with the results of the second file in the file list. I need it to hold the records from both. if this is a horrible way to go about it, I am very open to a new way.

You keep overwriting VegListCoords by assigning the values in your last opened file
instead, initialize at the beginning :
VegListCoords = []
and do instead :
VegListCoords.append(np.vstack((Veg.x, Veg.y, Veg.z)).transpose())
If you want them in one numpy array at the end, use np.concatenate

Is it possible to slice keys in h5py using python 3, without forming a list?

I am using h5py to read in data from an HDF5 file, and have found that code which worked using Python 2 does not work using Python 3. The file is formatted such that 2D frames of data are present as distinct datasets, which I want to read into a 3D array. The file structure looks like this:
file.h5
|- groupname (group)
|- frame1 (dataset)
|- frame2 (dataset)
...
To read the frames into a 3D array, I have to access the first dataset to get its shape and type information. Because I don't actually know the exact name for each frame, the code I had been using to access the first frame looked like this:
import h5py
fid = h5py.File('file.h5', 'r')
datagroup = fid['groupname']
dataset0 = datagroup[datagroup.keys()[0]]
However, the documentation for h5py says
"When using h5py from Python 3, the keys(), values() and items() methods will return view-like objects instead of lists."
The view objects support iteration but not slicing. So to avoid an error I had to change that line to the following:
dataset0 = datagroup[ [k for k in datagroup.keys()][0] ]
which artificially constructs a temporary list and then grabs its first element. To me this looks awful. Is there a better way to do this?

Group supports returning an iterator but is not directly iterable. That leads to the following:
dataset0 = datagroup[next(iter(datagroup))]

Selecting certain rows from a set of data files in Python

I am trying to manipulate some data with Python, but having quite a bit of difficulty (given that I'm still a rookie). I have taken some code from other questions/sites but still can't quite get what I want.
Basically what I need is to take a set of data files and select the data from 1 particular row of each of those files, then put it into a new file so I can plot it.
So, to get the data into Python in the first place I'm trying to use:
data = []
path = C:/path/to/file
for files in glob.glob(os.path.join(path, ‘*.*’)):
data.append(list(numpy.loadtxt(files, skiprows=34))) #first 34 rows aren't used
This has worked great for me once before, but for some reason it won't work now. Any possible reasons why that might be the case?
Anyway, carrying on, this should give me a 2D list containing all the data.
Next I want to select a certain row from each data set, and can do so using:
x = list(xrange(30)) #since there are 30 files
Then:
rowdata = list(data[i][some particular row] for i in x)
Which gives me a list containing the value for that particular row from each imported file. This part seems to work quite nicely.
Lastly, I want to write this to a file. I have been trying:
f = open('path/to/file', 'w')
for item in rowdata:
f.write(item)
f.close()
But I keep getting an error. Is there another method of approach here?

You are already using numpy to load the text, you can use it to manipulate it as well.
import numpy as np
path = 'C:/path/to/file'
mydata = np.array([np.loadtxt(f) for f in glob.glob(os.path.join(path, '*.*'))])
This will load all your data into one 3d array:
mydata.ndim
#3
where the first dimension (axis) runs over the files, the second over rows, the third over columns:
mydata.shape
#(number of files, number of rows in each file, number of columns in each file)
So, you can access the first file by
mydata[0,...] # equivalent to: mydata[0,:,:]
or specific parts of all files:
mydata[0,34,:] #the 35th row of the first file by
mydata[:,34,:] #the 35th row in all files
mydata[:,34,1] #the second value in the 34th row in all files
To write to file:
Say you want to write a new file with just the 35th row from all files:
np.savetxt(os.join(path,'outfile.txt'), mydata[:,34,:])

If you just have to read from a file and write to a file you can use open().
For a better solution, you can use linecache

Exporting a list to a CSV/space separated and each sublist in its own column

I'm sure there is an easy way to do this, so here goes. I'm trying to export my lists into CSV in columns. (Basically, it's how another program will be able to use the data I've generated.) I have the group called [frames] which contains [frame001], [frame002], [frame003], etc. I would like the CSV file that's generated to have all the values for [frame001] in the first column, [frame002] in the second column, and so on. I thought if I could save the file as CSV I could manipulate it in Excel, however, I figure there is a solution that I can program to skip that step.
This is the code that I have tried using so far:
import csv
data = [frames]
out = csv.writer(open(filename,"w"), delimiter=',',quoting=csv.QUOTE_ALL)
out.writerow(data)
I have also tried:
import csv
myfile = open(..., 'wb')
wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
wr.writerow(mylist)
If there's a way to do this so that all the values are space separated, that would be ideal, but at this point I've been trying this for hours and can't get my head around the right solution.

What you're describing is that you want to translate a 2 dimensional array of data. In Python you can achieve this easily with the zip function as long as the inner lists are all the same length.
out.writerows(zip(*data))
If they are not all the same length, you can use itertools.izip_longest to fill the remaining fields with some default value (even '').

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Transposing data from automatically created lists - python

To transpose listlist you can use zip() list_list_transpose = zip(*listlist)

Related

Rebase csv file by merging a base csv file with another new csv file

How to loop through multiple csv files and output their contents into one array?

Is it possible to slice keys in h5py using python 3, without forming a list?

Selecting certain rows from a set of data files in Python

Exporting a list to a CSV/space separated and each sublist in its own column

Categories

Resources