I am currently working with two CSV files: base.csv and output_20170503.csv, the latter of which will be produced every day. My aim is to rebase every daily output file so that it contains the same data as base.csv.
My base.csv:
ID,Name,Number,Shape,Sound
1,John,45,Round,Meow
2,Jimmy,78,Sphere,Woof
3,Marc,,Triangle,Quack
4,Yun,50,Triangle,Meow
5,Nyancat,,Round,Quack
My output_20170503.csv:
ID,Name,Number,Shape,Sound
1,John,,Round,Meow
2,Jimmy,,Sphere,Woof
3,Marc,,Triangle,Quack
4,Yun,,Triangle,
5,Nyancat,,Round,Quack
6,Marc,,Square,Woof
7,Jonnn,,Hexagon,Chirp
The objective here is to rebase the data for IDs 1-5 in output_20170503.csv with the values from base.csv.
What I want to achieve:
ID,Name,Number,Shape,Sound
1,John,45,Round,Meow
2,Jimmy,78,Sphere,Woof
3,Marc,,Triangle,Quack
4,Yun,50,Triangle,Meow
5,Nyancat,,Round,Quack
6,Marc,,Square,Woof
7,Jonnn,,Hexagon,Chirp
I already searched for a solution, but what I found:
Merge two CSV files (assumes both files have different columns, so it won't work for me)
Remove duplicates from a CSV file (appending base.csv to output_20170503.csv and then removing the duplicates won't work, because the matching rows have different values in the Number column)
Any help would be appreciated, thank you.
You can try this. I use the first two items of each row as the key to build a dict from each file, then iterate over the new dict and update the base dict with any key that is not already in base:
new = {"".join(i.split(',')[:2]): i[:-1].split(',') for i in open('output_20170503.csv')}
base = {"".join(i.split(',')[:2]): i[:-1].split(',') for i in open('base.csv')}
base.update({i: new[i] for i in new if i not in base})
f=open("out.csv","w")
for i in sorted(base.values(), key=lambda x: x[0]):
if i[0]!="ID":
f.write(",".join(i)+"\n")
Output:
1,John,45,Round,Meow
2,Jimmy,78,Sphere,Woof
3,Marc,,Triangle,Quack
4,Yun,50,Triangle,Meow
5,Nyancat,,Round,Quack
6,Marc,,Square,Woof
7,Jonnn,,Hexagon,Chirp
Python 2.7+ supports the syntactical extension called the "dictionary comprehension" or "dict comprehension", so if you're using Python 2.6 you need to replace the first three lines with:
new = dict(("".join(i.split(',')[:2]),i[:-1].split(',')) for i in open('output_20170503.csv'))
base = dict(("".join(i.split(',')[:2]),i[:-1].split(',')) for i in open('base.csv'))
base.update(dict((i,new[i]) for i in new if i not in base))
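Note that splitting on ',' by hand will break if a field ever contains a quoted comma; a minimal sketch of the same idea built on the csv module (same filenames as above) would be:

import csv

def load(path):
    # key each row by (ID, Name), like the string-joined key above
    with open(path) as fh:
        return dict(((row[0], row[1]), row) for row in csv.reader(fh))

base = load('base.csv')
new = load('output_20170503.csv')
base.update(dict((k, v) for k, v in new.items() if k not in base))

with open('out.csv', 'w') as f:
    writer = csv.writer(f)
    for row in sorted(base.values(), key=lambda x: x[0]):
        if row[0] != 'ID':  # drop the header rows
            writer.writerow(row)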
You should try the pandas library, which is excellent for data manipulation. You can easily read CSV files and do merge operations. Your solution might look like the following:
import pandas as pd

base_df = pd.read_csv('base.csv')
output_df = pd.read_csv('output_20170503.csv')
# update() overwrites output_df in place with base_df's non-NA values, aligned on the index
output_df.update(base_df)
output_df.to_csv('output_20170503.csv', index=False)
The missing values in output_df have now been filled with the ones from base_df. Note that update() lets base_df's non-missing values take precedence wherever the two frames overlap, which is exactly the rebase behaviour you describe.
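If you instead want the daily file's existing values to take precedence and only fill its gaps from base.csv, pandas also offers combine_first; on this particular data it happens to produce the same result:

# output_df's non-missing values win; base_df only fills the NaN holes
result = output_df.combine_first(base_df)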
I am working in Python and trying to take x, y, z coordinates from multiple LAZ files and put them into one array that can be used for another analysis. I am trying to automate this task, as I have about 2000 files to turn into one or even 10 arrays. The example involves two files, but I can't get the loop to work properly. I think I am not correctly naming my variables. Below is an example of the code I have been trying to write (note that I am extremely new to programming, so apologies if this is horrible code).
Create a list of las files, then turn them into an array (my attempt at better automation):
import numpy as np
from laspy.file import File
import glob

# create list of vegetation files to be opened
VegList = sorted(glob.glob('/Users/sophiathompson/Desktop/copys/Clips/*.las'))
for f in VegList:
    print(f)
    Veg = File(filename=f, mode="r")  # Open the file
    points = Veg.get_points()  # Grab all of the points from the file.
    print(points)  # this is a check that the number of rows changes at the end
    print("array shape:")
    print(points.shape)
    VegListCoords = np.vstack((Veg.x, Veg.y, Veg.z)).transpose()
print(VegListCoords)
This block reads both files but fills VegListCoords with only the results of the second file in the file list. I need it to hold the records from both. If this is a horrible way to go about it, I am very open to a new way.
You keep overwriting VegListCoords by reassigning it to the values from the last file you opened.
Instead, initialize an empty list before the loop:
VegListCoords = []
and inside the loop append to it:
VegListCoords.append(np.vstack((Veg.x, Veg.y, Veg.z)).transpose())
If you want them in one numpy array at the end, use np.concatenate, as in the sketch below.
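Putting it together, a minimal sketch of the corrected loop (same path as in your question):

import glob
import numpy as np
from laspy.file import File

VegList = sorted(glob.glob('/Users/sophiathompson/Desktop/copys/Clips/*.las'))
VegListCoords = []  # one (n_points, 3) array per file

for f in VegList:
    Veg = File(filename=f, mode="r")
    VegListCoords.append(np.vstack((Veg.x, Veg.y, Veg.z)).transpose())
    Veg.close()

# stack everything into a single (total_points, 3) array
all_coords = np.concatenate(VegListCoords)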
I am using h5py to read in data from an HDF5 file, and have found that code which worked using Python 2 does not work using Python 3. The file is formatted such that 2D frames of data are present as distinct datasets, which I want to read into a 3D array. The file structure looks like this:
file.h5
|- groupname (group)
|- frame1 (dataset)
|- frame2 (dataset)
...
To read the frames into a 3D array, I have to access the first dataset to get its shape and type information. Because I don't actually know the exact name for each frame, the code I had been using to access the first frame looked like this:
import h5py
fid = h5py.File('file.h5', 'r')
datagroup = fid['groupname']
dataset0 = datagroup[datagroup.keys()[0]]
However, the documentation for h5py says
"When using h5py from Python 3, the keys(), values() and items() methods will return view-like objects instead of lists."
The view objects support iteration but not slicing. So to avoid an error I had to change that line to the following:
dataset0 = datagroup[ [k for k in datagroup.keys()][0] ]
which artificially constructs a temporary list and then grabs its first element. To me this looks awful. Is there a better way to do this?
An h5py Group supports iteration over its keys but not positional indexing of them, so the cleanest way to grab the first key is:
dataset0 = datagroup[next(iter(datagroup))]
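For completeness, a small sketch of how that can feed the 3D-array read you describe. This assumes the group holds only the frame datasets and that they all share one shape; note that plain iteration yields the keys in alphabetical order, so a name like frame10 would sort before frame2:

import h5py
import numpy as np

fid = h5py.File('file.h5', 'r')
datagroup = fid['groupname']

# the first dataset supplies the shape and dtype for the 3D array
dataset0 = datagroup[next(iter(datagroup))]
frames = np.empty((len(datagroup),) + dataset0.shape, dtype=dataset0.dtype)

# fill the array one frame at a time
for i, key in enumerate(datagroup):
    frames[i] = datagroup[key][...]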
I am trying to manipulate some data with Python, but am having quite a bit of difficulty (given that I'm still a rookie). I have taken some code from other questions/sites but still can't quite get what I want.
Basically, I need to take a set of data files, select the data from one particular row of each of those files, and put it into a new file so I can plot it.
So, to get the data into Python in the first place I'm trying to use:
import glob
import os
import numpy

data = []
path = 'C:/path/to/file'
for files in glob.glob(os.path.join(path, '*.*')):
    data.append(list(numpy.loadtxt(files, skiprows=34)))  # first 34 rows aren't used
This has worked great for me once before, but for some reason it won't work now. Any possible reasons why that might be the case?
Anyway, carrying on, this should give me a 2D list containing all the data.
Next I want to select a certain row from each data set, and can do so using:
x = list(xrange(30)) #since there are 30 files
Then:
rowdata = list(data[i][some particular row] for i in x)
Which gives me a list containing the value for that particular row from each imported file. This part seems to work quite nicely.
Lastly, I want to write this to a file. I have been trying:
f = open('path/to/file', 'w')
for item in rowdata:
    f.write(item)
f.close()
But I keep getting an error. Is there another method of approach here?
You are already using numpy to load the text; you can use it to manipulate the data as well.
import glob
import os
import numpy as np

path = 'C:/path/to/file'
mydata = np.array([np.loadtxt(f, skiprows=34) for f in glob.glob(os.path.join(path, '*.*'))])  # skiprows=34 as in your code
This will load all your data into one 3d array:
mydata.ndim
#3
where the first dimension (axis) runs over the files, the second over rows, the third over columns:
mydata.shape
#(number of files, number of rows in each file, number of columns in each file)
So, you can access the first file by
mydata[0,...] # equivalent to: mydata[0,:,:]
or specific parts of all files:
mydata[0,34,:]  # the 35th row of the first file
mydata[:,34,:]  # the 35th row in all files
mydata[:,34,1]  # the second value in the 35th row of all files
To write to file:
Say you want to write a new file with just the 35th row from all files:
np.savetxt(os.path.join(path, 'outfile.txt'), mydata[:,34,:])
If you just have to read from a file and write to a file, you can use open().
For a better way of pulling out single lines by number, you can use the linecache module.
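For instance, a tiny sketch (the filename here is hypothetical):

import linecache

# linecache counts lines from 1, so the 35th line is lineno 35
line = linecache.getline('C:/path/to/file/data1.txt', 35)
values = line.split()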
I'm sure there is an easy way to do this, so here goes. I'm trying to export my lists into CSV in columns. (Basically, it's how another program will be able to use the data I've generated.) I have the group called [frames], which contains [frame001], [frame002], [frame003], etc. I would like the generated CSV file to have all the values for [frame001] in the first column, [frame002] in the second column, and so on. I thought if I could save the file as CSV I could manipulate it in Excel; however, I figure there is a solution I can program to skip that step.
This is the code that I have tried using so far:
import csv
data = [frames]
out = csv.writer(open(filename,"w"), delimiter=',',quoting=csv.QUOTE_ALL)
out.writerow(data)
I have also tried:
import csv
myfile = open(..., 'wb')
wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
wr.writerow(mylist)
If there's a way to do this so that all the values are space separated, that would be ideal, but at this point I've been trying this for hours and can't get my head around the right solution.
What you're describing is that you want to transpose a 2-dimensional array of data. In Python you can achieve this easily with the zip function, as long as the inner lists are all the same length.
out.writerows(zip(*data))
If they are not all the same length, you can use itertools.izip_longest (itertools.zip_longest in Python 3) to fill the remaining fields with some default value (even '').
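Putting it together, a minimal sketch; the frame lists here are made-up stand-ins for your [frame001], [frame002], etc., and delimiter=' ' gives the space-separated variant you mention:

import csv
from itertools import izip_longest  # itertools.zip_longest in Python 3

# hypothetical stand-ins for frame001, frame002, frame003
frames = [[1.0, 2.0, 3.0],
          [4.0, 5.0],
          [6.0, 7.0, 8.0]]

with open('frames.csv', 'wb') as myfile:  # use open('frames.csv', 'w', newline='') in Python 3
    out = csv.writer(myfile, delimiter=' ', quoting=csv.QUOTE_ALL)
    # izip_longest(*frames) turns each list into a column, padding short ones with ''
    out.writerows(izip_longest(*frames, fillvalue=''))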