Coordinates output to CSV file in Python

I am currently working on a project where I need to collect coordinates and write them to a CSV file. I am using the k-means algorithm to find the coordinates (the centroids of a larger collection of coordinates). The output is a list of coordinates. At first I wanted to simply copy it into an Excel file, but that did not work as well as I wanted it to.
This is my code:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

df = pd.read_excel("centroid coordinaten excel lijst.xlsx")
df.head(n=16)
plt.scatter(df.X, df.Y)

km = KMeans(n_clusters=200)
print(km)
y_predict = km.fit_predict(df[['X', 'Y']])
print(y_predict)
df['cluster'] = y_predict

kmc = km.cluster_centers_
print(kmc)
The kmc output is the list of coordinates, and it looks like this:
[[ 4963621.73063468 52320928.30284858]
[ 4981357.33667335 52293627.08917835]
[ 4974134.37538941 52313274.21495327]
[ 4945992.84398977 52304446.43606138]
[ 4986701.53977273 52317701.43831169]
[ 4993362.9143898 52296985.49271403]
[ 4949408.06109325 52320541.97963558]
[ 4966756.82872596 52301871.5655048 ]
[ 4980845.77591313 52324669.94175716]
[ 4970904.14472671 52292401.47190146]]
Does anybody know how to convert the kmc output into a CSV file?
Thanks in advance!

You could use the csv library as follows:
import csv

kmc = [
    [4963621.73063468, 52320928.30284858],
    [4981357.33667335, 52293627.08917835],
    [4974134.37538941, 52313274.21495327],
    [4945992.84398977, 52304446.43606138],
    [4986701.53977273, 52317701.43831169],
    [4993362.9143898, 52296985.49271403],
    [4949408.06109325, 52320541.97963558],
    [4966756.82872596, 52301871.5655048],
    [4980845.77591313, 52324669.94175716],
    [4970904.14472671, 52292401.47190146],
]

with open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(['lat', 'long'])
    csv_output.writerows(kmc)
Giving you output.csv containing:
lat,long
4963621.73063468,52320928.30284858
4981357.33667335,52293627.08917835
4974134.37538941,52313274.21495327
4945992.84398977,52304446.43606138
4986701.53977273,52317701.43831169
4993362.9143898,52296985.49271403
4949408.06109325,52320541.97963558
4966756.82872596,52301871.5655048
4980845.77591313,52324669.94175716
4970904.14472671,52292401.47190146
I suggest you put a full path to your output file to ensure you have write permission. Or as suggested, use sudo. Alternatively, you could add the following to the top:
import os
os.chdir(os.path.dirname(os.path.abspath(__file__)))
This ensures the output will be in the same folder as your script.
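Since km.cluster_centers_ is already a NumPy array, a shorter alternative is to let NumPy or pandas do the writing. This is just a sketch: it assumes the same lat/long column order used above and writes to the same illustrative filename.
import numpy as np
import pandas as pd

kmc = km.cluster_centers_   # from the question's code above; a (200, 2) NumPy array

# Option 1: NumPy
np.savetxt('output.csv', kmc, delimiter=',', header='lat,long', comments='')

# Option 2: pandas
pd.DataFrame(kmc, columns=['lat', 'long']).to_csv('output.csv', index=False)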

Related

How to show all data (no ellipses) when converting NETCDF to CSV?

I have a trajectory file from a molecular simulation that is written in netCDF format. I would like to convert this file to .csv format so that I can apply further Python-based analysis of the proximity between molecules. The trajectory file contains information corresponding to 3D Cartesian coordinates for all 6500 atoms of my simulation for each time step.
I have used the script below to convert this netCDF file to a .csv file using the netCDF4 and pandas modules.
import netCDF4
import pandas as pd

fp = 'TEST4_simulate1.traj'
dataset = netCDF4.Dataset(fp, mode='r')
cols = list(dataset.variables.keys())
list_dataset = []
for c in cols:
    list_dataset.append(list(dataset.variables[c][:]))
#print(list_dataset)
df_dataset = pd.DataFrame(list_dataset)
df_dataset = df_dataset.T
df_dataset.columns = cols
df_dataset.to_csv("file_path.csv", index=False)
A small selection of the output .csv file is given below. Notice that a set of ellipses appears between the first and last sets of three atomic coordinates.
time,spatial,coordinates
12.0,b'x',"[[ 33.332325 -147.24976 -107.131 ]
[ 34.240444 -147.80115 -107.4043 ]
[ 33.640083 -146.47362 -106.41945 ]
...
[ 70.31757 -16.499006 -186.13313 ]
[ 98.310844 65.95696 76.43664 ]
[ 84.08772 52.676186 145.48856 ]]"
How can I modify this code so that the entirety of my atomic coordinates are written to my .csv file?
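The ellipses are most likely NumPy's print truncation: each cell of the coordinates column holds a whole per-frame array, and to_csv abbreviates its string form. A minimal sketch of two workarounds, assuming the trajectory exposes a coordinates variable shaped (frames, atoms, 3) (the variable name and shape are assumptions here):
import sys
import netCDF4
import numpy as np
import pandas as pd

# Workaround 1: stop NumPy from abbreviating long arrays before they are
# stringified into the CSV cells (run this before building df_dataset).
np.set_printoptions(threshold=sys.maxsize)

# Workaround 2: write the coordinates as plain numeric columns instead of
# stringified arrays, one row per atom per frame.
dataset = netCDF4.Dataset('TEST4_simulate1.traj', mode='r')
coords = np.asarray(dataset.variables['coordinates'][:])   # assumed name and shape
frames, atoms, _ = coords.shape
flat = pd.DataFrame(coords.reshape(frames * atoms, 3), columns=['x', 'y', 'z'])
flat.to_csv('coordinates_long.csv', index=False)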

Finding the number of rows for all files within a folder

Hello, I am trying to find the number of rows for all files within a folder. I am trying to do this for a folder that contains only ".txt" files and for a folder that contains ".csv" files.
I know that the way to get the number of rows for a SINGLE ".txt" file is something like this:
file = open("sample.txt","r")
Counter = 0
Content = file.read()
CoList = Content.split("\n")
for i in CoList:
if i:
Counter += 1
print("This is the number of lines in the file")
print(Counter)
Whereas for a SINGLE ".csv" file is something like this:
file = open("sample.csv")
reader = csv.reader(file)
lines= len(list(reader))
print(lines)
But how can I do this for ALL files within a folder? That is, how can I loop each of these procedures across all files within a folder and, ideally, export the output into an excel sheet with columns akin to these:
Filename Number of Rows
1.txt 900
2.txt 653
and so on and so on.
Thank you so much for your help.
You can use glob to detect the files and then just iterate over them.
Other methods: How do I list all files of a directory?
import glob

# 1. list all text files in the directory
rel_filepaths = glob.glob("*.txt")

# 2. (optional) create a function to read the number of rows in a file
def count_rows(filepath):
    f = open(filepath, 'r')
    res = len(f.readlines())
    f.close()
    return res

# 3. iterate over your files and use the count_rows function
counts = [count_rows(filepath) for filepath in rel_filepaths]
print(counts)
Then, if you want to export this result to a .csv or .xlsx file, I recommend using pandas.
import pandas as pd
# 1. create a new table and add your two columns filled with the previous values
df = pd.DataFrame()
df["Filename"] = rel_filepaths
df["Number of rows"] = counts
# 2. export this dataframe to `.csv`
df.to_csv("results.csv")
You can also use pandas.ExcelWriter() if you want to use the .xlsx format. Link to documentation & examples: Pandas - ExcelWriter doc
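For instance, a minimal sketch of writing the same dataframe to a workbook (the filename and sheet name here are just illustrative, and an Excel engine such as openpyxl needs to be installed):
import pandas as pd

# df is the two-column dataframe (Filename / Number of rows) built above.
with pd.ExcelWriter("results.xlsx") as writer:
    df.to_excel(writer, sheet_name="row_counts", index=False)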

Convert CSV File into Python Dictionary, Array and Binary File

I have a CSV file of tab-separated data with headers and data of different types which I would like to convert into a dictionary of vectors. Eventually I would like to convert the dictionary into numpy arrays, and store them in some binary format for fast retrieval by different scripts. This is a large file with approximately 700k records and 16 columns. The following is a sample:
"answer_option" "value" "fcast_date" "expertise"
"a" 0.8 "2013-07-08" 3
"b" 0.2 "2013-07-08" 3
I have started implementing this with the DictReader class, which I'm just learning about.
import csv

with open("filename.tab", 'r') as records:
    reader = csv.DictReader(records, dialect='excel-tab')
    row = list(reader)
    n = len(row)
    d = {}
    keys = list(row[0])
    for key in keys:
        a = []
        for i in range(n):
            a.append(row[i][key])
        d[key] = a
which gives the result
{'answer_option': ['a', 'b'],
'value': ['0.8', '0.2'],
'fcast_date': ['2013-07-08', '2013-07-08'],
'expertise': ['3', '3']}
Besides the small nuisance of having to strip the quotation characters enclosing the numerical values, I thought that perhaps there is something ready-made. I'm also wondering if there is anything that extracts the data directly from the file into NumPy vectors, since I do not necessarily need to transform my data into dictionaries.
I took a look at SciPy.org, and a search for CSV also points to HDF5 and genfromtxt, but I haven't dived into those suggestions yet. Ideally I would like to be able to store the data in a fast-to-load format, so that it would be simple to load from other scripts with only one command, with all vectors made available the same way as is possible in Matlab/Octave. Suggestions are appreciated.
EDIT: the data are tab separated with strings enclosed by quotation marks.
This will read the csv into a Pandas data frame and remove the quotes:
import pandas as pd
import csv
import io

with open('data_with_quotes.csv') as f_input:
    data = [next(csv.reader(io.StringIO(line.replace('"', '')))) for line in f_input]

df = pd.DataFrame(data[1:], columns=data[0])
print(df)
answer_option value fcast_date expertise
0 a 0.8 2013-07-08 3
1 b 0.2 2013-07-08 3
You can easily convert the data to a numpy array using df.values:
array([['a', '0.8', '2013-07-08', '3'],
['b', '0.2', '2013-07-08', '3']], dtype=object)
To save the data in a binary format, I recommend using HDF5:
import h5py

# Note: h5py stores numeric or byte data; mixed string (object) columns
# may need converting or encoding first.
with h5py.File('file.hdf5', 'w') as f:
    dset = f.create_dataset('default', data=df)
To load the data, use the following:
with h5py.File('file.hdf5', 'r') as f:
    data = f['default']
You can also use Pandas to save and load the data in binary format:
# Save the data
df.to_hdf('data.h5', key='df', mode='w')
# Load the data
df = pd.read_hdf('data.h5', 'df')
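As for reading the file straight into typed NumPy vectors (the genfromtxt route mentioned in the question), here is a minimal sketch, assuming the tab-separated, quoted layout shown above:
import numpy as np
import pandas as pd

# pandas handles the tab delimiter and the quotation marks itself,
# and infers numeric dtypes for 'value' and 'expertise'.
df = pd.read_csv('filename.tab', sep='\t')
values = df['value'].to_numpy()          # float64
expertise = df['expertise'].to_numpy()   # int64

# np.genfromtxt works without pandas (names=True uses the header row),
# but it does not strip the quotation marks from string fields.
arr = np.genfromtxt('filename.tab', delimiter='\t', names=True,
                    dtype=None, encoding='utf-8')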

Copy column,add some text and write in a new csv file

I want to make a script that copies the 2nd column from multiple csv files in a folder, adds some text, and saves everything to a single csv file.
Here is what I want to do:
1.) Grab the data in the 2nd column from all csv files
2.) Append the text "hello" & "welcome" to each row, at the start and end
3.) Write the data into a single file
I tried creating it using pandas:
import os
import pandas as pd
dataframes = [pd.read_csv(p, index_col=2, header=None) for p in ('1.csv','2.csv','3.csv')]
merged_dataframe = pd.concat(dataframes, axis=0)
merged_dataframe.to_csv("all.csv", index=False)
The problem is:
In the above code I am forced to list the file names manually, which is very difficult; as a solution I need to include all csv files (*.csv).
I need to use something like writer.writerow(("Hello" + r[1] + "welcome"))
As there are multiple csv files with many rows (around 100k) in each file, I need this to be fast as well.
Here is a sample of the csv files:
"1.csv" "2.csv" "3.csv"
a,Jac b,William c,James
And here is how I would like the output all.csv to look:
Hello Jac welcome
Hello William welcome
Hello James welcome
Any solution using .merge() .append() or .concat() ??
How can I achieve this using python ?
You don't need pandas for this. Here's a really simple way of doing this with csv
import csv
import glob

with open("path/to/output", 'w') as outfile:
    for fpath in glob.glob('path/to/directory/*.csv'):
        with open(fpath) as infile:
            for row in csv.reader(infile):
                outfile.write("Hello {} welcome\n".format(row[1]))
1) If you would like to import all .csv files in a folder, you can just use:
import os

for i in [a for a in os.listdir() if a[-4:] == '.csv']:
    # code to read in the .csv file and concatenate it to an existing dataframe
2) To append the text and write to a file, you can map a function to each element of the dataframe's column 2 to add the text.
#existing dataframe called df
df[df.columns[1]].map(lambda x: "Hello {} welcome".format(x)).to_csv(<targetpath>)
#replace <targetpath> with your target path
See http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.Series.to_csv.html for all the various parameters you can pass in to to_csv.
Here is a non-pandas solution using the built-in csv module. Not sure about speed.
import os
import csv

path_to_files = "path to files"
all_csv = os.path.join(path_to_files, "all.csv")
file_list = os.listdir(path_to_files)
names = []
for file in file_list:
    if file.endswith(".csv"):
        path_to_current_file = os.path.join(path_to_files, file)
        with open(path_to_current_file, "r") as current_csv:
            reader = csv.reader(current_csv, delimiter=',')
            for row in reader:
                names.append(row[1])
with open(all_csv, "w") as out_csv:
    # write to the output file handle, not the (closed) input handle
    writer = csv.writer(out_csv, delimiter=',')
    for name in names:
        writer.writerow(["Hello {} welcome".format(name)])

How do I export a two dimensional list in Python to excel?

I have a list that looks like this:
[[[u'example', u'example2'], [u'example', u'example2'], [u'example', u'example2'], [u'example', u'example2'], [u'example', u'example2']], [[5.926582278481011, 10.012500000000001, 7.133823529411763, 8.257352941176471, 7.4767647058823545]]]
I want to save this list to an Excel file in the following way:
Column 1: [example, example, ..., example]
Column 2: [example2, example2, ..., example2]
Column 3: [5.926582278481011, 10.012500000000001, ..., 7.4767647058823545]
You just need to rearrange the data a bit:
import csv

# s is the nested list shown in the question
col1 = [i[0] for i in s[0]]
col2 = [i[1] for i in s[0]]
col3 = s[1][0]

with open('results.csv', 'w') as o:
    writer = csv.writer(o, delimiter=',')
    writer.writerows(zip(col1, col2, col3))
You can open this file and import it into Excel: create a new workbook, click on the Data tab, and insert from file.
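If you would rather produce a real Excel workbook than a CSV to import, a minimal sketch (assuming pandas plus an Excel engine such as openpyxl is installed, and reusing the col1/col2/col3 lists from above):
import pandas as pd

# col1, col2, col3 are the three lists built in the snippet above.
# Write one row per record directly to an .xlsx file.
pd.DataFrame(list(zip(col1, col2, col3))).to_excel('results.xlsx',
                                                   index=False, header=False)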
I revised your data layout for clarity by adding spaces and newlines, though the result, in memory, is exactly the same as your input data. I assigned it to a variable to work with it.
Your data is not two dimensional: it has three dimensions. We can call the dimensions sheet, row, and column. Your specified result is one sheet of five rows and three columns. The data from the second sheet (the numbers) needs to be put into columns of the first sheet. The first for loop does that. Then, the variable sheet0 is assigned the values from the first of the two sheets in your data.
The sheet you are building in Python has only data, so you don't need the Excel package to create an XLS file. It's easier to create a CSV file if there is only data.
Run the program.
To open the resulting file in Excel, double-click on the export.csv file created by this code. Excel 2013 (and presumably later) have no import on the File menu, so open the file separately, select it, copy, and paste into the sheet where you want it to go.
import csv

yourdata = [
    [
        [u'example', u'example2'],
        [u'example', u'example2'],
        [u'example', u'example2'],
        [u'example', u'example2'],
        [u'example', u'example2']
    ],
    [
        [5.926582278481011,
         10.012500000000001,
         7.133823529411763,
         8.257352941176471,
         7.4767647058823545
        ]
    ]
]

for i in range(len(yourdata[0])):
    yourdata[0][i].append(yourdata[1][0][i])
sheet0 = yourdata[0]

newFile = open('export.csv', 'w', newline='')
newWriter = csv.writer(newFile, dialect='excel')
for i in range(len(sheet0)):
    newWriter.writerow(sheet0[i])
newFile.close()
Please use the link below to explore different ways:
http://www.python-excel.org/
xlwt is one of the ways:
http://xlwt.readthedocs.io/en/latest/
https://yuji.wordpress.com/2012/04/19/python-xlwt-writing-excel-files/
If you want to use xlwt, then below is the code (here rows is the list of rows to write, e.g. the combined sheet0 from the previous answer):
import xlwt

workbook = xlwt.Workbook()
sheet = workbook.add_sheet("Sheet")
for i in range(len(rows)):
    for j in range(len(rows[i])):
        sheet.write(i, j, rows[i][j])
workbook.save("test.xls")
You have to install xlwt first if you want to use the above code. For more information, please refer to the xlwt documentation.
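For this question's data specifically, rows could be built by reusing the rearrangement from the first answer; a minimal sketch (variable names are illustrative):
# The nested list from the question, assigned to a variable as in the first answer.
s = [
    [[u'example', u'example2'], [u'example', u'example2'], [u'example', u'example2'],
     [u'example', u'example2'], [u'example', u'example2']],
    [[5.926582278481011, 10.012500000000001, 7.133823529411763,
      8.257352941176471, 7.4767647058823545]],
]

# One tuple per spreadsheet row: (text1, text2, number)
rows = list(zip([i[0] for i in s[0]], [i[1] for i in s[0]], s[1][0]))
Feeding this rows list into the xlwt loop above writes the same three-column layout to test.xls.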
