How do I prevent Python from writing objects to a CSV file in a different format from the original? For example, I have a list such as the following:
row = ['APR16', '100.00000']
I want to write this row as is. However, when I use the writerow function of the csv writer, it ends up in the CSV file as 16-Apr and just 10. I want to keep the original formatting.
EDIT:
Here is the code:
import pandas as pd
dates = ['APR16', 'MAY16', 'JUN16']
numbers = [100.00000, 200.00000, 300.00000]
for i in range(3):
    row = []
    row.append(dates[i])
    row.append(numbers[i])
    prow = pd.DataFrame(row)
    prow.to_csv('test.csv', index=False, header=False)
And result:
Using pandas:
import pandas as pd
dates = ['APR16', 'MAY16', 'JUN16']
numbers = [100.00000, 200.00000, 300.00000]
data = list(zip(dates, numbers))  # materialize the pairs as a list of rows
fd = pd.DataFrame(data)
fd.to_csv('test.csv', index=False, header=False) # csv-file
fd.to_excel("test.xlsx", header=False, index=False) # or an Excel file
Result in my terminal:
➜ ~ cat test.csv
APR16
100.00000
Result in LibreOffice:
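For what it is worth, csv.writer writes string values exactly as given, so a change such as APR16 becoming 16-Apr most likely happens when a spreadsheet program opens the file, not when Python writes it. Below is a minimal sketch, assuming the numbers are floats that should keep five decimals. It writes to an in-memory buffer instead of a real file so the text can be inspected directly.

```python
import csv
import io

dates = ['APR16', 'MAY16', 'JUN16']
numbers = [100.0, 200.0, 300.0]

buffer = io.StringIO()  # stands in for open('test.csv', 'w', newline='')
writer = csv.writer(buffer)
for date, number in zip(dates, numbers):
    # Format the float explicitly so the trailing zeros survive.
    writer.writerow([date, format(number, '.5f')])

csv_text = buffer.getvalue()
print(csv_text)
```

If the file still looks different in a spreadsheet, importing the columns as text there is probably the place to fix it.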
I am writing a small program to concatenate a load of measurements from multiple CSV files into one Excel file. I have pretty much all of the program written and working. The only thing I'm struggling with is getting the data from the CSV files to be stored as numbers when the DataFrame places them into the Excel file.
The code I have looks like this:
from pandas import DataFrame, read_csv
import matplotlib.pyplot as plt
import pandas as pd
import os
import csv
import glob
os.chdir(r"directoryname")
retval = os.getcwd()
print ("Directory changed to %s" % retval)
files = glob.glob(r"directoryname\datafiles*csv")
print(files)
files.sort(key=lambda x: os.path.getmtime(x))
writer = pd.ExcelWriter('test.xlsx')
df = pd.read_csv("datafile.csv", index_col=False)
df = df.iloc[0:41, 1]
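df.to_excel(writer, sheet_name='sheetname', startrow=0, startcol=1, index=False)
i = 0  # column offset for each additional file
for f in files:
    i += 1
    df = pd.read_csv(f, index_col=False)
    df = df.iloc[0:41, 2]
    df.to_excel(writer, sheet_name='sheetname', startrow=0, startcol=1+i, index=False)
writer.close()  # write the workbook to disk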
Thanks in advance
Do you mean:
df['measurements'] = df['measurements'].astype(float)
After reading the CSV into a DataFrame, you can cast each of your columns like that.
Another solution is to cast the columns while reading the CSV, using the dtype parameter (see the documentation).
EXAMPLE
df = pd.read_csv(os.path.join(savepath, 'test.csv'), sep=";",
                 dtype={'ID': 'Int64', 'STATUS': 'object'}, encoding='utf-8')
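To illustrate both suggestions, here is a minimal sketch on an in-memory CSV. The column names and values are made up for the example; it relies only on the dtype parameter of read_csv and on pd.to_numeric.

```python
import io
import pandas as pd

# Hypothetical sample data; the real files will have other columns.
csv_text = "ID;STATUS;measurements\n1;ok;0.5\n2;bad;1.25\n"

# Option 1: declare the column types while reading.
typed = pd.read_csv(io.StringIO(csv_text), sep=';',
                    dtype={'ID': 'Int64', 'measurements': 'float64'})

# Option 2: read everything as text, then convert afterwards.
raw = pd.read_csv(io.StringIO(csv_text), sep=';', dtype=str)
raw['measurements'] = pd.to_numeric(raw['measurements'])

print(typed.dtypes)
print(raw['measurements'].sum())
```

Either way, the column should reach to_excel as a numeric dtype, so the cells should be written as numbers and not as text.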
How can I convert the output I get from a PrettyTable to a pandas DataFrame and save it as an Excel file?
My code that produces the PrettyTable output:
import difflib
from prettytable import PrettyTable
prtab = PrettyTable()
prtab.field_names = ['Item_1', 'Item_2']
for item in Items_2:
    prtab.add_row([item, difflib.get_close_matches(item, Items_1)])
print(prtab)
I'm trying to convert this to a pandas DataFrame, but I get an error saying "DataFrame constructor not properly called!". My conversion code is shown below:
AA = pd.DataFrame(prtab, columns = ['Item_1', 'Item_2']).reset_index()
I found this method recently:
pretty_table.get_csv_string()
This converts the table to a CSV string, which you can then write to a CSV file.
I use it like this:
tbl_as_csv = pretty_table.get_csv_string().replace('\r','')
with open("output_path.csv", "w") as text_file:  # closes the file automatically
    text_file.write(tbl_as_csv)
Load the data into a DataFrame first, then export to PrettyTable and Excel:
import io
import difflib
import pandas as pd
import prettytable as pt
data = []
for item in Items_2:
    data.append([item, difflib.get_close_matches(item, Items_1)])
df = pd.DataFrame(data, columns=['Item_1', 'Item_2'])
# Export to prettytable
# https://stackoverflow.com/a/18528589/190597 (Ofer)
# Use io.StringIO with Python3, use io.BytesIO with Python2
output = io.StringIO()
df.to_csv(output)
output.seek(0)
print(pt.from_csv(output))
# Export to Excel file
filename = '/tmp/output.xlsx'
with pd.ExcelWriter(filename) as writer:  # saves the file on exit
    df.to_excel(writer, sheet_name='Sheet1')
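One detail to watch with this approach: get_close_matches returns a list, and as far as I know some Excel writer engines may refuse list-valued cells. A minimal sketch, assuming made-up contents for Items_1 and Items_2, that joins each list of matches into one string before building the DataFrame:

```python
import difflib
import pandas as pd

# Hypothetical sample data standing in for Items_1 and Items_2.
Items_1 = ['ape', 'apple', 'peach', 'puppy']
Items_2 = ['appel', 'peech']

rows = []
for item in Items_2:
    matches = difflib.get_close_matches(item, Items_1)
    # Store the matches as one comma-separated string per cell.
    rows.append([item, ', '.join(matches)])

df = pd.DataFrame(rows, columns=['Item_1', 'Item_2'])
print(df)
```

The resulting df can then be passed to to_excel or to_csv in the same way as above.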
I have a CSV file with 1 million lines. I want to call a lookup function on each row's 1st column and append its result as a new column in the same CSV (if possible).
What I want is something like this:
for each row in dataframe:
    string = row[1]
    result = lookupFunction(string)
    row.append(result)
I know I could do it using Python's csv library by opening my CSV, reading each row, doing my operation, and writing the results to a new CSV.
This is my code using Python's csv library:
import csv
with open(rawfile, 'r') as f:
    with open(newFile, 'a', newline='') as csvfile:
        csvwriter = csv.writer(csvfile, delimiter=' ')
        for line in f:
            pass  # do operation
However, I really want to do it with pandas because it would be something new to me.
This is what my data looks like:
77,#oshkosh # tannersville pa,,PA,US
82,#osithesakcom ca,,CA,US
88,#osp open records or,,OR,US
89,#ospbco tel ord in,,IN,US
98,#ospwmnwithn return in,,IN,US
99,#ospwmnwithn tel ord in,,IN,US
100,#osram sylvania inc ma,,MA,US
106,#osteria giotto montclair nj,,NJ,US
Any help and guidance will be appreciated. Thanks.
Here is a simple example of adding two columns together into a new column from your CSV file:
import pandas as pd
df = pd.read_csv("yourpath/yourfile.csv")
df['newcol'] = df['col1'] + df['col2']
create df and csv
import pandas as pd
df = pd.DataFrame(dict(A=[1, 2], B=[3, 4]))
df.to_csv('test_add_column.csv')
read csv into dfromcsv
dfromcsv = pd.read_csv('test_add_column.csv', index_col=0)
create new column
dfromcsv['C'] = dfromcsv['A'] * dfromcsv['B']
dfromcsv
write csv
dfromcsv.to_csv('test_add_column.csv')
read it again
dfromcsv2 = pd.read_csv('test_add_column.csv', index_col=0)
dfromcsv2
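For the lookup case in the question, a column can be passed through a function with Series.map. This is only a sketch under assumptions: lookup_function is a hypothetical stand-in for the real lookup, the file has no header row (as in the sample data), and an in-memory buffer replaces the real file.

```python
import io
import pandas as pd

def lookup_function(name):
    """Hypothetical lookup: return the last word of the name, upper-cased."""
    return name.split()[-1].upper()

raw = "77,#oshkosh # tannersville pa,,PA,US\n82,#osithesakcom ca,,CA,US\n"

# header=None because the sample data has no header row;
# columns are then labelled 0, 1, 2, ...
df = pd.read_csv(io.StringIO(raw), header=None)
df['result'] = df[1].map(lookup_function)

out = io.StringIO()  # stands in for the output CSV path
df.to_csv(out, header=False, index=False)
print(out.getvalue())
```

For a file with a million rows this should still fit in memory on most machines; if it does not, read_csv also accepts a chunksize argument for processing the file in pieces.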
I am dealing with a csv file that contains three columns and three rows containing numeric data. The csv data file simply looks like the following:
Colum1,Colum2,Colum3
1,2,3
1,2,3
1,2,3
My question is how to write Python code that takes a single value from one of the columns and performs a specific operation on it. For example, let's say I want to take the first value in 'Colum1' and subtract it from the sum of all the values in that column.
Here is my attempt:
import csv
f = open('columns.csv')
rows = csv.DictReader(f)
value_of_single_row = 0.0
for i in rows:
    value_of_single_Row += float(i) # trying to isolate a single value here!
print value_of_single_row - sum(float(r['Colum1']) for r in rows)
f.close()
Based on the code you provided, I suggest you look at the docs to see the preferred approach for reading through a CSV file. Take a look here:
How to use CsvReader
With that being said, you can modify the beginning of your code slightly to this:
import csv
with open('data.csv', newline='') as f:  # text mode, as the csv module expects on Python 3
    rows = csv.DictReader(f)
    for row in rows:
        pass  # perform operation per row
From there you now have access to each row.
This should give you what you need to do proper row-by-row operations.
What I suggest you do is play around with printing out your rows to see what your data looks like. You will see that each row being outputted is a dictionary.
So if you were going through each row, you can just simply do something like this:
for row in rows:
    row['Colum1']  # or row.get('Colum1')
    # to do some math to add everything in Colum1
    s += float(row['Colum1'])
So all of that will look like this:
import csv
s = 0
with open('data.csv', newline='') as f:  # text mode, as the csv module expects on Python 3
    rows = csv.DictReader(f)
    for row in rows:
        s += float(row['Colum1'])
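To get all the way to what the question asks (the first value of 'Colum1' subtracted from the column sum), one option is to collect the values into a list first. A minimal sketch, using an in-memory copy of the sample data in place of the real file:

```python
import csv
import io

csv_text = "Colum1,Colum2,Colum3\n1,2,3\n1,2,3\n1,2,3\n"

with io.StringIO(csv_text) as f:  # stands in for open('data.csv', newline='')
    # A DictReader can only be iterated once, so keep the values in a list.
    values = [float(row['Colum1']) for row in csv.DictReader(f)]

result = sum(values) - values[0]
print(result)  # 2.0
```

Keeping the values in a list avoids the problem in the original attempt, where the reader was already exhausted by the time sum() tried to iterate it again.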
You can do pretty much all of this with pandas:
import pandas as pd
Location = r'path/test.csv'
df = pd.read_csv(Location, names=['Colum1','Colum2','Colum3'])
df = df[1:]  # Remove the header row, which was read in as data
print(df)
df.loc[1, 'Colum1'] = int(df.loc[1, 'Colum1']) + 5  # assign through .loc so the change sticks
print(df)
You can write back to your CSV using df.to_csv('File path', index=False, header=True). Having header=True will add the headers back in.
To do this more along the lines of what you already have, you can do it like this:
Location = r'C:/Users/tnabrelsfo/Documents/Programs/Stack/test.csv'
data = []
with open(Location, 'r') as f:
    for line in f:
        # strip the newline and spaces, then split into fields
        data.append(line.replace('\n','').replace(' ','').split(','))
data = data[1:]  # drop the header row
print(data)
data[1][1] = 5
print(data)
It will read in each row, cut out the column names, and then you can modify the values by index.
Here is my simple solution using the pandas library. Suppose we have a sample.csv file:
import pandas as pd
df = pd.read_csv('sample.csv') # df is now a DataFrame
df['Colum1'] = df['Colum1'] - df['Colum1'].sum() # replace the column by subtracting the column sum from each value
print(df)
df.to_csv('sample.csv', index=False) # save dataframe back to csv file
You can also use the map function to apply an operation to one column, for example:
import pandas as pd
df = pd.read_csv('sample.csv')
col_sum = df['Colum1'].sum() # sum of the first column
df['Colum1'] = df['Colum1'].map(lambda x: x - col_sum)
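If only a single value is needed, as in the question (the first value of 'Colum1' subtracted from the column sum), positional indexing with iloc may be enough. A minimal sketch, with an in-memory copy of the sample data standing in for sample.csv:

```python
import io
import pandas as pd

csv_text = "Colum1,Colum2,Colum3\n1,2,3\n1,2,3\n1,2,3\n"
df = pd.read_csv(io.StringIO(csv_text))

first_value = df['Colum1'].iloc[0]  # first row of the column, by position
result = df['Colum1'].sum() - first_value
print(result)
```

Unlike the examples above, this leaves the column itself unchanged and only computes one number.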
I have a text file:
sample value1 value2
A 0.1212 0.2354
B 0.23493 1.3442
I import it:
import numpy as np
import pandas as pd
with open('file.txt', 'r') as fo:
    notes = next(fo)
    headers,*raw_data = [row.strip('\r\n').split('\t') for row in fo] # get column headers and data
names = [row[0] for row in raw_data] # extract first column (sample names)
data = np.array([row[1:] for row in raw_data], dtype=float) # drop the first column
If I then convert it:
s = pd.DataFrame(data,index=names,columns=headers[1:])
the data is recognized as floats. I could get the sample names back as a column with s = s.reset_index().
If I do
s = pd.DataFrame(raw_data,columns=headers)
the floats are stored as objects and I cannot perform standard calculations.
How would you build the data frame? Is it better to import the data as a dict?
BTW, I am using Python 3.3.
You can parse your data file directly into a data frame as follows:
df = pd.read_csv('file.txt', sep='\t', index_col='sample')
Which will give you:
         value1  value2
sample
A       0.12120  0.2354
B       0.23493  1.3442

[2 rows x 2 columns]
Then, you can do your computations.
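If the data has already been parsed into lists of strings, as with raw_data in the question, the object columns can also be converted after the fact. A minimal sketch, assuming the same headers and values as the sample file:

```python
import pandas as pd

headers = ['sample', 'value1', 'value2']
raw_data = [['A', '0.1212', '0.2354'], ['B', '0.23493', '1.3442']]

s = pd.DataFrame(raw_data, columns=headers).set_index('sample')
# Every remaining column holds numeric strings; convert them column by column.
s = s.apply(pd.to_numeric)
print(s.dtypes)
```

After the conversion the columns should behave like ordinary float columns in calculations.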
To parse such a file, use the pandas read_csv function.
Below is a minimal example showing read_csv with the delim_whitespace parameter set to True:
import pandas as pd
from io import StringIO # Python 3 (on Python 2: from StringIO import StringIO)
data = \
"""sample value1 value2
A 0.1212 0.2354
B 0.23493 1.3442"""
# Creation of the dataframe
df = pd.read_csv(StringIO(data), delim_whitespace=True)
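As far as I know, recent pandas releases deprecate delim_whitespace in favour of a regular-expression separator. A minimal sketch of the same idea with sep=r'\s+', assuming the same sample data and using the sample column as the index:

```python
import io
import pandas as pd

data = """sample value1 value2
A 0.1212 0.2354
B 0.23493 1.3442"""

# r'\s+' treats any run of spaces or tabs as one separator.
df = pd.read_csv(io.StringIO(data), sep=r'\s+', index_col='sample')
total = df['value1'] + df['value2']
print(df.dtypes)
print(total)
```

Both value columns should come back as floats, so arithmetic on them works directly.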