Joining sentences to dataframe


I want to export a dataframe to CSV, but I would also like to write the date the dataframe was created on the first line of the file, above the data. How can I join this sentence to the dataframe so that both are exported to the same CSV file?
import pandas as pd
import datetime as dt
today1=dt.datetime.today().strftime('%Y%m%d')
print('This dataframe is created on ',today1)
df=pd.DataFrame({'A':[1,2],'B':[3,4]})
print(df)
df.to_csv('temp.csv')

DataFrame.to_csv accepts a file handle as its target, so write your first line, then call to_csv with the same handle:
import pandas as pd
import datetime as dt
today1=dt.datetime.today().strftime('%Y%m%d')
df=pd.DataFrame({'A':[1,2],'B':[3,4]})
with open("temp.csv","w") as f:
    f.write('This dataframe is created on {}\n'.format(today1))
    df.to_csv(f)
When you read the data back, do the same with pd.read_csv():
with open("temp.csv","r") as f:
    date_line = next(f)
    # to_csv wrote the index as the first column, so read it back as the index
    df = pd.read_csv(f, index_col=0)
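A minimal round-trip sketch of the approach above, assuming the extra sentence always occupies exactly one line. The helper names `write_with_banner` and `read_with_banner` are made up for illustration, and `index=False`, `newline=""` and `skiprows=1` are my own choices, not part of the answer.

```python
import os
import tempfile

import pandas as pd


def write_with_banner(df, path, banner):
    """Write one free-text line, then the DataFrame as CSV, to the same file."""
    # newline="" leaves line endings to the CSV writer (avoids doubled
    # line breaks on Windows).
    with open(path, "w", newline="") as f:
        f.write(banner + "\n")
        df.to_csv(f, index=False)


def read_with_banner(path):
    """Return (banner, DataFrame) from a file written by write_with_banner."""
    with open(path, "r", newline="") as f:
        banner = f.readline().rstrip("\r\n")
    # skiprows=1 tells pandas to ignore the banner line.
    return banner, pd.read_csv(path, skiprows=1)


df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
path = os.path.join(tempfile.mkdtemp(), "temp.csv")
write_with_banner(df, path, "This dataframe is created on 20161220")
banner, restored = read_with_banner(path)
print(banner)
print(restored)
```

Reading the banner separately and then passing the path with `skiprows=1` avoids sharing one file handle between manual reads and pandas.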

Just remove the to_csv line in your code, then run it in a terminal window as below:
python code.py >> temp.csv
The output of your print calls is then appended to temp.csv (>> appends to the file; use > to overwrite it). The output file is:
('This dataframe is created on ', '20161220')
A B
0 1 3
1 2 4
Not sure if it works in every OS though.
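The redirected file shown above is space-aligned text, not CSV. If shell redirection is still the preferred route, a possible variation (a sketch, assuming Python 3) is to print the frame as CSV text instead of its default representation. Calling `to_csv` without a target returns the CSV as a string; printing it line by line is my own choice to keep line endings under the control of `print`.

```python
import datetime as dt

import pandas as pd

today1 = dt.datetime.today().strftime("%Y%m%d")
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Everything goes to standard output, so `python code.py > temp.csv`
# captures the sentence followed by the CSV rows.
print("This dataframe is created on {}".format(today1))
for line in df.to_csv(index=False).splitlines():
    print(line)
```

This line-by-line printing assumes no field contains an embedded newline, which holds for the small frame in the question.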

Related

the number of rows in a csv file

I have a csv file that has only one column. I want to extract the number of rows.
When I run the code below:
import pandas as pd
df = pd.read_csv('data.csv')
print(df)
I get the following output:
[65422771 rows x 1 columns]
But when I run the code below:
file = open("data.csv")
numline = len(file.readlines())
print (numline)
I get the following output:
130845543
What is the correct number of rows in my csv file? What is the difference between the two outputs?

Is it possible that you have an empty line after each entry? The readlines count is almost exactly double the number of rows in the pandas DataFrame (2 * 65422771 + 1 = 130845543; the extra line would be the header).
pandas skips empty lines by default, while readlines counts them.
To check the number of empty lines, try:
import sys
import csv
csv.field_size_limit(sys.maxsize)
empty_lines = 0
with open('data.csv', newline='') as data:
    for line in csv.reader(data):
        # csv.reader yields an empty list for a blank line
        if not line:
            empty_lines += 1
print(empty_lines)
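To see the effect on a small scale, here is a sketch that compares a plain line count with the pandas row count. The sample text is an invented stand-in for data.csv (a header followed by values separated by blank lines), and `count_lines` is a hypothetical helper that iterates over the file instead of loading every line into memory.

```python
import io

import pandas as pd


def count_lines(handle):
    """Return (total, blank) line counts, reading one line at a time."""
    total = blank = 0
    for line in handle:
        total += 1
        if not line.strip():
            blank += 1
    return total, blank


# In-memory stand-in for data.csv: header, then values separated by blank lines.
sample = "value\n\n1\n\n2\n\n3\n"

total, blank = count_lines(io.StringIO(sample))
# skip_blank_lines=True is the read_csv default, so blank lines are not rows.
rows = len(pd.read_csv(io.StringIO(sample)))

print(total, blank, rows)
```

This prints `7 3 3`: seven physical lines, three of them blank, and three data rows, which reproduces the "double plus one" pattern from the question.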

Converting date and time format when importing csv file in Python

I haven't been able to find a solution in similar questions yet so I'll have to give it a go here.
I am importing a csv file looking like this in notepad:
",""ItemName"""
"Time,""Raw Values"""
"7/19/2019 10:31:29 PM,"" 0"","
"7/19/2019 10:32:01 PM,"" 1"","
When I save it as a new CSV file, I want the date/time and the corresponding value reformatted like this (required by the analysis software). The semicolon as separator, including the one at the end of each line, is important, and I don't really need a header.
2019-07-19 22:31:29;0;
2019-07-19 22:32:01;1;
This is what it looks like in Python:
Item1 = pd.read_csv(r'.\Datafiles\ItemName.csv')
Item1
#Output:
# ,"ItemName"
# 0 Time,"Raw Values"
# 1 7/19/2019 10:31:29 PM," 0",
# 2 7/19/2019 10:32:01 PM," 1",
Item1.dtypes
# ,"ItemName" object
# dtype: object
I have tried using datetime without any luck but there might be something fishy with the datatypes that I am not aware of.

In principle, you want to read the file into a DataFrame, convert the datetime column, and export the DataFrame to CSV again. I think you will need to get rid of the quote characters to get the import right. You can do so by reading the file content into a string, removing every '"', and feeding that string to pandas.read_csv. For example:
import os
from io import StringIO
import pandas as pd
# this is just to give an example:
s='''",""ItemName"""
"Time,""Raw Values"""
"7/19/2019 10:31:29 PM,"" 0"","
"7/19/2019 10:32:01 PM,"" 1"","'''
f = StringIO(s)
# in your script, make f a file pointer instead, e.g.
# with open('path_to_input.csv', 'r') as f:
# now get rid of the "
csvcontent = ''
for row in f:
    csvcontent += row.replace('"', '')
# read to DataFrame
df = pd.read_csv(StringIO(csvcontent), sep=',', skiprows=1, index_col=False)
df['Time'] = pd.to_datetime(df['Time'])
# save cleaned output as ;-separated csv
dst = 'path_where_to_save.csv'
# note: the keyword is lineterminator in pandas >= 1.5; older versions call it line_terminator
df.to_csv(dst, index=False, sep=';', lineterminator=';'+os.linesep)
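For comparison, the same conversion can be sketched with the standard library only, spelling out the input and output formats instead of letting pandas infer them. The helper name `convert_line` and the two format constants are my own; the sketch assumes every data line has the shape shown in the question and that the locale parses `AM`/`PM` for `%p`.

```python
from datetime import datetime

IN_FORMAT = "%m/%d/%Y %I:%M:%S %p"  # e.g. 7/19/2019 10:31:29 PM
OUT_FORMAT = "%Y-%m-%d %H:%M:%S"    # e.g. 2019-07-19 22:31:29


def convert_line(raw):
    """Turn one quoted input line into 'timestamp;value;'."""
    # Drop the quote characters, then keep the first two comma-separated fields.
    stamp, value = raw.replace('"', "").split(",")[:2]
    parsed = datetime.strptime(stamp.strip(), IN_FORMAT)
    return "{};{};".format(parsed.strftime(OUT_FORMAT), value.strip())


print(convert_line('"7/19/2019 10:31:29 PM,"" 0"","'))
```

For the first data line of the question this prints `2019-07-19 22:31:29;0;`. The two header lines would have to be skipped before calling the helper.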

Adding new column with the header containing date at the beginning of CSV file

I searched Stack Overflow for this but didn't find exactly what I wanted. I would like to open a CSV file in Python and add a new first column with the header "Date", filled with today's date down to the end of the file. How can I do it? I tried to do it with pandas, but I only know how to append a column at the end.
I also tried to do it with the csv package:
import csv
x=open(outfile_name1)
y=csv.reader(x)
z=[]
for row in y:
    z.append(['0'] + row)
Instead of ['0'] I wanted to put today's date. Can I then convert this list to csv with pandas or something? Thanks in advance for help!

Try this:
import pandas as pd
import datetime
df = pd.read_csv("my.csv")
df.insert(0, 'Date', datetime.datetime.today().strftime('%Y-%m-%d'))
df.to_csv("my_withDate.csv", index=False)
PS: Read the docs

Is this what you are looking for?
import pandas as pd
import datetime
df = pd.read_csv("file.csv")
df['Date'] = datetime.datetime.today().strftime('%Y-%m-%d')
df.to_csv("new_file.csv", index=False)

As far as I understand, the ultimate goal is to write the data to a CSV file. One option is to open the first file for reading and a second one for writing, write the header row to the new file with the column name 'Date,' prepended, and then iterate over the data rows, prepending the date to each (requires Python 3.6 or later because it uses f-strings):
import datetime
with open('columns.csv', 'r') as src, open('out.csv', 'w') as dst:
    today = datetime.date.today()
    # header row: prepend the new column name
    print('Date,' + next(src), end='', file=dst)
    for row in src:
        # data rows: prepend today's date (no space after the comma)
        print(f'{today},{row}', end='', file=dst)
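The `csv` attempt from the question can be completed along the same lines. The following is a sketch, not a definitive implementation: `prepend_date_column` is a hypothetical helper, the in-memory buffers stand in for real files, and the optional `today` argument exists only to make the output reproducible. Going through `csv.reader` and `csv.writer` keeps quoted fields that contain commas intact.

```python
import csv
import datetime
import io


def prepend_date_column(src, dst, today=None):
    """Copy CSV rows from src to dst, adding a leading 'Date' column."""
    today = today or datetime.date.today().isoformat()
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to write
    writer.writerow(["Date"] + header)
    for row in reader:
        writer.writerow([today] + row)


# In-memory stand-ins for the input and output files.
src = io.StringIO('name,note\nalice,"hello, world"\n')
dst = io.StringIO()
prepend_date_column(src, dst, today="2020-01-31")
print(dst.getvalue(), end="")
```

With real files, both would be opened with `newline=""` as the `csv` module documentation recommends.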

Python Pandas performing operation on each row of CSV file

I have a CSV file with 1 million lines. I want to call a lookup function on each row's 1st column and append its result as a new column in the same CSV (if possible).
What I want is something like this:
for each row in dataframe:
    string = row[1]
    result = lookupFunction(string)
    row.append(result)
I know I could do it using Python's csv library by opening my CSV, reading each row, doing my operation, and writing the results to a new CSV.
This is my code using Python's csv library:
with open(rawfile, 'r') as f:
    with open(newFile, 'a') as csvfile:
        csvwritter = csv.writer(csvfile, delimiter=' ')
        for line in f:
            #do operation
However I really want to do it with Pandas because it would be something new to me.
This is what my data looks like
77,#oshkosh # tannersville pa,,PA,US
82,#osithesakcom ca,,CA,US
88,#osp open records or,,OR,US
89,#ospbco tel ord in,,IN,US
98,#ospwmnwithn return in,,IN,US
99,#ospwmnwithn tel ord in,,IN,US
100,#osram sylvania inc ma,,MA,US
106,#osteria giotto montclair nj,,NJ,US
Any help and guidance will be appreciated. Thanks.

Here is a simple example of adding two columns from your CSV file into a new column:
import pandas as pd
df = pd.read_csv("yourpath/yourfile.csv")
df['newcol'] = df['col1'] + df['col2']

create df and csv
import pandas as pd
df = pd.DataFrame(dict(A=[1, 2], B=[3, 4]))
df.to_csv('test_add_column.csv')
read csv into dfromcsv
dfromcsv = pd.read_csv('test_add_column.csv', index_col=0)
create new column
dfromcsv['C'] = dfromcsv['A'] * dfromcsv['B']
dfromcsv
write csv
dfromcsv.to_csv('test_add_column.csv')
read it again
dfromcsv2 = pd.read_csv('test_add_column.csv', index_col=0)
dfromcsv2
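Neither answer shows the per-row lookup itself, so here is a minimal sketch of one way to do it with `Series.map`. It rests on several assumptions: the file has no header row, the text to look up is in column index 1 (as `row[1]` in the question suggests), and `lookup_function` is an invented stand-in for the real lookup. An in-memory buffer replaces the output file.

```python
import io

import pandas as pd


def lookup_function(text):
    """Hypothetical stand-in for the real lookup: last word, upper-cased."""
    return text.split()[-1].upper()


# Two rows in the same shape as the question's data (no header row).
raw = "77,#oshkosh # tannersville pa,,PA,US\n82,#osithesakcom ca,,CA,US\n"

df = pd.read_csv(io.StringIO(raw), header=None)
# map() calls the function once per value in column 1.
df["result"] = df[1].map(lookup_function)

out = io.StringIO()  # stands in for the output CSV file
df.to_csv(out, index=False, header=False)
print(df["result"].tolist())
```

For a file that does not fit in memory comfortably, `read_csv` also accepts a `chunksize` argument so the same `map` call can be applied chunk by chunk.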

Automatic reformatting when using csv writer python module

How do I prevent Python from automatically writing objects into a CSV file in a different format than the original? For example, I have a list object such as the following:
row = ['APR16', '100.00000']
I want to write this row as is, however when I use writerow function of csv writer, it writes into the csv file as 16-Apr and just 10. I want to keep the original formatting.
EDIT:
Here is the code:
import pandas as pd
dates = ['APR16', 'MAY16', 'JUN16']
numbers = [100.00000, 200.00000, 300.00000]
for i in range(3):
    row = []
    row.append(dates[i])
    row.append(numbers[i])
    prow = pd.DataFrame(row)
    prow.to_csv('test.csv', index=False, header=False)
And result:

Using pandas:
import pandas as pd
dates = ['APR16', 'MAY16', 'JUN16']
numbers = [100.00000, 200.00000, 300.00000]
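data = list(zip(dates, numbers))  # materialize the pairs as rows
fd = pd.DataFrame(data)
fd.to_csv('test.csv', index=False, header=False) # csv-file
fd.to_excel("test.xlsx", header=False, index=False) # or Excel file (.xlsx needs openpyxl; recent pandas can no longer write .xls)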
Result in my terminal:
➜ ~ cat test.csv
APR16
100.00000
Result in LibreOffice:
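
As a cross-check that does not depend on any spreadsheet application, the sketch below writes the rows with `csv.writer` and prints the raw text. `csv.writer` writes strings as given, so if `APR16` later shows up as `16-Apr`, that conversion most likely happens when the spreadsheet program opens the file rather than when Python writes it. Formatting the float with `"{:.5f}"` is my own addition to keep the trailing zeros, and the in-memory buffer stands in for test.csv.

```python
import csv
import io

dates = ["APR16", "MAY16", "JUN16"]
numbers = [100.00000, 200.00000, 300.00000]

buffer = io.StringIO()  # stands in for test.csv
writer = csv.writer(buffer, lineterminator="\n")
for date, number in zip(dates, numbers):
    # Format the float explicitly so the trailing zeros survive in the text.
    writer.writerow([date, "{:.5f}".format(number)])

print(buffer.getvalue(), end="")
```

The printed text starts with `APR16,100.00000`, which is what a plain text editor would show for the file.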
