Greetings, dear community.
I need to write a Python pandas.DataFrame to a CSV file.
I am trying to use something like this:
dfPRR.to_csv(prrDumpName,index=False,quotechar="'",quoting=csv.QUOTE_ALL)
It works fine for some samples, but for other samples with long strings I encounter an issue where one record breaks into two or three different lines.
What I want in my output file:
'RcdLn','GrpPIR','w_id','fwf_id','part_typ','l_id','head_num','site_num','filename'
'2','0','01','demo_fwf_id','demo_part_typ','demo_l_id','1','0','longdemofilename'
'1100','1','01','demo_fwf_id','demo_part_typ','demo_l_id','1','0','longdemofilename'
'2198','2','01','demo_fwf_id','demo_part_typ','demo_l_id','1','0','longdemofilename'
'3296','3','01','demo_fwf_id','demo_part_typ','demo_l_id','1','0','longdemofilename'
Instead, here is what I get: each record breaks into two separate lines:
'RcdLn','GrpPIR','w_id','fwf_id','part_typ','l_id','head_num','site_num','filename'
'2','0','01','demo_fwf_id
','demo_part_typ','demo_l_id','1','0','longdemofilename'
'1100','1','01','demo_fwf_id
','demo_part_typ','demo_l_id','1','0','longdemofilename'
'2198','2','01','demo_fwf_id
','demo_part_typ','demo_l_id','1','0','longdemofilename'
'3296','3','01','demo_fwf_id
','demo_part_typ','demo_l_id','1','0','longdemofilename'
Is there an option to tell to_csv to use a specific record delimiter?
I do not see such an option in the documentation of to_csv.
My goal is to create a CSV file that a loader program will then load.
As it stands, the loader program cannot load the file when this happens, because it is not able to tell whether a record has finished or not.
In other sample files, where the strings are not as long, records do not break into two or three lines. That is the desired behavior.
How can I enforce this?
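For what it's worth, to_csv does accept a line terminator argument, but that only changes what is written between records; it cannot stop newline characters stored inside the field values from reaching the output. Since the break in your output occurs immediately after a field value, the likely cause is embedded newlines in the strings themselves. A minimal cleanup sketch, assuming the breaks come from '\r' or '\n' characters inside the values:
import csv

# Strip embedded newlines from every string column before writing,
# so each record stays on a single physical line.
for col in dfPRR.select_dtypes(include="object").columns:
    dfPRR[col] = dfPRR[col].str.replace(r"[\r\n]+", " ", regex=True)

dfPRR.to_csv(prrDumpName, index=False, quotechar="'", quoting=csv.QUOTE_ALL)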
Related
I am working on a data transfer program to move data from an Oracle database to another
application that I can't see or change. I have to create several text files, described below, and drop them off on an SFTP site.
I am converting from a 20+ year old SQR report. (yes, SQR) :(
I have to create text files with a format such as an_alpa_code:2343,34533,4442,333335,..... There can be thousands of numbers separated by commas.
The file may have only one line, but the file might be 48k in size.
There is no choice on the file format; it is required this way.
I tried using Oracle UTL_FILE, but that cannot deal with a line over 32k in length, so I am looking for an alternative. Python is a language my company has approved for use, so I am hoping it can do this.
I too once [was forced] to use SQR many years ago, and so you have my sympathy.
Python can definitely do this. If you set the end argument of the print function to an empty string, you can ensure that no newline is output:
print("Hello world",end='')
Perl could also be a good candidate language; its print does not append a newline by default:
print("Hello world");
Both Python and Perl have Oracle client libraries.
This gave me one long line:
file_obj = open("writing.txt", "w")
for i in range(0, 10000):
    file_obj.write("mystuff" + str(i) + ",")
    # file_obj.write('\n')
file_obj.close()
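Building on that, here is a minimal sketch of producing the exact one-line format from the question; the codes list and the transfer.txt filename are placeholder assumptions standing in for the values fetched from Oracle:
codes = [2343, 34533, 4442, 333335]  # stand-in for the Oracle query results

# write() never appends a newline, so the whole file stays on one line
with open("transfer.txt", "w") as f:
    f.write("an_alpa_code:")
    f.write(",".join(str(c) for c in codes))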
We receive a .tar.gz file from a client every day, and I am rewriting our import process using SSIS. One of the first steps in my process is to unzip the .tar.gz file, which I achieve via a Python script.
After unzipping we are left with a number of CSV files, which I then import into SQL Server. As an aside, I am loading them using the CozyRoc DataFlow Task Plus.
Most of my CSV files load without issue, but I have five files which fail. By reading the log I can see that the process is reading the header and first line as though there were no header row delimiter (i.e. it is trying to import the column header as ColumnHeader1ColumnValue1).
I took one of these CSVs, copied the top 5 rows into Excel, used Text-To-Columns to delimit the data then saved that as a new CSV file.
This version imported successfully.
That makes me think that somehow the original CSV isn't using {CR}{LF} as the row delimiter, but I don't know how to check. Any suggestions?
I ended up using the suggestion commented by #vahdet because I already had Notepad++ installed. I can't find the same option in EmEditor, but it may exist.
For those who are curious, the files are using {LF}, which is consistent with the other files. My investigation continues...
Seeing that you have EmEditor, you can use it to find the EOL character in two ways:
Use View > Character Code Value... at the end of a line to display a dialog box showing information about the character at the current position.
Go to View > Marks and turn on Newline Characters and CR and LF with Different Marks to show the EOL characters while editing. LF is displayed as a down arrow, while CRLF is a right angle.
Some other things you could check for: the file encoding, the wrong type of data for a field, and an inconsistent number of columns.
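If you would rather check programmatically, here is a minimal sketch that counts the raw line endings in a file; data.csv is a placeholder for one of the failing files:
# Scan the raw bytes and count CRLF vs. bare LF endings.
with open("data.csv", "rb") as f:
    data = f.read()

crlf = data.count(b"\r\n")
bare_lf = data.count(b"\n") - crlf  # LFs not preceded by CR
print("CRLF:", crlf, "bare LF:", bare_lf)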
Hello, I am using pandas to process an Excel file. My code looks as follows:
df = xl.parse("Sheet1")
important_Parameters = (
    "bash generate.sh --companyCode" + " " + df[u'Company Code '].astype(str)
    + " " + "--isaADD" + df[u'TP Interchange Address '].astype(str)
    + " " + "--gsADD" + " " + df[u'TP Functional Group Address '].astype(str)
)
print(important_Parameters)
Everything works well, and when I print it the output looks fine. I wish to write a txt file with the contents of my object, called:
important_Parameters
I tried with:
important_Parameters.to_pickle("important.txt")
but the result does not look like the printed output. I believe that is due to the method I chose for writing to disk.
I also tried with:
important_Parameters.to_string("importantParameters2.txt")
However, this gave me a friendlier representation of the data, but the result includes the row numbers, and the rows are also truncated. They look as follows:
bash generate.sh --companyCode 889009d --isaADD...
Note the trailing "..." where each line is cut off.
I would appreciate any suggestion for producing a simple .txt file called importante.txt containing my result, the contents of important_Parameters. Thanks for the support.
To include more details, my output (the result of the print) looks like:
0 bash generate.sh --companyCode 323232 --isaADD...
1 bash generate.sh --companyCode 323232 --isaADD...
2 bash generate.sh --companyCode 323232 --isaADD...
Pandas DataFrames have more than a few methods for saving to files. Have you tried important_Parameters.to_csv("important.csv")? I'm not certain what you want the output to look like.
If you want it tab-separated, you can try:
important_Parameters.to_csv("important.csv", sep='\t')
If the file absolutely must end in .txt, just change it to important_Parameters.to_csv("important.txt"). CSVs are just specifically formatted text files, so this shouldn't be a problem.
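If you want plain lines without the row numbers and truncation shown in the print output, you can also bypass pandas entirely; a minimal sketch, assuming important_Parameters is a pandas Series of strings:
# Write each generated command on its own line, with no index column.
with open("importante.txt", "w") as f:
    for line in important_Parameters:
        f.write(line + "\n")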
This question already has answers here: How do I read from stdin?
I am doing a small project in which I have to read a file from STDIN.
I am not sure what that means. When I asked the professor, he told me
there is no need to open and close the file like we generally do:
sFile = open("file.txt", 'r')
I don't have to pass the file as an argument.
I am kind of confused about what he wants.
stdin takes input from different sources, depending on how the script is invoked.
Given a very simple bit of code for illustration (let's call it: script.py):
import sys

# read everything arriving on standard input
text = sys.stdin.read()
print(text)
You can either pipe the input file into your script like so:
$ more file.txt | python script.py
In this case, the output of the first part of the pipeline, which is the content of the file, is assigned to our variable (in this case text, which eventually gets printed out).
Or run it without any piped input, like so:
$ python script.py
It lets you type the input, similar to the input function, and assigns the typed input to the defined variable (note that this input "window" stays open until you explicitly close it, which is usually done with Ctrl+D).
Import sys; then sys.stdin will be the 'file' you want, which you can use like any other file (e.g. sys.stdin.read()), and you don't have to close it. stdin means "standard input".
It might be helpful to read through the post linked above, which seems similar to yours.
'stdin' in this case would come from the argument on the command line after the Python script, so python script.py input_file. This input_file would be the file containing whatever data you are working on.
So, you're probably wondering how to read stdin. There are a couple of options. The one suggested in the thread linked above goes as follows:
import fileinput

# reads from the files named on the command line,
# or from stdin when no filenames are given
for line in fileinput.input():
    print(line, end='')  # read/process the data here
There are other ways, of course, but I think I'll leave you to it. Check the linked post for more information.
Depending on the context of your assignment, stdin may be automatically sent into the script, or you may have to do it manually as detailed above.
I have a log file that has lines that look like this:
"1","2546857-23541","f_last","user","4:19 P.M.","11/02/2009","START","27","27","3","c2546857-23541",""
Each line in the log has 12 double-quoted sections, and the 7th section comes from whatever the user typed into the chat window:
"22","2546857-23541","f_last","john","4:38 P.M.","11/02/2009","
What's up","245","47","1","c2546857-23541",""
This string also shows the issue I'm having: there are areas in the chat log where the text the user typed ends up on a new line in the log file instead of on the same line, as in the first example.
So basically I want the lines in the second example to look like the first example.
I've tried using Find/Replace in N++; I am able to find each "orphaned" line, but I was unable to join it to the line above.
Then I thought of writing a Python script to automate it for me, but I'm kind of stuck on how to actually code it.
Python errors out at this line when running unutbu's code:
"1760","4746880-00129","bwhiteside","tom","11:47 A.M.","12/10/2009","I do not see ^"refresh your knowledge
^" on the screen","422","0","0","c4746871-00128",""
The csv module is smart enough to recognize when a quoted item is not finished (and thus must contain a newline character).
import csv

with open('data.log', 'r') as fin:
    with open('data2.log', 'w') as fout:
        reader = csv.reader(fin, delimiter=',', quotechar='"', escapechar='^')
        # escapechar is also needed on the writer: with doublequote=False,
        # the writer must have a way to escape quotes inside a field
        writer = csv.writer(fout, delimiter=',', quotechar='"', escapechar='^',
                            doublequote=False, quoting=csv.QUOTE_ALL)
        for row in reader:
            # rejoin chat text that was split across physical lines
            row[6] = row[6].replace('\n', ' ')
            writer.writerow(row)
If your data is valid CSV, you can use Python's csv.reader class. It should work just fine with your sample data. It may not work correctly depending on what an embedded double quote looks like from the source system. See: http://docs.python.org/library/csv.html#module-contents.
Unless I'm misunderstanding the problem, you simply need to read in the file and remove any newline characters that occur between double quote characters.
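A minimal sketch of that idea, scanning character by character and tracking whether we are inside a quoted field; note that it ignores the ^-escaped quotes from the sample data, which would need extra handling:
# Replace newlines that occur while inside double quotes.
with open('data.log') as fin, open('data2.log', 'w') as fout:
    inside = False
    for ch in fin.read():
        if ch == '"':
            inside = not inside
        if ch == '\n' and inside:
            ch = ' '
        fout.write(ch)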