RobotFramework complaining about consecutive whitespace issue - Python

I am using Robot Framework 3.1.2 and it automatically parses a .txt file, reporting the error "Collapsing consecutive whitespace during parsing is deprecated" for every line, because the file contains consecutive whitespace throughout.
I cannot clean up the whitespace in the file since it is used by other team members.
How can I configure Robot Framework to ignore the whitespace, or to ignore that file entirely?
I couldn't find any relevant help on Google.
P.S.: If I rename the file extension to something other than .txt (like .bin), Robot Framework stops complaining, as it seems to ignore the file.
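Building on that last observation, one possible workaround is to restrict which extensions Robot Framework parses at all. This is a minimal sketch, assuming the real test suites live in .robot files and a hypothetical tests directory; the extension option of the programmatic API mirrors the --extension command-line flag:

from robot import run

# Parse only .robot files, so the shared .txt data file is skipped
# instead of being interpreted as test data.
run('tests', extension='robot')

The same effect from the command line would be: robot --extension robot tests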

Related

Newlines in Python text files

Why do Python text files always end with a newline during compilation in Python 3?
I have tried several times to remove the newlines, but it seems a trailing newline is even a requirement of pycodestyle.
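That matches pycodestyle's behaviour: W292 flags a missing newline at the end of a file and W391 flags extra blank lines there. A minimal sketch of a hypothetical helper that normalizes a file to end with exactly one newline:

def normalize_trailing_newline(path):
    # Rewrite the file so it ends with exactly one newline, which
    # satisfies pycodestyle's W292 (no newline at end of file) and
    # W391 (blank line at end of file) checks.
    with open(path) as f:
        text = f.read()
    with open(path, 'w') as f:
        f.write(text.rstrip('\n') + '\n')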

invalid argument C:\Users\Dar laptops\face_recognition\face_mask_detector_4\Keras-FaceMask-Detection-System\data\with_mask\1.jpg

I have a text file which has some paths in it. The problem is that my Windows user name contains a space, and when I run my Python script to read the file, it only reads up to the space and ignores everything after it.
C:\Users\Dar
laptops\face_recognition\face_mask_detector_4\Keras-FaceMask-Detection-System\data\with_mask\1.jpg
How can I read the whole path, including the space?
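This symptom suggests the line is being split on whitespace (for example with str.split()). A minimal sketch, assuming a hypothetical paths.txt with one path per line, that strips only the trailing newline so embedded spaces survive:

with open('paths.txt') as f:
    for line in f:
        # rstrip('\n') removes only the line break; unlike line.split()
        # or line.strip(), it leaves spaces inside the path intact.
        path = line.rstrip('\n')
        print(path)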

Trailing newline in file when reading JSON in Pyspark results in empty line

When loading JSON data into Spark (v2.4.2) on AWS EMR from S3 using Pyspark, I've observed that a trailing line separator (\n) in the file results in an empty row being created at the end of the Dataframe. Thus, a file with 10,000 lines in it will produce a Dataframe with 10,001 rows, the last of which is empty/all nulls.
The file looks like this:
{line of JSON}\n
{line of JSON}\n
... <-- 9996 similar lines
{line of JSON}\n
{line of JSON}\n
There are no newlines in the JSON itself, i.e. I don't need to read the JSON as multi-line. I am reading it with the following Pyspark command:
df = spark.read.json('s3://{bucket}/{filename}.json.gz')
df.count()
-> 10001
My understanding of this quote from http://jsonlines.org/:
The last character in the file may be a line separator, and it will be treated the same as if there was no line separator present.
... is that the trailing line separator should not be treated as an additional, empty line. Am I missing something? I haven't seen anyone else on SO or elsewhere having this problem, yet it seems very obvious in practice. I don't see an option in the Spark Python API docs for suppressing empty lines, nor have I been able to work around it by trying different line separators and specifying them in the load command.
I have verified that removing the final line separator results in a Dataframe that has the correct number of lines.
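One blunt workaround (my own suggestion, not from the original post) is to drop rows in which every column is null after loading, using DataFrame.na.drop:

df = spark.read.json('s3://{bucket}/{filename}.json.gz')
# how='all' drops only rows where every column is null, such as the
# row produced by the trailing line separator.
df = df.na.drop(how='all')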
I found the problem. The file I was uploading had an unexpected encoding (UCS-2 LE BOM instead of UTF-8). I should have thought to check it, but didn't. After I converted the file to the expected encoding (UTF-8), the load worked as intended.
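For reference, such a conversion can be scripted in Python; a minimal sketch with hypothetical file names, where the utf-16 codec consumes the BOM that editors report as "UCS-2 LE BOM":

import codecs

# Decode UTF-16 (BOM detected automatically) and re-encode as UTF-8.
with codecs.open('input.json', 'r', encoding='utf-16') as src, \
        codecs.open('output.json', 'w', encoding='utf-8') as dst:
    for line in src:
        dst.write(line)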

Which newline character is in my CSV?

We receive a .tar.gz file from a client every day and I am rewriting our import process using SSIS. One of the first steps in my process is to unzip the .tar.gz file which I achieve via a Python script.
After unzipping we are left with a number of CSV files which I then import into SQL Server. As an aside, I am loading using the CozyRoc DataFlow Task Plus.
Most of my CSV files load without issue, but I have five files which fail. By reading the log I can see that the process is reading the header and first line as though there is no Header Row Delimiter (i.e. it is trying to import the column header as ColumnHeader1ColumnValue1).
I took one of these CSVs, copied the top 5 rows into Excel, used Text-To-Columns to delimit the data, then saved that as a new CSV file.
This version imported successfully.
That makes me think that somehow the original CSV isn't using {CR}{LF} as the row delimiter, but I don't know how to check. Any suggestions?
I ended up using the suggestion commented by @vahdet because I already had Notepad++ installed. I can't find the same option in EmEditor, but it may exist.
For those who are curious, the files are using {LF}, which is consistent with the other files. My investigation continues...
Seeing that you have EmEditor, you can use EmEditor to find the EOL character in two ways:
Use View > Character Code Value... at the end of a line to display a dialog box showing information about the character at the current position.
Go to View > Marks and turn on Newline Characters and CR and LF with Different Marks to show the EOL characters while editing. LF is displayed with a down arrow, while CRLF is shown as a right angle.
Some other things you could try checking for are: file encoding, wrong type of data for a field and an inconsistent number of columns.
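Since the import pipeline already uses a Python script for unzipping, the check can also be scripted there. A minimal sketch (the function name and sample size are my own) that counts line-ending styles in a binary sample of a file:

def detect_line_endings(path, sample_bytes=1024 * 1024):
    # Read in binary mode so Python does not translate line endings.
    with open(path, 'rb') as f:
        data = f.read(sample_bytes)
    crlf = data.count(b'\r\n')
    # Subtract CRLF matches so bare LF and CR are not double-counted.
    lf = data.count(b'\n') - crlf
    cr = data.count(b'\r') - crlf
    return {'CRLF': crlf, 'LF': lf, 'CR': cr}

print(detect_line_endings('example.csv'))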

How to delete  character from a file with no extension - Python 2.7

So I have an assignment (Python 2.7) to process a file with no extension. It has a date in its first line:
2016.03.22.
But when I read it from the file and print it out, I get: 2016.03.22.
This does not happen when the extension is .txt, but I can't use that. I've tried this regex:
import re

def checkDate(line):
    return re.search(r'(\d{4}\.\d{2}\.\d{2}\.)', line)
For some reason it does not find it, returns None. I've tried http://www.pyregex.com/ and it sees the pattern without the odd character.
Is there any way to cut it off without rewriting the whole file?
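The odd character is the UTF-8 byte order mark (bytes EF BB BF) shown through the wrong codec. A minimal sketch for Python 2.7, assuming a hypothetical file name, that strips the BOM at read time with the utf-8-sig codec, so the file on disk never needs rewriting:

import codecs
import re

# The utf-8-sig codec silently consumes a leading UTF-8 BOM if present.
with codecs.open('datefile', 'r', encoding='utf-8-sig') as f:
    first_line = f.readline().strip()

print(re.search(r'(\d{4}\.\d{2}\.\d{2}\.)', first_line))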
