Convert PowerShell data clean-up code to Python code - python

I would like to convert the PowerShell script below to Python code.
Here is the objective of this code.
The code takes in a comma-delimited filename and a file extension.
The code below exports the file as a pipe-delimited file.
Then it removes the commas that exist within the data.
Finally, it also removes the double quotes used to qualify the data.
This results in the final file being pipe-delimited with no double quotes or commas in the data. I used this order because if you try to replace the double quotes and commas before establishing the pipes, the columns and data break apart.
Param([string]$RootString, [string]$Ext)
$OrgFile = $RootString
$NewFile = $RootString.replace($Ext,"out")
Import-Csv $OrgFile -Encoding UTF8 | Export-Csv tempfile.csv -Delimiter "|" -NoTypeInformation
(Get-Content tempfile.csv).Replace(",","").Replace('"',"") | Out-File $NewFile -Encoding UTF8
Copy-Item -Force $NewFile $OrgFile
Remove-Item -Path $NewFile -Force
I got dinged a point for not posting my own attempt, but I did not see the point in posting bad code that does not work. Here is my version of the non-working code:
import re
from datetime import datetime

# dfcsv is a pandas DataFrame whose 'csvpath' column holds the file paths
i = 0
for index in range(len(dfcsv)):
    filename = dfcsv['csvpath'].iloc[index]
    print(filename)
    print(i)
    with open(filename, 'r+') as f:
        text = f.read()
        print(datetime.now())
        text = re.sub('","', '|', text)
        print(datetime.now())
        f.seek(0)
        f.write(text)
        f.truncate()
    i = i + 1
The issue with this code is the find-and-replace method. It created an extra column at the beginning because of the leading double quote, and sometimes an extra column at the end when there was a double quote at the end. This caused data from different rows to merge together. I didn't post this part originally because I didn't think it was necessary for my objective; posting the working PowerShell seemed more relevant to give a better idea of the objective.

It seems no one here wanted to answer the question, so I found a solution elsewhere. Here is the link for anyone needing to convert a comma-delimited file to pipe-delimited:
https://www.experts-exchange.com/questions/29188372/How-can-I-Convert-Powershell-data-clean-up-code-to-Python-Code.html
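For anyone who lands here later, here is a rough, untested Python sketch of the same clean-up. The clean_csv name and the in-place overwrite are just illustrative; this mirrors the PowerShell above, not the linked answer:

import csv
import os
import shutil

def clean_csv(root_string, ext):
    # Mirror of the PowerShell pipeline: parse the comma-delimited file,
    # strip commas inside the fields, join with pipes, no quote qualifiers.
    new_file = root_string.replace(ext, "out")
    with open(root_string, newline="", encoding="utf-8") as src, \
         open(new_file, "w", encoding="utf-8") as dst:
        for row in csv.reader(src):
            # csv.reader has already removed the double-quote qualifiers
            cleaned = [field.replace(",", "") for field in row]
            dst.write("|".join(cleaned) + "\n")
    shutil.copyfile(new_file, root_string)  # Copy-Item -Force $NewFile $OrgFile
    os.remove(new_file)                     # Remove-Item -Path $NewFile -Force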

Related

Add the contents of one file at the end of each line of another file

I want to take the contents of a file containing passwords and add them to another file which contains a list of emails, and I want the format to be:
email#example.com:examplepass
I've tried to do it with a Python script but no luck so far. I tried it with sed and it ran, but it didn't do what I wanted. I created a new environment variable and set it up as follows:
export FILES=$(cat pass.txt)
Then I used the command
sed 's/$/$($FILES)/' test.txt > test5000.txt
to add each password of each line into the email list, but instead it printed out
email#example.com:$FILES
Pretty sure you just want paste -d: test.txt pass.txt
It works with sed, but wrapping the expression in single quotes prevents evaluation of the environment variable, so enclose it in double quotes instead. Also, there is no need to repeat the $ in the replacement.
sed "s/\$/($FILES)/" test.txt > test5000.txt
Warning: There's a good chance that some password characters are interpreted in the replacement (for instance what about backwards references like \1 or even quotes!!...) so this solution isn't really viable when you don't know what's in $FILES
Python version (I recommend this solution). It is untested, but I'm pretty confident about it, and it is foolproof even if the password file contains special characters:
with open('pass.txt') as fr:
    passwords = fr.read()
with open("test.txt") as fr, open('test50000.txt', "w") as fw:
    fw.writelines('{}{}\n'.format(t.rstrip(), passwords) for t in fr)
First, read the password file; then open the input and output files and write lines made of the user name (without its line terminator) + password + line terminator, in a comprehension for better speed.
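If the goal is instead to pair line N of the e-mail file with line N of the password file (which is what paste -d: above does), a small untested sketch using zip, with the same file names as in the question, would be:

with open('test.txt') as emails, open('pass.txt') as passwords, \
        open('test5000.txt', 'w') as out:
    for email, password in zip(emails, passwords):
        # strip the line endings, then join with ':' as requested
        out.write('{}:{}\n'.format(email.rstrip('\n'), password.rstrip('\n')))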

Writing out text with double double quotes - Python on Linux

I'm trying to capture the text output of a query to an SSD (pulling a log page, similar to pulling SMART data). I'm then trying to write this text data out to a log file I update periodically.
My problem happens when the log data for some drives has double double-quotes as a placeholder for a blank field. Here is a snippet of the input:
VER 0x10200
VID 0x15b7
BoardRev 0x0
BootLoadRev ""
When this gets written out (appended) to my own log file, the text gets replaced with several null characters, and when I then try to open it, all the text editors tell me it's corrupted.
The "" characters are replaced by something like this on my Linux system:
BootLoadRev "\00\00\00\00"
Some fields are even longer with the \00 characters. If the "" is not there, things write out OK.
The code is similar to this:
f=open(fileName, 'w')
test_bench.send_command('get_log_page')
identify_data = test_bench.get_data_in()
f.write(identify_data)
f.close()
Is there a way to send this text to a file w/o these nulls causing problems?
Assuming that this is Python 2 (and that your content is thus what Python 3 would call a bytestring), and that your intended data format is raw ASCII, the trivial solution is simply to remove the NULs from your content before you write to disk:
f.write(identify_data.replace('\0', ''))
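Put in context, that could look like the following (a sketch only: the with block and append mode are my additions; test_bench and fileName are the asker's own objects):

test_bench.send_command('get_log_page')
identify_data = test_bench.get_data_in()
with open(fileName, 'a') as f:                 # append to the running log
    f.write(identify_data.replace('\0', ''))   # drop the NUL padding first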

Malformed CSV quoting

I pass data from SAS to Python using CSV format. I have a problem with the quoting format SAS uses. Strings like "480 КЖИ" ОАО aren't quoted, but Python's csv module thinks they are.
dat = ['18CA4,"480 КЖИ" ОАО', '1142F,"""Росдорлизинг"" Российская дор,лизинг,компания"" ОАО"']
for i in csv.reader(dat):
    print(i)
>>['18CA4', '480 КЖИ ОАО']
>>['1142F', '"Росдорлизинг" Российская дор,лизинг,компания" ОАО']
The 2nd string is fine, but I need the 480 КЖИ ОАО string to come out as "480 КЖИ" ОАО. I can't find such an option in the csv module. Maybe it's possible to force proc export to quote all " chars?
UPD: Here's a similar problem Python CSV : field containing quotation mark at the beginning
UPD2: #Quentin asked for details. Here they are: I have SAS 8.2 connected to a 9.1 server. I download custom format data from the server side with proc format cntlout=..; proc download... So I get a dictionary-like dataset <key>, <value>. Then I pass this dataset in CSV format using proc export via the DDE interface to Python. But proc export quotes only strings which include the delimiter (comma), as I understand it. So I think I need SAS to quote quotation marks too, or Python to unquote only those strings which include commas.
UPDATE: switching from proc export via DDE to reading the dataset directly with a modified SAS7BDAT Python module hugely improved performance, and I got rid of the problem above.
SAS will add extra quotes if the value has quotes in it already.
data _null_;
file log dsd ;
string='"480 КЖИ" ОАО';
put string;
run;
Generates this result:
"""480 КЖИ"" ОАО"
Perhaps the quotes are being removed at some other point in the flow from SAS to Python? Try saving the CSV file to a disk and having Python read from the disk file.
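As a quick sanity check that Python's csv module reads the doubled quotes back correctly, a small stand-alone demo (not the asker's actual data flow):

import csv

line = '"""480 КЖИ"" ОАО"'          # the DSD-style output shown above
row = next(csv.reader([line]))
print(row)                           # ['"480 КЖИ" ОАО'] - the embedded quotes survive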

Parse log file in python

I have a log file that has lines that look like this:
"1","2546857-23541","f_last","user","4:19 P.M.","11/02/2009","START","27","27","3","c2546857-23541",""
Each line in the log has 12 double-quoted sections, and the 7th double-quoted section in the string comes from where the user typed something into the chat window:
"22","2546857-23541","f_last","john","4:38 P.M.","11/02/2009","
What's up","245","47","1","c2546857-23541",""
This string also shows the issue I'm having: there are areas in the chat log where the text the user typed is on a new line in the log file instead of on the same line, as in the first example.
So basically I want the lines in the second example to look like the first example.
I've tried using Find/Replace in N++ and I am able to find each "orphaned" line, but I was unable to make it join the line above it.
Then I thought of making a Python script to automate it for me, but I'm kind of stuck on how to actually code it.
Python errors out at this line when running unutbu's code:
"1760","4746880-00129","bwhiteside","tom","11:47 A.M.","12/10/2009","I do not see ^"refresh your knowledge
^" on the screen","422","0","0","c4746871-00128",""
The csv module is smart enough to recognize when a quoted item is not finished (and thus must contain a newline character).
import csv
with open('data.log', 'r') as fin:
    with open('data2.log', 'w') as fout:
        reader = csv.reader(fin, delimiter=',', quotechar='"', escapechar='^')
        writer = csv.writer(fout, delimiter=',',
                            doublequote=False, quoting=csv.QUOTE_ALL)
        for row in reader:
            row[6] = row[6].replace('\n', ' ')
            writer.writerow(row)
If your data is valid CSV you can use Python's csv.reader class. It should work just fine with your sample data. It may not work correctly depending on what an embedded double-quote looks like from the source system. See: http://docs.python.org/library/csv.html#module-contents.
Unless I'm misunderstanding the problem, you simply need to read in the file and remove any newline characters that occur between double-quote characters.
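A minimal sketch of that idea (a hypothetical helper; it only tracks quote parity, so it ignores the ^-escaped quotes seen in the error above, and the csv-based answer is more robust):

def rejoin_quoted_newlines(src_path, dst_path):
    # Replace newlines that fall inside a double-quoted field with a space,
    # so every record ends up on a single physical line.
    inside_quotes = False
    with open(src_path) as fin, open(dst_path, 'w') as fout:
        for ch in fin.read():
            if ch == '"':
                inside_quotes = not inside_quotes
            if ch == '\n' and inside_quotes:
                fout.write(' ')
            else:
                fout.write(ch)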

How can I detect DOS line breaks in a file?

I have a bunch of files. Some have Unix line endings, many have DOS. I'd like to test each file to see if it is DOS-formatted before I switch the line endings.
How would I do this? Is there a flag I can test for? Something similar?
Python can automatically detect what newline convention is used in a file, thanks to the "universal newline mode" (U), and you can access Python's guess through the newlines attribute of file objects:
f = open('myfile.txt', 'U')
f.readline() # Reads a line
# The following now contains the newline ending of the first line:
# It can be "\r\n" (Windows), "\n" (Unix), "\r" (Mac OS pre-OS X).
# If no newline is found, it contains None.
print repr(f.newlines)
This gives the newline ending of the first line (Unix, DOS, etc.), if any.
As John M. pointed out, if by any chance you have a pathological file that uses more than one newline coding, f.newlines is a tuple with all the newline codings found so far, after reading many lines.
Reference: http://docs.python.org/2/library/functions.html#open
If you just want to convert a file, you can simply do:
with open('myfile.txt', 'U') as infile:
    text = infile.read()  # Automatic ("universal read") conversion of newlines to "\n"
with open('myfile.txt', 'w') as outfile:
    outfile.write(text)  # Writes newlines for the platform running the program
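In Python 3 the 'U' mode is deprecated, but universal-newline translation is already the default for text-mode reads, so the equivalent (untested sketch) is simply:

with open('myfile.txt') as infile:
    text = infile.read()          # '\r\n' and '\r' are translated to '\n'
with open('myfile.txt', 'w') as outfile:
    outfile.write(text)           # writes the platform's default line ending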
You could search the string for \r\n. That's DOS style line ending.
(Python 2 only:) If you just want to read text files, either DOS or Unix-formatted, this works:
print open('myfile.txt', 'U').read()
That is, Python's "universal" file reader will automatically use all the different end of line markers, translating them to "\n".
http://docs.python.org/library/functions.html#open
(Thanks handle!)
As a complete Python newbie & just for fun, I tried to find some minimalistic way of checking this for one file. This seems to work:
if "\r\n" in open("/path/file.txt","rb").read():
print "DOS line endings found"
Edit: simplified as per John Machin's comment (no need to use regular expressions).
DOS line breaks are \r\n; Unix uses only \n. So just search for \r\n.
Using grep & bash:
grep -c -m 1 $'\r$' file
echo $'\r\n\r\n' | grep -c $'\r$' # test
echo $'\r\n\r\n' | grep -c -m 1 $'\r$'
You can use the following function (which should work in Python 2 and Python 3) to get the newline representation used in an existing text file. All three possible kinds are recognized. The function reads the file only up to the first newline to decide. This is faster and less memory consuming when you have larger text files, but it does not detect mixed newline endings.
In Python 3, you can then pass the output of this function to the newline parameter of the open function when writing the file. This way you can alter the content of a text file without changing its newline representation.
def get_newline(filename):
    with open(filename, "rb") as f:
        while True:
            c = f.read(1)
            if not c or c == b'\n':
                break
            if c == b'\r':
                if f.read(1) == b'\n':
                    return '\r\n'
                return '\r'
    return '\n'
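For example, re-writing a file without disturbing whatever line-ending style it already uses might look like this in Python 3 (untested sketch built on the helper above):

nl = get_newline('myfile.txt')            # e.g. '\r\n' for a DOS file
with open('myfile.txt') as f:             # universal newlines: '\n' in memory
    text = f.read()
with open('myfile.txt', 'w', newline=nl) as f:
    f.write(text)                         # '\n' is written back as the detected ending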
