Mac to Windows Python

I have recently moved a set of nearly identical programs from my Mac to my school's Windows machine, and while the paths appear to be the same (or at least the tail end of them), the programs will not run properly.
import glob
import pylab
from pylab import *

def main():
    outfnam = "igdata.csv"
    fpout = open(outfnam, "w")
    nrows = 0
    nprocessed = 0
    nbadread = 0
    filenames = [s.split("/")[1] for s in glob.glob("c/Cmos6_*.IG")]
    dirnames = "c an0 an1 an2 an3 an4".split()
    for suffix in filenames:
        nrows += 1
        row = []
        row.append(suffix)
        for dirnam in dirnames:
            fnam = dirnam+"/"+suffix
            lines = [l.strip() for l in open(fnam).readlines()]
            nprocessed += 1
            if len(lines)<5:
                nbadread += 1
                print "warning: file %s contains only %d lines"%(fnam, len(lines))
                tdate = "N/A"
                irrad = dirnam
                Ig_zeroVds_largeVgs = 0.0
            else:
                data = loadtxt(fnam, skiprows=5)
                tdate = lines[0].split(":")[1].strip()
                irrad = lines[3].split(":")[1].strip()
                # pull out last column (column "-1") from second-to-last row
                Ig_zeroVds_largeVgs = data[-2,-1]
            row.append(irrad)
            row.append("%.3e"%(Ig_zeroVds_largeVgs))
        fpout.write(", ".join(row) + "\n")
    print "wrote %d rows to %s"%(nrows, outfnam)
    print "processed %d input files, of which %d had missing data"%( \
        nprocessed, nbadread)
This program worked fine on my Mac, but on Windows these lines:
print "wrote %d rows to %s"%(nrows, outfnam)
print "processed %d input files, of which %d had missing data"%( \
    nprocessed, nbadread)
keep printing:
wrote 0 rows to igdata.csv
processed 0 input files, of which 0 had missing data
On my Mac I got 144 rows to the file...
Does anyone have any suggestions?

If the script doesn't raise any errors, this piece of code is most likely returning an empty list.
glob.glob("c/Cmos6_*.IG")
Seeing as glob.glob works perfectly fine with forward slashes on Windows, the problem is most likely that it's not finding the files, which means the pattern you provided has an error somewhere in it, or the script isn't running from the directory you expect. Make sure there isn't any error in "c/Cmos6_*.IG".
If the problem isn't caused by this, then unfortunately, I have no idea why it is happening.
Also, when I tried it, filenames returned by glob.glob have backslashes in them on Windows, so you should probably split by "\\" instead.
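If it is the pattern (or the working directory), a two-line check will show it; here is a minimal debugging sketch using the pattern from the question:
import glob, os

print(os.getcwd())                 # confirm the working directory is what you expect
print(glob.glob("c/Cmos6_*.IG"))   # an empty list means the pattern matched nothing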

Off the top of my head, it looks like a problem with using / in the path. Windows uses \ instead.
os.path contains a number of functions to ease working with paths across platforms.

Your s.split("/") should definitely be s.split(os.pathsep). I got bitten by this, onceā€¦ :)
In fact, glob returns paths with \ on Windows and / on Mac OS X, so you need to do your splitting with the appropriate path separator (os.pathsep).
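In practice you can sidestep the separator question entirely with os.path.basename; a sketch using the question's pattern:
import glob
import os

# os.path.basename strips the directory part on any platform,
# so there is no need to split on "/" or "\\" by hand
filenames = [os.path.basename(p) for p in glob.glob("c/Cmos6_*.IG")]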


ValueError: scandir: path too long for Windows

I am writing a simple Python script to tell me the file size for a set of documents whose paths I am importing from a CSV. I verified that none of the entries are over 100 characters, so this error, "ValueError: scandir: path too long for Windows", does not make sense to me.
Here is my code:
# determine size of a given folder in MBytes
import os, subprocess, json, csv, platform

# Function to check if a Drive Letter exists
def hasdrive(letter):
    return "Windows" in platform.system() and os.system("vol %s: 2>nul>nul" % (letter)) == 0

# Define Drive to check for
letter = 'S'

# Check if Drive doesn't exist, if not then map drive
if not hasdrive(letter):
    subprocess.call(r'net use s: /del /Y', shell=True)
    subprocess.call(r'net use s: \\path_to_files', shell=True)

list1 = []
# Import spreadsheet to calculate size
# (raw string, so '\T' and '\f' are not treated as escapes)
with open(r'c:\Temp\files_to_delete_subset.csv') as f:
    reader = csv.reader(f, delimiter=':', quoting=csv.QUOTE_NONE)
    for row in reader:
        list1.extend(row)

# Define variables
folder = "S:"
folder_size = 0

# Exporting outcome
for list1 in list1:
    folder = folder + str(list1)
    for root, dirs, files in os.walk(folder):
        for name in files:
            folder_size += os.path.getsize(os.path.join(root, name))
    print(folder)
    # print(os.path.join(root, name) + " " + chr(os.path.getsize(os.path.join(root, name))))
print(folder_size)
From my understanding the max path size in Windows is 260 characters, so 1 drive letter + a 100-character path should NOT exceed the Windows max.
Here is an example of a path: '/Document/8669/CORRESP/1722165.doc'
The folder string you're trying to walk is growing forever. Simplifying the code to the problem area:
folder = "S:"
# Exporting outcome
for list1 in list1:
folder = folder + str(list1)
You never set folder otherwise, so it starts out as S:<firstpath>, then on the next loop it's S:<firstpath><secondpath>, then S:<firstpath><secondpath><thirdpath>, etc. Simple fix: Separate drive from folder:
drive = "S:"
# Exporting outcome
for path in list1:
folder = drive + path
Now folder is constructed from scratch on each loop, throwing away the previous path, rather than concatenating them.
I also gave the iteration value a useful name (and removed the str call, because the values should all be str already).
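Put together, the corrected size loop might look like this (a sketch; list1 is assumed to hold the paths read from the CSV, e.g. '/Document/8669/CORRESP/1722165.doc'):
import os

drive = "S:"
folder_size = 0
for path in list1:
    folder = drive + path          # rebuilt from scratch each iteration
    for root, dirs, files in os.walk(folder):
        for name in files:
            folder_size += os.path.getsize(os.path.join(root, name))
    print(folder)
print(folder_size)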

Renaming Multiple Files at Once in a Directory

I am attempting to take a file name such as 'OP 40 856101.txt' from a directory, remove the .txt, set each single word to a specific variable, then reorder the filename based on a required order such as '856101 OP 040'. Below is my code:
import os

dir = 'C:/Users/brian/Documents/Moeller'
orig = os.listdir(dir)  # original names of the files in the folder
for orig_name in orig:  # This loop splits each file name into a list of strings containing each word
    f = os.path.splitext(orig_name)[0]
    sep = f.split()  # Separation is done by a space
    for t in sep:  # Loops across each list of strings into an if statement that saves each part to a specific variable
        #print(t)
        if t.isalpha() and len(t) == 3:
            wc = t
        elif len(t) > 3 and len(t) < 6:
            wc = t
        elif t == 'OP':
            op = t
        elif len(t) >= 4:
            pnum = t
        else:
            opnum = t
    if len(opnum) == 2:
        opnum = '0' + opnum
    new_nam = '%s %s %s %s' % (pnum, op, opnum, wc)  # This variable contains the text for the new name
    print("The orig filename is %r, the new filename is %r" % (orig_name, new_nam))
    os.rename(orig_name, new_nam)
However I am getting an error with my last for loop where I attempt to rename each file in the directory.
FileNotFoundError: [WinError 2] The system cannot find the file specified: '150 856101 OP CLEAN.txt' -> '856101 OP 150 CLEAN'
The code runs perfectly until the os.rename() call; if I print out the variable new_nam, it prints the correct naming order for all of the files in the directory. It seems like it cannot find the original file to rename to the string in new_nam. I assume it is a directory issue; however, I am newer to Python and can't seem to figure out where to edit my code. Any tips or advice would be greatly appreciated!
Try this (just changed the last line):
os.rename(os.path.join(dir,orig_name), os.path.join(dir,new_nam))
You need to tell Python the actual path of the file to rename - otherwise, it looks only in the current working directory.
Incidentally, it's better not to use dir as a variable name, because that's the name of a built-in.
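A minimal sketch of the corrected loop, where build_new_name is a hypothetical helper standing in for the word-splitting logic in the question:
import os

folder = 'C:/Users/brian/Documents/Moeller'   # renamed from dir to avoid shadowing the builtin
for orig_name in os.listdir(folder):
    new_nam = build_new_name(orig_name)       # hypothetical: the parsing logic above
    os.rename(os.path.join(folder, orig_name), os.path.join(folder, new_nam))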

TypeError: can only concatenate tuple (not "str") to tuple line 6

import os
import time

source = [r'C:\\Documents', '/home/swaroop/byte', '/home/swaroop/bin']
target_dir = r'C:\\Documents','/mnt/e/backup/'
today = target_dir + time.strftime("%Y%m%d")
now = time.strftime('%H%M%S')
os.path.exists(today)
os.mkdir(today)
print 'Successful created directory', today
target = today + os.sep + now + '.zip'
zip_command = "zip -qr '%s' %s" % (target, ' '.join(source))
if os.system(zip_command) == 0:
print 'Successful backup to', target
else:
print 'Backup FAILED'
The error points at this line:
today = target_dir + time.strftime("%Y%m%d")
TypeError: can only concatenate tuple (not "str") to tuple
Please help.
Check the comma as #MosesKoledoye said, the book has:
# 1. The files and directories to be backed up are
#    specified in a list.
# Example on Windows:
# source = ['"C:\\My Documents"', 'C:\\Code']
# Example on Mac OS X and Linux:
source = ['/Users/swa/notes']
# Notice we had to use double quotes inside the string
# for names with spaces in it.

# 2. The backup must be stored in a
#    main backup directory
# Example on Windows:
# target_dir = 'E:\\Backup'
# Example on Mac OS X and Linux:
target_dir = '/Users/swa/backup'
# Remember to change this to the folder you will be using
your code has:
source = [r'C:\\Documents','/home/swaroop/byte', '/home/swaroop/bin']
target_dir =r'C:\\Documents','/mnt/e/backup/'
In the target_dir assignment the comma makes the right-side of the assignment a tuple. To join two strings together use a +, not a comma:
target_dir = r'C:\\Documents' + '/mnt/e/backup/'
Better yet, use a single string. However, /mnt is a Linux directory name, not a Windows one. I suspect you actually need:
target_dir = '/mnt/e/backup/'
You have also made the Windows path a raw string, which means the two back-slashes will be retained. Either do this:
'C:\\Documents'
or this:
r'C:\Documents'
(unless of course you actually do want \\)
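A quick demonstration of what Python actually stores in each case:
print('C:\\Documents')    # C:\Documents   (the escape collapses to one backslash)
print(r'C:\Documents')    # C:\Documents   (raw string, same result)
print(r'C:\\Documents')   # C:\\Documents  (raw string keeps both backslashes)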
Edit: I just noticed you also have an indentation problem:
if os.system(zip_command) == 0:
print 'Successful backup to', target
else:
should be:
if os.system(zip_command) == 0:
    print 'Successful backup to', target
else:
Finally, when you say "I copy all the code" and it fails, look to see where yours differs from what it says in the book.

Read between 2 offsets of a file

I was wondering how the read() function can be used to read between two offsets that are given in hex?
I tried using this to convert the offset values to int, but I get a syntax error on the read() line. Any ideas?
OFFSETS = ('3AF7','3ECF')
OFFSETE = ('3B04','3EDE')
for r, d, f in os.walk("."):
    for hahahoho, value in enumerate(OFFSETS and OFFSETE):
        try:
            with open(os.path.join(r,f), 'rb' ) as fileread:
                texttoprint = fileread.seek(int(OFFSETS[hahahoho], 16) -1)
                yeeha = texttoprint.read[int(OFFSETS[hahahoho], 16) -1 : int(OFFSETE[damn],16)]
                print (yeeha)
                hahahoho + 1
This is not the entire code though, just the part I need help with =(
EDIT:
Alright, I think I should listen to the advice you people gave; this is the entire code:
nost = 1
OFFSETS = ('3AF7','3ECF')
OFFSETE = ('3B04','3EDE')
endscript = 'No'
nooffile = 1
import os, glob, sys, tempfile

try:
    directory = input('Enter your directory:').replace('\\','\\\\')
    os.chdir(directory)
except FileNotFoundError:
    print ('Directory not found!')
    endscript = 'YES!'

if endscript == 'YES!':
    sys.exit('Error. Be careful of what you do to your computer!')
else:
    if os.path.isfile('Done.txt') == True:
        print ('The folder has already been modified!')
    else:
        print ('Searching texts...\r\n')
        print ('Printing...')
        for r, d, f in os.walk("."):
            for HODF in f:
                if HODF.endswith(".hod") or "." not in HODF:
                    for damn, value in enumerate(OFFSETS and OFFSETE):
                        try:
                            with open(os.path.join(r,HODF), 'rb' ) as fileread:
                                fileread.seek(int(OFFSETS[damn],16) -1)
                                yeeha = fileread.read(int(OFFSETE[damn], 16) - (int(OFFSETS[damn],16) -1))
                                if b'?\x03\x00\x00\x00\x01\x00\x00\x00Leg2.' not in yeeha and b'?\x03\x00\x00\x00\x01\x00\x00\x00Leg2_r.' not in yeeha:
                                    print (yeeha)
                                damn + 1
                        except FileNotFoundError:
                            print('Invalid file path!')
                            os._exit(1)
                        except IndexError:
                            print ('File successfully modified!')
                    nooffile = nooffile + 1
                    nost = 1
        print ('\r\n'+str(nooffile)+' files read.',)
        print ('\tANI_list.txt, End.dat, Group.txt, Head.txt, Tail.dat files ignored.')
        print ('\r\nFiles successfully read! Hope you found what you are looking for!')
May I know what's wrong with it? Because it works just fine for me.
There are other problems with your code, but it sounds like you want to solve that yourself. When it comes to reading a particular byte range from a file, you can do that like this:
start = 1000
end = 1020 # Just examples
fileread.seek(start)
stuff = fileread.read(end - start)
That is, you start by seeking to the start position, and then you read as many bytes as you need (that is 20, in this example).
EDIT:
The only real "problem" with your code is that you're using enumerate in a strange fashion that makes it completely unnecessary. The expression OFFSETS and OFFSETE will simply evaluate to OFFSETE, making the OFFSETS and part superfluous. Then, you're only actually using the first value from enumerate (the index), which makes enumerate itself superfluous: you could just have used range(len(OFFSETE)) instead.
More proper, however, would be to loop directly over the values instead of going via an index, like this:
for start, end in zip(OFFSETS, OFFSETE):
    # snip
    fileread.seek(int(start, 16) - 1)
    yeeha = fileread.read(int(end, 16) - (int(start, 16) - 1))
The other things are more like slight uglinesses that could be eliminated to make your code much nicer, but aren't strictly speaking wrong. Among them are that you don't need to represent your offsets as strings, but could use hexadecimal literals instead; that you open the file multiple times for no reason; that the hahahoho + 1 expression is completely superfluous; and that you could just bake the - 1 extra offset directly into your actual offsets instead of applying it later.
I would write it closer to this instead:
OFFSETS = [0x3AF7 - 1, 0x3ECF - 1]
OFFSETE = [0x3B04, 0x3EDE]
for r, d, f in os.walk("."):
    for fn in f:
        with open(os.path.join(r, fn), "rb") as fp:
            for start, end in zip(OFFSETS, OFFSETE):
                fp.seek(start)
                yeeha = fp.read(end - start)
                # Do whatever it is you need with yeeha

Use grep on file in Python

I have searched the grep answers on here and cannot find an answer. They all seem to search for a single string in a file, not for a list of strings from a file. I already have a search function that works, but grep does it WAY faster. I have a list of strings in a file sn.txt (one string per line, no delimiters). I want to search another file (Merge_EXP.exp) for lines that have a match and write them out to a new file. The file I am searching has half a million lines, so searching for a few thousand strings in there takes hours without grep.
When I run it from command prompt in windows, it does it in minutes:
grep --file=sn.txt Merge_EXP.exp > Merge_EXP_Out.exp
How can I call this same process from Python? I don't really want alternatives in Python because I already have one that works but takes a while. Unless you think you can significantly improve the performance of that:
def match_SN(serialnumb, Exp_Merge, output_exp):
    fout = open(output_exp,'a')
    f = open(Exp_Merge,'r')
    # skip first line
    f.readline()
    for record in f:
        record = record.strip().rstrip('\n')
        if serialnumb in record:
            fout.write(record + '\n')
    f.close()
    fout.close()

def main(Output_CSV, Exp_Merge, updated_exp):
    # create a blank output
    fout = open(updated_exp,'w')
    # copy header records
    f = open(Exp_Merge,'r')
    header1 = f.readline()
    fout.write(header1)
    header2 = f.readline()
    fout.write(header2)
    fout.close()
    f.close()
    f_csv = open(Output_CSV,'r')
    f_csv.readline()
    for rec in f_csv:
        rec_list = rec.split(",")
        sn = rec_list[2]
        sn = sn.strip().rstrip('\n')
        match_SN(sn, Exp_Merge, updated_exp)
Here is an optimized version in pure Python:
def main(Output_CSV, Exp_Merge, updated_exp):
    output_list = []
    # copy header records
    records = open(Exp_Merge,'r').readlines()
    output_list = records[0:2]
    serials = open(Output_CSV,'r').readlines()
    serials = [x.split(",")[2].strip().rstrip('\n') for x in serials]
    for s in serials:
        items = [x for x in records if s in x]
        output_list.extend(items)
    open(updated_exp, "w").write("".join(output_list))

main("sn.txt", "merge_exp.exp", "outx.txt")
Input
sn.txt:
x,y,0011
x,y,0002
merge_exp.exp:
Header1
Header2
0011abc
0011bcd
5000n
5600m
6530j
0034k
2000lg
0002gg
Output
Header1
Header2
0011abc
0011bcd
0002gg
Try this out and see how much time it takes...
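Note that this version still rescans the whole record list once per serial. If the serial list grows large, a single pass over the big file may be faster; a sketch along the same lines (not benchmarked):
def main(Output_CSV, Exp_Merge, updated_exp):
    records = open(Exp_Merge, 'r').readlines()
    serials = [x.split(",")[2].strip() for x in open(Output_CSV, 'r')]
    # one pass over the big file instead of one pass per serial;
    # records[0:2] are the two header lines
    matches = [r for r in records[2:] if any(s in r for s in serials)]
    open(updated_exp, "w").write("".join(records[0:2] + matches))
Unlike the loop above, this keeps the file's original order and cannot write a record twice when it matches more than one serial.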
When I used the full path to the grep executable it worked (I pass in grep_loc, Serial_List, Export):
import os
Export_Dir = os.path.dirname(Export)
Export_Name = os.path.basename(Export)
Output = Export_Dir + "\Output_" + Export_Name
print "\nOutput: " + Output + "\n"
cmd = grep_loc + " --file=" + Serial_List + " " + Export + " > " + Output
print "grep usage: \n" + cmd + "\n"
os.system(cmd)
print "Output created\n"
I think you have not chosen the right title for your question: What you want to do is the equivalent of a database JOIN. You can use grep for that in this particular instance, because one of your files only has keys and no other information. However, I think it is likely (but of course I don't know your case) that in the future your sn.txt may also contain extra information.
So I would solve the generic case. There are multiple solutions:
import all data into a database, then do a LEFT JOIN (in sql) or equivalent
use a python large data tool
For the latter, you could try numpy or, since you are working with strings, preferably pandas. Pandas has an optimized merge routine, which is very fast in my experience (it uses cython under the hood).
Here is pandas PSEUDO code to solve your problem. It is close to real code, but I would need to know the names of the columns you want to match on. I assumed here that the one column in sn.txt is called key, and the matching column in merge_exp.exp is called sn. I also see you have two header lines in merge_exp.exp; read the read_csv docs (the header and skiprows options) for that.
# PSEUDO CODE (but close)
import pandas
left = pandas.read_csv('sn.txt')
right = pandas.read_csv('merge_exp.exp')
out = pandas.merge(left, right, left_on="key", right_on="sn", how='left')
out.to_csv("outx.txt")
