On every iteration of my for loop, I need to use the iteration number as the name of the output file. For example, the goal is to save:
my first iteration in the first file.
my second iteration in the second file.
....
I am using numpy for this, but my code doesn't do what I need. My current code forces me to type the file name after each iteration. That is manageable with 6 or 7 iterations, but I have 100, so it doesn't make sense:
for line, a in enumerate(Plaintxt_file):
    #instruction
    #result
    fileName = raw_input()
    if(fileName!='end'):
        fileName = r'C:\\Users\\My_resul\\Win_My_Scripts\\'+fileName
        np.save(fileName+'.npy',Result)
ser.close()
I would be very grateful if you could help me.
Create your file name from the line number:
for line, a in enumerate(Plaintxt_file):
    fileName = r'C:\Users\My_resul\Win_My_Scripts\file_{}.npy'.format(line)
    np.save(fileName, Result)
This starts with the file name file_0.npy.
If you like to start with 1, specify the starting index in enumerate:
for line, a in enumerate(Plaintxt_file, 1):
Of course, this assumes you don't need line to start at 0 anywhere else.
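As a small sketch of the same idea, the file name can be built by a helper. The helper name numbered_path, the "results" directory and the zero-padding are my own assumptions, not part of the question; padding just keeps the files sorted in iteration order in a directory listing.

```python
import os

def numbered_path(directory, index, width=3):
    """Build '<directory>/file_<index>.npy', zero-padding the index to `width` digits."""
    # hypothetical helper: the nested {} fills in the padding width
    return os.path.join(directory, "file_{:0{}d}.npy".format(index, width))

# the index would normally come from enumerate(Plaintxt_file, 1)
for line in range(1, 4):
    print(numbered_path("results", line))
```

The returned string can be passed straight to np.save in place of the hand-typed name.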
I'm not 100% sure what your issue is, but as far as I can tell, you just need some string formatting for the filename.
So, you want, say 100 files, each one created after an iteration. The easiest way to do this would probably be to use something like the following:
for line, a in enumerate(Plaintxt_file):
    #do work
    # np.save appends ".npy" to any name that lacks it, so use that extension directly
    filename = "C:\\SaveDir\\OutputFile{0}.npy".format(line)
    np.save(filename, Result)
That won't be 100% accurate to your needs, but hopefully that will give you the idea.
If you're just after, say, 100 blank files with the naming scheme "0.npy", "1.npy", all the way up to "n-1.npy", a simple for loop would do the job (no need for numpy!):
n = 100
for i in range(n):
    open(str(i) + ".npy", 'a').close()
This loop runs for n iterations and creates empty files whose names correspond to the current iteration.
If you do not care about the sequence of the files and you do not want the files from multiple runs of the loop to overwrite each other, you can use random unique IDs.
from uuid import uuid4
# ...
for a in Plaintxt_file:
    fileName = 'C:\\Users\\My_resul\\Win_My_Scripts\\file_{}.npy'.format(uuid4())
    np.save(fileName, Result)
Sidenote:
Do not use raw strings and escaped backslashes together.
It's either r"C:\path" or "C:\\path" - unless you want double backslashes in the path. I do not know if Windows likes them.
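To make the difference concrete, here is a quick check of the three spellings (the path is just the one from the question, shortened):

```python
raw = r"C:\Users\My_resul"        # raw string: every backslash is literal
escaped = "C:\\Users\\My_resul"   # normal string: each \\ is one backslash
mixed = r"C:\\Users\\My_resul"    # raw string with doubled backslashes: two backslashes each

print(raw == escaped)                       # True
print(mixed.count("\\"), raw.count("\\"))   # 4 2
```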
I was not clear enough in my last question, and so I'll explain my question more this time.
I am creating 2 separate programs, where the first one will create a text file with 2 generated numbers, one on line 1 and the second on line 2.
Basically I saved it like this:
In this example I'm not generating numbers, just assigning them quickly.
a = 15
b = 16
saving = open('filename.txt', "w")
saving.write(str(a)+"\n")
saving.write(str(b)+"\n")
saving.close()
Then I opened it on the next one:
opening = open('filename.txt', "w")
a = opening.read()
opening.close()
print(a) #This will print the whole document, but I need each line to be different
Now I have the whole file loaded into 'a', but I need it split up, which I have no idea how to do. I don't believe creating a list will help, as I need each number (variables a and b from program 1) to be a different variable in program 2. The reason I need them as 2 separate variables is that I need to divide each by a different number. If I do need to use a list, please say so. I spent about an hour in total looking for an answer, but I couldn't find anything.
The reason I can't post the whole program is because I haven't got access to it from here, and no, this is not cheating as we are free to research and ask questions outside the classroom, if someone wonders about that after looking at my previous question.
If you need more info please put it in a comment and I'll respond ASAP.
opening = open('filename.txt') # no "w": the default mode is read-only, and "w" would empty the file
a = [b.strip() for b in opening.readlines()] # create a list of the lines, stripping the newline "\n" character
opening.close()
print(a[0]) # print first line
print(a[1]) # print second line
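If you want the two values as separate numeric variables rather than list items, something like the following sketch should work. The helper names write_two_numbers and read_two_numbers are made up for illustration, and it assumes the file holds exactly one integer per line.

```python
import os
import tempfile

def write_two_numbers(path, a, b):
    """Program 1: store each number on its own line."""
    with open(path, "w") as f:
        f.write("{}\n{}\n".format(a, b))

def read_two_numbers(path):
    """Program 2: read the first two lines back as integers."""
    with open(path) as f:
        first = int(f.readline())   # int() ignores the trailing newline
        second = int(f.readline())
    return first, second

# usage: a temporary directory keeps the demo from touching real files
path = os.path.join(tempfile.mkdtemp(), "filename.txt")
write_two_numbers(path, 15, 16)
a, b = read_two_numbers(path)
print(a / 3, b / 4)   # each variable can now be divided by a different number
```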
I have a small scraping script. I have a file with 2000 names, and I use these names to search for video IDs on YouTube. Because of the amount, it takes a long time to get all the IDs, so I can't do it in one go. What I want is to find where my last scrape ended and then start from that position. What is the best way to do this? I was thinking about adding each used name to a list and then checking whether a name is in the list before scraping it, but maybe there's a better way to do this? (I hope so.)
This is the part that takes a name from the file and scrapes the IDs. What I want is that when I quit scraping and start again later, it runs not from the beginning but from the point where it ended last time:
index = 0
for name in itertools.islice(f, index, None):
    parameters = {'key': api_key, 'q': name}
    request_url = requests.get('https://www.googleapis.com/youtube/v3/search?part=snippet&maxResults=1&type=video&fields=items%2Fid', params = parameters)
    videoid = json.loads(request_url.text)
    if 'error' in videoid:
        pass
    else:
        index += 1
        id_file.write(videoid['items'][0]['id']['videoId'] + '\n')
        print videoid['items'][0]['id']['videoId']
You could just remember the index number of the last scraped entry. Every time you finish scraping one entry, increment a counter, then assuming the entries in your text file don't change order, just pick up again at that number?
The simplest answer here is probably mitim's answer. Just keep a file that you rewrite with the last-processed index after each line. For example:
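import itertools
import os

savepath = os.path.expanduser('~/.myprogram.lines')
skiplines = 0
try:
    with open(savepath) as f:
        skiplines = int(f.read())
except (IOError, ValueError):
    # no usable progress file yet: start from the first line
    pass
with open('names.txt') as f:
    for linenumber, line in itertools.islice(enumerate(f), skiplines, None):
        do_stuff(line)
        # store how many lines are finished, so the next run skips exactly those
        with open(savepath, 'w') as out:
            out.write(str(linenumber + 1))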
However, there are other ways you could do this that might make more sense for your use case.
For example, you could rewrite the "names" file after each name is processed to remove the first line. Or, maybe better, preprocess the list into an anydbm (or even sqlite3) database, so you can more easily remove (or mark) names once they're done.
Or, if you might run against different files, and need to keep a progress for each one, you could store a separate .lines file for each one (probably in a ~/.myprogram directory, rather than flooding the top-level home directory), or use an anydbm mapping pathnames to lines done.
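For the database variant, here is a rough sketch using sqlite3. The table layout and the function names (load_names, pending_names, mark_done) are my own invention, and it assumes each name appears only once in the list.

```python
import sqlite3

def load_names(conn, names):
    """Create the table if needed and add any names that are not stored yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS names ("
        "name TEXT PRIMARY KEY, done INTEGER NOT NULL DEFAULT 0)"
    )
    # OR IGNORE keeps the 'done' flag of names loaded on an earlier run
    conn.executemany("INSERT OR IGNORE INTO names (name) VALUES (?)",
                     [(n,) for n in names])
    conn.commit()

def pending_names(conn):
    """Return the names that still need scraping, in insertion order."""
    rows = conn.execute("SELECT name FROM names WHERE done = 0 ORDER BY rowid")
    return [row[0] for row in rows]

def mark_done(conn, name):
    """Record that one name has been scraped successfully."""
    conn.execute("UPDATE names SET done = 1 WHERE name = ?", (name,))
    conn.commit()

# usage: an in-memory database for the demo; a real run would pass a file path
conn = sqlite3.connect(":memory:")
load_names(conn, ["first name", "second name", "third name"])
mark_done(conn, "first name")
print(pending_names(conn))
```

Unlike a line counter, this keeps working if names that failed should be retried later, because only successful names are marked.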
This is my input file:
TestCases-2
Input-5
Output-1,1,2,3,5
Input-7
Ouput-1,1,2,3,5,8,13
What I want is this:-
A variable test_no = 2 (No. of testcases)
A list testCaseInput = [5,7]
A list testCaseOutput = [[1,1,2,3,5],[1,1,2,3,5,8,13]]
I've tried doing it in this way:
testInput = testCase.readline(-10)
for i in range(0,int(testInput)):
    testCaseInput = testCase.readline(-6)
    testCaseOutput = testCase.readline(-7)
The next step would be to strip the numbers on the basis of (','), and then put them in a list.
Weirdly, the readline(-6) is not giving desired results.
Is there a better way to do this that I'm obviously missing?
I don't mind using serialization here but I want to make it very simple for someone to write a text file as the one I have shown and then take the data out of it. How to do that?
The argument to the readline method is the maximum number of bytes to read, not a line position, and a negative value simply means no limit. I don't think this is what you want to be doing.
Instead, it is simpler to pull everything into a list all at once with readlines():
with open('data.txt') as f:
    full_lines = f.readlines()
# parse full lines to get the text to right of "-"
lines = [line.partition('-')[2].rstrip() for line in full_lines]
numcases = int(lines[0])
for i in range(1, len(lines), 2):
    caseinput = lines[i]
    caseoutput = lines[i+1]
    ...
The idea here is to separate concerns (the source of the data, the parsing of '-', and the business logic of what to do with the cases). That is better than having a readline() and redundant parsing logic at every step.
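Carrying that through to the three values the question asks for might look like the sketch below. The function name parse_test_cases is hypothetical, and it assumes the file strictly alternates input and output lines after the first line.

```python
def parse_test_cases(lines):
    """Return (test_no, testCaseInput, testCaseOutput) from 'Label-value' lines."""
    # keep only the text to the right of the first "-", skipping blank lines
    values = [line.partition("-")[2].strip() for line in lines if line.strip()]
    test_no = int(values[0])
    inputs = [int(v) for v in values[1::2]]                            # every Input line
    outputs = [[int(x) for x in v.split(",")] for v in values[2::2]]   # every Output line
    return test_no, inputs, outputs

sample = [
    "TestCases-2\n",
    "Input-5\n",
    "Output-1,1,2,3,5\n",
    "Input-7\n",
    "Output-1,1,2,3,5,8,13\n",
]
test_no, testCaseInput, testCaseOutput = parse_test_cases(sample)
print(test_no, testCaseInput, testCaseOutput)
```

Because only the part after "-" is used, a misspelled label does not affect the result.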
I'm not sure if I follow exactly what you're trying to do, but I guess I'd try something like this:
testCaseIn = []
testCaseOut = []
for line in testInput:
    if line.startswith("Input"):
        testCaseIn.append(giveMeAList(line.split("-")[1]))
    elif line.startswith("Output"):
        testCaseOut.append(giveMeAList(line.split("-")[1]))
where giveMeAList() is a function that takes a comma-separated string of numbers and generates a list from it.
I didn't test this code, but I've written stuff that uses this kind of structure when I've wanted to make configuration files in the past.
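One possible giveMeAList(), as a guess at what is meant here (it assumes every item is an integer):

```python
def giveMeAList(text):
    """Turn '1,1,2,3,5' into [1, 1, 2, 3, 5]; empty items are skipped."""
    # int() tolerates surrounding whitespace, including a trailing newline
    return [int(item) for item in text.split(",") if item.strip()]

print(giveMeAList("1,1,2,3,5\n"))
```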
You can use regex for this and it makes it much easier. See question: python: multiline regular expression
For your case, try this:
import re
s = open("input.txt","r").read()
(inputs,outputs) = zip(*re.findall(r"Input-(?P<input>.*)\nOutput-(?P<output>.*)\n",s))
and then split(",") each output element as required
If you do it this way you get the benefit that you don't need the first line in your input file so you don't need to specify how many entries you have in advance.
You can also take away the unzip (that's the zip(*...) ) from the code above, and then you can deal with each input and output a pair at a time. My guess is that is in fact exactly what you are trying to do.
EDIT Wanted to give you the full example of what I meant just then. I'm assuming this is for a testing script so I would say use the power of the pattern matching iterator to help keep your code shorter and simpler:
for (input,output) in re.findall(r"Input-(?P<input>.*)\nOutput-(?P<output>.*)\n",s):
    expectedResults = output.split(",")
    testResults = runTest(input)
    # compare testResults and expectedResults ...
This line has an error:
Ouput-1,1,2,3,5,8,13 // it should be 'Output', not 'Ouput'
This should work:
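testCaseInput = []
testCaseOutput = []
with open('in.txt', 'r') as testCase:
    testInput = int(testCase.readline().replace("TestCases-",""))
    for i in range(testInput):
        # int() ignores the trailing newline left by readline()
        testCaseInput.append(int(testCase.readline().replace("Input-","")))
        testCaseOutput.append([int(x) for x in testCase.readline().replace("Output-","").split(",")])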
I need to write a program like this:
Write a program that reads a .picasa.ini file and copies pictures into new files whose names are the identification numbers of the persons in those pictures (e.g. 8ff985a43603dbf8.jpg). If there are several persons in a picture, it makes several copies. If a person is in several pictures, later copies overwrite earlier ones; even if person 8ff985a43603dbf8 appears in several pictures, only one file with this name will exist. You may assume that we have a simple .picasa.ini file.
I have an .ini file that contains:
[img_8538.jpg]
faces=rect64(4ac022d1820c8624),**d5a2d2f6f0d7ccbc**
backuphash=46512
[img_8551.jpg]
faces=rect64(acb64583d1eb84cb),**2623af3d8cb8e040**;rect64(58bf441388df9592),**d85d127e5c45cdc2**
backuphash=8108
...
Is this a good way to start this program?
for line in open(r'C:\Users\Admin\Desktop\podatki-picasa\.picasa.ini'):
    if line.startswith('faces'):
        line.split() # what must I do here to split the bolded words?
Is there a better way to do this? Remember the .jpg file must be created with a new name, so I think I should link the current .jpg file with the bolded one.
Consider using ConfigParser. Then you will have to split each value by hand, as you describe.
import ConfigParser
config = ConfigParser.ConfigParser()
config.read(r'C:\Users\Admin\Desktop\podatki-picasa\.picasa.ini')
imgs = []
for item in config.sections():
    imgs.append(config.get(item, 'faces'))
This is still work in progress. Just want to ask if it's correct.
edit:
I still don't know how to split the bolded words out of there. This split function really is a pain for me.
Suggestions:
Your lines don't start with 'faces', so your second line won't work the way you want it to. Depending on how the rest of the file looks, you might only need to check whether the line is empty or not at that point.
To get the information you need, first split at ',' and work from there
Attempt at a solution: The elements you need always seem to have a ',' before them, so you can start by splitting at the ',' sign and taking everything from the 1-index element onwards [1::]. Then, if what I am thinking is correct, you split those elements twice more: at the ";", taking the 0-index element, and at the " ", again taking the 0-index element.
for line in open('thingy.ini'):
    if line != "\n":
        personelements = line.split(",")[1::]
        for person in personelements:
            personstring = person.split(";")[0].split(" ")[0]
            print personstring
works for me to get:
d5a2d2f6f0d7ccbc
2623af3d8cb8e040
d85d127e5c45cdc2
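Putting the pieces together, a rough end-to-end sketch could look like this. It is written for Python 3 (configparser instead of ConfigParser), the function names parse_face_ids and copy_by_person are made up, and it assumes the ** around the IDs is only the question's bold markup and not part of the real file.

```python
import configparser
import os
import re
import shutil

# one face entry looks like: rect64(<hex rectangle>),<hex person id>
FACE_RE = re.compile(r"rect64\([0-9a-fA-F]+\),([0-9a-fA-F]+)")

def parse_face_ids(faces_value):
    """Return the person IDs found in one 'faces=' value."""
    return FACE_RE.findall(faces_value)

def copy_by_person(folder):
    """Copy each picture in `folder` to <person id>.jpg; later pictures overwrite earlier copies."""
    config = configparser.ConfigParser(interpolation=None)
    config.read(os.path.join(folder, ".picasa.ini"))
    for picture in config.sections():          # section name = picture file name
        source = os.path.join(folder, picture)
        if not config.has_option(picture, "faces") or not os.path.isfile(source):
            continue
        for person_id in parse_face_ids(config.get(picture, "faces")):
            shutil.copyfile(source, os.path.join(folder, person_id + ".jpg"))

print(parse_face_ids("rect64(acb64583d1eb84cb),2623af3d8cb8e040;rect64(58bf441388df9592),d85d127e5c45cdc2"))
```

Calling copy_by_person on the folder that holds .picasa.ini does the copying; sections are processed in file order, which is what makes later pictures win.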
I'm trying to write code that searches hash values for a specific string (input by the user) and returns the hash if the search query is present in that line.
Doing this to kind of just learn python a bit more, but it could be a real world application used by an HR department to search a .csv resume database for specific words in each resume.
I'd like this program to look through a .csv file that has three entries per line (id#;applicant name;resume text)
I set it up so that it creates a hash, then created a string for the resume text hash entry, and am trying to use the .find() function to return the entire hash for each instance.
What I'd like is this: if the word "gpa" is used as a search query and it is found in s['resumetext'] for three applicants (rows in the .csv file), the program prints the id, name, and resume for every row that has it (all three applicants).
As it is right now, my program prints the first row in the .csv file(print resume['id'], resume['name'], resume['resumetext']) no matter what the searchquery is, whether it's in the resumetext or not.
Lastly, are there better ways of doing this, such as searching Word documents, PDFs and .txt files in a folder for specific words using Python? (I've just started reading about the re module and am wondering if this may be the route, rather than putting everything in a .csv file.)
def find_details(id2find):
    resumes_f=open("resume_data.csv")
    for each_line in resumes_f:
        s={}
        (s['id'], s['name'], s['resumetext']) = each_line.split(";")
        resumetext = str(s['resumetext'])
        if resumetext.find(id2find):
            return(s)
        else:
            print "No data matches your search query. Please try again"
searchquery = raw_input("please enter your search term")
resume = find_details(searchquery)
if resume:
    print resume['id'], resume['name'], resume['resumetext']
The line
resumetext = str(s['resumetext'])
is redundant, because s['resumetext'] is already a string (since it comes as one of the results from a .split call). So, you can merge this line and the next into
if id2find in s['resumetext']: ...
Your following else is misaligned -- with it placed like that, you'll print the message over and over again. You want to place it after the for loop (and the else isn't needed, though it would work), so I'd suggest:
for each_line in resumes_f:
    s = dict(zip('id name resumetext'.split(), each_line.split(";")))
    if id2find in s['resumetext']:
        return(s)
print "No data matches your search query. Please try again"
I've also shown an alternative way to build dict s, although yours is fine too.
What @Justin Peel said. Also to be more pythonic I would say change
if resumetext.find(id2find) != -1: to if id2find in resumetext:
A few more changes: you might want to lower case the comparison and user input so it matches GPA, gpa, Gpa, etc. You can do this by doing searchquery = raw_input("please enter your search term").lower() and resumetext = s['resumetext'].lower(). You'll note I removed the explicit cast around s['resumetext'] as it's not needed.
One change that I recommend for your code is changing
if resumetext.find(id2find):
to
if resumetext.find(id2find) != -1:
because find() returns -1 if id2find wasn't in resumetext. Otherwise, it returns the index where id2find is first found in resumetext, which could be 0. As @Personman commented, this would give you the false positive because -1 is interpreted as True in Python.
I think the problem has something to do with the fact that find_details() only returns the first entry for which the search string is found in resumetext. It might be good to make find_details() into a generator instead, and then you could iterate over it and print the found records out one by one.
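A generator version might look roughly like this. It is only a sketch: it is written for Python 3, takes the lines as a parameter instead of opening resume_data.csv itself, and the case-insensitive match is borrowed from the other answer.

```python
def find_details(lines, query):
    """Yield a dict for every 'id;name;resumetext' line whose resume text contains query."""
    query = query.lower()
    for line in lines:
        # maxsplit=2 keeps any ';' inside the resume text intact
        fields = line.rstrip("\n").split(";", 2)
        record = dict(zip(("id", "name", "resumetext"), fields))
        if len(record) == 3 and query in record["resumetext"].lower():
            yield record

rows = [
    "1;Ann;High GPA and Python\n",
    "2;Bob;Java only\n",
    "3;Cy;gpa 3.9; honors\n",
]
for resume in find_details(rows, "gpa"):
    print(resume["id"], resume["name"], resume["resumetext"])
```

With a real file you would pass the open file object as lines, and the "no match" message is printed by the caller if the loop never runs.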