Use only a certain portion of file in every iteration - python

I am using an external API with Python (specifically 3.x) to get search results for keywords stored in a .txt file. However, there is a constraint on how many keywords I can search for per time interval (assume I need an hourly wait between runs), so each run of the script can only use a portion of the keywords (say 50). How can I, Pythonically, use only a portion of the keywords in every iteration?
Let's assume I have the following list of keywords in the .txt file myWords.txt:
Lorem #0
ipsum #1
dolor #2
sit #3
amet #4
...
vitae #167
I want to use only the keywords found in lines 0-49 (i.e. the first 50 lines) on the first iteration, 50-99 on the second, 100-149 on the third, and 150-167 on the fourth and last iteration.
This is, of course, possible by reading the whole file, reading an iteration counter saved elsewhere, and then choosing the keyword range for that part of the complete list. However, I do not want an external counter; I'd rather have only my Python script and myWords.txt, with the counter handled in the Python code itself.
Each run of the script should take only the keywords due for that run (depending on (total number of keywords)/50). At the same time, if I add new keywords at the end of myWords.txt, the script should adjust the iterations accordingly and, if needed, add new iterations.

As far as I know there is no built-in way to persist which keywords were used between different invocations of your script. However, you do have a couple of choices for implementing a "persistent storage" of the information you need across invocations.
Instead of just having a single input file named myWords.txt, you could have two files. One file containing keywords that you want to search for and one file containing keywords that you've already searched for. As you search for keywords you remove them from the one file and place them in the other.
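That first option might look roughly like this (a sketch; the file names pending.txt/done.txt, the helper name, and the batch size default are mine, not from the question):

```python
BATCH = 50  # keywords per run, per the question's rate limit

def take_batch(pending='pending.txt', done='done.txt', batch=BATCH):
    """Move up to `batch` keywords from the pending file to the done file
    and return them for this run's searches."""
    with open(pending) as fh:
        words = [w.strip() for w in fh if w.strip()]
    todo, rest = words[:batch], words[batch:]
    # Rewrite the pending file without the words we're about to use...
    with open(pending, 'w') as fh:
        fh.write(''.join(w + '\n' for w in rest))
    # ...and append this run's words to the done file.
    with open(done, 'a') as fh:
        fh.write(''.join(w + '\n' for w in todo))
    return todo
```

A nice property of this layout is that adding keywords to the end of the pending file just works: they are picked up by a later run with no counter to adjust.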
Alternatively, you can store a counter. The easiest thing (and what I would do) is to keep a small file, say next_pos.txt, that stores the position reached on the last run.
Here is an implementation of what I would do:
Create a next position file
echo 0 > next_pos.txt
Now do your work
with open('next_pos.txt') as fh:
    next_pos = int(fh.read().strip())

rows_to_search = 2  # This would be 50 in your case
keywords = list()
with open('myWords.txt') as fh:
    fh.seek(next_pos)
    for _ in range(rows_to_search):
        keyword = fh.readline().strip()
        keywords.append(keyword)
    next_pos = fh.tell()

# Store cursor location in file.
with open('next_pos.txt', 'w') as fh:
    fh.write(str(next_pos))

# Make your API call
# Rinse, Wash, Repeat
As I've stated, you have lots of options, and I don't know that any one way is more Pythonic than another, but whatever you do, try to keep it simple.

Try this. Modify for your needs.
$ cat foo
1
2
3
4
5
6
7
8
9
10
$ cat getlines.py
def getlines(filename, limit):
    with open(filename, 'r') as handle:
        keys = []
        for idx, line in enumerate(handle):
            if idx % limit == 0 and idx != 0:
                yield keys
                keys = []
            keys.append(line.strip())
        if keys:  # yield the final, possibly short, batch
            yield keys

print(list(getlines('foo', 2)))
print(list(getlines('foo', 3)))
print(list(getlines('foo', 4)))
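The same batching idea can also be written with itertools.islice, which avoids the index arithmetic; this is a sketch that recreates the answer's sample file foo so it runs standalone:

```python
from itertools import islice

def batches(filename, size):
    """Yield successive lists of up to `size` stripped lines from `filename`."""
    with open(filename) as fh:
        while True:
            batch = [line.strip() for line in islice(fh, size)]
            if not batch:
                return
            yield batch

# Recreate the answer's sample file so the example is self-contained.
with open('foo', 'w') as fh:
    fh.write('\n'.join(str(n) for n in range(1, 11)) + '\n')

print(list(batches('foo', 3)))  # [['1', '2', '3'], ['4', '5', '6'], ['7', '8', '9'], ['10']]
```

Note the trailing partial batch is included, which matters for the question's 168 keywords in batches of 50.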

Related

Pulling Vars from lists and applying them to a function

I'm working on a project in my work using purely Python 3:
I work in inventory, and anything I scan with my scanner goes into a text doc. I scan a location ("117") first, then I scan devices from any other location (the following lines in the text doc, e.g. "100203"). When I run the script, it plugs '117' into the search on our database and changes each of those devices into that location, whether they were assigned to it or not (validating that those devices are in location '117').
My main question is the 3rd objective down in the Objectives list below, the one that doesn't have "Done" after it.
Objective:
Pull strings from a text document and convert them into a dictionary. = (Text_Dictionary) **Done**
Assign the first var in the dictionary to a separate var. = (First_Line) **Done**
All vars after the first in the dictionary should be passed into a function individually. = (Proceeding_Lines)
Side note: The code should loop in a fashion that (.pop)s the var from the dictionary/list, but I'm open to other alternatives. (Not mandatory)
What I already have is:
Project.py:
1 import re
2 import os
3 import time
4 import sys
5
6 with open(r"C:\Users\...\text_dictionary.txt") as f:
7     Text_Dictionary = [line.rstrip('\n') for line in
8                        open(r"C:\Users\...\text_dictionary.txt")]
9
10 Text_Dict = (Text_Dictionary)
11 First_Line = (Text_Dictionary[0])
12
13 print("The first line is: ", First_Line)
14
15 end = (len(Text_Dictionary) + 1)
16 i = (len(Text_Dictionary))
17
What I have isn't much on the surface, but I have another *.py file full of code that I am going to copy in for the action that I wish to perform on each of the vars in Text_Dictionary.txt. Lines 15-16 were me messing with what I thought might solve this.
In the imported text document, the var's look very close to this (Same length)(All digits):
Text_Dictionary.txt:
117
23000
53455
23454
34534
...
Note: These values will change each time the code is run, meaning someone will type/scan in these lines of digits each time.
Explained concept:
Ideally, I would like to have the first line point towards a destination, and the rest of the digits would follow; however, each value (e.g. '53455') needs to be run separately, one after the other, and the first line (e.g. '117') is where '53455' goes. You could say the first line is static throughout the code, unless changed in Text_Dictionary.txt; '117' is used in conjunction with each iteration after it.
Background:
This is for inventory management in my office. I am in no way paid for doing this, but it would make my job a heck of a lot easier. Also, I know enough basic Python to get myself around, but this one stumped me. Thank you to whoever answers!
I've no clue what you're asking, but I'm going to take a guess. Before I do so, your code was annoying me:
with open("file.txt") as f:
    product_ids = [line.strip() for line in f if not line.isspace()]
There. That's all you need. This way it also protects against blank lines in the file and weird invisible spaces. I decided to leave the data as strings because it probably represents an inventory ID, and in the future that might be upgraded to "53455-b.42454#62dkMlwee".
I'm going to hazard a guess that you want to run different code depending on the number at the top. If so, you can use a dictionary containing functions. You said that you wanted to run code from another file, so this is another_file.py:
__all__ = ["dispatch_whatever"]

dispatch_whatever = {}

def descriptive_name_for_117(product_id):
    pass

dispatch_whatever["117"] = descriptive_name_for_117
And back in main_program.py, which is stored in the same directory:
from another_file import dispatch_whatever

for product_id in product_ids[1:]:
    dispatch_whatever[product_ids[0]](product_id)
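Collapsed into a single file so the dispatch mechanism can be seen end to end (the sample IDs and the handled list are made-up stand-ins for the real database update):

```python
dispatch_whatever = {}

handled = []  # stand-in for the real database update

def descriptive_name_for_117(product_id):
    # Pretend action: record which device IDs were moved to location 117.
    handled.append(product_id)

dispatch_whatever["117"] = descriptive_name_for_117

# First scanned line is the location; the rest are devices.
product_ids = ["117", "23000", "53455"]
for product_id in product_ids[1:]:
    dispatch_whatever[product_ids[0]](product_id)

print(handled)  # ['23000', '53455']
```

New locations then only require adding a function and one dictionary entry in another_file.py; the main loop never changes.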

How to give a name for a file?

For every iteration of my for loop, I need to use 'the number of the iteration' as the name of the file. For example, the goal is to save:
my first iteration in the first file.
my second iteration in the second file.
....
I use the numpy library for that, but my code doesn't give me the solution I need; in fact, my current code obliges me to enter the name of the file after each iteration. That is easy with 6 or 7 iterations, but I have 100 iterations, so it doesn't make sense:
for line, a in enumerate(Plaintxt_file):
    #instruction
    #result
    fileName = raw_input()
    if(fileName!='end'):
        fileName = r'C:\\Users\\My_resul\\Win_My_Scripts\\'+fileName
        np.save(fileName+'.npy',Result)
ser.close()
I would be very grateful if you could help me.
Create your file name from the line number:
for line, a in enumerate(Plaintxt_file):
    fileName = r'C:\Users\My_resul\Win_My_Scripts\file_{}.npy'.format(line)
    np.save(fileName, Result)
This starts with the file name file_0.npy.
If you like to start with 1, specify the starting index in enumerate:
for line, a in enumerate(Plaintxt_file, 1):
Of course, this assumes you don't need line to start from 0 anywhere else.
I'm not 100% sure what your issue is, but as far as I can tell, you just need some string formatting for the filename.
So, you want, say 100 files, each one created after an iteration. The easiest way to do this would probably be to use something like the following:
for line, a in enumerate(Plaintxt_file):
    #do work
    filename = "C:\\SaveDir\\OutputFile{0}.txt".format(line)
    np.save(filename, Result)
That won't be 100% accurate to your needs, but hopefully that will give you the idea.
If you're just after, say, 100 blank files with the naming scheme "0.npy", "1.npy", all the way up to "n-1.npy", a simple for loop would do the job (no need for numpy!):
n = 100
for i in range(n):
    open(str(i) + ".npy", 'a').close()
This loop runs for n iterations and spits out empty files with filenames corresponding to the current iteration.
If you do not care about the sequence of the files and you do not want the files from multiple runs of the loop to overwrite each other, you can use random unique IDs.
from uuid import uuid4

# ...
for a in Plaintxt_file:
    fileName = 'C:\\Users\\My_resul\\Win_My_Scripts\\file_{}.npy'.format(uuid4())
    np.save(fileName, Result)
Sidenote:
Do not use raw strings and escaped backslashes together.
It's either r"C:\path" or "C:\\path" - unless you want double backslashes in the path. I do not know if Windows likes them.
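To make the sidenote concrete (plain Python; the path comes from the question, and the pathlib alternative is my suggestion, not the answer's):

```python
from pathlib import PureWindowsPath

# A raw string and a normally escaped string spell the same path:
assert r"C:\Users\me" == "C:\\Users\\me"

# Mixing both (r"..." plus doubled backslashes) doubles the separators:
assert r"C:\\Users" == "C:\\\\Users"

# pathlib sidesteps the escaping question entirely:
p = PureWindowsPath("C:/Users/My_resul/Win_My_Scripts") / "file_0.npy"
print(p)  # C:\Users\My_resul\Win_My_Scripts\file_0.npy
```

PureWindowsPath renders with backslashes on any platform, so the example is checkable even off Windows.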

Extracting specific variables of a line with linecache

I'm currently using the python linecache module to grab specific lines from a given text document and create a new file with said line. For example, part of the code looks like:
cs = linecache.getline('variables.txt', 7)
cs_1 = open("lo_cs", "w")
cs_1.write(str(cs))
cs_1.close()
The problem is that within variables.txt, line 7 is given by:
variable7 = 3423
for instance. I want the new file, lo_cs, however, to contain only the actual value '3423' and not the entire line of text. Further, I want to wrap the whole 'getline' command in an if statement so that if variable7 is left blank, another action is taken. Is there a way to use linecache to check the text following 'variable7 = ' to see if anything is entered there, and if so, to grab only that particular value or string?
I know (but don't really understand) that bash scripts seem to use '$' as sort of a placeholder for inserting or calling a given file. I think I need to implement something similar to that...
I thought about having instructions in the text file indicating that the value should be specified in the line below -- to avoid selecting out only segments of a line -- but that allows for one to accidentally enter in superfluous breaks, which would mess up all subsequent 'getline' commands, in terms of which line needs to be selected.
Any help in the right direction would be greatly appreciated!
You can use the following method to wrap the functionality you need:
def parseline(l):
    sp = l.split('=')
    return sp[0].strip(), int(sp[1]) if len(sp) > 1 and sp[1].strip() != '' else None
or if you don't need the variable name:
def parseline(l):
    sp = l.split('=')
    return int(sp[1]) if len(sp) > 1 and sp[1].strip() != '' else None
and then use:
csval = parseline(linecache.getline('variables.txt', 7))
You can later place conditions on csval to see if it's None, and if it is, take another action.
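A quick check of the second variant's behavior on the three interesting cases (the sample lines are made up to match the shape of variables.txt):

```python
def parseline(l):
    # Second variant from above: value only, None when missing or blank.
    sp = l.split('=')
    return int(sp[1]) if len(sp) > 1 and sp[1].strip() != '' else None

print(parseline('variable7 = 3423'))  # 3423  (int() tolerates the spaces)
print(parseline('variable7 = '))      # None  (blank value)
print(parseline('no equals sign'))    # None  (no '=' at all)
```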

How to assign a single variable to a specific line in python?

I was not clear enough in my last question, and so I'll explain my question more this time.
I am creating 2 separate programs, where the first one will create a text file with 2 generated numbers, one on line 1 and the second on line 2.
Basically I saved it like this:
In this example I'm not generating numbers, just assigning them quickly.
a = 15
b = 16
saving = open('filename.txt', "w")
saving.write(str(a) + "\n")  # write() needs a string, not an int
saving.write(str(b) + "\n")
saving.close()
Then I opened it on the next one:
opening = open('filename.txt', "w")
a = opening.read()
opening.close()
print(a) #This will print the whole document, but I need each line to be different
Now I've got the whole file loaded into 'a', but I need it split up, and I haven't got a clue how to do that. I don't believe creating a list will help, as I need each number (variables a and b from program 1) to be a different variable in program 2. The reason I need them as 2 separate variables is that I need to divide each by a different number. If I do need a list, please say so. I tried finding an answer for about an hour in total, but couldn't find anything.
The reason I can't post the whole program is because I haven't got access to it from here, and no, this is not cheating as we are free to research and ask questions outside the classroom, if someone wonders about that after looking at my previous question.
If you need more info please put it in a comment and I'll respond ASAP.
opening = open('filename.txt') # "w" is not necessary since you're opening it read-only
a = [b.strip() for b in opening.readlines()] # create a list of lines with the newline "\n" stripped
print(a[0]) # print first line
print(a[1]) # print second line
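If the goal is really just two numbers in two separate variables, a shorter sketch (it recreates filename.txt first so it runs standalone; the divisors 3 and 4 are placeholders):

```python
# Recreate the file the first program writes.
with open('filename.txt', 'w') as f:
    f.write('15\n16\n')

# Read each line back into its own int variable.
with open('filename.txt') as f:
    a, b = (int(line) for line in f.read().splitlines())

print(a / 3, b / 4)  # the two values can now be divided independently
```

The unpacking `a, b = ...` also fails loudly if the file doesn't contain exactly two lines, which is a useful sanity check here.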

Python - How to check if the name from file is used?

I have a small scraping script. I have a file with 2000 names, and I use these names to search for video IDs on YouTube. Because of the amount, it takes a pretty long time to get all the IDs, so I can't do it in one run. What I want is to find where my last scrape ended and then start from that position. What is the best way to do this? I was thinking about adding each used name to a list and then just checking if a name is in the list, and if not, scraping it; but maybe there's a better way to do this? (I hope so.)
Here is the part that takes a name from the file and scrapes the IDs. What I want is that when I quit scraping and start the script again, it runs not from the beginning but from the point where it ended last time:
index = 0
for name in itertools.islice(f, index, None):
    parameters = {'key': api_key, 'q': name}
    request_url = requests.get('https://www.googleapis.com/youtube/v3/search?part=snippet&maxResults=1&type=video&fields=items%2Fid', params = parameters)
    videoid = json.loads(request_url.text)
    if 'error' in videoid:
        pass
    else:
        index += 1
        id_file.write(videoid['items'][0]['id']['videoId'] + '\n')
        print videoid['items'][0]['id']['videoId']
You could just remember the index number of the last scraped entry. Every time you finish scraping one entry, increment a counter; then, assuming the entries in your text file don't change order, just pick up again at that number.
The simplest answer here is probably mitim's answer. Just keep a file that you rewrite with the last-processed index after each line. For example:
import os
import itertools

savepath = os.path.expanduser('~/.myprogram.lines')
skiplines = 0
try:
    with open(savepath) as f:
        skiplines = int(f.read())
except Exception:
    pass

with open('names.txt') as f:
    for linenumber, line in itertools.islice(enumerate(f), skiplines, None):
        do_stuff(line)
        # Use a different handle name so we don't shadow the names file,
        # and store linenumber + 1 so the finished line isn't repeated.
        with open(savepath, 'w') as state:
            state.write(str(linenumber + 1))
However, there are other ways you could do this that might make more sense for your use case.
For example, you could rewrite the "names" file after each name is processed to remove the first line. Or, maybe better, preprocess the list into an anydbm (or even sqlite3) database, so you can more easily remove (or mark) names once they're done.
Or, if you might run against different files, and need to keep a progress for each one, you could store a separate .lines file for each one (probably in a ~/.myprogram directory, rather than flooding the top-level home directory), or use an anydbm mapping pathnames to lines done.
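The sqlite3 idea from the earlier paragraph might look roughly like this (a sketch; the database path, table name, and sample names are all made up):

```python
import sqlite3

db = sqlite3.connect('progress.db')
db.execute('CREATE TABLE IF NOT EXISTS names (name TEXT PRIMARY KEY, done INTEGER DEFAULT 0)')

# One-time preprocessing: load the names; duplicates are silently ignored.
names = ['alpha', 'beta', 'gamma']  # stand-in for the 2000-name file
db.executemany('INSERT OR IGNORE INTO names (name) VALUES (?)', [(n,) for n in names])
db.commit()

# Each run only sees names not yet marked done, so interrupting is safe.
for (name,) in list(db.execute('SELECT name FROM names WHERE done = 0')):
    # ... scrape the video ID for `name` here ...
    db.execute('UPDATE names SET done = 1 WHERE name = ?', (name,))
    db.commit()
```

Committing after every name trades a little speed for the guarantee that a crash never loses more than the name in flight.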
