I have two Python scripts that I would like to combine and run as one program, but I am unsure what exactly I need to alter to make the two scripts work together.
Here is my first code:
import random

with open('filename.txt') as fin:
    lines = fin.readlines()
random.shuffle(lines)
for i, line in enumerate(lines):
    if i >= 0 and i < 6800:
        print(line, end='')
And here is the second:
import csv

with open("Randomfile.txt") as f:
    dict1 = {}
    r = csv.reader(f, delimiter="\t")
    for row in r:
        a, b, v = row
        dict1.setdefault((a, b), []).append(v)

#for key in dict1:
#    print(key[0])
#    print(key[1])
#    print(dict1[key][0])

with open("filename2.txt") as f:
    dict2 = {}
    r = csv.reader(f, delimiter="\t")
    for row in r:
        a, b, v = row
        dict2.setdefault((a, b), []).append(v)

#for key in dict2:
#    print(key[0])

count = 0
for key1 in dict1:
    for key2 in dict2:
        if (key1[0] == key2[0]) and abs((float(key1[1].split(" ")[0])) - (float(key2[1].split(" ")[0]))) < 0:
            count += 1
print(count)
What I usually do is using the first code, I extract a random set of elements. I then save it as a text file, open it in the second code and compare it with my other file to get my results.
However, I would essentially like to skip the saving and reopening process. I want to place my first script in my second and alter the code to make it run as one. So that when my elements are extracted, they are then automatically compared to my other file.
I have read up and watched videos about using
if __name__ == "__main__":
But I don't really understand its function. So if that is the solution, I would love to understand how to use it in solving my problem.
Please help me figure out how I can combine the two scripts, altering them both so the code runs as one. I am happy to cooperate and clarify anything.
[EDIT] My files are in the following format.
An example of my random file:
3 10045 0.120559958
4 157465 0.590642951
1 222471 0.947959795
3 222473 0.083341617
2 222541 0.054014337
5 222588 0.060296547
An example of my other file (that i am comparing to my random file):
2 143521109 4.57E-08
1 201466556 5.57E-08
1 11566373 8.43E-08
1 143627370 8.61E-08
6 98624499 1.02E-07
Imagine that instead of having two scripts, each script was a function and then they were both called from another function.
In other words, you would have the following:
def first_code():
    ...code of first script goes here...

def second_code():
    ...code of second script goes here...

def master_function():
    first_code()
    second_code()
Now, if master_function() is called, so are the other two. If you replace that definition with a __main__ check:
if __name__ == "__main__":
    first_code()
    second_code()
It will automatically run if you execute the script from your command line.
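Applied to the two scripts in the question, a minimal sketch could look like this (same file name and whitespace-separated columns as in the samples; the function names are mine):

```python
import os
import random

def extract_random_lines(path, n=6800):
    # First script as a function: return the random sample
    # instead of printing it.
    with open(path) as fin:
        lines = fin.readlines()
    random.shuffle(lines)
    return lines[:n]

def build_dict(rows):
    # Parsing step of the second script: map each (a, b) pair
    # to its list of values.
    d = {}
    for row in rows:
        a, b, v = row.split()
        d.setdefault((a, b), []).append(v)
    return d

if __name__ == "__main__":
    # Only run when the input file is actually present.
    if os.path.exists('filename.txt'):
        sample = extract_random_lines('filename.txt')
        dict1 = build_dict(sample)
        print(len(dict1))
```

The sample never touches the disk; it is passed from one function to the other in memory.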
Well, I modified your code as follows:
import random

with open('filename.txt') as fin:
    lines = fin.readlines()
random.shuffle(lines)

rnd_str = []
for i, line in enumerate(lines):
    if i >= 0 and i < 6800:
        rnd_str.append(line)

dict1 = {}
for row in rnd_str:
    a, b, v = row.split()
    dict1.setdefault((a, b), []).append(v)

# the files are whitespace-separated (see the samples above),
# so plain split() is used instead of csv.reader
dict2 = {}
with open("filename2.txt") as f:
    for row in f:
        a, b, v = row.split()
        dict2.setdefault((a, b), []).append(v)

count = 0
for key1 in dict1:
    for key2 in dict2:
        if (key1[0] == key2[0]) and ((float(key1[1]) - float(key2[1])) < 0):
            count += 1
print(count)
Thus you have no need to save the random file and you can process its content in the second part of the code i.e. comparing with the other file's content.
Note: there was a flaw in your code:
abs((float(key1[1].split(" ")[0])) - (float(key2[1].split(" ")[0]))) < 0
that made me smile, because how can abs(x) be less than 0?
Anyway, the script works now; it returns 4 on the samples you gave.
Instead of printing in the first program, try to create a dictionary from that output; then you can work with that dict instead of copying the output, saving it, and loading it again. You will save a lot of time.
So in the first file, create a dict and change the print into an append to that dict. You don't need another script; just extend the first one with the code from the second, working with the dict instead of a new file.
Don't Worry. There's no need to alter your code. Just make a new script and put this in it:
def code1():
    import firstprogram

def code2():
    import secondprogram

code1()
code2()
This will run your first program and then your second. Just make sure to replace firstprogram and secondprogram with the names of your two programs.
One thing you can do is type in the second file name in the main file. For example, my first file name is 'main.py' and the second one is 'float.py' you can merge these file together by typing:
_merge_ = 'float.py'
in the main file, which is 'main.py'
Hope it works!!
I have a code that generates characters from 000000000000 to ffffffffffff which are written to a file.
I'm trying to implement a check to see if the program was closed so that I can read from the file, let's say at 00000000781B, and continue for-loop from the file.
The variable attempt in (for attempt in to_attempt:) is a tuple, and the loop always starts from the beginning.
Is it possible to continue the for-loop from the specified value?
import itertools

f = open("G:/empty/last.txt", "r")
lines = f.readlines()
rand_string = str(lines[0])
f.close()

letters = '0123456789ABCDEF'
print(rand_string)

for length in range(1, 20):
    to_attempt = itertools.product(letters, repeat=length)
    for attempt in to_attempt:
        gen_string = rand_string[length:] + ''.join(attempt)
        print(gen_string)
You have to store the value in a file to keep track of which value was last read. I'm assuming the main for loop running from 000000000000 to ffffffffffff is the to_attempt one. All you need to do is store the loop's position in a file; you can use a new variable to keep track of it.
try:
    with open('save.txt', 'r') as reader:
        save = int(reader.read())
except FileNotFoundError:
    save = 0

# rest of the code
for i in range(save, len(to_attempt)):
    with open('save.txt', 'w') as writer:
        writer.write(str(i))
    # rest of the code
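Note that itertools.product returns a lazy iterator with no len(), so range(save, len(to_attempt)) will not work on it directly. One way to resume from a saved index is itertools.islice (a sketch; the helper name is mine):

```python
import itertools

def resume_product(letters, length, start=0):
    # Skip the first `start` combinations without building them all in memory.
    to_attempt = itertools.product(letters, repeat=length)
    return itertools.islice(to_attempt, start, None)

# Resume generating 2-character strings from position 5 onward.
for i, attempt in enumerate(resume_product('0123456789ABCDEF', 2, start=5), start=5):
    gen_string = ''.join(attempt)
    # ...write i to save.txt here, then work with gen_string...
    break
```

islice still has to advance past the skipped combinations internally, but it does so without creating the tuples you would get from a full list.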
I'm trying to count the number of lines contained by a file that looks like this:
-StartACheck
---Lines--
-EndACheck
-StartBCheck
---Lines--
-EndBCheck
with this:
count = 0
z = {}
for line in file:
    s = re.search(r'\-+Start([A-Za-z0-9]+)Check', line)
    if s:
        e = s.group(1)
        for line in file:
            z.setdefault(e, []).append(count)
            q = re.search(r'\-+End', line)
            if q:
                count = 0
                break
for a, b in z.items():
    print(a, len(b))
I basically want to store the number of lines present inside ACheck, BCheck, etc. in a dictionary, but I keep getting the wrong output. I want something like this:
A,15
B,9
etc.
I found out that even though the code should work, it doesn't, because of the way the file is opened. I can't change the way it is opened, and I was looking for an implementation that opens the file only once but counts the same things and gives exactly the same output.
This kind of problem can be resolved with a finite state machine. This is a complex matter that would need more explanation than what I could write here. You should look into it to further understand what you can do with it.
But first of all, I'm going to do a few presumptions:
The input file doesn't have any errors
If you have more than one section with the same name, you want their count to be combined
Even though you have tagged this question python 2.7, because you are using print(), I'll presume you are using python 3.x
Here's my suggestion:
import re

input_filename = "/home/evens/Temporaire/StackOverflow/Input_file-39339007.txt"

matchers = {
    'start_section': re.compile(r'\-+Start([A-Za-z0-9]+)Check'),
    'end_section': re.compile(r'\-+End'),
}

inside_section = False  # Am I inside a section?
section_name = None     # Which section am I in?
tally = {}              # Sums of each section

with open(input_filename) as file_read:
    for line in file_read:
        line_matches = {k: v.match(line) for (k, v) in matchers.items()}
        if inside_section:
            if line_matches['end_section']:
                future_inside_section = False
            else:
                future_inside_section = True
                if section_name in tally:
                    tally[section_name] += 1
                else:
                    tally[section_name] = 1
        else:
            if line_matches['start_section']:
                future_inside_section = True
                section_name = line_matches['start_section'].group(1)
        # Just before we go in the future
        inside_section = future_inside_section

for (a, b) in tally.items():
    print('Total of all "{}" sections: {}'.format(a, b))
What this code does is determine:
- How it should change its state (am I going to be inside or outside a section on the next line?)
- What else should be done:
  - Change the name of the section I'm in?
  - Count this line in the present section?
But even this code has its problems:
It doesn't check to see if a section start has a matching section end (-StartACheck could be ended by -EndATotallyInvalidCheck)
It doesn't handle the case where two consecutive section starts (or ends) are detected (Error? Nested sections?)
It doesn't handle the case where there are lines outside a section
And probably other corner cases.
How you want to handle these cases is up to you
This code could probably be simplified further, but I don't want to make it too complex for now.
Hope this helps. Don't hesitate to ask if you need further explanations.
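If malformed input and nested sections can be ignored, the same state-machine idea can be sketched more compactly (the function name is mine):

```python
import re

START_RE = re.compile(r'\-+Start([A-Za-z0-9]+)Check')
END_RE = re.compile(r'\-+End')

def count_section_lines(lines):
    # Tally the number of lines inside each Start...Check / End... section.
    tally = {}
    section = None  # state: name of the current section, or None when outside
    for line in lines:
        start = START_RE.match(line)
        if start:
            section = start.group(1)
            tally.setdefault(section, 0)
        elif END_RE.match(line):
            section = None
        elif section is not None:
            tally[section] += 1
    return tally
```

Because it takes any iterable of lines, it can be fed an already-open file object directly, which matches the constraint of opening the file only once.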
Help, guys!
I have a list of 150 text files, and one text file with query texts:
SRR1005851
SRR1299210
SRR1021605
SRR1299782
SRR1299369
SRR1006158
...etc.
I want to search for each of these query texts in the list of 150 text files.
If, for example, SRR1005851 is found in at least 120 of the files, SRR1005851 will be appended to an output file.
The search will iterate over all query texts and through all 150 files.
Summary: I am looking for which query texts are found in at least 90% of the 150 files.
I don't think I fully understand your question. Posting your code and an example file would have been very helpful.
This code will count all entries in all files, then it will identify unique entries per file. After that, it will count each entry's occurrence in each file. Then, it will select only entries that appeared at least in 90% of all files.
Also, this code could have been shorter, but for readability's sake, I created many variables, with long, meaningful names.
Please read the comments ;)
import os
from collections import Counter
from sys import argv

# adjust your cut point
PERCENT_CUT = 0.9

# here we are going to save each file's entries, so we can sum them later
files_dict = {}
# total files is the number you'll need to check each count against
total_files = 0
# raw total entries, even duplicates
total_entries = 0
unique_entries = 0

# first argument is the script name, so the second one is the folder to search
search_dir = argv[1]

# list everything under search dir - ideally only your input files
# CHECK HOW TO READ ONLY SPECIFIC FILE types if you have something else inside the same folder
files_list = os.listdir(search_dir)
total_files = len(files_list)

print('Files READ:')
# iterate over each file found in the given folder
for file_name in files_list:
    print(" " + file_name)
    file_object = open(os.path.join(search_dir, file_name), 'r')
    # a list of entries with the newline stripped
    # (a list comprehension rather than map(), so len() works in Python 3)
    file_entries = [it.strip("\r\n") for it in file_object.readlines()]
    # gotta count 'em all
    total_entries += len(file_entries)
    # set doesn't allow duplicate entries
    entries_set = set(file_entries)
    # creates a dict from the set, setting each key's value to 1
    file_entries_dict = dict.fromkeys(entries_set, 1)
    # the entries dict is now used differently: each file maps to a COUNTER
    files_dict[file_name] = Counter(file_entries_dict)
    file_object.close()

print("\n\nALL ENTRIES COUNT: " + str(total_entries))

# now we create a Counter that will hold each unique key's count, summing all dicts read from files
entries_dict = Counter({})
for file_dict_key, file_dict_value in files_dict.items():
    print(str(file_dict_key) + " - " + str(file_dict_value))
    entries_dict += file_dict_value

print("\nUNIQUE ENTRIES COUNT: " + str(len(entries_dict.keys())))
# print(entries_dict)

# 90% from your question
cut_line = total_files * PERCENT_CUT
print("\nNeeds at least " + str(int(cut_line)) + " occurrences to be listed below")

# output dict is the final dict, holding entries present in at least 90% of the files
output_dict = {}
# this is PYTHON 3 - CHECK YOUR VERSION, as older versions use iteritems() instead of items() in the line below
for entry, count in entries_dict.items():
    if count >= cut_line:
        output_dict[entry] = count

print(output_dict)
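The core of the idea can also be condensed: count, for each unique entry, how many files it appears in, then keep the ones at or above the cutoff. A sketch (the function name is mine):

```python
from collections import Counter

def entries_in_most_files(entry_lists, percent=0.9):
    # Count how many files each unique entry appears in;
    # a set per file removes within-file duplicates first.
    counts = Counter()
    for entries in entry_lists:
        counts.update(set(entries))
    threshold = percent * len(entry_lists)
    return {entry: c for entry, c in counts.items() if c >= threshold}
```

Each element of entry_lists would be the list of lines read from one of the 150 files.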
I’m really new to Python but find myself working on the travelling salesman problem with multiple drivers. Currently I handle the routes as a list of lists but I’m having trouble getting the results out in a suitable .txt format. Each sub-list represents the locations for a driver to visit, which corresponds to a separate list of lat/long tuples. Something like:
driver_routes = [[0,5,3,0],[0,1,4,2,0]]
lat_long =[(lat0,long0),(lat1,long1)...(latn,longn)]
What I would like is a separate .txt file (named “Driver(n)”) that lists the lat/long pairs for that driver to visit.
When I was just working with a single driver, the following code worked fine for me:
optimised_locs = open('Optimisedroute.txt', 'w')
for x in driver_routes:
    to_write = ','.join(map(str, lat_long[x]))
    optimised_locs.write(to_write)
    optimised_locs.write("\n")
optimised_locs.close()
So, I took the automated file naming code from Chris Gregg here (Printing out elements of list into separate text files in python) and tried to make an iterating loop for sublists:
num_drivers = 2
p = 0
while p < num_drivers:
    for x in driver_routes[p]:
        f = open("Driver" + str(p) + ".txt", "w")
        to_write = ','.join(map(str, lat_long[x]))
        print to_write  # for testing
        f.write(to_write)
        f.write("\n")
        f.close()
    print "break"  # for testing
    p += 1
The output on my screen looks exactly how I would expect it to look, and I generate .txt files with the correct names. However, I get only one tuple printed to each file, not the list I expect. It's probably something very simple, but I can't see why the while loop causes this issue. I would appreciate any suggestions, and thank you in advance.
You're overwriting the contents of the file f on every iteration of your for loop because you're re-opening it. You just need to modify your code as follows to open the file once per driver:
while p < num_drivers:
    f = open("Driver" + str(p) + ".txt", "w")
    for x in driver_routes[p]:
        to_write = ','.join(map(str, lat_long[x]))
        print to_write  # for testing
        f.write(to_write)
        f.write("\n")
    f.close()
    p += 1
Note that opening f is moved to outside the for loop.
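The same fix reads a bit more idiomatically with enumerate and a with block, which also closes each file automatically (a sketch using the question's data layout; the function name is mine):

```python
def write_driver_files(driver_routes, lat_long):
    # One output file per driver; each line is the lat,long pair
    # for one stop on that driver's route.
    for p, route in enumerate(driver_routes):
        with open("Driver" + str(p) + ".txt", "w") as f:
            for x in route:
                f.write(','.join(map(str, lat_long[x])) + "\n")
```

enumerate replaces the manual p counter, so there is no while loop to get wrong.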
I am a complete newbie to programming, and this is the first real program I am trying to write.
I have this huge CSV file (hundreds of columns and thousands of rows) from which I am trying to extract only a few columns, based on the value in a field. It works fine and I get nice output, but the problem arises when I try to encapsulate the same logic in a function:
it returns only the first extracted row, although print works fine.
I have been playing for this for hours and read other examples here and now my mind is mush.
import csv
import sys

newlogfile = csv.reader(open(sys.argv[1], 'rb'))
outLog = csv.writer(open('extracted.csv', 'w'))

def rowExtractor(logfile):
    for row in logfile:
        if row[32] == 'No':
            a = []
            a.append(row[44])
            a.append(row[58])
            a.append(row[83])
            a.append(row[32])
            return a

outLog.writerow(rowExtractor(newlogfile))
You are exiting prematurely. When you put return a inside the for loop, return gets called on the first matching iteration, which means only the first row is returned.
A simple way to do this would be to do:
def rowExtractor(logfile):
    # output holds all of the rows
    output = []
    for row in logfile:
        if row[32] == 'No':
            a = []
            a.append(row[44])
            a.append(row[58])
            a.append(row[83])
            a.append(row[32])
            output.append(a)
    # notice that the return statement is outside of the for loop
    return output

outLog.writerows(rowExtractor(newlogfile))
You could also consider using yield
You've got a return statement in your function...when it hits that line, it will return (thus terminating your loop). You'd need yield instead.
See What does the "yield" keyword do in Python?
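For reference, a generator version of the extractor (same column indices as in the question) might look like this:

```python
def row_extractor(logfile):
    # Yield one extracted row at a time instead of
    # building the whole list in memory.
    for row in logfile:
        if row[32] == 'No':
            yield [row[44], row[58], row[83], row[32]]
```

csv.writer's writerows() accepts any iterable of rows, so the generator can be passed to it directly.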