I need to write a program that writes to and reads from a file. I have code that works, but only depending on the order in which I call the functions.
def FileSetup():
    TextWrite = open('Leaderboard.txt','w')
    TextWrite.write('''| Driver | Car | Team | Grid | Fastest Lap | Race Time | Points |
''')
    TextWrite.close()
    TextRead = open('Leaderboard.txt','r')
    return TextRead
def SortLeaderboard(LeaderBoard):
    TextFile = open('Leaderboard.txt', 'w')
    for items in LeaderBoard:
        TextFile.write('\n| '+items['Driver']+' | '+str(items['Car'])+' | '+items['Team']+' | '+str(items['Grid'])+' | '+items['Fastest Lap']+' | '+items['Race Time']+' | '+str(items['Points'])+' |')
Leaderboard = Setup()
FileSetup()
TextRead = FileSetup()
TextFile = open('Leaderboard.txt','w')
SortLeaderboard(Leaderboard)
#TextRead = FileSetup()
str = TextRead.read()
print str
Depending on which TextRead = FileSetup() line I comment out, either SortLeaderboard or FileSetup will work. If I comment out the TextRead call that comes after SortLeaderboard, then SortLeaderboard writes to the file and FileSetup doesn't. If I leave that later call in, then FileSetup writes to the file and SortLeaderboard doesn't.
The problem is that only one function ever writes to the file; I can't get both to write to it.
I'm sorry, this is really confusing; this was the best way I could think of to explain it. If you need me to explain something in a different way, just ask and I will try.
Avoid calling open() and close() directly; use context managers instead. They handle closing the file object after you are done.
from contextlib import contextmanager

@contextmanager
def setup_file():
    with open('Leaderboard.txt','w') as writefile:
        writefile.write('''| Driver | Car | Team | Grid | Fastest Lap | Race Time | Points |
''')
    with open('Leaderboard.txt','r') as myread:
        yield myread

def SortLeaderboard(LeaderBoard):
    # open in append mode so the header written by setup_file is kept
    with open('Leaderboard.txt', 'a') as myfile:
        for items in LeaderBoard:
            myfile.write('\n| '+items['Driver']+' | '+str(items['Car'])+' | '+items['Team']+' | '+str(items['Grid'])+' | '+items['Fastest Lap']+' | '+items['Race Time']+' | '+str(items['Points'])+' |')

Leaderboard = Setup()
with setup_file() as TextRead:
    SortLeaderboard(Leaderboard)
    str = TextRead.read()
    print str
Here you define your own context manager, setup_file, that encapsulates preparing the file for use and cleaning up afterwards.
With the @contextmanager decorator, a Python generator containing a single yield statement becomes a context manager. Control passes from the generator to the body of the with block at the yield statement.
After the body of the with block has executed, control passes back into the generator, where cleanup work can be done.
open can function as a context manager by default, and takes care of closing the file object.
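To see that control flow in isolation, here is a minimal sketch (the function name and the yielded value are illustrative):

from contextlib import contextmanager

@contextmanager
def managed_resource():
    print('setup')        # runs when the with statement begins
    try:
        yield 'resource'  # value bound by "as"; the with-block body runs here
    finally:
        print('cleanup')  # runs after the body, even if it raised

with managed_resource() as res:
    print('using ' + res)  # prints: using resource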
I have a branching pipeline with multiple ParDo transforms that are merged and written to text file records in a GCS bucket.
I am receiving the following messages after my pipeline crashes:
The worker lost contact with the service.
RuntimeError: FileNotFoundError: [Errno 2] Not found: gs://MYBUCKET/JOBNAME.00000-of-00001.avro [while running 'WriteToText/WriteToText/Write/WriteImpl/WriteBundles/WriteBundles']
Which looks like it can't find the file it's been writing to. It seems to be fine until a certain point, when the error occurs. I'd like to wrap a try/except around it or set a breakpoint, but I'm not even sure how to discover what the root cause is.
Is there a way to just write a single file? Or only open a file to write once? It's spamming thousands of output files into this bucket, which is something I'd like to eliminate and may be a factor.
with beam.Pipeline(argv=pipeline_args) as p:
    csvlines = (
        p | 'Read From CSV' >> beam.io.ReadFromText(known_args.input, skip_header_lines=1)
          | 'Parse CSV to Dictionary' >> beam.ParDo(Split())
          | 'Read Files into Memory' >> beam.ParDo(DownloadFilesDoFn())
          | 'Windowing' >> beam.WindowInto(window.FixedWindows(20 * 60))
    )

    b1 = ( csvlines | 'Branch1' >> beam.ParDo(Branch1DoFn()) )
    b2 = ( csvlines | 'Branch2' >> beam.ParDo(Branch2DoFn()) )
    b3 = ( csvlines | 'Branch3' >> beam.ParDo(Branch3DoFn()) )
    b4 = ( csvlines | 'Branch4' >> beam.ParDo(Branch4DoFn()) )
    b5 = ( csvlines | 'Branch5' >> beam.ParDo(Branch5DoFn()) )
    b6 = ( csvlines | 'Branch6' >> beam.ParDo(Branch6DoFn()) )

    output = (
        (b1,b2,b3,b4,b5,b6) | 'Merge PCollections' >> beam.Flatten()
                            | 'WriteToText' >> beam.io.Write(beam.io.textio.WriteToText(known_args.output))
    )
This question is linked to this previous question, which contains more detail about the implementation. The solution there suggested creating an instance of google.cloud.storage.Client() in the start_bundle() of each ParDo(DoFn). This connects to the same GCS bucket that is given via the args in WriteToText(known_args.output).
class DownloadFilesDoFn(beam.DoFn):
    def __init__(self):
        import re
        self.gcs_path_regex = re.compile(r'gs:\/\/([^\/]+)\/(.*)')

    def start_bundle(self):
        import google.cloud.storage
        self.gcs = google.cloud.storage.Client()

    def process(self, element):
        self.file_match = self.gcs_path_regex.match(element['Url'])
        self.bucket = self.gcs.get_bucket(self.file_match.group(1))
        self.blob = self.bucket.get_blob(self.file_match.group(2))
        self.f = self.blob.download_as_bytes()
It's likely the cause of this error is related to having too many connections to the client. I'm not clear on good practice for this, since it's been suggested elsewhere that you can set up network connections this way for each bundle.
Adding this to the end of the DoFn, to remove the client object from memory at the end of each bundle, should help close some unnecessary lingering connections.
    def finish_bundle(self):
        del self.gcs, self.gcs_path_regex
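As for writing just a single file: WriteToText accepts a num_shards argument, so if one output shard is acceptable for your data size, you can stop the spray of output files, at the cost of funnelling all writes through a single shard, which can bottleneck large jobs:

output = (
    (b1,b2,b3,b4,b5,b6) | 'Merge PCollections' >> beam.Flatten()
                        | 'WriteToText' >> beam.io.WriteToText(known_args.output, num_shards=1)
)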
I am working on an automation test project (using Pytest BDD) and I constantly hit the problem of how to handle environment prerequisites with BDD and Gherkin. For example, almost all of the scenarios require new entities to be created (users/admins/sites/organizations/etc.) just to have something to work with.
I think I shouldn't write all the prerequisite actions in the 'Given' section (it seems anti-BDD), but I also don't want to lose track of which scenario sets up what and how.
For example, I would never want to create this:
Scenario: A user can buy a ticket from a webshop.
  Given an item available in the webshop
  And a user is created
  And the user has at least one payment option set up
  And the user is logged in
  When the user buys the item in the webshop
  Then the user owns the item
How do people generally write down these actions and entities in a way that is readable and maintainable in the future?
One Way - Using @BeforeClass (one-time setup)

@RunWith(Cucumber.class)
@CucumberOptions(features = "classpath:features/checkoutmodule/registereduser/",
        glue = { "com.ann.automation.test.steps" },
        tags = { "@SignIn" },
        plugin = { "pretty", "json:target/cucumber.json",
                "junit:target/cucumber-reports/Cucumber.xml", "html:target/cucumber-reports",
                "com.cucumber.listener.ExtentCucumberFormatter" },
        strict = false,
        dryRun = false,
        monochrome = true)
public class RunCuke {

    // ----------------------------- Extent Report Configuration -----------------------------
    @BeforeClass
    public static void setup() {
        // below is dummy code just to showcase
        File newFile = new File(Constants.EXTENT_REPORT_PATH);
        ExtentCucumberFormatter.initiateExtentCucumberFormatter(newFile, true);
        ExtentCucumberFormatter.loadConfig(new File(Constants.EXTENT_CONFIG_FILE_PATH));
        ExtentCucumberFormatter.addSystemInfo("Browser Name", Constants.BROWSER);
        ExtentCucumberFormatter.addSystemInfo("Browser version", Constants.BROWSER_VERSION);
        ExtentCucumberFormatter.addSystemInfo("Selenium version", Constants.SELENIUM_VERSION);
    }
}
Other Way - Using Background (setup before every scenario)

Cucumber provides a mechanism for this with the Background keyword, which lets you specify a step or series of steps that are common to all the scenarios in the feature file and that run before each scenario. Typically these will be Given steps, but you can use any steps that you need.

Example: Here, before every scenario/outline execution, we want to take the user to the home page of the site and have them search for a product. So let's see the implementation.
Background:
  Given User is on Brand Home Page "https://www.anntaylor.com/"
  Given User searches for a styleId for <Site> and makes product selection on the basis of given color and size
    | Style_ID  | Product_Size | Product_Color |
    | TestData1 | TestData1    | TestData1     |
    | TestData2 | TestData2    | TestData2     |

@guest_search
Scenario Outline: Validation of UseCase Guest User Order Placement flow from Search
  Then Clicking on Cart icon shall take user to Shopping Bag
  When Proceeding to checkout as "GuestUser" with emailId <EmailID> shall take user to Shipping Page
  And Entering FN as <FName> LN as <LName> Add as <AddL1> ZCode as <ZipCode> PNo as <PhoneNo> shall take user to payment page
  And Submitting CCardNo as <CCNo> Month as <CCMonth> Year as <CCYear> and CVV as <CVV> shall take user to Order Review Page
  Then Verify Order gets placed successfully

  Examples: Checkout User Information
    | EmailID   | FName     | LName     | AddL1     | ZipCode   | PhoneNo   | CCNo      | CCMonth   | CCYear    | CVV       |
    | TestData2 | TestData2 | TestData2 | TestData2 | TestData2 | TestData2 | TestData2 | TestData2 | TestData2 | TestData2 |
Last Way - Using @Before (setup before every scenario)

@Before
public void setUpScenario(Scenario scenario){
    log.info("***** FEATURE FILE :-- " + Utility.featureFileName(scenario.getId().split(";")[0].replace("-", " ")) + " --: *****");
    log.info("---------- Scenario Name :-- " + scenario.getName() + "----------");
    log.info("---------- Scenario Execution Started at " + Utility.getCurrentTime() + "----------");
    BasePage.message = scenario;
    ExtentTestManager.startTest("Scenario No . " + (x = x + 1) + " : " + scenario.getName());
    ExtentTestManager.getTest().log(Status.INFO, "Scenario No . " + x + " Started : - " + scenario.getName());
    // Utility.setupAUTTestRecorder();
    // --------- Opening Browser() before every test case execution for the URL given in Feature File. ---------
    BaseSteps.getInstance().getBrowserInstantiation();
}
Use Cucumber's "Background" keyword:
Background in Cucumber is used to define a step or series of steps which are common to all the tests in the feature file. It allows you to add some context to the scenarios for a feature where it is defined.
See also the official docs.
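Since the question mentions Pytest BDD specifically: Background works the same way there, and prerequisite steps can be backed by ordinary pytest fixtures. A minimal sketch, assuming pytest-bdd >= 4 (for target_fixture) and hypothetical helpers create_test_user and add_item_to_webshop:

# test_webshop.py -- sketch only; the helpers and the user/item APIs are illustrative
from pytest_bdd import scenarios, given, when, then

scenarios('webshop.feature')   # binds the scenarios in the feature file to this module

@given('a user is created', target_fixture='user')
def a_user_is_created():
    return create_test_user()       # hypothetical setup helper

@given('an item available in the webshop', target_fixture='item')
def an_item_available():
    return add_item_to_webshop()    # hypothetical setup helper

@when('the user buys the item in the webshop')
def buy_item(user, item):
    user.buy(item)

@then('the user owns the item')
def owns_item(user, item):
    assert item in user.items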
How to parametrize a function for pytest such that the parameters are taken from a text file and the function name changes on each iteration, for the pytest-html report.
Text file format: Function_name | assert_value | Query for assertion from PostgreSQL.
Requirement: to create a pytest-based framework.
So far with my logic (it doesn't work):
with open("F_Query.txt","r") as ins1:
for F_Query in ins1:
#Name of function to be executed to be extracted from the file differentiated with delimeter " | "(leading and trailing space required)
F_name=F_Query.split(" | ")[0]
assert_val=F_Query.split(" | ")[1]
Query=F_Query.split(" | ")[2]
Loc_file=(Query.split(" ")[-5:])[0]
def f(text):
def r(y):
return y
r.__name__ = text
return r
c1.execute(Query)
assert(c1.rowcount() == assert_val), "Please check output file for records"
p = f(F_name)
Can anyone explain how to get the function name to change on every iteration while the parameters are being passed to the pytest function?
Latest changes (still doesn't work):
with open("F_Query.txt","r") as ins1:
for F_Query in ins1:
#Name of function to be executed to be extracted from the file differentiated with delimeter " | "(leading and trailing space required)
#F_name=F_Query.split(" | ")[0]
assert_val=int(F_Query.split(" | ")[1])
Query=F_Query.split(" | ")[2]
Query=Query.strip("\n")
#Loc_file=(Query.split(" ")[-5:])[0]
dictionary1[Query]=assert_val
#pytest.mark.parametrize('argumentname',dictionary1)
def test_values(argumentname):
c1.execute(dictionary1.keys(argumentname))
assert(c1.rowcount==dictionary1.values(argumentname))
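One way to get a distinct test name per line in the pytest-html report is pytest.param with an explicit id, so each case appears under its Function_name. A minimal sketch, assuming a db_cursor fixture stands in for c1:

import pytest

def load_cases(path="F_Query.txt"):
    # Each line: Function_name | assert_value | Query
    cases = []
    with open(path) as fh:
        for line in fh:
            name, expected, query = line.rstrip("\n").split(" | ")
            # id=name is what pytest (and pytest-html) displays as the test name
            cases.append(pytest.param(query, int(expected), id=name))
    return cases

@pytest.mark.parametrize("query, expected", load_cases())
def test_query(query, expected, db_cursor):   # db_cursor: assumed fixture
    db_cursor.execute(query)
    assert db_cursor.rowcount == expected, "Please check output file for records"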
There is a large file called /tmp/largefile, and I want to handle the file line by line. My idea is as follow:
|---- Process-001 (largefile-part1)
|
|---- Process-002 (largefile-part2)
(largefile) --> multiprocessing ----> |
|---- Process-003 (largefile-part3)
|
|---- Process-004 (largefile-part4)
Each Process-00x will handle a specific part of largefile with the demo function; demo is the process task function.
|-- [gevent worker001]
|
demo (Process func) ----> a part of largefile ---->|-- [gevent worker002]
|
|-- [gevent ....]
How do I split a single file object into multiple iterators, one per process?
#!/usr/bin/env python
# -*- coding: utf8 -*-
import multiprocessing

def worker(data):
    print(data.strip())
    return data

def demo(itertor):
    '''Parallel programming: just a demo here'''
    for _ in itertor:
        worker(_)
    return itertor

processes = []
f = open("/tmp/largefile")
for i in range(5):
    proc = multiprocessing.Process(target=demo, args=(f, ))
    processes.append(proc)

for process in processes:
    process.start()

for process in processes:
    process.join()
You can use itertools.tee to return two copies of an iterator:
f = open("/tmp/largefile")
for i in range(5):
f,f_copy = itertools.tee(f)
proc = multiprocessing.Process(target=demo, args=(f_copy, ))
processes.append(proc)
I guess... this probably isn't really what you want, but it's what your question asks for.
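If the goal is really to spread the lines of one file across worker processes, a common alternative is to let the parent process read the file and hand lines out to a multiprocessing.Pool. A minimal sketch (the worker body is illustrative):

import multiprocessing

def worker(line):
    return line.strip()   # stand-in for the real per-line processing

if __name__ == '__main__':
    with open('/tmp/largefile') as f, multiprocessing.Pool(4) as pool:
        # imap streams lines to the 4 workers in chunks, yielding results in order
        for result in pool.imap(worker, f, chunksize=1000):
            pass          # handle each processed line here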
I was using kqueue in Python 2.7 to build a file monitor.
Initially, it kept outputting 0x4000 in flags and 0x1 in data, which turns out to mean an error occurred. Then I found an example given by LaclefYoshi, and it works!
My code, which gives errors:
import select
from time import sleep

fd = open('test').fileno()
kq = select.kqueue()
flags = select.KQ_EV_ADD | select.KQ_EV_ENABLE | select.KQ_EV_CLEAR
fflags = select.KQ_NOTE_DELETE | select.KQ_NOTE_WRITE | select.KQ_NOTE_EXTEND \
    | select.KQ_NOTE_RENAME | select.KQ_NOTE_REVOKE | select.KQ_NOTE_ATTRIB \
    | select.KQ_NOTE_LINK
ev = select.kevent(fd, filter=select.KQ_FILTER_VNODE,
                   flags=flags, fflags=fflags)
evl = kq.control([ev], 1)
print evl
while 1:
    revents = kq.control([], 1, None)
    print revents
    sleep(1)
His version gives the file object directly to the kevent function:
fd = open('test')
ev = select.kevent(fd, filter=select.KQ_FILTER_VNODE,
                   flags=flags, fflags=fflags)
Another version calls the fileno method on the file object in kevent:
fd = open('test')
ev = select.kevent(fd.fileno(), filter=select.KQ_FILTER_VNODE,
                   flags=flags, fflags=fflags)
But now I'm really confused about why the first version doesn't work while the third one works well. These two should be the same thing, right?
The other question I have is: what exactly is a file object here in Python? I've seen that ident is actually an integer here, which should be the file descriptor rather than a file object. How does that work here!?
Thanks!
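A likely explanation, offered as an educated guess: in the first version, open('test').fileno() keeps no reference to the file object, so CPython garbage-collects it immediately and closes the underlying descriptor; the kevent is then registered against an already-closed fd, which kqueue reports with KQ_EV_ERROR (0x4000) in flags. The other two versions keep the file object alive in a variable, so the descriptor stays open while the monitor runs. As for ident: select.kevent accepts either an integer descriptor or any object with a fileno() method, and in both cases it stores the integer descriptor in ident, which is why you see an int there.

f = open('test')  # keep a reference: the descriptor stays open while f is alive
ev = select.kevent(f.fileno(), filter=select.KQ_FILTER_VNODE,
                   flags=flags, fflags=fflags)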