How to read and send CSV column data to PyTest test cases - python

We have a utility that calls APIs and saves their responses to a CSV file. That CSV (resp.csv) stores the API request in column A, the payload in column B, and the headers in column C. The body of each response is stored in column D, and the response code in column E.
I want to pass each response to a set of PyTest test cases, each of which will have assertions specific to that response.
I can already check the status code via a function call that returns the response code before writing to the CSV. But the requirement is to read the responses back from the CSV and pass them to the assertions/test cases:
@pytest.mark.parametrize('test_input', check_response)
def test_APIResponse(test_input):
    print(check_response)
    assert test_input == 200, "Test pass"
How can I read the response body stored in the CSV (column D) and assert on it using PyTest test cases?
Can someone guide me with this?
Thanks

Check this out, I think it might help: the pytest parametrize docs. For your specific example you could do something like:
import pytest

def load_cases():
    # read_csv
    class Case:
        def __init__(self, number):
            self.i = number

        def __repr__(self):
            return '{number}'.format(number=self.i)

    return [Case(i) for i in range(5)]

def parametrize(name, values):
    # helper for readable test descriptions
    return pytest.mark.parametrize(name, values, ids=map(repr, values))

@parametrize("case", load_cases())
def test_responses(case):
    print(case.i)
You are creating a Case class and storing everything you need inside it, then accessing its properties from the test. You can also play around with indirect parametrization and fixtures, but don't overcomplicate your code.
To read a specific column, use something like pandas or just split each line yourself; see the sketch below.
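For instance, here is a minimal sketch of that idea applied to the original question. It assumes resp.csv has a header row with column names such as response_body and response_code (these names are hypothetical; adjust them to your real headers):

import csv
import pytest

def load_cases(path="resp.csv"):
    # Assumes the first row of resp.csv holds column names such as
    # "response_body" and "response_code" -- adjust to the real headers.
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

@pytest.mark.parametrize("case", load_cases())
def test_api_response(case):
    assert int(case["response_code"]) == 200
    assert case["response_body"] != ""

Each row of the CSV becomes one test case, and the assertions can inspect whichever columns they need.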

I wrote a package called Parametrize From File that can be used to do this. I gave a detailed example of how to load test parameters from an XLSX file in another Stack Overflow post, but I'll briefly reiterate the important points here.
The only complication is that Parametrize From File expects to be able to load test cases as a dictionary of lists of dictionaries (see the docs for more info). This layout makes sense for YAML/TOML/NestedText files, but not for XLSX/CSV files. So we need to provide a function that loads the XLSX/CSV file in question and converts it to the expected format. pandas makes this pretty easy to do if you're willing to add the dependency; otherwise it probably wouldn't be too hard to write something yourself.
Edit: Here's a more concrete example. To begin, here's what the CSV file might look like:
request_,payload,header,response,code
http://localhost:8080/c/u/t,{"ci":""},{},{"ss":""},200
http://localhost:8080/c/u/t?Id=x,{"ci":""},{},{"res":""},200
A few things to note:
The first row gives a name to each column. The code I've written relies on this, and uses those same names as the arguments to the parametrized test function. If your file doesn't have these headers, you would need to hard-code names for each column.
The name "request" is reserved by pytest, so we have to use "request_" here.
Here's what the corresponding test script might look like:
import parametrize_from_file as pff
from csv import DictReader
from collections import defaultdict

def load_csv(path):
    with open(path) as f:
        cases = list(DictReader(f))
    return defaultdict(lambda: cases)

pff.add_loader('.csv', load_csv)

@pff.parametrize
def test_api_request_response(request_, payload, header, response, code):
    assert request_ == ...
    assert payload == ...
    assert header == ...
    assert response == ...
    assert code == ...
A few things to note:
This assumes that the CSV file has the same base name as the test script. If this isn't the case, it's easy to specify a different path.
The load function is expected to return a dictionary mapping test names (e.g. test_api_request_response) to lists of test cases, where each test case is a dictionary mapping parameter names (e.g. request_) to parameter values (e.g. http://localhost:8080). In this case the file doesn't specify any test names, so we cheat and use a defaultdict to return the same test cases for any test name.

Related

Biopython: return chain but with the new chain ID already

I have a script which can extract selected chains from a structure into a new file. I do it for 400+ structures. Because the chain IDs of my selected chains can differ between structures, I parse .yaml files where I store the corresponding chain IDs. This script is working, everything is fine, but the next step is to rename the chains so they are the same in each file. I used edited code from here. Basically it worked as well, however the problem is that e.g. my new chain ID for chain1 is the same as the original chain ID of chain2, and this error occurs: Cannot change id from U to T. The id T is already used for a sibling of this entity. Actually, this happened for many variables and it'd be too complicated to fix manually.
My idea is that this could be solved by renaming the chain IDs at the moment I extract them. Is it possible to do that with Biopython? I couldn't find anything similar to my problem.
Simplified code for one structure (in the original one is one more loop for iterating over 400+ structures and its .yaml files):
with open(yaml_file, "r") as file:
    proteins = yaml.load(file, Loader=yaml.FullLoader)

chain1 = proteins["1_chain"].split(",")[0]  # just for illustration that I have to parse the original chain IDs
chain2 = proteins["2_chain"].split(",")[0]

structure = parser.get_structure("xxx", "xxx.cif")[0]

for model in structure:
    for chain in model:
        class ChainSelect(Select):
            def accept_chain(self, chain):
                if chain.get_id() == '{}'.format(chain1):
                    return True  # I thought a command renaming the chain to "A" could be added somewhere here
                if chain.get_id() == '{}'.format(chain2):
                    return True  # here I'd rename it to "B"
                else:
                    return False

io = MMCIFIO()
io.set_structure(structure)
io.save("new.cif", ChainSelect())
Is it possible to somehow extend the return statement so that it returns the chain with the desired chain ID (e.g. A)? Note that the original chain ID can differ between structures (thus I have to use .format(chainX)).
I don't have any other idea how to get rid of the error that my desired chain ID is already used by a sibling entity.
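Not a definitive answer, but here is a minimal, untested sketch of one possible approach with Biopython: rename the chains on the parsed structure before saving, going through temporary IDs so that a new ID never collides with an existing sibling's ID (chain1 and chain2 are assumed to hold the original IDs parsed from the YAML, as in the question, and no other chain is assumed to already use "A" or "B"):

from Bio.PDB import MMCIFParser, MMCIFIO, Select

rename_map = {chain1: "A", chain2: "B"}  # original chain ID -> desired chain ID

parser = MMCIFParser(QUIET=True)
structure = parser.get_structure("xxx", "xxx.cif")
model = structure[0]

# Two passes: first move the selected chains onto temporary IDs,
# then assign the final IDs, so no assignment collides with a sibling.
for chain in list(model):
    if chain.id in rename_map:
        chain.id = "tmp-" + chain.id
for chain in list(model):
    if chain.id.startswith("tmp-"):
        chain.id = rename_map[chain.id[len("tmp-"):]]

class ChainSelect(Select):
    def accept_chain(self, chain):
        # Keep only the renamed chains.
        return chain.id in rename_map.values()

io = MMCIFIO()
io.set_structure(structure)
io.save("new.cif", ChainSelect())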

How to extract only wanted property from JSON object

When I run the code:
import requests
import json

def get_fact():
    catFact = requests.get("https://catfact.ninja/fact?max_length=140")
    json_data = json.loads(catFact.text)
    return json_data

print(get_fact())
The output is like
{'fact': "Cats are the world's most popular pets, outnumbering dogs by as many as three to one", 'length': 84}
However I just want the fact.
How do I get rid of the 'fact:' at the front and 'length:' at the back?
What you want is to access the key in the Python dict you made with the json.loads call. We actually don't need the json library, because requests can read and deserialize JSON itself.
This code also checks whether the response was OK and fails with an informative error message. It follows PEP 20 – The Zen of Python.
import requests

def get_fact():
    # Get the facts dictionary in JSON-serialized form.
    cat_fact_response = requests.get("https://catfact.ninja/fact?max_length=140")
    # Let the response raise an exception if something bad happened to the cat facts server connection.
    cat_fact_response.raise_for_status()
    # Deserialize the JSON (make a Python dict from the text we got). requests can do that on its own:
    cat_fact_dict = cat_fact_response.json()
    # Access the fact from the dictionary.
    return cat_fact_dict['fact']

print(get_fact())
When called, you get the following output, as wanted:
# python3 script.py
The cat's tail is used to maintain balance.
Short answer:
You need to use either get_fact()['fact'] or get_fact().get('fact'). The former will throw an exception if fact doesn't exist, whereas the latter will return None.
Why:
In your code sample you fetch some JSON data and then print out the entire JSON object. When you parse JSON, the output is a key/value map called a dictionary (or map or object in other languages). The dictionary in this case contains two keys: fact and length. If you only want one of the values, then you need to tell Python that you want only a single value -- fact in this case.
Remember though: this wouldn't apply to every JSON object you read. Not every one is going to have a fact key.
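To illustrate the difference, here is a small sketch using the dictionary shape from the question (the 'color' key is made up to show the missing-key case):

data = {'fact': "Cats are the world's most popular pets...", 'length': 84}

print(data['fact'])       # works; raises KeyError if the key is missing
print(data.get('fact'))   # works; returns None if the key is missing
print(data.get('color'))  # None -- no exception, even though 'color' is absent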
What you are returning in get_fact is the complete JSON object, which you are then printing.
To get just its fact property (without the length), reference that key like:
return json_data["fact"]
Below is also a link to a tutorial on using JSON in Python:
w3schools: Python JSON
To extract the fact field from the response, use:
import requests
import json

def get_fact():
    catFact = requests.get("https://catfact.ninja/fact?max_length=140")
    json_data = json.loads(catFact.text)
    return json_data['fact']  # <- HERE

print(get_fact())
Output:
Cats have "nine lives" thanks to a flexible spine and powerful leg and back muscles
Note: you don't need the json module here; use the json() method of the Response instance returned by requests:
import requests

def get_fact():
    catFact = requests.get("https://catfact.ninja/fact?max_length=140").json()
    return catFact['fact']

print(get_fact())

Data Driven - How can I use selective args/params as part of parameterize in Python?

I am looking to set up a data-driven approach for my Python Selenium project (there is none currently). I'm planning to have the data file as xlsx.
I use pytest in my project. Hence, I explored ddt, @data, @unpack and pytest.mark.parametrize.
I am able to read my Excel values and pass them with @data/@unpack or parametrize.
However, in my case, each of my tests will use selected columns from my data file - not all of them.
E.g. my data list will be like this (user, password, item_number, item_name): [('user1', 'abc', 1, 'it1234'), ('user2', 'def', 2, 'it5678')]
My function1 (test 1) will need to parameterize user and password columns only.
My function2 (test 2) will need to parameterize item_number and item_name columns only.
What library or method can I use for my need? Basically, I need to be able to parameterize specific columns from my data file for my tests.
I wrote a library called Parametrize From File that can load test parameters from data files like this. But I'm not sure that I fully understand your example. If this was your data file...
user | password | item number | item name
A    | B        | C           | D
E    | F        | G           | H
...would these be the tests you want to run?
@pytest.mark.parametrize(
    'user, password',
    [('A', 'B'), ('E', 'F')],
)
def test_1(user, password):
    assert ...

@pytest.mark.parametrize(
    'item_number, item_name',
    [('C', 'D'), ('G', 'H')],
)
def test_2(item_number, item_name):
    assert ...
In other words, are the user/password columns completely unrelated to the item_number/item_name columns? If not, I'm misunderstanding your question. If they are, this isn't very scalable: it's easy to imagine writing 100 tests, each with 2+ parameters, for a total of >200 columns! This format also breaks the convention that every value in a row should be related in some way. I'd recommend either putting the parameters for each test into their own file/worksheet, or using a file format that better matches the list-of-tuples/list-of-dicts structure expected by pytest, e.g. YAML, TOML, NestedText, etc.
With all that said, here's how you would load parameters from an xlsx file using Parametrize From File:
import pandas as pd
from collections import defaultdict
import parametrize_from_file as pff

def load_xlsx(path):
    """
    Load an xlsx file and return the data structure expected by Parametrize
    From File, which is a map of test names to test parameters. In this case,
    the xlsx file doesn't specify any test names, so we use a `defaultdict` to
    make all the parameters available to any test.
    """
    df = pd.read_excel(path)
    return defaultdict(lambda: df)

def get_cols(cols):
    """
    Extract specific columns from the parameters loaded from the xlsx file.
    The parameters are loaded as a pandas DataFrame, and need to be converted
    into a list of dicts in order to be understood by Parametrize From File.
    """
    def _get_cols(df):
        return df[cols].to_dict('records')
    return _get_cols

# Use the function we defined above to load xlsx files.
pff.add_loader('.xlsx', load_xlsx)

@pff.parametrize(preprocess=get_cols(['user', 'password']))
def test_1(user, password):
    pass

@pff.parametrize(preprocess=get_cols(['item_number', 'item_name']))
def test_2(item_number, item_name):
    pass
Note that this code would be much simpler if the parameters were organized in one of the formats I recommended above.
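For comparison, here is a rough, untested sketch of what one of the recommended formats might look like. Assuming a TOML file that shares the test script's base name (the file name and values below are illustrative), each top-level key names a test and each [[...]] table is one test case, so no custom loader or preprocess function is needed:

# test_items.toml
[[test_1]]
user = "A"
password = "B"

[[test_1]]
user = "E"
password = "F"

[[test_2]]
item_number = "C"
item_name = "D"

The test script then reduces to:

import parametrize_from_file as pff

@pff.parametrize
def test_1(user, password):
    pass

@pff.parametrize
def test_2(item_number, item_name):
    pass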

Using Pyvaru for bulk data (CSV) validation

I am looking for a generic validator module to assist in sanitizing data and, importantly, giving back an error log stating why data has been rejected. I am working primarily with CSV files, each with an average of 40 columns and about 40,000 rows. A CSV file would have a mixture of Personal Identifying Information, Contact Information and details about the Account the person holds with us.
E.g.
First Name|Last Name|Other Name|Passport Number|Date of Birth|Phone Number|Email Address|Physical Address|Account Number|Invoice Number|Date Opened|Amount Due|Date Due|etc|etc
I need to validate basic stuff like data type, data length, options/choices, ranges, mandatory fields etc. There are also conditional validations, e.g. if an Amount Due value has been provided, then the Date Due must also be provided; if it hasn't, I raise an error.
Pyvaru provides some basic validation classes. Is it possible to implement both of these scenarios (basic validation plus conditional validation) with pyvaru? If yes, how would I structure the validations? Must I create objects, e.g. Identifier objects and Account objects, in order to use pyvaru?
Pyvaru validates Python objects (class instances and collections like dictionaries, lists and so on), so starting from a CSV I would convert each record into a dictionary using csv.DictReader.
So, given a CSV like:
policyID,statecode,county,eq_site_limit,hu_site_limit,fl_site_limit,fr_site_limit,tiv_2011,tiv_2012,eq_site_deductible,hu_site_deductible,fl_site_deductible,fr_site_deductible,point_latitude,point_longitude,line,construction,point_granularity
119736,FL,CLAY COUNTY,498960,498960,498960,498960,498960,792148.9,0,9979.2,0,0,30.102261,-81.711777,Residential,Masonry,1
448094,FL,CLAY COUNTY,1322376.3,1322376.3,1322376.3,1322376.3,1322376.3,1438163.57,0,0,0,0,30.063936,-81.707664,Residential,Masonry,3
206893,FL,CLAY COUNTY,190724.4,190724.4,190724.4,190724.4,190724.4,192476.78,0,0,0,0,30.089579,-81.700455,Residential,Wood,1
333743,FL,CLAY COUNTY,0,79520.76,0,0,79520.76,86854.48,0,0,0,0,30.063236,-81.707703,Residential,Wood,3
172534,FL,CLAY COUNTY,0,254281.5,0,254281.5,254281.5,246144.49,0,0,0,0,30.060614,-81.702675,Residential,Wood,1
The validation code would be something like:
import csv

from pyvaru import Validator
from pyvaru.rules import MaxLengthRule, MaxValueRule

class CsvRecordValidator(Validator):
    def get_rules(self) -> list:
        record: dict = self.data
        return [
            MaxLengthRule(apply_to=record.get('statecode'),
                          label='State Code',
                          max_length=2),
            # CSV values are read as strings, so convert before the numeric comparison.
            MaxValueRule(apply_to=float(record.get('eq_site_limit') or 0),
                         label='Site limit',
                         max_value=40000),
        ]

with open('sample.csv', 'r') as csv_file:
    reader = csv.DictReader(csv_file)
    row = 0
    for record in reader:
        row += 1
        validator = CsvRecordValidator(record)
        validation = validator.validate()
        if not validation.is_successful():
            print(f'Row {row} did not validate. Details: {validation.errors}')
The example above is just a simple demonstration of what you can do; specifically, it checks that the "statecode" column has a max length of 2 and that "eq_site_limit" has a max value of 40k.
You can implement your own rules by subclassing the abstract class ValidationRule and implementing the apply() method:
class ContainsHelloRule(ValidationRule):
    def apply(self) -> bool:
        return 'hello' in self.apply_to
It's also possible to negate rules using bitwise operators. For example, using the previous custom rule, which checks that a string must contain "hello", you can write:
~ ContainsHelloRule(apply_to=a_string, label="A test string")
and the rule will then be valid only if the string DOES NOT contain "hello".
It would also be possible to validate CSV records without the dictionary conversion, by validating each line with a PatternRule and a validation regex... but of course you won't know which column value is invalid.
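As for the conditional validation asked about in the question (Date Due required whenever Amount Due is present), one possible approach, sketched here under the assumption that a custom rule can receive the whole record dict as apply_to, is a rule like the following (the class name and field names are illustrative, not part of pyvaru):

from pyvaru import ValidationRule

class DateDueRequiredRule(ValidationRule):
    """If 'Amount Due' has a value, 'Date Due' must also be provided."""

    def apply(self) -> bool:
        record = self.apply_to  # the whole CSV record (a dict)
        if record.get('Amount Due'):
            return bool(record.get('Date Due'))
        return True

It could then be added to the list returned by get_rules(), e.g. DateDueRequiredRule(apply_to=record, label='Date Due').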

Python ddt unittest select specific fields from test data

I'm in the process of building data driven tests in Python using unittest and ddt.
Is there a way for me to be able to select specific fields from the test data instead of having to pass all the fields as separate parameters?
For example:
I have a csv file containing customers as below:
Title,FirstName,Surname,AddressA,AddressB,AddressC,City,State,PostCode
Mr,Bob,Gergory,44 road end,A Town,Somewhere,LOS ANGELES,CA,90004
Miss,Alice,Woodrow,99 some street,Elsewhere,A City,LOS ANGELES,CA,90003
From this I'd like to be able to select just the first name, city and state in the test.
I can do this as below; however, this seems messy and will get worse with wider files:
@data(get_test_data("customers.csv"))
@unpack
def test_create_new_customer(self, Title, FirstName, Surname, AddressA, AddressB, AddressC, City, State, PostCode):
    self.customer.enter_first_name(FirstName)
    self.customer.enter_city(City)
    self.customer.enter_state_code(State)
    self.customer.click_update()
I was hoping to be able to build a dictionary list out of the csv and then access it as below:
@data(get_test_data_as_dictionary("customers.csv"))
@unpack
def test_create_new_customer(self, test_data):
    self.customer.enter_first_name(test_data["FirstName"])
    self.customer.enter_city(test_data["City"])
    self.customer.enter_state_code(test_data["State"])
    self.customer.click_update()
However, it would seem that ddt is smarter than I thought: it breaks the data out of the dictionary and still expects all the parameters to be declared.
Is there a better way to achieve what I'm after?
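One workaround worth sketching: with ddt, the @unpack decorator is what expands each dictionary into separate keyword arguments, so dropping @unpack should make ddt pass each row dict through as a single argument. This is untested and assumes a helper that returns one dict per CSV row (get_test_data_as_dictionary here is hypothetical):

import csv
from ddt import data

def get_test_data_as_dictionary(path):
    # Hypothetical helper: one dict per CSV row, keyed by the header names.
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

@data(*get_test_data_as_dictionary("customers.csv"))
def test_create_new_customer(self, row):
    # Without @unpack, ddt passes each dict through as a single argument.
    self.customer.enter_first_name(row["FirstName"])
    self.customer.enter_city(row["City"])
    self.customer.enter_state_code(row["State"])
    self.customer.click_update()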
