Iterate over sections in a config file - python
I recently got introduced to the library configparser. I would like to be able to check if each section has at least one Boolean value set to 1. For example:
[Horizontal_Random_Readout_Size]
Small_Readout = 0
Medium_Readout = 0
Large_Readout = 0
The above would cause an error.
[Vertical_Random_Readout_Size]
Small_Readout = 0
Medium_Readout = 0
Large_Readout = 1
The above would pass. Below is some pseudo code of what I had in mind:
exit_test = False
for sections in config_file:
    section_check = False
    for name in parser.options(section):
        if parser.getboolean(section, name):
            section_check = True
    if not section_check:
        print "ERROR: Please specify a setting in {} section of the config file".format(section)
        exit_test = True
if exit_test:
    exit(1)
Questions:
1) How do I perform the first for loop and iterate over the sections of the config file?
2) Is this a good way of doing this or is there a better way? (If there isn't please answer question one.)
Using configparser, you first have to parse your config file.
After parsing, you can get all of the section names with the .sections() method.
You can then iterate over each section and use .items() to get all key/value pairs of that section.
for each_section in conf.sections():
    for (each_key, each_val) in conf.items(each_section):
        print(each_key)
        print(each_val)
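Putting the iteration together with getboolean(), a minimal sketch of the check the question asks for might look like this (the file name settings.ini is just a placeholder):

import configparser
import sys

conf = configparser.ConfigParser()
conf.read("settings.ini")  # placeholder file name

exit_test = False
for section in conf.sections():
    # the section passes if at least one of its options evaluates to True
    if not any(conf.getboolean(section, name) for name in conf.options(section)):
        print("ERROR: Please specify a setting in {} section of the config file".format(section))
        exit_test = True

if exit_test:
    sys.exit(1)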
The best bet is to load ALL the lines of the file into some kind of array (I'm going to ignore the issue of how much memory that might use and whether to page through it instead).
From there, you know that heading lines follow a certain format, so you can iterate over your array and build a list of objects, each holding the heading name, the line index (a zero-based reference into the master array), and whether that heading has a value set.
You can then iterate over these objects, cross-referencing the master array: for each heading, check the next "n" lines (in the master array) between the current heading and the next.
At that point you are down to the individual config values for that heading, so you can easily parse each line and detect a value, breaking out of the loop as soon as you find one; for more robustness, run an exclusivity check on that heading's values to ensure ONLY one value is set.
Using this approach you have access to all the lines, with one object per heading, so your code remains flexible and functional. Optimise afterwards. A rough sketch is shown below.
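A rough, illustrative sketch of that approach (the file name and the "= 1" convention are assumptions taken from the question, not a definitive implementation):

with open("settings.ini") as f:  # assumed file name
    lines = [line.strip() for line in f]

# build one (line_index, heading_name) pair per [heading] line in the master list
headings = [(i, line.strip("[]")) for i, line in enumerate(lines)
            if line.startswith("[") and line.endswith("]")]

# cross-reference each heading against the master array up to the next heading
for (start, name), (end, _) in zip(headings, headings[1:] + [(len(lines), None)]):
    block = lines[start + 1:end]
    values = [line.split("=")[1].strip() for line in block if "=" in line]
    if not any(v == "1" for v in values):
        print("No value set under [{}]".format(name))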
Hope that makes sense and is helpful.
To complete the answer by #Nilesh and the comment from #PashMic, here is an example that really iterates over ALL sections, including DEFAULT:
all_section_names: list[str] = conf.sections()
all_section_names.append("DEFAULT")
for section_name in all_section_names:
    for key, value in conf.items(section_name):
        ...
Note that even if there is no real "DEFAULT" section, this will still work; there will simply be no items retrieved by conf.items("DEFAULT").
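As a small aside, if only the defaults themselves are needed, configparser also exposes them directly through conf.defaults(), which returns the DEFAULT entries as a dictionary:

for key, value in conf.defaults().items():
    ...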
Related
Biopython: return chain but with the new chain ID already
I have a script which can extract selected chains from a structure into a new file. I do it for 400+ structures. Because the chainIDs of my selected chains can differ between structures, I parse .yaml files where I store the corresponding chainIDs. This script is working and everything is fine, but the next step is to rename the chains so they are the same in each file. I used edited code from here: this. Basically it worked as well, but the problem is that e.g. my new chainID of chain1 is the same as the original chainID of chain2, and the error occurs: Cannot change id from U to T. The id T is already used for a sibling of this entity. Actually, this happened for many variables and it'd be too complicated to do it manually. My idea is that this could be solved by renaming the chainIDs right at the moment when I'm extracting them. Is it possible to use Biopython like that? Couldn't find anything similar to my problem. Simplified code for one structure (the original has one more loop for iterating over the 400+ structures and their .yaml files):

with open(yaml_file, "r") as file:
    proteins = yaml.load(file, Loader=yaml.FullLoader)

chain1 = proteins["1_chain"].split(",")[0]  # just for illustration that I have to parse the original chainIDs
chain2 = proteins["2_chain"].split(",")[0]

structure = parser.get_structure("xxx", "xxx.cif")[0]

for model in structure:
    for chain in model:
        class ChainSelect(Select):
            def accept_chain(self, chain):
                if chain.get_id() == '{}'.format(chain1):
                    return True  # I thought that somewhere in this part a command renaming the chain to "A" could be added
                if chain.get_id() == '{}'.format(chain2):
                    return True  # here I'd rename it "B"
                else:
                    return False

io = MMCIFIO()
io.set_structure(structure)
io.save("new.cif", ChainSelect())

Is it possible to somehow expand the "return" command in a way that it would return the chain with the desired chainID (e.g. A)? Note that the original chain ID can differ between structures (thus I have to use .format(chainX)). I don't have any other idea how I'd get rid of the error that my desired chainID is already used by a sibling entity.
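A hedged sketch of one possible workaround for the sibling-ID collision (not taken from the original thread): rename the selected chains in two passes, going through temporary placeholder IDs first, so a new ID never clashes with a sibling's current ID. It assumes structure is a full Bio.PDB Structure object; chain1 and chain2 are the names from the question, everything else is illustrative.

# sketch only: two-pass rename to avoid "id already used for a sibling" errors
desired = {chain1: "A", chain2: "B"}  # original chainID -> desired chainID

for model in structure:
    for chain in list(model):
        if chain.id in desired:
            chain.id = "tmp_" + chain.id  # pass 1: move to a placeholder ID

for model in structure:
    for chain in list(model):
        if chain.id.startswith("tmp_"):
            chain.id = desired[chain.id[len("tmp_"):]]  # pass 2: assign the final ID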
Python- Insert new values into 'nested' list?
What I'm trying to do isn't a huge problem in PHP, but I can't find much assistance for Python. In simple terms, I have a list which produces output as follows:

{"marketId":"1.130856098","totalAvailable":null,"isMarketDataDelayed":null,"lastMatchTime":null,"betDelay":0,"version":2576584033,"complete":true,"runnersVoidable":false,"totalMatched":null,"status":"OPEN","bspReconciled":false,"crossMatching":false,"inplay":false,"numberOfWinners":1,"numberOfRunners":10,"numberOfActiveRunners":8,"runners":[{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":2.8,"size":34.16},{"price":2.76,"size":200},{"price":2.5,"size":237.85}],"availableToLay":[{"price":2.94,"size":6.03},{"price":2.96,"size":10.82},{"price":3,"size":33.45}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832765}...

All I want to do is add an extra field, containing the 'runner name' from the data set below, into each of the 'runners' sub-lists of the initial data set, based on selection_id=selectionId. So initially I iterate through the full dataset, and then create a separate list to get the runner name from the runner id (I should point out that runnerId===selectionId===selection_id, no idea why multiple names are used). This works fine and the code is shown below:

for market_book in market_books:
    market_catalogues = trading.betting.list_market_catalogue(
        market_projection=["RUNNER_DESCRIPTION", "RUNNER_METADATA", "COMPETITION", "EVENT", "EVENT_TYPE", "MARKET_DESCRIPTION", "MARKET_START_TIME"],
        filter=betfairlightweight.filters.market_filter(
            market_ids=[market_book.market_id],
        ),
        max_results=100)

    data = []
    for market_catalogue in market_catalogues:
        for runner in market_catalogue.runners:
            data.append(
                (runner.selection_id, runner.runner_name)
            )

So as you can see I have the data in data[], but what I need to do is add it to the initial data set, based on the selection_id. I'm more comfortable with PHP or JavaScript, so apologies if this seems a bit simplistic, but the code snippets I've found online only seem to assist with very simple Python lists and nothing 'nested' (to me the structure seems similar to a nested array).
As per the request below, here is the full list: {"marketId":"1.130856098","totalAvailable":null,"isMarketDataDelayed":null,"lastMatchTime":null,"betDelay":0,"version":2576584033,"complete":true,"runnersVoidable":false,"totalMatched":null,"status":"OPEN","bspReconciled":false,"crossMatching":false,"inplay":false,"numberOfWinners":1,"numberOfRunners":10,"numberOfActiveRunners":8,"runners":[{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":2.8,"size":34.16},{"price":2.76,"size":200},{"price":2.5,"size":237.85}],"availableToLay":[{"price":2.94,"size":6.03},{"price":2.96,"size":10.82},{"price":3,"size":33.45}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832765},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":20,"size":3},{"price":19.5,"size":26.36},{"price":19,"size":2}],"availableToLay":[{"price":21,"size":13},{"price":22,"size":2},{"price":23,"size":2}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832767},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":11,"size":9.75},{"price":10.5,"size":3},{"price":10,"size":28.18}],"availableToLay":[{"price":11.5,"size":12},{"price":13.5,"size":2},{"price":14,"size":7.75}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832766},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":48,"size":2},{"price":46,"size":5},{"price":42,"size":5}],"availableToLay":[{"price":60,"size":7},{"price":70,"size":5},{"price":75,"size":10}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832769},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":18.5,"size":28.94},{"price":18,"size":5},{"price":17.5,"size":3}],"availableToLay":[{"price":21,"size":20},{"price":23,"size":2},{"price":24,"size":2}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832768},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":4.3,"size":9},{"price":4.2,"size":257.98},{"price":4.1,"size":51.1}],"availableToLay":[{"price":4.4,"size":20.97},{"price":4.5,"size":30},{"price":4.6,"size":16}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832771},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":24,"size":6.75},{"price":23,"size":2},{"price":22,"size":2}],"availableToLay":[{"price":26,"size":2},{"price":27,"size":2},{"price":28,"size":2}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832770},{"status":"ACTIVE"
,"ex":{"tradedVolume":[],"availableToBack":[{"price":5.7,"size":149.33},{"price":5.5,"size":29.41},{"price":5.4,"size":5}],"availableToLay":[{"price":6,"size":85},{"price":6.6,"size":5},{"price":6.8,"size":5}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":10064909}],"publishTime":1551612312125,"priceLadderDefinition":{"type":"CLASSIC"},"keyLineDescription":null,"marketDefinition":{"bspMarket":false,"turnInPlayEnabled":false,"persistenceEnabled":false,"marketBaseRate":5,"eventId":"28180290","eventTypeId":"2378961","numberOfWinners":1,"bettingType":"ODDS","marketType":"NONSPORT","marketTime":"2019-03-29T00:00:00.000Z","suspendTime":"2019-03-29T00:00:00.000Z","bspReconciled":false,"complete":true,"inPlay":false,"crossMatching":false,"runnersVoidable":false,"numberOfActiveRunners":8,"betDelay":0,"status":"OPEN","runners":[{"status":"ACTIVE","sortPriority":1,"id":10064909},{"status":"ACTIVE","sortPriority":2,"id":12832765},{"status":"ACTIVE","sortPriority":3,"id":12832766},{"status":"ACTIVE","sortPriority":4,"id":12832767},{"status":"ACTIVE","sortPriority":5,"id":12832768},{"status":"ACTIVE","sortPriority":6,"id":12832770},{"status":"ACTIVE","sortPriority":7,"id":12832769},{"status":"ACTIVE","sortPriority":8,"id":12832771},{"status":"LOSER","sortPriority":9,"id":10317013},{"status":"LOSER","sortPriority":10,"id":10317010}],"regulators":["MR_INT"],"countryCode":"GB","discountAllowed":true,"timezone":"Europe\/London","openDate":"2019-03-29T00:00:00.000Z","version":2576584033,"priceLadderDefinition":{"type":"CLASSIC"}}}
I think I understand what you are trying to do now. First, hold your data as a Python object (you gave us a JSON object):

import json

my_data = json.loads(my_json_string)

for item in my_data['runners']:
    item['selectionId'] = [item['selectionId'], my_name_here]

The thing is that my_data['runners'][i]['selectionId'] is a string; unless you want to concat the name and the id together, you should turn it into a list or even a dictionary. Each item is a dictionary, so you can always also add new keys to it:

item['new_key'] = my_value
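A minimal, self-contained sketch of that idea, joining the two data sets on the selection id and adding the name under a new key (the sample data and the key name runnerName are made up for illustration):

# names collected from the market catalogue: selection_id -> runner_name
names = {12832765: "Runner One", 10064909: "Runner Two"}  # made-up names

market_book = {
    "marketId": "1.130856098",
    "runners": [
        {"selectionId": 12832765, "handicap": 0},
        {"selectionId": 10064909, "handicap": 0},
    ],
}

for runner in market_book["runners"]:
    # add a new key instead of overwriting selectionId
    runner["runnerName"] = names.get(runner["selectionId"], "unknown")

print(market_book["runners"][0]["runnerName"])  # Runner One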
So, essentially this works... with one exception... I can see from the print(...) in the loop that the attribute is updated, however what I can't seem to do is see this update outside the loop.

mkt_runners = []
for market_catalogue in market_catalogues:
    for r in market_catalogue.runners:
        mkt_runners.append((r.selection_id, r.runner_name))

for market_book in market_books:
    for runner in market_book.runners:
        for x in mkt_runners:
            if runner.selection_id in x:
                setattr(runner, 'x', x[1])
                print(market_book.market_id, runner.x, runner.selection_id)
print(market_book.json())

So the print(market_book.market_id...) displays as expected, but when I print the whole list it shows the un-updated version. I can't seem to find an obvious solution, which is odd, as it seems like a really simple thing (I tried messing around with indents in case that was the problem, but it doesn't seem to be; it's like it's not refreshing the market_book list after the update of the runners sub-list)!
Extracting information from unconventional text files? (Python)
I am trying to extract some information from a set of files sent to me by a collaborator. Each file contains some Python code which names a sequence of lists. They look something like this:

#PHASE = 0
x = np.array(1,2,...)
y = np.array(3,4,...)
z = np.array(5,6,...)

#PHASE = 30
x = np.array(1,4,...)
y = np.array(2,5,...)
z = np.array(3,6,...)

#PHASE = 40
...

And so on. There are 12 files in total, each with 7 phase sets. My goal is to convert each phase into its own file which can then be read by ascii.read() as a Table object for manipulation in a different section of code. My current method is extremely inefficient, both in terms of resources and the time/energy required to assemble it. It goes something like this: start with a function

def makeTable(a, b, c):
    output = Table()
    output['x'] = a
    output['y'] = b
    output['z'] = c
    return output

Then for each phase, I have manually copy-pasted the relevant part of the text file into a cell and appended a line of code

fileName_phase = makeTable(a, b, c)

Repeat ad nauseam. It would take 84 iterations of this to process all the data, and naturally each would need some minor adjustments to match the specific fileName and phase. Finally, at the end of my code, I have a few lines of code set up to ascii.write each of the tables into .dat files for later manipulation. This entire method is extremely exhausting to set up. If it's the only way to handle the data, I'll do it. I'm hoping I can find a quicker way to set it up, however. Is there one you can suggest?
If efficiency and code reuse instead of copying is the goal, I think that classes might provide a good way. I'm going to sleep now, but I'll edit later. Here are my thoughts: create a class called FileWithArrays and use a parser to read the lines and put them inside the object you create from the class (a rough sketch follows below). Once that's done, you can create a method to transform the object into a table. P.S. A good idea for the parser is to store all the lines in a list and parse them one by one, using list.pop() to auto-shrink the list. Hope it helps; tomorrow I'll look more into it if this doesn't help a lot. Try to rewrite/reformat the question if I misunderstood anything; it's not very easy to read.
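A very rough sketch of that suggestion (the class name FileWithArrays comes from the answer; the parsing logic below is an assumption based on the #PHASE layout shown in the question, and the table conversion is left as a stub):

class FileWithArrays:
    def __init__(self, path):
        self.phases = {}  # phase label -> list of raw code lines
        self._parse(path)

    def _parse(self, path):
        current = None
        with open(path) as f:
            for raw in f:
                line = raw.strip()
                if line.startswith("#PHASE"):
                    current = line.split("=")[1].strip()  # e.g. "0", "30", "40"
                    self.phases[current] = []
                elif line and current is not None:
                    self.phases[current].append(line)

    def to_tables(self):
        # stub: turn each phase's lines into a Table, e.g. via the makeTable()
        # function from the question, once the x/y/z arrays are extracted
        raise NotImplementedError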
I will suggest a way which will be scorned by many but will get your work done, so apologies to everyone. The prerequisite for this method is that you absolutely trust the correctness of the input files, which I guess you do (after all, he is your collaborator). The key point here is that the text in the file is code, which means it can be executed. So you can do something like this:

import re
import numpy as np  # this is for the actual code in the files. You might have to install the numpy library for this to work.

file = open("xyz.txt")
content = file.read()

Now that you have all the content, you have to separate it by phase. For this we will use the re.split function:

phase_data = re.split("#PHASE = .*\n", content)

Now we have the content of each phase in an array. Now comes the part of executing it:

for phase in phase_data:
    if len(phase.strip()) == 0:
        continue
    exec(phase)
    table = makeTable(x, y, z)  # the x, y and z are defined by the exec
    # do whatever you want with the table

I will reiterate that you have to absolutely trust the contents of the file, since you are executing it as code. But your work seems like a scripting one and I believe this will get your work done. PS: The other "safer" alternative to exec is to have a sandboxing library which takes the string and executes it without affecting the parent scope.
To avoid the safety issue of using exec as suggested by #Ajay Brahmakshatriya, but keeping his first processing step, you can create your own minimal 'phase parser', something like:

VARS = 'xyz'

def makeTable(phase):
    assert len(phase) >= 3
    output = Table()
    for i in range(3):
        line = [s.strip() for s in phase[i].split('=')]
        assert len(line) == 2
        var, arr = line
        assert var == VARS[i]
        assert arr[:10] == 'np.array([' and arr[-2:] == '])'
        output[var] = np.fromstring(arr[10:-2], sep=',')
    return output

and then call

table = makeTable(phase)

instead of

exec(phase)
table = makeTable(x, y, z)

You could also skip all these assert statements without compromising safety; if the file is corrupted or not formatted as expected, the error that is thrown might just be harder to understand...
Parse a file in Python to find first a string, then parse the following strings until it finds another string
I am trying to scroll through a result file that one of our processes prints. The objective is to look through various blocks and find a specific parameter. I tried to tackle this but can't find an efficient way that would avoid parsing the file multiple times. This is an example of the output file that I read:

ID:13123
Compound:xyz
... various parameters
RhPhase:abc

ID:543
Compound:lbm
... various parameters

ID:232355
Compound:dfs
... various parameters
RhPhase:cvb

I am looking for a specific ID that has a RhPhase in it, but since the file contains many more entries, I just want that specific ID. It may or may not have an RhPhase in it; if it has one, I get the value. The only way that I figured out is to actually go through the whole file (which may be hundreds of blocks, to give an idea of the size), and make a list for each ID that has a RhPhase, then in a second pass scroll through the dictionary, retrieving the value for a specific ID. This feels so inefficient; I tried to do something different, but got stuck at how you mark the lines while you scroll through them, so that I can tell Python to read each line -> when it finds the ID that I want, continue to read -> if it finds RhPhase, get the value, otherwise stop at the next ID. I am stuck here:

datafile = open("datafile.txt", "r")
for items in datafile.readline():
    if "ID:543" in items:
        [read more lines]
        [if "RhPhase" in lines:]
        [    rhphase = lines   ]
        [elif "ID:" in lines:  ]
        [    rhphase = None    ]
        [    break             ]

Once I find the ID, I don't know how to continue to either look for the RhPhase string or find the first ID: string and stop everything (because this means that the ID does not have an associated RhPhase). This would pass through the file once and just check for the specific ID, instead of parsing the whole thing once and then doing a second pass. Is it possible to do so, or am I stuck with the double parsing?
Usually, you solve these kinds of things with a simple state machine: you read the lines until you find your id; then you put your reader into a special state that checks for the parameter you want to extract. In your case, you only have two states: ID not found, and ID found, so a simple boolean is enough:

foundId = False
with open('datafile.txt', 'r') as datafile:
    for line in datafile:
        if foundId:
            if line.startswith('RhPhase'):
                print('Found RhPhase for ID 543:')
                print(line)
                # end reading the file
                break
            elif line.startswith('ID:'):
                print('Error: Found another ID without finding RhPhase first')
                break
        # if we haven't found the ID yet, keep looking for it
        elif line.startswith('ID:543'):
            foundId = True
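If only the value after the colon is wanted rather than the whole line, the matching branch of the sketch above can split the line first, for example:

if line.startswith('RhPhase'):
    rhphase = line.split(':', 1)[1].strip()  # 'abc' for a line like 'RhPhase:abc'
    print('Found RhPhase for ID 543:', rhphase)
    break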
Search a single column for a particular value in a CSV file and return an entire row
Issue

The code does not correctly identify the input (item). It simply dumps to my failure message even if such a value exists in the CSV file. Can anyone help me determine what I am doing wrong?

Background

I am working on a small program that asks for user input (function not given here), searches a specific column in a CSV file (Item) and returns the entire row. The CSV data format is shown below. I have shortened the data from the actual amount (49 field names, 18000+ rows).

Code

import csv
from collections import namedtuple
from contextlib import closing

def search():
    item = 1000001
    raw_data = 'active_sanitized.csv'
    failure = 'No matching item could be found with that item code. Please try again.'
    check = False

    with closing(open(raw_data, newline='')) as open_data:
        read_data = csv.DictReader(open_data, delimiter=';')
        item_data = namedtuple('item_data', read_data.fieldnames)
        while check == False:
            for row in map(item_data._make, read_data):
                if row.Item == item:
                    return row
                else:
                    return failure

CSV structure

active_sanitized.csv

Item;Name;Cost;Qty;Price;Description
1000001;Name here:1;1001;1;11;Item description here:1
1000002;Name here:2;1002;2;22;Item description here:2
1000003;Name here:3;1003;3;33;Item description here:3
1000004;Name here:4;1004;4;44;Item description here:4
1000005;Name here:5;1005;5;55;Item description here:5
1000006;Name here:6;1006;6;66;Item description here:6
1000007;Name here:7;1007;7;77;Item description here:7
1000008;Name here:8;1008;8;88;Item description here:8
1000009;Name here:9;1009;9;99;Item description here:9

Notes

My experience with Python is relatively little, but I thought this would be a good problem to start with in order to learn more. I determined the methods to open (and wrap in a close function) the CSV file, read the data via DictReader (to get the field names), and then create a named tuple to be able to quickly select the desired columns for the output (Item, Cost, Price, Name). Column order is important, hence the use of DictReader and namedtuple. While there is the possibility of hard-coding each of the field names, I felt that if the program can read them on file open, it would be much more helpful when working on similar files that have the same column names but different column organization.

Research

CSV header and named tuple: What is the pythonic way to read CSV file data as rows of namedtuples?
Converting CSV data to tuple: How to split a CSV row so row[0] is the name and any remaining items are a tuple?

There were additional links of research, but I cannot post more than two.
You have three problems with this:

1) You return on the first failure, so it will never get past the first line.
2) You are reading strings from the file, and comparing to an int.
3) _make iterates over the dictionary keys, not the values, producing the wrong result (item_data(Item='Name', Name='Price', Cost='Qty', Qty='Item', Price='Cost', Description='Description')).

for row in (item_data(**data) for data in read_data):
    if row.Item == str(item):
        return row
return failure

This fixes the issues at hand - we check against a string, and we only return if none of the items matched (although you might want to begin converting the strings to ints in the data rather than this hackish fix for the string/int issue). I have also changed the way you are looping - using a generator expression makes for a more natural syntax, using the normal construction syntax for named attributes from a dict. This is cleaner and more readable than using _make and map(). It also fixes problem 3.
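Putting those fixes together, a revised version of the question's search() function might look roughly like this (same file name and delimiter as the original; the string comparison is kept rather than converting the CSV values to ints):

import csv
from collections import namedtuple

def search():
    item = '1000001'  # compared as a string, matching what csv returns
    raw_data = 'active_sanitized.csv'
    failure = 'No matching item could be found with that item code. Please try again.'

    with open(raw_data, newline='') as open_data:
        read_data = csv.DictReader(open_data, delimiter=';')
        item_data = namedtuple('item_data', read_data.fieldnames)
        for row in (item_data(**data) for data in read_data):
            if row.Item == item:
                return row
    return failure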