How can I do the following in Python?
I have a command whose output looks like this:
Datexxxx
Clientxxx
Timexxx
Datexxxx
Client2xxx
Timexxx
Datexxxx
Client3xxx
Timexxx
I want to turn this into a dict like:
Client:(date,time), Client2:(date,time) ...
After reading the data into a string called subject, you could do this:
import re
d = {}
for match in re.finditer(
        r"""(?mx)
        ^Date(.*)\r?\n
        Client\d*(.*)\r?\n
        Time(.*)""",
        subject):
    # group 1 = date, group 2 = client, group 3 = time
    d[match.group(2)] = (match.group(1), match.group(3))
How about something like:
rows = {}
thisrow = []
for line in output.split('\n'):
    if line[:4].lower() == 'date':
        thisrow.append(line)
    elif line[:6].lower() == 'client':
        thisrow.append(line)
    elif line[:4].lower() == 'time':
        thisrow.append(line)
    elif line.strip() == '' and thisrow:
        # a blank line ends the current record
        rows[thisrow[1]] = (thisrow[0], thisrow[2])
        thisrow = []
print(rows)
This assumes records are separated by blank lines and the output ends with a trailing newline, no spaces before lines, etc.
What about using a dict with tuples?
Create a dictionary and add the entries:
clients = {}
clients['Client'] = ('date1', 'time1')
clients['Client2'] = ('date2', 'time2')
Accessing the entries:
>>> clients['Client']
('date1', 'time1')
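If every record is always exactly three lines (Date, Client, Time), blank separators are not needed at all. A minimal sketch under that assumption: it drops blank lines, then pulls consecutive triples off one iterator with zip(). The function name parse_records and the sample values are hypothetical, since the real values were elided as "xxx" in the question.

```python
def parse_records(text):
    """Build {client_line: (date_line, time_line)} from Date/Client/Time triples.

    Assumes each record is exactly three non-blank lines in the order
    Date, Client, Time, as in the question's sample output.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    it = iter(lines)
    result = {}
    # zip() pulls from the same iterator three times, yielding consecutive triples
    for date, client, time in zip(it, it, it):
        result[client] = (date, time)
    return result


# Hypothetical sample standing in for the real command output
sample = "Date 2010-05-01\nClient alpha\nTime 10:15\n\nDate 2010-05-02\nClient2 beta\nTime 11:30\n"
records = parse_records(sample)
print(records)
```

Note that zip() silently drops an incomplete trailing record, so check len(lines) % 3 first if that matters.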
Related
I am trying to use Python and DocxTemplate to fill a table in Word, and I am having trouble getting it to work properly. I want to use 2 dictionaries to fill the data in 1 table, shown in the figure below.
Table to fill
The 2 dictionaries are filled in a loop and I write the template document at the end.
The input used to build my dictionaries is a database dump written in SQL.
My main issue is filling the table with the data from the 2 different dictionaries.
In the code below, I give the 2 dictionaries with example values in them.
# -*- coding: utf8 -*-
#
#
from docxtpl import DocxTemplate
if __name__ == "__main__":
    document = DocxTemplate("template.docx")
    DicoOccuTable = {'`num_carnet_adresses`': '`annuaire_telephonique`\n`carnet_adresses`\n`carnet_adresses_complement`',
                     '`num_eleve`': '`CFA_apprentissage_ctrl_coherence`\n`CFA_apprentissage_ctrl_examen`'}
    DicoChamp = {'`num_carnet_adresses`': 72, '`num_eleve`': 66}
    template_values = {}
    #
    template_values["keys"] = [[{"name":cle, "occu":val} for cle,val in DicoChamp.items()],
                               [{"table":vals} for cles,vals in DicoOccuTable.items()]]
    #
    document.render(template_values)
    document.save('output/' + nomTable.replace('`','') + '.docx')
As a result, the two rows of the table are created, but nothing is written in them...
I should add that I have only been working with Python for a week, so I suspect I am not handling the different objects properly.
Any suggestions would be appreciated!
Here is the loop that builds the dictionaries; it may help you see where I went wrong :)
for c in ChampList:
    with open("db_reference.sql", "r") as f:
        listTable = []
        line = f.readlines()
        for l in line:
            if 'CREATE TABLE' in l:
                begin = True
                linecreateTable = l
                x = linecreateTable.split()
                nomTable = x[2]
            elif c in l and begin == True:
                listTable.append(nomTable)
            elif ') ENGINE=MyISAM DEFAULT CHARSET=latin1;' in l:
                begin = False
        nbreOccu=len(listTable)
        Tables = "\n".join(listTable)
        DicoChamp.update({c:nbreOccu})
        DicoOccuTable.update({c:Tables})
        # DicoChamp = {c:nbreOccu}
template_values = {}
Thank you very much!
I finally found a solution to this problem. Here it is.
Instead of using 2 dictionaries, I created 1 dictionary with this structure:
Dico = { Champ : [Occu , Tables] }
The full code for creating the table is below:
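In the question, template_values["keys"] is a list holding two separate lists, so a row loop over keys likely sees two rows whose items are lists rather than dicts with name/occu/table keys, which would explain the empty cells. A minimal sketch of merging the question's two dictionaries into the {Champ: [Occu, Tables]} shape and then into one dict per table row (the "label"/"cols" keys follow the answer's code; the merge itself is an illustration, not the answer's exact code):

```python
# The two dictionaries from the question, with its sample values.
DicoChamp = {'`num_carnet_adresses`': 72, '`num_eleve`': 66}
DicoOccuTable = {
    '`num_carnet_adresses`': '`annuaire_telephonique`\n`carnet_adresses`\n`carnet_adresses_complement`',
    '`num_eleve`': '`CFA_apprentissage_ctrl_coherence`\n`CFA_apprentissage_ctrl_examen`',
}

# Merge into {Champ: [Occu, Tables]} so each field carries both values.
Dico = {champ: [occu, DicoOccuTable.get(champ, '')] for champ, occu in DicoChamp.items()}

# One dict per table row, in the same shape as the answer's template_values["keys"].
template_values = {
    "keys": [{"label": champ.replace('`', ''), "cols": vals} for champ, vals in Dico.items()]
}
print(template_values)
```

On the Word side, docxtpl's row tags ({%tr for item in keys %} ... {%tr endfor %}) repeat one table row per entry, with cells such as {{ item.label }}, {{ item.cols[0] }} and {{ item.cols[1] }}; check the docxtpl documentation for exact tag placement, since the answer's template image is not available here.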
from docxtpl import DocxTemplate
document = DocxTemplate("template.docx")
template_values = {}
Context = {}
for c in ChampList:
    listTable = []
    nbreOccu = 0
    OccuTables = []
    with open("db_reference.sql", "r") as g:
        listTable = []
        ligne = g.readlines()
        for li in ligne:
            if 'CREATE TABLE' in li:
                begin = True
                linecreateTable2 = li
                y = linecreateTable2.split()
                nomTable2 = y[2]
            elif c in li and begin == True:
                listTable.append(nomTable2)
            elif ') ENGINE=MyISAM DEFAULT CHARSET=latin1;' in li:
                begin = False
            elif '/*!40101 SET COLLATION_CONNECTION=#OLD_COLLATION_CONNECTION */;' in li:
                nbreOccu=len(listTable)
                inter = "\n".join(listTable)
                OccuTables.append(nbreOccu)
                OccuTables.append(inter)
                ChampNumPropre = c.replace('`','')
                Context.update({ChampNumPropre:OccuTables})
            else:
                continue
template_values["keys"] = [{"label":cle, "cols":val} for cle,val in Context.items()]
#
document.render(template_values)
document.save('output/' + nomTable.replace('`','') + '.docx')
And I used a table with the following structure:
I hope you find your answer here. Good luck!
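Both loops above re-read db_reference.sql once per field in ChampList. A minimal sketch of a single-pass alternative, assuming mysqldump-style blocks where the CREATE TABLE line has the table name as its third word and the block closes with a line starting with ")". The helpers tables_by_field and build_context and the sample lines are hypothetical; the result has the same {field: [count, tables]} shape as Context in the answer.

```python
def tables_by_field(lines, fields):
    """Map each field to the tables whose CREATE TABLE block mentions it.

    Hypothetical helper. Assumes mysqldump-style input: a block starts with
    'CREATE TABLE <name> (' and ends with a line beginning with ')'.
    """
    found = {field: [] for field in fields}
    current = None  # table whose CREATE TABLE block we are inside, if any
    for line in lines:
        if 'CREATE TABLE' in line:
            current = line.split()[2]
        elif current is not None and line.lstrip().startswith(')'):
            current = None  # end of the CREATE TABLE block
        elif current is not None:
            for field in fields:
                if field in line and current not in found[field]:
                    found[field].append(current)
    return found


def build_context(found):
    """Build {field_without_backticks: [count, newline-joined tables]}."""
    return {
        field.replace('`', ''): [len(tables), "\n".join(tables)]
        for field, tables in found.items()
    }


# Hypothetical sample standing in for the lines of db_reference.sql
sample = [
    "CREATE TABLE `annuaire_telephonique` (\n",
    "  `num_carnet_adresses` int(11) NOT NULL,\n",
    ") ENGINE=MyISAM DEFAULT CHARSET=latin1;\n",
    "CREATE TABLE `carnet_adresses` (\n",
    "  `num_carnet_adresses` int(11) NOT NULL,\n",
    "  `num_eleve` int(11) NOT NULL,\n",
    ") ENGINE=MyISAM DEFAULT CHARSET=latin1;\n",
]
context = build_context(tables_by_field(sample, ['`num_carnet_adresses`', '`num_eleve`']))
print(context)
```

With the real file, something like `with open("db_reference.sql") as f: context = build_context(tables_by_field(f, ChampList))` would replace the outer loop.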
I have a text file that looks like this:
1 acatccacgg atgaaggaga ggagaaatgt ttcaaatcag ttctaacacg aaaaccaatt
61 ccaagaccaa gttatgaaat taccactaag cagcagtgaa agaactacat attgaagtca
121 gataagaaag caagctgaag agcaagcact gggcatcttt cttgaaaaaa gtaaggccca
181 agtaacagac tatcagattt ttttgcagtc tttgcattcc tactagatga ttcacagaga
241 agatagtcac atttatcatt cgaaaacatg aaagaattcc agtcagaact tgcatttggg
301 ggcatgtaag tctcaaggtt gtctttttgc caatgtgctg taacattatt gcactcagag
361 tgtactgctg acagccactg ttctgccgaa atgacagaaa atagggaaca
I am trying to read the txt file and build a dictionary from it like this: {1: ['acatccacgg', 'atgaaggaga', 'ggagaaatgt', 'ttcaaatcag', 'ttctaacacg', 'aaaaccaatt'], 61: ...}
I have no clue how to do this... I am really new to Python.
You can try this code:
mydictionary = {}
with open('test.txt', 'r') as f:
    for x in f:
        temp = x.split()  # split() also copes with repeated spaces and the newline
        if temp:  # skip blank lines
            mydictionary[temp[0]] = temp[1:]
print(mydictionary)
This is a cleaner and more readable way to do it (just try it and you will see):
import re
from os.path import exists
def put_in_dict(directory: str):
    """With this function you can find the digits in every line, use them
    as keys, and then put the characters on the same line as the value
    for that key."""
    my_dict = {}
    pattern_digit = re.compile(r"\d+")
    pattern_char = re.compile(r"\w+")
    char = []
    if exists(directory):
        with open(directory) as file:
            all_text = file.read().strip()
            list_txt = all_text.splitlines()
            numbs = pattern_digit.findall(all_text)
            for num in range(len(list_txt)):
                char.append(pattern_char.findall(list_txt[num]))
                del char[num][0]  # drop the leading number
            for dict_set in range(len(numbs)):
                my_dict[numbs[dict_set]] = char[dict_set]
    return my_dict  # you could make it print(my_dict) too
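The question asks for integer keys (1, 61, ...), while both answers above keep the numbers as strings. A minimal sketch that converts the key with int() and accepts any iterable of lines, so an open file can be passed directly; the name parse_sequence_lines is hypothetical and the sample lines are copied from the question.

```python
def parse_sequence_lines(lines):
    """Map the leading position number of each line (as an int) to its sequence chunks."""
    result = {}
    for line in lines:
        parts = line.split()  # handles runs of spaces and the trailing newline
        if not parts:
            continue  # skip blank lines
        result[int(parts[0])] = parts[1:]
    return result


sample = [
    "1 acatccacgg atgaaggaga ggagaaatgt ttcaaatcag ttctaacacg aaaaccaatt\n",
    "361 tgtactgctg acagccactg ttctgccgaa atgacagaaa atagggaaca\n",
]
seq = parse_sequence_lines(sample)
print(seq[361])
```

With a real file: `with open('test.txt') as f: seq = parse_sequence_lines(f)`.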
I'm trying to parse blocks of text in Python 2.7 using itertools.groupby.
The data has the following structure:
BEGIN IONS
TITLE=cmpd01_scan=23
RTINSECONDS=14.605
PEPMASS=694.299987792969 505975.375
CHARGE=2+
615.839727 1760.3752441406
628.788226 2857.6264648438
922.4323436 2458.0959472656
940.4432533 9105.5
END IONS
BEGIN IONS
TITLE=cmpd01_scan=24
RTINSECONDS=25.737
PEPMASS=694.299987792969 505975.375
CHARGE=2+
575.7636234 1891.1656494141
590.3553938 2133.4477539063
615.8339562 2433.4252929688
615.9032114 1784.0628662109
END IONS
I need to extract information from the lines beginning with "TITLE=", "PEPMASS=" and "CHARGE=".
The code I'm using is as follows:
import itertools
import re
data_file='Test.mgf'
def isa_group_separator(line):
    return line=='END IONS\n'
regex_scan = re.compile(r'TITLE=')
regex_precmass=re.compile(r'PEPMASS=')
regex_charge=re.compile(r'CHARGE=')
with open(data_file) as f:
    for (key,group) in itertools.groupby(f,isa_group_separator):
        #print(key,list(group))
        if not key:
            precmass_match = filter(regex_precmass.search,group)
            print precmass_match
            scan_match= filter(regex_scan.search,group)
            print scan_match
            charge_match = filter(regex_charge.search,group)
            print charge_match
However, the output only picks up the "PEPMASS=" line, and if the scan_match assignment is done before precmass_match, only the "TITLE=" line is printed:
> ['PEPMASS=694.299987792969 505975.375\n'] [] []
> ['PEPMASS=694.299987792969 505975.375\n'] [] []
Can someone point out what I'm doing wrong here?
This happens because group is an iterator, so it can be consumed only once: the first filter() call exhausts it and the later calls get nothing.
Here is a modified script that does the job by copying each group into a list first.
import itertools
import re
data_file='Test.mgf'
def isa_group_separator(line):
    return line == 'END IONS\n'
regex_scan = re.compile(r'TITLE=')
regex_precmass = re.compile(r'PEPMASS=')
regex_charge = re.compile(r'CHARGE=')
with open(data_file) as f:
    for (key, group) in itertools.groupby(f, isa_group_separator):
        if not key:
            g = list(group)
            precmass_match = filter(regex_precmass.search, g)
            print precmass_match
            scan_match = filter(regex_scan.search, g)
            print scan_match
            charge_match = filter(regex_charge.search, g)
            print charge_match
I might parse it this way (without using groupby):
import re
file = """\
BEGIN IONS
TITLE=cmpd01_scan=23
RTINSECONDS=14.605
PEPMASS=694.299987792969 505975.375
CHARGE=2+
615.839727 1760.3752441406
628.788226 2857.6264648438
922.4323436 2458.0959472656
940.4432533 9105.5
END IONS
BEGIN IONS
TITLE=cmpd01_scan=24
RTINSECONDS=25.737
PEPMASS=694.299987792969 505975.375
CHARGE=2+
575.7636234 1891.1656494141
590.3553938 2133.4477539063
615.8339562 2433.4252929688
615.9032114 1784.0628662109
END IONS""".splitlines()
pat = re.compile(r'(TITLE|PEPMASS|CHARGE)=(.+)')
data = []
for line in file:
    m = pat.match(line)
    if m is not None:
        if m.group(1) == 'TITLE':
            data.append([])
        data[-1].append(m.group(2))
print(data)
Prints:
[['cmpd01_scan=23', '694.299987792969 505975.375', '2+'], ['cmpd01_scan=24', '694.299987792969 505975.375', '2+']]
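On Python 3, filter() returns a lazy iterator, so printing it shows a filter object instead of a list. A minimal sketch that keeps the groupby approach for Python 3: each group is copied into a list once (because it is scanned once per field) and one dict per BEGIN/END IONS block is returned. The name parse_mgf is hypothetical, and the sample is a trimmed copy of the question's data.

```python
import itertools


def parse_mgf(lines, fields=('TITLE', 'PEPMASS', 'CHARGE')):
    """Return one dict per BEGIN/END IONS block holding the requested fields."""
    records = []
    for is_end, group in itertools.groupby(lines, lambda line: line.strip() == 'END IONS'):
        if is_end:
            continue  # the END IONS separator itself
        # A groupby group is a one-shot iterator; copy it because we scan it once per field.
        block = list(group)
        rec = {}
        for field in fields:
            prefix = field + '='
            hits = [ln.strip() for ln in block if ln.startswith(prefix)]
            if hits:
                rec[field] = hits[0][len(prefix):]
        if rec:
            records.append(rec)
    return records


sample = """BEGIN IONS
TITLE=cmpd01_scan=23
RTINSECONDS=14.605
PEPMASS=694.299987792969 505975.375
CHARGE=2+
615.839727 1760.3752441406
END IONS
BEGIN IONS
TITLE=cmpd01_scan=24
RTINSECONDS=25.737
PEPMASS=694.299987792969 505975.375
CHARGE=2+
575.7636234 1891.1656494141
END IONS
""".splitlines(True)

records = parse_mgf(sample)
print(records)
```

With a real file, `with open('Test.mgf') as f: records = parse_mgf(f)` works the same way.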
How do I add different values under the same key of a dictionary? The values are added in a loop.
Below are the entries I want in the dictionary data_dict:
data_dict = {}
During each iteration, the output should look like:
Iteration1 -> {'HUBER': {'100': 5.42}}
Iteration2 -> {'HUBER': {'100': 5.42, '10': 8.34}}
Iteration3 -> {'HUBER': {'100': 5.42, '10': 8.34, '20': 7.75}} etc
However, at the end of the iterations, data_dict is left with the last entry only:
{'HUBER': {'80': 5.50}}
Here's the code:
import glob
path = "./meanFilesRun2/*.txt"
all_files = glob.glob(path)
data_dict = {}
def func_(all_lines, method, points, data_dict):
    if method == "HUBER":
        mean_error = float(all_lines[-1]) # end of the file contains total_error
        data_dict["HUBER"] = {points: mean_error}
        return data_dict
    elif method == "L1":
        mean_error = float(all_lines[-1])
        data_dict["L1"] = {points: mean_error}
        return data_dict
for file_ in all_files:
    lineMthds = file_.split("_")[1] # reading line methods like "HUBER/L1/L2..."
    algoNum = file_.split("_")[-2] # reading diff. algos number used like "1/2.."
    points = file_.split("_")[2] # diff. points used like "10/20/30..."
    if algoNum == "1":
        FI = open(file_, "r")
        all_lines = FI.readlines()
        data_dict = func_(all_lines, lineMthds, points, data_dict)
        print data_dict
        FI.close()
You can use dict.setdefault here. Currently the problem with your code is that in each call to func_ you're re-assigning data_dict["HUBER"] to a new dict.
Change:
data_dict["HUBER"] = {points: mean_error}
to:
data_dict.setdefault("HUBER", {})[points] = mean_error
You can use defaultdict from the collections module:
import collections
d = collections.defaultdict(dict)
d['HUBER']['100'] = 5.42
d['HUBER']['10'] = 3.45
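Since both branches of func_ do the same thing apart from the method name, the setdefault fix can also remove the if/elif entirely. A minimal sketch, assuming every method should be stored the same way; record_error is a hypothetical name, and the values are the ones from the question's iteration examples.

```python
def record_error(data_dict, method, points, mean_error):
    """Store mean_error under data_dict[method][points].

    setdefault() returns the existing inner dict for `method` (creating an
    empty one only the first time), so earlier entries are kept.
    """
    data_dict.setdefault(method, {})[points] = mean_error
    return data_dict


data_dict = {}
# Values taken from the iterations shown in the question
for method, points, err in [("HUBER", "100", 5.42), ("HUBER", "10", 8.34), ("HUBER", "20", 7.75)]:
    record_error(data_dict, method, points, err)
print(data_dict)
```

In the original loop, the call would become `record_error(data_dict, lineMthds, points, float(all_lines[-1]))`.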
I have the following text chunk:
string = """
apples: 20
oranges: 30
ripe: yes
farmers:
        elmer fudd
                lives in tv
        farmer ted
                lives close
        farmer bill
                lives far
selling: yes
veggies:
        carrots
        potatoes
"""
I am trying to find a good regex that will let me parse out the key/value pairs. I can grab the single-line key/value pairs with something like:
'(.+?):\s(.+?)\n'
However, the problem comes when I hit farmers or veggies.
Using the re flags, I need to do something like:
re.findall('(.+?):\s(.+?)\n', string, re.S)
However, I am having a heck of a time grabbing all of the values associated with farmers.
There is a newline after each value, and a tab or series of tabs before the values when they span multiple lines.
The goal is to have something like:
{ 'apples': 20, 'farmers': ['elmer fudd', 'farmer ted'] }
etc.
Thank you in advance for your help.
You might look at PyYAML; this text is very close to, if not actually, valid YAML.
Here's a really dumb parser that takes into account your (apparent) indentation rules:
def parse(s):
    d = {}
    lastkey = None
    lastval = []
    lastindent = None
    for fullline in s:
        line = fullline.strip()
        if not line:
            pass
        elif ':' not in line:
            # continuation line: keep only lines at the first indentation level seen
            indent = len(fullline) - len(fullline.lstrip())
            if lastindent is None:
                lastindent = indent
            if lastindent == indent:
                lastval.append(line)
        else:
            if lastkey:
                d[lastkey] = lastval
                lastkey = None
            if line.endswith(':'):
                # start a multi-line value named after this line's key
                lastkey, lastval, lastindent = line[:-1], [], None
            else:
                key, _, value = line.partition(':')
                d[key] = value.strip()
    if lastkey:
        d[lastkey] = lastval
        lastkey = None
    return d
import pprint
pprint.pprint(parse(string.splitlines()))
The output is:
{'apples': '20',
 'farmers': ['elmer fudd', 'farmer ted', 'farmer bill'],
 'oranges': '30',
 'ripe': 'yes',
 'selling': 'yes',
 'veggies': ['carrots', 'potatoes']}
I think this is already complicated enough that it would look cleaner as an explicit state machine, but I wanted to write this in terms that any novice could understand.
Here's a totally silly way to do it:
import collections
string = """
apples: 20
oranges: 30
ripe: yes
farmers:
        elmer fudd
                lives in tv
        farmer ted
                lives close
        farmer bill
                lives far
selling: yes
veggies:
        carrots
        potatoes
"""
def funky_parse(inval):
    lines = inval.split("\n")
    items = collections.defaultdict(list)
    at_val = False
    key = ''
    val = ''
    last_indent = 0
    for j, line in enumerate(lines):
        indent = len(line) - len(line.lstrip())
        # skip detail lines nested deeper than the current value lines
        if j != 0 and at_val and indent > last_indent > 4:
            continue
        if j != 0 and ":" in line:
            if val:
                items[key].append(val.strip())
            at_val = False
            key = ''
        line = line.lstrip()
        for i, c in enumerate(line, 1):
            if at_val:
                val += c
            else:
                key += c
            if c == ':':
                at_val = True
            if i == len(line) and at_val and val:
                items[key].append(val.strip())
                val = ''
        last_indent = indent
    return items
print(dict(funky_parse(string)))
OUTPUT
{'apples:': ['20'], 'oranges:': ['30'], 'ripe:': ['yes'], 'farmers:': ['elmer fudd', 'farmer ted', 'farmer bill'], 'selling:': ['yes'], 'veggies:': ['carrots', 'potatoes']}
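Since the question asks for a regex, here is a minimal regex-based sketch, assuming keys are single words (\w+) starting at column 0 and multi-line values are indented consistently (all spaces or all tabs). It captures each key line plus the indented block under it, keeps only the shallowest lines of that block (dropping the deeper "lives in ..." details, as in the desired output), and leaves numbers as strings; wrap them in int() if needed. BLOCK and parse_blocks are hypothetical names.

```python
import re

# A "key:" line plus any following lines that start with whitespace.
BLOCK = re.compile(r'^(\w+):[ \t]*(.*)\n?((?:[ \t]+.*\n?)*)', re.M)


def parse_blocks(text):
    """Parse 'key: value' lines and indented multi-line values."""
    result = {}
    for key, inline, block in BLOCK.findall(text):
        if inline.strip():
            result[key] = inline.strip()  # single-line value
            continue
        lines = [ln for ln in block.splitlines() if ln.strip()]
        if not lines:
            result[key] = []
            continue
        # keep only the least-indented lines of the block
        depth = min(len(ln) - len(ln.lstrip()) for ln in lines)
        result[key] = [ln.strip() for ln in lines if len(ln) - len(ln.lstrip()) == depth]
    return result


sample = (
    "apples: 20\n"
    "farmers:\n"
    "    elmer fudd\n"
    "        lives in tv\n"
    "    farmer ted\n"
    "        lives close\n"
    "veggies:\n"
    "    carrots\n"
    "    potatoes\n"
)
parsed = parse_blocks(sample)
print(parsed)
```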