Hi, I am trying to create a list of parameters from a file.
The final result should be something like:
param=[[field],[units],[height],[site]]
The problem is that the information is split across lines, and some of the parameters do not have all the information.
#info in the file
[field1]
unit=m/s
height=70.4
site=site1
[field2]
height=20.6
site=site2
[field3]
units=m
...
So I would like to fill in all the fields in such a way that, if a piece of information is missing, 0 or '' is assigned instead.
Final result for the example:
param={field1:['m/s',70.4,'site1'], field2:['',20.6,'site2'], field3:['m',0,'']}
I know how to create a dictionary from a list of lists, but not how to set default values ('' for the string values and 0 for the numeric ones) in case some values are missing.
Thanks
You could group using a defaultdict:
from collections import defaultdict
with open("test.txt") as f:
    d = defaultdict(list)
    for line in map(str.rstrip, f):
        if line.startswith("["):
            d["fields"].append(line.strip("[]"))
        else:
            k,v = line.split("=")
            d[k].append(v)
Input:
[field1]
unit=m/s
height=70.4
site=site1
[field2]
height=20.6
site=site2
[field3]
unit=m
height=6.0
site=site3
Output:
defaultdict(<type 'list'>, {'fields': ['field1', 'field2', 'field3'],
'site': ['site1', 'site2', 'site3'], 'unit': ['m/s', 'm'],
'height': ['70.4', '20.6', '6.0']})
If you actually want to group by field, you can use itertools.groupby grouping on lines that start with [:
from itertools import groupby
with open("test.txt") as f:
    grps, d = groupby(map(str.rstrip,f), key=lambda x: x.startswith("[")), {}
    for k,v in grps:
        if k:
            k, v = next(v).strip("[]"), list(next(grps)[1])
            d[k] = v
print(d)
Output:
{'field2': ['height=20.6', 'site=site2'],
'field3': ['unit=m', 'height=6.0', 'site=site3'],
'field1': ['unit=m/s', 'height=70.4', 'site=site1']}
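Whenever k is true, the group holds a line starting with [. We then call next on the groupby object to get all the lines up to the next line starting with [ or the EOF. Note that this assumes every [field] line is followed by at least one key=value line.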
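Neither snippet above sets the defaults the question asks for ('' for missing strings, 0 for missing numbers). One possible approach is to start every section from a dict of defaults and overwrite whatever the file provides. This is only a minimal sketch: the name parse_sections, the DEFAULTS table, and treating "units" as an alias of "unit" are my own assumptions, not part of the question.

```python
import io

# Assumed defaults for keys a section may omit: '' for text, 0 for numbers.
DEFAULTS = {"unit": "", "height": 0, "site": ""}

def parse_sections(lines):
    """Return {field: [unit, height, site]}, using defaults for missing keys."""
    params = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            current = dict(DEFAULTS)              # fresh copy per section
            params[line.strip("[]")] = current
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            if key == "units":                    # assumption: 'units' means 'unit'
                key = "unit"
            if key == "height":
                value = float(value)
            if key in current:
                current[key] = value
    return {name: [vals["unit"], vals["height"], vals["site"]]
            for name, vals in params.items()}

# In-memory stand-in for the file from the question.
sample = io.StringIO(
    "[field1]\nunit=m/s\nheight=70.4\nsite=site1\n"
    "[field2]\nheight=20.6\nsite=site2\n"
    "[field3]\nunits=m\n"
)
param = parse_sections(sample)
print(param)
```

Passing an open file object instead of the StringIO should work the same way, since the function only iterates over lines.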
This would fill in the missing information.
f= open('file.txt','r')
field, units, height, site = [],[],[],[]
param = [ field, units, height, site]
lines = f.readlines()
i=0
while True:
    try:
        line1 = lines[i].rstrip()
        if line1.startswith('['):
            field.append(line1.strip('[]'))
        else:
            field.append(0)
            i-= 1
    except:
        field.append(0)
    try:
        line2 = lines[i+1].rstrip()
        if line2.startswith('unit') or line2.startswith('units'):
            units.append(line2.split('=')[-1])
        else:
            units.append('')
            i-=1
    except:
        units.append('')
    try:
        line3 = lines[i+2].rstrip()
        if line3.startswith('height'):
            height.append(line3.split('=')[-1])
        else:
            height.append(0)
            i-=1
    except:
        height.append(0)
    try:
        line4 = lines[i+3].rstrip()
        if line4.startswith('site'):
            site.append(line4.split('=')[-1])
        else:
            site.append('')
    except:
        site.append('')
        break
    i +=4
Output:
param:
[['field1', 'field2', 'field3'],
['m/s', '', 'm'],
['70.4', '20.6', 0],
['site1', 'site2', '']]
Related
I am extracting data from a log file and printing it with the code below:
for line in data:
    g = re.findall(r'([\d.]+).*?(GET|POST|PUT|DELETE)', line)
    print (g)
[('1.1.1.1', 'PUT')]
[('2.2.2.2', 'GET')]
[('1.1.1.1', 'PUT')]
[('2.2.2.2', 'POST')]
How can I aggregate the results so that the output looks like this?
Output:
1.1.1.1: PUT = 2
2.2.2.2: GET = 1,POST=1
You could use a dictionary to count:
# initialize the count dict
count_dict= dict()
for line in data:
    g = re.findall(r'([\d.]+).*?(GET|POST|PUT|DELETE)', line)
    for tup in g:
        # get the count for tuple tup; if we don't have it yet,
        # use 0 (second argument to .get)
        num= count_dict.get(tup, 0)
        # increase the count and write it back
        count_dict[tup]= num+1
# now iterate over the key (tuple) - value (count) pairs
# and print the result
for tup, count in count_dict.items():
    print(tup, count)
OK, I have to admit this doesn't give the exact output you want, but you can build it from count_dict in a similar manner:
out_dict= dict()
for (comma_string, request_type), count in count_dict.items():
    out_str= out_dict.get(comma_string, '')
    sep='' if out_str == '' else ', '
    out_str= f'{out_str}{sep}{request_type} = {count}'
    out_dict[comma_string]= out_str
for tup, out_str in out_dict.items():
    print(tup, out_str)
From your data that outputs:
1.1.1.1 PUT = 2
2.2.2.2 GET = 1, POST = 1
I would look towards Counter.
from collections import Counter
results = []
for line in data:
    g = re.findall(r'([\d.]+).*?(GET|POST|PUT|DELETE)', line)
    results.append(g[0])
ip_list = set(result[0] for result in results)
for ip in ip_list:
    print(ip, Counter(result[1] for result in results if result[0] == ip ))
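If the goal is the exact "ip: METHOD = n" layout from the question, one option is to keep one Counter per IP address and format each at the end. The sketch below is only an illustration: the log lines are made up to stand in for `data`, and the names per_ip and report are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical log lines standing in for `data` from the question.
data = [
    "1.1.1.1 - - PUT /a",
    "2.2.2.2 - - GET /b",
    "1.1.1.1 - - PUT /c",
    "2.2.2.2 - - POST /d",
]

pattern = re.compile(r'([\d.]+).*?(GET|POST|PUT|DELETE)')

# One Counter per IP address, created automatically on first use.
per_ip = defaultdict(Counter)
for line in data:
    for ip, method in pattern.findall(line):
        per_ip[ip][method] += 1

# Build one "ip: METHOD = n, ..." line per address.
report = []
for ip, counts in per_ip.items():
    parts = ", ".join("{} = {}".format(m, n) for m, n in counts.items())
    report.append("{}: {}".format(ip, parts))

print("\n".join(report))
```

Because dicts keep insertion order in Python 3.7+, the addresses and methods appear in the order they were first seen.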
You can use collections.defaultdict.
Ex:
from collections import defaultdict
result = defaultdict(list)
for line in data:
    for ip, method in re.findall(r'([\d.]+).*?(GET|POST|PUT|DELETE)', line):
        result[ip].append(method)
for k, v in result.items():
    temp = ""
    for i in set(v):
        temp += " {} = {}".format(i, v.count(i))
    print("{}{}".format(k, temp))
from collections import Counter
x = [[('1.1.1.1', 'PUT')],[('2.2.2.2', 'GET')],[('1.1.1.1', 'PUT')],[('2.2.2.2', 'POST')]]
# step 1: convert x into a dict.
m = {}
for i in x:
    a, b = i[0]
    if a not in m.keys():
        m[a] = [b]
    else:
        x = m[a]
        x.append(b)
        m[a] = x
print('new dict is {}'.format(m))
# step 2 count frequency
m_values = list(m.values())
yy = []
for i in m_values:
    x = []
    k = list(Counter(i).keys())
    v = list(Counter(i).values())
    for i in range(len(k)):
        x.append(k[i] + '=' + str(v[i]))
    yy.append(x)
# step 3, update the value of the dict
m_keys = list(m.keys())
n = len(m_keys)
for i in range(n):
    m[m_keys[i]] = yy[i]
print("final dict is{}".format(m))
Output is
new dict is {'1.1.1.1': ['PUT', 'PUT'], '2.2.2.2': ['GET', 'POST']}
final dict is{'1.1.1.1': ['PUT=2'], '2.2.2.2': ['GET=1', 'POST=1']}
Without dependencies and using a dict for counting, in a very basic way. Given the data_set:
data_set = [[('1.1.1.1', 'PUT')],
            [('2.2.2.2', 'GET')],
            [('2.2.2.2', 'POST')],
            [('1.1.1.1', 'PUT')]]
Initialize the variables (manually, with just a few verbs), then iterate over the data:
counter = {'PUT': 0, 'GET': 0, 'POST': 0, 'DELETE': 0}
res = {}
for data in data_set:
    ip, verb = data[0]
    if ip not in res:
        res[ip] = dict(counter)  # copy the template so each IP gets its own counts
    res[ip][verb] += 1  # count every occurrence, including the first one
print(res)
#=> {'1.1.1.1': {'PUT': 2, 'GET': 0, 'POST': 0, 'DELETE': 0}, '2.2.2.2': {'PUT': 0, 'GET': 1, 'POST': 1, 'DELETE': 0}}
You still need to format the output to better fit your needs.
Background
I am storing data in dictionaries. The dictionaries can be of different lengths, and in a particular dictionary there could be keys with multiple values. I am trying to write the data out to a CSV file.
Problem/Solution
Image 1 shows my actual output. Image 2 shows the desired output.
CODE
import csv
from itertools import izip_longest
e = {'Lebron':[25,10],'Ray':[40,15]}
c = {'Nba':5000}
def writeData():
    with open('file1.csv', mode='w') as csv_file:
        fieldnames = ['Player Name','Points','Assist','Company','Total Employes']
        writer = csv.writer(csv_file)
        writer.writerow(fieldnames)
        for employee, company in izip_longest(e.items(), c.items()):
            row = list(employee)
            row += list(company) if company is not None else ['', ''] # Write empty fields if no company
            writer.writerow(row)
writeData()
I am open to all solutions/suggestions that can help me get my desired output format.
For a much simpler answer, you just need to add one line of code to what you have:
row = [row[0]] + row[1]
so:
for employee, company in izip_longest(e.items(), c.items()):
    row = list(employee)
    row = [row[0]] + row[1]
    row += list(company) if company is not None else ['', ''] # Write empty fields if no company
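The snippets above are Python 2 (izip_longest). On Python 3 the same idea could look roughly like the sketch below. It writes to an in-memory buffer instead of file1.csv so it can be run anywhere; write_data and buffer are hypothetical names, and the sketch assumes the players dict is at least as long as the company dict.

```python
import csv
import io
from itertools import zip_longest  # Python 3 name for izip_longest

e = {'Lebron': [25, 10], 'Ray': [40, 15]}
c = {'Nba': 5000}

def write_data(csv_file):
    writer = csv.writer(csv_file)
    writer.writerow(['Player Name', 'Points', 'Assist', 'Company', 'Total Employes'])
    # fillvalue pads missing companies with two empty fields
    for (name, stats), company in zip_longest(e.items(), c.items(), fillvalue=('', '')):
        # flatten [points, assists] and (company, employees) into one row
        writer.writerow([name, *stats, *company])

buffer = io.StringIO()  # stand-in for open('file1.csv', 'w', newline='')
write_data(buffer)
print(buffer.getvalue())
```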
You could store everything in a single defaultdict of dicts keyed by player name:
from collections import defaultdict
values = defaultdict(dict)
values["Name1"] = {"Points": [], "Assist": [], "Company": "blah", "Total_Employees": 123}
To generate the output, traverse the items in values to get the names, and fill in the other columns from the key/value pairs in the nested dict.
Again, make sure that there are no multiple entries with the same name, or pick a key that is unique for each entry in the defaultdict.
Demo for the example:
>>> from collections import defaultdict
>>> import csv
>>> values = defaultdict(dict)
>>> vals = [["Lebron", 25, 10, "Nba", 5000], ["Ray", 40, 15]]
>>> fields = ["Name", "Points", "Assist", "Company", "Total Employes"]
>>> for item in vals:
...     if len(item) == len(fields):
...         details = dict()
...         for j in range(1, len(fields)):
...             details[fields[j]] = item[j]
...         values[item[0]] = details
...     elif len(item) < len(fields):
...         details = dict()
...         for j in range(1, len(fields)):
...             if j+1 <= len(item):
...                 details[fields[j]] = item[j]
...             else:
...                 details[fields[j]] = ""
...         values[item[0]] = details
...
>>> values
defaultdict(<class 'dict'>, {'Lebron': {'Points': 25, 'Assist': 10, 'Company': 'Nba', 'Total Employes': 5000}, 'Ray': {'Points': 40, 'Assist': 15, 'Company': '', 'Total Employes': ''}})
>>> csv_file = open('file1.csv', 'w')
>>> writer = csv.writer(csv_file)
>>> for i in values:
...     row = [i]
...     for j in values[i]:
...         row.append(values[i][j])
...     writer.writerow(row)
...
23
13
>>> csv_file.close()
Contents of 'file1.csv':
Lebron,25,10,Nba,5000
Ray,40,15,,
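The manual padding in the demo can also be left to the csv module: csv.DictWriter takes a restval argument that is written for any field missing from a row. A short sketch under that assumption, reusing the demo's fields and vals and writing to an in-memory buffer (a hypothetical stand-in for file1.csv):

```python
import csv
import io

fields = ["Name", "Points", "Assist", "Company", "Total Employes"]
vals = [["Lebron", 25, 10, "Nba", 5000], ["Ray", 40, 15]]

buffer = io.StringIO()  # stand-in for open('file1.csv', 'w', newline='')
# restval is the value written for any field a row dict does not contain
writer = csv.DictWriter(buffer, fieldnames=fields, restval="")
for item in vals:
    # zip stops at the shorter sequence, so short rows simply lack the last keys
    writer.writerow(dict(zip(fields, item)))

print(buffer.getvalue())
```

Calling writer.writeheader() before the loop would add the column names as the first row.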
I had a question regarding summing the multiple values of duplicate keys into one key with the aggregate total. For example:
1:5
2:4
3:2
1:4
Very basic but I'm looking for an output that looks like:
1:9
2:4
3:2
In the two files I am using, I am dealing with a list of 51 users (column 1 of user_artists.dat), each with an artistID (column 2) and a weight (column 3) giving how many times that user has listened to that particular artist.
I am attempting to aggregate the total number of times each artist has been played across all users, and display it in a format such as:
Britney Spears (289) 2393140. Any help or input would be much appreciated.
import codecs
#from collections import defaultdict
with codecs.open("artists.dat", encoding = "utf-8") as f:
    artists = f.readlines()
with codecs.open("user_artists.dat", encoding = "utf-8") as f:
    users = f.readlines()
artist_list = [x.strip().split('\t') for x in artists][1:]
user_stats_list = [x.strip().split('\t') for x in users][1:]
artists = {}
for a in artist_list:
    artistID, name = a[0], a[1]
    artists[artistID] = name
grouped_user_stats = {}
for u in user_stats_list:
    userID, artistID, weight = u
    grouped_user_stats[artistID] = grouped_user_stats[artistID].astype(int)
    grouped_user_stats[weight] = grouped_user_stats[weight].astype(int)
    for artistID, weight in u:
        grouped_user_stats.groupby('artistID')['weight'].sum()
        print(grouped_user_stats.groupby('artistID')['weight'].sum())
    #if userID not in grouped_user_stats:
        #grouped_user_stats[userID] = { artistID: {'name': artists[artistID], 'plays': 1} }
    #else:
        #if artistID not in grouped_user_stats[userID]:
            #grouped_user_stats[userID][artistID] = {'name': artists[artistID], 'plays': 1}
        #else:
            #grouped_user_stats[userID][artistID]['plays'] += 1
            #print('this never happens')
#print(grouped_user_stats)
How about:
import codecs
from collections import defaultdict
# read stuff
with codecs.open("artists.dat", encoding = "utf-8") as f:
    artists = f.readlines()
with codecs.open("user_artists.dat", encoding = "utf-8") as f:
    users = f.readlines()
# transform artist data into a dict with "artist id" as key and "artist name" as value
artist_repo = dict(x.strip().split('\t')[:2] for x in artists[1:])
user_stats_list = [x.strip().split('\t') for x in users][1:]
grouped_user_stats = defaultdict(int)
for u in user_stats_list:
    #userID, artistID, weight = u
    grouped_user_stats[u[1]] += int(u[2]) # accumulate weights in a dict with artist id (column 2) as key and sum of weights as value
# extra: "fancying" the data by transforming the keys of the dict into "<artist name> (artist id)" format
grouped_user_stats = dict(("%s (%s)" % (artist_repo.get(k,"Unknown artist"), k), v) for k, v in grouped_user_stats.items())
# lastly print it
for k, v in grouped_user_stats.items():
    print(k, v)
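The core of the accumulation can be seen in isolation on the small key:value example from the question. This is a minimal sketch that uses an in-memory list in place of a file; lines and totals are illustrative names.

```python
from collections import defaultdict

# The "key:value" lines from the question.
lines = ["1:5", "2:4", "3:2", "1:4"]

totals = defaultdict(int)  # missing keys start at 0
for line in lines:
    key, value = line.split(":")
    totals[key] += int(value)  # duplicate keys add up

for key, total in totals.items():
    print("{}:{}".format(key, total))
```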
I am trying to generate a dictionary from a list of data parsed from .csv files, but I am getting the error "too many values to unpack". Does anyone have any ideas for a fix?
There will be repeating keys, with multiple values to append to each key.
I'm pretty new to Python and programming, so please add a short explanation of what went wrong and how to fix it.
Below the script is the data as it appears when res is printed.
#!/usr/bin/python
import csv
import pprint
pp = pprint.PrettyPrinter(indent=4)
import sys
import getopt
res = []
import argparse
parser = argparse.ArgumentParser()
parser.add_argument ("infile", metavar="CSV", nargs="+", type=str, help="data file")
args = parser.parse_args()
with open("out.csv","wb") as f:
    output = csv.writer(f)
    for filename in args.infile:
        for line in csv.reader(open(filename)):
            for item in line[2:]:
                #to skip empty cells
                if not item.strip():
                    continue
                item = item.split(":")
                item[1] = item[1].rstrip("%")
                # print([line[1]+item[0],item[1]])
                res.append([line[1]+item[0],item[1]])
                # output.writerow([line[1]+item[0],item[1].rstrip("%")])
pp.pprint( res )
from collections import defaultdict
initial_list = [res]
d = defaultdict(list)
pp.pprint( d )
for k, v in initial_list:
    d[k].append(float(v)) # possibly `int(v)` ?
And the console output:
[ ['P1L', '2.04'],
['Q2R', '1.93'],
['V3I', '20.03'],
['V3M', '78.18'],
['V3S', '1.67'],
['T4L', '1.16'],
['T12N', '75.60'],
['T12S', '22.73'],
['K14E', '1.03'],
['K14R', '50.65'],
['I15*', '63.94'],
['I15V', '35.30'],
['G17A', '38.31'],
['Q18R', '38.43'],
['L19T', '98.62'],
['L24*', '2.18'],
['D25E', '1.87'],
['D25N', '2.17'],
['M36I', '99.76'],
['S37N', '97.23'],
['R41K', '99.03'],
['L63V', '99.42'],
['H69K', '99.30'],
['I72V', '5.76'],
['V82I', '98.70'],
['L89M', '98.49'],
['I93L', '99.64'],
['P4S', '99.09'],
['V35T', '99.26'],
['E36A', '98.23'],
['T39D', '98.78'],
['G45R', '3.11'],
['S48T', '99.70'],
['V60I', '99.44'],
['K102R', '1.04'],
['K103N', '99.11'],
['G112E', '2.77'],
['D123N', '8.14'],
['D123S', '91.12'],
['I132M', '1.41'],
['K173A', '99.55'],
['Q174K', '99.68'],
['D177E', '98.95'],
['G190R', '2.56'],
['E194K', '2.54'],
['T200A', '99.28'],
['Q207E', '98.75'],
['R211K', '98.77'],
['W212*', '3.00'],
['L214F', '99.25'],
['V245E', '99.30'],
['E248D', '99.58'],
['D250E', '99.02'],
['T286A', '99.70'],
['K287R', '1.78'],
['E291D', '99.22'],
['V292I', '98.28'],
['I293V', '99.58'],
['V317A', '28.20'],
['L325V', '2.40'],
['G335D', '98.33'],
['F346S', '4.42'],
['N348I', '3.81'],
['R356K', '71.43'],
['M357I', '20.00'],
['M357T', '80.00']]
defaultdict(<type 'list'>, {})
Traceback (most recent call last):
  File "test.py", line 40, in <module>
    for k, v in initial_list:
ValueError: too many values to unpack
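You are wrapping the result in another list:
initial_list = [res]
and then trying to iterate over that outer list:
d = defaultdict(list)
pp.pprint( d )
for k, v in initial_list:
    d[k].append(float(v)) # possibly `int(v)` ?
initial_list has a single element, res itself, so the loop tries to unpack all of the pairs in res into just k and v, which raises the ValueError. You want to loop over res instead:
d = defaultdict(list)
for k, v in res:
    d[k].append(float(v))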
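The difference can be reproduced with a few of the pairs from the question. This is a small illustration only; the shortened res and the error variable are not part of the original script.

```python
from collections import defaultdict

# A shortened version of res from the question.
res = [['P1L', '2.04'], ['Q2R', '1.93'], ['V3I', '20.03']]

# [res] has one element: res itself, which holds three pairs.
# Unpacking three items into k, v fails.
try:
    for k, v in [res]:
        pass
    error = None
except ValueError as exc:
    error = str(exc)
print(error)

# Iterating over res directly yields one [key, value] pair per step.
d = defaultdict(list)
for k, v in res:
    d[k].append(float(v))
print(dict(d))
```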
You can do all this in the CSV reading loop:
from collections import defaultdict
d = defaultdict(list)
with open("out.csv","wb") as f:
    output = csv.writer(f)
    for filename in args.infile:
        for line in csv.reader(open(filename)):
            for item in line[2:]:
                #to skip empty cells
                if not item.strip():
                    continue
                key, value = item.split(":", 1)
                value = value.rstrip("%")
                d[line[1] + key].append(float(value))
I want to split keys and values and display the resulting dictionary in the format shown below. I'm reading a file, splitting the data into a list, and then moving it into a dictionary.
Please help me get this result.
INPUT FILE - commands.txt
login url=http://demo.url.net username=test#url.net password=mytester
create-folder foldername=demo
select-folder foldername=test123
logout
Expected result format
print result_dict
"0": {
    "login": [
        {
            "url": "http://demo.url.net",
            "username": "test#url.net",
            "password": "mytester"
        }
    ]
},
"1": {
    "create-folder": {
        "foldername": "demo"
    }
},
"2": {
    "select-folder": {
        "foldername": "test-folder"
    }
},
"3": {
    "logout": {}
}
CODE
file=os.path.abspath('catalog/commands.txt')
list_output=[f.rstrip().split() for f in open(file).readlines()]
print list_output
counter=0
for data in list_output:
    csvdata[counter]=data[0:]
    counter=counter+1
print csvdata
for key,val in csvdata.iteritems():
    for item in val:
        if '=' in item:
            key,value=item.split("=")
            result[key]=value
print result
As a function:
from collections import defaultdict
from itertools import count
def read_file(file_path):
    result = defaultdict(dict)
    item = count()
    with open(file_path) as f:
        for line in f:
            parts = line.split()
            if not parts:
                continue  # skip blank lines (a line read from a file still ends with "\n")
            result[next(item)][parts[0]] = dict(p.split('=', 1) for p in parts[1:])
    return dict(result)
Better example and explanation:
s = """
login url=http://demo.url.net username=test#url.net password=mytester
create-folder foldername=demo
select-folder foldername=test123
logout
"""
from collections import defaultdict
from itertools import count
result_dict = defaultdict(dict)
item = count()
# pretend you opened the file and are reading it line by line
for line in s.splitlines():
    if not line:
        continue # skip empty lines
    parts = line.split()
    result_dict[next(item)][parts[0]] = dict(p.split('=') for p in parts[1:])
With pretty print:
>>> pprint(dict(result_dict))
{0: {'login': {'password': 'mytester',
               'url': 'http://demo.url.net',
               'username': 'test#url.net'}},
 1: {'create-folder': {'foldername': 'demo'}},
 2: {'select-folder': {'foldername': 'test123'}},
 3: {'logout': {}}}
lines = ["login url=http://demo.url.net username=test#url.net password=mytester",
         "create-folder foldername=demo",
         "select-folder foldername=test123",
         "logout"]
result = {}
for no, line in enumerate(lines):
    values = line.split()
    pairs = [v.split('=') for v in values[1:]]
    result[str(no)] = {values[0]: [dict(pairs)] if len(pairs) > 1 else dict(pairs)}
import pprint
pprint.pprint(result)
Output:
{'0': {'login': [{'password': 'mytester',
                  'url': 'http://demo.url.net',
                  'username': 'test#url.net'}]},
 '1': {'create-folder': {'foldername': 'demo'}},
 '2': {'select-folder': {'foldername': 'test123'}},
 '3': {'logout': {}}}
But are you sure you need the extra list inside the login value? If not, just change [dict(pairs)] if len(pairs) > 1 else dict(pairs) to dict(pairs).
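If what matters is the JSON-style layout shown in the question, the standard json module can pretty-print the nested dict. A rough sketch, assuming the list wrapper around the login arguments is dropped as suggested above; the variable names are illustrative only.

```python
import json

lines = [
    "login url=http://demo.url.net username=test#url.net password=mytester",
    "create-folder foldername=demo",
    "select-folder foldername=test123",
    "logout",
]

result = {}
for no, line in enumerate(lines):
    command, *args = line.split()
    # maxsplit=1 keeps any '=' inside a value intact
    result[str(no)] = {command: dict(a.split("=", 1) for a in args)}

# indent=4 produces the nested, one-key-per-line layout
text = json.dumps(result, indent=4)
print(text)
```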
r = dict()
f = open('commands.txt')
for i, line in enumerate(f.readlines()):
    r[str(i)] = dict()
    actions = line.split()
    list_actions = {}
    for action in actions[1:]:
        if "=" in action:
            k, v = action.split('=')
            list_actions[k] = v
    if len(actions[1:]) > 1:
        r[str(i)][actions[0]] = [list_actions]
    else:
        r[str(i)][actions[0]] = list_actions
f.close()
print(r)
This should work.