Converting a txt file into JSON using Python?

I have a log file that has the format as follows:
Nov 28 06:26:45 server-01 dhcpd: DHCPDISCOVER from cc:d3:e2:7a:af:40 via 10.39.192.1
Nov 28 06:26:45 server-01 dhcpd: DHCPOFFER on 10.39.255.253 to cc:d3:e2:7a:af:40 via 10.39.192.1
The next step is to convert the text data into JSON using Python. So far, I have the following Python script:
# Python program to convert text
# file to JSON
import json
# the file to be converted
filename = 'Logs.txt'
# resultant dictionary
dict1 = {}
# fields in the sample file
fields = ['timestamp', 'Server', 'Service', 'Message']
with open(filename) as fh:
    # count variable for employee id creation
    l = 1
    for line in fh:
        # reading line by line from the text file
        description = list(line.strip().split(None, 4))
        # for output see below
        print(description)
        # for automatic creation of id for each employee
        sno = 'emp' + str(l)
        # loop variable
        i = 0
        # intermediate dictionary
        dict2 = {}
        while i < len(fields):
            # creating dictionary for each employee
            dict2[fields[i]] = description[i]
            i = i + 1
        # appending the record of each employee to
        # the main dictionary
        dict1[sno] = dict2
        l = l + 1
# creating json file
out_file = open("test5.json", "w")
json.dump(dict1, out_file, indent=4)
out_file.close()
which gives the following output:
{
"emp1": { "timestamp": "Nov", "Server": "28", "Service": "06:26:45", "Message": "server-01" },
"emp2": { "timestamp": "Nov", "Server": "28", "Service": "06:26:45", "Message": "server-01" }
}
But I need an output like:
{
"timestamp":"Nov 28 06:26:26",
"Server":"server-01",
"Service":"dhcpd",
"Message":"DHCPOFFER on 10.45.45.31 to cc:d3:e2:7a:b9:6b via 10.45.0.1",
}
I don't know why it's not printing the whole data. Can anyone help me with this?

The problem with your code is that you did .split(None, 4), which allows only 4 splits on the input string. Since the date contains spaces too, the result of this will be (e.g. for the first line of your input):
['Nov', # timestamp
'28', # Server
'06:26:45', # Service
'server-01', # Message
'dhcpd: DHCPDISCOVER from cc:d3:e2:7a:af:40 via 10.39.192.1']
You even printed this, so I'm surprised you didn't notice something is wrong.
Now, the first element of the list is assigned to the key 'timestamp', the second element to the key 'Server', and so on. This is how you get a dict that looks like:
{ "timestamp": "Nov", "Server": "28", "Service": "06:26:45", "Message": "server-01" }
Instead, you want to split a maximum of five times. The first three elements of the resultant split are the timestamp.
# Don't need that extra list(), since .split() already returns a list
description = line.strip().split(None, 5)
# Join the first three elements,
joined_timestamp = " ".join(description[:3])
# and replace them in the list
# Setting a slice of a list: See https://stackoverflow.com/q/10623302/843953
description[:3] = [joined_timestamp]
Then, your description looks like this:
['Nov 28 06:26:45',
'server-01',
'dhcpd:',
'DHCPDISCOVER from cc:d3:e2:7a:af:40 via 10.39.192.1']
and the elements of fields now correspond to the values in description.
Note that you could replace that entire while i < len(fields)... loop with simply dict2 = dict(zip(fields, description))
P.S.: You might want to clean up other elements of description, such as description[2] = description[2].rstrip(":") to remove the trailing colon in 'dhcpd:'
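
Putting the pieces of this answer together, a minimal sketch of the whole conversion might look like the following. The helper name parse_line and the choice to collect the records in a list are assumptions made for illustration, not part of the original script, and the sketch assumes every line has the syslog-style layout shown in the question.

```python
import json

FIELDS = ['timestamp', 'Server', 'Service', 'Message']

def parse_line(line):
    """Split one syslog-style line into the four fields (hypothetical helper)."""
    # Five splits give: month, day, time, server, service, message
    parts = line.strip().split(None, 5)
    # Merge month, day and time back into a single timestamp string
    parts[:3] = [" ".join(parts[:3])]
    # Drop the trailing colon from the service name, e.g. 'dhcpd:' -> 'dhcpd'
    parts[2] = parts[2].rstrip(":")
    return dict(zip(FIELDS, parts))

# Sample lines copied from the question
sample = [
    "Nov 28 06:26:45 server-01 dhcpd: DHCPDISCOVER from cc:d3:e2:7a:af:40 via 10.39.192.1",
    "Nov 28 06:26:45 server-01 dhcpd: DHCPOFFER on 10.39.255.253 to cc:d3:e2:7a:af:40 via 10.39.192.1",
]
records = [parse_line(line) for line in sample]
print(json.dumps(records, indent=4))
```

To process Logs.txt instead of the inline sample, the list comprehension can iterate over the open file handle, skipping blank lines.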

Related

Read a file and match lines above or below from the matching pattern

I'm reading an input json file, and capturing the array values into a dictionary, by matching tar.gz and printing a line above that (essentially the yaml file).
{"Windows": [
"/home/windows/work/input.yaml",
"/home/windows/work/windows.tar.gz"
],
"Mac": [
"/home/macos/required/utilities/input.yaml",
"/home/macos/required/utilities.tar.gz"
],
"Unix": [
"/home/unix/functional/plugins/input.yaml",
"/home/unix/functional/plugins/Plugin.tar.gz"
]
goes on..
}
Output of the dictionary:
{'/home/windows/work/windows.tar.gz': '/home/windows/work/input.yaml',
'/home/macos/required/utilities/utilities.tar.gz' : '/home/macos/required/input.yaml'
......
}
The problem is that the order of the entries in the JSON can change: (A) the tar.gz entry can come first in the list of values, or (B) the order can be mixed from one key to the next.
Irrespective of the order of the entries, how can I get the output dictionary in the format shown above?
{ "Windows": [
"/home/windows/work/windows.tar.gz",
"/home/windows/work/input.yaml"
],
"Mac": [
"/home/macos/required/utilities/utilities.tar.gz",
"/home/macos/required/input.yaml"
],
"Unix": [
"/home/unix/functional/plugins/Plugin.tar.gz",
"/home/unix/functional/plugins/input.yaml"
]
goes on.. }
mix and match scenario.
{ "Windows": [
"/home/windows/work/windows.tar.gz",
"/home/windows/work/input.yaml"
],
"Mac": [
"/home/macos/required/utilities/input.yaml",
"/home/macos/required/utilities.tar.gz"
],
"Unix": [
"/home/unix/functional/plugins/Plugin.tar.gz",
"/home/unix/functional/plugins/input.yaml"
] }
My code snippet:
import re
def read_input():
    files_to_be_processed = {}
    with open('input.json', 'r') as f:
        lines = f.read().splitlines()
    lines = [line.replace('"', '').replace(" ", '').replace(',', '') for line in lines]
    for index, value in enumerate(lines):
        match = re.match(r".*\.tar\.gz", value)
        if match:
            # take the line just above the tar.gz entry
            j = index - 1 if index > 1 else 0
            for k in range(j, index):
                files_to_be_processed[match.string] = lines[k]
    print(files_to_be_processed)
One approach is the following:
1- Use Python's json module, which makes the whole process much easier.
2- After loading the data with json, check each entry (i.e. Windows/Mac/Unix) for both the tar.gz and the yaml file.
3- Assign the pair to a new dictionary.
Here is a quick example:
import json
def read_input():
    files_to_be_processed = {}
    with open('input.json','r') as f:
        jsonObject = json.load(f)
    for value in jsonObject.items():
        tarGz = ""
        Yaml = ""
        for line in value[1]: #value[0] contains the key (e.g. Windows)
            if line.endswith('.tar.gz'):
                tarGz = line
            elif line.endswith('.yaml'):
                Yaml = line
        files_to_be_processed[tarGz] = Yaml
    print(files_to_be_processed)
read_input()
This code can be shortened and optimised using things like list comprehension and other methods, but it should be a good place to get started
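
As one possible shortened form of the loop above, the pairing can be written so that it does not depend on the order of the two entries. This is only a sketch: the function name pair_archives and the inline sample data are assumptions for illustration, and it assumes each list holds exactly one .tar.gz path and one .yaml path.

```python
def pair_archives(json_dict):
    """Map each .tar.gz path to the .yaml path listed under the same key."""
    result = {}
    for paths in json_dict.values():
        # next() picks the first matching entry, wherever it sits in the list
        tar_gz = next(p for p in paths if p.endswith('.tar.gz'))
        yaml_file = next(p for p in paths if p.endswith('.yaml'))
        result[tar_gz] = yaml_file
    return result

# Mixed ordering, as in the "mix and match" scenario from the question
sample = {
    "Windows": ["/home/windows/work/windows.tar.gz", "/home/windows/work/input.yaml"],
    "Mac": ["/home/macos/required/utilities/input.yaml", "/home/macos/required/utilities.tar.gz"],
}
pairs = pair_archives(sample)
print(pairs)
```

If a list is missing one of the two files, next() raises StopIteration, so real input may need a default value or an explicit check.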
Another way is to transform each list in your input json_dict into a dict that has the keys "yaml" and "gz":
# build a separate inner dict per key (dict.fromkeys(json_dict, dict()) would share one dict across all keys)
json_dict_1 = {key: {} for key in json_dict}
for key in json_dict:
    list_val = json_dict[key]
    for entry in list_val:
        entry_key = 'yaml' if entry.endswith('.yaml') else 'gz'
        json_dict_1[key][entry_key] = entry
print(json_dict_1)
# output for the first sample input:
#{'Windows': {'yaml': '/home/windows/work/input.yaml',
# 'gz': '/home/windows/work/windows.tar.gz'},
# 'Mac': {'yaml': '/home/macos/required/utilities/input.yaml',
# 'gz': '/home/macos/required/utilities.tar.gz'},
# 'Unix': {'yaml': '/home/unix/functional/plugins/input.yaml',
# 'gz': '/home/unix/functional/plugins/Plugin.tar.gz'}}

How to refer to a dictionary using input string that is name of dictionary

I am writing code that creates a list of purchases in Excel (it shows the
price and name of each product, and the total price at the end).
It reads a number of strings, each of which is also the name of a dictionary,
and creates an Excel file with the list of purchases, their prices,
and the total price at the end.
I don't know in advance which dictionary will be named, but I want to get its price,
which is stored as a key-value pair.
Code:
import openpyxl as xl
Milk = {
    "name": "Milk",
    "price": 1,
    "id": "01",
}
Chicken = {
    "name": "Chicken",
    "price": 5,
    "id": "02"
}
wb = xl.load_workbook('List.xlsx')
sheet = wb['Лист1']
command = ''
num = int(input())
for row in range(2,num):
    command = input()
    value = command
    command = sheet.cell(row,1)
    command.value = value
    command = sheet.cell(row,2)
    key = 'price'
    command.value = command.get("price")
wb.save('transactions3.xlsx')
I guess what you might be looking for is locals(), which holds, among other things, the variables available in the local scope:
my_dict = {'apples': 4}
# assume user input is `my_dict`
dict_name = input()
# return the dictionary's values
print(locals()[dict_name])
As a side note, instead of using openpyxl, I would recommend pandas DataFrames, which provide .to_excel and .to_csv (with pandas.read_excel and pandas.read_csv for reading) and which I find much more user-friendly.
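
A common alternative to looking names up in locals() is to keep the product dictionaries inside one enclosing dictionary keyed by name, so that the input string is just an ordinary key. A minimal sketch, assuming the Milk and Chicken data from the question; the names PRODUCTS and price_of are hypothetical, and the purchases list stands in for the repeated input() calls.

```python
# Registry of products keyed by the name the user is expected to type
PRODUCTS = {
    "Milk": {"name": "Milk", "price": 1, "id": "01"},
    "Chicken": {"name": "Chicken", "price": 5, "id": "02"},
}

def price_of(product_name):
    """Return the price for a product name, or None if it is unknown."""
    product = PRODUCTS.get(product_name)
    return product["price"] if product is not None else None

purchases = ["Milk", "Chicken", "Milk"]   # stands in for repeated input() calls
total = sum(price_of(name) for name in purchases)
print(total)
```

This avoids exposing every local variable to user input and makes unknown names easy to detect.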

Access one item from a dict and store it into a variable

I am trying to get all the "uuid"s from an API, and the issue is that they are stored in a dict (I think). Here is how it looks in the API response:
{"guild": {
"_id": "5eba1c5f8ea8c960a61f38ed",
"name": "Creators Club",
"name_lower": "creators club",
"coins": 0,
"coinsEver": 0,
"created": 1589255263630,
"members":
[{ "uuid": "db03ceff87ad4909bababc0e2622aaf8",
"rank": "Guild Master",
"joined": 1589255263630,
"expHistory": {
"2020-06-01": 280,
"2020-05-31": 4701,
"2020-05-30": 0,
"2020-05-29": 518,
"2020-05-28": 1055,
"2020-05-27": 136665,
"2020-05-26": 34806}}]
}
}
Now I am interested in the "uuid" part there. Take note: there are multiple players, anywhere from 1 to 100, and I am going to need every UUID.
Here is what I have done in Python so far to get the UUIDs displayed on the website:
try:
    f = requests.get(
        "https://api.hypixel.net/guild?key=[secret]&id=" + guild).json()
    guildName = f["guild"]["name"]
    guildMembers = f["guild"]["members"]
    members = client.getPlayer(uuid=guildMembers) #this converts UUID to player names
    #I need to store all uuid's in variables and put them at "guildMembers"
That gives me all the member entries, and I will be using client.getPlayer(uuid=---) to convert each UUID into a player name, so I have to loop over every UUID and pass it to client.getPlayer(uuid=---). But first I need to save the UUIDs in variables. In my HTML file I have been using members.uuid to access the UUID, but I don't know how to do the .uuid part in Python.
If you need anything else, just comment :)
List comprehension is a powerful concept:
members = [client.getPlayer(uuid=member['uuid']) for member in guildMembers]
Edit:
If you want to insert the names back into your data (in guildMembers),
use a dictionary comprehension with the {uuid: member_name} format:
members = {member['uuid']: client.getPlayer(uuid=member['uuid']) for member in guildMembers}
Then you can update guildMembers with your results (guildMembers is a list, so each member dict is updated directly):
for member in guildMembers:
    member['name'] = members[member['uuid']]
Assuming that guild is the main dictionary, in which a key called members holds a list of "sub-dictionaries", you can try
uuid = list()
for x in guild['members']:
    uuid.append(x['uuid'])
uuid now holds all the UUIDs.
If I understood the situation correctly, you just need to loop through all the received UUIDs and get each player's data. Something like this:
f = requests.get("https://api.hypixel.net/guild?key=[secret]&id=" + guild).json()
guildName = f["guild"]["name"]
guildMembers = f["guild"]["members"]
guildMembersData = dict() # Here we will save member's data from getPlayer method
for guildMember in guildMembers:
    uuid = guildMember["uuid"]
    memberData = client.getPlayer(uuid=uuid)
    guildMembersData[uuid] = memberData
print(guildMembersData) # Here will be players' Data.
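
To make the answers above concrete without calling the real API, here is a self-contained sketch. The response dictionary is trimmed from the question (the second member is made up for illustration), and lookup_name is a hypothetical stand-in for client.getPlayer, whose actual return value is not shown in the thread.

```python
# Trimmed sample of the parsed response; the second member is invented for the example
response = {
    "guild": {
        "name": "Creators Club",
        "members": [
            {"uuid": "db03ceff87ad4909bababc0e2622aaf8", "rank": "Guild Master"},
            {"uuid": "00000000000000000000000000000001", "rank": "Member"},
        ],
    }
}

def lookup_name(uuid):
    """Hypothetical stand-in for client.getPlayer(uuid=...)."""
    return "player-" + uuid[-4:]

guild_members = response["guild"]["members"]
# Collect every UUID, however many members the guild has
uuids = [member["uuid"] for member in guild_members]
# Map each UUID to whatever the lookup returns
names_by_uuid = {uuid: lookup_name(uuid) for uuid in uuids}
print(uuids)
print(names_by_uuid)
```

Because "members" is a list of dictionaries, each element is indexed with member["uuid"] rather than accessed as an attribute like members.uuid.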

Writing JSON data in python. Format

I have this method that writes JSON data to a file. The title is based on the book, and the data is the book's publisher, date, author, etc. The method works fine if I only want to add one book.
Code
import json
def createJson(title,firstName,lastName,date,pageCount,publisher):
    print "\n*** Inside createJson method for " + title + "***\n";
    data = {}
    data[title] = []
    data[title].append({
        'firstName:', firstName,
        'lastName:', lastName,
        'date:', date,
        'pageCount:', pageCount,
        'publisher:', publisher
    })
    with open('data.json','a') as outfile:
        json.dump(data,outfile , default = set_default)
def set_default(obj):
    if isinstance(obj,set):
        return list(obj)
if __name__ == '__main__':
    createJson("stephen-king-it","stephen","king","1971","233","Viking Press")
JSON File with one book/one method call
{
"stephen-king-it": [
["pageCount:233", "publisher:Viking Press", "firstName:stephen", "date:1971", "lastName:king"]
]
}
However, if I call the method multiple times, thus adding more book data to the JSON file, the format is all wrong. For instance, if I simply call the method twice with a main block of
if __name__ == '__main__':
    createJson("stephen-king-it","stephen","king","1971","233","Viking Press")
    createJson("william-golding-lord of the flies","william","golding","1944","134","Penguin Books")
My JSON file looks like
{
"stephen-king-it": [
["pageCount:233", "publisher:Viking Press", "firstName:stephen", "date:1971", "lastName:king"]
]
} {
"william-golding-lord of the flies": [
["pageCount:134", "publisher:Penguin Books", "firstName:william","lastName:golding", "date:1944"]
]
}
Which is obviously wrong. Is there a simple fix to my method that produces correct JSON? I looked at many simple examples online of writing JSON data in Python, but all of them gave me format errors when I checked them on JSONLint.com. I have been racking my brain trying to fix this problem and editing the file to make it correct, but all my efforts were to no avail. Any help is appreciated. Thank you very much.
Simply appending new objects to your file doesn't create valid JSON. You need to add your new data inside the top-level object, then rewrite the entire file.
This should work:
def createJson(title,firstName,lastName,date,pageCount,publisher):
    print("\n*** Inside createJson method for " + title + "***\n")
    # Load any existing json data,
    # or create an empty object if the file is not found,
    # or is empty
    try:
        with open('data.json') as infile:
            data = json.load(infile)
    except (IOError, ValueError):
        # IOError: file not found; ValueError: file is empty or not valid JSON
        data = {}
    if not data:
        data = {}
    data[title] = []
    data[title].append({
        'firstName:', firstName,
        'lastName:', lastName,
        'date:', date,
        'pageCount:', pageCount,
        'publisher:', publisher
    })
    with open('data.json','w') as outfile:
        json.dump(data,outfile , default = set_default)
A JSON document has a single top-level value, usually an array or an object (dictionary). In your case the file contains two top-level objects, one with the key stephen-king-it and another with william-golding-lord of the flies. Either of these on its own would be okay, but writing them back to back is invalid.
Using an array you could do this:
[
{ "stephen-king-it": [] },
{ "william-golding-lord of the flies": [] }
]
Or a dictionary style format (I would recommend this):
{
"stephen-king-it": [],
"william-golding-lord of the flies": []
}
Also the data you are appending looks like it should be formatted as key value pairs in a dictionary (which would be ideal). You need to change it to this:
data[title].append({
    'firstName': firstName,
    'lastName': lastName,
    'date': date,
    'pageCount': pageCount,
    'publisher': publisher
})
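
Combining the two answers, a minimal Python 3 sketch of the whole method could look like this. The function name add_book, the path parameter, and the use of setdefault (which keeps earlier entries for the same title instead of resetting the list) are assumptions for illustration rather than part of the original code; the demo writes to a temporary directory so it does not touch an existing data.json.

```python
import json
import os
import tempfile

def add_book(path, title, first_name, last_name, date, page_count, publisher):
    """Add one book to the JSON file at `path`, keeping a single top-level object."""
    try:
        with open(path) as infile:
            data = json.load(infile)
    except (FileNotFoundError, ValueError):
        # Missing, empty, or malformed file: start from an empty object
        data = {}
    # Key-value pairs instead of a set of 'key:' strings
    data.setdefault(title, []).append({
        'firstName': first_name,
        'lastName': last_name,
        'date': date,
        'pageCount': page_count,
        'publisher': publisher,
    })
    # Rewrite the whole file so it stays one valid JSON document
    with open(path, 'w') as outfile:
        json.dump(data, outfile, indent=4)
    return data

# Demonstration in a temporary directory (the file name is arbitrary)
demo_path = os.path.join(tempfile.mkdtemp(), 'data.json')
add_book(demo_path, "stephen-king-it", "stephen", "king", "1971", "233", "Viking Press")
add_book(demo_path, "william-golding-lord of the flies", "william", "golding", "1944", "134", "Penguin Books")
with open(demo_path) as f:
    books = json.load(f)
print(sorted(books))
```

Since the values are now plain dictionaries, the set_default hook from the question is no longer needed.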

Python container troubles

Basically, what I am trying to do is generate a JSON list of SSH keys (public and private) on a server using Python. I am using nested dictionaries, and while it works to an extent, the problem is that each user's entry also shows every other user's keys; I need each user's entry to list only the keys that belong to that user.
Below is my code:
def ssh_key_info(key_files):
    for f in key_files:
        c_time = os.path.getctime(f) # gets the creation time of file (f)
        username_list = f.split('/') # splits on the / character
        user = username_list[2] # assigns the element at index 2 from the above split (the username) to the user variable
        key_length_cmd = check_output(['ssh-keygen','-l','-f', f]) # Run the ssh-keygen command on the file (f)
        attr_dict = {}
        attr_dict['Date Created'] = str(datetime.datetime.fromtimestamp(c_time)) # converts file create time to string
        attr_dict['Key_Length]'] = key_length_cmd[0:5] # assigns the first 5 characters of the key_length_cmd variable
        ssh_user_key_dict[f] = attr_dict
        user_dict['SSH_Keys'] = ssh_user_key_dict
        main_dict[user] = user_dict
A list containing the absolute paths of the keys (/home/user/.ssh/id_rsa, for example) is passed to the function. Below is an example of what I receive:
{
"user1": {
"SSH_Keys": {
"/home/user1/.ssh/id_rsa": {
"Date Created": "2017-03-09 01:03:20.995862",
"Key_Length]": "2048 "
},
"/home/user2/.ssh/id_rsa": {
"Date Created": "2017-03-09 01:03:21.457867",
"Key_Length]": "2048 "
},
"/home/user2/.ssh/id_rsa.pub": {
"Date Created": "2017-03-09 01:03:21.423867",
"Key_Length]": "2048 "
},
"/home/user1/.ssh/id_rsa.pub": {
"Date Created": "2017-03-09 01:03:20.956862",
"Key_Length]": "2048 "
}
}
},
As can be seen, user2's key files are included in user1's output. I may be going about this completely wrong, so any pointers are welcome.
Thanks for the replies. I read up on nested dictionaries and found that the top answer to this post helped me solve the issue: What is the best way to implement nested dictionaries?
Instead of all the dictionaries, I simplified the code and now have just one dictionary. This is the working code:
import datetime
import os
from subprocess import check_output
class Vividict(dict):
    def __missing__(self, key): # Sets and returns a new instance
        value = self[key] = type(self)() # retain local pointer to value
        return value # faster to return than dict lookup
main_dict = Vividict()
def ssh_key_info(key_files):
    for f in key_files:
        c_time = os.path.getctime(f)
        username_list = f.split('/')
        user = username_list[2]
        key_bit_cmd = check_output(['ssh-keygen','-l','-f', f])
        date_created = str(datetime.datetime.fromtimestamp(c_time))
        key_type = key_bit_cmd[-5:-2]
        key_bits = key_bit_cmd[0:5]
        main_dict[user]['SSH Keys'][f]['Date Created'] = date_created
        main_dict[user]['SSH Keys'][f]['Key Type'] = key_type
        main_dict[user]['SSH Keys'][f]['Bits'] = key_bits
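
For comparison, the same auto-creating behaviour is often written with collections.defaultdict from the standard library. The sketch below is an alternative to the Vividict class, not the code from the post; the helper name nested_dict is hypothetical, and the file paths and attribute values are placeholders, since running ssh-keygen is outside the scope of the example.

```python
import json
from collections import defaultdict

def nested_dict():
    """Return a dict whose missing keys are created as further nested dicts."""
    return defaultdict(nested_dict)

main_dict = nested_dict()

# Placeholder records: (path, date created, bits) instead of real ssh-keygen output
sample_keys = [
    ("/home/user1/.ssh/id_rsa", "2017-03-09 01:03:20", "2048"),
    ("/home/user2/.ssh/id_rsa", "2017-03-09 01:03:21", "2048"),
]
for path, created, bits in sample_keys:
    user = path.split('/')[2]          # '/home/user1/...' -> 'user1'
    main_dict[user]['SSH Keys'][path]['Date Created'] = created
    main_dict[user]['SSH Keys'][path]['Bits'] = bits

# defaultdict is a dict subclass, so json can serialise it directly
print(json.dumps(main_dict, indent=4))
```

Each user gets a separate inner dictionary on first access, which is what keeps user2's keys out of user1's entry.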
