I have a Python script that pulls data from an XML file and copies it to CSV. I need the data shown in column format, as in Example 1 below. The script currently reports the information one field per line, as shown in Example 2. I need it to report in table format.
Example 1
EventID,TargetUserSid,TargetUserName,TargetDomainName,TargetLogonId
4634, S-1-5-21-2795111079-3225111112-3329435632-1610,grant.larson,AFC,0x3642df8
Example 2
This is the output currently produced by the code below.
EventID 4634
TargetUserSid S-1-5-21-2795111079-3225111112-3329435632-1610
TargetUserName grant.larson
TargetDomainName AFC
TargetLogonId 0x3642df8
LogonType 3
from xml.etree import ElementTree as ET
import pandas as pd
import csv

tree = ET.parse("SecurityLog-rev2.xml")
root = tree.getroot()
ns = "{http://schemas.microsoft.com/win/2004/08/events/event}"

data = []
for eventID in root.findall(".//"):
    if eventID.tag == f"{ns}System":
        for e_id in eventID.iter():
            if e_id.tag == f'{ns}EventID':
                row = "EventID", e_id.text
                data.append(row)
    if eventID.tag == f"{ns}EventData":
        for attr in eventID.iter():
            if attr.tag == f'{ns}Data':
                #print(attr.attrib)
                row = attr.get('Name'), attr.text
                data.append(row)

df = pd.DataFrame.from_dict(data, orient='columns')
df.to_csv('event_log.csv', index=False, header=False)
print(df)
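One way to get the table layout of Example 1 is to build one dict per <Event> instead of appending (name, value) pairs to a single flat list, so pandas turns each event into a row and each Data Name into a column. A minimal sketch, assuming the file is a standard Windows event export with <Event> elements under the root:

from xml.etree import ElementTree as ET
import pandas as pd

ns = "{http://schemas.microsoft.com/win/2004/08/events/event}"
tree = ET.parse("SecurityLog-rev2.xml")
root = tree.getroot()

rows = []
for event in root.iter(f"{ns}Event"):                # one row per <Event>
    row = {}
    event_id = event.find(f"{ns}System/{ns}EventID")
    if event_id is not None:
        row["EventID"] = event_id.text
    for data_el in event.findall(f"{ns}EventData/{ns}Data"):
        row[data_el.get("Name")] = data_el.text      # each Data Name becomes a column
    rows.append(row)

df = pd.DataFrame(rows)                              # pandas aligns columns across events
df.to_csv('event_log.csv', index=False)
print(df)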
Related
I have log files with many lines of the form:
<log uri="Brand" t="2017-01-24T11:33:54" u="Rohan" a="U" ref="00000000-2017-01" desc="This has been updated."></log>
I am trying to convert each line in the log file into a DataFrame and store it in CSV or Excel format. I only want the values of uri, t (which is the timestamp), u (the username), and desc (the description).
Something like this:
uri    Date        Time      User   Description
Brand  2017-01-24  11:33:54  Rohan  This has been updated.
and so on.
As mentioned by @Corralien in the comments, you can use BeautifulSoup (the BeautifulSoup constructor and find_all) to parse each line in your logfile separately, then use the pandas.DataFrame constructor with a list comprehension to make a DataFrame for each line:
import pandas as pd
import bs4  # pip install beautifulsoup4

with open("/tmp/logfile.txt", "r") as f:
    logFile = f.read()

soupObj = bs4.BeautifulSoup(logFile, "html5lib")

dfList = [pd.DataFrame([(x["uri"], *x["t"].split("T"), x["u"], x["desc"])],
                       columns=["uri", "Date", "Time", "User", "Description"])
          for x in soupObj.find_all("log")]

# this block creates an Excel file for each df
for lineNumber, df in enumerate(dfList, start=1):
    df.to_excel(f"logfile_{lineNumber}.xlsx", index=False)
Output :
print(dfList[0])
uri Date Time User Description
0 Brand 2017-01-24 11:33:54 Rohan This has been updated.
Update:
If you need a single DataFrame/spreadsheet for all the lines, use this:
with open("/tmp/logfile.txt", "r") as f:
soupObj = bs4.BeautifulSoup(f, "html5lib")
df = pd.DataFrame([(x["uri"], *x["t"].split("T"), x["u"], x["desc"])
for x in soupObj.find_all("log")],
columns=["uri", "Date", "Time", "User", "Description"])
df.to_excel("logfile.xlsx", index=False)
I am learning my way around Python and right now I need a little bit of help. I have an XML file from a SOAP API that I am failing to convert to CSV. I managed to get the data with the requests library easily. My struggle is converting it to CSV; I end up with headers but no values.
My XML Data :
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<Level2 xmlns="https://xxxxxxxxxx/xxxxxxx">
<Level3>
<ResponseStatus>Success</ResponseStatus>
<ErrorMessage/>
<Message>20 alert(s) generated for this period</Message>
<ProcessingTimeSecs>0.88217689999999993</ProcessingTimeSecs>
<Something1>1</Something1>
<Something2/>
<Something3/>
<Something4/>
<VIP>
<MainVIP>
<Date>20210616</Date>
<RegisteredDate>20210216</RegisteredDate>
<Type>YMBA</Type>
<TypeDescription>TYPE OF ENQUIRY</TypeDescription>
<BusinessName>COMPANY NAME</BusinessName>
<ITNumber>987654321</ITNumber>
<RegistrationNumber>123456789</RegistrationNumber>
<SubscriberNumber>55889977</SubscriberNumber>
<SubscriberReference/>
<TicketNumber>1122336655</TicketNumber>
<SubscriberName>COMPANY NAME 2 </SubscriberName>
<CompletedDate>20210615</CompletedDate>
</MainVIP>
</VIP>
<Something5/>
<Something6/>
<Something7/>
<Something8/>
<Something9/>
<PrincipalSomething10/>
<PrincipalSomething11/>
<PrincipalSomething12/>
<PrincipalSomething13/>
<Something14/>
<Something15/>
<Something16/>
<Something17/>
<Something18/>
<PrincipalSomething19/>
<PrincipalSomething20/>
</Level3>
</Level2>
</soap:Body>
</soap:Envelope>
My Python code looks like this:
import xml.etree.ElementTree as ET
import pandas as pd

cols = ['Date', 'RegisteredDate', 'Type', 'TypeDescription']
rows = []

# parse xml file
xmlparse = ET.parse('xmldata.xml')
root = xmlparse.getroot()

for i in root:
    Date = i.get('Date').text
    RegisteredDate = i.get('RegisteredDate').text
    Type = i.get('Type').text
    TypeDescription = i.get('TypeDescription').text

    rows.append({'Date': Date,
                 'RegisteredDate': RegisteredDate,
                 'Type': Type,
                 'TypeDescription': TypeDescription})

df = pd.DataFrame(rows, columns=cols)
print(df)
df.to_csv('csvdata.csv')
In my approach, I was following the idea from here https://www.geeksforgeeks.org/convert-xml-to-csv-in-python/
You probably don't need to go through ElementTree; you can feed the xml directly to pandas. If I understand you correctly, this should do it:
df = pd.read_xml(path_to_file, xpath="//*[local-name()='MainVIP']")
df = df.iloc[:, :4]
df
Output from your xml above:
Date RegisteredDate Type TypeDescription
0 20210616 20210216 YMBA TYPE OF ENQUIRY
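If you also need the CSV file itself, as in the original goal, the same DataFrame can then be written out directly:

df.to_csv('csvdata.csv', index=False)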
Without any external lib, the code below generates a CSV file.
The idea is to collect the required element data from MainVIP and store it in a list of dicts, then loop over the list and write the data to a file.
import xml.etree.ElementTree as ET
xml = ''' <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<Level2 xmlns="https://xxxxxxxxxx/xxxxxxx">
<Level3>
<ResponseStatus>Success</ResponseStatus>
<ErrorMessage/>
<Message>20 alert(s) generated for this period</Message>
<ProcessingTimeSecs>0.88217689999999993</ProcessingTimeSecs>
<Something1>1</Something1>
<Something2/>
<Something3/>
<Something4/>
<VIP>
<MainVIP>
<Date>20210616</Date>
<RegisteredDate>20210216</RegisteredDate>
<Type>YMBA</Type>
<TypeDescription>TYPE OF ENQUIRY</TypeDescription>
<BusinessName>COMPANY NAME</BusinessName>
<ITNumber>987654321</ITNumber>
<RegistrationNumber>123456789</RegistrationNumber>
<SubscriberNumber>55889977</SubscriberNumber>
<SubscriberReference/>
<TicketNumber>1122336655</TicketNumber>
<SubscriberName>COMPANY NAME 2 </SubscriberName>
<CompletedDate>20210615</CompletedDate>
</MainVIP>
</VIP>
<Something5/>
<Something6/>
<Something7/>
<Something8/>
<Something9/>
<PrincipalSomething10/>
<PrincipalSomething11/>
<PrincipalSomething12/>
<PrincipalSomething13/>
<Something14/>
<Something15/>
<Something16/>
<Something17/>
<Something18/>
<PrincipalSomething19/>
<PrincipalSomething20/>
</Level3>
</Level2>
</soap:Body>
</soap:Envelope>'''
cols = ['Date', 'RegisteredDate', 'Type', 'TypeDescription']
rows = []
NS = '{https://xxxxxxxxxx/xxxxxxx}'
root = ET.fromstring(xml)

for vip in root.findall(f'.//{NS}MainVIP'):
    rows.append({c: vip.find(NS + c).text for c in cols})

with open('out.csv', 'w') as f:
    f.write(','.join(cols) + '\n')
    for row in rows:
        f.write(','.join(row[c] for c in cols) + '\n')
out.csv
Date,RegisteredDate,Type,TypeDescription
20210616,20210216,YMBA,TYPE OF ENQUIRY
I need your help, please. I'm trying to write Python code that takes an xlsx file as input, reads the various cell fields, and then generates an XML file. I had trouble reading dates, so I used pandas so that I could work with DataFrames. I can now read the test data as well, but I am not succeeding in writing this data into the XML; can you please help me?
import pandas as pd
import datetime
import json
from xml.etree import ElementTree as ET

df = pd.read_excel('parser.xlsx')  # read the Excel file
df['data autorizzazio'] = pd.to_datetime(df['data autorizzazio'])
#df['data movimentazio'] = pd.to_datetime(df['data movimentazio'])
#df.head()
#df.info()

ET.register_namespace("CBIPaymentRequest", "http://www.w3.org/2001/XMLSchema-instance")
root = ET.Element("{http://www.w3.org/2001/XMLSchema-instance}CBIPaymentRequest")
root1 = ET.SubElement(root, "GrpHdr")
#root2 = ET.SubElement(root, "PmtInf")

MsgId = ET.SubElement(root1, 'MsgId')
MsgId = df.loc[0].values[1]  # set the value of the cell of interest
MsgId.text = df['data autorizzazio'].values[1]
#MsgId = MsgId
print(MsgId)

Prova = ET.SubElement(root1, 'PROVA')
Prova = df.loc[0].values[5]
Prova1 = df.__setitem__(Prova, 'Prova')
#Prova.text = df['Saluto5'].values[1]
print(Prova)

tree = ET.ElementTree(root)
tree.write("pandas_output_test_1.xml")
At the moment I generate this XML (unfortunately empty):
<CBIPaymentRequest:CBIPaymentRequest xmlns:CBIPaymentRequest="http://www.w3.org/2001/XMLSchema-instance">
<GrpHdr>
<MsgId/>
<PROVA/>
</GrpHdr>
</CBIPaymentRequest:CBIPaymentRequest>
I would like it to be populated with the data I read from the xlsx file.
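The XML comes out empty because lines like MsgId = df.loc[0].values[1] rebind the Python name to the cell value instead of setting the element's text, so the <MsgId/> element never receives any content. A minimal sketch of the fix (the column positions are taken from the question and may need adjusting):

MsgId = ET.SubElement(root1, 'MsgId')
MsgId.text = str(df.loc[0].values[1])   # keep the Element object; assign the cell value to .text

Prova = ET.SubElement(root1, 'PROVA')
Prova.text = str(df.loc[0].values[5])

tree = ET.ElementTree(root)
tree.write("pandas_output_test_1.xml", encoding="utf-8", xml_declaration=True)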
I have Python code that loops through multiple locations and pulls data from a third-party API. In the code below, sublocation_ids are the location IDs coming from a directory.
As you can see from the code, the data gets converted to a DataFrame and then saved to an Excel file. The issue I am facing is that if the API does not return data for publication_timestamp for a certain location, the loop stops and does not proceed, and I get the error shown below the code.
How do I avoid this and skip to the next iteration if no data is returned by the API?
for sub in sublocation_ids:
    city_num_int = sub['id']
    city_num_str = str(city_num_int)
    city_name = sub['name']
    filter_text_new = filter_text.format(city_num_str)
    data = json.dumps({"filters": [filter_text_new], "sort_by": "created_at", "size": 2})
    r = requests.post(url=api_endpoint, data=data).json()
    articles_list = r["articles"]
    articles_list_normalized = json_normalize(articles_list)
    df = articles_list_normalized
    df['publication_timestamp'] = pd.to_datetime(df['publication_timestamp'])
    df['publication_timestamp'] = df['publication_timestamp'].apply(lambda x: x.now().strftime('%Y-%m-%d'))
    df.to_excel(writer, sheet_name=city_name)
    writer.save()
KeyError: 'publication_timestamp'
Change this bit of code:
df = articles_list_normalized
if 'publication_timestamp' in df.columns:
    df['publication_timestamp'] = pd.to_datetime(df['publication_timestamp'])
    df['publication_timestamp'] = df['publication_timestamp'].apply(lambda x: x.now().strftime('%Y-%m-%d'))
    df.to_excel(writer, sheet_name=city_name)
else:
    continue
If the API literally returns no data, i.e. {}, then you might even do the check before normalizing it:
if articles_list:
    df = json_normalize(articles_list)
    # ... rest of code ...
else:
    continue
Python 3.4.3 | Anaconda 2.3 | Pandas
I have filtered some data from an extensive Excel file. I have it running for two names:
import pandas as pd
import sys

# file location
R1 = input('Data do Relatório desejado (dd.mm) ---> ')
loc = r'C:\Users\lucas.mascia\Downloads\relatorio-{0}.xlsx'.format(R1)

######################################################################

# requesters
ps_sol = ["Mauro Cavalheiro Junior", "Aline Oliveira"]

# applying filters
for name in ps_sol:
    # opening file
    df = pd.read_excel(loc)
    dfps = df[[2, 15, 16, 17]]

    # apply filter
    f1 = dfps[(dfps['Cliente'] == "POSTAL SAUDE")
              & (dfps['Nome do solicitante'] == name)]

    # print info
    print('''
=============================================================
Relatorio do dia: {}
Cliente: POSTAL SAUDE
Solicitante: {}
=============================================================
'''.format(R1, name))
    print(f1)

    f1.to_excel('C:/Users/lucas.mascia/Downloads/ps_sol.xlsx', sheet_name=name)
At the end I am trying to export to another .xlsx file, but it is only saving the info for the last name in the list.
I want it to save the results for every name I list in ps_sol.
Help please (:
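A sketch of one possible fix, assuming each name should end up on its own sheet of a single workbook: calling f1.to_excel with a file path overwrites ps_sol.xlsx on every iteration, so only the last name survives. Opening one pd.ExcelWriter and passing it to every to_excel call keeps all the sheets (the names and column selection below are reused from the question):

import pandas as pd

df = pd.read_excel(loc)        # read the report once
dfps = df[[2, 15, 16, 17]]

with pd.ExcelWriter(r'C:/Users/lucas.mascia/Downloads/ps_sol.xlsx') as writer:
    for name in ps_sol:
        f1 = dfps[(dfps['Cliente'] == "POSTAL SAUDE")
                  & (dfps['Nome do solicitante'] == name)]
        f1.to_excel(writer, sheet_name=name)   # one sheet per requester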