I have a Python script that pulls data from an XML file and copies it to CSV. I need the data shown in column format, as in Example 1 below. The script currently reports the information one field per line, as shown in Example 2. I need it to report in table format.
Example 1
EventID,TargetUserSid,TargetUserName,TargetDomainName,TargetLogonId
4634, S-1-5-21-2795111079-3225111112-3329435632-1610,grant.larson,AFC,0x3642df8
Example 2
This is the output currently produced by the code below.
EventID 4634
TargetUserSid S-1-5-21-2795111079-3225111112-3329435632-1610
TargetUserName grant.larson
TargetDomainName AFC
TargetLogonId 0x3642df8
LogonType 3
from xml.etree import ElementTree as ET
import pandas as pd
import csv

tree = ET.parse("SecurityLog-rev2.xml")
root = tree.getroot()
ns = "{http://schemas.microsoft.com/win/2004/08/events/event}"

data = []
for eventID in root.findall(".//"):
    if eventID.tag == f"{ns}System":
        for e_id in eventID.iter():
            if e_id.tag == f'{ns}EventID':
                row = "EventID", e_id.text
                data.append(row)
    if eventID.tag == f"{ns}EventData":
        for attr in eventID.iter():
            if attr.tag == f'{ns}Data':
                #print(attr.attrib)
                row = attr.get('Name'), attr.text
                data.append(row)

df = pd.DataFrame.from_dict(data, orient='columns')
df.to_csv('event_log.csv', index=False, header=False)
print(df)
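One way to get the table layout of Example 1 is to build one dict per <Event> instead of appending (name, value) pairs to a single flat list, so pandas turns each event into a row and each Data Name into a column. A minimal sketch, assuming the file is a standard Windows event export with <Event> elements under the root:

from xml.etree import ElementTree as ET
import pandas as pd

ns = "{http://schemas.microsoft.com/win/2004/08/events/event}"
tree = ET.parse("SecurityLog-rev2.xml")
root = tree.getroot()

rows = []
for event in root.iter(f"{ns}Event"):                # one row per <Event>
    row = {}
    event_id = event.find(f"{ns}System/{ns}EventID")
    if event_id is not None:
        row["EventID"] = event_id.text
    for data_el in event.findall(f"{ns}EventData/{ns}Data"):
        row[data_el.get("Name")] = data_el.text      # each Data Name becomes a column
    rows.append(row)

df = pd.DataFrame(rows)                              # pandas aligns columns across events
df.to_csv('event_log.csv', index=False)
print(df)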
Related
I have log files with many lines of the form:
<log uri="Brand" t="2017-01-24T11:33:54" u="Rohan" a="U" ref="00000000-2017-01" desc="This has been updated."></log>
I am trying to convert each line in the log file into a DataFrame and store it in CSV or Excel format. I only want the values of uri, t (which is the timestamp), u (the username), and desc (the description).
Something like this:
uri    Date        Time      User   Description
Brand  2017-01-24  11:33:54  Rohan  This has been updated.
and so on.
As mentioned by @Corralien in the comments, you can use BeautifulSoup (the BeautifulSoup constructor and find_all) to parse each line in your logfile separately, then use the pandas.DataFrame constructor with a list comprehension to make a DataFrame for each line:
import pandas as pd
import bs4  # pip install beautifulsoup4

with open("/tmp/logfile.txt", "r") as f:
    logFile = f.read()

soupObj = bs4.BeautifulSoup(logFile, "html5lib")

dfList = [pd.DataFrame([(x["uri"], *x["t"].split("T"), x["u"], x["desc"])],
                       columns=["uri", "Date", "Time", "User", "Description"])
          for x in soupObj.find_all("log")]

# this block creates an Excel file for each df
for lineNumber, df in enumerate(dfList, start=1):
    df.to_excel(f"logfile_{lineNumber}.xlsx", index=False)
Output :
print(dfList[0])
uri Date Time User Description
0 Brand 2017-01-24 11:33:54 Rohan This has been updated.
Update:
If you need a single DataFrame/spreadsheet for all the lines, use this:
with open("/tmp/logfile.txt", "r") as f:
soupObj = bs4.BeautifulSoup(f, "html5lib")
df = pd.DataFrame([(x["uri"], *x["t"].split("T"), x["u"], x["desc"])
for x in soupObj.find_all("log")],
columns=["uri", "Date", "Time", "User", "Description"])
df.to_excel("logfile.xlsx", index=False)
I am learning my way around Python and right now I need a little bit of help. I have an XML file from a SOAP API that I am failing to convert to CSV. I managed to get the data with the requests library easily. My struggle is converting it to CSV; I end up with headers but no values.
My XML Data :
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<Level2 xmlns="https://xxxxxxxxxx/xxxxxxx">
<Level3>
<ResponseStatus>Success</ResponseStatus>
<ErrorMessage/>
<Message>20 alert(s) generated for this period</Message>
<ProcessingTimeSecs>0.88217689999999993</ProcessingTimeSecs>
<Something1>1</Something1>
<Something2/>
<Something3/>
<Something4/>
<VIP>
<MainVIP>
<Date>20210616</Date>
<RegisteredDate>20210216</RegisteredDate>
<Type>YMBA</Type>
<TypeDescription>TYPE OF ENQUIRY</TypeDescription>
<BusinessName>COMPANY NAME</BusinessName>
<ITNumber>987654321</ITNumber>
<RegistrationNumber>123456789</RegistrationNumber>
<SubscriberNumber>55889977</SubscriberNumber>
<SubscriberReference/>
<TicketNumber>1122336655</TicketNumber>
<SubscriberName>COMPANY NAME 2 </SubscriberName>
<CompletedDate>20210615</CompletedDate>
</MainVIP>
</VIP>
<Something5/>
<Something6/>
<Something7/>
<Something8/>
<Something9/>
<PrincipalSomething10/>
<PrincipalSomething11/>
<PrincipalSomething12/>
<PrincipalSomething13/>
<Something14/>
<Something15/>
<Something16/>
<Something17/>
<Something18/>
<PrincipalSomething19/>
<PrincipalSomething20/>
</Level3>
</Level2>
</soap:Body>
</soap:Envelope>
My Python code looks like this:
import xml.etree.ElementTree as ET
import pandas as pd

cols = ['Date', 'RegisteredDate', 'Type', 'TypeDescription']
rows = []

# parse xml file
xmlparse = ET.parse('xmldata.xml')
root = xmlparse.getroot()

for i in root:
    Date = i.get('Date').text
    RegisteredDate = i.get('RegisteredDate').text
    Type = i.get('Type').text
    TypeDescription = i.get('TypeDescription').text

    rows.append({'Date': Date,
                 'RegisteredDate': RegisteredDate,
                 'Type': Type,
                 'TypeDescription': TypeDescription})

df = pd.DataFrame(rows, columns=cols)
print(df)
df.to_csv('csvdata.csv')
In my approach, I was following the idea from here https://www.geeksforgeeks.org/convert-xml-to-csv-in-python/
You probably don't need to go through ElementTree; you can feed the xml directly to pandas. If I understand you correctly, this should do it:
df = pd.read_xml(path_to_file, xpath="//*[local-name()='MainVIP']")
df = df.iloc[:, :4]
df
Output from your xml above:
Date RegisteredDate Type TypeDescription
0 20210616 20210216 YMBA TYPE OF ENQUIRY
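If you also need the CSV file itself, as in the original goal, the same DataFrame can then be written out directly:

df.to_csv('csvdata.csv', index=False)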
Without any external lib, the code below generates a CSV file.
The idea is to collect the required element data from MainVIP and store it in a list of dicts, then loop over the list and write the data to a file.
import xml.etree.ElementTree as ET
xml = ''' <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<Level2 xmlns="https://xxxxxxxxxx/xxxxxxx">
<Level3>
<ResponseStatus>Success</ResponseStatus>
<ErrorMessage/>
<Message>20 alert(s) generated for this period</Message>
<ProcessingTimeSecs>0.88217689999999993</ProcessingTimeSecs>
<Something1>1</Something1>
<Something2/>
<Something3/>
<Something4/>
<VIP>
<MainVIP>
<Date>20210616</Date>
<RegisteredDate>20210216</RegisteredDate>
<Type>YMBA</Type>
<TypeDescription>TYPE OF ENQUIRY</TypeDescription>
<BusinessName>COMPANY NAME</BusinessName>
<ITNumber>987654321</ITNumber>
<RegistrationNumber>123456789</RegistrationNumber>
<SubscriberNumber>55889977</SubscriberNumber>
<SubscriberReference/>
<TicketNumber>1122336655</TicketNumber>
<SubscriberName>COMPANY NAME 2 </SubscriberName>
<CompletedDate>20210615</CompletedDate>
</MainVIP>
</VIP>
<Something5/>
<Something6/>
<Something7/>
<Something8/>
<Something9/>
<PrincipalSomething10/>
<PrincipalSomething11/>
<PrincipalSomething12/>
<PrincipalSomething13/>
<Something14/>
<Something15/>
<Something16/>
<Something17/>
<Something18/>
<PrincipalSomething19/>
<PrincipalSomething20/>
</Level3>
</Level2>
</soap:Body>
</soap:Envelope>'''
cols = ['Date', 'RegisteredDate', 'Type', 'TypeDescription']
rows = []
NS = '{https://xxxxxxxxxx/xxxxxxx}'
root = ET.fromstring(xml)

for vip in root.findall(f'.//{NS}MainVIP'):
    rows.append({c: vip.find(NS + c).text for c in cols})

with open('out.csv', 'w') as f:
    f.write(','.join(cols) + '\n')
    for row in rows:
        f.write(','.join(row[c] for c in cols) + '\n')
out.csv
Date,RegisteredDate,Type,TypeDescription
20210616,20210216,YMBA,TYPE OF ENQUIRY
I need your help, please. I'm trying to write Python code that takes an xlsx file as input, reads the various cell fields, and then generates an XML file. I had trouble reading dates, so I used pandas so that I could work with DataFrames. I can now read the test data as well, but I am not succeeding in writing this data into the XML; can you please help me?
import pandas as pd
import datetime
import json
from xml.etree import ElementTree as ET

df = pd.read_excel('parser.xlsx')  # read the Excel file
df['data autorizzazio'] = pd.to_datetime(df['data autorizzazio'])
#df['data movimentazio'] = pd.to_datetime(df['data movimentazio'])
#df.head()
#df.info()

ET.register_namespace("CBIPaymentRequest", "http://www.w3.org/2001/XMLSchema-instance")
root = ET.Element("{http://www.w3.org/2001/XMLSchema-instance}CBIPaymentRequest")
root1 = ET.SubElement(root, "GrpHdr")
#root2 = ET.SubElement(root, "PmtInf")

MsgId = ET.SubElement(root1, 'MsgId')
MsgId = df.loc[0].values[1]  # set the value of the cell of interest
MsgId.text = df['data autorizzazio'].values[1]
#MsgId = MsgId
print(MsgId)

Prova = ET.SubElement(root1, 'PROVA')
Prova = df.loc[0].values[5]
Prova1 = df.__setitem__(Prova, 'Prova')
#Prova.text = df['Saluto5'].values[1]
print(Prova)

tree = ET.ElementTree(root)
tree.write("pandas_output_test_1.xml")
At the moment I generate this XML (unfortunately empty):
<CBIPaymentRequest:CBIPaymentRequest xmlns:CBIPaymentRequest="http://www.w3.org/2001/XMLSchema-instance">
<GrpHdr>
<MsgId/>
<PROVA/>
</GrpHdr>
</CBIPaymentRequest:CBIPaymentRequest>
I would like it to be populated with the data I read from the xlsx file.
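The XML comes out empty because lines like MsgId = df.loc[0].values[1] rebind the Python name to the cell value instead of setting the element's text, so the <MsgId/> element never receives any content. A minimal sketch of the fix (the column positions are taken from the question and may need adjusting):

MsgId = ET.SubElement(root1, 'MsgId')
MsgId.text = str(df.loc[0].values[1])   # keep the Element object; assign the cell value to .text

Prova = ET.SubElement(root1, 'PROVA')
Prova.text = str(df.loc[0].values[5])

tree = ET.ElementTree(root)
tree.write("pandas_output_test_1.xml", encoding="utf-8", xml_declaration=True)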
I have Python code that loops through multiple locations and pulls data from a third-party API. In the code below, sublocation_ids are the location IDs coming from a directory.
As you can see from the code, the data gets converted to a DataFrame and then saved to an Excel file. The issue I am facing is that if the API does not return data for publication_timestamp for a certain location, the loop stops and does not proceed, and I get the error shown below the code.
How do I avoid this and skip to the next iteration if no data is returned by the API?
for sub in sublocation_ids:
    city_num_int = sub['id']
    city_num_str = str(city_num_int)
    city_name = sub['name']
    filter_text_new = filter_text.format(city_num_str)
    data = json.dumps({"filters": [filter_text_new], "sort_by": "created_at", "size": 2})
    r = requests.post(url=api_endpoint, data=data).json()
    articles_list = r["articles"]
    articles_list_normalized = json_normalize(articles_list)
    df = articles_list_normalized
    df['publication_timestamp'] = pd.to_datetime(df['publication_timestamp'])
    df['publication_timestamp'] = df['publication_timestamp'].apply(lambda x: x.now().strftime('%Y-%m-%d'))
    df.to_excel(writer, sheet_name=city_name)
    writer.save()
KeyError: 'publication_timestamp'
Change this bit of code:
df = articles_list_normalized
if 'publication_timestamp' in df.columns:
    df['publication_timestamp'] = pd.to_datetime(df['publication_timestamp'])
    df['publication_timestamp'] = df['publication_timestamp'].apply(lambda x: x.now().strftime('%Y-%m-%d'))
    df.to_excel(writer, sheet_name=city_name)
else:
    continue
If the API literally returns no data, i.e. {}, then you might even do the check before normalizing it:
if articles_list:
    df = json_normalize(articles_list)
    # ... rest of code ...
else:
    continue
Python 3.4.3 | Anaconda 2.3 | Pandas
I have filtered some data from an extensive Excel file. I have it running for two names:
import pandas as pd
import sys

# file location
R1 = input('Data do Relatório desejado (dd.mm) ---> ')
loc = r'C:\Users\lucas.mascia\Downloads\relatorio-{0}.xlsx'.format(R1)

######################################################################

# requesters
ps_sol = ["Mauro Cavalheiro Junior", "Aline Oliveira"]

# applying filters
for name in ps_sol:
    # opening file
    df = pd.read_excel(loc)
    dfps = df[[2, 15, 16, 17]]

    # apply filter
    f1 = dfps[(dfps['Cliente'] == "POSTAL SAUDE")
              & (dfps['Nome do solicitante'] == name)]

    # print info
    print('''
=============================================================
Relatorio do dia: {}
Cliente: POSTAL SAUDE
Solicitante: {}
=============================================================
'''.format(R1, name))
    print(f1)

    f1.to_excel('C:/Users/lucas.mascia/Downloads/ps_sol.xlsx', sheet_name=name)
At the end I am trying to export to another .xlsx file, but it is only saving the info for the last name in the list.
I want it to save the results for every name I list in ps_sol.
Help please (:
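A sketch of one possible fix, assuming each name should end up on its own sheet of a single workbook: calling f1.to_excel with a file path overwrites ps_sol.xlsx on every iteration, so only the last name survives. Opening one pd.ExcelWriter and passing it to every to_excel call keeps all the sheets (the names and column selection below are reused from the question):

import pandas as pd

df = pd.read_excel(loc)        # read the report once
dfps = df[[2, 15, 16, 17]]

with pd.ExcelWriter(r'C:/Users/lucas.mascia/Downloads/ps_sol.xlsx') as writer:
    for name in ps_sol:
        f1 = dfps[(dfps['Cliente'] == "POSTAL SAUDE")
                  & (dfps['Nome do solicitante'] == name)]
        f1.to_excel(writer, sheet_name=name)   # one sheet per requester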