I have log files that contain many lines of the form:
<log uri="Brand" t="2017-01-24T11:33:54" u="Rohan" a="U" ref="00000000-2017-01" desc="This has been updated."></log>
I am trying to convert each line of the log file into a DataFrame and store it in CSV or Excel format. I only want the values of uri, t (the timestamp), u (the username) and desc (the description).
Something like this:
uri    Date        Time      User   Description
Brand  2017-01-24  11:33:54  Rohan  This has been updated.
and so on.
As mentioned by @Corralien in the comments, you can use BeautifulSoup (the BeautifulSoup constructor and find_all) to parse each line of your log file separately, then use the pandas.DataFrame constructor in a list comprehension to make a DataFrame for each line:
import pandas as pd
import bs4  # pip install beautifulsoup4 html5lib

with open("/tmp/logfile.txt", "r") as f:
    logFile = f.read()

soupObj = bs4.BeautifulSoup(logFile, "html5lib")

# One single-row DataFrame per <log> element; "t" is split into Date and Time
dfList = [pd.DataFrame([(x["uri"], *x["t"].split("T"), x["u"], x["desc"])],
                       columns=["uri", "Date", "Time", "User", "Description"])
          for x in soupObj.find_all("log")]

# This block creates one Excel file per DataFrame
for lineNumber, df in enumerate(dfList, start=1):
    df.to_excel(f"logfile_{lineNumber}.xlsx", index=False)
Output:
print(dfList[0])
     uri        Date      Time   User             Description
0  Brand  2017-01-24  11:33:54  Rohan  This has been updated.
Update:
If you need a single DataFrame/spreadsheet for all the lines, use this:
with open("/tmp/logfile.txt", "r") as f:
    soupObj = bs4.BeautifulSoup(f, "html5lib")

df = pd.DataFrame([(x["uri"], *x["t"].split("T"), x["u"], x["desc"])
                   for x in soupObj.find_all("log")],
                  columns=["uri", "Date", "Time", "User", "Description"])
df.to_excel("logfile.xlsx", index=False)
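Because each sample line is itself well-formed XML, a standard-library-only variant is also possible. The sketch below is an alternative to the BeautifulSoup approach above, not part of the original answer. The helper name `parse_log_lines` is made up for illustration, and it assumes every non-empty line is a single well-formed `<log ...></log>` element.

```python
import csv
import io
import xml.etree.ElementTree as ET

COLUMNS = ["uri", "Date", "Time", "User", "Description"]

def parse_log_lines(lines):
    """Turn '<log ...></log>' lines into row lists (hypothetical helper)."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        attrs = ET.fromstring(line).attrib
        # "2017-01-24T11:33:54" -> ("2017-01-24", "T", "11:33:54")
        date, _, time = attrs["t"].partition("T")
        rows.append([attrs["uri"], date, time, attrs["u"], attrs["desc"]])
    return rows

sample = [
    '<log uri="Brand" t="2017-01-24T11:33:54" u="Rohan" a="U" '
    'ref="00000000-2017-01" desc="This has been updated."></log>',
]
rows = parse_log_lines(sample)

# Written to an in-memory buffer here; swap in open("logfile.csv", "w", newline="")
# to produce a real file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(COLUMNS)
writer.writerows(rows)
print(buffer.getvalue())
```

If a DataFrame is still wanted, `pd.DataFrame(rows, columns=COLUMNS)` should accept the same rows.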
I'm working on Python code that updates a .csv file of token price and volume data using gate.io's API. It should check whether the file is up to date and, if not, append the most recent hours' data. The code below isn't throwing any errors, but it's not working. My columns are all in the same order as they are in the code. Any assistance would be greatly appreciated, thank you.
import requests
import pandas as pd
from datetime import datetime
# Define API endpoint and parameters
host = "https://api.gateio.ws"
prefix = "/api/v4"
url = '/spot/candlesticks'
currency_pair = "BTC_USDT"
interval = "1h"
# Read the existing data from the csv file
df = pd.read_csv("price_calcs.csv")
# Extract the last timestamp from the csv file
last_timestamp = df["time1"].iloc[-1]
# Convert the timestamp to datetime and add an hour to get the new "from" parameter
from_time = datetime.utcfromtimestamp(last_timestamp).strftime('%Y-%m-%d %H:%M:%S')
to_time = datetime.utcnow().strftime('%Y-%m-%d %H:%M:%S')
# Use the last timestamp to make a 'GET' request to the API to get the latest hourly data for the token
query_params = {"currency_pair": currency_pair, "from": from_time, "to": to_time, "interval": interval}
r = requests.get(host + prefix + url, params=query_params)
# Append the new data to the existing data from the csv file
new_data = pd.DataFrame(r.json(), columns=["time1", "volume1", "close1", "high1", "low1", "open1", "volume2"])
df = pd.concat([df, new_data])
# Write the updated data to the csv file
df.to_csv("price_calcs.csv", index=False)
Never mind, I figured it out myself.
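The thread does not record what the fix was. One likely culprit, assuming gate.io's v4 `/spot/candlesticks` endpoint expects `from` and `to` as Unix timestamps in seconds rather than formatted date strings, is the `strftime` conversion; the code also never adds the hour that its own comment mentions. The sketch below only builds the query parameters and makes no network call. The helper name `build_query_params` and the example timestamps are made up for illustration.

```python
import time

def build_query_params(last_timestamp, now=None,
                       currency_pair="BTC_USDT", interval="1h"):
    """Build candlestick query parameters (hypothetical helper).

    Assumes the API takes "from"/"to" as integer Unix seconds.
    """
    if now is None:
        now = int(time.time())
    return {
        "currency_pair": currency_pair,
        "interval": interval,
        # Start one hour after the last stored candle so it is not fetched again
        "from": int(last_timestamp) + 3600,
        "to": int(now),
    }

# Example values only; in the question, last_timestamp comes from df["time1"].iloc[-1]
params = build_query_params(last_timestamp=1700000000, now=1700010000)
print(params)
```

The resulting dictionary would be passed as `params=` to `requests.get`. If `from` ends up later than `to`, the file is already up to date and the request can be skipped.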
I have a Python script that pulls XML data and writes it to a CSV file. I need the data laid out in table format, with one column per field and one row per event, as in Example 1. Instead, the script writes one name/value pair per line, as in Example 2.
Example 1
EventID,TargetUserSid,TargetUserName,TargetDomainName,TargetLogonId
4634, S-1-5-21-2795111079-3225111112-3329435632-1610,grant.larson,AFC,0x3642df8
Example 2
These are the results of the code below.
EventID 4634
TargetUserSid S-1-5-21-2795111079-3225111112-3329435632-1610
TargetUserName grant.larson
TargetDomainName AFC
TargetLogonId 0x3642df8
LogonType 3
from xml.etree import ElementTree as ET
import pandas as pd
import csv

tree = ET.parse("SecurityLog-rev2.xml")
root = tree.getroot()
ns = "{http://schemas.microsoft.com/win/2004/08/events/event}"
data = []
for eventID in root.findall(".//"):
    if eventID.tag == f"{ns}System":
        for e_id in eventID.iter():
            if e_id.tag == f'{ns}EventID':
                row = "EventID", e_id.text
                data.append(row)
    if eventID.tag == f"{ns}EventData":
        for attr in eventID.iter():
            if attr.tag == f'{ns}Data':
                #print(attr.attrib)
                row = attr.get('Name'), attr.text
                data.append(row)

df = pd.DataFrame.from_dict(data, orient='columns')
df.to_csv('event_log.csv', index=False, header=False)
print(df)
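One way to get the table layout of Example 1 is to build one dictionary per event instead of one tuple per field. This is a minimal sketch, assuming each event is an `Event` element in the same namespace with a `System/EventID` child and named `Data` children; the helper name `event_rows` and the inline sample XML are made up for illustration and stand in for SecurityLog-rev2.xml.

```python
import csv
import io
from xml.etree import ElementTree as ET

NS = "{http://schemas.microsoft.com/win/2004/08/events/event}"

def event_rows(root):
    """Return one dict per <Event>, keyed by field name (hypothetical helper)."""
    rows = []
    for event in root.iter(f"{NS}Event"):
        row = {}
        event_id = event.find(f"{NS}System/{NS}EventID")
        if event_id is not None:
            row["EventID"] = event_id.text
        for data in event.iter(f"{NS}Data"):
            name = data.get("Name")
            if name:
                row[name] = data.text
        rows.append(row)
    return rows

# Made-up sample standing in for the real log file
sample = """<Events>
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System><EventID>4634</EventID></System>
  <EventData>
    <Data Name="TargetUserName">grant.larson</Data>
    <Data Name="TargetDomainName">AFC</Data>
  </EventData>
</Event>
</Events>"""
rows = event_rows(ET.fromstring(sample))

# Collect column names in first-seen order, since events may carry different fields
fieldnames = []
for row in rows:
    for key in row:
        if key not in fieldnames:
            fieldnames.append(key)

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

With pandas, `pd.DataFrame(rows).to_csv('event_log.csv', index=False)` should give the same layout from the same list of dictionaries.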
I'm trying to get all tweets from 2018-01-01 until now from various firms.
My code runs, but I do not get tweets for the whole time range. Sometimes I only get the tweets from today and yesterday, or from mid-April up to now, but not from the beginning of 2018. I then get the message: [!] No more data! Scraping will stop now.
import csv
import pandas as pd
import twint

ticker = []
# Read the CSV file of company tickers into a list
with open('C:\\Users\\veron\\Desktop\\Test.csv', newline='') as inputfile:
    for row in csv.reader(inputfile):
        ticker.append(row[0])

# Get tweets for every ticker in the list
for i in ticker:
    searchstring = (f"{i} since:2018-01-01")
    c = twint.Config()
    c.Search = searchstring
    c.Lang = "en"
    c.Panda = True
    c.Custom["tweet"] = ["date", "username", "tweet"]
    c.Store_csv = True
    c.Output = f"{i}.csv"
    twint.run.Search(c)
    df = pd.read_csv(f"{i}.csv")
    df['company'] = i
    df.to_csv(f"{i}.csv", index=False)
Has anyone had the same issue and have any tips?
You need to add the configuration parameter Since separately. For example:
c.Since = "2018-01-01"
Similarly for Until:
c.Until = "2017-12-27"
The official documentation might be helpful.
Since (string) - Filter Tweets sent since date, works only with twint.run.Search (Example: 2017-12-27).
Until (string) - Filter Tweets sent until date, works only with twint.run.Search (Example: 2017-12-27).
I have a Person class that has two methods, admin sign-in and log-in.
def admin_sign_in(self):  # this method makes a CSV file of admin usernames and passwords
    info = {'user_name' : [self.username] , 'password' : [self.password]}
    self.admin_df = pd.read_csv('admin_file.csv',sep = ',')
    self.admin_df= pd.DataFrame(info)
    c = self.admin_df.index.values[-1]
    self.admin_df.loc[c+1 ,['user_name','password']]
    x = self.admin_df.to_csv('admin_file.csv',header= True ,index = False , mode = 'a')
    return x
But in the CSV file, every object I create is saved with its own header and index 0.
Do you have any suggestions for handling this?
If you write in append mode, you should pass header=False.
You could create (manually) a file containing only the headers and then append new rows without headers:
import pandas as pd

username = 'james_bond'
password = '007'

info = {
    'username': [username],
    'password': [password],
}

df = pd.DataFrame(info)
df.to_csv('admin_file.csv', header=False, index=False, mode='a')
It doesn't need to read the previous content or use loc[c+1, ...] to append at the end.
Alternatively, you could write everything without headers and add the headers when you read the file:
df = pd.read_csv('admin_file.csv', names=['username', 'password'])
However, it could be better to read the previous content and check that the username doesn't already exist.
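The duplicate check suggested above could look roughly like this. It is a sketch that uses the standard csv module instead of pandas so that the header is written only when the file is new; the function name `add_admin` and the temporary file path are made up for illustration.

```python
import csv
import os
import tempfile

FIELDS = ["username", "password"]

def add_admin(path, username, password):
    """Append an admin row; return False if the username already exists."""
    file_exists = os.path.exists(path) and os.path.getsize(path) > 0
    if file_exists:
        with open(path, newline="") as f:
            if any(row["username"] == username for row in csv.DictReader(f)):
                return False
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if not file_exists:
            writer.writeheader()  # header goes in only once, for a new file
        writer.writerow({"username": username, "password": password})
    return True

# Demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "admin_file.csv")
    first = add_admin(path, "james_bond", "007")
    duplicate = add_admin(path, "james_bond", "007")
    with open(path, newline="") as f:
        lines = f.read().splitlines()

print(first, duplicate, lines)
```

Storing passwords in plain text, as both the question and this sketch do, is only reasonable for an exercise.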
I was playing around with the code provided here: https://www.geeksforgeeks.org/update-column-value-of-csv-in-python/ and couldn't seem to figure out how to change the value in a specific column of the row without it bringing up an error.
Say I wanted to change the status of the row belonging to the name Molly Singh; how would I go about it? I tried the code below, only to get an error and an empty CSV file. I'd also prefer a solution that doesn't use pandas, tysm.
For example, the row in the CSV file will originally be:
Sno  Registration Number  Name         RollNo     Status
1    11913907             Molly Singh  RK19TSA01  P
What I want the outcome to be:
Sno  Registration Number  Name         RollNo     Status
1    11913907             Molly Singh  RK19TSA01  N
One more question: if I were to alter the value in the Sno column by doing addition/subtraction etc., how would I go about that as well? Thanks!
As for the error I get: as you can see, the Name column is changed to True, then False, etc.
import csv

op = open("AllDetails.csv", "r")
dt = csv.DictReader(op)
print(dt)
up_dt = []
for r in dt:
    print(r)
    row = {'Sno': r['Sno'],
           'Registration Number': r['Registration Number'],
           'Name'== "Molly Singh": r['Name'],
           'RollNo': r['RollNo'],
           'Status': 'P'}
    up_dt.append(row)
print(up_dt)
op.close()

op = open("AllDetails.csv", "w", newline='')
headers = ['Sno', 'Registration Number', 'Name', 'RollNo', 'Status']
data = csv.DictWriter(op, delimiter=',', fieldnames=headers)
data.writerow(dict((heads, heads) for heads in headers))
data.writerows(up_dt)
op.close()
Issues
Your error occurs because the field name in the input file is misspelled as Regristation rather than Registration.
The correction is to read the field names from the input file and propagate them to the output file, as in the code below.
Alternatively, you can change your code to:
headers = ['Sno', 'Regristation Number', 'Name', 'RollNo', 'Status']
"One more question if I were to alter the value in column Sno by doing addition/subtraction etc how would I go about that as well"
I'm not sure what is meant by this. In the code below you would just add:
r['Sno'] = (some computed value)
Code
import csv

with open("AllDetails.csv", "r", newline='') as op:
    dt = csv.DictReader(op)
    # Take the column names from the header row of the input file
    headers = dt.fieldnames
    up_dt = []
    for r in dt:
        # Change status of 'Molly Singh' record
        if r['Name'] == 'Molly Singh':
            r['Status'] = 'N'
        up_dt.append(r)

with open("AllDetails.csv", "w", newline='') as op:
    # Use headers from input file above
    data = csv.DictWriter(op, delimiter=',', fieldnames=headers)
    data.writeheader()
    data.writerows(up_dt)
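For the arithmetic on the Sno column, keep in mind that csv.DictReader returns every value as a string, so it has to be converted before adding or subtracting and converted back before writing. A minimal sketch, assuming Sno always holds an integer; the in-memory sample data and the offset of 10 are made up for illustration.

```python
import csv
import io

# In-memory stand-in for AllDetails.csv
source = io.StringIO(
    "Sno,Registration Number,Name,RollNo,Status\n"
    "1,11913907,Molly Singh,RK19TSA01,P\n"
)

reader = csv.DictReader(source)
headers = reader.fieldnames
rows = []
for r in reader:
    # Values are read as strings: convert, compute, convert back
    r["Sno"] = str(int(r["Sno"]) + 10)
    rows.append(r)

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=headers)
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

The same conversion line can go inside the `for r in dt:` loop of the code above.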