Convert Excel to YAML syntax in Python

I want to convert my data, which is in the form below, to YAML syntax (preferably without using pandas or needing to install new libraries).
Sample data in Excel:
users | name | uid | shell
user1 | nino | 8759 | /bin/ksh
user2 | vivo | 9650 | /bin/sh
Desired output format:
YAML Syntax output

You can do it with plain file operations, since you are keen on "preferably without using pandas or needing to install new libraries".
Assumption: the "|" symbol indicates columns and is not a delimiter or separator.
Step 1
Save the Excel file as CSV, then run the code below.
Code
# STEP 1: Save your Excel file as CSV
ctr = 0
excel_filename = "Book1.csv"
yaml_filename = excel_filename.replace('csv', 'yaml')
users = {}
with open(excel_filename, "r") as excel_csv:
    for line in excel_csv:
        if ctr == 0:
            ctr += 1  # skip the column header
        else:
            # save each CSV row into the dictionary
            user, name, uid, shell = line.replace(' ', '').strip().split(',')
            users[user] = {'name': name, 'uid': uid, 'shell': shell}

# STEP 2: Write the dictionary out as YAML, two spaces per nesting level
with open(yaml_filename, "w") as yf:
    yf.write("users:\n")
    for u in users:
        yf.write(f"  {u}:\n")
        for k, v in users[u].items():
            yf.write(f"    {k}: {v}\n")
Output
users:
  user1:
    name: nino
    uid: 8759
    shell: /bin/ksh
  user2:
    name: vivo
    uid: 9650
    shell: /bin/sh
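If the exported CSV ever contains quoted fields (for example a name with a comma in it), the manual split(',') will break; the stdlib csv module handles that without installing anything new. A minimal sketch of the same idea, assuming the same Book1.csv layout:

import csv

users = {}
with open("Book1.csv", newline="") as f:
    reader = csv.reader(f)
    next(reader)  # skip the column header
    for user, name, uid, shell in reader:
        # strip stray padding left over from the " | " layout
        users[user.strip()] = {'name': name.strip(),
                               'uid': uid.strip(),
                               'shell': shell.strip()}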

You can do this; in your case you would just use pd.read_excel instead of pd.read_csv:
import pandas as pd
import yaml

df = pd.read_csv('test.csv', sep='|')
df['user_col'] = 'users'
data = df.groupby('user_col')[['users', 'name', 'uid', 'shell']].apply(
    lambda x: x.set_index('users').to_dict(orient='index')).to_dict()
with open('newtree.yaml', "w") as f:
    yaml.dump(data, f)
The YAML file looks like this:
users:
  user1:
    name: nino
    shell: /bin/ksh
    uid: 8759
  user2:
    name: vivo
    shell: /bin/sh
    uid: 9650
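One caveat with sep='|': the spaces around the pipes end up inside the column names and values, so lookups like df['name'] can silently miss. A regex separator strips them while parsing (a sketch; a regex sep needs the slower python engine):

import pandas as pd

# hypothetical test.csv padded with " | " as in the question
df = pd.read_csv('test.csv', sep=r'\s*\|\s*', engine='python')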

Related

Create Rows and Tables using BeautifulSoup Python with XML to JSON conversion

Currently I'm creating a parser script that can convert XML to JSON; my plan is to modify the script to create rows and columns when I convert it to a CSV file.
At the moment my script only creates newlines.
My Script
from bs4 import BeautifulSoup
import json

xml_parser = BeautifulSoup(open('SAMPLE.xml'), 'xml')
DESCRIPTION = xml_parser.DESCRIPTION
NAME = xml_parser.NAME
LOCATION = xml_parser.LOCATION
STATUS = xml_parser.STATUS
data = {
    'DESCRIPTION': DESCRIPTION.text,
    'NAME': NAME.text,
    'LOCATION': LOCATION.text,
    'STATUS': STATUS.text,
}
print(json.dumps(data).replace(",", "\n"))
Output
DESCRIPTION: MAIN FLOOR
NAME: FORT-0232
LOCATION: MIDDLE
STATUS: ACTIVE
Planned output:
DESCRIPTION | NAME | LOCATION | STATUS |
MAIN FLOOR | FORT-0232 | MIDDLE | ACTIVE |
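One way to get that layout (a minimal sketch, reusing the same four tags from the script above; csv.writer with a '|' delimiter handles the row formatting):

import csv
from bs4 import BeautifulSoup

xml_parser = BeautifulSoup(open('SAMPLE.xml'), 'xml')
row = {tag: xml_parser.find(tag).text
       for tag in ('DESCRIPTION', 'NAME', 'LOCATION', 'STATUS')}

with open('SAMPLE.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter='|')
    writer.writerow(row.keys())    # header row: the tag names
    writer.writerow(row.values())  # data row: the tag texts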

Read CSV with OOP Python

I'm new to OOP in Python, and I want to read CSV files using OOP.
I have a CSV file with 5 comma-separated columns.
I want to read that CSV file, with each column stored in a column of a new dataframe.
So, suppose I have data like this:
1434,"2021-08-13 06:31:59",unread,082196788998,kuse.hamdy@gmail.com
1433,"2021-08-13 06:09:41",unread,081554220007,ritaambarwati1@umsida.ac.id
1432,"2021-08-13 05:35:07",unread,081911075017,rifqinaufalfayyadh@gmail.com
I want the OOP code to read that CSV file and store it in a new table like this:
id date status number email
1434 2021-08-13 06:31:59 unread 089296788998 kuse.hamdy@gmail.com
1433 2021-08-13 06:09:41 unread 081554271927 ritati1@yahoo.com
1432 2021-08-13 05:35:07 unread 081911075017 rifqinaufalfayyadh@gmail.com
I tried this code:
import csv

class Complete_list:
    def __init__(self, row, header, list_):
        self.__dict__ = dict(zip(header, row))
        self.list_ = list_

    def __repr__(self):
        return self.list_

data = list(csv.reader(open("complete_list.csv")))
instances = [Complete_list(a, data[1], "date_{}".format(i + 1)) for i, a in enumerate(data[1:])]
instances = list(instances)
for i in instances:
    j = i.list_.split(',')
    print(j)
Somehow, I could not access the comma-separated values of each row and put them into a new dataframe with multiple columns. Instead, I got results like this:
['date_1']
['date_2']
['date_3']
To be honest, you are better off using libraries like pandas, but this is how I would approach it:
class complete_list:
    def __init__(self, path, header=None):
        self.data = path
        self.header = header

    def read(self):
        with open(self.data, 'r') as f:
            data = [x.split(',') for x in f.readlines()]
        return data

    def printer(self):
        if self.header:
            a, b, c, d, e = self.header
            yield f'{a:^10} {b:^15} {c:^25}{d:10}{e:^10}'
        for i in self.read():
            yield f'{i[0]:^10}| {i[1]:^10} | {i[2]:^10} | {i[3]:^10} | {i[4]:^10}'

headers = ['id', 'date', 'status', 'number', 'email']
data_frame = complete_list('yes.txt', header=headers).printer()
for line in data_frame:  # printer() is a generator, so iterate over it to print the rows
    print(line)
Output
id date status number email
1434 | "2021-08-13 06:31:59" | unread | 082196788998 | kuse.hamdy@gmail.com
1433 | "2021-08-13 06:09:41" | unread | 081554220007 | ritaambarwati1@umsida.ac
1432 | "2021-08-13 05:35:07" | unread | 081911075017 | rifqinaufalfayyadh@gmail.com
The pandas library is the perfect tool for that
import pandas as pd
df = pd.read_csv("data.csv", sep=",", names=['id', 'date', 'status', 'number', 'email'])
print(df)
id date status number email
0 1434 2021-08-13 06:31:59 unread 82196788998 kuse.hamdy@gmail.com
1 1433 2021-08-13 06:09:41 unread 81554220007 ritaambarwati1@umsida.ac.id
2 1432 2021-08-13 05:35:07 unread 81911075017 rifqinaufalfayyadh@gmail.com
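Note that pandas parsed the number column as integers and dropped the leading zeros; passing dtype=str preserves them (a sketch of the same call):

import pandas as pd

df = pd.read_csv("data.csv", sep=",", dtype=str,
                 names=['id', 'date', 'status', 'number', 'email'])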

How to count the number of occurrences of a string in a file and append it within another file

I need to count the number of occurrences of 'Product ID' in the .txt file and have the count written into that file. I'm new to Python and trying to wrap my head around this. I have the counting working separately in the code, but it prints the number to the command line after running the program (hence the print). I tried using print(count) >> "hardDriveSummary.txt" and print >> count, "hardDriveSummary.txt" but can't get either to work.
# Read the .xml file and write lines starting with row_name or Product ID into a new .txt file
search = 'row_name', 'Product ID'
# source file
with open('20190211-131516_chris_Hard_Drive_Order.xml') as f1:
    # output file
    with open('hardDriveSummary.txt', 'wt') as f2:
        lines = f1.readlines()
        for i, line in enumerate(lines):
            if line.startswith(search):
                f2.write("\n" + line)

# count how many occurrences of 'Product ID' are in the .txt file
def main():
    file = open('hardDriveSummary.txt', 'r').read()
    term = "Product ID"
    count = file.count(term)
    print(count)

main()
Sample of hardDriveSummary.txt:
Name Country 1
Product ID : 600GB
Name Country 2
Product ID : 600GB
Name Country 1
Product ID : 450GB
Contents of .xml file:
************* Server Summary *************
Server serv01
label R720
asset_no CNT3NW1
Name Country 1
name.1 City1
Unnamed: 6 NaN
************* Drive Summary **************
ID : 0:1:0
State : Failed
Product ID : 600GB
Serial No. : 6SL5KF5G
************* Server Summary *************
Server serv02
label R720
asset_no BZYGT03
Name Country 2
name.1 City2
Unnamed: 6 NaN
************* Drive Summary **************
ID : 0:1:0
State : Failed
Product ID : 600GB
Serial No. : 6SL5K75G
************* Server Summary *************
Server serv03
label R720
asset_no 5GT4N51
Name Country 1
name.1 City1
Unnamed: 6 NaN
************* Drive Summary **************
ID : 0:1:0
State : Failed
Product ID : 450GB
Serial No. : 6S55K5MG
If you simply want to tack the count onto the end of the file, the following code should work:
import os

def main():
    with open('hardDriveSummary.txt', 'a+') as f:
        term = "Product ID"
        f.seek(0)  # 'a+' starts at the end of the file, so rewind before reading
        count = f.read().count(term)
        f.seek(0, os.SEEK_END)  # we've read the entire file; go back to the end before writing
        f.write('\n' + str(count))

main()
Since 'Product ID' is two separate words, you can split the entire text into two-word groups (bigrams); the following code will give you the expected result:
from collections import Counter

with open("hardDriveSummary.txt", "r") as f:
    words = f.read().split()
bigrams = zip(words, words[1:])
counts = Counter(bigrams)
data = {' '.join(k): v for k, v in counts.items()}
if 'Product ID' in data:
    print('Count of "Product ID":', data['Product ID'])

Python: Multiple Search in same Text-File

I have a huge text file, with data something like this:
Name : ABC
Bank : Bank1
Account-No : 01234567
Amount: 123456
Spouse : CDF
Name : ABD
Bank : Bank1
Account-No : 01234568
Amount: 12345
Spouse : BDF
Name : ABE
Bank : Bank2
Account-No : 01234569
Amount: 12344
Spouse : CDG
.
.
.
.
.
I need to fetch Account-No and Amount and then write them to a new file:
Account-No: 01234567
Amount : 123456
Account-No: 01234568
Amount : 12345
Account-No: 01234569
Amount : 12344
.
.
.
I tried searching the text file through mmap to get the position of 'Account-No', but I am not able to get the next 'Account-No' this way.
import mmap

fname = input("Enter the file name")
f1 = open(fname)
s = mmap.mmap(f1.fileno(), 0, access=mmap.ACCESS_READ)
if s.find(b'Account-No') != -1:
    r = s.find(b'Account-No')
f1.close()
In 'r' I have the first location of 'Account-No', but I am not able to search from (r+1) to get the next 'Account-No'.
I can put this in a loop, but the exact syntax for mmap is not working for me.
Can anyone please help me with this, through mmap or any other method?
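For the mmap route specifically, find() accepts an optional start offset, so each search can resume just past the previous hit. A minimal sketch:

import mmap

fname = input("Enter the file name")
with open(fname, "rb") as f1:
    s = mmap.mmap(f1.fileno(), 0, access=mmap.ACCESS_READ)
    pos = s.find(b'Account-No')
    while pos != -1:
        print("match at byte offset", pos)
        pos = s.find(b'Account-No', pos + 1)  # resume just past the last match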
With pandas, we can do the following:
import pandas as pd

rowsOfLines = pd.read_table('my_file.txt', header=None)
with open('output_file.txt', 'w+') as file:
    for index, row in rowsOfLines.iterrows():
        splitLine = row.str.split()[0]
        if 'Account-No' in splitLine:
            file.write('{} \n'.format(row.to_string(index=False)))
        elif 'Amount:' in splitLine:
            file.write('{} \n'.format(row.to_string(index=False)))
Solution for huge files:
Here is a working example that you can easily customize by adding or removing field names to the "required_fields" list.
This solution allows you to handle a massive file because the whole file is not read into memory at once.
import tempfile

# reproduce your input file
# for the purpose of having a
# working example
input_filename = None
with tempfile.NamedTemporaryFile(mode='w+', delete=False) as f_orig:  # text mode, so plain str writes work
    input_filename = f_orig.name
    f_orig.write("""Name : ABC
Bank : Bank1
Account-No : 01234567
Amount: 123456
Spouse : CDF
Name : ABD
Bank : Bank1
Account-No : 01234568
Amount: 12345
Spouse : BDF
Name : ABE
Bank : Bank2
Account-No : 01234569
Amount: 12344
Spouse : CDG""")

    # start looking from the beginning of the file again
    f_orig.seek(0)

    # list the fields you want to keep
    required_fields = [
        'Account-No',
        'Amount',
    ]

    # filter and write, line by line
    result_filename = None
    with tempfile.NamedTemporaryFile(mode='w', delete=False) as f_result:
        result_filename = f_result.name
        # process one line at a time (memory efficient)
        while True:
            line = f_orig.readline()
            # check if we have reached the end of the file
            if not line:
                break
            for field_name in required_fields:
                # write fields of interest to the new file
                if field_name in line:
                    f_result.write(line)
        f_result.write('\n')  # just for formatting

# show the result
with open(result_filename, 'r') as f:
    print(f.read())
The result of this is:
Account-No : 01234567
Amount: 123456
Account-No : 01234568
Amount: 12345
Account-No : 01234569
Amount: 12344
Code:
listOfAllAccountsAndAmounts = []  # list to save all the accounts and amounts
searchTexts = ['Account-No', 'Amount']  # everything you want to search for
with open('a.txt', 'r') as inFile:
    allLines = inFile.readlines()  # read all the lines
# save the indexes of all lines that contain any of the words from the searchTexts list
indexOfAccounts = [i for i, line in enumerate(allLines) if any(x in line for x in searchTexts)]
for index in indexOfAccounts:
    listOfAllAccountsAndAmounts.append(allLines[index][:-1].split(': '))
print(listOfAllAccountsAndAmounts)
Output:
[['Account-No ', '01234567'], ['Amount', '123456'], ['Account-No ', '01234568'], ['Amount', '12345'], ['Account-No ', '01234569'], ['Amount', '12344']]
If you don't want to split, and want to save the lines as they are:
listOfAllAccountsAndAmounts.append(allLines[index])
Output:
['Account-No : 01234567\n', 'Amount: 123456\n', 'Account-No : 01234568\n', 'Amount: 12345\n', 'Account-No : 01234569\n', 'Amount: 12344\n']
I have written to a list in case you want to process the information further. You can also simply write the string directly to the new file without even using a list, as shown by @Arda.
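For reference, writing the matching lines straight to the output file without the intermediate list (a sketch reusing the same search terms; 'out.txt' is a hypothetical output name):

searchTexts = ['Account-No', 'Amount']
with open('a.txt', 'r') as inFile, open('out.txt', 'w') as outFile:
    for line in inFile:
        if any(x in line for x in searchTexts):
            outFile.write(line)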
You can read the whole text file into a list of lines, then iterate over the list, search each line for the strings "Account-No" and "Amount", and write the matches to another file.

Exporting filtered Excel table with Python

Python 3.4.3 | Anaconda 2.3 | Pandas
I have filtered some data from an extensive Excel file. I have it running for two names:
import pandas as pd
import sys

# file location (prompt asks for the date of the desired report, dd.mm)
R1 = input('Data do Relatório desejado (dd.mm) ---> ')
loc = r'C:\Users\lucas.mascia\Downloads\relatorio-{0}.xlsx'.format(R1)

######################################################################
# requesters
ps_sol = ["Mauro Cavalheiro Junior", "Aline Oliveira"]

# applying the filters
for name in ps_sol:
    # opening file
    df = pd.read_excel(loc)
    dfps = df[[2, 15, 16, 17]]
    # apply filter
    f1 = dfps[(dfps['Cliente'] == "POSTAL SAUDE")
              & (dfps['Nome do solicitante'] == name)]
    # print info
    print('''
=============================================================
Relatorio do dia: {}
Cliente: POSTAL SAUDE
Solicitante: {}
=============================================================
'''.format(R1, name))
    print(f1)
    f1.to_excel('C:/Users/lucas.mascia/Downloads/ps_sol.xlsx', sheet_name=name)
At the end I am trying to export to another .xlsx file, but it is only saving the info for the last name in the list.
I want it to save for all the names I list in ps_sol.
Help please (:
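The reason only the last name survives is that to_excel rewrites the whole file on every call. One fix (a sketch, reusing loc and ps_sol from the question) is to create a single pd.ExcelWriter and write one sheet per name:

import pandas as pd

with pd.ExcelWriter('C:/Users/lucas.mascia/Downloads/ps_sol.xlsx') as writer:
    df = pd.read_excel(loc)  # read once, outside the loop
    dfps = df[[2, 15, 16, 17]]
    for name in ps_sol:
        f1 = dfps[(dfps['Cliente'] == "POSTAL SAUDE")
                  & (dfps['Nome do solicitante'] == name)]
        f1.to_excel(writer, sheet_name=name)  # each name gets its own sheet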
