Iteration skipping lines in a pandas dataframe - python

I'm trying to iterate through a whole dataframe that is already organized.
The idea is to find when a common user has a main_user as well: when the keys I use in the code below match, those users have a main_user.
The problem I have is that some lines are being skipped during the iteration and I can't find the error in the code.
Here's the code I'm using:
dataframe = gf.read_excel_base(path, sheet_name)
organized_dataframe = gf.organize_dataframe(dataframe)

main_user_data = {
    'Nome titular': '',
    'Nome beneficiário': '',
    'Id Plano de Benefícios': '',
    'Id Contratado': ''
}
user_data = {
    'Nome titular': '',
    'Nome beneficiário': '',
    'Id Plano de Benefícios': '',
    'Id Contratado': ''
}
main_user_list = []
user_list = []

for i, a in enumerate(organized_dataframe['Id Contratado']):
    if gf.is_main_user(organized_dataframe, i):
        main_user_data = gf.user_to_dict(organized_dataframe, i)
    else:
        user_data = gf.user_to_dict(organized_dataframe, i)
        print(user_data['Nome beneficiário'])
    if (main_user_data['Nome titular'] and main_user_data['Id Plano de Benefícios'] and main_user_data['Id Contratado']) == (user_data['Nome titular'] and user_data['Id Plano de Benefícios'] and user_data['Id Contratado']):
        print('deu match')
        main_user_list.append(main_user_data['Nome beneficiário'])
        user_list.append(user_data['Nome beneficiário'])
print(user_list)
The resulting list always stops somewhere in the middle of the dataframe. There are many lines that should match the conditions in my code, but somehow the code never reaches them.
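One thing worth noting about the comparison in the loop: `(a and b and c) == (x and y and z)` does not compare the three fields pairwise. Each `and` chain evaluates to its last truthy operand, so effectively only `'Id Contratado'` is compared. A minimal, self-contained illustration with made-up values (the real dictionaries would come from `gf.user_to_dict`):

```python
# Hypothetical rows standing in for gf.user_to_dict(...) output.
main_user_data = {'Nome titular': 'Maria', 'Id Plano de Benefícios': 'P1', 'Id Contratado': 'C9'}
user_data = {'Nome titular': 'Joao', 'Id Plano de Benefícios': 'P2', 'Id Contratado': 'C9'}

keys = ('Nome titular', 'Id Plano de Benefícios', 'Id Contratado')

# The chained-and form: both sides reduce to their last truthy value ('C9' here).
buggy = (main_user_data[keys[0]] and main_user_data[keys[1]] and main_user_data[keys[2]]) == \
        (user_data[keys[0]] and user_data[keys[1]] and user_data[keys[2]])

# Comparing tuples checks every field pairwise.
fixed = tuple(main_user_data[k] for k in keys) == tuple(user_data[k] for k in keys)

print(buggy)  # True  - only the last field was effectively compared
print(fixed)  # False - the first two fields differ
```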

Related

Parse list to get new list with same structure

I applied a previous code for a log, to get the following list
log = ['',
       '',
       'ABC KLSC: XYZ',
       '',
       'some text',
       'some text',
       '%%ABC KLSC: XYZ',
       'some text',
       '',
       'ID = 5',
       'TME = KRE',
       'DDFFLE = SOFYU',
       'QWWRTYA = GRRZNY',
       '',
       'some text',
       '-----------------------------------------------',
       '',
       'QUWERW WALS RUSZ CRORS ELME',
       'P <NULL> R 98028',
       'P <NULL> R 30310',
       '',
       '',
       'Some text',
       '',
       'Some text',
       '',
       '--- FINISH'
       ]
and I want to filter those lines in order to get a list with only the lines that contain "=" and the lines that are arranged in column format (those below the headers QUWERW, WALS, RUSZ, CRORS). Additionally, for the lines in column format, I want to store each value with its corresponding header.
I was able to filter the desired lines with the code below (I'm not sure whether there is a better condition for filtering the column lines):
d1 = [line for line in log if len(line) > 50 or " = " in line]
d1
>>
[
'ID = 5',
'TME = KRE',
'DDFFLE = SOFYU',
'QWWRTYA = GRRZNY',
'QUWERW WALS RUSZ CRORS ELME',
'P <NULL> R 98028',
'P <NULL> R 30310',
]
But I don't know how to get the output I'm looking for, shown below. Thanks for any help.
[
'ID = 5',
'TME = KRE',
'DDFFLE = SOFYU',
'QWWRTYA = GRRZNY',
'QUWERW = P',
'WALS = <NULL>',
'RUSZ = R',
'CRORS = 98028',
'QUWERW = P',
'WALS = <NULL>',
'RUSZ = R',
'CRORS = 30310'
]
Finding the = lines is straightforward. One way to find the column values is to identify the header row that contains the headings, and then zip each following row with it after splitting on whitespace.
items_list = []
for item in log:
    if '=' in item:
        items_list.append(item)
    elif len(item.split()) > 3:
        splits = item.split()
        if all(header in splits for header in ['QUWERW', 'WALS', 'RUSZ', 'CRORS']):
            headers = splits
        else:
            for lhs, rhs in zip(headers, splits):
                items_list.append(f'{lhs} = {rhs}')
print('\n'.join(items_list))
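If one dictionary per data row would eventually be more convenient than the 'header = value' strings, the same zip idea can build those as well; a minimal, self-contained sketch (headers detected as above, sample rows copied from the log):

```python
log_rows = [
    'QUWERW WALS RUSZ CRORS ELME',
    'P <NULL> R 98028',
    'P <NULL> R 30310',
]

headers = None
records = []
for line in log_rows:
    parts = line.split()
    if 'QUWERW' in parts:
        headers = parts  # header row found
    elif headers and len(parts) > 3:
        # zip stops at the shorter sequence, so the extra 'ELME'
        # heading simply gets no value for these 4-column rows
        records.append(dict(zip(headers, parts)))

print(records[0]['CRORS'])  # 98028
```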

Is it possible for a python script to check whether row exists in google sheets before writing that row?

I have a python script that searches for vehicles on a vehicle listing site and writes the results to a spreadsheet. What I want is to automate this script to run every night to get new listings, but what I don't want is to create numerous duplicates if the listing exists each day that the script is run.
So is it possible to get the script to check whether that row (potential duplicate) already exists before writing a new row?
To clarify: the code I have works perfectly to print the results exactly how I want them into the Google Sheets document. What I am trying to do is run a check, before it writes new lines to the sheet, to see if that result already exists. Is that clearer? With thanks in advance.
Here is a screenshot of an example where I might have a row already existing with the specific title, but one of the column cells may have a different value in it and I only want to update the original row with the latest/highest price value.
UPDATE:
I am trying something like this, but it just seems to print everything rather than only the rows that don't already exist, which is what I am trying to do:
listing = [title, img['src'], video, vin, loc, exterior_colour, interior_colour, 'N/A', mileage, gearbox, 'N/A', 'Live', auction_date, '', '£' + bid.attrs['amount'][:-3], 'The Market', '', '', '', '', year, make, model, variant]

list_of_dicts = sheet2.get_all_records()

# Convert listing into dictionary output to be read by following statement
# to see if listing exists in sheet before printing
i = iter(listing)
d_listing = dict(zip(i, i))

if not d_listing in list_of_dicts:
    print(listing)
    #print(title, img['src'], video, vin, loc, exterior_colour, interior_colour, 'N/A', mileage, gearbox, 'N/A', 'Live', auction_date, '', '£' + bid.attrs['amount'][:-3], 'The Market', '', '', '', '', year, make, model, variant)
    index = 2
    row = [title, img['src'], video, vin, loc, exterior_colour, interior_colour, 'N/A', mileage, gearbox, 'N/A', 'Live', auction_date, '', '£' + bid.attrs['amount'][:-3], 'The Market', '', '', '', '', year, make, model, variant]
    sheet2.insert_row(row, index)
My code is:
import requests
import re
from bs4 import BeautifulSoup
import pandas
import gspread
from oauth2client.service_account import ServiceAccountCredentials

# use creds to create a client to interact with the Google Drive API
scope = ['https://spreadsheets.google.com/feeds', 'https://www.googleapis.com/auth/drive']
creds = ServiceAccountCredentials.from_json_keyfile_name('creds.json', scope)
client = gspread.authorize(creds)

sheet = client.open("CAR AGGREGATOR")
sheet2 = sheet.worksheet("Auctions - Live")

url = "https://themarket.co.uk/live.xml"
get_url = requests.get(url)
get_text = get_url.text
soup = BeautifulSoup(requests.get(url).text, 'lxml')

for loc in soup.select('url > loc'):
    loc = loc.text
    r = requests.get(loc)
    c = r.content
    hoop = BeautifulSoup(c, 'html.parser')
    soup = BeautifulSoup(c, 'lxml')
    current_bid = soup.find('div', 'bid-step__header')
    bid = soup.find('bid-display')
    title = soup.find('h2').text.split()
    year = title[0]
    if not year:
        year = ''
    if any(make in 'ASTON ALFA HEALEY ROVER Arnolt Bristol Amilcar Amphicar LOREAN De Cadenet Cosworth'.split() for make in title):
        make = title[1] + ' ' + title[2]
        model = title[3]
        try:
            variant = title[4]
        except:
            variant = ''
    else:
        make = title[1]
        model = title[2]
        try:
            variant = title[3]
            if 'REIMAGINED' in variant:
                variant = 'REIMAGINED BY SINGER'
            if 'SINGER' in variant:
                variant = 'REIMAGINED BY SINGER'
        except:
            variant = ''
    title = year + ' ' + make + ' ' + model
    img = soup.find('img')
    vehicle_details = soup.find('ul', 'vehicle__overview')
    try:
        mileage = vehicle_details.find_all('li')[1].text.split()[2]
    except:
        mileage = ''
    try:
        vin = vehicle_details.find_all('li')[2].text.split()[2]
    except:
        vin = ''
    try:
        gearbox = vehicle_details.find_all('li')[4].text.split()[2]
    except:
        gearbox = 'N/A'
    try:
        exterior_colour = vehicle_details.find_all('li')[5].text.split()[1:]
        exterior_colour = "-".join(exterior_colour)
    except:
        exterior_colour = 'N/A'
    try:
        interior_colour = vehicle_details.find_all('li')[6].text.split()[1:]
        interior_colour = "-".join(interior_colour)
    except:
        interior_colour = 'N/A'
    try:
        video = soup.find('iframe')['src']
    except:
        video = ''
    tag = soup.countdown
    try:
        auction_date = tag.attrs['formatted_date'].split()
        auction_day = auction_date[0][:2]
        auction_month = auction_date[1]
        auction_year = auction_date[2]
        auction_time = auction_date[3]
        auction_date = auction_day + ' ' + auction_month + ' ' + auction_year + ' ' + auction_time
    except:
        continue
    print(title, img['src'], video, vin, loc, exterior_colour, interior_colour, 'N/A', mileage, gearbox, 'N/A', 'Live', auction_date, '', '£' + bid.attrs['amount'][:-3], 'The Market', '', '', '', '', year, make, model, variant)
    index = 2
    row = [title, img['src'], video, vin, loc, exterior_colour, interior_colour, 'N/A', mileage, gearbox, 'N/A', 'Live', auction_date, '', '£' + bid.attrs['amount'][:-3], 'The Market', '', '', '', '', year, make, model, variant]
    sheet2.insert_row(row, index)
I would load all data in two dictionaries, one representing the freshly scraped information, the other one the full information of the GoogleSheet. (To load the information from GoogleSheet, use its API, as described in Google's documentation.)
Both dictionaries, let's call them scraped and sheets, could have the titles as keys, and all the other columns as value (represented in a dictionary), so they would look like this:
{
"1928 Aston Martin V8": {
"Link": "...",
"Price": "12 $",
},
...
}
Then update the Sheets-dictionary with dict.update():
sheets.update(scraped)
and rewrite the Google Sheet with the data in sheets.
Without knowing your update logic exactly, I cannot give more specific advice than this.
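A minimal, self-contained illustration of that merge step, with invented titles and values (real data would come from the scraper and from the Sheets API):

```python
# Existing rows already in the Google Sheet, keyed by title.
sheets = {
    "1928 Aston Martin V8": {"Link": "old-link", "Price": "12 $"},
    "1965 Ford Mustang": {"Link": "another-link", "Price": "30 $"},
}

# Freshly scraped listings: one updated price, one brand-new car.
scraped = {
    "1928 Aston Martin V8": {"Link": "new-link", "Price": "15 $"},
    "1972 Porsche 911": {"Link": "porsche-link", "Price": "50 $"},
}

# Existing titles are overwritten with the fresh data; new titles are added.
sheets.update(scraped)

print(len(sheets))                              # 3
print(sheets["1928 Aston Martin V8"]["Price"])  # 15 $
```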

Check for a build up name if it exist in a list of names -python

I'm trying to build a CSV file that has all the Active Directory fields I need, from two external files.
The first file has a list of users that need to be created, plus some other info relevant to an AD object; the second is a report of SamAccountNames and emails exported from AD. What I want to do is create a unique SamAccountName: I form my test SamAccountName from the first name and last name in the first report, and want to compare it against the second report. I'm currently storing all the data I get from the second report in a list, and I want to check whether my generated SamAccountName exists in that list.
So far I'm not able to do so, and I only get a CSV with the SamAccountNames I made up (it does not do the check).
Note: I can't use any plugin to check directly against Active Directory.
import csv

def getSamA(fname, lname):
    Sams = []
    sama = lname[0:5] + fname[0:2]
    with open('test-input.txt', 'r') as AD:
        rows = csv.DictReader(AD)
        for ad in rows:
            Sams.append(ad['SamAccountName'])
    # check if the built sama is in the list
    if sama in Sams:
        # if sama is in the list, add one more character to sama
        sama = lname[0:5] + fname[0:3]
        return sama.lower()
    else:
        return sama.lower()
with open('users.csv') as csv_file:
    rows = csv.DictReader(csv_file)
    with open('users-COI2-Multi.csv', 'w', newline='') as output:
        header = ['FirstName','Initials','LastName','DisplayName','Description','Office','TelePhone','UserLogonName','SamAccountName','JobTitle','Department','Manager','Mobile','faxNumber','Notes','Assistant','employeeID','ex1','ex2','ex3','ex15','Office365License','ExpiresInDays','EmailToUSer','AddToGroup']
        output_file = csv.DictWriter(output, fieldnames=header, delimiter=';')
        output_file.writeheader()
        for data in rows:
            employeeId = data['Associate ID']
            fName = data['First Name']
            lName = data['Last Name']
            Location = data['Location']
            Department = data['Department']
            Manager = data['Manager Name']
            JobTitle = data['Title']
            context = {
                'FirstName': fName,
                'Initials': getInitials(fName, lName),
                'LastName': lName,
                'DisplayName': getDisplayName(fName, lName),
                'Description': 'Account for: ' + getDisplayName(fName, lName),
                'Office': getOffice(Location).strip(),
                'TelePhone': '+1 XXX XXX XXXX',
                'UserLogonName': getMail(fName, lName),
                'SamAccountName': getSamA(fName, lName),
                'JobTitle': JobTitle,
                'Department': Department,
                'Manager': Manager,
                'Mobile': '',
                'faxNumber': '',
                'Notes': '',
                'Assistant': '',
                'employeeID': employeeId,
                'ex1': 'End User',
                'ex2': 'NoMailbox',
                'ex3': getSiteCode(Location),
                'ex15': getSKID(Location),
                'Office365License': '',
                'ExpiresInDays': '',
                'EmailToUSer': 'user#test.com',
                'AddToGroup': '',
            }
            output_file.writerow(context)
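As written, getSamA re-reads test-input.txt for every user and only ever tries one fallback length. One possible shape for the uniqueness check is to load the exported names into a set once and lengthen the candidate until it is unique; a sketch with an in-memory set standing in for the AD export (unique_sama and the sample names are hypothetical):

```python
def unique_sama(fname, lname, existing):
    """Build lastname[:5] + firstname[:n], growing n until the name is unused."""
    n = 2
    sama = (lname[:5] + fname[:n]).lower()
    # 'existing' should hold lowercased names, since AD treats
    # SamAccountNames case-insensitively.
    while sama in existing and n < len(fname):
        n += 1
        sama = (lname[:5] + fname[:n]).lower()
    return sama

existing = {'smithjo'}  # pretend this came from the AD export
print(unique_sama('John', 'Smith', existing))  # smithjoh
print(unique_sama('Jane', 'Smith', existing))  # smithja
```

Building the set once (outside the per-row loop) also avoids re-opening the export file for every user.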

Converting dictionary values into a string object with a specific order

I am looking to create a function that will turn a dictionary of address values into a string with a specific order. I also need to account for missing values (some addresses won't have a second or third address line). I want my output to look like the example below so that I can copy the text block, separated by newlines, into a database field:
name
contact
addr1
addr2 (if not empty)
addr3 (if not empty)
city, state zip
phone
I have the following to create the dictionary, but I am stuck on creating the string object that ignores the empty values and puts everything in the correct order.
def setShippingAddr(name, contact, addr1, addr2, addr3, city, state, zipCode, phone):
    addDict = {'name': name, 'contact': contact, 'addr1': addr1,
               'city': city, 'state': state, 'zip': zipCode, 'phone': phone}
    if addr2 is True:  # append dict if addr2/addr3 are True
        addDict['addr2'] = addr2
    if addr3 is True:
        addDict['addr3'] = addr3
    shAddr = # This is where I need to create the string object
    return shAddr
I would rewrite the function to only return the string, the dictionary is not necessary:
def setShippingAddr(name, contact, addr1, city, state, zipCode, phone, addr2=None, addr3=None):
    shAddr = f'{name}\n{contact}\n{addr1}'
    shAddr = f'{shAddr}\n{addr2}' if addr2 else shAddr
    shAddr = f'{shAddr}\n{addr3}' if addr3 else shAddr
    shAddr = f'{shAddr}\n{city}, {state} {zipCode}\n{phone}'
    return shAddr
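For example, calling it with an addr2 but no addr3 (the function is repeated here so the snippet runs on its own; the sample values are made up):

```python
def setShippingAddr(name, contact, addr1, city, state, zipCode, phone, addr2=None, addr3=None):
    shAddr = f'{name}\n{contact}\n{addr1}'
    shAddr = f'{shAddr}\n{addr2}' if addr2 else shAddr  # skipped when empty/None
    shAddr = f'{shAddr}\n{addr3}' if addr3 else shAddr
    shAddr = f'{shAddr}\n{city}, {state} {zipCode}\n{phone}'
    return shAddr

addr = setShippingAddr('ACME Corp', 'Jane Doe', '1 Main St',
                       'Springfield', 'IL', '62704', '555-0100',
                       addr2='Suite 200')
print(addr)
# ACME Corp
# Jane Doe
# 1 Main St
# Suite 200
# Springfield, IL 62704
# 555-0100
```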
Considering that you may want to add new entries to the dictionary
def setShippingAddr(name, contact, addr1, addr2, addr3, city, state, zipCode, phone):
    addDict = {'name': name, 'contact': contact, 'addr1': addr1,
               'city': city, 'state': state, 'zip': zipCode, 'phone': phone}
    if addr2:  # only add addr2/addr3 when they are non-empty
        addDict['addr2'] = addr2
    if addr3:
        addDict['addr3'] = addr3
    shAddr = ''
    for key in addDict:
        shAddr += addDict[key] + '\n'
    return shAddr
It looks like (assuming you're using Python 3) an f-string would work here:
shAddr = f"{addDict['name']} {addDict['contact']} etc..."
You can add logic within the {}, so something like
{addDict['addr2'] if addDict['addr2'] else ""}
should work, depending on the specific output you are looking for.
I'm not sure I understand the part with the dictionary. You could just leave it out, right?
Then
def setShippingAddr(*args):
    return "\n".join([str(arg) for arg in args if arg])

s = setShippingAddr("Delenges", "Me", "Streetstreet", "Borrough", False,
                    "Town of City", "Landcountry", 12353, "+1 555 4545454")
print(s)
prints
Delenges
Me
Streetstreet
Borrough
Town of City
Landcountry
12353
+1 555 4545454
Here's a more pythonic solution,
def dict_to_string(dic):
    s = ''
    for k, v in dic.items():
        s += "{} : {}\n".format(k, v)
    return s

addDict = {'name': 'name', 'contact': 'contact', 'addr1': 'addr1', 'addr2': '',
           'city': 'city', 'state': 'state', 'zip': 'zipCode', 'phone': 'phone'}
print(dict_to_string(addDict))
In this case, I've used addr2, which has a blank value. If you want addr2 to be omitted completely, then check the value while iterating:
def dict_to_string(dic):
    s = ''
    for k, v in dic.items():
        if v:  # skip keys whose value is empty
            s += "{} : {}\n".format(k, v)
    return s

addDict = {'name': 'name', 'contact': 'contact', 'addr1': 'addr1', 'addr2': '',
           'city': 'city', 'state': 'state', 'zip': 'zipCode', 'phone': 'phone'}
print(dict_to_string(addDict))
Finally, if the natural iteration order is not what you want, you can use an OrderedDict.

Failing to append to dictionary. Python

I am experiencing strange faulty behaviour where a dictionary is only written to once and I cannot add more key:value pairs to it.
My code reads in a multi-line string and extracts substrings via split() to be added to a dictionary, using conditional statements. Strangely, only the key:value pairs under the first conditional statement are added,
so I cannot complete the dictionary.
How can I solve this issue?
Minimal code:
#I hope the '\n' is sufficient or use '\r\n'
example = "Name: Bugs Bunny\nDOB: 01/04/1900\nAddress: 111 Jokes Drive, Hollywood Hills, CA 11111, United States"
def format(data):
    dic = {}
    for line in data.splitlines():
        #print('Line:', line)
        if ':' in line:
            info = line.split(': ', 1)[1].rstrip()  # does not work with files
            #print('Info: ', info)
            if ' Name:' in info:  # middle name problems! / maiden name
                dic['F_NAME'] = info.split(' ', 1)[0].rstrip()
                dic['L_NAME'] = info.split(' ', 1)[1].rstrip()
            elif 'DOB' in info:  # overhang
                dic['DD'] = info.split('/', 2)[0].rstrip()
                dic['MM'] = info.split('/', 2)[1].rstrip()
                dic['YY'] = info.split('/', 2)[2].rstrip()
            elif 'Address' in info:
                dic['STREET'] = info.split(', ', 2)[0].rstrip()
                dic['CITY'] = info.split(', ', 2)[1].rstrip()
                dic['ZIP'] = info.split(', ', 2)[2].rstrip()
    return dic

if __name__ == '__main__':
    x = format(example)
    for v, k in x.iteritems():
        print v, k
Your code doesn't work at all: you split off the name before the colon and discard it, looking only at the value after the colon, stored in info. That value never contains the substrings you are testing for; Name, DOB and Address are all part of the line before the :.
Python lets you assign to multiple names at once; make use of this when splitting:
def format(data):
    dic = {}
    for line in data.splitlines():
        if ':' not in line:
            continue
        name, _, value = line.partition(':')
        name = name.strip()
        if name == 'Name':
            dic['F_NAME'], dic['L_NAME'] = value.split(None, 1)  # strips whitespace for us
        elif name == 'DOB':
            dic['DD'], dic['MM'], dic['YY'] = (v.strip() for v in value.split('/', 2))
        elif name == 'Address':
            dic['STREET'], dic['CITY'], dic['ZIP'] = (v.strip() for v in value.split(', ', 2))
    return dic
I used str.partition() here rather than limit str.split() to just one split; it is slightly faster that way.
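For reference, str.partition() always returns a 3-tuple (head, separator, tail), whether or not the separator occurs, so the three-name unpacking never raises:

```python
line = 'DOB: 01/04/1900'
name, sep, value = line.partition(':')
print((name, sep, value.strip()))  # ('DOB', ':', '01/04/1900')

# With no separator present, the whole string lands in the first slot
# and the other two are empty, rather than raising an error.
print('no colon here'.partition(':'))  # ('no colon here', '', '')
```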
For your sample input this produces:
>>> format(example)
{'CITY': 'Hollywood Hills', 'ZIP': 'CA 11111, United States', 'L_NAME': 'Bunny', 'F_NAME': 'Bugs', 'YY': '1900', 'MM': '04', 'STREET': '111 Jokes Drive', 'DD': '01'}
>>> from pprint import pprint
>>> pprint(format(example))
{'CITY': 'Hollywood Hills',
'DD': '01',
'F_NAME': 'Bugs',
'L_NAME': 'Bunny',
'MM': '04',
'STREET': '111 Jokes Drive',
'YY': '1900',
'ZIP': 'CA 11111, United States'}
