comparing two lists and searching by a field, Python

I have two files I wish to compare and then produce a specific output:
1) Below are the contents of the username text file (this stores the latest films viewed by the user)
Sci-Fi,Out of the Silent Planet
Sci-Fi,Solaris
Romance, When Harry met Sally
2) Below are the contents of the films.txt file which stores all the films in the program that are available to the user
0,Genre, Title, Rating, Likes
1,Sci-Fi,Out of the Silent Planet, PG,3
2,Sci-Fi,Solaris, PG,0
3,Sci-Fi,Star Trek, PG,0
4,Sci-Fi,Cosmos, PG,0
5,Drama, The English Patient, 15,0
6,Drama, Benhur, PG,0
7,Drama, The Pursuit of Happiness, 12, 0
8,Drama, The Thin Red Line, 18,0
9,Romance, When Harry met Sally, 12, 0
10,Romance, You've got mail, 12, 0
11,Romance, Last Tango in Paris, 18, 0
12,Romance, Casablanca, 12, 0
An example of the output I require: the user has so far viewed two Sci-Fi films and one Romance film. The program should therefore search films.txt by genre (here Sci-Fi and Romance) and list the films in those genres that the user has NOT yet viewed. In this case:
3,Sci-Fi,Star Trek, PG,0
4,Sci-Fi,Cosmos, PG,0
10,Romance, You've got mail, 12, 0
11,Romance, Last Tango in Paris, 18, 0
12,Romance, Casablanca, 12, 0
I have the following code which attempts to do the above, but the output it produces is incorrect:
def viewrecs(username):
    #set the username variable to the text file -to use it in the next bit
    username = (username + ".txt")
    #open the username file that stores latest viewings
    with open(username,"r") as f:
        #open the csv file reader for the username file
        fReader=csv.reader(f)
        #for each row in the fReader
        for row in fReader:
            #set the genre variable to the row[0], in which row[0] is all the genres (column 1 in username file)
            genre=row[0]
            #next, open the films file
            with open("films.txt","r") as films:
                #open the csv reader for this file (filmsReader as opposed to fReader)
                filmsReader=csv.reader(films)
                #for each row in the films file
                for row in filmsReader:
                    #and for each field in the row
                    for field in row:
                        #print(field)
                        #print(genre)
                        #print(field[0])
                        if genre in field and row[2] not in fReader:
                            print(row)
Output (undesired):
['1', 'Sci-Fi', 'Out of the Silent Planet', ' PG', '3']
['2', 'Sci-Fi', 'Solaris', ' PG', '0']
['3', 'Sci-Fi', 'Star Trek', ' PG', '0']
['4', 'Sci-Fi', 'Cosmos', ' PG', '0']
I don't want a re-write or new solution, but, preferably, a fix to the above solution with its logical progression ...
@gipsy - your solution appears to have nearly worked. I used:
def viewrecs(username):
    #set the username variable to the text file -to use it in the next bit
    username = (username + ".txt")
    #open the username file that stores latest viewings
    lookup_set = set()
    with open(username,"r") as f:
        #open the csv file reader for the username file
        fReader=csv.reader(f)
        #for each row in the fReader
        for row in fReader:
            genre = row[1]
            name = row[2]
            lookup_set.add('%s-%s' % (genre, name))
    with open("films.txt","r") as films:
        filmsReader=csv.reader(films)
        #for each row in the films file
        for row in filmsReader:
            genre = row[1]
            name = row[2]
            lookup_key = '%s-%s' % (genre, name)
            if lookup_key not in lookup_set:
                print(row)
The output is below. It prints ALL the lines in films.txt that are not in the first set, rather than only those whose GENRE appears in the first set:
['0', 'Genre', ' Title', ' Rating', ' Likes']
['3', 'Sci-Fi', 'Star Trek', ' PG', ' 0']
['4', 'Sci-Fi', 'Cosmos', ' PG', ' 0']
['5', 'Drama', ' The English Patient', ' 15', ' 0']
['6', 'Drama', ' Benhur', ' PG', ' 0']
['7', 'Drama', ' The Pursuit of Happiness', ' 12', ' 0']
['8', 'Drama', ' The Thin Red Line', ' 18', ' 0']
['10', 'Romance', " You've got mail", ' 12', ' 0']
['11', 'Romance', ' Last Tango in Paris', ' 18', ' 0']
['12', 'Romance', ' Casablanca', ' 12', ' 0']
NOTE: for simplicity, I changed the format of the first file to match the entries in films.txt:
1,Sci-Fi,Out of the Silent Planet, PG
2,Sci-Fi,Solaris, PG

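How about using sets to filter the films in the viewed genres that have not been seen yet? A dictionary keyed by genre would not work here, because it can hold only one title per genre, so the viewed genres and viewed titles are kept in two separate sets:
def parse_file(file):
    # split each non-empty line on commas and strip whitespace around every field
    with open(file) as f:
        return [[w.strip() for w in line.split(',')] for line in f if line.strip()]
def movies_to_see():
    viewed = parse_file('seen.txt')
    seen_genres = {film[0] for film in viewed}
    seen_titles = {film[1] for film in viewed}
    to_see = []
    for film in parse_file('films.txt'):
        # keep films from a viewed genre whose title has not been viewed
        if film[1] in seen_genres and film[2] not in seen_titles:
            to_see.append(film)
    return to_see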

The solution using str.split() and str.join() functions:
# change file paths with your actual ones
with open('./text_files/user.txt', 'r') as userfile:
    # splitlines() avoids a trailing empty entry when the file ends with a newline
    viewed = userfile.read().splitlines()
    viewed_genres = set(g.split(',')[0] for g in viewed)
with open('./text_files/films.txt', 'r') as filmsfile:
    films = filmsfile.read().splitlines()
    not_viewed = [f for f in films
                  if f.split(',')[1] in viewed_genres and ','.join(f.split(',')[1:3]) not in viewed]
print('\n'.join(not_viewed))
The output:
3,Sci-Fi,Star Trek, PG,0
4,Sci-Fi,Cosmos, PG,0
10,Romance, You've got mail, 12, 0
11,Romance, Last Tango in Paris, 18, 0
12,Romance, Casablanca, 12, 0

Okay, build a set by going through the first file, with genre + name as each entry.
Now iterate over the second file and look up genre + name in the set you made above; if no entry exists, print the row.
Once I am home I can type some code.
As promised, my code for this is below:
import csv
def viewrecs(username):
    #set the username variable to the text file -to use it in the next bit
    username = (username + ".txt")
    # In this set we will collect the unique combinations of genre and name
    genre_name_lookup_set = set()
    # In this set we will collect the unique genres
    genre_lookup_set = set()
    with open(username,"r") as f:
        #open the csv file reader for the username file
        fReader=csv.reader(f)
        #for each row in the fReader
        for row in fReader:
            genre = row[0]
            name = row[1]
            # Add the genre name combination to this set, duplicates will be taken care automatically as set won't allow dupes
            genre_name_lookup_set.add('%s-%s' % (genre, name))
            # Add genre to this set
            genre_lookup_set.add(genre)
    with open("films.txt","r") as films:
        filmsReader=csv.reader(films)
        #for each row in the films file
        for row in filmsReader:
            genre = row[1]
            name = row[2]
            # Build a lookup key using genre and name, example:Sci-Fi-Solaris
            lookup_key = '%s-%s' % (genre, name)
            if lookup_key not in genre_name_lookup_set and genre in genre_lookup_set:
                print(row)
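A variant of the same idea, as a minimal sketch: it strips whitespace from every field, so that "Romance, When Harry met Sally" and "Romance,When Harry met Sally" compare equal, and it returns the rows instead of printing them. The function name unseen_in_viewed_genres and the in-memory sample data are assumptions made for illustration; with real files you would pass open file objects instead of io.StringIO.

```python
import csv
import io


def unseen_in_viewed_genres(viewed_file, films_file):
    """Return film rows whose genre was viewed but whose title was not."""
    viewed = set()   # (genre, title) pairs the user has seen
    genres = set()   # genres the user has seen
    for row in csv.reader(viewed_file):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        genre, title = row[0].strip(), row[1].strip()
        viewed.add((genre, title))
        genres.add(genre)

    result = []
    for row in csv.reader(films_file):
        if len(row) < 3:
            continue
        genre, title = row[1].strip(), row[2].strip()
        if genre in genres and (genre, title) not in viewed:
            result.append([field.strip() for field in row])
    return result


# Hypothetical sample data mirroring the files in the question
viewed_txt = "Sci-Fi,Out of the Silent Planet\nSci-Fi,Solaris\nRomance, When Harry met Sally\n"
films_txt = (
    "0,Genre, Title, Rating, Likes\n"
    "1,Sci-Fi,Out of the Silent Planet, PG,3\n"
    "2,Sci-Fi,Solaris, PG,0\n"
    "3,Sci-Fi,Star Trek, PG,0\n"
    "5,Drama, The English Patient, 15,0\n"
    "9,Romance, When Harry met Sally, 12, 0\n"
    "12,Romance, Casablanca, 12, 0\n"
)

for film in unseen_in_viewed_genres(io.StringIO(viewed_txt), io.StringIO(films_txt)):
    print(film)
```

Using (genre, title) tuples as set entries avoids building a joined string key, so a hyphen inside a genre such as Sci-Fi cannot cause an accidental collision.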

Related

How to format list data and write to csv file in selenium python?

I'm scraping data from a website and storing it in list variables. Now I need to write this data to a CSV file.
The website data is printed and shown below.
The data retrieved from the website:
['Company Name: PATRY PLC', 'Contact Name: Jony Deff', 'Company ID: 234567', 'CS ID: 236789', 'MI/MC:', 'Road Code:']
['Mailing Address:', 'Street: 19700 I-45 Spring, TX 77373', 'City: SPRING', 'State: TX', 'Postal Code: 77388', 'Country: US']
['Physical Address:', 'Street: 1500-1798 Runyan Ave Houston, TX 77039, USA', 'City: HOUSTON', 'State: TX', 'Postal Code: 77039', 'Country: US']
['Registration Period', 'Registration Date/Time', 'Registration ID', 'Status']
['2020-2025', 'MAY-10-2020 15:54:12', '26787856889l', 'Active']
I'm using a for loop to collect this data with the code below:
listdata6 = []
for c6 in cells6:
    listdata6.append(c6.text)
Now I have all the data in five list variables. How can I write it to a CSV file in the format shown in my screenshot?

You seem to want two header rows.
But I'm afraid your CSV interpreter (which seems to be MS Excel) won't be able to merge cells the way you show in the screenshot.
Based on the structure of your data (five lists where keys and values are mixed), it looks like you will probably have to construct both headers semi-manually.
Here is the code:
company_info = ['Company Name: PATRY PLC', 'Contact Name: Jony Deff', 'Company ID: 234567', 'CS ID: 236789', 'MI/MC:', 'Road Code:']
mailaddr_info = ['Mailing Address:', 'Street: 19700 I-45 Spring, TX 77373', 'City: SPRING', 'State: TX', 'Postal Code: 77388', 'Country: US']
physaddr_info = ['Physical Address:', 'Street: 1500-1798 Runyan Ave Houston, TX 77039, USA', 'City: HOUSTON', 'State: TX', 'Postal Code: 77039', 'Country: US']
reg_data = ['Registration Period', 'Registration Date/Time', 'Registration ID', 'Status']
status_data = ['2020-2025', 'MAY-10-2020 15:54:12', '26787856889l', 'Active']
# composing 1st header's row
header1 = ''.join(',' for i in range(len(company_info))) # add commas
header1 += mailaddr_info[0].strip(':') # adds 1st item which is header of that data
header1 += ''.join(',' for i in range(1, len(mailaddr_info)))
header1 += physaddr_info[0].strip(':') # adds 1st item which is header of that data
header1 += ''.join(',' for i in range(1, len(physaddr_info)))
header1 += ''.join(',' for i in range(len(reg_data))) # add commas
# composing 2nd header's row
header2 = ','.join( item.split(':')[0].strip(' ') for item in company_info) + ','
header2 += ','.join( item.split(':')[0].strip(' ') for item in mailaddr_info[1:]) + ','
header2 += ','.join( item.split(':')[0].strip(' ') for item in physaddr_info[1:]) + ','
header2 += ','.join( item.split(':')[0].strip(' ') for item in reg_data)
# finally, the data row. Note we replace comma with empty char because some items contain comma.
# You can further elaborate by encapsulating comma-containing items with quotes "" which
# is treated as text by CSV interpreters.
data_row = ','.join( item.split(':')[-1].strip(' ') for item in company_info)
data_row += ','.join( item.split(':')[-1].strip(' ').replace(',','') for item in mailaddr_info)
data_row += ','.join( item.split(':')[-1].strip(' ').replace(',','') for item in physaddr_info)+ ','
data_row += ','.join( item for item in status_data)
# writing the data to CSV file
with open("test_file.csv", "w") as f:
    f.write(header1 + '\n')
    f.write(header2 + '\n')
    f.write(data_row + '\n')
If you import that file into MS Excel and set 'Comma' as the separator in the text import wizard, you get the two header rows followed by the data row.
You can wrap this in a helper class which takes these five lists and exposes a write_csv() method to the outside world.
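The comma-containing items mentioned in the code comments can also be handled by Python's standard csv module, which quotes such fields automatically, so the commas do not have to be removed. A minimal sketch, assuming the same kind of "Key: value" lists; the helper name split_item and the shortened sample list are illustrative only, and io.StringIO stands in for the real output file:

```python
import csv
import io


def split_item(item):
    """Split 'Key: value' on the first colon only, so values may contain colons."""
    key, _, value = item.partition(':')
    return key.strip(), value.strip()


# Shortened, hypothetical sample in the same shape as the scraped lists
mailaddr_info = ['Mailing Address:', 'Street: 19700 I-45 Spring, TX 77373', 'City: SPRING']

pairs = [split_item(item) for item in mailaddr_info[1:]]
header = [key for key, _ in pairs]
data_row = [value for _, value in pairs]

buffer = io.StringIO()  # stands in for open("test_file.csv", "w", newline="")
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerow(data_row)
print(buffer.getvalue())
```

Splitting on the first colon only also keeps values such as a time of day intact, which a plain split(':')[-1] would truncate.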

Is it possible for a python script to check whether row exists in google sheets before writing that row?

I have a python script that searches for vehicles on a vehicle listing site and writes the results to a spreadsheet. What I want is to automate this script to run every night to get new listings, but what I don't want is to create numerous duplicates if the listing exists each day that the script is run.
So is it possible to get the script to check whether that row (potential duplicate) already exists before writing a new row?
To clarify the code I have works perfectly to print the results exactly how I want them into the google sheets document, what I am trying to do is to run a check before it prints new lines into the sheet to see if that result already exists. Is that clearer? With thanks in advance.
Here is a screenshot of an example where I might have a row already existing with the specific title, but one of the column cells may have a different value in it and I only want to update the original row with the latest/highest price value.
UPDATE:
I am trying something like this, but it prints every listing rather than only those that don't already exist in the sheet, which is what I am trying to do.
listing = [title, img['src'], video, vin,loc, exterior_colour, interior_colour, 'N/A', mileage, gearbox, 'N/A', 'Live', auction_date,'', '£' + bid.attrs['amount'][:-3], 'The Market', '', '', '', '', year, make, model, variant]
list_of_dicts = sheet2.get_all_records()
# Convert listing into dictionary output to be read by following statement to see if listing exists in sheet before printing
i = iter(listing)
d_listing = dict(zip(i, i))
if not d_listing in list_of_dicts:
    print(listing)
    #print(title, img['src'], video, vin,loc, exterior_colour, interior_colour, 'N/A', mileage, gearbox, 'N/A', 'Live', auction_date,'', '£' + bid.attrs['amount'][:-3], 'The Market', '', '', '', '', year, make, model, variant)
    index = 2
    row = [title, img['src'], video, vin,loc, exterior_colour, interior_colour, 'N/A', mileage, gearbox, 'N/A', 'Live', auction_date,'', '£' + bid.attrs['amount'][:-3], 'The Market', '', '', '', '', year, make, model, variant]
    sheet2.insert_row(row,index)
My code is:
import requests
import re
from bs4 import BeautifulSoup
import pandas
import gspread
from oauth2client.service_account import ServiceAccountCredentials
# use creds to create a client to interact with the Google Drive API
scope = ['https://spreadsheets.google.com/feeds', 'https://www.googleapis.com/auth/drive']
creds = ServiceAccountCredentials.from_json_keyfile_name('creds.json', scope)
client = gspread.authorize(creds)
sheet = client.open("CAR AGGREGATOR")
sheet2 = sheet.worksheet("Auctions - Live")
url = "https://themarket.co.uk/live.xml"
get_url = requests.get(url)
get_text = get_url.text
soup = BeautifulSoup(requests.get(url).text, 'lxml')
for loc in soup.select('url > loc'):
    loc = loc.text
    r=requests.get(loc)
    c=r.content
    hoop = BeautifulSoup(c, 'html.parser')
    soup = BeautifulSoup(c, 'lxml')
    current_bid = soup.find('div', 'bid-step__header')
    bid = soup.find('bid-display')
    title = soup.find('h2').text.split()
    year = title[0]
    if not year:
        year = ''
    if any(make in 'ASTON ALFA HEALEY ROVER Arnolt Bristol Amilcar Amphicar LOREAN De Cadenet Cosworth'.split() for make in title):
        make = title[1] + ' ' + title[2]
        model = title[3]
        try:
            variant = title[4]
        except:
            variant = ''
    else:
        make = title[1]
        model = title[2]
        try:
            variant = title[3]
            if 'REIMAGINED' in variant:
                variant = 'REIMAGINED BY SINGER'
            if 'SINGER' in variant:
                variant = 'REIMAGINED BY SINGER'
        except:
            variant = ''
    title = year + ' ' + make + ' ' + model
    img = soup.find('img')
    vehicle_details = soup.find('ul', 'vehicle__overview')
    try:
        mileage = vehicle_details.find_all('li')[1].text.split()[2]
    except:
        mileage = ''
    try:
        vin = vehicle_details.find_all('li')[2].text.split()[2]
    except:
        vin = ''
    try:
        gearbox = vehicle_details.find_all('li')[4].text.split()[2]
    except:
        gearbox = 'N/A'
    try:
        exterior_colour = vehicle_details.find_all('li')[5].text.split()[1:]
        exterior_colour = "-".join(exterior_colour)
    except:
        exterior_colour = 'N/A'
    try:
        interior_colour = vehicle_details.find_all('li')[6].text.split()[1:]
        interior_colour = "-".join(interior_colour)
    except:
        interior_colour = 'N/A'
    try:
        video = soup.find('iframe')['src']
    except:
        video = ''
    tag = soup.countdown
    try:
        auction_date = tag.attrs['formatted_date'].split()
        auction_day = auction_date[0][:2]
        auction_month = auction_date[1]
        auction_year = auction_date[2]
        auction_time = auction_date[3]
        auction_date = auction_day + ' ' + auction_month + ' ' + auction_year + ' ' + auction_time
    except:
        continue
    print(title, img['src'], video, vin,loc, exterior_colour, interior_colour, 'N/A', mileage, gearbox, 'N/A', 'Live', auction_date,'', '£' + bid.attrs['amount'][:-3], 'The Market', '', '', '', '', year, make, model, variant)
    index = 2
    row = [title, img['src'], video, vin,loc, exterior_colour, interior_colour, 'N/A', mileage, gearbox, 'N/A', 'Live', auction_date,'', '£' + bid.attrs['amount'][:-3], 'The Market', '', '', '', '', year, make, model, variant]
    sheet2.insert_row(row,index)

I would load all data in two dictionaries, one representing the freshly scraped information, the other one the full information of the GoogleSheet. (To load the information from GoogleSheet, use its API, as described in Google's documentation.)
Both dictionaries, let's call them scraped and sheets, could have the titles as keys, and all the other columns as value (represented in a dictionary), so they would look like this:
{
    "1928 Aston Martin V8": {
        "Link": "...",
        "Price": "12 $",
    },
    ...
}
Then update the Sheets-dictionary with dict.update():
sheets.update(scraped)
and rewrite the Google Sheet with the data in sheets.
Without knowing your exact update logic, I cannot give more specific advice than this.
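The merge step can be sketched roughly as follows. This is only an illustration under assumed data: the titles, the column names and the helper merge_listings are hypothetical, and reading from or writing back to the sheet is left out. Note that a plain sheets.update(scraped) replaces the whole inner dictionary for an existing title, whereas this variant updates column by column so that columns missing from the scrape survive, and it also reports which titles are new.

```python
def merge_listings(sheets, scraped):
    """Merge scraped rows into the sheet rows, keyed by title.

    Returns the titles that were newly added (not previously in the sheet).
    """
    new_titles = []
    for title, columns in scraped.items():
        if title in sheets:
            # Existing row: overwrite only the columns present in the scrape
            sheets[title].update(columns)
        else:
            sheets[title] = dict(columns)
            new_titles.append(title)
    return new_titles


# Hypothetical data: one listing already in the sheet, one new
sheets = {"1928 Aston Martin V8": {"Link": "...", "Price": "12 $", "Status": "Live"}}
scraped = {
    "1928 Aston Martin V8": {"Price": "15 $"},
    "1965 Jaguar E-Type": {"Link": "...", "Price": "20 $"},
}

added = merge_listings(sheets, scraped)
print(added)
print(sheets["1928 Aston Martin V8"])
```

Keying both dictionaries by title makes the "does this row already exist?" check a constant-time membership test rather than a scan over every sheet row.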

Python: Search for string from dictionaryCSV file and display matching rows

I have a program that lets users choose a category (pulled from the file) and then prints the university data using a dictionary.
What I want to add next is a search: the user enters a string, and the program displays every matching row with all of its keys. The search term can be a whole word or just part of a string from the file.
I need help searching for a given string or part of a string and displaying the matching rows with all categories (NameID, StudentName, University, Phone, State).
Example:
search: on
output: (Note: that this is in dictionary format)
{'NameID': 'JSNOW', ' StudentName': ' Jon Snow', ' University': ' UofWinterfell', ' Phone': ' 324234423', ' State': 'Westeros'}
{'NameID': 'JJONS', ' StudentName': ' Joe Jonson', ' University': ' NYU', ' Phone': ' 123432333', ' State': 'New York'}
My text file looks like this:
NameID, StudentName, University, Phone, State
JJONS, Joe Jonson, NYU, 123432333, New York
SROGE, Steve Rogers, UofI, 324324423, New York
JSNOW, Jon Snow, UofWinterfell, 324234423, Westeros
DTARG, Daenerys Targaryen, Dragonstone, 345345, NULL
This is what I have so far:
import csv
def load_data(file_name):
    university_data=[]
    with open("file.csv", mode='r') as csv_file:
        csv_reader = csv.DictReader(csv_file, skipinitialspace=True)
        for col in csv_reader:
            university_data.append(dict(col))
        print(university_data)
    return university_data
# def search_file():
#     for l in data:
#         no idea what to do here
def main():
    filename='file.csv'
    university_data = load_data(filename)
    print('[1] University\n[2] Student Name\n[3] Exit\n[4] Search')
    while True:
        choice=input('Enter choice 1/2/3? ')
        if choice=='1':
            for university in university_data:
                print(university['University'])
        elif choice=='2':
            for university in university_data:
                print(university['StudentName'])
        elif choice =='3':
            print('Thank You')
            break
        elif choice =='4':
            search_file()
        else:
            print('Invalid selection')
main()
I need choice 4 to work. Please ignore choices 1 and 2, because they just display the names and are not in dictionary format.

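You have to decide which field you are searching by and then iterate over the list of dicts, keeping every row whose value in that field contains the query.
def search_file(data, field, query):
    # collect every row whose value in `field` contains `query` (case-insensitive)
    matches = []
    for row in data:
        if query.lower() in row.get(field, '').lower():
            matches.append(row)
    return matches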
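To match part of a string in any column rather than in one chosen field, the same loop can test every value of each row. A minimal sketch, assuming rows loaded with csv.DictReader as in the question; the function name search_rows and the in-memory sample standing in for file.csv are illustrative:

```python
import csv
import io


def search_rows(rows, query):
    """Return every row in which any field value contains `query` (case-insensitive)."""
    needle = query.lower()
    return [row for row in rows
            if any(needle in (value or '').lower() for value in row.values())]


# In-memory stand-in for file.csv from the question
sample = io.StringIO(
    "NameID, StudentName, University, Phone, State\n"
    "JJONS, Joe Jonson, NYU, 123432333, New York\n"
    "SROGE, Steve Rogers, UofI, 324324423, New York\n"
    "JSNOW, Jon Snow, UofWinterfell, 324234423, Westeros\n"
)
rows = [dict(r) for r in csv.DictReader(sample, skipinitialspace=True)]

for match in search_rows(rows, "snow"):
    print(match)
```

In main(), choice 4 could then prompt for the search term with input() and print each row returned by search_rows(university_data, term).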

Convert a csv into category-subcategory using array

Above is the input table I have in CSV.
I am trying to use arrays and while loops in Python; I am new to this language. The loop should run twice to give the order Category\sub-category\sub-category_1. I am trying to use split(). The output should be as shown below.
import csv
with open('D:\\test.csv', 'rb') as f:
    reader = csv.reader(f, delimiter='',quotechar='|')
    data = []
    for name in reader:
        data[name] = []

If you read the lines of your CSV into a dictionary, you can then manipulate the data however you want later.
cats = {}
with open('my.csv', "r") as ins:
    # check each line of the file
    for line in ins:
        # remove double quotes: replace('"', '')
        # remove line break: rstrip()
        a = str(line).replace('"', '').rstrip().split('|')
        if a[0] != 'CatNo':
            cats[int(a[0])] = a[1:]
for p in cats:
    print('cat_id: %d, value: %s' % (p, cats[p]))
# you can access the value by the int ID
print(cats[1001])
Example output (the order of the rows may differ):
cat_id: 100, value: ['Best Sellers', 'Best Sellers']
cat_id: 1001, value: ['New this Month', 'New Products\\New this Month']
cat_id: 10, value: ['New Products', 'New Products']
cat_id: 1003, value: ['Previous Months', 'New Products\\Previous Months']
cat_id: 110, value: ['Promotional Material', 'Promotional Material']
cat_id: 120, value: ['Discounted Products & Special Offers', 'Discounted Products & Special Offers']
cat_id: 1002, value: ['Last Month', 'New Products\\Last Month']
['New this Month', 'New Products\\New this Month']
Updated script for your question:
categories = {}
def get_parent_category(cat_id):
    # a parent ID is the child ID without its last digit; IDs of two digits or fewer are top level
    if len(cat_id) <= 2:
        return ''
    else:
        return cat_id[:-1]
with open('my.csv', "r") as ins:
    for line in ins:
        # remove double quotes: replace('"', '')
        # remove line break: rstrip()
        a = str(line).replace('"', '').rstrip().split('|')
        cat_id = a[0]
        if cat_id != 'CatNo':
            categories[cat_id] = {
                'parent': get_parent_category(cat_id),
                'desc': a[1],
                'long_desc': a[2]
            }
print('Categories relations:')
for p in categories:
    parent = categories[p]['parent']
    output = categories[p]['desc']
    # walk up the parent chain, prepending each ancestor's description
    while parent != '':
        output = categories[parent]['desc'] + ' \\ ' + output
        parent = categories[parent]['parent']
    print('\t', output)
output:
Categories relations:
New Products
New Products \ Best Sellers
New Products \ Discounted Products & Special Offers
New Products \ Best Sellers \ Previous Months
New Products \ Best Sellers \ Last Month
New Products \ Best Sellers \ New this Month

how to get values from the keys in dictionary

I have some code for car registration for parking. I have created a dictionary with the car registration number as the key and the rest of the information as the value. I am trying to get the details (values) of each registration by entering the registration number. Even when the ID is in the dictionary, it shows the message that it was not found in the dictionary.
# global variable
data_dict = {}
def createNameDict(filename):
    path = "C:\Users\user\Desktop"
    basename = "ParkingData_Part2.txt"
    filename = path + "\\" + basename
    file = open(filename)
    contents = file.read()
    print contents,"\n"
    data_list = [lines.split(",",1) for lines in contents.split("\n")]
    #data_list.sort()
    #print data_list
    #dict_list = []
    for line in data_list:
        keys = line[0]
        values = line[1]
        data_dict[keys] = values
    print data_dict,"\n"
    print data_dict.keys(),"\n"
    print data_dict.values(),"\n"
    print data_list
def detailForRegistrationNumber(regNumber):
    regNumber == "keys"
    if regNumber in data_dict:
        print data_dict[regNumber]
    else:
        print regNumber, "Not in dictionary"
The error message I am getting is:
======= Loading Progam =======
>>> detailForRegistrationNumber('EDF768')
EDF768 Not in dictionary
But the dictionary has the above registration number:
{'HUY768': ' Wilbur Matty, 8912, Creche_Parking', 'GH7682': ' Clara Hill, 7689, AgHort_Parking', 'GEF123': ' Jill Black, 3456, Creche_Parking', 'WER546': ' Olga Grey, 9898, Creche_Parking', 'TY5678': ' Jane Miller, 8987, AgHort_Parking', 'ABC234': ' Fred Greenside, 2345, AgHort_Parking', 'KLOI98': ' Martha Miller, 4563, Vet_Parking', **'EDF768'**: ' Bill Meyer, 2456, Vet_Parking', 'JU9807': ' Jacky Blair, 7867, Vet_Parking', 'DF7800': ' Jacko Frizzle, 4532, Creche_Parking', 'ADF645': ' Cloe Freckle, 6789, Vet_Parking'}

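I think your problem is that your function createNameDict(filename) doesn't return anything, so the lookup depends entirely on the module-level data_dict having been filled before detailForRegistrationNumber runs. In the session you show, the lookup is called straight after loading, so data_dict may still be empty at that point. (The line regNumber == "keys" is a comparison whose result is discarded, so it can be removed as well.)
Make the last line of the function return data_dict and then use it like data_dict = createNameDict(filename). There is no need for the global variable part, so just remove that.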
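A minimal sketch of that structure, written for Python 3 (the question's code is Python 2). The snake_case function names, the two sample records and the use of io.StringIO in place of ParkingData_Part2.txt are assumptions made for illustration:

```python
import io


def create_name_dict(lines):
    """Build {registration: details} from 'REG, rest of line' records."""
    data = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline
        reg, _, details = line.partition(',')
        data[reg.strip()] = details.strip()
    return data


def detail_for_registration(data, reg_number):
    """Return the details for a registration, or None when it is unknown."""
    return data.get(reg_number)


# Hypothetical in-memory stand-in for ParkingData_Part2.txt
sample = io.StringIO(
    "EDF768, Bill Meyer, 2456, Vet_Parking\n"
    "TY5678, Jane Miller, 8987, AgHort_Parking\n"
)
parking = create_name_dict(sample)
print(detail_for_registration(parking, "EDF768"))
print(detail_for_registration(parking, "XXX000"))
```

Passing the dictionary into the lookup function makes the dependency explicit: the lookup cannot run against an unfilled dictionary by accident.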
