I am trying to iterate through a little array and create urls out of the information contained therein.
Here's a dummy setup of what I'm trying to do:
import urllib.parse
DEEP_CAT_URL = 'https://google.com/search?q=%s'
def GetCatLink(cats=()):
    for cat in cats:
        cat_id = cat['name']
        color = cat.get('color')
        if color:
            deep_cat_url = DEEP_CAT_URL % urllib.parse.quote(color+'+cat',safe='+')
            return (deep_cat_url)
CATS = [
    {
        'color': 'red',
        'name': 'Redd Foxx',
    },
    {
        'color': 'black',
        'name': 'Donnie Darko',
    },
]
print("There are "+str(len(CATS))+" cats to consider")
for h in range(len(CATS)):
    cnum=str(h+1)
    print("Cat #"+cnum+":")
    if CATS[h]["color"] != '':
        print("The name of Cat #{} is {}. The color of {} is {}.".format(cnum,CATS[h]["name"],CATS[h]["name"], CATS[h]["color"]))
    x=CATS[h]
    print(x)
    print(GetCatLink(CATS))
It kind of works, but outputs:
There are 2 cats to consider
Cat #1:
The name of Cat #1 is Redd Foxx. The color of Redd Foxx is red.
{'color': 'red', 'name': 'Redd Foxx'}
https://google.com/search?q=red+cat
Cat #2:
The name of Cat #2 is Donnie Darko. The color of Donnie Darko is black.
{'color': 'black', 'name': 'Donnie Darko'}
https://google.com/search?q=red+cat
The goal here is to have two urls:
Cat#1 https://google.com/search?q=red+cat
Cat#2 https://google.com/search?q=black+cat
You're looping over all cats in GetCatLink, and on the first iteration you return a URL, so you always get the URL for the first cat in CATS. There's no reason to loop inside the function at all. Why not pass the cat you want a URL for to the function and process just that one?
Instead of:
def GetCatLink(cats=()):
    for cat in cats:
        cat_id = cat['name']
        color = cat.get('color')
        if color:
            deep_cat_url = DEEP_CAT_URL % urllib.parse.quote(color+'+cat',safe='+')
            return (deep_cat_url)
Use:
def get_cat_link(cat):
    cat_id = cat['name']
    color = cat.get('color')
    return DEEP_CAT_URL % urllib.parse.quote(color + '+cat', safe='+')
And instead of this:
print(GetCatLink(CATS))
This:
print(get_cat_link(x))
There's more to be said about your code - there's some surplus stuff in there, but this addresses your main issue.
(As @JonClements correctly comments: you will run into trouble if you try to make requests with the URLs you're creating, unless you use them in a normal browser. Google won't give you the result you expect if you just use urllib or requests without getting "tricksy".)
The function GetCatLink and its invocation need to be modified so that it receives a single cat. Then everything works well.
def GetCatLink(cat):
    color = cat.get('color')
    if color:
        return DEEP_CAT_URL % urllib.parse.quote(color + '+cat', safe='+')
for h in range(len(CATS)):
    cnum = str(h + 1)
    print("Cat #" + cnum + ":")
    if CATS[h]["color"] != '':
        print("The name of Cat #{} is {}. The color of {} is {}.".format(cnum, CATS[h]["name"], CATS[h]["name"],
                                                                          CATS[h]["color"]))
    x = CATS[h]
    print(x)
    print(GetCatLink(CATS[h]))
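For reference, here is a minimal self-contained sketch of the same fix, assuming the CATS data from the question. The name get_cat_link and the choice to return None for a cat without a colour are illustrative assumptions, not part of the original code.

```python
import urllib.parse

DEEP_CAT_URL = 'https://google.com/search?q=%s'

CATS = [
    {'color': 'red', 'name': 'Redd Foxx'},
    {'color': 'black', 'name': 'Donnie Darko'},
]


def get_cat_link(cat):
    """Build the search URL for one cat, or None if it has no colour."""
    color = cat.get('color')
    if not color:
        return None
    # safe='+' keeps the literal plus sign between the colour and 'cat'.
    return DEEP_CAT_URL % urllib.parse.quote(color + '+cat', safe='+')


# One URL per cat: the function is called once for each element.
links = [get_cat_link(cat) for cat in CATS]

for number, link in enumerate(links, start=1):
    print("Cat#{} {}".format(number, link))
```

Using enumerate(..., start=1) avoids the range(len(...)) indexing and the manual h + 1 arithmetic.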
I'm a beginner, so please help: what exactly do I need to use for the following?
I want to pass the dictionary data, including name and color, from init_dict to show_dict and print all of it.
from printing_functions_1 import init_dict as pr
from printing_functions_1 import show_dict as sd
pr('DArya', 'Total Black', another_color = 'purple', lovely_film = 'mystic')
sd(another_color = 'purple', lovely_film = 'mystic')
def init_dict(name, color, **argv):
    argv['Name'] = name
    argv['Color'] = color
def show_dict(**argv):
    for key, value in argv.items():
        print(key, value)
I expect something like this as output from show_dict:
another_color purple
lovely_film mystic
Name DArya
Color Total Black
When a match of replacements.csv > Link Changed > 'Yes' is found, I want to carry out the following:
match column replacements.csv > Fruit to main.csv > External Links
replace matching fruits found in main.csv > External Links with replacements.csv > Fruit Link
To demonstrate, I need the required output to be shown as below:
replacements.csv
Fruit,Fruit Link,Link Changed
banana,https://en.wikipedia.org/wiki/Banana,
blueberry,https://en.wikipedia.org/wiki/Blueberry,
strawberry,https://en.wikipedia.org/wiki/Strawberry,Yes
raspberry,https://en.wikipedia.org/wiki/Raspberry,Yes
cherry,https://en.wikipedia.org/wiki/Cherry,
apple,https://en.wikipedia.org/wiki/Apple,Yes
main.csv
Title,External Links
Smoothie Recipes,"['banana', 'blueberry', 'strawberry', 'raspberry', 'apple']"
Fruit Pies,"['cherry', 'apple']"
required output
Title,External Links
Smoothie Recipes,"['banana', 'blueberry', 'https://en.wikipedia.org/wiki/Strawberry', 'https://en.wikipedia.org/wiki/Raspberry', 'https://en.wikipedia.org/wiki/Apple']"
Fruit Pies,"['cherry', 'https://en.wikipedia.org/wiki/Apple']"
Code
import pandas as pd
replacements = pd.read_csv('replacements.csv')
main = pd.read_csv('main.csv')
all_scrapes = []
fruits_found = []
## Replace main.csv > External Links when replacements.csv > Link Changed = Yes
def swap_urls(fruit, fruit_link):
    counter = 0
    while counter < len(main):
        title = main['Title'][counter]
        external_links = main['External Links'][counter]
        fruit_count = len(external_links.split(","))
        fruit_item_row = main['External Links'][counter].replace("'","").replace("[","").replace("]","").replace(" ","") # [0] represents main.csv row
        items = 0
        while items < fruit_count:
            single_fruit_list = fruit_item_row.split(',')[items]
            if fruit in single_fruit_list:
                print('Current Fruit Item:', single_fruit_list)
                external_links = external_links.replace(fruit, fruit_link)
                #fruits_found.append(fruit)
                product = {
                    'Title': title,
                    'External Link': external_links,
                    #'Fruits Found': fruits_found,
                }
                print(' Product:', product)
                all_scrapes.append(product)
            else:
                pass
            items +=1
        counter +=1
    return all_scrapes
## Pass Fruit & Fruit Link values to function swap_urls when replacements.csv > Link Changed = Yes
y = 0
while y < len(replacements):
    fruit = replacements['Fruit'][y]
    fruit_link = replacements['Fruit Link'][y]
    link_changed = replacements['Link Changed'][y]
    if replacements['Link Changed'][y] == 'Yes':
        print(f'replacement.csv row [{y}]: {fruit}, Fruit Link: {fruit_link}, Link Changed: \x1b[92m{link_changed}\x1b[0m')
        swap_urls(fruit, fruit_link)
    else:
        print(f'replacement.csv row [{y}]: {fruit}, Fruit Link: {fruit_link}, Link Changed: No')
    y +=1
## Save results to File
df = pd.DataFrame(all_scrapes)
print('DF:\n', df)
df.to_excel('Result.xlsx', index=False)
Issue
I'm able to identify the fruits in replacements.csv with their counterparts in main.csv, however I'm unable to update main.csv > External Links as a single entry when multiple fruits are found. See generated output file results.xlsx
Any help would be much appreciated.
Here is a relatively simple way to do this:
import ast
import pandas as pd
r = pd.read_csv('replacements.csv')
df = pd.read_csv('main.csv')
# make a proper list from the strings in 'External Links':
df['External Links'] = df['External Links'].apply(ast.literal_eval)
# make a dict for mapping (only rows where 'Link Changed' is set)
dct = r.dropna(subset=['Link Changed']).set_index('Fruit')['Fruit Link'].to_dict()
>>> dct
{'strawberry': 'https://en.wikipedia.org/wiki/Strawberry',
 'raspberry': 'https://en.wikipedia.org/wiki/Raspberry',
 'apple': 'https://en.wikipedia.org/wiki/Apple'}
# map, leaving the key by default
df['External Links'] = (
    df['External Links'].explode().map(lambda k: dct.get(k, k))
    .groupby(level=0).apply(pd.Series.tolist)
)
# result
>>> df
Title External Links
0 Smoothie Recipes [banana, blueberry, https://en.wikipedia.org/w...
1 Fruit Pies [cherry, https://en.wikipedia.org/wiki/Apple]
# result, as csv (to show quotation marks etc.)
>>> df.to_csv(index=False)
Title,External Links
Smoothie Recipes,"['banana', 'blueberry', 'https://en.wikipedia.org/wiki/Strawberry', 'https://en.wikipedia.org/wiki/Raspberry', 'https://en.wikipedia.org/wiki/Apple']"
Fruit Pies,"['cherry', 'https://en.wikipedia.org/wiki/Apple']"
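The same "build a mapping, then look each item up with a default" idea can also be sketched without pandas. This is only an illustration under assumptions: the CSV contents are inlined as strings copied from the question, and the helper names build_mapping and swap_links are made up for the example.

```python
import ast
import csv
import io

# Inlined copies of the two files from the question.
REPLACEMENTS_CSV = """Fruit,Fruit Link,Link Changed
banana,https://en.wikipedia.org/wiki/Banana,
blueberry,https://en.wikipedia.org/wiki/Blueberry,
strawberry,https://en.wikipedia.org/wiki/Strawberry,Yes
raspberry,https://en.wikipedia.org/wiki/Raspberry,Yes
cherry,https://en.wikipedia.org/wiki/Cherry,
apple,https://en.wikipedia.org/wiki/Apple,Yes
"""

MAIN_CSV = """Title,External Links
Smoothie Recipes,"['banana', 'blueberry', 'strawberry', 'raspberry', 'apple']"
Fruit Pies,"['cherry', 'apple']"
"""


def build_mapping(replacements_text):
    """Map fruit -> link, but only for rows marked 'Yes'."""
    reader = csv.DictReader(io.StringIO(replacements_text))
    return {row['Fruit']: row['Fruit Link']
            for row in reader if row['Link Changed'] == 'Yes'}


def swap_links(main_text, mapping):
    """Return the rows of main.csv with changed fruits replaced by links."""
    rows = []
    for row in csv.DictReader(io.StringIO(main_text)):
        # The cell holds the repr of a list, so parse it safely.
        fruits = ast.literal_eval(row['External Links'])
        # dict.get(key, key) leaves unmapped fruits untouched.
        links = [mapping.get(fruit, fruit) for fruit in fruits]
        rows.append({'Title': row['Title'], 'External Links': links})
    return rows


mapping = build_mapping(REPLACEMENTS_CSV)
result = swap_links(MAIN_CSV, mapping)
for row in result:
    print(row['Title'], row['External Links'])
```

Because each row is processed once against the complete mapping, every title yields a single entry even when several fruits are replaced.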
import pandas as pd
replacements = pd.read_csv("replacements.csv")
main = pd.read_csv("main.csv")
# returns replacement link or fruit
def fruit_link(x):
    if x not in (replacements['Fruit'].values):
        return x
    return replacements.loc[replacements['Fruit'] == x, 'Fruit Link'].values[0]\
        if replacements.loc[replacements['Fruit'] == x, 'Link Changed'].values == 'Yes' else x
# split string of list to list
main["External Links"] = main["External Links"].apply(lambda x: x[1:-1].split(', '))
# explode main to fruits
main = main.explode("External Links")
# remove quotes from fruit names
main["External Links"] = main["External Links"].apply(lambda x: x[1:-1])
# applying fruit_link to retrieve link or fruit
main["External Links"] = main["External Links"].apply(fruit_link)
# implode back
main = main.groupby('Title').agg({'External Links': lambda x: x.tolist()}).reset_index()
OUTPUT:
Title External Links
0 Fruit Pies [cherry, https://en.wikipedia.org/wiki/Apple]
1 Smoothie Recipes [grape, banana, blueberry, https://en.wikipedia.org/wiki/Strawberry, https://en.wikipedia.org/wiki/Raspberry, https://en.wikipedia.org/wiki/Apple, plum]
I have a long string, and I've extracted the substrings I wanted. I am looking for a method that uses fewer lines of code to get my output. I'm after all the substrings that start with CN=, removing everything else up to the semicolon.
example list output (see picture)
The script I'm currently using is below
import re
import fnmatch
import os
# System call
os.system("")
# Class of different styles
class style():
BLACK = '\033[30m'
RED = '\033[31m'
GREEN = '\033[32m'
YELLOW = '\033[33m'
BLUE = '\033[34m'
MAGENTA = '\033[35m'
CYAN = '\033[36m'
WHITE = '\033[37m'
UNDERLINE = '\033[4m'
RESET = '\033[0m'
CNString = "CN=User2,OU=blurb,OU=Test,DC=Test,DC=Testal;CN=User4,OU=blurb,OU=Test,DC=Test,DC=Testal;CN=User56,OU=blurb,OU=Test,DC=Test,DC=Testal;CN=User9,OU=blurb,OU=Test,DC=Test,DC=Testal;CN=Jane45 user,OU=blurb,OU=Test,DC=Test,DC=Testal;CN=User-Donna,OU=blurb,OU=Test,DC=Test,DC=Testal;CN=User76 smith,OU=blurb,OU=Test4,DC=Test,DC=Testal;CN=Pink Panther,OU=blurb,OU=Test,DC=Testing,DC=Testal;CN=Testuser78,OU=blurb,OU=Tester,DC=Test,DC=Testal;CN=great Scott,OU=blurb,OU=Test,DC=Test,DC=Local;CN=Leah Human,OU=blurb,OU=Test,DC=Test,DC=Testal;CN=Alan Desai,OU=blurb,OU=Test,DC=Test,DC=Testal;CN=Duff Beer,OU=Groups,OU=Test,DC=Test,DC=Testal;CN=Jane Doe,OU=Users,OU=Test76,DC=Test,DC=Testal;CN=simple user67,OU=Users,OU=Test,DC=Test,DC=Testal;CN=test O'Lord,OU=Users,OU=Test,DC=Concero,DC=Testal"
newstring1 = CNString.replace(';','];')
print(newstring1)
newstring2 = newstring1.replace(',OU=',',[OU=')
print(newstring2)
newstring3 = newstring2.replace(',[OU','],[OU')
print(newstring3)
newstring4 = newstring3.replace('],[OU',',[OU')
print(newstring4)
newstring5 = newstring4.replace('];',']];')
print(newstring5)
endstring = "]]"
newstring6 = newstring5 + endstring
print(newstring6)
newstring7 = re.sub("\[.*?\]","()",newstring6)
print(newstring7)
print(style.YELLOW + "Line Break")
newstring8 = newstring7.replace(',()]','')
print(style.RESET + newstring8)
newstring9 = newstring8.split(';')
for cnname in newstring9:
print(style.GREEN + cnname)
Not sure why your code is juggling those square brackets. Wouldn't this do it?
names = re.findall(r"\bCN=[^,;]*", CNString)
Alternatively, without a regular expression, split on the semicolons and keep the first comma-separated field of each entry:
cn_list = [elem.split(",")[0] for elem in CNString.split(";") if elem.startswith("CN=")]
If I print cn_list I obtain:
['CN=User2', 'CN=User4', 'CN=User56', 'CN=User9', 'CN=Jane45 user', 'CN=User-Donna', 'CN=User76 smith', 'CN=Pink Panther', 'CN=Testuser78', 'CN=great Scott', 'CN=Leah Human', 'CN=Alan Desai', 'CN=Duff Beer', 'CN=Jane Doe', 'CN=simple user67', "CN=test O'Lord"]
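As a quick check that both approaches agree, here is a small runnable sketch. It assumes a shortened sample of the string from the question (three entries instead of sixteen); the variable names other than the two expressions themselves are illustrative.

```python
import re

# Shortened sample of the distinguished-name string from the question.
CN_STRING = (
    "CN=User2,OU=blurb,OU=Test,DC=Test,DC=Testal;"
    "CN=Jane45 user,OU=blurb,OU=Test,DC=Test,DC=Testal;"
    "CN=test O'Lord,OU=Users,OU=Test,DC=Concero,DC=Testal"
)

# Regex approach: match 'CN=' followed by everything up to the next ',' or ';'.
names_regex = re.findall(r"\bCN=[^,;]*", CN_STRING)

# Split approach: one entry per ';', keep the first ','-separated field.
names_split = [
    elem.split(",")[0]
    for elem in CN_STRING.split(";")
    if elem.startswith("CN=")
]

# Optionally drop the 'CN=' prefix to get the bare names.
bare_names = [name[len("CN="):] for name in names_regex]

print(names_regex)
print(bare_names)
```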
I wrote the following code, which works, but I want to improve it. I don't want to re-read the file for every ID, but if I stop re-reading it (previously I rewound with sales_input.seek(0)), it won't iterate through each row in sales. How can I improve this?
def computeCritics(mode, cleaned_sales_input = "data/cleaned_sales.csv"):
    if mode == 1:
        print "creating customer.critics.recommendations"
        critics_output = open("data/customer/customer.critics.recommendations",
                              "wb")
        ID = getCustomerSet(cleaned_sales_input)
        sales_dict = pickle.load(open("data/customer/books.dict.recommendations",
                                      "r"))
    else:
        print "creating books.critics.recommendations"
        critics_output = open("data/books/books.critics.recommendations",
                              "wb")
        ID = getBookSet(cleaned_sales_input)
        sales_dict = pickle.load(open("data/books/users.dict.recommendations",
                                      "r"))
    critics = {}
    # make critics dict and pickle it
    for i in ID:
        with open(cleaned_sales_input, 'rb') as sales_input:
            sales = csv.reader(sales_input) # read new
            for j in sales:
                if mode == 1:
                    if int(i) == int(j[2]):
                        sales_dict[int(j[6])] = 1
                else:
                    if int(i) == int(j[6]):
                        sales_dict[int(j[2])] = 1
        critics[int(i)] = sales_dict
    pickle.dump(critics, critics_output)
    print "done"
cleaned_sales_input looks like
6042772,2723,3546414,9782072488887,1,9.99,314968
6042769,2723,3546414,9782072488887,1,9.99,314968
...
where column 6 is the book ID and column 2 is the customer ID (counting from 0, as in the code above)
I want to get a dict which looks like
critics = {
CustomerID1: {
BookID1: 1,
BookID2: 0,
........
BookIDX: 0
},
CustomerID2: {
BookID1: 0,
BookID2: 1,
...
}
}
or
critics = {
BookID1: {
CustomerID1: 1,
CustomerID2: 0,
........
CustomerIDX: 0
},
BookID2: {
CustomerID1: 0,
CustomerID2: 1,
...
CustomerIDX: 0
}
}
I hope this isn't too much information.
Here are some suggestions:
Let's first look at this code pattern:
for i in ID:
    for j in sales:
        if int(i) == int(j[2]):
Notice that i is only ever compared with j[2]; that is its only purpose in the loop. For any given row j, int(i) == int(j[2]) can be True for at most one i, because the values in ID are unique.
So, we can completely remove the for i in ID loop by rewriting it as
for j in sales:
    key = j[2]
    if key in ID:
Based on the function names getCustomerSet and getBookSet, it sounds as if
ID is a set (as opposed to a list or tuple). We want ID to be a set since
testing membership in a set is O(1) (as opposed to O(n) for a list or tuple).
Next, consider this line:
critics[int(i)] = sales_dict
There is a potential pitfall here. This line is assigning sales_dict to
critics[int(i)] for each i in ID. Each key int(i) is being mapped to the very same dict. As we loop through sales and ID, we are modifying sales_dict like this, for example:
sales_dict[int(j[6])] = 1
But this will cause all values in critics to be modified simultaneously, since all keys in critics point to the same dict, sales_dict. I doubt that is what you want.
To avoid this pitfall, we need to make copies of the sales_dict:
critics = {i:sales_dict.copy() for i in ID}
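The difference between sharing one dict and copying it can be seen in a few lines. This is a toy sketch: the IDs and book keys are made up, and build_shared and build_copied are hypothetical helper names used only to contrast the two constructions.

```python
def build_shared(ids, template):
    # Every key points at the very same dict object.
    return {i: template for i in ids}


def build_copied(ids, template):
    # Every key gets its own shallow copy of the template.
    return {i: template.copy() for i in ids}


shared = build_shared([1, 2], {'book_a': 0, 'book_b': 0})
shared[1]['book_a'] = 1
# The change made through key 1 is visible through key 2 as well.
print(shared[2]['book_a'])

copied = build_copied([1, 2], {'book_a': 0, 'book_b': 0})
copied[1]['book_a'] = 1
# Key 2 still has the untouched template value.
print(copied[2]['book_a'])
```

A shallow copy is enough here because the values are plain integers; nested mutable values would need copy.deepcopy instead.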
With these changes, the function becomes:
import csv
import os
import pickle
def computeCritics(mode, cleaned_sales_input="data/cleaned_sales.csv"):
    if mode == 1:
        filename = 'customer.critics.recommendations'
        path = os.path.join("data/customer", filename)
        ID = getCustomerSet(cleaned_sales_input)
        sales_dict = pickle.load(
            open("data/customer/books.dict.recommendations", "r"))
        key_idx, other_idx = 2, 6
    else:
        filename = 'books.critics.recommendations'
        path = os.path.join("data/books", filename)
        ID = getBookSet(cleaned_sales_input)
        sales_dict = pickle.load(
            open("data/books/users.dict.recommendations", "r"))
        key_idx, other_idx = 6, 2
    print "creating {}".format(filename)
    ID = {int(item) for item in ID}
    # give every ID its own copy of the template dict
    critics = {i:sales_dict.copy() for i in ID}
    with open(path, "wb") as critics_output:
        # make critics dict and pickle it
        with open(cleaned_sales_input, 'rb') as sales_input:
            sales = csv.reader(sales_input) # read the file only once
            for j in sales:
                key = int(j[key_idx])
                if key in ID:
                    other_key = int(j[other_idx])
                    critics[key][other_key] = 1
        pickle.dump(dict(critics), critics_output)
    print "done"
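For readers on Python 3, the single-pass idea can be tried out in isolation. The sketch below is an assumption-laden illustration, not the original function: it reads from an in-memory string instead of data/cleaned_sales.csv, the third sample row and the template dict are invented, and compute_critics is a hypothetical name.

```python
import csv
import io

# Sample rows in the same layout as cleaned_sales.csv: column 2 is the
# customer ID and column 6 the book ID. The third row is made up.
SAMPLE_SALES = """\
6042772,2723,3546414,9782072488887,1,9.99,314968
6042769,2723,3546414,9782072488887,1,9.99,314968
6042770,2723,1111111,9782072488888,1,4.99,271828
"""


def compute_critics(sales_file, ids, template, key_idx, other_idx):
    """Build {id: {other_id: 0 or 1}} in one pass over the sales rows."""
    ids = {int(item) for item in ids}            # set gives O(1) membership tests
    critics = {i: dict(template) for i in ids}   # one independent copy per ID
    for row in csv.reader(sales_file):
        key = int(row[key_idx])
        if key in ids:
            critics[key][int(row[other_idx])] = 1
    return critics


book_template = {314968: 0, 271828: 0}
critics = compute_critics(
    io.StringIO(SAMPLE_SALES),
    ids={'3546414', '1111111'},
    template=book_template,
    key_idx=2,
    other_idx=6,
)
print(critics)
```

Swapping key_idx and other_idx (6 and 2) gives the book-keyed variant, which mirrors the mode switch in the function above.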
@unutbu's answer is better, but if you are stuck with this structure you can read the whole file into memory once:
with open(cleaned_sales_input, 'rb') as sales_input:
    sales = list(csv.reader(sales_input))
for i in ID:
    for j in sales:
        pass  # do stuff
I'm trying to implement colour cycling on my text in Python, i.e. I want it to cycle through the colours for every character typed (amongst other effects). My progress so far has been hacked together from an ANSI colour recipe; improvement suggestions are welcome.
I was also vaguely aware of, but have never used, termcolor, colorama and curses.
During the hack I managed to make the attributes stop working (i.e. reverse, blink, etc.), and it's not perfect, probably mainly because I don't understand these lines properly:
cmd.append(format % (colours[tmpword]+fgoffset))
c=format % attrs[tmpword] if tmpword in attrs else None
If anyone can clarify that a bit, I would appreciate it. This runs and does something, but it's not quite there. I changed the code so that, instead of having to separate colour commands from your string, you can include them.
#!/usr/bin/env python
'''
"arg" is a string or None
if "arg" is None : the terminal is reset to his default values.
if "arg" is a string it must contain "sep" separated values.
if args are found in globals "attrs" or "colors", or start with "#" \
they are interpreted as ANSI commands else they are output as text.
#* commands:
#x;y : go to xy
# : go to 1;1
## : clear screen and go to 1;1
#[colour] : set foreground colour
^[colour] : set background colour
examples:
echo('#red') : set red as the foreground color
echo('#red ^blue') : red on blue
echo('#red #blink') : blinking red
echo() : restore terminal default values
echo('#reverse') : swap default colors
echo('^cyan #blue reverse') : blue on cyan <=> echo('blue cyan)
echo('#red #reverse') : a way to set up the background only
echo('#red #reverse #blink') : you can specify any combinaison of \
attributes in any order with or without colors
echo('#blink Python') : output a blinking 'Python'
echo('## hello') : clear the screen and print 'hello' at 1;1
colours:
{'blue': 4, 'grey': 0, 'yellow': 3, 'green': 2, 'cyan': 6, 'magenta': 5, 'white': 7, 'red': 1}
'''
'''
Set ANSI Terminal Color and Attributes.
'''
from sys import stdout
import random
import sys
import time
esc = '%s['%chr(27)
reset = '%s0m'%esc
format = '1;%dm'
fgoffset, bgoffset = 30, 40
for k, v in dict(
attrs = 'none bold faint italic underline blink fast reverse concealed',
colours = 'grey red green yellow blue magenta cyan white'
).items(): globals()[k]=dict((s,i) for i,s in enumerate(v.split()))
bpoints = ( " [*] ", " [!] ", )
def echo(arg=None, sep=' ', end='\n', rndcase=True, txtspeed=0.03, bnum=0):
cmd, txt = [reset], []
if arg:
if bnum != 0:
sys.stdout.write(bpoints[bnum-1])
# split the line up into 'sep' seperated values - arglist
arglist=arg.split(sep)
# cycle through arglist - word seperated list
for word in arglist:
if word.startswith('#'):
### First check for a colour command next if deals with position ###
# go through each fg and bg colour
tmpword = word[1:]
if tmpword in colours:
cmd.append(format % (colours[tmpword]+fgoffset))
c=format % attrs[tmpword] if tmpword in attrs else None
if c and c not in cmd:
cmd.append(c)
stdout.write(esc.join(cmd))
continue
# positioning (starts with #)
word=word[1:]
if word=='#':
cmd.append('2J')
cmd.append('H')
stdout.write(esc.join(cmd))
continue
else:
cmd.append('%sH'%word)
stdout.write(esc.join(cmd))
continue
if word.startswith('^'):
### First check for a colour command next if deals with position ###
# go through each fg and bg colour
tmpword = word[1:]
if tmpword in colours:
cmd.append(format % (colours[tmpword]+bgoffset))
c=format % attrs[tmpword] if tmpword in attrs else None
if c and c not in cmd:
cmd.append(c)
stdout.write(esc.join(cmd))
continue
else:
for x in word:
if rndcase:
# thankyou mark!
if random.randint(0,1):
x = x.upper()
else:
x = x.lower()
stdout.write(x)
stdout.flush()
time.sleep(txtspeed)
stdout.write(' ')
time.sleep(txtspeed)
if txt and end: txt[-1]+=end
stdout.write(esc.join(cmd)+sep.join(txt))
if __name__ == '__main__':
echo('##') # clear screen
#echo('#reverse') # attrs are ahem not working
print 'default colors at 1;1 on a cleared screen'
echo('#red hello this is red')
echo('#blue this is blue #red i can ^blue change #yellow blah #cyan the colours in ^default the text string')
print
echo()
echo('default')
echo('#cyan ^blue cyan blue')
print
echo()
echo('#cyan this text has a bullet point',bnum=1)
print
echo('#yellow this yellow text has another bullet point',bnum=2)
print
echo('#blue this blue text has a bullet point and no random case',bnum=1,rndcase=False)
print
echo('#red this red text has no bullet point, no random case and no typing effect',txtspeed=0,bnum=0,rndcase=False)
# echo('#blue ^cyan blue cyan')
#echo('#red #reverse red reverse')
# echo('yellow red yellow on red 1')
# echo('yellow,red,yellow on red 2', sep=',')
# print 'yellow on red 3'
# for bg in colours:
# echo(bg.title().center(8), sep='.', end='')
# for fg in colours:
# att=[fg, bg]
# if fg==bg: att.append('blink')
# att.append(fg.center(8))
# echo(','.join(att), sep=',', end='')
#for att in attrs:
# echo('%s,%s' % (att, att.title().center(10)), sep=',', end='')
# print
from time import sleep, strftime, gmtime
colist='#grey #blue #cyan #white #cyan #blue'.split()
while True:
try:
for c in colist:
sleep(.1)
echo('%s #28;33 hit ctrl-c to quit' % c,txtspeed=0)
echo('%s #29;33 hit ctrl-c to quit' % c,rndcase=False,txtspeed=0)
#echo('#yellow #6;66 %s' % strftime('%H:%M:%S', gmtime()))
except KeyboardInterrupt:
break
except:
raise
echo('#10;1')
print
I should also mention that I have absolutely no idea what this line does :) Well, I can see that it puts colours into a dictionary object, but how it does it is confusing. I'm not used to this Python syntax yet.
for k, v in dict(
    attrs = 'none bold faint italic underline blink fast reverse concealed',
    colours = 'grey red green yellow blue magenta cyan white'
    ).items(): globals()[k]=dict((s,i) for i,s in enumerate(v.split()))
This is rather convoluted code, but, sticking to your question about these lines:
cmd.append(format % (colours[tmpword]+fgoffset))
This expression appends to the list named cmd the interpolation of the string contained in the variable format with the result of the expression (colours[tmpword]+fgoffset), which adds the numeric code stored in the colour table (colours) under tmpword to fgoffset.
The format string contains '1;%dm', which means it expects an integer that will replace the "%d" inside it. (Python's % string substitution is inherited from C's printf formatting.) Your "colours" table, on the other hand, is built in a convoluted way that I would not recommend in any code, setting the entries directly in globals(); but let's assume it holds the correct numeric value (0 to 7) for each colour. In that case, adding it to fgoffset (30) gives a value from 30 to 37, the standard ANSI foreground colour codes, and adding it to bgoffset (40) gives 40 to 47, the background codes. Note that the leading "1;" in format also switches on the bold attribute every time.
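To make the table and the offsets less mysterious, here is a small sketch that builds the colour table explicitly instead of writing into globals(), and uses it to cycle the colour of each character. It is an illustration under assumptions: the names COLOURS, fg, bg and cycle_colours are made up, the default palette is arbitrary, and the format deliberately omits the "1;" bold prefix used in the question.

```python
from itertools import cycle

ESC = '\033['            # Control Sequence Introducer
RESET = ESC + '0m'       # reset all attributes
FG_OFFSET, BG_OFFSET = 30, 40

# enumerate() numbers the names 0..7, which is what the globals() trick
# in the question does for both 'attrs' and 'colours'.
COLOURS = {name: code for code, name in enumerate(
    'grey red green yellow blue magenta cyan white'.split())}


def fg(name):
    """ANSI sequence selecting a foreground colour (codes 30..37)."""
    return '%s%dm' % (ESC, COLOURS[name] + FG_OFFSET)


def bg(name):
    """ANSI sequence selecting a background colour (codes 40..47)."""
    return '%s%dm' % (ESC, COLOURS[name] + BG_OFFSET)


def cycle_colours(text, names=('red', 'yellow', 'green')):
    """Prefix each character with the next colour in the palette."""
    palette = cycle(names)
    coloured = ''.join(fg(next(palette)) + ch for ch in text)
    return coloured + RESET


print(cycle_colours('hello this cycles colours'))
```

Returning a string rather than writing to stdout keeps the colouring logic separate from effects such as the typing delay.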
Now the second line in which you are in doubt:
c=format % attrs[tmpword] if tmpword in attrs else None
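This if is just Python's conditional expression (its ternary operator), written val1 if expr else val2; it corresponds to C's expr ? val1 : val2.
It is equivalent to:
if tmpword in attrs:
    c = format % attrs[tmpword]
else:
    c = None
Note that the conditional expression has lower precedence than the % operator, so format % attrs[tmpword] is grouped first and None is the entire else value.
Maybe you would prefer:
c = (format % attrs[tmpword]) if tmpword in attrs else ''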
instead
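A short sketch shows the grouping in practice. The attrs table here is a cut-down stand-in for the one in the question, and attr_code is a hypothetical helper name.

```python
fmt = '1;%dm'
attrs = {'bold': 1, 'blink': 5, 'reverse': 7}


def attr_code(word):
    # Parsed as: (fmt % attrs[word]) if word in attrs else None
    return fmt % attrs[word] if word in attrs else None


def attr_code_explicit(word):
    # The same logic written out as an if/else statement.
    if word in attrs:
        return fmt % attrs[word]
    return None


print(attr_code('blink'))      # the formatted attribute code
print(attr_code('hello'))      # None: the % expression is never evaluated
```

Because the else branch is taken before attrs[word] is looked up, an unknown word does not raise a KeyError.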