Is there a way to detect an existing link from a text file in Python - python

I have code in a Jupyter notebook that uses requests to check whether a URL exists and then writes the result to a text file. Here is the code:
import requests
Instaurl = open("dictionaries/insta.txt", 'w', encoding="utf-8")
cli = ['duolingo', 'ryanair', 'mcguinness.paddy', 'duolingodeutschland', 'duolingobrasil']
exist=[]
url = []
for i in cli:
    r = requests.get("https://www.instagram.com/"+i+"/")
    if r.apparent_encoding == 'Windows-1252':
        exist.append(i)
        url.append("instagram.com/"+i+"/")
Instaurl.write(url)
Let's say that, inside the cli list, I accidentally add a username that is already in the text file (duolingo, for example). Is there a way to make sure that, if requests finds a URL that is already in the text file, it is not added to the text file again?
Thank you!

You defined a list:
cli = ['duolingo', ...]
It sounds like you would prefer to define a set:
cli = {'duolingo', ...}
That way, duplicates will be suppressed.
This applies both to duplicates in the initial assignment and to any duplicate cli.add(entry) you might attempt later.
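A set only removes duplicates within one run. To also skip URLs that were written to the text file on an earlier run, one option is to load the file into a set first and append only what is missing. The following is a minimal sketch, not the asker's code: the helper names load_known_urls and append_new_urls are hypothetical, the file is assumed to hold one URL per line, and the requests existence check is left out for brevity.

```python
from pathlib import Path


def load_known_urls(path):
    """Return the set of URLs already stored in the file (empty if the file is missing)."""
    file = Path(path)
    if not file.exists():
        return set()
    lines = file.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def append_new_urls(path, usernames):
    """Append one URL per username, skipping URLs that are already in the file."""
    known = load_known_urls(path)
    added = []
    # Mode "a" appends, so earlier entries are kept (mode "w" would erase them).
    with open(path, "a", encoding="utf-8") as out:
        for name in usernames:
            url = "instagram.com/" + name + "/"
            if url in known:
                continue
            out.write(url + "\n")
            known.add(url)
            added.append(url)
    return added
```

Calling append_new_urls("dictionaries/insta.txt", cli) twice with the same usernames would write each URL only once, because the second call finds them in the file.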

Related

How to send keys from a text file

My question is how I can use the code: variable.send_keys(text file)
I tried to do this, but it is not typing anything. Does someone know what is wrong in my code?
I don't want to send a hard-coded string in quotes; I want to send the words from a text file, because the text is going to change to something else every time.
description = driver.find_element_by_id("product_short_description")
descriptionTypy = ActionChains(driver)
descriptionTypy.click(description)
descriptionTypy.perform()
f = open("descriptionTyper", 'r')
description.send_keys(f)
Try using this instead of just click
descriptionTypy.click(on_element=description)
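Separately from the click, note that send_keys expects text, while the question's code passes the open file object f. A likely fix is to read the file's contents into a string first. A minimal sketch, assuming the file is UTF-8 text; the helper name type_file_contents is hypothetical, and element stands for any object with a send_keys method, such as the description element above:

```python
def type_file_contents(element, path, encoding="utf-8"):
    """Read the whole text file and pass its contents, as a str, to element.send_keys."""
    with open(path, "r", encoding=encoding) as f:
        text = f.read()
    element.send_keys(text)
    return text
```

With the question's names, this would be called as type_file_contents(description, "descriptionTyper").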

How to download list data from SharePoint Online to a csv (preferably) or json file?

I have accessed a list in SharePoint Online with Python and want to save the list data to a file (csv or json) so that I can transform it and sort some metadata for a migration.
I have full access to the SharePoint site I am connecting to (client ID, secret, ...).
from office365.runtime.auth.authentication_context import AuthenticationContext
from office365.runtime.client_request import ClientRequest
from office365.sharepoint.client_context import ClientContext
I have set my settings:
app_settings = {
    'url': 'https://company.sharepoint.com/sites/abc',
    'client_id': 'id',
    'client_secret': 'secret'
}
Connecting to the site:
context_auth = AuthenticationContext(url=app_settings['url'])
context_auth.acquire_token_for_app(client_id=app_settings['client_id'],
                                   client_secret=app_settings['client_secret'])
ctx = ClientContext(app_settings['url'], context_auth)
Getting the lists and checking the titles:
lists = ctx.web.lists
ctx.load(lists)
ctx.execute_query()
for lista in lists:
    print(lista.properties["Title"]) # this gives me the titles of each list and it works.
lists is a ListCollection Object
From the previous code, I see that I want to get the list titled: "Analysis A":
a1 = lists.get_by_title("Analysis A")
ctx.load(a1)
ctx.execute_query() # a1 is a List item - non-iterable
Then I get the data in that list:
a1w = a1.get_items()
ctx.load(a1w)
ctx.execute_query() # a1w is a ListItemCollection - iterable
idea 1: df to json/csv
df1 = pd.DataFrame(a1w) # doesn't work
idea 2:
follow this link: How to save a Sharepoint list as a file?
I get an error while executing the json.loads command:
JSONDecodeError: Extra data: line 1 column 5 (char 4)
Alternatives:
I tried Shareplum, but can't connect with it, like I did with office365-python-rest. My guess is that it doesn't have an authorisation option with client id and client secret (as far as I can see)
How would you do it? Or am I missing something?
Sample test demo for your reference.
import json
import pandas

context_auth = AuthenticationContext(url=app_settings['url'])
context_auth.acquire_token_for_app(client_id=app_settings['client_id'],
                                   client_secret=app_settings['client_secret'])
ctx = ClientContext(app_settings['url'], context_auth)
# Named sp_list rather than list to avoid shadowing the built-in type.
sp_list = ctx.web.lists.get_by_title("ListA")
items = sp_list.get_items()
ctx.load(items)
ctx.execute_query()
dataList = []
for item in items:
    # Keep only the fields needed for the export.
    dataList.append({"Title":item.properties["Title"],"Created":item.properties["Created"]})
    print("Item title: {0}".format(item.properties["Title"]))
pandas.read_json(json.dumps(dataList)).to_csv("output.csv", index = None,header=True)
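If pandas is not needed for anything else, the same list of dictionaries can be written with the standard library. This is a minimal sketch rather than part of the original answer: write_rows_to_csv is a hypothetical helper, and the rows are assumed to be plain dicts like the ones built from item.properties above.

```python
import csv


def write_rows_to_csv(rows, path, fieldnames):
    """Write a list of dicts to a CSV file, keeping only the given columns."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for row in rows:
            # Missing keys become empty cells; extra keys are dropped.
            writer.writerow({name: row.get(name, "") for name in fieldnames})
```

With the names from the demo, the call would be write_rows_to_csv(dataList, "output.csv", ["Title", "Created"]).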
Idea 1
It's hard to tell what can go wrong without the error trace. But I suspect it's likely to do with malformed data that you are passing as the argument. See here from the documentation to know exactly what's expected.
Do also consider updating your question with relevant stack error traces.
Idea 2
JSONDecodeError: Extra data: line 1 column 5 (char 4)
This error simply means that the JSON string is not in a valid format. You can validate JSON strings by using this service. It often tells you where the error is, which you can then use to fix the problem manually.
This error can also occur when the input is not a single JSON document, for example the string form of a Python object or a file that holds one JSON object per line. In the latter case, you can avoid it by parsing each line as you go:
data_list = []
with open('file_name.json', 'r') as f:
    for line in f:
        data_list.append(json.loads(line))
This avoids storing intermediate python objects. Also see this related issue if nothing works.
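To see where the "Extra data" message comes from, here is a small self-contained sketch. The sample text is made up for illustration: it holds two JSON objects on separate lines, which json.loads rejects as a whole but accepts line by line.

```python
import json

# Two JSON documents in one string: valid line by line, invalid as a whole.
text = '{"Title": "A"}\n{"Title": "B"}\n'

try:
    json.loads(text)
    error_message = None
except json.JSONDecodeError as exc:
    # The parser read the first object and then found more input after it.
    error_message = str(exc)

# Parsing each non-empty line separately succeeds.
records = [json.loads(line) for line in text.splitlines() if line.strip()]
```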

Get the 'Last saved by' (windows file) with python

How can I get the username value from the "Last saved by" property of any Windows file?
E.g., I can see this info by right-clicking on a Word file and opening the Details tab. See the picture below:
Does anybody know how I can get it using Python code?
Following the comment from @user1558604, I searched a bit on Google and reached a solution. I tested it on the extensions .docx, .xlsx, .pptx.
import zipfile
import xml.dom.minidom
# Open the MS Office file to see the XML structure.
filePath = r"C:\Users\Desktop\Perpetual-Draft-2019.xlsx"
document = zipfile.ZipFile(filePath)
# Open/read the core.xml (contains the last user and modified date).
uglyXML = xml.dom.minidom.parseString(document.read('docProps/core.xml')).toprettyxml(indent=' ')
# Split lines in order to create a list.
asText = uglyXML.splitlines()
# loop the list in order to get the value you need. In my case last Modified By and the date.
for item in asText:
    if 'lastModifiedBy' in item:
        itemLength = len(item)-20
        print('Modified by:', item[21:itemLength])
    if 'dcterms:modified' in item:
        itemLength = len(item)-29
        print('Modified On:', item[46:itemLength])
The result in the console is:
Modified by: adm.UserName
Modified On: 2019-11-08"
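The slicing above depends on the exact length of each pretty-printed line. A variant that looks the elements up by name may be less fragile. This is a sketch under assumptions, not the answerer's method: read_core_properties is a hypothetical helper, and it assumes the file follows the usual Office Open XML layout, with docProps/core.xml using the cp and dcterms namespaces shown.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by docProps/core.xml in Office Open XML packages.
NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dcterms": "http://purl.org/dc/terms/",
}


def read_core_properties(file_or_path):
    """Return the 'last modified by' user and the modified timestamp of an Office file."""
    with zipfile.ZipFile(file_or_path) as document:
        root = ET.fromstring(document.read("docProps/core.xml"))
    return {
        # findtext returns None when the element is absent.
        "last_modified_by": root.findtext("cp:lastModifiedBy", namespaces=NAMESPACES),
        "modified": root.findtext("dcterms:modified", namespaces=NAMESPACES),
    }
```

Usage would be read_core_properties(filePath) with the path from the answer; the timestamp is returned as stored in the file, without trimming.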

Discord bot - issue saving a text file after hosting

OK, I have been trying to think of a solution/find a solution myself for quite some time but everything I am attempting either ends up not a solution, or too complex for me to attempt without knowing it will work.
I have a Discord bot, made in Python. The bot's purpose is to parse a blog for HTML links; when a new HTML link is posted, it then posts the link into Discord.
I am using a text file to save the latest link, and then parsing the website every 30 seconds to check if a new link has been posted by comparing the link at position 0 in the array to the link in the text file.
Now, I have managed to host my bot on Heroku with some success. However, I have since learned that Heroku cannot keep changes to my text file: since it pulls the code from GitHub, any changes are reverted after ~24 hours.
Since learning this I have attempted to host the text file in an AWS S3 bucket. However, I have now learned that it can add and delete files but not modify existing ones, and can only write new files from existing files on my system. So if I could do this, I wouldn't need to, since I would be able to modify the file on my system directly and not need to host it anywhere.
I am looking for hopefully simple solutions/suggestions.
I am open to changing the hosting/whatever is needed, however I cannot pay for hosting.
Thanks in advance.
EDIT
So, I am editing this because I have a working solution thanks to a suggestion commented below.
The solution is to get my Python bot to commit the new file to GitHub, and then use that committed file's content as the reference.
import base64
import os
from github import Github
from github import InputGitTreeElement
user = os.environ.get("GITHUB_USER")
password = os.environ.get("GITHUB_PASSWORD")
g = Github(user,password)
repo = g.get_user().get_repo('YOUR REPO NAME HERE')
file_list = [
    'last_update.txt'
]
file_names = [
    'last_update.txt',
]
def git_commit():
    commit_message = 'News link update'
    master_ref = repo.get_git_ref('heads/master')
    master_sha = master_ref.object.sha
    base_tree = repo.get_git_tree(master_sha)
    element_list = list()
    for i, entry in enumerate(file_list):
        with open(entry) as input_file:
            data = input_file.read()
        if entry.endswith('.png'):
            data = base64.b64encode(data)
        element = InputGitTreeElement(file_names[i], '100644', 'blob', data)
        element_list.append(element)
    tree = repo.create_git_tree(element_list, base_tree)
    parent = repo.get_git_commit(master_sha)
    commit = repo.create_git_commit(commit_message, tree, [parent])
    master_ref.edit(commit.sha)
I then have a method called 'check_latest_link' which checks my github repo's RAW format, and parses that HTML to source the contents and then assigns that content as a string to my variable 'last_saved_link'
import requests
def check_latest_link():
    res = requests.get('[YOUR GITHUB PAGE LINK - RAW FORMAT]')
    content = res.text
    return(content)
Then in my main method I have the following:
@client.event
async def task():
    await client.wait_until_ready()
    print('Running')
    while True:
        channel = discord.Object(id=channel_id)
        #parse_links() is a method to parse HTML links from a website
        news_links = parse_links()
        last_saved_link = check_latest_link()
        print('Running')
        await asyncio.sleep(5)
        #below compares the parsed HTML, to the saved reference,
        #if they are not the same then there is a new link to post.
        if last_saved_link != news_links[0]:
            #the 3 methods below (read_file, delete_contents and save_to_file)
            #are methods that simply do what they suggest to a text file specified elsewhere
            read_file()
            delete_contents()
            save_to_file(news_links[0])
            #then we have the git_commit previously shown.
            git_commit()
            #after git_commit, I was having an issue with the github reference
            #not updating for a few minutes, so the bot posts the message and
            #then goes to sleep for 500 seconds, this stops the bot from
            #posting duplicate messages. because this is an async function,
            #it will not stop other async functions from executing.
            await client.send_message(channel, news_links[0])
            await asyncio.sleep(500)
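The comparison at the heart of the loop can be pulled out into a small function, which makes it easy to check without Discord or GitHub. A minimal sketch: find_new_link is a hypothetical name, and stripping whitespace before comparing is an added assumption, on the grounds that text fetched from a raw file may end with a newline.

```python
def find_new_link(news_links, last_saved_link):
    """Return the newest link if it differs from the saved one, otherwise None."""
    if not news_links:
        # Nothing was parsed, so there is nothing new to post.
        return None
    newest = news_links[0]
    if newest.strip() == last_saved_link.strip():
        return None
    return newest
```

In the loop above, the bot would post and commit only when find_new_link(news_links, last_saved_link) returns a link.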
I am posting this so I can close the thread with an "Answer" - please refer to post edit.

Username and Password login

I'd like to create a login that opens a text/CSV file, reads the "valid" usernames and passwords from it, and, if what the user entered matches what is in the file, allows access to the rest of the program.
How would I change the code below into code that opens a file, reads the valid usernames and passwords, and checks them against the user's input?
Currently I have something which works, but there is only one password, which I have set in the code.
Password = StringVar()
Username = StringVar()
def EnterPassword():
    file = open('Logins.txt', 'w') #Text file i am using
    with open('Logins.txt') as file:
        data = file.read() #data=current text in text file
    UsernameAttempt = Username.get()#how to get value from entry box
    PasswordAttempt = Password.get()#how to get value from entry box
    if PasswordAttempt == '' and UsernameAttempt == '':
        self.delete()
        Landlord = LandlordMenu()
    else:
        PasswordError = messagebox.showerror('Password/Username Entry','Incorrect Username or Password entered.\n Please try again.')
PasswordButton = Button(self.GenericGui,text = 'Landlord Section',height = 3, width = 15, command = EnterPassword, font = ('TkDefaultFont',14),relief=RAISED).place(x=60,y=175)
Some assistance would be appreciated
Please have a look at some documentation. Your question in "Coding Comments" -> #how to get value from entry box is easy to solve using the official documentation.
For reading files there is also official documentation on strings and file operations (reading a file line by line into strings, then using string.split(';') to get arrays instead of row strings).
Please do read the documentation before writing applications. You do not need to know the complete API of all Python modules, only where to look. It is very exhausting to depend on other users / developers when there is no actual need for it (as there is very detailed documentation and tons of how-tos for that kind of stuff).
This is not meant to be offensive but to show you how easily you can find documentation. Both results were the first results from a search engine (ddg).
Please keep in mind that SO is neither a code writing service nor a let-me-google-that-for-you forum.
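The file-reading approach mentioned above (read line by line, split on ';') can be sketched as follows. This is an illustration under assumptions, not the asker's program: load_logins and is_valid_login are hypothetical names, and the file is assumed to hold one username;password pair per line.

```python
def load_logins(path, separator=";"):
    """Read 'username;password' lines into a dict mapping username to password."""
    logins = {}
    # Open for reading; mode 'w' would erase the file before it is read.
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            parts = line.strip().split(separator)
            if len(parts) == 2:
                username, password = parts
                logins[username] = password
    return logins


def is_valid_login(logins, username, password):
    """Return True only when the username exists and the password matches."""
    return username in logins and logins[username] == password
```

Inside EnterPassword, the check would then become is_valid_login(load_logins('Logins.txt'), UsernameAttempt, PasswordAttempt).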
