How to replace specific text dynamically inside a DOC file using Python

I'm wondering how I could replace specific text delimited by parentheses, such as (wordExample), by double quotes, such as "wordExample", or by any other marker.
Here's an example:
I would have an Example.doc file with the following text:
"Hello, this is just a dummy text written to get Ethereum's price: {cripto:ETH} dollars"
My Python script would find the DOC file locally and collect all the variables that need to be replaced. In this example:
{cripto:ETH}
Another script I have would look up the requested values; it could be asked for any crypto price, such as "cripto:BTC". Now I have ETH's price and the corresponding date.
Now it should generate another DOC file, replacing {cripto:ETH} with ETH's price and a screenshot of its graph.
That's it! I just want to know how I can build that system of reading the DOC file and replacing some of its elements dynamically.
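A minimal sketch of one way to do this, assuming the file is a .docx (python-docx cannot open legacy .doc files, so those would need converting first) and that the prices have already been fetched into a dict such as {"ETH": 3000}. The names find_symbols, fill_placeholders and fill_docx are hypothetical, and the graph screenshot is not covered.

```python
import re

# Matches placeholders such as {cripto:ETH} or {cripto:BTC}; group 1 is the symbol.
PLACEHOLDER = re.compile(r"\{cripto:(\w+)\}")


def find_symbols(text):
    """Return the crypto symbols referenced by placeholders in text."""
    return PLACEHOLDER.findall(text)


def fill_placeholders(text, prices):
    """Replace each {cripto:SYMBOL} with prices[SYMBOL]; unknown symbols stay as-is."""
    def sub(match):
        symbol = match.group(1)
        return str(prices[symbol]) if symbol in prices else match.group(0)
    return PLACEHOLDER.sub(sub, text)


def fill_docx(src_path, dst_path, prices):
    """Hypothetical helper: fill placeholders in every body paragraph of a .docx.

    Tables, headers and paragraphs containing hyperlinks are not handled here.
    """
    from docx import Document  # needs the python-docx package; .docx only
    doc = Document(src_path)
    for paragraph in doc.paragraphs:
        original = paragraph.text
        updated = fill_placeholders(original, prices)
        if updated != original and paragraph.runs:
            # A placeholder may be split across several runs, so put the whole
            # paragraph text into the first run and empty the others. This keeps
            # the first run's formatting but drops mixed formatting in the paragraph.
            paragraph.runs[0].text = updated
            for run in paragraph.runs[1:]:
                run.text = ""
    doc.save(dst_path)
```

Calling find_symbols on the document text tells the other script which prices to fetch, and fill_docx("Example.docx", "Output.docx", prices) would then write the new file. Collapsing a changed paragraph into its first run is the simplest way around placeholders that Word has split across runs, at the cost of any mixed formatting inside that paragraph.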

Related

How do "runs" work in python-docx?

I'm having difficulty understanding the "run" object. As described, a run object represents a continuous stretch of text with the same style. However, when I loop over the runs of a paragraph in which all words have the same style, "runs" still returns more than one run.
To make sure I didn't miss any possible style issues, I created a new Word document and typed the following:
Hjkhkuhu joiuiuouoiuo iouiouououoi iouiououiuiuiui hhvvhgh hgjjhjhhh hjhjhjhjhjhj hjhjhj, jjkjkjk jkjkjkjkiuio uiouiouoo! jkjkjlkjlk
Then I ran the code below:
from docx import Document
doc = Document('test.docx')
for p in doc.paragraphs:
    for run in p.runs:
        print(run.text)
And here is the result I got:
Hjkhkuhu
joiuiuouoiuo
iouiouououoi
iouiououiuiuiui
hhvvhgh
hgjjhjhhh
hjhjhjhjhjhj
hjhjhj
jjkjkjk
jkjkjkjkiuio
uiouiouoo ! jkjkjlkjlk
Why is this the case? Did I miss anything?
I've spent a few days tussling with docx runs now.
Paragraphs (<w:p>) contain one or more "style" runs (<w:r>), each containing one or more text elements (<w:t>).
But docx runs are very easy to split, and when split they hide it very well.
Two pieces of text having the same format isn't necessarily enough to put them in the same run. Runs don't automatically join, and changing the format of text in a run and then changing it back is enough to give you two separate but identically formatted runs.
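To see what that looks like, here is a hand-written <w:p> with two runs carrying identical bold formatting, parsed with Python's standard xml.etree.ElementTree rather than python-docx. The XML is a made-up minimal example, not taken from a real document, and run_texts and run_formats are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hand-written paragraph: two runs with the same bold formatting, as can happen
# after formatting some text and then changing it back.
PARAGRAPH_XML = f"""
<w:p xmlns:w="{W}">
  <w:r><w:rPr><w:b/></w:rPr><w:t>Hello </w:t></w:r>
  <w:r><w:rPr><w:b/></w:rPr><w:t>world</w:t></w:r>
</w:p>
"""


def run_texts(paragraph):
    """Return the joined <w:t> text of each <w:r> child of a <w:p> element."""
    return ["".join(t.text or "" for t in run.findall("w:t", NS))
            for run in paragraph.findall("w:r", NS)]


def run_formats(paragraph):
    """Serialize each run's <w:rPr> (b"" if absent) so formats can be compared."""
    formats = []
    for run in paragraph.findall("w:r", NS):
        rpr = run.find("w:rPr", NS)
        formats.append(ET.tostring(rpr) if rpr is not None else b"")
    return formats


paragraph = ET.fromstring(PARAGRAPH_XML.strip())
print(run_texts(paragraph))                   # ['Hello ', 'world']
print(len(set(run_formats(paragraph))) == 1)  # True: both runs are formatted alike
```

Nothing in the file format forces those two runs to be merged, which is why python-docx reports them separately.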
(Greater experts than I dig deeper into runs and text in the question "DOCX w:t (text) elements crossing multiple w:r (run) elements?")
This caused me a lot of problems with a 'tag substitution within runs' task, leading me to conclude that the only way to guarantee text is all in one run is to enter it yourself (with unchanging format) in one fell swoop.
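If re-entering the text isn't an option, a common workaround is to search the concatenated run texts and map each match back onto the runs: the replacement goes into the run where the tag starts, and the other runs keep their own text (and so their formatting). A sketch on plain strings, assuming you read [r.text for r in paragraph.runs] and write each result back to the matching run.text; replace_across_runs is a hypothetical name.

```python
def replace_across_runs(runs, tag, value):
    """Replace every occurrence of tag in the joined run texts.

    runs is a list of run texts. The replacement goes into the run where each
    tag starts; the tag's remaining characters are cut from the runs after it,
    so text outside the tag stays in its original run. Returns a new list.
    """
    runs = list(runs)
    if not tag:
        return runs
    search_from = 0
    while True:
        full = "".join(runs)
        start = full.find(tag, search_from)
        if start == -1:
            return runs
        end = start + len(tag)
        pos = 0
        placed = False
        for i, text in enumerate(runs):
            run_start, run_end = pos, pos + len(text)
            pos = run_end
            lo, hi = max(start, run_start), min(end, run_end)
            if lo >= hi:
                continue  # this run does not overlap the tag
            head, tail = text[:lo - run_start], text[hi - run_start:]
            if not placed:
                runs[i] = head + value + tail
                placed = True
            else:
                runs[i] = head + tail
        # Resume after the inserted value so a value containing the tag
        # cannot cause an endless loop.
        search_from = start + len(value)


runs = ["Price: {cri", "pto:E", "TH} dollars"]
print(replace_across_runs(runs, "{cripto:ETH}", "3000"))
# ['Price: 3000', '', ' dollars']
```

Runs emptied by the replacement are left in place as empty strings rather than deleted, which keeps the mapping between list positions and paragraph.runs simple.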

AWS Textract table extraction splits numbers that contain commas into another column

I would like to use AWS Textract in Python to extract the tables from my image and download them as CSV.
So I followed the documentation and example code from AWS here:
https://github.com/awsdocs/aws-doc-sdk-examples/blob/master/python/example_code/textract/textract_python_table_parser.py
Apparently, the code in the link above splits numbers that contain commas across two columns. I explain with an image and steps to reproduce the error below:
This is an example of my table in image form.
If you want to reproduce the error, clone the GitHub repo and run the following command in your cmd/terminal:
python textract_python_table_parser.py <your-image-filename.png>
The error is shown below:
As you can see in the ["Amount (USD)"] column, values containing commas spill over into the ["Transaction Date"] column. Reading the CSV file with pandas didn't work either.
I wonder which line of code in the GitHub repo causes the parts of a comma-containing value to go into another column.
I just found out that in the linked file, on line 114, you just need to wrap the curly brackets in double quotes:
csv += '"{}"'.format(text) + ","
This turns every cell into a quoted string, so a CSV reader won't treat the commas inside a value as column separators.
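Wrapping each cell in quotes works as long as no cell itself contains a double quote. A sketch of the same output using Python's standard csv module, which quotes cells containing commas and also escapes embedded quotes. The rows are made-up sample data, and adapting the linked script to collect cells into lists instead of concatenating a string is an assumption not shown here.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize a list of rows (lists of cell strings) to CSV text.

    csv.writer quotes any cell containing a comma, a quote or a newline,
    and doubles embedded quotes, so values like "1,250.00" stay in one column.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample rows shaped like the table in the question.
rows = [["Transaction Date", "Amount (USD)"], ["2021-01-05", "1,250.00"]]
text = rows_to_csv(rows)
print(text)
```

Reading the result back with csv.reader, or with pandas.read_csv, should then give two columns per row.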

Issue with strings and dictionaries in python

I'm using the click package to get input for one or more variables, which get loaded into a combined dictionary. The entries are then joined, and the combined string is appended to a base URL and sent with the requests package to retrieve some XML data.
Earlier I had an issue with one of the variables that lets you search through a range, such as
[value1, value2]
Python added double quotes around it, so the search didn't work correctly. To fix that, I used
.replace('"', '')
on the joined string before combining it with the base URL, and that seemed to fix the problem. The issue now is that individual input containing more than one word doesn't produce the same output as the actual online search engine. I have to use quotes when I enter the input to keep it as a single argument, but then the quotes get removed by the call above, and I believe that is what is causing the issue.
I think that if I had a way to access individual entries of this dictionary and remove the double quotes from only certain entries, that should do the job. But if I am overlooking something, please let me know.
Help is appreciated.
Code added below:
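import click
import requests

@click.command()
@click.option('--variable1')
@click.option('--variable2')
def main(variable1, variable2):  # hypothetical name; the original omitted the function
    # join the entries and send them as the query string
    query_list = [variable1, variable2]
    query = ''.join(query_list)
    base_url = "abc.com...."
    response = requests.get(base_url, query)

if __name__ == '__main__':
    main()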
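One way to act on the idea in the last paragraph is sketched below, assuming the entries live in a plain dict of strings. The keys "range" and "title" are made up; clean_entries strips quotes only from the keys you name, and urllib.parse.urlencode then percent-encodes every value so a multi-word entry stays one parameter. Whether the real search engine wants the surrounding quotes kept is an assumption to check against the site.

```python
from urllib.parse import urlencode


def clean_entries(entries, strip_keys):
    """Return a copy of entries with double quotes removed only from strip_keys."""
    return {key: value.replace('"', '') if key in strip_keys else value
            for key, value in entries.items()}


# Hypothetical entries: a range that must lose its quotes and a
# multi-word phrase that should keep them.
entries = {"range": '"[value1, value2]"', "title": '"two words"'}
cleaned = clean_entries(entries, strip_keys={"range"})
query = urlencode(cleaned)
print(query)  # range=%5Bvalue1%2C+value2%5D&title=%22two+words%22
```

Passing the dict directly as requests.get(base_url, params=cleaned) should do the same encoding, which avoids joining strings by hand.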

Using .replace() to change date within link alters other words within link

I am using some code that uses the webbrowser package to follow a link in Python and download data. Within the link, the date appears multiple times in the format DDMMMYY (i.e. 24Jul19). I use the code to make sure the date properly refers to today: when I start the code, it prompts the user to enter yesterday's date (in this case 23Jul19), scans the link for all instances of this date, and replaces them with 24Jul19, so that I now have the proper link for today's date. This seems to work fine (all instances of 23Jul19 change to 24Jul19 without issue), but for some reason it also alters a completely different piece of the link, which leads to me downloading blank data fields.
The links are all stored as strings in a text file that is read into Python. The program prompts the user to enter yesterday's date and today's date. When the program is closed, it writes back the link with the new dates, but also makes a totally different change to the link, and I am not sure what's happening here.
Date = raw_input("Enter Today's Date (DDMMMYY): ")
Date_Yest = raw_input("Enter Previous Date (DDMMMYY): ")
x = []
with open("links.txt") as f:
    for l in f:
        x.append(l.strip())
for i in x:
    if 'A' in i:
        A_file = i
    if 'B' in i:
        B_file = i
And then I am using the replace function as follows:
with open('Loan_links.txt','w') as text_file:
    text_file.write(A_file+"\n")
    text_file.write("\n")
    text_file.write(B_file+"\n")
The original link (without private details) looks something like this:
...ignorecolumns=Model+Calc%2C&param0=64192&param1=USER&param2=23Jul19&param3=23Jul19...
When the program runs, the dates in that portion of the link properly change from 23Jul19 to 24Jul19, BUT before the part that says "ignorecolumns" it adds a whole other string of words that aren't in the original link at all. I'm not sure if this has something to do with the way the code interacts with the web link itself through the browser. But I have ZERO clue why or how so many characters are being added to this link. It is all valid words, info, etc., but none of it is in the original link in the text file AT ALL.
It adds:
...allreplacementcolumnnames=Offer%3DClose+Offer%2CBid%3DClose+Bid%2CDepth%3DClose+Depth%2C...
It's obviously financial data I'm working with here, but for privacy's sake that's all I can expand on link wise.
Any idea what would cause the link to add a ton of extra text in general?
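The replace call itself isn't shown, but str.replace only swaps the exact substring it is given, so on its own it cannot invent text like allreplacementcolumnnames. One thing worth ruling out (an assumption, not a confirmed diagnosis) is the "if 'A' in i" / "if 'B' in i" loop: it keeps the last line containing that letter, so a different line in links.txt could end up in A_file or B_file. The sketch below uses hypothetical helpers and made-up links to do the date swap and fail loudly when a marker matches more than one line.

```python
def update_links(lines, old_date, new_date):
    """Replace every occurrence of old_date with new_date in each link.

    str.replace swaps only the exact substring, so it cannot insert unrelated text.
    """
    return [line.replace(old_date, new_date) for line in lines]


def pick_link(lines, marker):
    """Hypothetical helper: return the one line containing marker.

    A plain "if marker in line" loop keeps the last matching line, so a second
    line that happens to contain the same letter would silently win. This
    raises instead.
    """
    matches = [line for line in lines if marker in line]
    if len(matches) != 1:
        raise ValueError(f"expected 1 line containing {marker!r}, found {len(matches)}")
    return matches[0]


# Made-up links standing in for the contents of links.txt.
links = [
    "A ...param2=23Jul19&param3=23Jul19...",
    "B ...param2=23Jul19&param3=23Jul19...",
]
updated = update_links(links, "23Jul19", "24Jul19")
a_link = pick_link(updated, "A")
print(a_link)  # A ...param2=24Jul19&param3=24Jul19...
```

If pick_link raises on the real file, that points at the line selection rather than the date replacement.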

How do I set a variable to a regex match in PythonScript for Notepad++?

I am trying to set a variable (x) to a string in a text file using a regular expression.
The file I am searching contains several lines, and one of those lines is always a ticket number of the form WS########. It looks like this:
~File~
out.test
WS12345678
something here
pineapple joe
etc.
~Code~
def foundSomething(m):
    # print the matched ticket number to the PythonScript console
    console.write('{0}\n'.format(m.group(0), str(m.span(0))))
editor1.research(r'([W][S]\d\d\d\d\d\d\d\d)', foundSomething)
Through my research I've managed to get the above code to work; it outputs WS12345678 to the console when the corresponding text exists in the file.
How do I put WS12345678 to a variable so I can save that file with the corresponding number?
EDIT
To put it in pseudocode, I am trying to do:
x = find "WS\d\d\d\d\d\d\d\d"
file.save(x)
Solution
Thank you @Kasra AD for the solution. I was able to create a workaround.
import re #import regular expression
test = editor1.getText() #variable = all the text in an editor window
savefilename = re.search(r"WS\d{8}",test).group(0) #setting the savefile variable
console.write(savefilename) #checking the variable
To find a specific string within a file in Notepad++ using the PythonScript plugin, you can pull everything from one editor into a string and run a regex search on that.
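You need to return the result from your function, then simply assign it to a variable:
import re
def foundSomething(m):
    console.write('{0}\n'.format(m.group(0)))  # show the match in the console
    return m.group(0)  # hand the matched text back to the caller
my_var = foundSomething(re.search(r'WS\d{8}', editor1.getText()))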
And to extract the string you want, you can use the following regex:
>>> s="""out.test
... WS12345678
... something here
... pineapple joe"""
>>> import re
>>> re.search(r'WS\d{8}',s).group(0)
'WS12345678'
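If you want to keep the research callback from the question instead, note that the callback is called by research, so its return value isn't handed back to the line that called research. One way to keep the match is to append it to a list defined outside the function. editor1.research only exists inside Notepad++, so the sketch uses a hypothetical stand-in, fake_research, which assumes research calls the callback once per regex match.

```python
import re


def fake_research(pattern, callback, text):
    """Hypothetical stand-in for PythonScript's editor.research.

    Assumes research calls callback once per regex match in the document.
    """
    for match in re.finditer(pattern, text):
        callback(match)


found = []  # matches collected by the callback


def foundSomething(m):
    # Store the matched text in the outer list instead of returning it,
    # because the callback's return value is not passed back to the caller.
    found.append(m.group(0))


text = "out.test\nWS12345678\nsomething here\npineapple joe"
fake_research(r"WS\d{8}", foundSomething, text)
ticket = found[0] if found else None  # None when no ticket number is present
print(ticket)  # WS12345678
```

Inside Notepad++ you would call editor1.research(r"WS\d{8}", foundSomething) in place of fake_research and then use ticket as the save name.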
