python mysql update failing when the string is huge - python

I have a table where a text column needs to be updated. The column report_2_comments is the text column. When I update the column with a small string such as "This is test message", I don't have any issues, but when I update it using the message given below, I get this error:
e == not enough arguments for format string
context = DbContext()
try:
    qry = """Update Report_Table
    set {2} = '{3}'
    where valid_flag = 'Y' and report_status = 'C'
    and report_name = '{0}'
    and date(report_run_date) = '{1}';
    """.format('Daily Errors Report'
        , '2016-07-09', 'report_2_comments',
        '8-3-2016 01:00 EST Affected DC region 1,000 errors over 2.5 hours (2%) 8-3-2016 13:00 EST Affected Virginia 500 errors over 11 hours (2%) 1233 8-3-2016 13:00 EST Affected 212/1412121001 - Date/skljld (sdlkjd)NOT_FOUND) 90,800 errors over 11 hours (2%) sldkdsdsd Fiber cut 8-3-2016 17:00 EST Affected 16703 - sdsdsd, WV (Tune Error) 15,400 errors over 7.5 hours (0.6%) sdkjd dskdjhsd sdkjhd')
    print 'update qry == ', qry
    output = context.execute(qry,())
except Exception as e:
    print 'e == ', e
Printed qry:
Update vbo.Report_Table
set report_2_comments = '8-3-2016 01:00 EST Affected DC region 1,000 errors over 2.5 hours (2%) 8-3-2016 13:00 EST Affected Virginia 500 errors over 11 hours (2%) 1233 8-3-2016 13:00 EST Affected 212/1412121001 - Date/skljld (sdlkjd)NOT_FOUND) 90,800 errors over 11 hours (2%) sldkdsdsd Fiber cut 8-3-2016 17:00 EST Affected 16703 - sdsdsd, WV (Tune Error) 15,400 errors over 7.5 hours (0.6%) sdkjd dskdjhsd sdkjhd'
where valid_flag = 'Y' and report_status = 'C'
and report_name = 'Daily Errors Report'
and date(report_run_date) = '2016-07-09';
Table definition.
CREATE TABLE Report_Table (
id bigint(19) NOT NULL auto_increment,
report_name varchar(200),
report_run_date datetime,
report_status char(25),
valid_flag char(1),
report_1_comments text(65535),
report_2_comments text(65535),
report_3_comments text(65535),
PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
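A likely cause, assuming DbContext.execute hands qry and its second argument to a DB-API cursor such as MySQLdb's: when args is not None, such a cursor builds the final statement with qry % args, so every literal % in the text (here the "(2%)" fragments) is read as a placeholder. The short test string contains no %, which would explain why it works. The sketch below reproduces the message in plain Python and builds a parameterized statement instead. format_error, build_update and ALLOWED_COLUMNS are hypothetical names, not part of DbContext, and the code is written for Python 3.

```python
ALLOWED_COLUMNS = {"report_1_comments", "report_2_comments", "report_3_comments"}


def format_error(query, args=()):
    """Return the error raised by old-style % formatting, or None if it succeeds."""
    try:
        query % args
    except (TypeError, ValueError) as exc:
        return str(exc)
    return None


def build_update(column, comment, report_name, run_date):
    """Return (sql, params) with %s placeholders so the driver escapes the values."""
    if column not in ALLOWED_COLUMNS:  # column names cannot be bound, so whitelist them
        raise ValueError("unexpected column: " + column)
    sql = (
        "UPDATE Report_Table SET {0} = %s "
        "WHERE valid_flag = 'Y' AND report_status = 'C' "
        "AND report_name = %s AND date(report_run_date) = %s"
    ).format(column)
    return sql, (comment, report_name, run_date)


comment = "8-3-2016 01:00 EST Affected DC region 1,000 errors over 2.5 hours (2%)"

# A literal % inside the statement trips the % formatting step.
print(format_error("set report_2_comments = '" + comment + "'"))
# A statement without % passes through unchanged.
print(format_error("set report_2_comments = 'This is test message'"))

sql, params = build_update("report_2_comments", comment, "Daily Errors Report", "2016-07-09")
print(sql)
```

With a DB-API cursor the pair would be passed as cursor.execute(sql, params); whether context.execute accepts the same pair is an assumption to check against DbContext.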

Related

How can I get specific time periods ohlcv in CCXT bitget?

Even when I pass the since argument to ccxt's bitget exchange, I always get only the latest data. The same code worked on ftx. What is the problem?
bitget = ccxt.bitget({'apiKey' : self.KEY,
'secret' : self.SECRET_KEY,
'enableRateLimit' : True,
'options' : {'defaultType':'swap'}
})
yyyymmdd = '20220301'
since = int(datetime(int(yyyymmdd[:4]),int(yyyymmdd[4:6]),int(yyyymmdd[6:])).timestamp()*1000)
ohlcv = bitget.fetch_ohlcv('BTC/USDT', '1m', since, limit = 1000)
ohlcv = pd.DataFrame(ohlcv)
ohlcv.columns = ['time','open','high','low','close','volume']
ohlcv['time'] = ohlcv['time'].apply(lambda x : datetime.fromtimestamp(x/1000).strftime('%Y%m%d %H:%M'))
time open high low close volume
0 20220322 14:36 42957.24 42959.97 42927.88 42927.88 1.8439
1 20220322 14:37 42927.88 42957.04 42927.88 42951.36 1.2933
2 20220322 14:38 42951.36 42951.36 42928.46 42932.59 0.6664
3 20220322 14:39 42932.59 42938.0 42916.22 42916.22 2.0336
4 20220322 14:40 42916.22 42916.22 42891.29 42897.49 2.0132
5 20220322 14:41 42897.49 42900.14 42880.96 42884.51 1.6279
6 20220322 14:42 42884.51 42893.26 42870.46 42870.46 2.3478
.
.
.
How can I get specific time period information ?
Maybe pass it as a keyword argument: since=since.
Also make sure it is a Unix timestamp in milliseconds:
since = int(datetime.datetime.strptime("2021-05-18 11:20:00+00:00", "%Y-%m-%d %H:%M:%S%z").timestamp() * 1000)
EDIT: it looks like your Unix timestamp was already right, my bad.
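One thing worth ruling out, independent of bitget: datetime(...).timestamp() on a naive datetime is interpreted in the machine's local time zone, so the resulting since can be shifted by the UTC offset. A minimal sketch of a UTC-anchored conversion follows; to_utc_ms is a hypothetical helper name, and this does not by itself explain why the exchange returns only the most recent candles.

```python
from datetime import datetime, timezone


def to_utc_ms(yyyymmdd):
    """Convert a 'YYYYMMDD' string to a Unix timestamp in milliseconds at 00:00 UTC."""
    day = datetime.strptime(yyyymmdd, "%Y%m%d").replace(tzinfo=timezone.utc)
    return int(day.timestamp() * 1000)


since = to_utc_ms("20220301")
print(since)
```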

Issue with SQLite query between 2 dates

I'm working on a Python project that uses an SQLite3 database.
I created a database with only one table, called "Message", with this kind of data:
connexion = sqlite3.connect(BDD)
c = connexion.cursor()
c.execute(f""" CREATE TABLE IF NOT EXISTS {arg}(
    Id INTEGER PRIMARY KEY AUTOINCREMENT,
    Jour_Heure_Reception text,
    Jour_Heure_Reponse text,
    Theme text,
    Motif text,
    Risque_incident_client text,
    Transfert_sans_action text,
    Matricule text,
    Origine text)""")
connexion.commit()
connexion.close()
My goal is to search between 2 dates in the Jour_Heure_Reponse column, and return the number of entries by matricule.
To do that, I use this SQLite query:
def nbr(arg):
    "query on the mail statistics database returning the number of messages per user over the period arg"
    # Build the list of users who entered records in the database over the period
    connexion = sqlite3.connect(BDD)
    c = connexion.cursor()
    date_selection = str((datetime.now() - timedelta(arg)).strftime('%d/%m/%Y'))
    yesterday = str(datetime.now().strftime('%d/%m/%Y'))
    c.execute(f"""
        select Matricule from Message Where
        Jour_Heure_Reception >= "{date_selection}"
        and Jour_Heure_Reponse < "{yesterday}" """ )
    agents = c.fetchall()
    liste_agents = []
    for i in agents:
        if not i[0] in liste_agents:
            liste_agents.append(i[0])
    c.close()
    # Count the entries for each matricule in the list built above
    connexion = sqlite3.connect(BDD)
    c = connexion.cursor()
    liste_affichage = []
    for i in liste_agents:
        c.execute(f"""SELECT * FROM Message where
            Matricule = "{i}" and
            Jour_Heure_Reception >= "{date_selection}" and
            Jour_Heure_Reponse < "{yesterday}" """)
        test = c.fetchall()
        print(i)
        # use a separate name for the rows so the matricule in i is not overwritten
        for row in test:
            print(row[1])
        data_list = [str(i),str(len(test))]
        liste_affichage.append(data_list)
    c.close()
The problem is this:
When I call my nbr function, it returns nothing if the argument is not 1, and even with an argument of 1 the result is not logical.
For example, calling nbr(1) returns this (I only print the dates):
02/06/2020
02/06/2020
02/06/2020
02/06/2020
02/06/2020
02/06/2020
02/06/2020
03/06/2020
03/06/2020
03/06/2020
03/06/2020
02/07/2020
02/07/2020
02/07/2020
03/07/2020
03/08/2020
02/09/2020
03/09/2020
03/09/2020
09/08/2020
09/08/2020
09/08/2020
09/08/2020
09/08/2020
01/10/2020
02/10/2020
02/10/2020
As you can see, the timedelta is not respected.
As the datatype is stored as Text, I send dates as str after a time.strftime() conversion.
Where am I going wrong?
SQLite does not have a dedicated date type. It supports storing dates in a text field in ISO-8601 format. Text values are compared character by character, so dd/mm/YYYY strings are ordered by day first rather than chronologically; you have to convert the dates to ISO-8601 before you can compare them. For the format details and documentation, see:
https://www.sqlite.org/lang_datefunc.html
You should use %Y-%m-%d as the format string.
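To illustrate the point, here is a minimal sketch using an in-memory database. The table and column names are taken from the question (reduced to the two columns needed); to_iso, count_by_matricule and the sample rows are made up for the example. It also passes the dates as ? parameters and lets SQL do the per-matricule count with GROUP BY, which is a swapped-in simplification of the two-step loop in the question.

```python
import sqlite3
from datetime import datetime


def to_iso(day_first):
    """Convert 'dd/mm/YYYY' to ISO-8601 'YYYY-MM-DD', which sorts chronologically as text."""
    return datetime.strptime(day_first, "%d/%m/%Y").strftime("%Y-%m-%d")


def count_by_matricule(conn, start, end):
    """Return (Matricule, count) pairs for start <= Jour_Heure_Reponse < end."""
    cur = conn.execute(
        "SELECT Matricule, COUNT(*) FROM Message "
        "WHERE Jour_Heure_Reponse >= ? AND Jour_Heure_Reponse < ? "
        "GROUP BY Matricule ORDER BY Matricule",
        (start, end),
    )
    return cur.fetchall()


# Day-first strings compare in the wrong order:
print("02/07/2020" < "03/06/2020")  # True, although 2 July is later than 3 June

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Message ("
    "Id INTEGER PRIMARY KEY AUTOINCREMENT, Jour_Heure_Reponse TEXT, Matricule TEXT)"
)
rows = [("02/06/2020", "A1"), ("03/06/2020", "A1"), ("02/07/2020", "B2"), ("01/10/2020", "B2")]
conn.executemany(
    "INSERT INTO Message (Jour_Heure_Reponse, Matricule) VALUES (?, ?)",
    [(to_iso(day), matricule) for day, matricule in rows],
)

counts = count_by_matricule(conn, "2020-06-01", "2020-07-01")
print(counts)
```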

How to read a particular line of interest from a text file?

Here I have a text file. I want to read the Adress, Beneficiary, Beneficiary Bank, Acc Nbr, Total US$, the date at the top, RUT and BOX. I tried writing some code myself, but I cannot reliably get the required information, and if the length of any field changes I will not get the correct output. How should I do this so that I get each required field in its own string?
The main problem arises when my slices go wrong. For example, I am using line[31:] for Acc Nbr, but if the address changes then my slicing will also go wrong.
My Text.txt
2014-11-09 BOX 1531 20140908123456 RUT 21 654321 0123
Girry S.A. CONTADO
G 5 Y Serie A
NO 098765
11 al Rayo 321 - Oqwerty 108 Monteaudio - Gruguay
Pharm Cosco, Inc - Britania PO Box 43215
Dirección Hot Springs AR 71903 - Estados Unidos
Oescripción Importe
US$
DO 7640183 - 50% of the Production Degree 246,123
Beneficiary Bank: Bankue Heritage (Gruguay) S.A Account Nbr: 1234563 Swift: MANIUYMM
Adress: Tencon 108 Monteaudio, Gruguay.
Beneficiary: Girry SA Acc Nbr: 1234567
Servicios prestados en el exterior, exentos de IVA o IRAE
Subtotal US$ 102,500
Iva US$ ---------------
Total US$ 102,500
I.V.A AL DIA Fecha de Vencimiento
IMPRENTA IRIS LTDA. - RUT 210161234015 - 0/40987 17/11/2015
CONSTANCIA N9 1234559842 -04/2013
CONTADO A 000.001/ A 000.050 x 2 VIAS
QWERTYAS ZXCVBIZADA
R. U.T. Bamprador Asdfumldor Final
Fecha 12/12/2014
1º ORIGINAL CLLLTE (Blanco) 2º CASIA AQWERVO (Rosasd)
My Code:
txt = 'Text.txt'
lines = [line.rstrip('\n') for line in open(txt)]
for line in lines:
    if 'BOX' in line:
        Date = line.split("BOX")[0]
        BOX = line.split('BOX ', 1)[-1].split("RUT")[0]
        RUT = line.split('RUT ',1)[-1]
        print 'Date : ' + Date
        print 'BOX : ' + BOX
        print 'RUT : ' + RUT
    if 'Adress' in line:
        Adress = line[8:]
        print 'Adress : ' + Adress
    if 'NO ' in line:
        Invoice_No = line.split('NO ',1)[-1]
        print 'Invoice_No : ' + Invoice_No
    if 'Swift:' in line:
        Swift = line.split('Swift: ',1)[-1]
        print 'Swift : ' + Swift
    if 'Fecha' in line and '/' in line:
        Invoice_Date = line.split('Fecha ',1)[-1]
        print 'Invoice_Date : ' + Invoice_Date
    if 'Beneficiary Bank' in line:
        Beneficiary_Bank = line[18:]
        Ben_Acc_Nbr = line.split('Nbr: ', 1)[-1]
        print 'Beneficiary_Bank : ' + Beneficiary_Bank.split("Acc")[0]
        print 'Ben_Acc_Nbr : ' + Ben_Acc_Nbr.split("Swift")[0]
    if 'Beneficiary' in line and 'Beneficiary Bank' not in line:
        Beneficiary = line[13:]
        print 'Beneficiary : ' + Beneficiary.split("Acc")[0]
    if 'Acc Nbr' in line:
        Acc_Nbr = line.split('Nbr: ', 1)[-1]
        print 'Acc_Nbr : ' + Acc_Nbr
    if 'Total US$' in line:
        Total_US = line.split('US$ ', 1)[-1]
        print 'Total_US : ' + Total_US
Output:
Date : 2014-11-09
BOX : 1531 20140908123456
RUT : 21 654321 0123
Invoice_No : 098765
Swift : MANIUYMM
Beneficiary_Bank : Bankue Heritage (Gruguay) S.A
Ben_Acc_Nbr : 1234563
Adress : Tencon 108 Monteaudio, Gruguay.
Beneficiary : Girry SA
Acc_Nbr : 1234567
Total_US : 102,500
Invoice_Date : 12/12/2014
Some Code Changes
I have made some changes but still I am not convinced as I need to provide spaces also in split.
I would recommend using regular expressions to extract the information you need. This avoids having to count character offsets.
import re
with open(r'C:\Quad.txt') as f:
    for line in f:
        # \d+ captures the digits after the label; a lazy (.*?) at the end of a pattern matches an empty string
        match = re.search(r"Acc Nbr: (\d+)", line)
        if match is not None:
            Acc_Nbr = match.group(1)
            print Acc_Nbr
            # etc...
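You can also search for the label to obtain its index. For example:
label = 'Acc Nbr: '
if label in line:
    Acc_Nbr = line[line.find(label) + len(label):]
    print Acc_Nbr
Note that find gives you the index of the first character of the string you searched for, so the label's length is added to skip past it.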
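Building on the regular-expression suggestion, a minimal sketch of how several labelled fields could be pulled out without any fixed offsets. FIELD_PATTERNS and extract_fields are hypothetical names, the patterns are guesses based only on the sample file in the question, and the code is written for Python 3.

```python
import re

# One pattern per field; each captures the text that follows its label.
FIELD_PATTERNS = {
    "Beneficiary_Bank": re.compile(r"Beneficiary Bank:\s*(.+?)\s+Account Nbr:"),
    "Ben_Acc_Nbr": re.compile(r"Account Nbr:\s*(\d+)"),
    "Swift": re.compile(r"Swift:\s*(\S+)"),
    "Adress": re.compile(r"Adress:\s*(.+)"),
    "Beneficiary": re.compile(r"Beneficiary:\s*(.+?)\s+Acc Nbr:"),
    "Acc_Nbr": re.compile(r"Acc Nbr:\s*(\d+)"),
    "Total_US": re.compile(r"Total US\$\s*([\d,]+)"),
}


def extract_fields(lines):
    """Return a dict holding the first match found for each known field."""
    found = {}
    for line in lines:
        for name, pattern in FIELD_PATTERNS.items():
            match = pattern.search(line)
            if match and name not in found:
                found[name] = match.group(1)
    return found


sample = [
    "Beneficiary Bank: Bankue Heritage (Gruguay) S.A Account Nbr: 1234563 Swift: MANIUYMM",
    "Adress: Tencon 108 Monteaudio, Gruguay.",
    "Beneficiary: Girry SA Acc Nbr: 1234567",
    "Subtotal US$ 102,500",
    "Total US$ 102,500",
]
fields = extract_fields(sample)
for name, value in fields.items():
    print(name, ":", value)
```

Because each pattern anchors on its label rather than on a column position, a longer address or bank name does not shift the other fields.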

Organize by Twitter unique identifier using python

I have a CSV file with each line containing information pertaining to a particular tweet (i.e. each line contains Lat, Long, User_ID, tweet and so on). I need to read the file and organize the tweets by the User_ID. I am trying to end up with a given User_ID attached to all of the tweets with that specific ID.
Here is what I want:
user_id: 'lat', 'long', 'tweet'
: 'lat', 'long', 'tweet'
user_id2: 'lat', 'long', 'tweet'
: 'lat', 'long', 'tweet'
: 'lat', 'long', 'tweet'
and so on...
This is a snip of my code that reads in the CSV file and creates a list:
import csv
UID = []
myID = []
ID = []
f = None
with open(csv_in,'rU') as f:
    myreader = csv.reader(f, delimiter=',')
    for row in myreader:
        # Assign columns in csv to variables.
        latitude = row[0]
        longitude = row[1]
        user_id = row[2]
        user_name = row[3]
        date = row[4]
        time = row[5]
        tweet = row[6]
        flag = row[7]
        compound = row[8]
        Vote = row[9]
        # Read variables into separate lists.
        UID.append(user_id + ', ' + latitude + ', ' + longitude + ', ' + user_name + ', ' + date + ', ' + time + ', ' + tweet + ', ' + flag + ', ' + compound)
myID = ', '.join(UID)
ID = myID.split(', ')
I'd suggest you use pandas for this. It will allow you not only to list your tweets by user_id, as in your question, but also to do many other manipulations quite easily.
As an example, take a look at this Python notebook from NLTK. At the end of it, you will see an operation very close to yours, reading a CSV file containing tweets:
In [25]:
import pandas as pd
tweets = pd.read_csv('tweets.20150430-223406.tweet.csv', index_col=2, header=0, encoding="utf8")
You can also find a simple operation: looking for the tweets of a certain user,
In [26]:
tweets.loc[tweets['user.id'] == 557422508]['text']
Out[26]:
id
593891099548094465 VIDEO: Sturgeon on post-election deals http://...
593891101766918144 SNP leader faces audience questions http://t.c...
Name: text, dtype: object
For listing the tweets by user_id, you would simply do something like the following (this is not in the original notebook),
In [9]:
tweets.set_index('user.id')[0:4]
Out[9]:
created_at favorite_count in_reply_to_status_id in_reply_to_user_id retweet_count retweeted text truncated
user.id
107794703 Thu Apr 30 21:34:06 +0000 2015 0 NaN NaN 0 False RT #KirkKus: Indirect cost of the UK being in ... False
557422508 Thu Apr 30 21:34:06 +0000 2015 0 NaN NaN 0 False VIDEO: Sturgeon on post-election deals http://... False
3006692193 Thu Apr 30 21:34:06 +0000 2015 0 NaN NaN 0 False RT #LabourEoin: The economy was growing 3 time... False
455154030 Thu Apr 30 21:34:06 +0000 2015 0 NaN NaN 0 False RT #GregLauder: the UKIP east lothian candidat... False
Hope it helps.
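If pulling in pandas is not an option, the grouping asked for in the question can also be sketched with the standard library alone. This swaps pandas for csv plus collections.defaultdict; the column positions (0 = latitude, 1 = longitude, 2 = user_id, 6 = tweet) follow the question's code, while group_by_user and the sample rows are made up for illustration. Written for Python 3.

```python
import csv
import io
from collections import defaultdict


def group_by_user(csv_file):
    """Map each user_id to a list of (latitude, longitude, tweet) tuples."""
    tweets_by_user = defaultdict(list)
    for row in csv.reader(csv_file, delimiter=","):
        latitude, longitude, user_id, tweet = row[0], row[1], row[2], row[6]
        tweets_by_user[user_id].append((latitude, longitude, tweet))
    return dict(tweets_by_user)


# Stand-in for the real file; columns match the question's layout.
sample = io.StringIO(
    "10.0,20.0,u1,alice,2015-04-30,21:34,hello,0,0.1,1\n"
    "30.0,40.0,u2,bob,2015-04-30,21:35,hi there,0,0.2,1\n"
    "11.0,21.0,u1,alice,2015-04-30,21:36,again,0,0.3,0\n"
)
grouped = group_by_user(sample)
for user_id, tweets in grouped.items():
    print(user_id, tweets)
```

With a real file, the io.StringIO object would be replaced by an open file handle, for example open(csv_in, newline='').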

Parsing unstructured text in Python

I wanted to parse a text file that contains unstructured text. I need to get the address, date of birth, name, sex, and ID.
. 55 MORILLO ZONE VIII,
BARANGAY ZONE VIII
(POB.), LUISIANA, LAGROS
F
01/16/1952
ALOMO, TERESITA CABALLES
3412-00000-A1652TCA2
12
. 22 FABRICANTE ST. ZONE
VIII LUISIANA LAGROS,
BARANGAY ZONE VIII
(POB.), LUISIANA, LAGROS
M
10/14/1967
AMURAO, CALIXTO MANALO13
In the example above, the first 3 lines are the address, the line with just an "F" is the sex, the DOB is the line after "F", the name comes after the DOB, the ID after the name, and the number 12 under the ID is the index/record no.
However, the format is not consistent. In the second group, the address is 4 lines instead of 3 and the index/record no. is appended after the name (if the person doesn't have an ID field).
I wanted to rewrite the text into the following format:
name, ID, address, sex, DOB
Here is a first stab at a pyparsing solution (easy-to-copy code at the pyparsing pastebin). Walk through the separate parts, according to the interleaved comments.
data = """\
. 55 MORILLO ZONE VIII,
BARANGAY ZONE VIII
(POB.), LUISIANA, LAGROS
F
01/16/1952
ALOMO, TERESITA CABALLES
3412-00000-A1652TCA2
12
. 22 FABRICANTE ST. ZONE
VIII LUISIANA LAGROS,
BARANGAY ZONE VIII
(POB.), LUISIANA, LAGROS
M
10/14/1967
AMURAO, CALIXTO MANALO13
"""
from pyparsing import LineEnd, oneOf, Word, nums, Combine, restOfLine, \
alphanums, Suppress, empty, originalTextFor, OneOrMore, alphas, \
Group, ZeroOrMore
NL = LineEnd().suppress()
gender = oneOf("M F")
integer = Word(nums)
date = Combine(integer + '/' + integer + '/' + integer)
# define the simple line definitions
gender_line = gender("sex") + NL
dob_line = date("DOB") + NL
name_line = restOfLine("name") + NL
id_line = Word(alphanums+"-")("ID") + NL
recnum_line = integer("recnum") + NL
# define forms of address lines
first_addr_line = Suppress('.') + empty + restOfLine + NL
# a subsequent address line is any line that is not a gender definition
subsq_addr_line = ~(gender_line) + restOfLine + NL
# a line with a name and a recnum combined, if there is no ID
name_recnum_line = originalTextFor(OneOrMore(Word(alphas+',')))("name") + \
integer("recnum") + NL
# defining the form of an overall record, either with or without an ID
record = Group((first_addr_line + ZeroOrMore(subsq_addr_line))("address") +
gender_line +
dob_line +
((name_line +
id_line +
recnum_line) |
name_recnum_line))
# parse data
records = OneOrMore(record).parseString(data)
# output the desired results (note that address is actually a list of lines)
for rec in records:
    if rec.ID:
        print "%(name)s, %(ID)s, %(address)s, %(sex)s, %(DOB)s" % rec
    else:
        print "%(name)s, , %(address)s, %(sex)s, %(DOB)s" % rec
print
# how to access the individual fields of the parsed record
for rec in records:
    print rec.dump()
    print rec.name, 'is', rec.sex
    print
Prints:
ALOMO, TERESITA CABALLES, 3412-00000-A1652TCA2, ['55 MORILLO ZONE VIII,', 'BARANGAY ZONE VIII', '(POB.), LUISIANA, LAGROS'], F, 01/16/1952
AMURAO, CALIXTO MANALO, , ['22 FABRICANTE ST. ZONE', 'VIII LUISIANA LAGROS,', 'BARANGAY ZONE VIII', '(POB.), LUISIANA, LAGROS'], M, 10/14/1967
['55 MORILLO ZONE VIII,', 'BARANGAY ZONE VIII', '(POB.), LUISIANA, LAGROS', 'F', '01/16/1952', 'ALOMO, TERESITA CABALLES', '3412-00000-A1652TCA2', '12']
- DOB: 01/16/1952
- ID: 3412-00000-A1652TCA2
- address: ['55 MORILLO ZONE VIII,', 'BARANGAY ZONE VIII', '(POB.), LUISIANA, LAGROS']
- name: ALOMO, TERESITA CABALLES
- recnum: 12
- sex: F
ALOMO, TERESITA CABALLES is F
['22 FABRICANTE ST. ZONE', 'VIII LUISIANA LAGROS,', 'BARANGAY ZONE VIII', '(POB.), LUISIANA, LAGROS', 'M', '10/14/1967', 'AMURAO, CALIXTO MANALO', '13']
- DOB: 10/14/1967
- address: ['22 FABRICANTE ST. ZONE', 'VIII LUISIANA LAGROS,', 'BARANGAY ZONE VIII', '(POB.), LUISIANA, LAGROS']
- name: AMURAO, CALIXTO MANALO
- recnum: 13
- sex: M
AMURAO, CALIXTO MANALO is M
You have to exploit whatever regularity and structure the text does have.
I suggest you read one line at a time and match it against regular expressions to determine its type, then fill in the appropriate field in a person object. Write out that object and start a new one whenever you get a field that you have already filled in.
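A minimal sketch of that line-by-line approach, under a few assumptions drawn only from the sample in the question: a record starts at a line beginning with a dot (used here instead of detecting an already-filled field), the ID looks like digits-digits-alphanumerics, and a record number glued to the name is a run of trailing digits. parse_records and the pattern names are hypothetical; Python 3.

```python
import re

SEX_RE = re.compile(r"^[MF]$")
DOB_RE = re.compile(r"^\d{2}/\d{2}/\d{4}$")
ID_RE = re.compile(r"^\d{4}-\d{5}-\w+$")       # assumed ID shape
NAME_RECNUM_RE = re.compile(r"^(.*?\D)(\d+)$")  # name with a record number glued on


def parse_records(text):
    """Classify each line by type and fill the current record's fields in order."""
    records, rec = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("."):
            # a leading dot opens a new record; the rest of the line is address text
            rec = {"address": [line.lstrip(". ")], "sex": None, "DOB": None,
                   "name": None, "ID": None}
            records.append(rec)
        elif rec is None:
            continue  # skip anything before the first record
        elif rec["sex"] is None:
            if SEX_RE.match(line):
                rec["sex"] = line
            else:
                rec["address"].append(line)
        elif rec["DOB"] is None and DOB_RE.match(line):
            rec["DOB"] = line
        elif rec["name"] is None:
            glued = NAME_RECNUM_RE.match(line)
            rec["name"] = glued.group(1).strip() if glued else line
        elif rec["ID"] is None and ID_RE.match(line):
            rec["ID"] = line
        # any remaining line (the record number) is ignored
    return records


data = """\
. 55 MORILLO ZONE VIII,
BARANGAY ZONE VIII
(POB.), LUISIANA, LAGROS
F
01/16/1952
ALOMO, TERESITA CABALLES
3412-00000-A1652TCA2
12
. 22 FABRICANTE ST. ZONE
VIII LUISIANA LAGROS,
BARANGAY ZONE VIII
(POB.), LUISIANA, LAGROS
M
10/14/1967
AMURAO, CALIXTO MANALO13
"""
records = parse_records(data)
for rec in records:
    print(rec["name"], rec["ID"], " ".join(rec["address"]), rec["sex"], rec["DOB"], sep=" | ")
```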
It may be overkill, but the leading-edge machine learning algorithms for this type of problem are based on conditional random fields. For example, see "Accurate Information Extraction from Research Papers using Conditional Random Fields".
There is software out there that makes training these models relatively easy. See Mallet or CRF++.
You can probably do this with regular expressions without too much difficulty. If you have never used them before, check out the python documentation, then fire up redemo.py (on my computer, it's in c:\python26\Tools\scripts).
The first task is to split the flat file into a list of entities (one chunk of text per record). From the snippet of text you gave, you could split the file with a pattern matching the beginning of a line, where the first character is a dot:
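import re
# re.MULTILINE makes ^ match at the start of every line, not only at the start of the file
re_entity_splitter = re.compile(r'^\.', re.MULTILINE)
entities = re_entity_splitter.split(open(textfile).read())
Note that the dot must be escaped (it's a wildcard character by default), and that re.MULTILINE is needed so that ^ matches at the start of each line rather than only at the very start of the text. Note also the r before the pattern. The r denotes 'raw string' format, which excuses you from having to escape the escape characters and so avoids the so-called 'backslash plague.'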
Once you have the file split into individual people, picking out the gender and birthdate is a snap. Use these:
re_gender = re.compile(r'^[MF]$', re.MULTILINE)  # a line holding only M or F
re_birth_Date = re.compile(r'\d\d/\d\d/\d{4}')  # mm/dd/yyyy, as in the sample
And away you go. You can paste the flat file into re demo GUI and experiment with creating patterns to match what you need. You'll have it parsed in no time. Once you get good at this, you can use symbolic group names (see docs) to pick out individual elements quickly and cleanly.
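As a rough illustration of the split-then-match idea with symbolic group names, here is a sketch run against the sample from the question. ENTITY_SPLIT and FIELDS are hypothetical names, and the combined pattern assumes the sex line is always immediately followed by the date of birth, as in the sample. Python 3.

```python
import re

data = """\
. 55 MORILLO ZONE VIII,
BARANGAY ZONE VIII
(POB.), LUISIANA, LAGROS
F
01/16/1952
ALOMO, TERESITA CABALLES
3412-00000-A1652TCA2
12
. 22 FABRICANTE ST. ZONE
VIII LUISIANA LAGROS,
BARANGAY ZONE VIII
(POB.), LUISIANA, LAGROS
M
10/14/1967
AMURAO, CALIXTO MANALO13
"""

# Split at every line that begins with a dot (MULTILINE makes ^ match per line).
ENTITY_SPLIT = re.compile(r"^\.\s*", re.MULTILINE)
# Named groups pick out the sex line and the date-of-birth line that follows it.
FIELDS = re.compile(r"^(?P<sex>[MF])\n(?P<dob>\d{2}/\d{2}/\d{4})$", re.MULTILINE)

entities = [chunk for chunk in ENTITY_SPLIT.split(data) if chunk.strip()]
parsed = [FIELDS.search(chunk).groupdict() for chunk in entities]
for fields in parsed:
    print(fields["sex"], fields["dob"])
```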
Here's a quick hack job.
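import re
def process(file):
    "Read one record from file; return False once the end of the file is reached."
    address = ""
    while True:
        line = file.readline()
        if line == '': return False # readline returns '' only at end of file
        line = line.rstrip() # to ignore \n
        if line in ('M','F'):
            sex = line
            break
        else:
            address += line
    DOB = file.readline().rstrip() # to ignore \n
    name = file.readline().rstrip()
    if name[-1].isdigit():
        # no ID line: the record number is glued to the end of the name
        name = re.match(r'^([^\d]+)\d+', name).group(1)
        ID = None
    else:
        ID = file.readline().rstrip()
        file.readline() # ignore the record #
    print (name, ID, address, sex, DOB)
    return True
with open('data.txt') as f:
    # stop at end of file instead of looping forever
    while process(f):
        pass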
