How do I properly send tuples of information to a database - Python

Hey, so I have this CSV file, which is structured this way: ['message', 'Date', 'Name', 'Location of a train station']
['vies', 'Mon 07 Nov 2022 18:43', 'Mia', 'Haarlem']
['vies', 'Mon 07 Nov 2022 18:43', 'Mia', 'Amsterdam']
['vies', 'Mon 07 Nov 2022 18:43', 'Mia', 'Sittard']
['vies', 'Mon 07 Nov 2022 18:43', 'Mia', 'Venlo']
['vies', 'Mon 07 Nov 2022 18:43', 'Mia', 'Helmond']
['Het zou wel wat schoner mogen zijn', 'Tue 08 Nov 2022 00:49', 'Tijmen', 'Hilversum']
['Het zou wel wat schoner mogen zijn', 'Tue 08 Nov 2022 00:49', 'anoniem', 'Roosendaal']
Now I want to insert this information into my PostgreSQL database:
import csv
import psycopg2

with open('C:\\Users\\Danis Porovic\\PycharmProjects\\Module1\\berichten.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file)
    index = list(zip(*csv_reader))

messages = index[0]
data = index[1]
names = index[2]
stations = index[3]

con = psycopg2.connect(
    host="localhost",
    database="fabriek",
    user="postgres",
    password="DanisMia1")

cur = con.cursor()
cur.execute("insert into klant (naam) values (%s);", (names,))
con.commit()
con.close()
How would I go about successfully inserting all the names into a column in my database?
The zip call at the top makes a tuple out of the strings in each column. Would inserting tuples even work?
This is what the tuple of names looks like, for example:
('Mia', 'Danis', 'Jeffrey', 'Tim', 'Joppe', 'Tijmen', 'anoniem')
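One hedged sketch of an answer: cur.execute binds the parameters for a single row, so passing the whole tuple of names as one parameter won't insert one row per name; psycopg2's executemany runs the statement once per parameter tuple. The connection and table details below are taken from the question, and the database calls are commented out so the snippet runs standalone:

```python
# Sketch: executemany expects a sequence of parameter tuples, one per row.
names = ('Mia', 'Danis', 'Jeffrey', 'Tim', 'Joppe', 'Tijmen', 'anoniem')

# Wrap each name in a one-element tuple to match the single %s placeholder:
params = [(n,) for n in names]

# import psycopg2
# con = psycopg2.connect(host="localhost", database="fabriek",
#                        user="postgres", password="DanisMia1")
# cur = con.cursor()
# cur.executemany("insert into klant (naam) values (%s);", params)
# con.commit()
# con.close()

print(params[:2])
```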


How to save the contents of a for loop in a variable

Hey, so I have a program here that reads a CSV file which I created earlier:
import csv
import psycopg2

with open('C:\\Users\\Danis Porovic\\PycharmProjects\\Module1\\berichten.csv', 'r') as csv_file:
    csv_reader = csv.reader(csv_file)
    for line in csv_reader:
        print(line)
The for loop gives this as a result:
['vies', 'Mon 07 Nov 2022 18:43', 'Mia', 'Haarlem']
['vies', 'Mon 07 Nov 2022 18:43', 'Mia', 'Amsterdam']
['vies', 'Mon 07 Nov 2022 18:43', 'Mia', 'Sittard']
['vies', 'Mon 07 Nov 2022 18:43', 'Mia', 'Venlo']
['vies', 'Mon 07 Nov 2022 18:43', 'Mia', 'Helmond']
['Het zou wel wat schoner mogen zijn', 'Tue 08 Nov 2022 00:49', 'Tijmen', 'Hilversum']
['Het zou wel wat schoner mogen zijn', 'Tue 08 Nov 2022 00:49', 'anoniem', 'Roosendaal']
Now, how could I save all this information from the for loop in a single variable?
Something like this should do the trick:
result = []
for line in csv_reader:
    result.append(line)
Or using list comprehension:
result = [line for line in csv_reader]
Edit: after comments outlined in more detail what you're looking for: we can transpose the list you usually get, so each sublist represents a column from the .csv rather than a row:
import csv

with open('a.csv', 'r') as csv_file:
    regular_list = list(csv.reader(csv_file))
    transposed_result = list(map(list, zip(*regular_list)))
    print(transposed_result)
Shoutout to @Mark Tolonen for removing the for loop iteration and jumping right to csv.reader(csv_file).
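To see the transpose in action without a file on disk, the same approach works on an in-memory buffer (io.StringIO stands in for the CSV file here; the two sample rows are taken from the question):

```python
import csv
import io

# Two sample rows in the question's format
buf = io.StringIO("vies,Mon 07 Nov 2022 18:43,Mia,Haarlem\n"
                  "vies,Mon 07 Nov 2022 18:43,Mia,Amsterdam\n")

regular_list = list(csv.reader(buf))
# zip(*rows) pairs up the n-th field of every row, i.e. one tuple per column
transposed = list(map(list, zip(*regular_list)))
print(transposed[2])  # the names column
print(transposed[3])  # the stations column
```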

Sort dictionary by dict value

I have this dict:
resp = {'1366451044687880192': {'created_at': 'Mon Mar 01 18:11:31 +0000 2021', 'id': 1366463640451233323}, '1366463640451256323': {'created_at': 'Mon Mar 05 19:01:34 +0000 2021', 'id': 1366463640451256323}}
Is it possible to sort it by created_at value?
I tried this, but it doesn't work:
sorted(resp.values(), key=lambda item: item[0]['created_at'])
resp = {
    '1366451044687880192': {
        'created_at': 'Mon Mar 01 18:11:31 +0000 2021',
        'id': 1366463640451233323
    },
    '1366463640451256323': {
        'created_at': 'Mon Mar 05 19:01:34 +0000 2021',
        'id': 1366463640451256323
    }
}

print(sorted(resp.values(), key=lambda item: item['created_at']))
Output:
[
    {'created_at': 'Mon Mar 01 18:11:31 +0000 2021', 'id': 1366463640451233323},
    {'created_at': 'Mon Mar 05 19:01:34 +0000 2021', 'id': 1366463640451256323}
]
Or you can sort the key-value pairs (items) with:
sorted(resp.items(), key = lambda item: item[1]['created_at'])
which outputs:
[
    ('1366451044687880192', {'created_at': 'Mon Mar 01 18:11:31 +0000 2021', 'id': 1366463640451233323}),
    ('1366463640451256323', {'created_at': 'Mon Mar 05 19:01:34 +0000 2021', 'id': 1366463640451256323})
]
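One caveat worth noting: sorting on the raw created_at string only works here by coincidence, because both timestamps share the same month; lexicographic order of this date format is not chronological in general (e.g. "Apr" sorts before "Mar"). A hedged sketch of a more robust key, parsing the string into a datetime first:

```python
from datetime import datetime

# The dict from the question
resp = {
    '1366451044687880192': {'created_at': 'Mon Mar 01 18:11:31 +0000 2021',
                            'id': 1366463640451233323},
    '1366463640451256323': {'created_at': 'Mon Mar 05 19:01:34 +0000 2021',
                            'id': 1366463640451256323},
}

# %z consumes the +0000 timezone offset
fmt = '%a %b %d %H:%M:%S %z %Y'
ordered = sorted(resp.values(),
                 key=lambda item: datetime.strptime(item['created_at'], fmt))
print([d['id'] for d in ordered])
```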

Key error while appending the details to dictionary

I have a test log like the one below and am trying to read it in a better way. I got a key error while adding elements to the dictionary: the if branch produces no output, and the elif branch raises the KeyError.
Jan 23 2016 10:30:08AM - bla bla Server-1A linked
Jan 23 2016 11:04:56AM - bla bla Server-1B linked
Jan 23 2016 1:18:32PM - bla bla Server-1B dislinked from server
Jan 23 2016 4:16:09PM - bla bla DOS activity from 201.10.0.4
Jan 23 2016 9:43:44PM - bla bla Server-1A dislinked from server
Feb 1 2016 12:40:28AM - bla bla Server-1A linked
Feb 1 2016 1:21:52AM - bla bla DOS activity from 192.168.123.4
Mar 29 2016 1:13:07PM - bla bla Server-1A dislinked from server
Code
import re

result = []
_dict = {}
spu = []

with open(r'C:\Users\Desktop\test.log') as f:
    for line in f:
        date, rest = line.split(' - ', 1)
        conn_disconn = rest.split(' ')[3]
        server_name = rest.split(' ')[2]
        if line.strip()[-1].isdigit():
            dos = re.findall(r'[0-9]+(?:\.[0-9]+){3}', line)
            spu.extend(dos)
        ## Error part is below
        if conn_disconn == 'linked':
            dict_to_append = {server_name: [(conn_disconn, date)]}
            print(dict_to_append)
            _dict[server_name] = dict_to_append
            result.append(dict_to_append)
        elif conn_disconn == 'dislinked':
            _dict[server_name][server_name].append(conn_disconn, date)
            del _dict[server_name]

print(result)
Expected output:
[{'Server-1A': [('linked', 'Jan 23 2016 11:30:08AM'), ('dislinked', 'Jan 23 2016 10:43:44PM')]},
{'Server-1B': [('linked', 'Jan 23 2016 12:04:56AM'), ('dislinked', 'Jan 23 2016 2:18:32PM')]},
{'Server-1A': [('linked', 'Feb 1 2016 1:40:28AM'), ('dislinked', 'Mar 29 2016 2:13:07PM')]},
{'Server-1A': [('linked', 'Jan 23 2016 11:30:08AM'), ('dislinked', 'Jan 23 2016 10:43:44PM')]},
{'Server-1B': [('linked', 'Jan 23 2016 12:04:56AM'), ('dislinked', 'Jan 23 2016 2:18:32PM')]},
{'Server-1A': [('linked', 'Feb 1 2016 1:40:28AM'), ('dislinked', 'Mar 29 2016 2:13:07PM')]},
{'Server-1A': [('linked', 'Jan 23 2016 11:30:08AM'), ('dislinked', 'Jan 23 2016 10:43:44PM')]},
{'Server-1B': [('linked', 'Jan 23 2016 12:04:56AM'), ('dislinked', 'Jan 23 2016 2:18:32PM')]},
{'Server-1A': [('linked', 'Feb 1 2016 1:40:28AM'), ('dislinked', 'Mar 29 2016 2:13:07PM')]},
{Dos:['201.10.0.4','192.168.123.4']}]
When you check if conn_disconn == 'linked':, conn_disconn is 'linked\n' (with a trailing newline), so the condition is False, nothing is added to the dictionary, and you get the key error in the elif branch.
import re

result = []
_dict = {}
spu = []

with open(r'C:\Users\Desktop\test.log') as f:
    for line in f:
        date, rest = line.split(' - ', 1)
        conn_disconn = rest.split(' ')[3].strip()
        server_name = rest.split(' ')[2]
        if line.strip()[-1].isdigit():
            dos = re.findall(r'[0-9]+(?:\.[0-9]+){3}', line)
            spu.extend(dos)
        if conn_disconn == 'linked':
            dict_to_append = {server_name: [(conn_disconn, date)]}
            _dict[server_name] = dict_to_append[server_name]
            result.append(dict_to_append)
        elif conn_disconn == 'dislinked':
            _dict[server_name].append((conn_disconn, date))
            del _dict[server_name]

print(result)
Output:
[{'Server-1A': [('linked', 'Jan 23 2016 10:30:08AM'), ('dislinked', 'Jan 23 2016 9:43:44PM')]}, {'Server-1B': [('linked', 'Jan 23 2016 11:04:56AM'), ('dislinked', 'Jan 23 2016 1:18:32PM')]}, {'Server-1A': [('linked', 'Feb 1 2016 12:40:28AM'), ('dislinked', 'Mar 29 2016 1:13:07PM')]}]
append takes one argument, but you have passed two in some places. Look at this line's append parameters in your code:
_dict[server_name][server_name].append(conn_disconn,date)
Instead, add parentheses so you pass a tuple:
_dict[server_name][server_name].append((conn_disconn,date))
Try this:
import re

data = []
dff.seek(0)  # dff is an already-open file object for the log
for line in dff:
    try:
        date = re.search(r'\b^.*PM|\b^.*AM', line).group()
        server = re.search(r'\b(?:Server-\d[A-Z]|Server-1B)\b', line).group()
        linked = re.search(r'\b(?:linked|dislinked)\b', line).group()
    except AttributeError:
        continue
    data.append({server: [(linked, date)]})
print(data)
Output:
[{'Server-1A': [('linked', 'Jan 23 2016 10:30:08AM')]},
 {'Server-1B': [('linked', 'Jan 23 2016 11:04:56AM')]},
 {'Server-1B': [('dislinked', 'Jan 23 2016 1:18:32PM')]},
 {'Server-1A': [('dislinked', 'Jan 23 2016 9:43:44PM')]},
 {'Server-1A': [('linked', 'Feb 1 2016 12:40:28AM')]},
 {'Server-1A': [('dislinked', 'Mar 29 2016 1:13:07PM')]}]
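To get the paired link/dislink entries the expected output asks for, the events can be collected per server in a small state dict while scanning the log. This is a hedged sketch, not the asker's code: io.StringIO holds a few sample lines from the question so the snippet runs standalone, and the variable names are illustrative.

```python
import io
import re

# Sample lines from the question's log
log = io.StringIO(
    "Jan 23 2016 10:30:08AM - bla bla Server-1A linked\n"
    "Jan 23 2016 11:04:56AM - bla bla Server-1B linked\n"
    "Jan 23 2016 1:18:32PM - bla bla Server-1B dislinked from server\n"
    "Jan 23 2016 4:16:09PM - bla bla DOS activity from 201.10.0.4\n"
)

result = []
open_links = {}  # server name -> list collecting its (event, date) pairs
dos = []
for line in log:
    date, rest = line.strip().split(' - ', 1)
    ip = re.search(r'\d+(?:\.\d+){3}', rest)
    if ip:  # a DOS-activity line: record the IP, no server event
        dos.append(ip.group())
        continue
    words = rest.split()
    server, event = words[2], words[3]
    if event == 'linked':
        # Start a fresh event list; result holds a reference to the same list,
        # so appending the later 'dislinked' pair updates it in place.
        open_links[server] = [(event, date)]
        result.append({server: open_links[server]})
    elif event == 'dislinked':
        open_links.pop(server).append((event, date))

result.append({'Dos': dos})
print(result)
```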

Concatenate ListA elements with partially matching ListB elements

Say I have two Python lists:
ListA = ['Jan 2018', 'Feb 2018', 'Mar 2018']
ListB = ['Sales Jan 2018','Units sold Jan 2018','Sales Feb 2018','Units sold Feb 2018','Sales Mar 2018','Units sold Mar 2018']
I need to get an output as:
List_op = ['Jan 2018 Sales Jan 2018 Units sold Jan 2018','Feb 2018 Sales Feb 2018 Units sold Feb 2018','Mar 2018 Sales Mar 2018 Units sold Mar 2018']
My approach so far:
res = set()
for i in ListB:
    for j in ListA:
        if j in i:
            res.add(f'{i} {j}')
print(res)
This gives me the result:
{'Units sold Jan 2018 Jan 2018', 'Sales Feb 2018 Feb 2018', 'Units sold Mar 2018 Mar 2018', 'Units sold Feb 2018 Feb 2018', 'Sales Jan 2018 Jan 2018', 'Sales Mar 2018 Mar 2018'}
which is definitely not the solution I'm looking for.
I think regular expressions could be handy here, but I'm not sure how to approach it. Any help in this regard is highly appreciated.
Thanks in advance.
Edit:
Values in ListA and ListB are not necessarily to be in order. Therefore for a particular month/year value in ListA, the same month/year value from ListB has to be matched and picked for both 'Sales' and 'Units sold' component and needs to be concatenated.
My main goal here is to get the list which I can use later to generate a statement that I'll be using to write Hive query.
Added more explanation as suggested by @andrew_reece
Assuming no additional edge cases that need taking care of, your original code is not bad, just needs a slight update:
List_op = []
for a in ListA:
    combined = a
    for b in ListB:
        if a in b:
            combined += " " + b
    List_op.append(combined)

List_op
['Jan 2018 Sales Jan 2018 Units sold Jan 2018',
 'Feb 2018 Sales Feb 2018 Units sold Feb 2018',
 'Mar 2018 Sales Mar 2018 Units sold Mar 2018']
Supposing ListA and ListB are sorted:
ListA = ['Jan 2018', 'Feb 2018', 'Mar 2018']
ListB = ['Sales Jan 2018','Units sold Jan 2018','Sales Feb 2018','Units sold Feb 2018','Sales Mar 2018','Units sold Mar 2018']
print([v1 + " " + v2 for v1, v2 in zip(ListA, [v1 + " " + v2 for v1, v2 in zip(ListB[::2], ListB[1::2])])])
This will print:
['Jan 2018 Sales Jan 2018 Units sold Jan 2018', 'Feb 2018 Sales Feb 2018 Units sold Feb 2018', 'Mar 2018 Sales Mar 2018 Units sold Mar 2018']
In my example I first concatenate the ListB entries pairwise and then join ListA with this new list.
String concatenation can become expensive. In Python 3.6+, you can use more efficient f-strings within a list comprehension:
res = [f'{i} {j} {k}' for i, j, k in zip(ListA, ListB[::2], ListB[1::2])]
print(res)
['Jan 2018 Sales Jan 2018 Units sold Jan 2018',
 'Feb 2018 Sales Feb 2018 Units sold Feb 2018',
 'Mar 2018 Sales Mar 2018 Units sold Mar 2018']
Using itertools.islice, you can avoid the expense of creating new lists:
from itertools import islice
zipper = zip(ListA, islice(ListB, 0, None, 2), islice(ListB, 1, None, 2))
res = [f'{i} {j} {k}' for i, j, k in zipper]
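Since the edit says the lists are not necessarily in order, the zip-based answers (which assume pairing by position) can break. A hedged sketch of the substring-matching approach as a single comprehension; sorted() is used only to make the 'Sales'/'Units sold' order deterministic whatever order ListB arrives in:

```python
# ListB deliberately shuffled to show the matching is order-insensitive
ListA = ['Jan 2018', 'Feb 2018', 'Mar 2018']
ListB = ['Units sold Feb 2018', 'Sales Jan 2018', 'Sales Mar 2018',
         'Sales Feb 2018', 'Units sold Jan 2018', 'Units sold Mar 2018']

# For each month/year, collect every ListB entry containing it
List_op = [' '.join([a] + [b for b in sorted(ListB) if a in b]) for a in ListA]
print(List_op)
```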

Python Plot X Axis Time Values Not Formatted Correctly

I have a large amount of data in CSV format that looks like this:
(u'Sat Jan 17 18:56:05 +0000 2015', u'anx321', 'RT #ManojHarry27: If India loses 2015 worldcup, Karishma\ntanna will be held responsible !!! #BB8', '0.0453125', '0.325')
(u'Sat Jan 17 18:56:13 +0000 2015', u'FrancisKimberl3', 'Python form imploration overgrowth-the consummative the very best as representing construction upsurge: sDGy', '1.0', '0.39')
(u'Sat Jan 17 18:56:18 +0000 2015', u'AllTechBot', 'RT #ruby_engineer: A workshop on monads with C++14 http://t.co/OKFc91J0QJ #hacker #rubyonrails #python #AllTech', '0.0', '0.0')
(u'Sat Jan 17 18:56:22 +0000 2015', u'python_job', ' JOB ALERT #ITJob #Job #New York - Senior Software Engineer Python Backed by First Round http://t.co/eqVxoMzYMG view full details', '0.245454545455', '0.44595959596')
(u'Sat Jan 17 18:56:23 +0000 2015', u'weepingtaco', 'Python: basic but beautiful', '0.425', '0.5625')
(u'Sat Jan 17 18:56:27 +0000 2015', u'python_IT_jobs', ' JOB ALERT #ITJob #Job #New York - Senior Software Engineer Python Backed by First Round http://t.co/gavWyraNqE view full details', '0.245454545455', '0.44595959596')
(u'Sat Jan 17 18:56:32 +0000 2015', u'accusoftinfoway', 'RT #findmjob: DevOps Engineer http://t.co/NasdBEEnRp #aws #perl #mysql #linux #hadoop #python #Puppet #jobs #hiring #careers', '0.0', '0.0')
(u'Sat Jan 17 18:56:32 +0000 2015', u'accusoftinfoway', 'RT #arnicas: Very useful - end to end deploying python flask on AWS RT #matt_healy: Great tutorial: https://t.co/RsiM09qJsJ #flask #python ', '0.595', '0.375')
(u'Sat Jan 17 18:56:36 +0000 2015', u'denisegregory10', "Oh you can't beat a good 'python' argument! http://t.co/ELo3GvNsuE via #youtube", '0.875', '0.6')
(u'Sat Jan 17 18:56:38 +0000 2015', u'NoSQLDigest', 'RT #KirkDBorne: R and #Python starter code for participating in #BoozAllen #DataScience Bowl: http://t.co/Q5C01eya95 #abdsc #DataSciBowl #B', '0.0', '0.0')
(u'Sat Jan 17 19:00:05 +0000 2015', u'RedditPython', '"academicmarkdown": a Python module for academic writing with Markdown. Haven\'t tried it o... https://t.co/uv8yFaz6cv http://t.co/EhiIIO7uTW', '0.0', '0.0')
(u'Sat Jan 17 19:00:28 +0000 2015', u'shopawol', 'Only 8.5 and 12 left make sure to get yours \nhttp://t.co/4rxmHqP2Qs\n#wdywt #goawol #sneakerheads http://t.co/wACIOdlGwY', '0.166666666667', '0.62962962963')
(u'Sat Jan 17 19:00:31 +0000 2015', u'AuthorBee', "RT #_kevin_ewb_: I know what your girl won't she just wanna kick it like the #WorldCup ", '0.0', '0.0')
(u'Sat Jan 17 19:00:37 +0000 2015', u'g33kmaddy', 'RT #KirkDBorne: R and #Python starter code for participating in #BoozAllen #DataScience Bowl: http://t.co/Q5C01eya95 #abdsc #DataSciBowl #B', '0.0', '0.0')
(u'Sat Jan 17 19:00:45 +0000 2015', u'Altfashion', 'Photo: A stunning photo of Kaoris latex dreams beautiful custom python bra. Photographer: MagicOwenTog... http://t.co/KdWnr3I8xP', '0.675', '1.0')
(u'Sat Jan 17 19:00:46 +0000 2015', u'oh226twt', 'Python programming: Easy and Step by step Guide for Beginners: Learn Python (English Edition) http://t.co/9optdOCrtE 1532', '0.216666666667', '0.416666666667')
(u'Sat Jan 17 19:00:50 +0000 2015', u'DvSpacefest', 'RT #Pomerantz: Potential team in the Learning XPRIZE looking for Python coders. Details: https://t.co/nGgrmYmXCa', '0.0', '1.0')
(u'Sat Jan 17 19:01:04 +0000 2015', u'cun45', 'SPORTS And More: #Cycling #Ciclismo U23 #Portugal #WorldCup team o... http://t.co/FBeqatfu85', '0.5', '0.5')
(u'Sat Jan 17 19:01:12 +0000 2015', u'insofferentexo', 'RT #FISskijumping: Dawid is already at the hill in Zakopane, in a larger than life format! #skijumping #worldcup http://t.co/SDOnxDwfIX', '0.0', '0.5')
(u'Sat Jan 17 19:01:17 +0000 2015', u'beuhe', 'Madrid Tawarkan Khedira ke Dortmund: Real Madrid dikabarkan telah menawarkan Sami Khedira ... http://t.co/R5YCKjECtm #football #worldcup', '0.2', '0.3')
(u'Sat Jan 17 19:01:18 +0000 2015', u'ITJobs_Karen', ' JOB ALERT #ITJob #Job #Paradise Valley - Python / Django Developer http://t.co/0Xn1k0cL5B view full details', '0.35', '0.55')
(u'Sat Jan 17 19:01:22 +0000 2015', u'DonnerBella', 'So confused about #meninist . Monty Python, is that you?', '-0.4', '0.7')
(u'Sat Jan 17 19:01:34 +0000 2015', u'DoggingTeens', '#Dogging,#OutdoorSex,#Sluts,#GangBang,#Stockings,#Uk_Sex: 13 Inch Black Python Being Sucked http://t.co/n9Yv4nhcxo', '-0.166666666667', '0.433333333333')
(u'Sat Jan 17 19:02:03 +0000 2015', u'WorldCupFNH', 'Soccer-La Liga results and standings: #FNH #WorldCup #Russia2018 #WC2018 http://t.co/3JOOnBQzvG', '0.0', '0.0')
(u'Sat Jan 17 19:02:03 +0000 2015', u'WorldCupFNH', 'Soccer-La Liga summaries: #FNH #WorldCup #Russia2018 #WC2018 http://t.co/AZgxr5Z9EV', '0.0', '0.0')
(u'Sat Jan 17 19:02:03 +0000 2015', u'WorldCupFNH', "Soccer-Late Congo goal spoils Equatorial Guinea's party: #FNH #WorldCup #Russia2018 #WC2018 http://t.co/W6Ff4HikxH", '0.0', '0.0')
(u'Sat Jan 17 19:02:04 +0000 2015', u'WorldCupFNH', 'Soccer-Ligue 1 top scorers: #FNH #WorldCup #Russia2018 #WC2018 http://t.co/WS2lcZnzKu', '0.5', '0.5')
(u'Sat Jan 17 19:02:04 +0000 2015', u'WorldCupFNH', 'Soccer-Pearce answers critics as Forest seal unlikely win: #FNH #WorldCup #Russia2018 #WC2018 http://t.co/Qb5PKuls6z', '0.15', '0.45')
(u'Sat Jan 17 19:02:04 +0000 2015', u'WorldCupFNH', 'Soccer-Israeli championship results and standings: #FNH #WorldCup #Russia2018 #WC2018 http://t.co/dce9Qn9oI5', '0.0', '0.0')
(u'Sat Jan 17 19:02:07 +0000 2015', u'Jeff88Ho', 'RT #artwisanggeni: #python jweede.recipe.template 1.2.3: Buildout recipe for making files out of Jinja2 templates http://t.co/dgeuuFWf19', '0.0', '0.0')
(u'Sat Jan 17 19:02:07 +0000 2015', u'Jeff88Ho', 'RT #artwisanggeni: #python aclhound 1.7.5: ACL Compiler http://t.co/fNOFSYd7FJ', '0.0', '0.0')
(u'Sat Jan 17 19:02:08 +0000 2015', u'Jeff88Ho', 'RT #artwisanggeni: #python Flask-Goat 0.2.0: Flask plugin for security and user administration via GitHub OAuth & organization http://t.co/', '0.0', '0.0')
(u'Sat Jan 17 19:02:08 +0000 2015', u'Jeff88Ho', 'RT #artwisanggeni: #python filewatch 0.0.6: Python File Watcher http://t.co/fIHLagCqvf', '0.0', '0.0')
(u'Sat Jan 17 19:02:16 +0000 2015', u'HeatherA789', "Programming Python: Start Learning Python Today, Even If You've Never Coded Before (A Beginner's Guide): http://t.co/3Ss4cwCvP6", '0.0', '0.0')
(u'Sat Jan 17 19:02:18 +0000 2015', u'HeatherA789', 'Python: Learn Python in One Day and Learn It Well. Python for Beginners with Hands-on Project.: Python: Learn http://t.co/zvLIpydd6V', '0.0', '0.0')
(u'Sat Jan 17 19:02:26 +0000 2015', u'AlexeiCherenkov', 'It looks like I should learn Python. Do you think I can do this during 3 hours tomorrow? Yes-Rt; No-Fav.', '0.0', '0.0')
(u'Sat Jan 17 19:02:33 +0000 2015', u'cleansheet', "#WorldCup Cricket World Cup: Australia should've picked a leg-spinner and named Steve Smith vice-captain ... http://t.co/kgXgUVbHDd", '0.0', '0.0')
(u'Sat Jan 17 19:02:34 +0000 2015', u'cleansheet', '#WorldCup Younger Northug earns 1st cross-country World Cup victory http://t.co/y7jozMriFG', '0.0', '0.0')
(u'Sat Jan 17 19:02:35 +0000 2015', u'cleansheet', '#WorldCup ICC World Cup 2015: School massacre survivors inspire Pakistan team http://t.co/Tj1jpCZsj6', '0.0', '0.0')
(u'Sat Jan 17 19:02:35 +0000 2015', u'cleansheet', '#WorldCup We Want to Win World Cup for Peshawar Schoolkids: Misbah-ul-Haq http://t.co/RbeBkrv69s', '0.8', '0.4')
(u'Sat Jan 17 19:02:38 +0000 2015', u'world_latest', 'New: Equatorial Guinea 1-1 Congo http://t.co/32sfrrbBOW #follow #worldcup world_latest world_latest', '0.136363636364', '0.454545454545')
(u'Sat Jan 17 19:02:39 +0000 2015', u'FAHAD_CTID', 'RT #fawadiii: #FAHAD_CTID #VeronaPerqukuu Hahaha. Hanw ;) bdw worldcup bhi hai 15 sy :D', '0.483333333333', '0.8')
(u'Sat Jan 17 19:02:43 +0000 2015', u'amazon_mybot', '#3: Python http://t.co/LLzeKQQBon', '0.0', '0.0')
(u'Sat Jan 17 19:02:45 +0000 2015', u'LarryMesast', '#javascript #html5 #UX #Python #agile #DDD', '0.5', '0.75')
(u'Sat Jan 17 19:02:46 +0000 2015', u'washim987', 'RT #anjali_damania: I was angry at #shaziailmi & #thekiranbedi My husband calms me down & says. Haame Worldcup jitna hai. Sirf Pakistan se ', '-0.327777777778', '0.644444444444')
(u'Sat Jan 17 19:03:02 +0000 2015', u'sksh_rana', '"#ManojHarry27: If India loses 2015 worldcup, Karishma\ntanna will be held responsible !!! #BB8"\n#TheFarahKhan #BeingSalmanKhan', '0.0453125', '0.325')
(u'Sat Jan 17 19:03:14 +0000 2015', u't_kohyama', '#_3mame PythonMatlabPython', '0.0', '0.0')
(u'Sat Jan 17 19:03:16 +0000 2015', u'AntonShipulin', '#photo #worldcup #flowerceremony #sprint #Ruhpolding http://t.co/fe9qpiwsqJ', '0.0', '0.0')
(u'Sat Jan 17 19:03:22 +0000 2015', u'karthik_vik', 'RT #ValaAfshar: Highest paying programming languages, ranked by salary:\n\n1 Ruby\n2 Objective C\n3 Python\n4 Java\n\nhttp://t.co/RudytdjFLC http:', '0.0', '0.1')
Right now I plot the data with the following script:
import matplotlib
matplotlib.use('Agg')
from matplotlib.mlab import csv2rec
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from pylab import *
from datetime import datetime
import dateutil
from dateutil import parser
import re
import os
import operator
import csv

input_filename = "test_output.csv"
output_image_namep = 'polarity.png'
output_image_name2 = 'subjectivity.png'

input_file = open(input_filename, 'r')
data = csv2rec(input_file, names=['time', 'name', 'message', 'polarity', 'subjectvity'])

time_list = []
polarity_list = []

''' I am aware there's a much more concise way of doing this '''
for line in data:
    td = line['time']
    ''' stupid regex '''
    s = re.sub('\(\u', '', td)
    dtime = parser.parse(s)
    dtime = re.sub('-', '', str(dtime))
    dtime = re.sub(' ', '', dtime)
    dtime = re.sub('\+00:00', '', dtime)
    dtime = re.sub(':', '', dtime)
    dtime = dtime[:-2]
    try:
        subjectivity = float(line['subjectivity'].replace("'", '').replace(")", ''))
    except:
        pass
    print dtime, polarity
    time_list.append(str(dtime))
    polarity_list.append(polarity)

rcParams['figure.figsize'] = 10, 4
rcParams['font.size'] = 8
fig = plt.figure()
plt.plot([time_list], [polarity_list], 'ro')
axes = plt.gca()
axes.set_ylim([-1, 1])
plt.savefig(output_image_namep)
It ends up looking like this (plot image not shown). That's fine, but I would like the X axis to display the date labels correctly. Right now I'm doing some ugly regex work to strip the date down to YYYYMMDDHHMM.
What about this:
import time

def format_time_label(original):
    return time.strftime('%Y%m%d%H%M',
                         time.strptime(original, "%a %b %d %H:%M:%S +0000 %Y"))
Example:
>>> format_time_label('Sat Jan 17 19:00:50 +0000 2015')
'201501171900'
This works only if every date in your data has the timezone offset +0000, as time.strptime has no directive to recognize arbitrary offsets.
You can change the parsing format expression to account for the leftovers from your data format:
def format_time_label(original):
    return time.strftime('%Y%m%d%H%M',
                         time.strptime(original, "(u'%a %b %d %H:%M:%S +0000 %Y'"))

>>> format_time_label("(u'Sat Jan 17 18:56:05 +0000 2015'")
'201501171856'
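For the axis labels themselves, an alternative to hand-formatting strings is to plot real datetime objects and let matplotlib.dates format the ticks. This is a hedged sketch, not the asker's full script: the timestamps and polarity values are a few sample rows from the question's data, and the rest of the CSV handling is omitted.

```python
from datetime import datetime

import matplotlib
matplotlib.use('Agg')  # render without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# A few sample values from the question's data
times = [datetime.strptime(s, '%a %b %d %H:%M:%S +0000 %Y')
         for s in ('Sat Jan 17 18:56:05 +0000 2015',
                   'Sat Jan 17 19:00:05 +0000 2015',
                   'Sat Jan 17 19:02:03 +0000 2015')]
polarity = [0.0453125, 0.0, 0.5]

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(times, polarity, 'ro')
ax.set_ylim(-1, 1)
# matplotlib formats the tick labels straight from the datetime values
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d %H:%M'))
fig.autofmt_xdate()  # rotate the labels so they don't overlap
fig.savefig('polarity.png')
```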
