I have the following Django code that runs on PostgreSQL with Huey (a task scheduler). The problem is that whenever the periodic task runs, Django overwrites the previous rows in the table instead of adding new ones.
Scheduled code:
@periodic_task(crontab(minute='*/1'))
def scheduled():
    team = nitrotype.Team('PR2W')
    team_id = team.data["info"]["teamID"]
    timestamp = datetime.datetime.now()
    for member in team.data["members"]:
        racer_id = member["userID"]
        races = member["played"]
        time = member["secs"]
        typed = member["typed"]
        errs = member["errs"]
        rcd = RaceData(
            racer_id=racer_id,
            team_id=team_id,
            timestamp=timestamp,
            races=races,
            time=time,
            typed=typed,
            errs=errs
        )
        rcd.save()
Basically, the above code is going to run every minute. Here's the database (PSQL) data that I started with:
nttracker=# TABLE data_racedata;
racer_id | team_id | timestamp | races | time | typed | errs
----------+---------+------------+--------+---------+----------+--------
35051013 | 765879 | 1623410530 | 4823 | 123226 | 793462 | 42975
35272676 | 765879 | 1623410530 | 8354 | 211400 | 1844434 | 38899
36690038 | 765879 | 1623410530 | 113 | 2849 | 16066 | 995
38486084 | 765879 | 1623410530 | 34448 | 903144 | 8043345 | 586297
38625235 | 765879 | 1623410530 | 108 | 2779 | 20919 | 1281
39018052 | 765879 | 1623410530 | 1908 | 48898 | 395187 | 24384
39114823 | 765879 | 1623410530 | 2441 | 64170 | 440503 | 32594
...
(50 rows)
Afterward, I run Huey, which executes scheduled() every minute. Here's what I end up with after two minutes (in other words, two iterations):
nttracker=# TABLE data_racedata;
racer_id | team_id | timestamp | races | time | typed | errs
----------+---------+------------+--------+---------+----------+--------
35051013 | 765879 | 1623410992 | 4823 | 123226 | 793462 | 42975
35272676 | 765879 | 1623410992 | 8354 | 211400 | 1844434 | 38899
36690038 | 765879 | 1623410992 | 113 | 2849 | 16066 | 995
38486084 | 765879 | 1623410992 | 34448 | 903144 | 8043345 | 586297
38625235 | 765879 | 1623410992 | 108 | 2779 | 20919 | 1281
39018052 | 765879 | 1623410992 | 1908 | 48898 | 395187 | 24384
39114823 | 765879 | 1623410992 | 2441 | 64170 | 440503 | 32594
...
(50 rows)
Note: most of the data just happened to stay the same; the timestamp always differs between automated runs.
I'd like 150 rows instead of 50, since I want the data to accumulate rather than replace the previous rows. Can anyone please tell me where I've gone wrong?
If anyone needs additional log outputs, please comment below.
EDIT: Model
class RaceData(models.Model):
    racer_id = models.IntegerField(primary_key=True)
    team_id = models.IntegerField()
    timestamp = UnixDateTimeField()
    races = models.IntegerField()
    time = models.IntegerField()
    typed = models.IntegerField()
    errs = models.IntegerField()
Thanks in advance.
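A likely culprit, for what it's worth, is the model rather than the scheduler: racer_id is declared as the primary key, so each rcd.save() sees an existing row with that key and issues an UPDATE instead of an INSERT. A minimal sketch of an accumulating variant, assuming a surrogate key is acceptable:

class RaceData(models.Model):
    # Without primary_key=True, Django adds an implicit auto-increment
    # "id" column, so every save() inserts a fresh row per snapshot.
    racer_id = models.IntegerField()
    team_id = models.IntegerField()
    timestamp = UnixDateTimeField()
    races = models.IntegerField()
    time = models.IntegerField()
    typed = models.IntegerField()
    errs = models.IntegerField()

Querying the history for one racer then becomes RaceData.objects.filter(racer_id=...).order_by('timestamp').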
Related
I have a massive dataset and I need to subset the data using criteria. This table is for illustration:
| Group | Name    | Value |
|-------|---------|-------|
| A     | Bill    | 256   |
| A     | Jack    | 268   |
| A     | Melissa | 489   |
| B     | Amanda  | 787   |
| B     | Eric    | 485   |
| C     | Matt    | 1236  |
| C     | Lisa    | 1485  |
| D     | Ben     | 785   |
| D     | Andrew  | 985   |
| D     | Cathy   | 1025  |
| D     | Suzanne | 1256  |
| D     | Jim     | 1520  |
I know how to handle this problem manually, such as:
import pandas as pd

df = pd.read_csv('Test.csv')
A = df[df.Group == "A"].to_numpy()
B = df[df.Group == "B"].to_numpy()
C = df[df.Group == "C"].to_numpy()
D = df[df.Group == "D"].to_numpy()
But considering the size of the data, it would take a lot of time to handle it this way.
With that in mind, I would like to know if it is possible to build a loop with an IF statement that looks at the values in the "Group" column (table above): check whether the first value is the same as the one below it, and if so, group them and create a new array/DataFrame.
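A minimal sketch of the usual pandas answer to this, groupby, assuming the goal is one NumPy array per group without hard-coding each label:

import pandas as pd

df = pd.read_csv('Test.csv')

# Single pass over the data: groupby gathers the rows for each label in
# the "Group" column, and each sub-frame is converted to a NumPy array.
arrays = {name: sub.to_numpy() for name, sub in df.groupby('Group')}

# arrays['A'] matches df[df.Group == "A"].to_numpy() from above.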
My PostGIS table, named samplecol, is this:
vessel_hash | name_city | station | speed | latitude | longitude | course | heading | timestamp | the_geom
--------------+-----------+---------+-------+-------------+-------------+--------+---------+--------------------------+----------------------------------------------------
103079215239 |newyork | 841 | 5 | -5.41844510 | 36.12160900 | 314 | 511 | 2016-06-12T06:31:04.000Z | 0101000020E61000001BF33AE2900F424090AF4EDF7CAC15C0
103079215239 |washangton | 3008 | 0 | -5.41778710 | 36.12144900 | 117 | 511 | 2016-06-12T06:43:27.000Z | 0101000020E6100000E2900DA48B0F424042C3AC61D0AB15C0
103079215239 |paris | 841 | 17 | -5.42236900 | 36.12356900 | 259 | 511 | 2016-06-12T06:50:27.000Z | 0101000020E610000054E6E61BD10F42407C60C77F81B015C0
103079215239 |room | 841 | 17 | -5.41781710 | 36.12147900 | 230 | 511 | 2016-06-12T06:27:03.000Z | 0101000020E61000004D13B69F8C0F424097D6F03ED8AB15C0
103079215239 |pensilvenia| 841 | 61 | -5.42201900 | 36.13256100 | 157 | 511 | 2016-06-12T06:08:04.000Z | 0101000020E6100000CFDC43C2F71042409929ADBF25B015C0
103079215239 |jorjia | 841 | 9 | -5.41834020 | 36.12225000 | 359 | 511 | 2016-06-12T06:33:03.000Z | 0101000020E6100000CFF753E3A50F42408D68965F61AC15C0
The output of the query must be like this:
name_city | latitude | longitude
-----------+-------------+----------
newyork |-5.41844510 | 36.12160900
pensilvenia|-5.42201900 | 36.13256100
jorjia |-5.41834020 | 36.12225000
My code is this:
poisInpolygon = """SELECT samplecol.latitude,samplecol.name_city samplecol.longitude
FROM samplecol
WHERE ST_Contains(samplecol.the_geom,('POLYGON((-15.0292969 47.6357836, -15.2050781 47.5172007,
-16.2597656 29.3821751, 35.0683594 26.1159859, 38.0566406 47.6357836, -15.0292969 47.6357836))'));"""
cursor.execute(poisInpolygon)
exists1 = cursor.fetchall()
count1 = 0
for ex1 in exists1:
    count1 = count1 + 1
    print ex1, "\n"
print "points", count1
I tried this code and query, but it returns 0 rows. What is the problem? What is the correct query?
I believe it is the classic mix-up of long/lat. Based on your data sample, if we create the_geom using (longitude, latitude), your points land somewhere in Tanzania and therefore far away from your polygon:
SELECT ST_SetSRID(ST_MakePoint(longitude,latitude),4326) FROM samplecol
UNION ALL SELECT 'SRID=4326;POLYGON((-15.0292969 47.6357836, -15.2050781 47.5172007,
-16.2597656 29.3821751, 35.0683594 26.1159859, 38.0566406 47.6357836, -15.0292969 47.6357836))'::geometry;
If we switch the order to (latitude, longitude), your points land close to Gibraltar and the geometries do overlap:
SELECT ST_SetSRID(ST_MakePoint(latitude,longitude),4326) FROM samplecol
UNION ALL SELECT 'SRID=4326;POLYGON((-15.0292969 47.6357836, -15.2050781 47.5172007,
-16.2597656 29.3821751, 35.0683594 26.1159859, 38.0566406 47.6357836, -15.0292969 47.6357836))'::geometry;
So my guess is that you stored longitude and latitude values in the wrong columns ;)
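Assuming that's the case, a sketch of the corrected snippet (still psycopg2): build the point from the swapped columns, and note that ST_Contains takes the containing geometry, i.e. the polygon, as its first argument:

poisInpolygon = """SELECT name_city, latitude, longitude
FROM samplecol
WHERE ST_Contains(
    'SRID=4326;POLYGON((-15.0292969 47.6357836, -15.2050781 47.5172007,
     -16.2597656 29.3821751, 35.0683594 26.1159859,
     38.0566406 47.6357836, -15.0292969 47.6357836))'::geometry,
    ST_SetSRID(ST_MakePoint(latitude, longitude), 4326));"""
cursor.execute(poisInpolygon)
for ex1 in cursor.fetchall():
    print(ex1)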
I have to compare 2 different sources and identify all the mismatches for all IDs.
Source_excel table
+-----+-------------+------+----------+
| id | name | City | flag |
+-----+-------------+------+----------+
| 101 | Plate | NY | Ready |
| 102 | Back washer | NY | Sold |
| 103 | Ring | MC | Planning |
| 104 | Glass | NMC | Ready |
| 107 | Cover | PR | Ready |
+-----+-------------+------+----------+
Source_dw table
+-----+----------+------+----------+
| id | name | City | flag |
+-----+----------+------+----------+
| 101 | Plate | NY | Planning |
| 102 | Nut | TN | Expired |
| 103 | Ring | MC | Planning |
| 104 | Top Wire | NY | Ready |
| 105 | Bolt | MC | Expired |
+-----+----------+------+----------+
Expected result
+-----+-------------+----------+------------+----------+------------+---------+------------------+
| ID | excel_name | dw_name | excel_flag | dw_flag | excel_city | dw_city | RESULT |
+-----+-------------+----------+------------+----------+------------+---------+------------------+
| 101 | Plate | Plate | Ready | Planning | NY | NY | FLAG_MISMATCH |
| 102 | Back washer | Nut | Sold | Expired | NY | TN | NAME_MISMATCH |
| 102 | Back washer | Nut | Sold | Expired | NY | TN | FLAG_MISMATCH |
| 102 | Back washer | Nut | Sold | Expired | NY | TN | CITY_MISMATCH |
| 103 | Ring | Ring | Planning | Planning | MC | MC | ALL_MATCH |
| 104 | Glass | Top Wire | Ready | Ready | NMC | NY | NAME_MISMATCH |
| 104 | Glass | Top Wire | Ready | Ready | NMC | NY | CITY_MISMATCH |
| 107 | Cover | | Ready | | PR | | MISSING IN DW |
| 105 | | Bolt | | Expired | | MC | MISSING IN EXCEL |
+-----+-------------+----------+------------+----------+------------+---------+------------------+
I'm new to Python and I have tried the code below, but it is not giving the expected result.
import pandas as pd
source_excel = pd.read_csv('C:/Mypython/Newyork/excel.csv',encoding = "ISO-8859-1")
source_dw = pd.read_csv('C:/Mypython/Newyork/dw.csv',encoding = "ISO-8859-1")
comparison_result = pd.merge(source_excel,source_dw,on='ID',how='outer',indicator=True)
comparison_result.loc[(comparison_result['_merge'] == 'both') & (comparison_result['name_x'] != comparison_result['name_y']), 'Result'] = 'NAME_MISMATCH'
comparison_result.loc[(comparison_result['_merge'] == 'both') & (comparison_result['city_x'] != comparison_result['city_y']), 'Result'] = 'CITY_MISMATCH'
comparison_result.loc[(comparison_result['_merge'] == 'both') & (comparison_result['flag_x'] != comparison_result['flag_y']), 'Result'] = 'FLAG_MISMATCH'
comparison_result.loc[comparison_result['_merge'] == 'left_only', 'Result'] = 'Missing in dw'
comparison_result.loc[comparison_result['_merge'] == 'right_only', 'Result'] = 'Missing in excel'
comparison_result.loc[comparison_result['_merge'] == 'both', 'Result'] = 'ALL_Match'
csv_column = comparison_result[['ID','name_x','name_y','city_x','city_y','flag_x','flag_y','Result']]
print(csv_column)
Is there any other way I can check all the conditions and report each one in a separate row? If separate rows are not possible, I at least need them in the same column, separated by commas, something like FLAG_MISMATCH,CITY_MISMATCH.
You could do:
df = pd.merge(Source_excel, Source_dw, on = 'ID', how = 'left', suffixes = (None, '_dw'))
This will create a new dataframe like the one you want, although you'll have to reorder the columns as you want. Note that the '_dw' is a suffix and not a prefix in this case.
You can reorder the columns as you like by using this code:
#Complement with the order you want
df = df[['ID', 'excel_name']]
For the result column, I think you'll have to create a column for each condition you're trying to check (at least, that's the way I know how). Here's an example:
#This will return 1 if there's a match and 0 otherwise
df['result_flag'] = df.apply(lambda x: 1 if x.excel_flag == x.flag_dw else 0, axis = 1)
Here is a way to do the scoring:
df['result'] = 0
# repeated mask / df.loc statements suggests a loop, over a list of tuples
mask = df['excel_flag'] != df['dw_flag']
df.loc[mask, 'result'] += 1
mask = df['excel_name'] != df['dw_name']
df.loc[mask, 'result'] += 10
df['result'] = df['result'].map({ 0: 'all match',
1: 'flag mismatch',
10: 'name mismatch',
11: 'all mismatch',})
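As the comment above hints, the repeated mask/df.loc pairs can be folded into a loop over a list of tuples. A sketch, assuming the same column names:

# Each tuple: (excel column, dw column, score); powers of ten keep every
# combination of mismatches distinguishable in the summed result.
checks = [
    ('excel_flag', 'dw_flag', 1),
    ('excel_name', 'dw_name', 10),
    ('excel_city', 'dw_city', 100),
]

df['result'] = 0
for excel_col, dw_col, score in checks:
    df.loc[df[excel_col] != df[dw_col], 'result'] += score

df['result'] can then be mapped to labels exactly as above, extending the map with the city combinations.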
I have 2 views as below:
experiments:
select * from experiments;
+--------+--------------------+-----------------+
| exp_id | exp_properties | value |
+--------+--------------------+-----------------+
| 1 | indicator:chemical | phenolphthalein |
| 1 | base | NaOH |
| 1 | acid | HCl |
| 1 | exp_type | titration |
| 1 | indicator:color | faint_pink |
+--------+--------------------+-----------------+
calculations:
select * from calculations;
+--------+------------------------+--------------+
| exp_id | exp_report | value |
+--------+------------------------+--------------+
| 1 | molarity:base | 0.500000000 |
| 1 | volume:acid:in_ML | 23.120000000 |
| 1 | volume:base:in_ML | 5.430000000 |
| 1 | moles:H | 0.012500000 |
| 1 | moles:OH | 0.012500000 |
| 1 | molarity:acid | 0.250000000 |
+--------+------------------------+--------------+
I managed to pivot each of these views individually as below:
experiments_pivot:
+-------+--------------------+------+------+-----------+----------------+
|exp_id | indicator:chemical | base | acid | exp_type | indicator:color|
+-------+--------------------+------+------+-----------+----------------+
| 1 | phenolphthalein | NaOH | HCl | titration | faint_pink |
+-------+--------------------+------+------+-----------+----------------+
calculations_pivot:
+-------+---------------+---------------+--------------+-------------+------------------+-------------------+
|exp_id | molarity:base | molarity:acid | moles:H | moles:OH | volume:acid:in_ML| volume:base:in_ML |
+-------+---------------+---------------+--------------+-------------+------------------+-------------------+
| 1 | 0.500000000 | 0.250000000 | 0.012500000 | 0.012500000 | 23.120000000 | 5.430000000 |
+-------+---------------+---------------+--------------+-------------+------------------+-------------------+
My question is: how do I get these two pivot results as a single row? The desired result is as below:
+-------+--------------------+------+------+-----------+----------------+--------------+---------------+--------------+-------------+------------------+------------------+
|exp_id | indicator:chemical | base | acid | exp_type | indicator:color|molarity:base | molarity:acid | moles:H | moles:OH | volume:acid:in_ML| volume:base:in_ML |
+-------+--------------------+------+------+-----------+----------------+--------------+---------------+--------------+-------------+------------------+------------------+
| 1 | phenolphthalein | NaOH | HCl | titration | faint_pink | 0.500000000 | 0.250000000 | 0.012500000 | 0.012500000 | 23.120000000 | 5.430000000 |
+-------+--------------------+------+------+-----------+----------------+--------------+---------------+--------------+-------------+------------------+------------------+
Database used: MySQL
Important Note: each of these views can have a growing number of rows, hence I used "dynamic pivoting" for each view individually.
For reference -- Below is a prepared statement I used to pivot experiments in MySQL(and a similar statement to pivot the other view as well):
set @sql = NULL;
SELECT
  GROUP_CONCAT(DISTINCT
    CONCAT(
      'MAX(IF(exp_properties = ''',
      exp_properties,
      ''', value, NULL)) AS ',
      CONCAT('`', exp_properties, '`')
    )
  ) INTO @sql
FROM experiments;

set @sql = concat(
  'select exp_id, ',
  @sql,
  ' from experiments group by exp_id'
);

prepare stmt from @sql;
execute stmt;
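To get the two pivots as a single row per exp_id, one option is to join the two pivoted derived tables. A sketch, assuming the generated column lists are kept in two variables (@sql_exp and @sql_calc are hypothetical names, each built by a GROUP_CONCAT like the one above):

-- @sql_exp and @sql_calc are hypothetical: the GROUP_CONCAT output for
-- experiments and calculations respectively, built as shown above.
set @sql = concat(
  'select * from ',
  '(select exp_id, ', @sql_exp, ' from experiments group by exp_id) as e ',
  'join ',
  '(select exp_id, ', @sql_calc, ' from calculations group by exp_id) as c ',
  'using (exp_id)'
);
prepare stmt from @sql;
execute stmt;

With JOIN ... USING (exp_id), select * lists exp_id only once, which matches the desired single-row layout.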
I'm running a bunch of queries using Python and psycopg2. I create one large temporary table with about 2 million rows, then I fetch 1000 rows at a time from it using cur.fetchmany(1000) and run more extensive queries involving those rows. The extensive queries are self-sufficient, though - once they are done, I don't need their results anymore when I move on to the next 1000.
However, about 1000000 rows in, I got an exception from psycopg2:
psycopg2.OperationalError: out of shared memory
HINT: You might need to increase max_locks_per_transaction.
Funnily enough, this happened when I was executing a query to drop some temporary tables that the more extensive queries created.
Why might this happen? Is there any way to avoid it? It was annoying that this happened halfway through, meaning I have to run it all again. What might max_locks_per_transaction have to do with anything?
NOTE: I'm not doing any .commit()s, but I'm deleting all the temporary tables I create, and I'm only touching the same 5 tables anyway for each "extensive" transaction, so I don't see how running out of table locks could be the problem...
When you create a table, you get an exclusive lock on it that lasts until the end of the transaction, even if you then go ahead and drop it.
So if I start a tx and create a temp table:
steve@steve@[local] *=# create temp table foo(foo_id int);
CREATE TABLE
steve@steve@[local] *=# select * from pg_locks where pid = pg_backend_pid();
locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted
---------------+----------+-----------+------+-------+------------+---------------+---------+-----------+----------+--------------------+-------+---------------------+---------
virtualxid | | | | | 2/105315 | | | | | 2/105315 | 19098 | ExclusiveLock | t
transactionid | | | | | | 291788 | | | | 2/105315 | 19098 | ExclusiveLock | t
relation | 17631 | 10985 | | | | | | | | 2/105315 | 19098 | AccessShareLock | t
relation | 17631 | 214780901 | | | | | | | | 2/105315 | 19098 | AccessExclusiveLock | t
object | 17631 | | | | | | 2615 | 124616403 | 0 | 2/105315 | 19098 | AccessExclusiveLock | t
object | 0 | | | | | | 1260 | 16384 | 0 | 2/105315 | 19098 | AccessShareLock | t
(6 rows)
These 'relation' locks aren't dropped when I drop the table:
steve@steve@[local] *=# drop table foo;
DROP TABLE
steve@steve@[local] *=# select * from pg_locks where pid = pg_backend_pid();
locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted
---------------+----------+-----------+------+-------+------------+---------------+---------+-----------+----------+--------------------+-------+---------------------+---------
virtualxid | | | | | 2/105315 | | | | | 2/105315 | 19098 | ExclusiveLock | t
object | 17631 | | | | | | 1247 | 214780902 | 0 | 2/105315 | 19098 | AccessExclusiveLock | t
transactionid | | | | | | 291788 | | | | 2/105315 | 19098 | ExclusiveLock | t
relation | 17631 | 10985 | | | | | | | | 2/105315 | 19098 | AccessShareLock | t
relation | 17631 | 214780901 | | | | | | | | 2/105315 | 19098 | AccessExclusiveLock | t
object | 17631 | | | | | | 2615 | 124616403 | 0 | 2/105315 | 19098 | AccessExclusiveLock | t
object | 17631 | | | | | | 1247 | 214780903 | 0 | 2/105315 | 19098 | AccessExclusiveLock | t
object | 0 | | | | | | 1260 | 16384 | 0 | 2/105315 | 19098 | AccessShareLock | t
(8 rows)
In fact, it added two more locks... It seems if I continually create/drop that temp table, it adds 3 locks each time.
So I guess one answer is that you will need enough locks to cope with all these tables being added and dropped throughout the transaction. Alternatively, you could try to reuse the temp tables between queries, simply truncating them to remove all the temp data, as sketched below.
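A minimal sketch of that reuse pattern with psycopg2 (the DSN, table, and fill query are hypothetical):

import psycopg2

conn = psycopg2.connect("dbname=mydb")  # hypothetical DSN
cur = conn.cursor()

# Create the scratch table once, so its locks are taken only once.
cur.execute("CREATE TEMP TABLE scratch (id bigint, payload text)")

for batch_no in range(2000):
    # Empty and refill rather than drop/recreate: no new locks pile up.
    cur.execute("TRUNCATE scratch")
    cur.execute(
        "INSERT INTO scratch SELECT id, payload FROM big_source WHERE batch = %s",
        (batch_no,),
    )
    # ... run the extensive queries against scratch here ...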
Did you create multiple savepoints with the same name without releasing them?
I followed these instructions, repeatedly executing SAVEPOINT savepoint_name but without ever executing any corresponding RELEASE SAVEPOINT savepoint_name statements. PostgreSQL was just masking the old savepoints, never freeing them, and it kept track of each one until it ran out of memory for locks. I think my PostgreSQL memory limits were much lower; it only took ~10,000 savepoints for me to hit max_locks_per_transaction.
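In other words, pair each SAVEPOINT with a RELEASE once the protected work succeeds. A psycopg2-flavoured sketch (do_risky_work is a hypothetical placeholder; cur is an open cursor inside a long transaction):

import psycopg2

cur.execute("SAVEPOINT sp")
try:
    do_risky_work(cur)  # hypothetical: whatever might fail mid-transaction
except psycopg2.Error:
    # The outer transaction stays usable after rolling back to the savepoint.
    cur.execute("ROLLBACK TO SAVEPOINT sp")
# Release it either way, so PostgreSQL frees the associated lock entries
# instead of masking one savepoint under the next.
cur.execute("RELEASE SAVEPOINT sp")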
Well, are you running the entire create + queries inside a single transaction? This would perhaps explain the issue. Just because it happened when you were dropping tables would not necessarily mean anything, that may just happen to be the point when it ran out of free locks.
Using a view might be an alternative to a temporary table and would definitely be my first pick if you're creating this thing and then immediately removing it.
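A sketch of that (names hypothetical): define the subset as a view instead of materializing it, query it, and drop it when done:

# A view is just a stored query: nothing is materialized, and repeated
# create/drop cycles don't accumulate the heavyweight locks of real tables.
cur.execute("""
    CREATE OR REPLACE VIEW work_subset AS
    SELECT id, payload
    FROM big_source           -- hypothetical source table
    WHERE payload IS NOT NULL -- hypothetical predicate
""")
cur.execute("SELECT count(*) FROM work_subset")
print(cur.fetchone())
cur.execute("DROP VIEW work_subset")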