How do I create new JSON data after every script run - python

I have JSON data stored in the variable data.
I want to write it to a text file every time the script runs, so I can tell which JSON data is new instead of rewriting the same JSON.
Currently, I am trying this:
Saving = firstname + ' ' + lastname + ' - ' + email
with open('data.json', 'a') as f:
    json.dump(Saving, f)
    f.write("\n")
which just keeps appending to the JSON file. At the beginning of the script, before the first write, I clear it with
Infotext = "First name : Last name : Email"
with open('data.json', 'w') as f:
    json.dump(Infotext, f)
    f.write("\n")
How can I, instead of rewriting the same JSON file, create a new file that starts with the Infotext line and then has the Saving lines appended to it?
Output in Json:
"First name : Last name : Email"
Hello World - helloworld#test.com
Hello2 World - helloworld2#test.com
Hello3 World - helloworld3#test.com
Hello4 World - helloworld4#test.com
That is the output I want. Basically it needs to start with
"First name : Last name : Email"
and then the first name, last name and email of each person should be added below that until there are no more names.
To put it simply: instead of clearing and appending to the same data.json file, I want the script to create a new file called data1.json, and if I rerun the program tomorrow it should create data2.json, and so on.

Just use a datetime in the file name to create a unique file each time the code is run. Granularity here is per second, so if the code runs more than once per second you will overwrite an existing file; in that case, include microseconds in the file name as well.
import datetime as dt
import json

time_script_run = dt.datetime.now().strftime('%Y_%m_%d_%H_%M_%S')
with open('{}_data.json'.format(time_script_run), 'w') as outfile:
    json.dump(Infotext, outfile)
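If you specifically want numbered files (data1.json, data2.json, ...) as described in the question, here is a minimal sketch of the same idea with an incrementing index instead of a timestamp; the helper name next_numbered_path is just an illustration:
import glob
import json
import os
import re

def next_numbered_path(prefix='data', ext='.json'):
    # Find the highest existing dataN.json and return the next free name
    highest = 0
    for path in glob.glob('{}*{}'.format(prefix, ext)):
        match = re.match(r'{}(\d+){}$'.format(prefix, re.escape(ext)), os.path.basename(path))
        if match:
            highest = max(highest, int(match.group(1)))
    return '{}{}{}'.format(prefix, highest + 1, ext)

with open(next_numbered_path(), 'w') as outfile:
    json.dump(Infotext, outfile)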
This has multiple drawbacks:
You'll have an ever-growing number of files
Even if you load the file with the latest datetime in its name (and just finding that file takes longer as files accumulate), you only see the data from the most recent run; looking up the full history is difficult.
I think you're better off using a lightweight database such as sqlite3:
import sqlite3
import random
import time
import datetime as dt

# Create DB
with sqlite3.connect('some_database.db') as conn:
    c = conn.cursor()
    # Just for this example, we'll clear the whole table to make it repeatable
    try:
        c.execute("DROP TABLE user_emails")
    except sqlite3.OperationalError:  # First time you run this code
        pass
    c.execute("""CREATE TABLE IF NOT EXISTS user_emails(
                     datetime TEXT,
                     first_name TEXT,
                     last_name TEXT,
                     email TEXT)
              """)
    # Now let's create some fake user behaviour
    for x in range(5):
        now = dt.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
        c.execute("INSERT INTO user_emails VALUES (?, ?, ?, ?)",
                  (now, 'John', 'Smith', random.randint(0, 1000)))
        time.sleep(1)  # so we get new timestamps

# Later on, doing some work
with sqlite3.connect('some_database.db') as conn:
    c = conn.cursor()
    # Get whole user history
    c.execute("""SELECT * FROM user_emails
                 WHERE first_name = ? AND last_name = ?
              """, ('John', 'Smith'))
    print("All data")
    for row in c.fetchall():
        print(row)
    print('...............................................................')

    # Or, let's get the last email address
    print("Latest data")
    c.execute("""
        SELECT * FROM user_emails
        WHERE first_name = ? AND last_name = ?
        ORDER BY datetime DESC
        LIMIT 1;
        """, ('John', 'Smith'))
    print(c.fetchall())
Note: the data retrieval itself runs very quickly; the script takes ~5 seconds only because of the time.sleep(1) calls used to give the fake user data distinct timestamps.

The JSON file should contain a list of strings. You should read the current contents of the file into a variable, append to the variable, then rewrite the file.
with open("data.json", "r") as f:
data = json.load(f)
data.append(firstname + ' ' + lastname+ ' - ' + email)
with open("data.json", "w") as f:
json.dump(data, f)
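On the very first run data.json will not exist yet, so the read above would fail; here is a minimal sketch of one way to handle that, assuming the Infotext header line from the question should be the first entry:
import json
import os

if os.path.exists("data.json"):
    with open("data.json", "r") as f:
        data = json.load(f)
else:
    data = ["First name : Last name : Email"]  # seed the list with the header line

data.append(firstname + ' ' + lastname + ' - ' + email)
with open("data.json", "w") as f:
    json.dump(data, f, indent=4)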

I think what you could do is use seek() on the file and write at the relevant position in the JSON file. For example, if you need to update firstname, you seek to the position after firstname and update the text there.
There are examples here:
https://www.tutorialspoint.com/python/file_seek.htm
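As a rough illustration of what seek-based editing looks like (the 'Hello'/'Howdy' values are placeholders, and note this only works cleanly when the replacement has exactly the same length as the text it overwrites, which is why appending or rewriting the whole file is usually simpler):
with open('data.json', 'r+') as f:      # open for reading and writing
    contents = f.read()
    position = contents.find('Hello')   # locate the text to update
    if position != -1:
        f.seek(position)                # move the file pointer there
        f.write('Howdy')                # overwrite in place (same length)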

Related

Read lines from .txt file into sql query

I want to run a .txt file line by line through an SQL query. The .txt file consists of song titles that may or may not exist in the database. If more than one option in the database fits the song title, a selection menu should appear. If there is only one option, no further action is needed. If the line in the .txt file is not in the database, a print statement should appear saying the song was not found.
To test this I made a .txt file with each of the three options described above:
Your (this gives 7 hits)
Bohemian (this gives 1 hit)
Thriller (this gives 0 hits)
I created the .txt file in another .py file, like this:
with open('MijnMuziek.txt', 'w') as f:
    f.writelines("""
your
bohemian
thriller""")
    f.close()
But if I run the code below in a separate .py file, it only prints 'Choose from the following options: ' and then gives an error message saying the index is out of range.
import sqlite3

music_database = sqlite3.connect("C:\\Users\marlo\Downloads\chinook_LOI.db")
cursor = music_database.cursor()

def read_file(filename):
    with open(filename) as f:
        for track in f:
            cursor.execute(f"""SELECT DISTINCT t.TrackId, t.Name, art.Name
                               FROM tracks t
                               JOIN albums alb ON t.AlbumId = alb.AlbumId
                               JOIN artists art ON alb.ArtistId = art.ArtistId
                               WHERE t.Name LIKE '{track}%'""")

def selection_menu():
    for position, song in enumerate(tracks_available):
        print(str(position + 1), *song[1:3], sep='\t')
    choice = int(input('Choose from the following options: '))
    print('You chose:', *tracks_available[choice - 1], sep='\t')

read_file('MijnMuziek.txt')
tracks_available = cursor.fetchall()
selection_menu()
music_database.close()
When I put only one option in the .txt file (f.writelines('your')) the code does work and I get a selection menu. But with more than one line in the .txt file it does not work.
How do I solve this?
I don't have your database to test this, but this is one way to do it.
It makes sense to open and close the database inside the read function.
It is also a good idea to avoid global variables and instead pass values into functions.
I included protection against blank lines in your text file.
I didn't fix the SQL injection for you, because I'd need to look up how placeholders work with the LIKE % pattern you use (see the note after the code).
import sqlite3

DATABASE_FILE = r"C:\\Users\marlo\Downloads\chinook_LOI.db"

def read_tracks_from_file(filename, database_file):
    music_database = sqlite3.connect(database_file)
    cursor = music_database.cursor()
    tracks_available = []
    with open(filename) as f:
        for track in f:
            track = track.strip()  # drop the trailing newline and skip blank lines
            if track:
                cursor.execute(f"""SELECT DISTINCT t.TrackId, t.Name, art.Name
                                   FROM tracks t
                                   JOIN albums alb ON t.AlbumId = alb.AlbumId
                                   JOIN artists art ON alb.ArtistId = art.ArtistId
                                   WHERE t.Name LIKE '{track}%'""")
                for track in cursor.fetchall():
                    tracks_available.append(track)
    music_database.close()
    return tracks_available

def selection_menu(track_selection):
    for position, song in enumerate(track_selection, start=1):
        print(str(position), *song[1:3], sep='\t')
    choice = int(input('Choose from the following options: '))
    print('You chose:', *track_selection[choice - 1], sep='\t')

tracks_available = read_tracks_from_file(filename='MijnMuziek.txt',
                                         database_file=DATABASE_FILE)
selection_menu(track_selection=tracks_available)
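For reference, the LIKE pattern can also be parameterized in sqlite3, which closes the injection issue mentioned above; a minimal sketch of what the execute call inside read_tracks_from_file could look like instead:
cursor.execute("""SELECT DISTINCT t.TrackId, t.Name, art.Name
                  FROM tracks t
                  JOIN albums alb ON t.AlbumId = alb.AlbumId
                  JOIN artists art ON alb.ArtistId = art.ArtistId
                  WHERE t.Name LIKE ?""", (track + '%',))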

Python chunks write to excel

I am new to Python and I am learning by doing.
At the moment my code runs quite slowly, and it seems to take longer each time I run it.
The idea is to download an employee list as a CSV, then check the location of each employee ID by running it through a specific page and writing the result to an Excel file.
We have around 600 associates on site each day, and I need to find their locations and keep refreshing them every 2-4 minutes.
EDIT:
To give everyone a better understanding: I have a CSV file (TOT.csv) that contains the employee IDs, names and other information of the associates I have on site.
To get their locations, I need to run each employee ID from that CSV file through https://guided-coaching-dub.corp.amazon.com/api/employee-location-svc/GetLastSeenLocationOfEmployee?employeeId= one by one, and at the same time write the result to another CSV file (Location.csv). Right now it does this in about 10 minutes, and I want to understand whether the way I did it is the best possible way, or if there is something else I could try.
My code looks like this:
# GET EMPLOYEE ID FROM THE CSV
data = read_csv("Z:\\_Tracker\\Dump\\attendance\\TOT.csv")

# converting column data to list
TOT_employeeID = data['Employee ID'].tolist()

# Clean the Location Sheet
with open("Z:\\_Tracker\\Dump\\attendance\\Location.csv", "w") as f:
    pass
print("Previous Location data cleared ... ")

# go through EACH employee ID to find out location
for x in TOT_employeeID:
    driver.get(
        "https://guided-coaching-dub.corp.amazon.com/api/employee-location-svc/GetLastSeenLocationOfEmployee?employeeId=" + x)
    print("Getting Location data for EmployeeID: " + x)
    locData = driver.find_element(By.TAG_NAME, 'body').text
    aaData = str(locData)
    realLoc = aaData.split('"')
    # write to excel
    with open("Z:\\_Tracker\\Dump\\attendance\\Location.csv", "a") as f:
        writer = csv.writer(f)
        writer.writerow(realLoc)
    time.sleep(5)

print("Employee Location data downloaded...")
Is there a way I can do this faster?
Thank you in advance!
Regards,
Alex
Something like this.
import concurrent.futures
import csv

import pandas as pd
from pandas import read_csv

# `driver` and `By` are assumed to come from the selenium setup in the original script

def process_data(data: pd.DataFrame) -> None:
    associates = data['Employee ID'].unique()
    with concurrent.futures.ProcessPoolExecutor() as executer:
        executer.map(get_location, associates)

def get_location(associate: str) -> None:
    driver.get(
        "https://guided-coaching-dub.corp.amazon.com/api/employee-location-svc/GetLastSeenLocationOfEmployee?"
        f"employeeId={associate}")
    print(f"Getting Location data for EmployeeID: {associate}")
    realLoc = str(driver.find_element(By.TAG_NAME, 'body').text).split('"')
    with open("Z:\\_Tracker\\Dump\\attendance\\Location.csv", "a") as f:
        writer = csv.writer(f)
        writer.writerow(realLoc)

if __name__ == "__main__":
    data = read_csv("Z:\\_Tracker\\Dump\\attendance\\TOT.csv")
    process_data(data)
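Note that a single selenium driver generally cannot be shared across worker processes, and the endpoint looks like a plain API, so, assuming it can be fetched over HTTP with your session or credentials, a thread pool with requests may be a simpler fit for this I/O-bound work. A minimal sketch (the URL and TOT_employeeID come from the original script; everything else is illustrative):
import concurrent.futures
import csv

import requests

BASE_URL = ("https://guided-coaching-dub.corp.amazon.com/api/"
            "employee-location-svc/GetLastSeenLocationOfEmployee?employeeId=")

def fetch_location(employee_id):
    # One HTTP request per employee; threads overlap the waiting time
    response = requests.get(BASE_URL + employee_id, timeout=30)
    return [employee_id] + response.text.split('"')

def fetch_all(employee_ids):
    with concurrent.futures.ThreadPoolExecutor(max_workers=20) as pool:
        return list(pool.map(fetch_location, employee_ids))

rows = fetch_all(TOT_employeeID)
with open("Z:\\_Tracker\\Dump\\attendance\\Location.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)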
You could try separating the step of reading the information from the step of writing it to your CSV file, like below:
# Get Employee Location Information
# Create list for employee information, to be used below
employee_Locations = []

for x in TOT_employeeID:
    driver.get("https://guided-coaching-dub.corp.amazon.com/api/employee-location-svc/GetLastSeenLocationOfEmployee?employeeId=" + x)
    print("Getting Location data for EmployeeID: " + x)
    locData = driver.find_element(By.TAG_NAME, 'body').text
    aaData = str(locData)
    realLoc = [aaData.split('"')]
    employee_Locations.extend(realLoc)

# Write to the CSV - try this as a separate step
with open("Z:\\_Tracker\\Dump\\attendance\\Location.csv", "a", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(employee_Locations)  # one row per employee

print("Employee Location data downloaded...")
You may see some performance gains by collecting all the information first and only then writing it to your CSV file.

how to automatically create table based on CSV into postgres using python

I am a new Python programmer trying to import a sample CSV file into my Postgres database using a Python script.
I have a CSV file named abstable1 with 3 headers:
absid, name, number
I have many such files in a folder.
For each of them, I want to create a table in PostgreSQL with the same name as the CSV file.
Here is the code I tried, just creating a table for one file as a test:
import psycopg2
import csv
import os

#filePath = 'c:\\Python27\\Scripts\\abstable1.csv'
conn = psycopg2.connect("host= hostnamexx dbname=dbnamexx user= usernamexx password= pwdxx")
print("Connecting to Database")
cur = conn.cursor()

#Uncomment to execute the code below to create a table
cur.execute("""CREATE TABLE abs.abstable1(
    absid varchar(10) PRIMARY KEY,
    name integer,
    number integer
    )
    """)

#to copy the csv data into created table
with open('abstable1.csv', 'r') as f:
    next(f)
    cur.copy_from(f, 'abs.abstable1', sep=',')

conn.commit()
conn.close()
This is the error that I am getting:
File "c:\Python27\Scripts\testabs.py", line 26, in <module>
cur.copy_from(f, 'abs.abstable1', sep=',')
psycopg2.errors.QueryCanceled: COPY from stdin failed: error in .read() call: exceptions.ValueError Mixing iteration and read methods would lose data
CONTEXT: COPY abstable1, line 1
Any recommendation or alternate solution to resolve this issue is highly appreciated.
Here's what worked for me, using import glob.
This code automatically reads every CSV file in a folder and creates a table with the same name as the file.
I'm still trying to figure out how to derive specific data types from the data in the CSV, but as far as table creation is concerned, this works like a charm for all CSV files in a folder.
import csv
import psycopg2
import os
import glob

conn = psycopg2.connect("host= hostnamexx dbname=dbnamexx user= usernamexx password= pwdxx")
print("Connecting to Database")

csvPath = "./TestDataLGA/"

# Loop through each CSV
for filename in glob.glob(csvPath + "*.csv"):
    # Create a table name
    tablename = filename.replace("./TestDataLGA\\", "").replace(".csv", "")
    print(tablename)

    # Open file
    fileInput = open(filename, "r")

    # Extract first line of file
    firstLine = fileInput.readline().strip()

    # Split columns into an array [...]
    columns = firstLine.split(",")

    # Build SQL code to drop table if exists and create table
    sqlQueryCreate = 'DROP TABLE IF EXISTS ' + tablename + ";\n"
    sqlQueryCreate += 'CREATE TABLE ' + tablename + "("

    #some loop or function according to your requirement
    # Define columns for table
    for column in columns:
        sqlQueryCreate += column + " VARCHAR(64),\n"
    sqlQueryCreate = sqlQueryCreate[:-2]
    sqlQueryCreate += ");"

    cur = conn.cursor()
    cur.execute(sqlQueryCreate)
    conn.commit()
    cur.close()
I tried your code and it works fine:
import psycopg2

conn = psycopg2.connect("host= 127.0.0.1 dbname=testdb user=postgres password=postgres")
print("Connecting to Database")
cur = conn.cursor()

'''cur.execute("""CREATE TABLE abstable1(
    absid varchar(10) PRIMARY KEY,
    name integer,
    number integer
    )
    """)'''

with open('lolo.csv', 'r') as f:
    next(f)
    cur.copy_from(f, 'abstable1', sep=',', columns=('absid', 'name', 'number'))

conn.commit()
conn.close()
Although I had to make some changes for it to work:
I had to name the table abstable1, because with abs.abstable1 Postgres assumes you are using the schema abs. Maybe you created that schema in your database; if not, check on that. Also, I'm using Python 3.7.
I noticed that you are using Python 2.7 (which I think is no longer supported), and this may cause issues. Since you say you are learning, I would recommend using Python 3: it is more widely used now, you will most likely encounter code written for it, and otherwise you would keep having to adapt code to fit Python 2.7.
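As a side note, the "Mixing iteration and read methods would lose data" error in the question typically comes from calling next(f) (iteration) right before copy_from (which reads the file) on a Python 2 file object. Skipping the header with f.readline() instead of next(f) usually avoids it, as does letting Postgres skip the header via copy_expert; a minimal sketch of the latter, assuming the abs.abstable1 table from the question exists:
with open('abstable1.csv', 'r') as f:
    # COPY ... WITH (FORMAT csv, HEADER true) makes the server skip the header row
    cur.copy_expert("COPY abs.abstable1 FROM STDIN WITH (FORMAT csv, HEADER true)", f)
conn.commit()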
I'll post my solution here, based on @Rose's answer.
I used SQLAlchemy, a JSON file as config, and glob.
import json
import glob

from sqlalchemy import create_engine, text

def create_tables_from_files(files_folder, engine, config):
    try:
        for filename in glob.glob(files_folder + "\*csv"):
            tablename = filename.replace(files_folder, "").replace('\\', "").replace(".csv", "")
            input_file = open(filename, "r")
            columns = input_file.readline().strip().split(",")
            create_query = 'DROP TABLE IF EXISTS ' + config["staging_schema"] + "." + tablename + "; \n"
            create_query += 'CREATE TABLE ' + config["staging_schema"] + "." + tablename + " ( "
            for column in columns:
                create_query += column + " VARCHAR, \n "
            create_query = create_query[:-4]
            create_query += ");"
            engine.execute(text(create_query).execution_options(autocommit=True))
            print(tablename + " table created")
    except:
        print("Error at uploading tables")

SQL query returns blank output when running inside Python script

I have a Python script that is supposed to loop through a text file and take the domain from each line as an argument. It is then supposed to use that domain as an argument in an SQL query. The issue is that when I pass domain_name in as an argument, the JSON output the script produces is blank. If I set the domain_name argument directly inside the SQL query, the script outputs perfect JSON. As you can see at the top of my script, right below def connect_to_db(), I start looping through the text file. I'm not sure where in my code the error is occurring, so any assistance would be greatly appreciated!
Code
from __future__ import print_function

try:
    import psycopg2
except ImportError:
    raise ImportError('\n\033[33mpsycopg2 library missing. pip install psycopg2\033[1;m\n')
    sys.exit(1)

import re
import sys
import json
import pprint

DB_HOST = 'crt.sh'
DB_NAME = 'certwatch'
DB_USER = 'guest'

def connect_to_db():
    filepath = 'test.txt'
    with open(filepath) as fp:
        for cnt, domain_name in enumerate(fp):
            print("Line {}: {}".format(cnt, domain_name))
            print(domain_name)
            domain_name = domain_name.rstrip()
            conn = psycopg2.connect("dbname={0} user={1} host={2}".format(DB_NAME, DB_USER, DB_HOST))
            cursor = conn.cursor()
            cursor.execute(
                "SELECT c.id, x509_commonName(c.certificate), x509_issuerName(c.certificate) FROM certificate c, certificate_identity ci WHERE c.id = ci.certificate_id AND ci.name_type = 'dNSName' AND lower(ci.name_value) = lower('%s') AND x509_notAfter(c.certificate) > statement_timestamp();".format(
                    domain_name))

            unique_domains = cursor.fetchall()

            # print out the records using pretty print
            # note that the NAMES of the columns are not shown, instead just indexes.
            # for most people this isn't very useful so we'll show you how to return
            # columns as a dictionary (hash) in the next example.
            pprint.pprint(unique_domains)

            outfilepath = domain_name + ".json"
            with open(outfilepath, 'a') as outfile:
                outfile.write(json.dumps(unique_domains, sort_keys=True, indent=4))

if __name__ == "__main__":
    connect_to_db()
Don't use format to build your SQL statement. With psycopg2, use %s placeholders and pass the values as a tuple in the second argument to execute:
cursor.execute("""SELECT c.id, x509_commonName(c.certificate), x509_issuerName(c.certificate)
                  FROM certificate c, certificate_identity ci
                  WHERE c.id = ci.certificate_id
                  AND ci.name_type = 'dNSName'
                  AND lower(ci.name_value) = lower(%s)
                  AND x509_notAfter(c.certificate) > statement_timestamp()""",
               (domain_name,))
More generically:
cursor.execute("""SELECT columnX FROM tableA WHERE columnY = %s AND columnZ = %s""",
               (desired_columnY_value, desired_columnZ_value))

How to add a header to an existing csv file?

I know this is a very basic question.
I have a CSV file which already contains data. This file is generated automatically, not by opening it with DictReader or an open() file object.
Goal
I want to open an existing file
Add the header as the first row (shifting the existing first-row data down)
Save the file
Return the file
Any clues?
cursor.execute(sql, params + (csv_path,))
This command generates the file, but without a header.
Code
sql, params = queryset.query.sql_with_params()
sql += ''' INTO OUTFILE %s
           FIELDS TERMINATED BY ','
           OPTIONALLY ENCLOSED BY '"'
           LINES TERMINATED BY '\n' '''
csv_path = os.path.join(settings.MEDIA_ROOT + '\\tmp', csv_filename)
cursor = connection.cursor()
cursor.execute(sql, params + (csv_path,))
columns = [column[0] for column in cursor.description]  #error
Tried
SELECT `website` UNION SELECT `request_system_potentialcustomers`.`website` FROM `request_system_potentialcustomers` ORDER BY `request_system_potentialcustomers`.`revenue` DESC
INTO OUTFILE "D:\\out.csv"
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';
Wait a minute. If you have not yet called
cursor.execute(sql, params + (csv_path,))
then you have the opportunity to write the CSV file correctly from the get-go. You should not need to write a new file with the header line, then copy all that CSV into the new file and so forth. That is slow and inefficient -- and your only choice -- if you really have to prepend a line to an existing file.
If instead you have not yet written the CSV file, and if you know the header, then you can add it to the SQL using SELECT ... UNION ... SELECT:
header = ['foo', 'bar', 'baz', ]
query = ['SELECT {} UNION'.format(','.join([repr(h) for h in header]))]

sql, params = queryset.query.sql_with_params()
query.append(sql)

sql = '''INTO OUTFILE %s
         FIELDS TERMINATED BY ','
         OPTIONALLY ENCLOSED BY '"'
         LINES TERMINATED BY '\n' '''
query.append(sql)

sql = ' '.join(query)

csv_path = os.path.join(settings.MEDIA_ROOT + '\\tmp', csv_filename)
cursor = connection.cursor()
cursor.execute(sql, params + (csv_path,))
Demo:
mysql> SELECT "foo", "bar" UNION SELECT "baz", "quux" INTO OUTFILE "/tmp/out";
Produces the file /tmp/out containing
foo bar
baz quux
The Cursor.description attribute gives you information about the result columns.
cursor.execute(sql, params + (csv_path,))
columns = [column[0] for column in cursor.description]
Write the header information above to a new file.
Append the old CSV contents to the new file.
Rename the new file to the old file name.
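A minimal sketch of those three steps, reusing csv_path and the cursor from above:
import csv
import os

header = [column[0] for column in cursor.description]

with open(csv_path + '.tmp', 'w', newline='') as new_file:
    csv.writer(new_file).writerow(header)        # write the header first
    with open(csv_path) as old_file:
        for line in old_file:                    # append the old CSV contents
            new_file.write(line)

os.replace(csv_path + '.tmp', csv_path)          # rename the new file over the old name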
It's not quite clear whether you are trying to read an existing CSV file or not, but to read a CSV off disk that has no column names:
use DictReader/DictWriter and specify the column names yourself.
Python 3:
import csv

ordered_filenames = ['animal', 'height', 'weight']

with open('stuff.csv') as csvfile, open("result.csv", "w", newline='') as result:
    rdr = csv.DictReader(csvfile, fieldnames=ordered_filenames)
    wtr = csv.DictWriter(result, ordered_filenames)
    wtr.writeheader()
    for line in rdr:
        wtr.writerow(line)
With stuff.csv in the same directory:
elephant,1,200
cat,0.1,1
dog,0.2,2
and the output result file:
animal,height,weight
elephant,1,200
cat,0.1,1
dog,0.2,2
