I am loading website data into Postgres using Python/Django. There are around 9 CSV files to load, but the whole job takes roughly 10 hours to fetch the data from the website and load it into Postgres. I want to improve the performance of this load. Can anyone help with that?
This is just one DataFrame, but I have 9 more similar to it; overall the data is less than 500k records.
import zipfile
from io import BytesIO
from urllib.request import urlopen

import pandas as pd
from django.db import models
from django_pandas.managers import DataFrameManager
from sqlalchemy import create_engine


class mf(models.Model):
    # placeholder credentials / host
    pg_engine = create_engine('postgresql://user:password@host:port/db')

    # download the zip and open one pipe-delimited file inside it
    zf = zipfile.ZipFile(BytesIO(urlopen('http://md_file.zip').read()))
    df1 = pd.read_csv(zf.open('nm.dat'), header=None, delimiter='|',
                      index_col=0, names=['aaa', 'xxxx', 'yyy', 'zzz'])

    # row-by-row INSERTs issued by to_sql are the slow part for large files
    df1.to_sql('nm', pg_engine, if_exists='replace')
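One common way to cut the load time is to stream each DataFrame into Postgres with COPY instead of the row-by-row INSERTs that to_sql issues by default. Below is a minimal sketch, assuming a psycopg2-backed SQLAlchemy engine and a target table whose columns match what the DataFrame writes out; the connection string is a placeholder.

import io
import pandas as pd
from sqlalchemy import create_engine

pg_engine = create_engine('postgresql://user:password@host:port/db')  # placeholder

def copy_df(df: pd.DataFrame, table: str) -> None:
    """Bulk-load a DataFrame via Postgres COPY (much faster than row INSERTs)."""
    buf = io.StringIO()
    # write every column, including the former index, in the table's column order (assumed to match)
    df.reset_index().to_csv(buf, index=False, header=False)
    buf.seek(0)
    conn = pg_engine.raw_connection()          # underlying psycopg2 connection
    try:
        with conn.cursor() as cur:
            cur.copy_expert(f'COPY {table} FROM STDIN WITH (FORMAT csv)', buf)
        conn.commit()
    finally:
        conn.close()

# copy_df(df1, 'nm')   # the table must already exist with matching columns

If you would rather stay with to_sql, passing method='multi' together with a chunksize already batches the INSERTs and is often a noticeable improvement on its own.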
I am trying to download the database through an API.
I extract the data names from a CSV file and iterate the URL over the list, but at the last step (requests.get) I get an error and none of the inputs are recognized.
This is my code:
#!/usr/bin/env python
import requests
import pandas as pd

data = pd.read_csv("A3D_Human_Database.csv")
job_iden = data['job_id'].tolist()

for d in job_iden:
    p = d[0:14]
    print(f"downloading {p}")
    # the original URL string was missing the f prefix, so the literal text "{p}" was sent
    req = requests.get(f'http://biocomp.chem.uw.edu.pl/A3D2/RESTful/hproteome_job/{p}/')
    print(req.status_code)
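If the request succeeds you probably also want to save the payload to disk. A minimal sketch below; the output filename pattern is hypothetical, and I am assuming the endpoint returns the file directly in the response body.

import requests

for d in job_iden:
    p = d[0:14]
    url = f'http://biocomp.chem.uw.edu.pl/A3D2/RESTful/hproteome_job/{p}/'
    resp = requests.get(url, timeout=60)
    if resp.ok:
        # hypothetical filename; adjust to whatever the API actually returns
        with open(f'{p}.json', 'wb') as fh:
            fh.write(resp.content)
    else:
        print(f'{p}: HTTP {resp.status_code}')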
I am confused about my project: I use Firebase as the database to collect sensor data and then run an LSTM on it, and I'm a newbie in Python and coding. How do I convert the JSON file to an array, associate the values, and finally save the file as .csv? Thanks for the help. Here is my code:
Data from db
import json

import pandas as pd
from google.colab import files

# upload the exported Firebase JSON file in Colab
uploaded = files.upload()
data = next(iter(uploaded.values()))   # raw bytes of the uploaded file

# parse the JSON once; json.dumps() on the decoded text only re-quotes the string,
# so the dumps/loads round trip in the original never produced a dict
records = json.loads(data.decode("utf-8"))
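From here, a minimal sketch of turning the parsed export into a CSV, assuming the export is a dict keyed by Firebase push IDs with one flat reading per key (the exact structure of the export is an assumption on my part):

import pandas as pd

# {push_id: {"temp": ..., "humidity": ..., ...}, ...} -> one row per reading (assumed shape)
df = pd.DataFrame.from_dict(records, orient="index")
df.index.name = "push_id"

df.to_csv("sensor_data.csv")   # ready to feed into the LSTM preprocessing step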
I am trying to access Gremlin from AWS Glue with PySpark as the runtime. Since gremlinpython is an external library, I downloaded its .whl file and placed it in S3. It then asked for "aenum", so I did the same, and then isodate was required. So I wanted to know if there is any package I can use instead of supplying the modules separately.
Below is the sample script I am testing initially, with all the modules, to keep it simple.
import json
import os
import site
import sys
from importlib import reload
from io import StringIO

import boto3
import pandas as pd

s3 = boto3.client('s3')

# earlier attempts to locate the script directory and install gremlinpython at
# runtime with easy_install, kept commented out for reference:
# from setuptools.command import easy_install
# dir_path = os.path.dirname(os.path.realpath(__file__))
# os.path.dirname(sys.modules['__main__'].__file__)
# install_path = os.environ['GLUE_INSTALLATION']
# easy_install.main(["--install-dir", install_path, "gremlinpython"])
# reload(site)

from gremlin_python import statics
from gremlin_python.structure.graph import Graph
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.strategies import *
from gremlin_python.process.traversal import T, Column
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
The required libraries are listed below; after adding them there are no more errors related to missing modules. (A minimal connection sketch follows the list.)
tornado-6.0.4-cp35-cp35m-win32.whl
isodate-0.6.0-py2.py3-none-any.whl
aenum-2.2.4-py3-none-any.whl
gremlinpython-3.4.8-py2.py3-none-any.whl
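On the packaging question: the gremlinpython wheel does not bundle its dependencies, so each wheel (aenum, isodate, tornado, gremlinpython) has to be supplied; depending on your Glue version, listing the packages in the --additional-python-modules job parameter (Glue 2.0+) so they are installed from PyPI may save you from managing the wheels yourself. Once the imports resolve, a minimal connection sketch looks like the following; the endpoint is a placeholder.

from gremlin_python.structure.graph import Graph
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# placeholder endpoint - replace with the actual Neptune / Gremlin Server address
conn = DriverRemoteConnection('wss://your-neptune-endpoint:8182/gremlin', 'g')
g = Graph().traversal().withRemote(conn)

# simple smoke test: count up to ten vertices
print(g.V().limit(10).count().next())

conn.close()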
I am running this code to import multiple files into a table in MySQL, and it returns "'Engine' object has no attribute 'cursor'". I found many similar topics; the answers point to the pandas version, but I use pandas 0.19, so that might not be the reason. Could anyone help? Or is there any other way to import multiple text files into MySQL?
import glob
import os

import pandas
from sqlalchemy import create_engine

# '@' separates the credentials from the host in the URL (the original used '#')
engine = create_engine("mysql://vn_user:thientai3004@127.0.0.1/vietnam_stock?charset=utf8")

indir = r'E:\DataExport'   # raw string so the backslash is not treated as an escape
os.chdir(indir)
fileList = glob.glob('*.txt')

colnames = ['Ticker', 'Date', 'Open', 'High', 'Low', 'Close', 'Volume']  # expected column order

for filename in fileList:
    print(filename)
    df = pandas.read_csv(filename, header=0)
    df.to_sql('daily_price', engine, if_exists='append', index=False)
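As an alternative way to bulk-import many text files, MySQL's LOAD DATA LOCAL INFILE is usually much faster than row-by-row inserts through pandas. A minimal sketch, assuming the server allows local_infile, the files are comma-separated with one header row, and the column order matches the daily_price table:

import glob
import MySQLdb

conn = MySQLdb.connect(host='127.0.0.1', user='vn_user', passwd='thientai3004',
                       db='vietnam_stock', local_infile=1)
cur = conn.cursor()

for filename in glob.glob(r'E:\DataExport\*.txt'):
    path = filename.replace('\\', '/')   # MySQL accepts forward slashes on Windows
    cur.execute(
        f"LOAD DATA LOCAL INFILE '{path}' INTO TABLE daily_price "
        f"FIELDS TERMINATED BY ',' IGNORE 1 LINES"
    )

conn.commit()
cur.close()
conn.close()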
I'm trying to load a dataset that has breaks in it, and I'm trying to find an intelligent way to make this work. I got started with the code I included.
As you can see, the data within the file posted on the public FTP site starts at line 11, ends at line 23818, then starts again at line 23823, and ends at line 45630.
import pandas as pd
from io import BytesIO
from zipfile import ZipFile
from urllib.request import urlopen

resp = urlopen("http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/10_Portfolios_Prior_12_2_Daily_CSV.zip")

# Download the zipfile and create a pandas DataFrame
zf = ZipFile(BytesIO(resp.read()))
df = pd.read_csv(zf.open('10_Portfolios_Prior_12_2_Daily.CSV'), header=0,
                 names=['asof_dt', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10'],
                 skiprows=10).dropna()
df['asof_dt'] = pd.to_datetime(df['asof_dt'], format="%Y%m%d")
I would ideally like the first set to have a version number "1", the second to have "2", etc.
Any help would be greatly appreciated. Thank you.
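One way to get those version numbers is to read the file without dropna(), flag the rows whose first column is not an 8-digit date (the title/header lines that introduce the second block), and count block starts. A minimal sketch, reusing zf from above and assuming the second block really is introduced by such non-date rows:

cols = ['asof_dt', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10']
raw = pd.read_csv(zf.open('10_Portfolios_Prior_12_2_Daily.CSV'),
                  header=0, names=cols, skiprows=10)

# rows whose first column is an 8-digit date are real data rows
is_data = raw['asof_dt'].astype(str).str.strip().str.match(r'^\d{8}$')

# a new block starts at a data row that follows a non-data row (or the start of the file)
new_block = is_data & ~is_data.shift(fill_value=False)
raw['version'] = new_block.cumsum()          # 1 for the first set, 2 for the second, ...

out = raw[is_data].copy()
out['asof_dt'] = pd.to_datetime(out['asof_dt'], format='%Y%m%d')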