Hi, I am trying to read data from MongoDB and convert it into a DataFrame, but I get this error: "("Could not convert ObjectId('620d3f43ae93743dbb6f6846') with type ObjectId: did not recognize Python value type when inferring an Arrow data type", 'Conversion failed for column _id with type object')".
My code looks like this:
import streamlit as st
import pymongo
from pymongo import MongoClient
import pandas as pd
import json
import string
import io
import re
import time
import csv
from pandas import Timestamp
import certifi
import datetime
from pandas import json_normalize
connection = MongoClient("mongodb+srv://*****:****@cluster0.t4iwt.mongodb.net/testdb?retryWrites=true&w=majority",tlsCAFile=certifi.where())
db=connection["testdb"]
collection=db["test"]
cursor = collection.find()
entries=list(cursor)  # materialize the cursor into a list of dicts
df=pd.DataFrame(entries)
st.write(df)  # raises the Arrow conversion error on the _id column
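Change the type of the _id column to str. Streamlit serializes DataFrames with Arrow, which cannot infer a data type for MongoDB ObjectId values.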
df = pd.DataFrame(entries)
df = df.astype({"_id": str})
st.write(df)
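As an alternative sketch, the ObjectId values can be converted before the DataFrame is built. The FakeObjectId class, the stringify_ids helper, and the sample documents below are stand-ins introduced for illustration so the snippet runs without a database; with real data the values would be the ObjectId instances returned by collection.find().

```python
class FakeObjectId:
    """Stand-in for bson.ObjectId, used only so this sketch runs offline."""

    def __init__(self, hex_value):
        self.hex_value = hex_value

    def __str__(self):
        return self.hex_value


def stringify_ids(documents):
    """Return copies of the documents with _id converted to a plain string."""
    return [{**doc, "_id": str(doc["_id"])} for doc in documents]


# Hypothetical sample data in the shape that list(collection.find()) returns.
entries = [
    {"_id": FakeObjectId("620d3f43ae93743dbb6f6846"), "name": "a"},
    {"_id": FakeObjectId("620d3f43ae93743dbb6f6847"), "name": "b"},
]
clean_entries = stringify_ids(entries)
# clean_entries can now be passed to pd.DataFrame(...) and then st.write(...).
```

Converting up front leaves the original documents untouched, which can help if the real ObjectId values are still needed for later queries.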
import streamlit as st
import pandas as pd
import numpy as np
import requests
import tweepy
import config
import psycopg2
import psycopg2.extras
import plotly.graph_objects as go
auth = tweepy.OAuthHandler(config.TWITTER_CONSUMER_KEY,
config.TWITTER_CONSUMER_SECRET)
auth.set_access_token(config.TWITTER_ACCESS_TOKEN,
config.TWITTER_ACCESS_TOKEN_SECRET)
api = tweepy.API(auth)
The problem with my code is that it does not run with the Twitter keys. It fails with an error saying the config module has no such attribute.
The config module that you are importing and reading from is not what you want.
TWITTER_CONSUMER_KEY and TWITTER_CONSUMER_SECRET are not constants that a library provides. They are values that you must supply yourself. Your application is probably missing a piece of code at the start that looks like this:
config = {
'TWITTER_CONSUMER_KEY': 'ENTER YOUR TWITTER CONSUMER KEY',
'TWITTER_CONSUMER_SECRET': 'ENTER YOUR TWITTER CONSUMER SECRET'
}
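With a dict like this, read the values as config['TWITTER_CONSUMER_KEY'] rather than config.TWITTER_CONSUMER_KEY, and remove the import config line. Take a look at this article for more help. Good luck!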
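If you would rather keep the attribute-style access used in the question (config.TWITTER_CONSUMER_KEY), one possible sketch is to load the four values from environment variables into a simple namespace object. The load_twitter_config helper and the placeholder values are assumptions made for illustration, not part of tweepy.

```python
import os
from types import SimpleNamespace


def load_twitter_config(env=os.environ):
    """Collect the four Twitter credentials into an object with attribute access."""
    names = (
        "TWITTER_CONSUMER_KEY",
        "TWITTER_CONSUMER_SECRET",
        "TWITTER_ACCESS_TOKEN",
        "TWITTER_ACCESS_TOKEN_SECRET",
    )
    missing = [name for name in names if name not in env]
    if missing:
        raise KeyError(f"missing credentials: {', '.join(missing)}")
    return SimpleNamespace(**{name: env[name] for name in names})


# Placeholder values for illustration only; real keys come from your own
# Twitter developer account and would normally be set in the environment.
example_env = {
    "TWITTER_CONSUMER_KEY": "example-consumer-key",
    "TWITTER_CONSUMER_SECRET": "example-consumer-secret",
    "TWITTER_ACCESS_TOKEN": "example-access-token",
    "TWITTER_ACCESS_TOKEN_SECRET": "example-access-token-secret",
}
config = load_twitter_config(example_env)
```

The resulting config object can then be passed to tweepy exactly as in the question, and the keys stay out of the source file.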
I have the code below, which connects to a MongoDB database, selects the specified JSON file, flattens it, and exports it as a CSV.
My problem is that some of the JSON files in the MongoDB databases are huge, with thousands of rows, so I am trying to filter the table down so that I only bring in data from the last 7 days.
from pymongo import MongoClient
import pandas as pd
import os, uuid, sys
import collections
from azure.storage.filedatalake import DataLakeServiceClient
from azure.core._match_conditions import MatchConditions
from azure.storage.filedatalake._models import ContentSettings
from pandas import json_normalize
mongo_client = MongoClient("connstring")
db = mongo_client.nhdb
table = db.Report
document = table.find()
mongo_docs = list(document)
mongo_docs = json_normalize(mongo_docs)
mongo_docs.to_csv("Report.csv", sep = ",", index=False)
Any help will be much appreciated.
Note: I know a way to do it in Azure Data Factory using the expression below; however, I am not sure how to go about it in Python.
{"createdDatetime":{$gt: ISODate("@{adddays(utcnow(),-7)}")}}
In Python, create a datetime object to use as the filter value; for example, this selects documents from the last 7 days:
from datetime import datetime, timedelta
document = table.find({'createdDatetime': {'$gt': datetime.utcnow() - timedelta(days=7)}})
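The same filter can be wrapped in a small helper so the cutoff is easy to change. This is only a sketch: the last_n_days_filter name is hypothetical, and it assumes createdDatetime is stored as a BSON date rather than as a string. It uses a timezone-aware UTC datetime, which PyMongo converts to UTC when encoding.

```python
from datetime import datetime, timedelta, timezone


def last_n_days_filter(field, days, now=None):
    """Build a MongoDB query document matching `field` values newer than `days` days ago."""
    if now is None:
        now = datetime.now(timezone.utc)
    return {field: {"$gt": now - timedelta(days=days)}}


query = last_n_days_filter("createdDatetime", 7)
# Usage with the collection from the question: table.find(query)
```

Because the filter is applied by the server, only the matching documents are transferred before json_normalize runs.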
I am trying to search trends for a coronavirus-related keyword, and with this code:
import pandas as pd
from pytrends.request import TrendReq
pytrend = TrendReq()
pytrend.build_payload(kw_list=['sintomas covid'],timeframe='2020-02-26 today', geo='BR')
# Interest Over Time
interest_over_time_df = pytrend.interest_over_time()
I keep getting:
ResponseError: The request failed: Google returned a response with code 400.
Already tried this.
It is because of the timeframe parameter: '2020-02-26 today' is not a supported format. For a custom range it should be in one of the following formats:
Specific dates, 'YYYY-MM-DD YYYY-MM-DD', for example '2016-12-14 2017-01-25'
Specific datetimes, 'YYYY-MM-DDTHH YYYY-MM-DDTHH', for example '2017-02-06T10 2017-02-12T07'
You can use this code for your use case:
import pandas as pd
from datetime import date
from pytrends.request import TrendReq
pytrend = TrendReq()
pytrend.build_payload(kw_list=['sintomas covid'],timeframe=f'2020-02-26 {date.today()}', geo='BR')
# Interest Over Time
pytrend.interest_over_time()
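If you build such ranges often, the string formatting can be pulled into a helper. The timeframe_since function below is a hypothetical name used for this sketch; it only produces the 'YYYY-MM-DD YYYY-MM-DD' form described above and does not call pytrends itself.

```python
from datetime import date


def timeframe_since(start, end=None):
    """Format a 'YYYY-MM-DD YYYY-MM-DD' timeframe string from two dates."""
    if end is None:
        end = date.today()
    if start > end:
        raise ValueError("start must not be after end")
    return f"{start.isoformat()} {end.isoformat()}"


timeframe = timeframe_since(date(2020, 2, 26))
# Usage with the payload from the answer:
# pytrend.build_payload(kw_list=['sintomas covid'], timeframe=timeframe, geo='BR')
```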
result is my DataFrame with values. I didn't get any error, but I am not able to view the data in the database. Any help would be appreciated.
import warnings
import pandas as pd
import numpy as np
import datetime as dt
import sys
import MySQLdb as MySQLdb
import math
from pandas.io import sql
import mysql.connector
from sqlalchemy import create_engine
import pymysql as pymysql
engine = create_engine("mysql+mysqlconnector://{user}:{pw}@rds.amazonaws.com/{db}"
.format(user="*******",
pw="*******",
db="testSchema"))
db = MySQLdb.connect(host="rds.amazonaws.com", # your host, usually localhost
user="user", # your username
password="pass", # your password
database="testSchema")
result.to_sql(name='bucketFillTest',con=engine,if_exists='replace',index=False)
db.close()
I am running this query in Python and get the error "Invalid string literal" in the regular expression part. I am not sure what syntax is missing here. Any help would be appreciated.
import pandas as pd
import numpy as np
import datetime
from time import gmtime, strftime
import smtplib
import sys
from IPython.core.display import HTML
PROJECT = 'server'
queryString = '''
SELECT
mn as mName,
dt as DateTime,
ip as LocalIPAddress,
REGEXP_EXTRACT(path, 'ActName:([\s\S\w\W]*?)ActDomain:') AS ActName,
FROM Agent.Logs
'''
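You're missing an r prefix on the pattern. Without it, the query engine tries to interpret \s and the other backslash sequences as string escapes and rejects the literal; r'...' marks it as a raw string: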
REGEXP_EXTRACT(path, r'ActName:([\s\S\w\W]*?)ActDomain:') AS ActName
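Put together, the query string might look like the sketch below. It assumes the engine accepts raw string literals written as r'...' (BigQuery does; the question does not name the engine), and it makes the Python string raw as well so that Python passes the backslashes through unchanged. The sample_path value is made up, and Python's re module is not the same regex engine as the database's, so the last lines are only a rough sanity check of the pattern.

```python
import re

# Raw triple-quoted Python string, so Python leaves the backslashes untouched;
# the inner r'...' marks the pattern as a raw string literal for the SQL engine.
query_string = r'''
SELECT
  mn AS mName,
  dt AS DateTime,
  ip AS LocalIPAddress,
  REGEXP_EXTRACT(path, r'ActName:([\s\S\w\W]*?)ActDomain:') AS ActName
FROM Agent.Logs
'''

# Rough sanity check of the pattern itself on a hypothetical log value.
pattern = re.compile(r'ActName:([\s\S\w\W]*?)ActDomain:')
sample_path = "ActName:backup-job ActDomain:corp"
match = pattern.search(sample_path)
extracted = match.group(1) if match else None
```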