I'm really new to Python and was attempting to download a CSV file through FTP. I've made the connection and changed to the right folder, but I also want to print the table once it's downloaded.
import pandas as pd
from ftplib import FTP

ftp = FTP('f20-preview.xxx.com')
ftp.login(user='xxx_test', passwd='xxxxxxx')
ftp.cwd('/testfolder/')

def grabFile():
    filename = 'MOCK_DATA.csv'
    localfile = open(filename, 'wb')
    ftp.retrbinary('RETR ' + filename, localfile.write, 1024)

data = pd.read_csv(filename)
data.head()
This causes a NameError: name 'filename' is not defined. I might be confusing myself, so clarification would help.
In your code you define a function but never call it, and afterwards you expect to find a variable that only exists inside that function.
One way to fix things would be to eliminate the def line (and the function's indentation) completely.
A possibly better solution would be something like this:
import pandas as pd
from ftplib import FTP

# reusable function to retrieve a file
def grabFile(ftp_obj, filename):
    # note: use the ftp_obj parameter, not the global ftp, and close the file when done
    with open(filename, 'wb') as localfile:
        ftp_obj.retrbinary('RETR ' + filename, localfile.write, 1024)

# connect to the FTP server
ftp = FTP('f20-preview.xxx.com')
ftp.login(user='xxx_test', passwd='xxxxxxx')
ftp.cwd('/testfolder/')

# pick a "target file"
filename = 'MOCK_DATA.csv'

# grab the file
grabFile(ftp, filename)

# work with the file
data = pd.read_csv(filename)
data.head()

# now you could still use the same connected ftp object and grab another file, and so on
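For example, a quick sketch of reusing the connection (MOCK_DATA_2.csv is a hypothetical second file on the server) and closing it when finished:

grabFile(ftp, 'MOCK_DATA_2.csv')  # hypothetical second file in the same folder
more_data = pd.read_csv('MOCK_DATA_2.csv')

ftp.quit()  # close the connection when you are done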
You did not call your grabFile function. But it appears the other answers helped alleviate that issue, so I will merely share some quality-of-life code for working with data sets.
I often store my data files in a separate folder from the Python code, so this can help you keep things straight and organized if you'd prefer to keep the input data in another folder.
import os
import pandas as pd

# remember where we started, hop over to the data folder, read, then hop back
original_dir = os.getcwd()
os.chdir('/home/user/RestOfPath/')
data = pd.read_csv('Filename')
os.chdir(original_dir)

data.head()
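An alternative sketch that avoids changing the working directory altogether, by building an absolute path (the folder and file name here are placeholders):

import os
import pandas as pd

data_dir = '/home/user/RestOfPath/'  # placeholder data folder
data = pd.read_csv(os.path.join(data_dir, 'Filename'))  # placeholder file name
data.head()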
Could you possibly use the absolute/full path instead of just the name for the CSV file? My guess is that it's looking in the wrong folder.
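For instance (the local path below is purely a placeholder):

import pandas as pd

# point read_csv at the downloaded file's full local path instead of a bare name
data = pd.read_csv(r'C:\full\local\path\MOCK_DATA.csv')  # placeholder path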
Given the function you provided, the working directory of your Python script and the location where the CSV is stored need to be the same.
However, you do not call the function. If you call the function and still get the same error, then it is likely that MOCK_DATA.csv is not actually at /testfolder/MOCK_DATA.csv and you will run into issues.
The simplest fix would be to delete the def grabFile line (and the function's indentation) so the script runs top to bottom.
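A sketch of that flattened version (same placeholder host and credentials as in the question):

import pandas as pd
from ftplib import FTP

ftp = FTP('f20-preview.xxx.com')
ftp.login(user='xxx_test', passwd='xxxxxxx')
ftp.cwd('/testfolder/')

filename = 'MOCK_DATA.csv'
with open(filename, 'wb') as localfile:
    ftp.retrbinary('RETR ' + filename, localfile.write, 1024)

data = pd.read_csv(filename)
print(data.head())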
I am getting a "[Errno 2] No such file or directory" error when trying to download files from an FTP server into a pandas dataframe. The files are in the root directory of the FTP server.
I am guessing that the pd.read_csv() function is looking at my local file system, i.e. at the local path where the script resides, but I do not understand how to change this.
import ftplib
import pandas as pd

def fetch_files(site, username, password, directory: str = '/', filematch: str = '*.csv'):
    # pass the url without protocol
    with ftplib.FTP(site) as ftp:
        # pass credentials if anonymous access is not allowed
        ftp.login(username, password)
        ftp.cwd(directory)
        list_ = []
        for file_ in ftp.nlst(filematch):
            print(file_)  # This works
            df = pd.read_csv(file_, index_col=None, header=0)  # This fails
            list_.append(df)
        return list_
Or would I have to use the ftp.retrlines() method? If so, what is the difference between the LIST and MLSD commands?
On a side note: the CSV files have HTML-encoded characters in them (such as &amp;), which screw up the SQL bulk insert. That's the reason I am reading them into a dataframe: to change the encoding and merge the individual files. Is there a faster way to do this directly via the Python csv module?
Thank you in advance
Use FTP.retrbinary and BytesIO to download the file to memory and then pass the in-memory file-like object to read_csv:
from io import BytesIO

flo = BytesIO()
ftp.retrbinary('RETR ' + file_, flo.write)
flo.seek(0)
pd.read_csv(flo, ...)
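Folded into the fetch_files function from the question, that might look like this (a sketch reusing the question's names):

import ftplib
from io import BytesIO

import pandas as pd

def fetch_files(site, username, password, directory='/', filematch='*.csv'):
    with ftplib.FTP(site) as ftp:
        ftp.login(username, password)
        ftp.cwd(directory)
        frames = []
        for file_ in ftp.nlst(filematch):
            # download each file into memory rather than onto the local disk
            flo = BytesIO()
            ftp.retrbinary('RETR ' + file_, flo.write)
            flo.seek(0)  # rewind the buffer before handing it to pandas
            frames.append(pd.read_csv(flo, index_col=None, header=0))
        return frames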
Similar question: Reading files from FTP server to DataFrame in Python
The above loads the whole CSV file into memory and only then parses it. If you wanted to parse the file as it downloads, that would probably require implementing a smart custom file-like object, which is not easy.
For a question that does something similar, see my answer to:
Get files names inside a zip file on FTP server without downloading whole archive.
As the title kind of asks: it's a relatively straightforward one, but I couldn't find anything with my google-fu or on SO already, at least with the way I'm wording it.
The long and the short of it is: I have a few different CSV files on a network drive at work that I want to either:
read data from into a dataframe, or
copy to a temp location.
Every time I run my script, if another user on the network has the file open, I receive a permission error because the file is "in use" by another user.
Is there any way around this at all? 99.9% of the time the people in the file are purely reading it and not altering the data, but I can't seem to find a way to get a pure "read" or "copy as is" to work.
Any insight or advice would be appreciated, as I'm sure I'm not the only one.
A basic copy of my code is below:
import os
import glob  # needed for glob.glob below
import shutil
import pandas as pd

# Destination
dst = '//network/drive_a/destination/'

# Source location
src = r'//network/drive_b/*.xlsx'

# Get latest file in source
list_of_files = glob.glob(src)
new_file = max(list_of_files, key=os.path.getctime)

# Read into dataframe
df = pd.read_excel(new_file)

# Add status flag in dataframe
df.loc[:, 'status'] = 'Exception'
df = df.reset_index(drop=True)
Or, if I was copying the file instead of reading it into a dataframe, it would be:
# Find latest file and copy it
list_of_files = glob.glob(src)
new_file = max(list_of_files, key=os.path.getctime)
copy_file = shutil.copy2(new_file, dst)
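One simple workaround sketch, assuming the lock is transient, is to retry the read a few times before giving up (whether this helps depends on how the other application locks the file):

import time
import pandas as pd

def read_excel_with_retries(path, attempts=5, delay=10):
    # try to read the workbook, waiting out a temporary lock held by another user
    for attempt in range(attempts):
        try:
            return pd.read_excel(path)
        except PermissionError:
            if attempt == attempts - 1:
                raise  # still locked after all attempts, give up
            time.sleep(delay)

df = read_excel_with_retries(new_file)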
The problem is: I have to create a file, automatically, inside a folder which has itself been created automatically beforehand.
Let me explain better. First I post the code used to create the folder...
import os
from datetime import datetime
timestr = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
now = datetime.now()
newDirName = now.strftime("%Y%m%d-%H%M%S-%f")
folder = os.mkdir('C:\\Users\\User\\Desktop\\' + newDirName)
This code will create a folder on the Desktop with a timestamp (microseconds included, to make the name as unique as possible) as its name.
Now I would like to also create a file (for example a .txt) inside the folder. I already have the code to do it...
file = open('B' + timestr, 'w')
file.write('Exact time is: ' + timestr)
file.close()
How can I "combine" this together ? First create the folder and, near immediately, the file inside it?
Thank you. If it's still not clear, feel free to ask.
Yes, just create a directory and then immediately create a file inside it. All I/O operations in Python are synchronous by default, so you won't get any race conditions.
The resulting code will be (I also made some improvements to your code):
import os
from datetime import datetime

timestr = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
dir_path = os.path.join('C:\\Users\\User\\Desktop', timestr)
os.mkdir(dir_path)

with open(os.path.join(dir_path, 'B' + timestr), 'w') as file:
    file.write('Exact time is: ' + timestr)
You can also make your code (almost) cross-platform by replacing the hard-coded desktop path with:
os.path.join(os.path.expanduser('~'), 'Desktop')
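Putting it together, a sketch of that cross-platform version:

import os
from datetime import datetime

timestr = datetime.now().strftime("%Y%m%d-%H%M%S-%f")

# resolve the current user's desktop instead of hard-coding the user name
desktop = os.path.join(os.path.expanduser('~'), 'Desktop')
dir_path = os.path.join(desktop, timestr)
os.mkdir(dir_path)

with open(os.path.join(dir_path, 'B' + timestr), 'w') as file:
    file.write('Exact time is: ' + timestr)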
I just created a simple python script that goes through a folder with polyline shapefiles and merges them. Through the Windows 8 Task Scheduler I have scheduled the script to run when I want.
All I would like to do now is modify my script so I can slightly change the name of each shapefile output. For example, the script name for Week 1 would be MergedTracks_1, for Week 2 would be MergedTracks_2, for Week 3 would be MergedTracks_3, etc..
Does anybody have an idea on how to do this with the current script I have? I am running ArcGIS 10.2. I would appreciate any insight if possible. Below is the script I am currently using in PythonWin. Thanks so much in advance!!!
import arcpy, os
outPut = r"C:\Users\student2\Desktop\WeedTracksMergeScript\Output" # Output
arcpy.env.workspace = r"C:\Users\student2\Desktop\WeedTracksMergeScript"
shplist = arcpy.ListFeatureClasses('*.shp')
print shplist # prints polyline .shp list in current arcpy.env.workspace
arcpy.Merge_management(shplist, os.path.join(outPut, "MergedTracks_1.shp"))
print "Done"
You can use pickle to keep track of the last filename that was written, and then use that as part of your logic to determine what the next filename format should be.
Check out the tutorial here:
https://wiki.python.org/moin/UsingPickle
The main idea would be to do something like this pseudocode:
if pickle file exists:
    load the last filename from the pickle file
    increment its number to get the new filename
else:
    use the default filename
do your work and write the output using that filename
save the new filename back to the pickle file
Here is a quick example on how to use pickle:
import pickle
from os.path import exists

pickle_file = "current_file_name.pickle"

if exists(pickle_file):
    with open(pickle_file, "rb") as pkr:
        last_filename = pickle.load(pkr)
    # increment the trailing number, e.g. "current_file_1" -> "current_file_2"
    base, _, num = last_filename.rpartition("_")
    current_filename = "{}_{}".format(base, int(num) + 1)
else:
    current_filename = "current_file_1"

with open(pickle_file, "wb") as pkw:
    pickle.dump(current_filename, pkw)
I think datetime stamps (strings) would be a lot less complicated and arguably more useful.
Many formatting options are documented at http://strftime.org
import os
import datetime

# this slots into the script from the question, where outPut and shplist are defined
dd = datetime.datetime.now().strftime("%Y%m%d")
shpfile = os.path.join(outPut, "MergedTracks_{}.shp".format(dd))
arcpy.Merge_management(shplist, shpfile)
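If you specifically want a week counter like MergedTracks_1, MergedTracks_2, and so on, one option (assuming the ISO calendar week is an acceptable stand-in for your own week numbering) is:

import datetime

week = datetime.date.today().isocalendar()[1]  # ISO week number, 1..53
shpfile = os.path.join(outPut, "MergedTracks_{}.shp".format(week))
arcpy.Merge_management(shplist, shpfile)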
I'm putting together my first web2py app, and I've run into a bit of a problem. I have some data stored in static/mydata.json that I'd like to access in a couple places (specifically, in one of my models, as well as a module).
If this were a normal python script, obviously I'd do something like:
import json
with open('/path/to/mydata.json') as f:
mydata = json.load(f)
In the context of web2py, I can get the URL of the file from URL('static', 'mydata.json'), but I'm not sure how to load mydata. Can I just do mydata = json.load(URL('static', 'mydata.json'))? Or is there another step required to open the file?
It's advisable to use os.path.join with request.folder to build paths to files.
import os
filepath = os.path.join(request.folder,'static','mydata.json')
From that point on, you should be able to use that filepath to open the json file as per usual.
import os
import json

filepath = os.path.join(request.folder, 'static', 'mydata.json')
with open(filepath) as f:
    mydata = json.load(f)

Note that json.load expects a file object rather than a path string, hence the open() call.
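Since you also want the data inside a module, note that request is not defined there by default; a common web2py pattern (a sketch, with a hypothetical module name) is to reach it through gluon.current:

# in a module file, e.g. modules/mydata_loader.py (hypothetical name)
import os
import json

from gluon import current

def load_mydata():
    filepath = os.path.join(current.request.folder, 'static', 'mydata.json')
    with open(filepath) as f:
        return json.load(f)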