I'm new to Python, Zapier, and pretty much everything, so forgive me if this is easy or impossible...
I'm trying to import multiple CSVs into Zapier for an automated workflow, but they contain dot points (bullets) that aren't encoded as UTF-8, which is all Zapier can read.
It consistently errors with:
"'utf-8' codec can't decode byte 0x95 in position 829: invalid start byte"
After talking to Zapier support, they suggested using Python to find and replace these dot points with an asterisk or dash, then import the corrected CSV into my Zapier workflow.
This is what I have written so far as a Python action in Zapier (just trying to read the CSV to start with), with no luck:
import csv

with open(input_data['file'], 'r') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)
Is this possible?
Thanks!
This is possible, but it's a little tricky. Zapier is confusing when it comes to files. On your computer, files are a series of bytes. But in Zapier, a file is usually a URL that points to the actual file. This is great for cross-app compatibility, but tricky to work with in code.
You're trying to open a URL as a file in Python, which isn't working. Instead, make a request for that file, then read it as a series of bytes. Try this:
import csv
import io
import requests

file_data = requests.get(input_data['file'])
# the file isn't valid UTF-8; 0x95 is a bullet in Windows-1252, the likely source encoding
reader = csv.reader(file_data.content.decode('cp1252').splitlines(), delimiter=',')
result = io.StringIO()  # an in-memory text buffer to write into
writer = csv.writer(result)
for row in reader:
    # row is a list of cells, so do replacements per cell, e.g. bullet -> asterisk
    row = [cell.replace(u'\u2022', '*') for cell in row]
    writer.writerow(row)
return [{'data': result.getvalue()}]
The result StringIO is there because you want to write out a string that you can then re-package as a CSV in your virtual filesystem of choice (gDrive, Dropbox, etc.).
You can also test this locally instead of in the Zapier editor (I find that's a bit easier to iterate with). Simply get the file URL from the code step (it'll be something like https://zapier.com/engine/...) and make a local Python file with:
input_data = {'file': 'https://zapier.com/engine/...'}
...
You'll also need to pip install requests if you don't have it.
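Putting that together, a minimal local harness might look like this; the https://zapier.com/engine/... URL is a placeholder for the real one from your code step, and the cp1252/bullet assumption is the same one as above:
import csv
import io
import requests

# placeholder: paste the real file url from your Zapier code step here
input_data = {'file': 'https://zapier.com/engine/...'}

file_data = requests.get(input_data['file'])
# assumes the source file is Windows-1252, where byte 0x95 is a bullet
reader = csv.reader(file_data.content.decode('cp1252').splitlines())
result = io.StringIO()
writer = csv.writer(result)
for row in reader:
    writer.writerow([cell.replace(u'\u2022', '*') for cell in row])

print(result.getvalue())  # locally, print instead of returning to Zapier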
Related
I don't need the entire code, but I want a push to help me on the way. I've been searching the internet for clues on how to start writing a function like this, but I haven't gotten any further than just the name of the function.
So I haven't got the slightest clue how to start with this; I don't know how to work with text files. Any tips?
These text files are CSV (Comma-Separated Values), a simple file format used to store tabular data.
You may explore Python's built-in module called csv.
The following code snippet is an example of loading a .csv file in Python:
import csv

filename = 'us_population.csv'
with open(filename, 'r') as csvfile:
    csvreader = csv.reader(csvfile)
    for row in csvreader:
        print(row)  # each row is a list of column values
My goal is to convert a 6000-record CSV file into an array, clean and normalize it, and read it into a corpora.Dictionary() to use with doc2bow in Gensim to perform a SparseMatrixSimilarity query. I was successful in reading in the CSV file at first, and it printed out an array I call "definitions" for each one of the 6000 sub-category record numbers.
f = open('test.csv')
csv_f = csv.reader(f)
definitions = []
for row in csv_f:
    definitions.append(row[2])
print(definitions)
But then I hit a wall with UTF-8 and ASCII errors. Gensim has "strict" UTF-8 settings.
After several hours spent on Stack Overflow, researching, and trying to apply a few UTF-8 encoders per the Python csv documentation, I read that since Python 2.7's csv module doesn't handle Unicode out of the box, I could use the codecs package.
I figured that instead of finding every line in my original 6000-line "definitions" array and decoding it, I could take an initial stab at decoding right off the bat using codecs. However, the code below fails to write anything to my definitions array. Being a newbie, I imagine that I may be using codecs the wrong way, and/or closing the file the wrong way.
with codecs.open('test.csv', 'rb', encoding='utf-8') as f:
    csv_f = csv.reader(f)
    definitions = []
    for row in csv_f:
        definitions.append(np.array((array.float(i) for i in l)))
    f.close()
print(definitions)
I am a total newbie, apologies for any errors in my description. Learning as I go, really appreciate any feedback and help. Perhaps I'm going about this the wrong way, and welcome any education. Thank you again.
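For reference, a minimal sketch of the pattern the Python 2 csv docs point at, assuming the definitions live in the third column: open the file in binary mode, let csv split the rows, and decode each cell to unicode afterwards:
import csv

definitions = []
with open('test.csv', 'rb') as f:  # Python 2's csv module wants bytes
    for row in csv.reader(f):
        # decode each cell only after csv has split the row
        definitions.append(row[2].decode('utf-8'))
print(definitions)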
First question here, so forgive any lapses in etiquette.
I'm new to Python. I have a small project I'm trying to accomplish, both for practical reasons and as a learning experience, and maybe some people here can help me out. There's a proprietary system I regularly retrieve data from. Unfortunately, they don't use standard CSV format; they use a strange character to separate data: a ‡. I need it in CSV format in order to import it into another system. So what I need to do is take the data, replace the special character with a comma, and clean the data by removing whitespace and other minor things like unrecognized characters, so it's in the CSV shape I need for the import.
I want to learn some Python, so I figured I'd write it in Python. I'll be reading it from a web-service URL, but for now I just have some test data in the same format I'd receive.
In reality it will be tons of data per request, but I can scale it up once I understand how to retrieve and manipulate the data properly.
My code so far, just trying to read and write two columns from the data:
import requests
import csv

r = requests.get('https://www.dropbox.com/s/7uhheam5lqppzis/singlelineTest.csv?dl=0')
data = r.text

with open("testData.csv", "wb") as csvfile:
    f = csv.writer(csvfile)
    f.writerow(["PlayerID", "Partner"])  # add headers
    for elem in data:
        f.writerow([elem["PlayerID"], elem["Partner"]])
I'm getting this error.
File "csvTest.py", line 14, in
f.writerow([elem["PlayerID"], elem["Partner"]])
TypeError: string indices must be integers
It's probably evident from that that I don't know how to manipulate the data much, nor read it properly. I was able to pull back some JSON data and output it, so I know the structure works at its core with standardized data.
Thanks in advance for any tips.
I'll continue to poke at it.
Sample data is at the Dropbox link mentioned in the script.
https://www.dropbox.com/s/7uhheam5lqppzis/singlelineTest.csv?dl=0
There are multiple problems. First, the link is incorrect, since it returns the HTML page. To get the raw file, use:
r = requests.get('https://www.dropbox.com/s/7uhheam5lqppzis/singlelineTest.csv?dl=1')
Then, data is a string, so for elem in data will iterate over all the characters of the string, which is not what you want.
Also, your data is unicode, not a byte string (this is Python 2), so you need to encode it first.
Here is your program, with some changes:
import requests
import csv

r = requests.get('https://www.dropbox.com/s/7uhheam5lqppzis/singlelineTest.csv?dl=1')
data = r.text.encode('utf-8').replace("\xc2\x87", ",").splitlines()
headers = data.pop(0).split(",")
pidx = headers.index('PlayerID')
partidx = headers.index('Partner')

with open("testData.csv", "wb") as csvfile:
    f = csv.writer(csvfile)
    f.writerow(["PlayerID", "Partner"])  # add headers
    for line in data:  # pop(0) already removed the header line
        words = line.split(',')
        f.writerow([words[pidx], words[partidx]])
Output:
PlayerID,Partner
1038005,EXT
254034,EXT
Use split:
lines = data.split('\n')  # split your data into lines
headers = lines[0].split('‡')
player_index = headers.index('PlayerID')
partner_index = headers.index('Partner')
for line in lines[1:]:  # skip the headers line
    words = line.split('‡')  # split each line by the delimiter '‡'
    print words[player_index], words[partner_index]
For this to work, declare the encoding of your Python source code as UTF-8 by adding this line to the top of your file:
# -*- coding: utf-8 -*-
Read more about it in PEP 0263.
I am working on a side project where the data provided is in a .data file. How do I open a .data file to see what the data looks like, and how do I read from a .data file programmatically in Python? I'm on Mac OS X.
NOTE: The data I am working with is from one of the KDD Cup challenges.
Kindly try using Notepad or Gedit to check the delimiters in the file (.data files are text files too). After you have confirmed this, you can use the read_csv method from the pandas library in Python.
import pandas as pd

file_path = "~/AI/datasets/wine/wine.data"
# the .data file above is comma-delimited
wine_data = pd.read_csv(file_path, delimiter=",")
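To sanity-check the load, you can preview the result. Note that header handling is an assumption worth verifying: the UCI wine.data file ships without a header row, so header=None keeps pandas from treating the first record as column names:
import pandas as pd

file_path = "~/AI/datasets/wine/wine.data"
wine_data = pd.read_csv(file_path, delimiter=",", header=None)
print(wine_data.head())   # first five rows
print(wine_data.shape)    # (number of rows, number of columns)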
It vastly depends on what is in it. It could be a binary file or it could be a text file.
If it is a text file then you can open it in the same way you open any file (f=open(filename,"r"))
If it is a binary file you can just add a "b" to the open command (open(filename,"rb")). There is an example here:
Reading binary file in Python and looping over each byte
Depending on the type of data in there, you might want to try passing it through a csv reader (the csv Python module) or an XML parsing library (an example of which is lxml). For the binary branch, see the sketch below.
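A minimal sketch of the byte-by-byte approach, with 'file.data' as a placeholder filename:
# read a binary file in fixed-size chunks and visit every byte
with open('file.data', 'rb') as f:
    while True:
        chunk = f.read(4096)
        if not chunk:
            break
        for byte in chunk:
            pass  # inspect or process each byte here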
After looking further into the above and at the page, the format is:
Data Format
The datasets use a format similar to the text export format from relational databases:
One header line with the variable names
One line per instance
Tab separators between the values
There are missing values (consecutive tabs)
Therefore see this answer:
parsing a tab-separated file in Python
I would advise processing one line at a time rather than loading the whole file, but if you have the RAM, why not... (see the sketch below).
I suspect it doesn't open in Sublime because the file is huge, but that is just a guess.
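A minimal line-at-a-time sketch for that format, with 'data.data' as a placeholder filename, assuming a tab separator and a header row as described above:
import csv

with open('data.data', 'r') as f:
    reader = csv.reader(f, delimiter='\t')
    header = next(reader)  # the first line holds the variable names
    for row in reader:
        # consecutive tabs arrive as empty strings, i.e. missing values
        values = [v if v != '' else None for v in row]
        print(dict(zip(header, values)))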
To get a quick overview of what the file may contain, you could do this within a terminal, using strings or cat, for example:
$ strings file.data
or
$ cat -v file.data
In case you forget to pass the -v option to cat, and if it is a binary file, you could mess up your terminal and need to reset it:
$ reset
I was just dealing with this issue myself, so I thought I would share my answer. I have a .data file and was unable to open it by simply right-clicking it. macOS recommended I open it using Xcode, so I tried that, but it did not work.
Next I tried opening it using a program named Brackets, a text editor primarily used for HTML and CSS. Brackets did work.
I also tried PyCharm, as I am a Python programmer. PyCharm worked as well, and I was also able to read from the file using the following lines of code:
inf = open("processed-1.cleveland.data", "r")
lines = inf.readlines()
for line in lines:
    print(line, end="")
It works for me.
import pandas as pd

file_path = "your_file.data"  # placeholder: define your file path here
your_data = pd.read_csv(file_path, sep=',')
your_data.head()
In other words, just treat it as a CSV file if it is separated with ','.
Solution from @mustious.
I am using urllib.urlopen with Python 2.7 to read csv files located on an external webserver:
# Try & Except statements removed for clarity
import urllib
import csv
url = ...
csv_file = urllib.urlopen(url)
for row in csv.reader(csv_file):
do_something()
All 100+ files can be read fine, except one that has been updated recently and that returns:
Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
The file is accessible here. According to my text editor, its line-ending mode is Mac (CR), as opposed to Windows (CRLF) for the other files.
Based on this thread, I found that Python's urlopen should handle all newline formats correctly. Therefore, the problem is likely to come from somewhere else, though I have no clue where. The file opens fine with all my text editors and my spreadsheet editors.
Does anyone have any idea how to diagnose the problem?
* EDIT *
The creator of the file informed me by email that I was not the only one to experience such issues, so he decided to regenerate it. The code above now works fine again. Unfortunately, using a new file also means that the issue can no longer be reproduced and the proposed solutions properly tested.
Before closing the question, I want to thank all the Stackers who dedicated some of their time to figuring out a solution and posting it here.
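For future readers, one quick way to check which newline convention a remote file uses is to count the raw byte patterns; a minimal Python 2 sketch matching the question's setup (the url is a placeholder):
import urllib

url = 'http://example.com/file.csv'  # placeholder for the actual file url
raw = urllib.urlopen(url).read()
crlf = raw.count('\r\n')
cr_only = raw.count('\r') - crlf  # bare CRs: classic Mac line endings
lf_only = raw.count('\n') - crlf  # bare LFs: Unix line endings
print crlf, cr_only, lf_only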
It might be a corrupt .csv file? Otherwise, this code runs perfectly.
#!/usr/bin/python
import urllib
import csv

url = "http://www.football-data.co.uk/mmz4281/1213/I1.csv"
csv_file = urllib.urlopen(url)
for row in csv.reader(csv_file):
    print row
Credits to J.F. Sebastian for the .csv file.
Although, you might want to consider sharing the specific .csv file with us, so we can try to re-create the error.
The following code runs without any error:
#!/usr/bin/env python
import csv
import urllib2

r = urllib2.urlopen('http://www.football-data.co.uk/mmz4281/1213/I1.csv')
for row in csv.reader(r):
    print row
I was having the same problem with a downloaded CSV.
I know the fix would be to use open with 'rU'. But I would rather not have to save the file to disk just to open it back up into a variable. That seems unnecessary.
file = open(filepath,'rU')
mydata = csv.reader(file)
So if someone has a better solution, that would be nice. Stack Overflow links that got me this far:
CSV new-line character seen in unquoted field error
Open the file in universal-newline mode using the CSV Django module
I found what I actually wanted with StringIO, cStringIO, or io:
Using Python, how do I to read/write data in memory like I would with a file?
I ended up getting io working:
import csv
import urllib2
import io

# warning: it's a 20MB csv
url = 'http://poweredgec.com/latest_poweredge-11g.csv'
urlRead = urllib2.urlopen(url).read()
# normalize Mac (CR) and Windows (CRLF) line endings to LF in memory
ramFile = io.BytesIO(urlRead.replace('\r\n', '\n').replace('\r', '\n'))
csvCurrent = csv.reader(ramFile)
csvTuple = map(tuple, csvCurrent)
print csvTuple
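io.BytesIO keeps everything in memory, so the file never touches disk, and normalizing the newlines by hand replicates the universal-newline translation that mode 'rU' would have applied to a real file.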