Pandas: No columns to parse from file - python

Good afternoon
I have looked through several of the solutions linked to this problem and nothing has helped. I do not understand whether the error is in the actual CSV file or in the code itself. Below is my code:
import pandas as pd
from itertools import islice
import csv
from cStringIO import StringIO

sio = StringIO()

def forex_file():
    with open("USD-ZAR.csv", "r+") as exchange_file:
        for row in islice(csv.reader(exchange_file), 3, 256, None):
            sio.write(row)
    sio.seek(0)
    df1 = pd.read_csv(sio, sep=",", encoding="utf-8", delim_whitespace=True)
I purposely placed the delim_whitespace=True argument in the call, as this has been the common suggestion in many other posts, but it has done nothing in this case since my CSV file is delimited by normal commas, not by whitespace or tabs.
Any help will really be appreciated!
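For what it's worth, pandas can skip the leading rows by itself, which avoids the csv/StringIO round-trip entirely. A minimal sketch (the preamble lines and column names below are invented, not taken from the real USD-ZAR.csv):

```python
import io
import pandas as pd

# Tiny stand-in for USD-ZAR.csv: three preamble lines, then the data
raw = "junk\njunk\njunk\nDate,Rate\n2021-01-04,14.65\n2021-01-05,14.90\n"

# skiprows drops the preamble; nrows caps how many data rows are read
df = pd.read_csv(io.StringIO(raw), skiprows=3, nrows=253)
print(df.shape)  # (2, 2)
```

Here skiprows=3 mirrors the start of islice(..., 3, 256, None), and nrows plays the role of its stop.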

Related

use csv file and plot data django2

I have a simple app that imports a CSV file and makes a plot, but without showing any error message it just doesn't show the plot.
It's part of my module:
...
def plot_data(self):
    df = pd.read_csv("file.csv")
    return plotly.express.line(df)
...
and it's part of my app file:
import panel as pn

def app(doc):
    gspec = pn.GridSpec()
    gspec[0, 1] = pn.Pane(instance_class.plot_data())
    gspec.server_doc(doc)
Update:
After searching more I found HttpResponse, and with that I could write things to a CSV using the csv module, but I still have no idea how to read from a CSV.
I also saw HttpRequest and thought maybe I could use it for reading a CSV, but I couldn't find any sample code and couldn't understand the documentation about using it as a reader.
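On the update: in Django an uploaded CSV is normally read from request.FILES rather than from HttpRequest directly, and the uploaded file is an ordinary file-like object that pd.read_csv accepts. A hedged sketch (the form field name "file" and the view shape are assumptions, not from the question):

```python
import io
import pandas as pd

# In a Django view the upload arrives as request.FILES["file"], which is
# file-like, so pandas can read it directly:
#
#     def upload(request):
#         df = pd.read_csv(request.FILES["file"])
#
# The same call works on any file-like object, simulated here with BytesIO:
uploaded = io.BytesIO(b"x,y\n1,2\n3,4\n")
df = pd.read_csv(uploaded)
print(len(df))  # 2
```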

Wrong encoding on CSV file in Python

I am not sure if I am asking this question correctly, but here's my issue:
I have a .csv file (InjectionWells.csv) that I need to split into columns based on commas. When I do it, it just doesn't work, and the only cause I can think of is the encoding, but I don't know how to fix it. Can someone shed some light?
Here are few lines of the actual file:
API#,Operator,Operator ID,WellType,WellName,WellNumber,OrderNumbers,Approval Date,County,Sec,Twp,Rng,QQQQ,LAT,LONG,PSI,BBLS,ZONE,,,
3500300026,PHOENIX PETROCORP INC,19499,2R,SE EUREKA UNIT-TUCKER #1,21,133856,9/6/1977,ALFALFA,13,28N,10W,C-SE SE,36.9003240,-98.2182600,"2,500",300,CHEROKEE,,,
3500300163,CHAMPLIN EXPLORATION INC,4030,2R,CHRISTENSEN,1,470258,11/27/2002,ALFALFA,21,28N,09W,C-NW NW,36.8966360,-98.1777200,"2,400","1,000",RED FORK,,,
3500320786,LINN OPERATING INC,22182,2R,NE CHEROKEE UNIT,85,329426,8/19/1988,ALFALFA,24,27N,11W,SE NE,36.8061130,-98.3258400,"1,050","1,000",RED FORK,,,
3500321074,SANDRIDGE EXPLORATION & PRODUCTION LLC,22281,2R,VELMA,2-19,281652,7/11/1985,ALFALFA,19,28N,10W,SW NE NE SW,36.8885890,-98.3185300,"3,152","1,000",RED FORK,,,
I have tried both of these and none of them work:
1.
import pandas as pd

df = pd.read_csv('InjectionWells.csv', sep=',')
print(df)
2.
import pandas as pd

test_data2 = pd.read_csv('InjectionWells.csv', sep=',', encoding='utf-8')
test_data2.head()
As your CSV file contains some non-ASCII characters, you need to pass a different encoding: the file is not UTF-8, so the utf-8 codec can't decode those bytes.
I tried this and it's working:
import pandas as pd
test_data2=pd.read_csv('InjectionWells.csv', sep=',', encoding='ISO-8859-1')
print(test_data2)
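To see why the utf-8 attempt fails while ISO-8859-1 succeeds, here is a minimal illustration (the sample bytes are invented, but the failure mode matches a non-UTF-8 CSV):

```python
# A byte sequence that is valid ISO-8859-1 but not valid UTF-8,
# which is exactly what makes read_csv(..., encoding="utf-8") raise
raw = b"CA\xd1ON"  # "CAÑON" encoded as ISO-8859-1

try:
    raw.decode("utf-8")
except UnicodeDecodeError:
    print("utf-8 cannot decode this")

print(raw.decode("iso-8859-1"))  # CAÑON
```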

Pandas exported CSV file not enclosing text/string in double quotes

I have a Python script which gets a JSON file from a MongoDB database, performs ETL steps such as filtering and flattening the dictionary, and finally exports the dataframe to CSV (which works fine).
The issue I am having is that when I open the CSV in Notepad, the text columns are not enclosed in quotation marks.
Correct me if I'm wrong, but I believe that when a column's datatype has been specified as string/text, you see no quotes when you open the file in Excel, but when opened in Notepad it should show those string columns wrapped in quotes.
from pymongo import MongoClient
import pandas as pd
from azure.storage.filedatalake import DataLakeServiceClient
from azure.core._match_conditions import MatchConditions
from azure.storage.filedatalake._models import ContentSettings
from pandas import json_normalize
from datetime import datetime, timedelta
import numpy as np
import json
mongo_client = MongoClient("XXXX")
db = mongo_client.rfqdb
table = db.request
document = table.find({'createdAt': {'$gt': datetime.utcnow() - timedelta(days=7)}})
docs = list(document)
docs = json.dumps(docs,default=str)
docs = docs.replace(r"\n",'').replace(r"\r\n",'').replace(r"\r",'')
docs = json.loads(docs)
docs = json_normalize(docs)
docs = docs[["id","reportName"]].astype("string")
print(docs.dtypes)
id string
reportName string
When I open the exported CSV file from Pandas in Notepad++ it doesn't show the string within quotes:
Could anyone shed some light on this? I've done the same process in Azure Data Factory, where I mapped these two columns as Strings, and when I open that CSV in Notepad it shows the strings wrapped in quotes (see below), so I'm a bit confused why pandas isn't doing this when exporting.
Thanks in advance
As answered in Unnecessary double quotes added to rows of CSV file when opening with notepad/notepad++, it comes down to how the CSV is formatted.
By default to_csv (like the csv module) uses csv.QUOTE_MINIMAL, which only wraps a field in quotes when it contains the delimiter, a quote character or a line break; plain text values are therefore written unquoted, which is what Notepad shows. The quoting argument controls this: csv.QUOTE_ALL forces quotes around every field, while csv.QUOTE_NONE suppresses them entirely, e.g.
csv.writer(csvfile, quoting=csv.QUOTE_NONE)
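To see the effect in the pandas export itself, a minimal sketch (column values invented) using the quoting argument of to_csv:

```python
import csv
import io
import pandas as pd

df = pd.DataFrame({"id": ["1001", "1002"], "reportName": ["Alpha", "Beta"]})

# QUOTE_ALL wraps every field in double quotes in the exported CSV
buf = io.StringIO()
df.to_csv(buf, index=False, quoting=csv.QUOTE_ALL)
print(buf.getvalue())
```

This prints every field, including the header, wrapped in double quotes, which is what the Azure Data Factory export was producing.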

How to open .ndjson file in Python?

I have a .ndjson file of 20 GB that I want to open with Python. The file is too big, so I found a way to split it into 50 pieces with an online tool: https://pinetools.com/split-files
Now I get files with the extension .ndjson.000 (and I do not know what that is).
I'm trying to open one as JSON or as a CSV file, to read it into pandas, but it does not work.
Do you have any idea how to solve this?
import json
import pandas as pd
First approach:
df = pd.read_json('dump.ndjson.000', lines=True)
Error: ValueError: Unmatched ''"' when when decoding 'string'
Second approach:
with open('dump.ndjson.000', 'r') as f:
    my_data = f.read()
    print(my_data)
Error: json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 104925061 (char 104925060)
I think the problem is that I have some emojis in my file, so I do not know how to encode them?
ndjson (newline-delimited JSON) is now supported out of the box with the argument lines=True:
import pandas as pd

df = pd.read_json('/path/to/records.ndjson', lines=True)
df.to_json('/path/to/export.ndjson', orient='records', lines=True)
I think pandas.read_json cannot handle ndjson correctly.
According to this issue you can do something like the following to read it:
import ujson as json
import pandas as pd
records = map(json.loads, open('/path/to/records.ndjson'))
df = pd.DataFrame.from_records(records)
P.S.: All credit for this code goes to KristianHolsheimer in the GitHub issue.
The ndjson (newline-delimited JSON) format is a json-lines format: each line is a complete JSON document. It is ideal for a dataset lacking rigid structure ('non-SQL') where the file size is large enough to warrant splitting across multiple files.
You can use pandas:
import pandas as pd
data = pd.read_json('dump.ndjson.000', lines=True)
In case your json strings do not contain newlines, you can alternatively use:
import json

with open("dump.ndjson.000") as f:
    data = [json.loads(line) for line in f]
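A side note on the 20 GB size: read_json with lines=True also accepts chunksize, which streams the file as an iterator of DataFrames instead of loading it all at once, so splitting with an online tool may not even be necessary. A small self-contained sketch (a tiny generated file stands in for the real dump):

```python
import json
import pandas as pd
from tempfile import NamedTemporaryFile

# Build a tiny ndjson file as a stand-in for the 20 GB dump;
# one JSON object per line, and emojis are fine in UTF-8
rows = [{"id": i, "text": "ok \U0001F600"} for i in range(5)]
with NamedTemporaryFile("w", suffix=".ndjson", delete=False, encoding="utf-8") as f:
    f.write("\n".join(json.dumps(r) for r in rows))
    path = f.name

# chunksize yields DataFrames of at most 2 rows each,
# so the whole file never has to fit in memory at once
total = sum(len(chunk) for chunk in pd.read_json(path, lines=True, chunksize=2))
print(total)  # 5
```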

Read specific csv file from zip using pandas

Here is the data I am interested in:
http://fenixservices.fao.org/faostat/static/bulkdownloads/Production_Crops_E_All_Data.zip
The archive consists of 3 files. I want to download the zip with pandas and create a DataFrame from one of them, Production_Crops_E_All_Data.csv.
import pandas as pd
url="http://fenixservices.fao.org/faostat/static/bulkdownloads/Production_Crops_E_All_Data.zip"
df=pd.read_csv(url)
Pandas can download files, it can work with zip archives, and of course it can work with CSV files. But how can I work with one specific file inside an archive that contains many files?
Now I get the error:
ValueError: Multiple files found in compressed zip file %s
This post doesn't answer my question because I have multiple files in one zip:
Read a zipped file as a pandas DataFrame
From this link, try this:
from zipfile import ZipFile
import io
from urllib.request import urlopen
import pandas as pd
r = urlopen("http://fenixservices.fao.org/faostat/static/bulkdownloads/Production_Crops_E_All_Data.zip").read()
file = ZipFile(io.BytesIO(r))
data_df = pd.read_csv(file.open("Production_Crops_E_All_Data.csv"), encoding='latin1')
data_df_noflags = pd.read_csv(file.open("Production_Crops_E_All_Data_NOFLAG.csv"), encoding='latin1')
data_df_flags = pd.read_csv(file.open("Production_Crops_E_Flags.csv"), encoding='latin1')
Hope this helps!
EDIT: updated for Python 3: changed the urllib import and switched StringIO to io.BytesIO. Also, your CSV files are not UTF-8 encoded; I tried latin1 and that worked.
You could use Python's datatable, which is a reimplementation of R's data.table in Python.
Read in data :
from datatable import fread

# The exact file to be extracted is known, so simply append it to the zip name:
url = "Production_Crops_E_All_Data.zip/Production_Crops_E_All_Data.csv"
df = fread(url)

# convert to pandas
df.to_pandas()
You can equally work within datatable; do note however, that it is not as feature-rich as Pandas; but it is a powerful and very fast tool.
Update: You can use the zipfile module as well:
from zipfile import ZipFile
from io import BytesIO

import pandas as pd

# assumes the archive has been downloaded locally as Production_Crops_E_All_Data.zip
with ZipFile("Production_Crops_E_All_Data.zip") as myzip:
    with myzip.open("Production_Crops_E_All_Data.csv") as myfile:
        data = myfile.read()

# read data into pandas
# had to toy a bit with the encoding,
# thankfully it is a known issue on SO
# https://stackoverflow.com/a/51843284/7175713
df = pd.read_csv(BytesIO(data), encoding="iso-8859-1", low_memory=False)
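The same member-by-member pattern can be demonstrated end to end with a small in-memory archive (file names and contents invented as a stand-in for the FAO zip):

```python
import io
import zipfile
import pandas as pd

# Build a small two-file archive in memory
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("data.csv", "a,b\n1,2\n")
    z.writestr("flags.csv", "f\nx\n")

# namelist() shows what is inside; open() hands one member to pandas
with zipfile.ZipFile(buf) as z:
    print(z.namelist())  # ['data.csv', 'flags.csv']
    df = pd.read_csv(z.open("data.csv"))
print(df.shape)  # (1, 2)
```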
