From NetCDF file to CSV with Python

Sorry for my bad English.
I am doing an internship, and I had never used Python before it.
I need to extract data from a NetCDF file.
I have already written a loop that creates a DataFrame, but when I try to export this DataFrame I only get 201 values out of 41000.
import csv
import numpy as np
import pandas as pd
import netCDF4
from netCDF4 import Dataset, num2date
nc = Dataset('Q:/QGIS/2011001.nc', 'r')
chla = nc.variables['chlorophyll_a'][0]
lons = nc.variables['lon'][:]
lat = nc.variables['lat'][:]
time = nc.variables['time'][:]
nlons=len(lons)
nlat=len(lat)
The first loop gives me the 41000 values in the ArcGIS Python console:
for i in range(0, nlat):
    dla = {'lat': lat[i], 'long': lons, 'chla': chla[i]}
    z = pd.DataFrame(dla)
    print(z)
z.to_csv('Q:/QGIS/fichier.csv', sep=',', index=True)
But when I call to_csv I only get 201 values in the CSV file.
q = {}
for i in range(0, nlat):
    dlo = {'lat': lat[i], 'long': lons, 'chla': chla[i]}
    q[i] = pd.DataFrame(dlo)
    print(q)
for y in range(0, nlat):
    q[y].to_csv('Q:/QGIS/fichier.csv', sep=',', index=True)
I hope you can help me solve this. Also, if you have any script to extract the values into a shapefile, I would be very grateful if you could share it!
Best regards
Thank you in advance
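A likely culprit: to_csv is called with the same filename on every pass, so each write overwrites the previous one, and only the last latitude row (about 201 longitude values) survives. Below is a minimal sketch of building all rows first and writing the file once; it uses small synthetic arrays in place of the real NetCDF variables, which the actual script would read with netCDF4.Dataset as shown in the question.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the NetCDF variables (the real script reads
# lat, lons and chla from the .nc file with netCDF4.Dataset).
lat = np.linspace(-10.0, 10.0, 5)    # 5 latitudes
lons = np.linspace(100.0, 120.0, 4)  # 4 longitudes
chla = np.random.rand(5, 4)          # chlorophyll grid, shape (nlat, nlons)

# Build one DataFrame per latitude, then concatenate and write ONCE.
frames = [pd.DataFrame({'lat': lat[i], 'long': lons, 'chla': chla[i]})
          for i in range(len(lat))]
z = pd.concat(frames, ignore_index=True)
z.to_csv('fichier.csv', sep=',', index=True)
print(len(z))  # 5 * 4 = 20 rows here; nlat * nlons rows with the real grid
```

Writing once after the loop (or appending with mode='a') keeps every latitude's rows instead of only the last one.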

Related

Filtering pandas dataframe in python

I have a csv file with rows and columns separated by commas. This file contains headers (str) and values. Now, I want to filter all the data with a condition. For example, there is a header called "pmra" and I want to keep all the information for pmra values between -2.6 and -2.0. How can I do that? I tried with np.where but it did not work. Thanks for your help.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
filename="NGC188_C.csv"
data = pd.read_csv(filename)
ra = data["ra"]
dec = data["dec"]
parallax = data["parallax"]
pm_ra = data["pmra"]
pm_dec = data["pmdec"]
g_band = data["phot_g_mean_mag"]
bp_rp = data["bp_rp"]
You can use something like:
data[(data["pmra"] >= -2.6) & (data["pmra"] <= -2)]
There is also another approach: you can use the between method:
data["pmra"].between(-2.6, -2)
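A small runnable sketch of both answers on toy data (the column name pmra is taken from the question; the values are invented):

```python
import pandas as pd

data = pd.DataFrame({'pmra': [-3.0, -2.5, -2.2, -1.9, 0.4]})

# Boolean-mask version: keep rows where pmra is in [-2.6, -2.0].
mask_filtered = data[(data['pmra'] >= -2.6) & (data['pmra'] <= -2.0)]

# between() version: inclusive on both ends by default.
between_filtered = data[data['pmra'].between(-2.6, -2.0)]

print(mask_filtered['pmra'].tolist())     # [-2.5, -2.2]
print(between_filtered['pmra'].tolist())  # [-2.5, -2.2]
```

Both select the same rows; between is just more readable for range conditions.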

How to delete the \xa0 symbol from a whole dataframe in pandas?

I have a data frame with information from the 2011 census (Chile). Some column names and some variable values contain the \xa0 symbol, which makes it troublesome to access parts of the data frame. My code is the following:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sn
#READ AND SAVE EACH DATA SET IN A DATA FRAME
data_2011 = pd.read_csv("2011.csv")
#SHOW FIRST 5 ROWS
print(data_2011.head())
Doing this, I got the following output:
So far everything is right, but when I want to see the column names using:
print("ATRIBUTOS DATOS CENSO 2011", list(data_2011))
I got the following output:
How can I fix this? Thanks in advance
Try the following code, which strips the non-breaking spaces from the column names:
import re
data_2011.columns = [re.sub('\xa0', '', col) for col in data_2011.columns]
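Alternatively, a sketch using pandas string methods to clean both the column labels and the string cell values in one go; the toy frame below is invented for illustration, standing in for the census data:

```python
import pandas as pd

# Toy frame whose column name and values contain \xa0,
# mimicking the census file described in the question.
df = pd.DataFrame({'REGI\xa0ON': ['Arica\xa0', 'Tarapac\xe1']})

# Strip the non-breaking space from column labels and from string cells.
df.columns = df.columns.str.replace('\xa0', '', regex=False)
df = df.replace('\xa0', '', regex=True)

print(list(df.columns))  # ['REGION']
print(df.iloc[0, 0])     # 'Arica'
```

This avoids a Python-level loop and also cleans the data values, not just the headers.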

Error with large data set using openpyxl python

I've got an Excel file (xlsx, shape 1180x6) that I'm trying to manipulate, pretty much creating an empty row every other row and inserting data into it by re-arranging the existing data. The code runs fine when I try it with just 10 rows of data but fails when I run the entire 1180 rows. It also runs a long time before spitting out the same unprocessed data. Is openpyxl not built for this? Just wondering if there's a more efficient way of doing it. Here's my code. Below the code is the data after using a few rows, which is what I need, but it fails for the entire data set.
%%time
import pandas as pd
import numpy as np
from openpyxl import load_workbook
import os

xls = pd.ExcelFile('input.xlsx')
df = xls.parse(0)
wb = load_workbook('input.xlsx')
#print(wb.sheetnames)
sh1 = wb['Sheet1']
df.head()
#print(sh1.max_column)
for y in range(2, (sh1.max_row + 1) * 2, 2):
    sh1.insert_rows(y)
wb.save('output.xlsx')
m = 3
for k in range(2, sh1.max_row + 1, 2):
    sh1.cell(row=k, column=1).value = sh1.cell(row=m, column=1).value  # copy from one cell and paste
    sh1.cell(row=k, column=2).value = sh1.cell(row=m, column=3).value
    sh1.cell(row=k, column=3).value = sh1.cell(row=m, column=2).value
    sh1.cell(row=k, column=4).value = 'A'
    sh1.cell(row=m, column=4).value = 'H'
    sh1.cell(row=k, column=5).value = sh1.cell(row=m, column=6).value
    sh1.cell(row=k, column=6).value = sh1.cell(row=m, column=5).value
    m += 2
wb.save('output.xlsx')
xls = pd.ExcelFile('output.xlsx')
df1 = xls.parse(0)
wb1 = load_workbook('output.xlsx')
df1
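One reason this crawls on 1180 rows: each insert_rows call shifts every later row, so inserting one row at a time is roughly quadratic in the sheet size. A possible alternative is to build the mirrored rows in pandas and write the sheet once. The column names below are assumptions for illustration only, not the actual headers of input.xlsx:

```python
import pandas as pd

# Toy stand-in for the 1180x6 sheet; column names are invented.
df = pd.DataFrame({'date': ['d1', 'd2'],
                   'home': ['A1', 'A2'],
                   'away': ['B1', 'B2'],
                   'side': ['', ''],
                   's1': [1, 3],
                   's2': [2, 4]})

# Mirror each row: swap home/away and the two score columns, tag H/A.
orig = df.copy()
orig['side'] = 'H'
mirror = df.copy()
mirror[['home', 'away']] = df[['away', 'home']].values
mirror[['s1', 's2']] = df[['s2', 's1']].values
mirror['side'] = 'A'

# Interleave original and mirrored rows; mergesort is stable, so each
# original row stays just before its mirror.
out = pd.concat([orig, mirror]).sort_index(kind='mergesort').reset_index(drop=True)
print(out[['home', 'side']].values.tolist())
# [['A1', 'H'], ['B1', 'A'], ['A2', 'H'], ['B2', 'A']]
```

A final out.to_excel('output.xlsx', index=False) then writes the whole sheet in one pass instead of cell by cell.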

Python Pandas: print the csv data in order with columns

Hi, I am new to Python. I am using pandas to read CSV file data and print it. The code is as follows:
import numpy as np
import pandas as pd
import codecs
from pandas import Series, DataFrame
dframe = pd.read_csv("/home/vagrant/geonlp_japan_station.csv",sep=',',
encoding="Shift-JIS")
print (dframe.head(2))
but the data is printed like the following (I just give an example to show it):
However, I want the data to be ordered with columns like the following:
I don't know how to make the printed data line up clearly; thanks in advance!
You can check the unicode formatting options and set:
pd.set_option('display.unicode.east_asian_width', True)
I test it with UTF-8 version csv:
dframe = pd.read_csv("test/geonlp_japan_station/geonlp_japan_station_20130912_u.csv")
and the alignment of the output seems better.
pd.set_option('display.unicode.east_asian_width', True)
print(dframe)
pd.set_option('display.unicode.east_asian_width', False)
print(dframe)
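A self-contained sketch of the effect; the Japanese column and station names here are invented for the demo:

```python
import pandas as pd

# With east_asian_width enabled, pandas counts full-width characters
# as two display columns wide, so mixed Japanese/ASCII tables line up.
pd.set_option('display.unicode.east_asian_width', True)

dframe = pd.DataFrame({'駅名': ['東京', '新大阪'], 'code': [100, 101]})
print(dframe)
```

With the option off, full-width characters are counted as one column and the headers drift out of alignment.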

Python numpy, skip columns & read csv file

I've got a CSV file with 20 columns & about 60000 rows.
I'd like to read fields 2 to 20 only. I've tried the code below, but the browser (using IPython) freezes and it just goes on for ages.
import numpy as np
from numpy import genfromtxt
myFile = 'sampleData.csv'
myData = genfromtxt(myFile, delimiter=',',
                    usecols=(2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19))
print(myData)
How could I tweak this to work better & actually produce output please?
import pandas as pd

myFile = 'sampleData.csv'
df = pd.read_csv(myFile, skiprows=1)  # skipping the header
print(df)
This works like a charm
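For reference, the original genfromtxt attempt mainly needed the usecols= keyword (and a closing parenthesis). A sketch on a small in-memory CSV standing in for sampleData.csv:

```python
import numpy as np
from io import StringIO

# Three rows, twenty comma-separated sample numbers per row.
csv_text = '\n'.join(','.join(str(r * 20 + c) for c in range(20))
                     for r in range(3))

# usecols takes a sequence of zero-based column indices;
# indices 1..19 keep fields 2 through 20 and drop the first.
myData = np.genfromtxt(StringIO(csv_text), delimiter=',',
                       usecols=tuple(range(1, 20)))
print(myData.shape)  # (3, 19)
```

With the real file, pass the filename instead of the StringIO buffer; skip_header=1 would additionally drop a header row.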
