Python Pandas: print the csv data in order with columns - python

Hi, I am new to Python. I am using pandas to read a csv file and print its data. The code is as follows:
import numpy as np
import pandas as pd
import codecs
from pandas import Series, DataFrame
dframe = pd.read_csv("/home/vagrant/geonlp_japan_station.csv", sep=',',
                     encoding="Shift-JIS")
print (dframe.head(2))
but the data is printed like the following (this is just an example to show it):
However, I want the data to be ordered and aligned by column, like this:
I don't know how to make the printed output line up clearly. Thanks in advance!

You can check the unicode formatting options in pandas and set:
pd.set_option('display.unicode.east_asian_width', True)
I tested it with a UTF-8 version of the csv:
dframe = pd.read_csv("test/geonlp_japan_station/geonlp_japan_station_20130912_u.csv")
and the alignment of the output seems better:
pd.set_option('display.unicode.east_asian_width', True)
print(dframe)
pd.set_option('display.unicode.east_asian_width', False)
print(dframe)
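As a quick way to see what the option does, here is a minimal sketch with made-up Japanese data (not the original csv):
import pandas as pd

# Hypothetical frame containing east-asian wide characters
df = pd.DataFrame({"駅名": ["東京", "新宿"], "code": [1, 2]})

pd.set_option('display.unicode.east_asian_width', True)
print(df)   # wide characters count as two cells, so the columns line up

pd.set_option('display.unicode.east_asian_width', False)
print(df)   # default width; columns with wide characters drift out of line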

Related

how to insert data from list into excel in python

For example, I exported this data from a log file:
data= ["101","am1","123450","2015-01-01 11:19:00","test1 test1".....]
["102","am2","123451","2015-01-01 11:20:00","test2 test3".....]
["103","am3","123452","2015-01-01 11:21:00","test3 test3".....]
Output result (screenshot): https://i.stack.imgur.com/7uTOE.png
The module pandas has a DataFrame.to_excel() function that would do that.
import pandas as pd
data= [["101","am1","123450","2015-01-01 11:19:00","test1 test1"],
["102","am2","123451","2015-01-01 11:20:00","test2 test3"],
["103","am3","123452","2015-01-01 11:21:00","test3 test3"]]
df = pd.DataFrame(data)
df.to_excel('my_data.xlsx')
That should do it.
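If you also want header names in the spreadsheet instead of the default 0, 1, 2... labels, pass them when building the frame. A minimal sketch with made-up column names (rename them to whatever your log fields mean); note that writing .xlsx requires openpyxl:
import pandas as pd

data = [["101", "am1", "123450", "2015-01-01 11:19:00", "test1 test1"],
        ["102", "am2", "123451", "2015-01-01 11:20:00", "test2 test3"],
        ["103", "am3", "123452", "2015-01-01 11:21:00", "test3 test3"]]

# The header names below are hypothetical placeholders
df = pd.DataFrame(data, columns=["id", "code", "number", "timestamp", "text"])
df.to_excel('my_data.xlsx', index=False)  # index=False drops the row index column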

Using pandas, how do I turn one csv file column into a list and then filter a different csv with the created list?

Basically I have one csv file called 'Leads.csv' and it contains all the sales leads we already have. I want to turn its 'Leads' column into a list and then check a 'Report' csv to see whether any of those leads are already in there, and filter them out.
Here's what I have tried:
import pandas as pd
df_leads = pd.read_csv('Leads.csv')
leads_list = df_leads['Leads'].values.tolist()
df = pd.read_csv('Report.csv')
df = df.loc[(~df['Leads'].isin(leads_list))]
df.to_csv('Filtered Report.csv', index=False)
Any help is much appreciated!
You can try:
import pandas as pd
df_leads = pd.read_csv('Leads.csv')
df = pd.read_csv('Report.csv')
set_filtered = set(df['Leads']) - set(df_leads['Leads'])
df_filtered = df[df['Leads'].isin(set_filtered)]
Note: sets are significantly faster than lists for membership checks like this.
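To round it off, the filtered frame can be written back out the same way as in the question (the file name is taken from there):
# Write the filtered report, dropping the pandas row index
df_filtered.to_csv('Filtered Report.csv', index=False)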

How to delete the \xa0 symbol from a whole dataframe in pandas?

I have a data frame with information from the 2011 census (Chile). Some column names and some variable values contain the \xa0 symbol, which makes it troublesome to reference parts of the data frame. My code is the following:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sn
#READ AND SAVE EACH DATA SET IN A DATA FRAME
data_2011 = pd.read_csv("2011.csv")
#SHOW FIRST 5 ROWS
print(data_2011.head())
Doing this, I got the following output:
So far, everything is right, but when I want to see the column names using:
print("ATRIBUTOS DATOS CENSO 2011",list(data_2011))
I got the following output:
How can I fix this? Thanks in advance!
Try the following code:
import re
# Strip the \xa0 (non-breaking space) characters from the column names
data_2011.columns = [re.sub(r'\xa0', '', col) for col in data_2011.columns]
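Since the question says the symbol also shows up in variable values, here is a pandas-native sketch that cleans both the column labels and the string cells, assuming data_2011 as above:
# Remove \xa0 from the column labels, then from every string value
data_2011.columns = data_2011.columns.str.replace('\xa0', '', regex=False)
data_2011 = data_2011.replace('\xa0', '', regex=True)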

How to optimize a python script into a pyspark function

I am writing a pyspark program that takes a txt file and then adds a few columns to the left (beginning) of the columns in the file.
My text file looks like this:
ID,Name,Age
1233,James,15
After I run the program I want it to add two columns named creation_DT and created_By to the left of the table. I am trying to get it to look like this:
Creation_DT,Created_By,ID,Name,Age
"current timestamp", Sean,1233,James,15
The code below gets my required output, but I was wondering if there is an easier way to optimize my script using pyspark.
import pandas as pd

df = pd.read_csv("/home/path/Sample Text Files/sample5.txt", delimiter=",")
df.insert(loc=0, column='Creation_DT', value=pd.to_datetime('today'))
df.insert(loc=1, column='Create_BY', value="Sean")
df.to_csv("/home/path/new/new_file.txt", index=False)
Any ideas or suggestions?
Yes, it is relatively easy to convert this to pyspark code:
from pyspark.sql import functions as sf
import datetime

# read in using the dataframe reader
# if you store your csv locally, use file:///
# or use hdfs:/// if you store your csv in a cluster/HDFS
spdf = (spark.read.format("csv").option("header", "true")
        .load("file:///home/path/Sample Text Files/sample5.txt"))

spdf2 = (
    spdf
    .withColumn("Creation_DT", sf.lit(datetime.date.today().strftime("%Y-%m-%d")))
    .withColumn("Create_BY", sf.lit("Sean"))
)

spdf2.write.csv("file:///home/path/new/new_file.txt")
This code assumes you are filling creation_dt and create_by with the same value for every row.
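If you want a real timestamp rather than today's date rendered as a string, Spark has a built-in for that. A small sketch reusing the same spdf as above:
from pyspark.sql import functions as sf

# current_timestamp() is evaluated by Spark at query time
spdf2 = (spdf
         .withColumn("Creation_DT", sf.current_timestamp())
         .withColumn("Create_BY", sf.lit("Sean")))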
I don't see you using any pyspark in your code, so I'll just stick with pandas and do it this way:
cols = df.columns
df['Creation_DT'] = pd.to_datetime('today')
df['Create_BY'] = "Sean"
cols = cols.insert(0, 'Create_BY')
cols = cols.insert(0, 'Creation_DT')
df = df[cols]  # reorder; assigning to df.columns would only rename the columns
df.to_csv("/home/path/new/new_file.txt", index=False)

Overwrite specific columns after modification pandas python

I have a csv file where I made some modifications in two columns. My question is the following: how can I print my csv file with the updated columns? My code is the following:
import pandas as pd
import csv
data = pd.read_csv("testdataset.csv")
data = data.join(pd.get_dummies(data["ship_from"]))
data = data.drop("ship_from", axis=1)
data['market_name'] = data['market_name'].map(lambda x: str(x)[39:-1])
data = data.join(pd.get_dummies(data["market_name"]))
data = data.drop("market_name", axis=1)
Thank you in advance!
You can write to a file with pandas.DataFrame.to_csv:
data.to_csv('your_file.csv')
Alternatively, you can view it without writing a file:
print(data.to_csv())
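If the written file should mirror the original layout, it usually helps to drop the pandas row index when writing (the file name here is just a hypothetical):
# index=False stops pandas from writing the row index as an extra first column
data.to_csv('testdataset_updated.csv', index=False)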
