I'm new to programming. I have a number of CSV files. What I want to do is read the text column in each file, translate it to Spanish with the Google Translate API, and then save the data frame as a new CSV file.
My code goes like this:
!pip install googletrans==4.0.0rc1
import pandas as pd
from googletrans import Translator

translator = Translator()
df = pd.read_csv("file.csv")
sentences = df['text'].tolist()

# translate each sentence to Spanish and collect the translated text
text_es = []
for sentence in sentences:
    result = translator.translate(sentence, dest='es')
    text_es.append(result.text)

df['text_es'] = text_es
df.to_csv('es_file.csv', index=False)
Instead of uploading every single file and running the code on it, I want to write code that applies this to all the files. How can I do this?
Ok so what you're going to want to do is create a list of paths to all your csv files.
csv_paths = ['path1.csv', 'path2.csv', 'path3.csv', 'path4.csv']
Then you need to loop over this list, which is pretty simple: just use a for loop like this:
for path in csv_paths:
Now you can do almost exactly what you were doing before, but inside the loop:
    df = pd.read_csv(path)
    sentences = df['text'].tolist()

    text_es = []
    for sentence in sentences:
        result = translator.translate(sentence, dest='es')
        text_es.append(result.text)

    df['text_es'] = text_es
    # give each output its own name, otherwise every iteration
    # overwrites the same es_file.csv
    df.to_csv('es_' + path, index=False)
I hope that helps :)
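Putting the pieces together, here's a minimal end-to-end sketch. It assumes the CSVs sit in a data/ folder and each has a 'text' column like in your snippet; adjust the pattern and paths to match your setup:
import glob
import os
import pandas as pd
from googletrans import Translator

translator = Translator()

# grab every CSV in the (assumed) data/ folder
for path in glob.glob(os.path.join('data', '*.csv')):
    df = pd.read_csv(path)
    # translate the 'text' column row by row
    df['text_es'] = [translator.translate(s, dest='es').text
                     for s in df['text'].tolist()]
    # prefix the output name so the originals are kept
    out_path = os.path.join('data', 'es_' + os.path.basename(path))
    df.to_csv(out_path, index=False)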
You can list all files in a folder using:
os.listdir('My_Downloads/Music')
And write a loop on this list.
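For example, a short sketch; note that os.listdir returns bare file names, so you have to join them back onto the folder path yourself:
import os
import pandas as pd

folder = 'My_Downloads/Music'
for name in os.listdir(folder):
    if name.endswith('.csv'):  # skip anything that isn't a CSV
        df = pd.read_csv(os.path.join(folder, name))
        # ... process df exactly as in the single-file code ...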
See the docs for os.listdir for more info.
I am currently working on importing and formatting a large number of Excel files (all with the same format/schema, but different values) with Python.
I have already read in and formatted one file, and everything has worked fine so far.
I would now like to do the same for all the other files and combine everything in one dataframe, i.e. read the first Excel file into a dataframe, append the second at the bottom of the dataframe, append the third at the bottom, and so on until I have all the Excel files in one dataframe.
So far my script looks something like this:
import os
import pandas as pd

path = "path of the directory"
wbname = "name of the excel file"

files = os.listdir(path)
files

# I only need the second sheet
df = pd.read_excel(path + wbname, sheet_name="sheet2", skiprows=2, header=None,
                   skipfooter=132)

# here is where all the formatting is happening ...
df
So, "files" is a list with all file relevant names. Now I have to try to put one file after the other into a loop (?) so that they all eventually end up in df.
Has anyone ever done something like this or can help me here.
Something like this might work:
import os
import pandas as pd

folder = 'path_to_all_xlsx'
list_dfs = []
for file in os.listdir(folder):
    # os.listdir gives bare names, so join them back onto the folder
    df = pd.read_excel(os.path.join(folder, file), <the rest of your config to parse>)
    list_dfs.append(df)

all_dfs = pd.concat(list_dfs)
You read all the dataframes and add them to a list, and then the concat method stitches them all together into one big dataframe.
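With the read_excel options from your single-file script plugged in, the loop might look like this; this is just a sketch, and it assumes every workbook shares the same layout:
import os
import pandas as pd

path = "path of the directory"
list_dfs = []
for file in os.listdir(path):
    if file.endswith(('.xls', '.xlsx')):  # skip non-Excel files
        df = pd.read_excel(os.path.join(path, file), sheet_name="sheet2",
                           skiprows=2, header=None, skipfooter=132)
        # ... your per-file formatting here ...
        list_dfs.append(df)

# ignore_index=True gives the combined frame a fresh 0..n index
all_dfs = pd.concat(list_dfs, ignore_index=True)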
I am trying to run a script on over 900 files using the Spyder platform; the script aims to delete the first 3 rows of data and certain columns. I tried looking into other similar questions but was unable to achieve the intended results.
My code for one text file is as follows:
import pandas as pd
# read_csv already returns a DataFrame
df = pd.read_csv('vectors_0001.txt')
df.drop(df.columns[:2], inplace=True, axis=1)  # drop the first two columns
df.drop([0, 1, 2], axis=0, inplace=True)       # drop the first three rows
df = df.dropna(axis=0, subset=['Column3', 'Column4'])
Then I want to modify the code above so it can be applied to the consecutive text files; the file names are vectors_0001, vectors_0002, ..., vectors_0900. I tried to do something similar, but I keep getting errors. Take the attempt below as an example:
(Note: 'u [m/s]' and 'v [m/s]' are the columns I want to keep for further data analysis; the other columns I want to get rid of.)
import glob
import os.path
import sys
import pandas as pd

dir_of_interest = sys.argv[1] if len(sys.argv) > 1 else '.'
files = glob.glob(os.path.join(dir_of_interest, "*.txt"))

for file in files:
    with open('file.txt', 'w') as f:
        f.writelines(3:)
    df = pd.read_csv("*.txt")
    df_new = df[['u [m/s]', 'v [m/s]']
    df_new.to_csv('*.txt', header=True, index=None)
    with open('file.txt','r+') as f:
        print(f.read())
However, when I tried to run it, I got the error:
f.writelines(3:)
^
SyntaxError: invalid syntax
I really want to get this figured out so I can move on to my data analysis. Please and thank you in advance.
I'm not totally sure what you are trying to achieve here, but you're using the writelines function incorrectly. It accepts a list as an argument:
https://www.w3schools.com/python/ref_file_writelines.asp
You're giving it "3:", which is not valid syntax on its own. Maybe you want to give it a slice of an existing list?
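For example, a small sketch of what that could look like, assuming you want to drop the first 3 lines of a file and write the rest back out (the output file name here is made up):
# read all lines, then write back everything except the first three
with open('vectors_0001.txt') as f:
    lines = f.readlines()

with open('vectors_0001_trimmed.txt', 'w') as f:
    f.writelines(lines[3:])  # writelines takes a list; a slice works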
I have a csv (input.csv) file as shown below:
VM IP Naa_Dev Datastore
vm1 xx.xx.xx.x1 naa.ab1234 ds1
vm2 xx.xx.xx.x2 naa.ac1234 ds1
vm3 xx.xx.xx.x3 naa.ad1234 ds2
I want to use this csv file as an input file for my python script. In this file the first line, i.e. (VM IP Naa_Dev Datastore), is the column heading, and the values are separated by spaces.
So my question is: how can I use this csv file for input values in Python, so that if I search for the IP of vm1 the script picks up xx.xx.xx.x1, and likewise if I look for the VM whose Naa_Dev is naa.ac1234 it picks up vm2?
I am using Python version 2.7.8
Any help is much appreciated.
Thanks
For working with tabular data like this, the best way is to use pandas.
Something like:
import pandas
dataframe = pandas.read_csv('csv_file.csv', delim_whitespace=True)  # columns are space-separated
# finding IP by vm
print(dataframe[dataframe.VM == 'vm1'].IP)
# OUTPUT: xx.xx.xx.x1
# or find by Naa_Dev
print(dataframe[dataframe.Naa_Dev == 'naa.ac1234'].VM)
# OUTPUT: vm2
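Note that these filters return a pandas Series (printed along with its index); to get just the bare value, you can take the first element, e.g.:
print(dataframe[dataframe.VM == 'vm1'].IP.iloc[0])  # -> xx.xx.xx.x1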
For importing a csv into Python you can use pandas; in your case the code would look like:
import pandas as pd
df = pd.read_csv('input.csv', sep=' ')
and for locating certain rows in the created dataframe you have multiple options (which you can easily find in the pandas docs or just by googling 'filter data python'), for example:
df['VM'].where(df['Naa_Dev'] == 'naa.ac1234')
Use the pandas module to read the file into a DataFrame. There are a lot of parameters for reading csv files with pandas.read_csv. The dataframe.to_string() function is also extremely useful.
Solution:
# import module with alias 'pd'
import pandas as pd
# Open the CSV file; the delimiter is set to a space, and then
# we specify the column names.
dframe = pd.read_csv("file.csv",
                     delimiter=" ",
                     header=0,  # the first line of the file already holds the headings
                     names=["VM", "IP", "Naa_Dev", "Datastore"])
# print will output the table
print(dframe)
# to_string will allow you to align and adjust content
# e.g justify = left to align columns to the left.
print(dframe.to_string(justify="left"))
Pandas is probably the best answer but you can also:
import csv

your_list = []
with open('dummy.csv') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=' ')
    for row in reader:
        your_list.append(row)

print(your_list)
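To answer the lookup part of the question, you could then search that list of dicts, for example:
# find the IP of vm1 (None if no row matches)
ip = next((row['IP'] for row in your_list if row['VM'] == 'vm1'), None)
print(ip)  # -> xx.xx.xx.x1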
I am reading a few XLS files via
import os
import pandas as pd

path = r'pathtofolder'
files = os.listdir(path=path)
dataframes = {}
for file in files:
    filepath = path + '\\' + file
    if filepath[-3:] == 'xls':
        print(file)
        dataframes[file] = pd.read_excel(filepath)
For some reason, however, I can't access the dataframes inside the dictionary, as .head() doesn't seem to work:
for file, dataframe in dataframes.items():
    dataframe.head()
This code doesn't seem to do anything in Jupyter. However, when I call type(dataframe), I get pandas.core.frame.DataFrame, so head() should be working, right?
I haven't worked with Python data frames much, but your for loop won't give you any output this way: Jupyter only auto-displays the result of the last expression in a cell, so each head() inside the loop is computed and thrown away. You can just use print() to see your output.
for file, dataframe in dataframes.items():
    print(dataframe.head())
Or create a reusable list of dataframe.head() results as shown below; you can enter the list name in the console at any time to view it later. Pardon the boilerplate for creating a dictionary of dataframes.
import pandas as pd
from sklearn import datasets

# create a dictionary of dataframes, mirroring the question's setup
iris = pd.DataFrame(datasets.load_iris().data)
digits = pd.DataFrame(datasets.load_digits().data)
diabetes = pd.DataFrame(datasets.load_diabetes().data)
dataframes = {'a': iris, 'b': digits, 'c': diabetes}

# collect each head() so the list can be inspected later
list_heads = []
for i in dataframes:
    list_heads.append(dataframes[i].head())

list_heads
I am writing code that reads a CSV file, searches for a specific item code, and then outputs the name of the item.
How would I do this?
I don't have any code yet
Thanks
You can use pandas.
Install it, then try:
import pandas as pd
df = pd.read_csv('yourFile.csv')
print(df)
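From there, the search itself is just a filter. A minimal sketch, assuming hypothetical column names item_code and item_name (swap in whatever your file actually uses):
import pandas as pd

df = pd.read_csv('yourFile.csv')

# rows whose (hypothetical) item_code column matches the code we want
match = df[df['item_code'] == 'ABC123']
print(match['item_name'])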