I have a CSV file (input.csv), as shown below:
VM IP Naa_Dev Datastore
vm1 xx.xx.xx.x1 naa.ab1234 ds1
vm2 xx.xx.xx.x2 naa.ac1234 ds1
vm3 xx.xx.xx.x3 naa.ad1234 ds2
I want to use this CSV file as the input for my Python script. In this file, the first line (VM IP Naa_Dev Datastore) holds the column headings, and the values are separated by spaces.
So my question is: how can I use this CSV file as input in Python, so that if I search for the IP of vm1 I get xx.xx.xx.x1, or if I look for the VM whose Naa_Dev is naa.ac1234 I get vm2?
I am using Python version 2.7.8
Any help is much appreciated.
Thanks
For tabular data like this, pandas is the best tool.
Something like:
import pandas
# the file is space-separated, so pass sep=' '
dataframe = pandas.read_csv('input.csv', sep=' ')
# finding IP by VM; .iloc[0] takes the first matching value
print(dataframe[dataframe.VM == 'vm1'].IP.iloc[0])
# OUTPUT: xx.xx.xx.x1
# or find the VM by Naa_Dev
print(dataframe[dataframe.Naa_Dev == 'naa.ac1234'].VM.iloc[0])
# OUTPUT: vm2
To import a CSV into Python you can use pandas; in your case the code would look like:
import pandas as pd
df = pd.read_csv('input.csv', sep=' ')
and to locate certain rows in the resulting DataFrame you have multiple options (which you can easily find in the pandas documentation or by searching for 'filter data python'), for example:
df['VM'].where(df['Naa_Dev'] == 'naa.ac1234').dropna()
Use the pandas module to read the file into a DataFrame. pandas.read_csv has many parameters for reading CSV files. The DataFrame.to_string() method is also very useful for printing the table in a readable layout.
Solution:
# import module with alias 'pd'
import pandas as pd
# Open the CSV file with the delimiter set to a space, and
# specify the column names. header=0 tells pandas to replace the
# file's existing header row with these names instead of
# reading that row in as data.
dframe = pd.read_csv("file.csv",
                     delimiter=" ",
                     header=0,
                     names=["VM", "IP", "Naa_Dev", "Datastore"])
# print will output the table
print(dframe)
# to_string lets you align and adjust the output,
# e.g. justify="left" aligns the column labels to the left.
print(dframe.to_string(justify="left"))
Pandas is probably the best answer, but you can also use the standard csv module:
import csv
your_list = []
with open('dummy.csv') as csvfile:
    # DictReader turns each line into a dict keyed by the header row
    reader = csv.DictReader(csvfile, delimiter=' ')
    for row in reader:
        your_list.append(row)
print(your_list)
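Once the rows are loaded, a lookup in either direction from the question can be done with plain dictionaries. A minimal sketch, assuming the space-separated layout shown in the question and that the VM and Naa_Dev values are unique (a later duplicate would overwrite an earlier one). The sample data is inlined with io.StringIO so the example runs on its own; it is written for Python 3, and on Python 2.7 you would pass open('input.csv', 'rb') instead.

```python
import csv
import io

# Sample data copied from the question; use open('input.csv') for the real file.
SAMPLE = """VM IP Naa_Dev Datastore
vm1 xx.xx.xx.x1 naa.ab1234 ds1
vm2 xx.xx.xx.x2 naa.ac1234 ds1
vm3 xx.xx.xx.x3 naa.ad1234 ds2
"""

def load_rows(f):
    # DictReader uses the first line as the field names.
    return list(csv.DictReader(f, delimiter=' '))

rows = load_rows(io.StringIO(SAMPLE))

# Index the rows by the column you want to search on.
by_vm = {row['VM']: row for row in rows}
by_naa = {row['Naa_Dev']: row for row in rows}

print(by_vm['vm1']['IP'])          # xx.xx.xx.x1
print(by_naa['naa.ac1234']['VM'])  # vm2
```

Building the dictionaries once makes each later lookup a single key access instead of a scan over the list.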
Related
This is a Python question. I have a CSV file that I would like to read in. The first row of the file contains strings, and I would like to use them as variable names. The other rows are integers, and I would like each column to become a vector named after its respective variable.
Thanks,
Tim
You first need to extract the first row. I suggest counting the characters in the first row and using this code to read them:
f = open("demofile.txt", "r")
print(f.read(5))  # put the number of characters you counted inside f.read(n)
Once you have read it, save it in a variable, and then use a regex to split it on ",":
import re
txt = "name1,name2,name3"  # e.g. the header line read from the file
x = re.split(",", txt)
print(x)
After that, use a dictionary to map each name to its values.
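A less fragile way to do what this answer describes (read the header row, split it on commas, then build a dictionary) is to let the csv module do the splitting, so no character counting is needed. A minimal sketch, assuming a comma-separated file whose first row holds the names and whose remaining rows hold integers; the sample data and the helper name columns_by_name are hypothetical.

```python
import csv
import io

# Hypothetical sample: a header row of names, then rows of integers.
SAMPLE = "a,b,c\n1,2,3\n4,5,6\n"

def columns_by_name(f):
    reader = csv.reader(f)
    names = next(reader)  # first row: the variable names
    columns = {name: [] for name in names}
    for row in reader:
        for name, value in zip(names, row):
            columns[name].append(int(value))  # remaining rows: integers
    return columns

data = columns_by_name(io.StringIO(SAMPLE))
print(data)  # {'a': [1, 4], 'b': [2, 5], 'c': [3, 6]}
```

Keeping the vectors in a dictionary keyed by name (data['a']) is generally easier to work with than creating real variables dynamically.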
You can simply use pandas to read .csv files. Just install pandas using 'pip install pandas'. Then use the following code:
import pandas as pd
dataframe = pd.read_csv('data.csv')
# Returns a list containing names of the columns
column_names = list(dataframe.columns.values)
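If each column is also needed as a list of values keyed by its name, DataFrame.to_dict(orient='list') produces that directly. A small sketch using a hypothetical inline sample in place of data.csv:

```python
import io
import pandas as pd

SAMPLE = "a,b,c\n1,2,3\n4,5,6\n"  # hypothetical sample data

dataframe = pd.read_csv(io.StringIO(SAMPLE))
column_names = list(dataframe.columns.values)  # ['a', 'b', 'c']
vectors = dataframe.to_dict(orient='list')     # {'a': [1, 4], 'b': [2, 5], 'c': [3, 6]}
print(column_names, vectors)
```

A single column can also be pulled out on its own with dataframe['a'].tolist().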
I am trying to merge a large number of .csv files. They all have the same table format, with 60 columns each. The merged data comes out fine, except that the first row has 640 columns instead of 60; the rest of the merged .csv has the desired 60-column format. I'm unsure where in the merge process it went wrong.
The first item in the problematic row is the first item in 20140308.export.CSV while the second (starting in column 61) is the first item in 20140313.export.CSV. The first .csv file is 20140301.export.CSV the last is 20140331.export.CSV (YYYYMMDD.export.csv), for a total of 31 .csv files. This means that the problematic row consists of the first item from different .csv files.
The data comes from http://data.gdeltproject.org/events/index.html, specifically the dates March 01 to March 31, 2014. Inspecting each downloaded .csv file shows that they are all formatted the same way, with tab delimiters and comma separated values.
The code I used is below. If there is anything else I can post, please let me know. All of this was run in JupyterLab on Google Cloud Platform. Thanks for the help.
import glob
import pandas as pd
file_extension = '.export.CSV'
all_filenames = [i for i in glob.glob(f"*{file_extension}")]
combined_csv_data = pd.concat([pd.read_csv(f, delimiter='\t', encoding='UTF-8', low_memory= False) for f in all_filenames])
combined_csv_data.to_csv('2014DataCombinedMarch.csv')
I used the following bash code to download the data:
!curl -LO http://data.gdeltproject.org/events/[20140301-20140331].export.CSV.zip
I used the following code to unzip the data:
!unzip -a "********".export.CSV.zip
I used the following code to transfer to my storage bucket:
!gsutil cp 2014DataCombinedMarch.csv gs://ddeltdatabucket/2014DataCombinedMarch.csv
Looks like these CSV files have no header on them, so Pandas is trying to use the first row in the file as a header. Then, when Pandas tries to concat() the dataframes together, it's trying to match the column names which it has inferred for each file.
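The effect can be reproduced on a small scale. A sketch with two hypothetical headerless, tab-separated "files" inlined as strings: because each file's first data row becomes its header, the two frames end up with different column names, and concat (which aligns on column names and keeps the union by default) produces more columns than either file has.

```python
import io
import pandas as pd

# Two tiny hypothetical headerless files with 3 columns each.
file_a = "1\t2\t3\n4\t5\t6\n"
file_b = "7\t8\t9\n10\t11\t12\n"

frames = [pd.read_csv(io.StringIO(s), delimiter='\t') for s in (file_a, file_b)]
print([list(f.columns) for f in frames])  # [['1', '2', '3'], ['7', '8', '9']]

combined = pd.concat(frames)
# 6 columns instead of 3, with NaN wherever a frame lacks a column name
print(combined.shape)  # (2, 6)
```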
I figured out how to suppress that behavior:
import glob
import pandas as pd
def read_file(f):
    # The files have no header row, so supply explicit column names;
    # this also stops pandas from treating the first data row as a header.
    names = [f"col_{i}" for i in range(58)]
    return pd.read_csv(f, delimiter='\t', encoding='UTF-8', low_memory=False, names=names)
file_extension = '.export.CSV'
all_filenames = [i for i in glob.glob(f"*{file_extension}")]
combined_csv_data = pd.concat([read_file(f) for f in all_filenames])
combined_csv_data.to_csv('2014DataCombinedMarch.csv')
You can supply your own column names to pandas through the names parameter. Here, I'm just supplying col_0, col_1, col_2, etc. for the names, because I don't know what they should be. If you know what those columns should be, change the names = line accordingly.
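If generated names are acceptable, header=None has the same effect: it tells read_csv that the file has no header row and numbers the columns 0, 1, 2, and so on. A sketch on the same kind of hypothetical inline data:

```python
import io
import pandas as pd

# Two tiny hypothetical headerless files with 3 columns each.
file_a = "1\t2\t3\n4\t5\t6\n"
file_b = "7\t8\t9\n10\t11\t12\n"

frames = [pd.read_csv(io.StringIO(s), delimiter='\t', header=None)
          for s in (file_a, file_b)]
# Every frame now has columns 0, 1, 2, so concat lines them up.
combined = pd.concat(frames, ignore_index=True)
print(combined.shape)  # (4, 3)
```

ignore_index=True renumbers the rows 0..n-1; without it, each file's row numbers restart at 0 and those repeated labels would be written out by to_csv.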
I tested this script, but only with 2 data files as input, not all 31.
PS: Have you considered using Google BigQuery to get the data? I've worked with GDELT before through that interface and it's way easier.
So I'm currently converting a txt file into a CSV. It's mostly cleaned up, but even after splitting there are still empty columns between some of my data.
Below is my messy CSV file
And here is my current code:
Sat_File = '/Users'
output = '/Users2'
import csv
import matplotlib.pyplot as plt
import pandas as pd
with open(Sat_File, 'r') as sat:
    with open(output, 'w') as outfile:
        writer = csv.writer(outfile)
        # process the input file line by line
        for line in sat:
            if "2004" in line:
                line = line.split(' ')
                writer.writerow(line)
Basically, I'm just trying to eliminate those gaps between columns in the CSV picture I've provided. Thank you!
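A likely source of the gaps (an assumption, since the screenshot is not included here): str.split(' ') splits on every single space, so a run of several spaces produces empty strings, and each one becomes an empty CSV field. Calling split() with no argument splits on runs of whitespace instead. A sketch on a hypothetical line:

```python
# Hypothetical line with several spaces between fields.
line = "2004  12.5   7.1\n"

print(line.split(' '))  # ['2004', '', '12.5', '', '', '7.1\n']
print(line.split())     # ['2004', '12.5', '7.1']
```

split() also drops the trailing newline, which split(' ') leaves attached to the last field.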
You can use the Python pandas library to remove the empty columns:
import pandas as pd
df = pd.read_csv('path_to_csv_file').dropna(axis=1, how='all')
df.to_csv('path_to_clean_csv_file', index=False)
Basically we:
Import the pandas library.
Read the csv file into a variable called df (stands for data frame).
Then we use the dropna function, which discards empty columns or rows. axis=1 means drop columns (axis=0 means rows), and how='all' means drop only the columns in which all of the values are empty.
We save the clean data frame df to a new, clean csv file.
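A small demonstration of those steps on a hypothetical CSV whose second and fourth columns are completely empty (inlined with io.StringIO instead of a file). pandas names the blank header cells "Unnamed: 1" and so on, fills the empty cells with NaN, and dropna(axis=1, how='all') removes exactly those columns:

```python
import io
import pandas as pd

# Hypothetical CSV where the 2nd and 4th columns are entirely empty.
messy = "year,,value,,flag\n2004,,1.5,,a\n2005,,2.5,,b\n"

df = pd.read_csv(io.StringIO(messy))
print(list(df.columns))     # ['year', 'Unnamed: 1', 'value', 'Unnamed: 3', 'flag']

clean = df.dropna(axis=1, how='all')
print(list(clean.columns))  # ['year', 'value', 'flag']
```

Passing index=False to to_csv keeps the DataFrame's row numbers from being written as an extra first column.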
Is there any way to write a value to a specific place in a given .csv file using pandas or the csv module?
I have tried using csv.reader to read the file and find the line that fits my requirements, but I couldn't figure out how to replace the value in the file with mine.
What I am trying to achieve is this: I have a spreadsheet of names and values. I am using JSON to get updated values from the server, and after that I want to update my spreadsheet as well.
The latest solution I came up with was to create a separate sheet from which I would get the updated data, but it is not working, because the dict is not written to the file in any particular order.
import csv
def updateSheet(fileName, aValues):
    # open in write mode; newline="" avoids blank lines on Windows
    with open(fileName+".csv", "w", newline="") as workingSheet:
        writer = csv.DictWriter(workingSheet, aValues.keys())
        writer.writeheader()
        writer.writerow(aValues)
I will appreciate any guidance and tips.
You can try operating on the CSV file this way:
import pandas as pd
a = ['one','two','three']
b = [1,2,3]
english_column = pd.Series(a, name='english')
number_column = pd.Series(b, name='number')
predictions = pd.concat([english_column, number_column], axis=1)
save = pd.DataFrame({'english':a,'number':b})
save.to_csv('b.csv',index=False,sep=',')
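The snippet above writes a new file from scratch. For the original question (changing one value in an existing sheet), one option is to read the file, overwrite the matching cell with DataFrame.loc, and write it back. A minimal sketch, assuming a two-column sheet with hypothetical name and value columns and an update dict like the one built from the server's JSON:

```python
import io
import pandas as pd

# Hypothetical sheet contents; use pd.read_csv('sheet.csv') for a real file.
sheet = "name,value\nalpha,1\nbeta,2\ngamma,3\n"
updates = {'beta': 20, 'gamma': 30}  # e.g. values parsed from the JSON response

df = pd.read_csv(io.StringIO(sheet))
for name, new_value in updates.items():
    # Select the row(s) whose 'name' matches and overwrite the 'value' cell.
    df.loc[df['name'] == name, 'value'] = new_value

print(df.to_csv(index=False))
```

With a real file, df.to_csv('sheet.csv', index=False) writes the updated sheet back, and the column order stays the same as in the original file.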
I am trying to import a CSV, change the first value in the file, and then write the file out to another CSV. I am doing this because Excel opens a CSV file as a SYLK file if its first value is 'ID', so I intend to change 'ID' to 'Value_ID'. I can't figure out how to set s[0][0] = 'Value_ID'. Any help would be greatly appreciated.
import csv
with open('input1.csv', 'r') as file1:
    reader = csv.reader(file1)
    s = ('output1.csv')
    filewriter = csv.writer(open(s,'w',newline= '\n'))
    for row in reader:
        filewriter.writerow(row)
    s=[0][0] = 'Match_ID'
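With only the csv module, one approach is to read all rows, change the first field of the first row, and then write everything out. A sketch, assuming the file is small enough to hold in memory; the helper name rename_first_value is hypothetical, and io.StringIO stands in for input1.csv and output1.csv so the example runs on its own.

```python
import csv
import io

def rename_first_value(src, dst, old='ID', new='Value_ID'):
    rows = list(csv.reader(src))
    # Only touch the very first value, and only if it is exactly `old`.
    if rows and rows[0] and rows[0][0] == old:
        rows[0][0] = new
    csv.writer(dst).writerows(rows)

src = io.StringIO("ID,name\n1,alpha\n2,beta\n")
dst = io.StringIO()
rename_first_value(src, dst)
print(dst.getvalue())
```

With real files, open the input with open('input1.csv', newline='') and the output with open('output1.csv', 'w', newline=''), which is what the csv module documentation recommends.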
You can use pandas to do this, and many more operations, quite efficiently and easily.
To install pandas:
pip install pandas
This also installs all of its dependencies.
Once this is done, open up a Python shell:
import pandas as pd
df = pd.read_csv('input1.csv')
# set_value() was removed in pandas 1.0; .at sets a single cell in place
df.at[index, col] = value
df.to_csv('Output1.csv', index=False)
In the above snippet, replace index with the row label (the row number, for a freshly read file) and col with the column name.
If you are unsure what the row and column names are, type
df.head(5)
This shows the top 5 rows of the pandas DataFrame, along with its column names.
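One caveat for this particular question: read_csv treats the first row as the header, so the 'ID' value that Excel reacts to becomes a column name rather than a cell, and df.at will not reach it. A sketch using DataFrame.rename instead, on hypothetical inline data standing in for input1.csv:

```python
import io
import pandas as pd

csv_text = "ID,name\n1,alpha\n2,beta\n"  # hypothetical stand-in for input1.csv

df = pd.read_csv(io.StringIO(csv_text))
df = df.rename(columns={'ID': 'Value_ID'})
out = df.to_csv(index=False)  # index=False keeps 'Value_ID' as the first value
print(out.splitlines()[0])    # Value_ID,name
```

For a real file, df.to_csv('Output1.csv', index=False) writes the result; without index=False, pandas would write an extra unnamed index column first.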
Happy coding. Cheers!