Closed. This question needs to be more focused. It is not currently accepting answers. Closed 1 year ago.
I want to convert the data behind the following link to Excel using Python, so that the country, the capital, and their codes are stored in an Excel file.
Can you please guide me?
https://restcountries.eu/rest/v2/all
The pandas library comes to the rescue here, though extracting your nested JSON is more a matter of general Python skills. You can do the following to extract just the desired columns:
import pandas as pd
url = 'https://restcountries.eu/rest/v2/all'
# Load the JSON into a dataframe
df = pd.read_json(url)
# Create a DataFrame with the country, capital, and code fields. You can use df.head() to see how the data looks in table form and what the columns are named.
df_new = df[['name', 'capital', 'alpha2Code', 'alpha3Code']].copy()
#Use pandas ExcelWriter to write the desired DataFrame to xlsx file.
with pd.ExcelWriter('country_names.xlsx') as writer:
    df_new.to_excel(writer, sheet_name="Country List")
(Screenshot: sample data from the generated Excel file.)
Full info on ExcelWriter and to_excel can be found at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_excel.html
You will need to play around to rename the columns and clean up the data (especially the nested objects), and those steps should be just a search away.
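As a hedged sketch of that clean-up, here is one way to rename the exported columns to friendlier headers (the single row is hypothetical, standing in for the API response):

```python
import pandas as pd

# One hypothetical row standing in for the API response; the field
# names match the columns selected above.
df_new = pd.DataFrame({'name': ['Iceland'], 'capital': ['Reykjavik'],
                       'alpha2Code': ['IS'], 'alpha3Code': ['ISL']})
# Rename to friendlier headers before writing to Excel.
df_new = df_new.rename(columns={'name': 'Country', 'capital': 'Capital',
                                'alpha2Code': 'Code2', 'alpha3Code': 'Code3'})
```

rename returns a new DataFrame, so the result is reassigned; alternatively you could set df_new.columns directly.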
Your best bet would be to use pandas to read the JSON from the URL that you have mentioned and save it to an excel file. Here's the code for it:
import pandas as pd
# Loading the JSON from the URL to a pandas dataframe
df = pd.read_json('https://restcountries.eu/rest/v2/all')
# Selecting the columns for the country name, capital, and the country code (as mentioned in the question)
df = df[["name", "capital", "alpha2Code"]]
# Saving the data frame into an excel file named 'restcountries.xlsx', but feel free to change the name
df.to_excel('restcountries.xlsx')
However, there will be an issue with reading nested fields (if you want to read them in the future). For example, the fields named borders and currencies in your dataset are lists. So, you might need some post-processing after you load it.
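For instance, a minimal sketch of such post-processing, assuming a list-valued borders field like the one described above, is to join each list into a single string so it fits in one Excel cell:

```python
import pandas as pd

# One hypothetical row; 'borders' mirrors the list-valued field mentioned above.
df = pd.DataFrame({'name': ['Germany'],
                   'borders': [['AUT', 'BEL', 'CHE']]})
# Join each list into one comma-separated string per cell.
df['borders'] = df['borders'].apply(lambda xs: ', '.join(xs))
```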
Cheers!
Closed. This question needs to be more focused. It is not currently accepting answers. Closed 1 year ago.
I have a list of phone numbers in column A of an Excel sheet. I need Python code to extract all the numbers to a .txt file on one line, separated by commas.
Example: 560000000,560000001,560000003,560000004,560000005,560000006,560000007,560000008,560000009,560000010,560000011,560000012
(Screenshot: list of phone numbers.)
from openpyxl import load_workbook
book = load_workbook('1.xlsx')
sheet = book.active
first_column = sheet['A']
with open('out.txt', 'w') as outfile:
    # Join the cell values with commas; this also avoids a trailing comma.
    outfile.write(','.join(str(cell.value) for cell in first_column))
You can use openpyxl, which is pretty intuitive.
load_workbook opens the Excel file at the path you specify.
book.active selects the sheet.
sheet['A'] selects the column (A in my case).
Finally, open the text file and iterate over the column to write the data into it.
Try this:
import pandas as pd
df = pd.read_excel('test.xlsx', header=None)
print(','.join(map(str,df[0].to_list())))
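Since the question asks for a .txt file rather than printed output, the joined string can be written out directly; a sketch with hypothetical values standing in for the column read above:

```python
# Hypothetical values standing in for the numbers read from column A.
numbers = [560000000, 560000001, 560000003]
with open('out.txt', 'w') as f:
    # One line, comma-separated, no trailing comma.
    f.write(','.join(map(str, numbers)))
```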
Closed. This question needs to be more focused. It is not currently accepting answers. Closed 2 years ago.
I have some data of 50 people in 50 different excel files placed in the same folder. For each person the data is present in five different files like shown below:
Example:
Person1_a.xls, Person1_b.xls, Person1_c.xls, Person1_d.xls, Person1_e.xls.
Each Excel file has two columns and multiple sheets. I need to create a file Person1.xls that combines the second columns of all of Person1's files. The same process should apply to all 50 people.
Any suggestions would be appreciated.
Thank you!
I have created a trial folder that I believe is similar to yours. I added data only for Person1 and Person3.
In the attached picture, the files called Person1 and Person3 are the exported files that include only the 2nd column for each person. So each person has their own file now.
I added a small description on what each line does. Please let me know if something is not clear.
import pandas as pd
import glob
path = r'C:\..\trial' # use your path where the files are
all_files = glob.glob(path + "/*.xlsx") # will get you all files with an extension .xlsx in a folder
li = []
for i in range(1, 51): # person numbers 1 to 50 (for the 50 different people)
    for f in all_files:
        if f'Person{i}_' in f: # match the exact person number; plain str(i) would also match e.g. Person12 when i is 1
            df = pd.read_excel(f,
                               sheet_name=0, # import the 1st sheet
                               usecols=[1]) # only import column 2 (0-based index 1)
            df['person'] = f.rsplit('\\', 1)[1].split('_')[0] # get the person's name from the file name into a column
            li.append(df) # add it to the list of dataframes
all_person = pd.concat(li, axis=0, ignore_index=True) # concat all imported dataframes
Then you can export to the same path, a different excel file for each different person
for i, j in all_person.groupby('person'):
    j.to_excel(f'{path}\\{i}.xlsx', index=False)
I am aware that this is probably not the most efficient way, but it will probably get you what you need.
Closed. This question needs to be more focused. It is not currently accepting answers. Closed 4 years ago.
I have my CSV files in the same folder. I want to get only the data in the 5th column from all of them and write it into a single file, but there are blank lines in my CSV files. https://drive.google.com/file/d/1SospIppACOrLeKPU_9OknnDLnDpatIqE/view?usp=sharing
How can I keep the blanks with pandas.read_csv command?
Many thanks!
Fake data (this answer uses R):
sapply(1:3, function(i) write.csv(mtcars, paste0(i,".csv"), row.names=FALSE))
results in three csv files, named 1.csv through 3.csv, each with:
"mpg","cyl","disp","hp","drat","wt","qsec","vs","am","gear","carb"
21,6,160,110,3.9,2.62,16.46,0,1,4,4
21,6,160,110,3.9,2.875,17.02,0,1,4,4
22.8,4,108,93,3.85,2.32,18.61,1,1,4,1
21.4,6,258,110,3.08,3.215,19.44,1,0,3,1
...
The R code:
write.csv(sapply(list.files(pattern="*.csv"), function(a) read.csv(a)[,5]),
          "agg.csv", row.names=FALSE)
results in a single CSV file, agg.csv, that contains
"1.csv","2.csv","3.csv"
3.9,3.9,3.9
3.9,3.9,3.9
3.85,3.85,3.85
3.08,3.08,3.08
...
You can use the usecols argument of pandas.read_csv.
import pandas as pd
from glob import glob
So what we are doing here is that we are looping over all files in the current directory that end with .csv and then for each of those files only read in the column of interest, i.e. the 5th column. We write usecols=[4] because pandas uses 0-based indexing, so out of 0, 1, 2, 3, 4, the fifth number is 4. Additionally you asked to skip blank lines and your sample data contains 9 blank lines leading up to actual data, so we will set skiprows to 9.
We concatenate all of those into one DataFrame using pd.concat.
combined_df = pd.concat(
    [
        pd.read_csv(csv_file, usecols=[4], skiprows=9)
        for csv_file in glob('*.csv')
    ]
)
To get rid of blank lines from your DataFrame, you can simply use:
combined_df = combined_df.dropna()
This combined_df we can then simply write to file:
combined_df.to_csv('combined_column_5.csv')
Closed. This question needs to be more focused. It is not currently accepting answers. Closed 5 years ago.
I am using Python 3.6.2 and have the following csv file:
STATE,RATE,DEATHS
IA,4.2,166
NH,4.2,52
MA,4.3,309
CA,4.4,2169
CO,4.6,309
ID,4.6,106
NY,4.6,1087
VT,4.6,27
NJ,4.7,487
I am trying to add a new column to the file, where I multiply the rate column times the deaths column. The following table is what I'd like my results to look like.
STATE,RATE,DEATHS,NEW
IA,4.2,166,697.2
NH,4.2,52,218.4
MA,4.3,309,1328.7
CA,4.4,2169,9543.6
CO,4.6,309,1421.4
ID,4.6,106,487.6
NY,4.6,1087,5000.2
VT,4.6,27,124.2
NJ,4.7,487,2288.9
I've tried looking for an answer to this question but couldn't find anything similar to this. Thanks in advance.
Use pandas:
import pandas as pd
df = pd.read_csv('path/to/yourfile.csv')
df['NEW'] = df.RATE * df.DEATHS
df.to_csv('path/to/yournewfile.csv', index=False)
Using the pandas library, this is fairly simple:
import pandas as pd
df = pd.read_csv('filename.csv')
df['NEW'] = df['RATE'] * df['DEATHS']
# You can save over the old file, though I would suggest saving a new one
# in case you make a mistake
df.to_csv('new_filename.csv', index=False)
There are several cool things that the pandas library takes care of for us. First, we easily parse the csv using the pd.read_csv() statement. Next, pandas DataFrame objects (which is what the variable df is) allow us to use keys to access and create columns, much like a Python dictionary. When we perform mathematical operations using columns from the DataFrame, the pandas library actually performs the operation for each value in each column, so in our example, the index 0 in the 'RATE' column is multiplied by index 0 of the 'DEATHS' column.
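The element-wise behaviour can be seen on a tiny in-memory frame (hypothetical values mirroring the CSV's columns):

```python
import pandas as pd

# Two hypothetical rows mirroring the RATE and DEATHS columns.
df = pd.DataFrame({'RATE': [4.2, 4.3], 'DEATHS': [166, 309]})
# Multiplication is applied row by row: 4.2*166, then 4.3*309.
df['NEW'] = df['RATE'] * df['DEATHS']
```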
In short, if you are going to access and manipulate spreadsheet-like files in python, pandas is a powerful and easy-to-use library.
with open('test.csv', 'r') as file:
    lines = file.readlines()
# print the new header
print(lines[0].strip() + ',NEWCOLUMN')
# loop through the remaining lines, starting from index 1
for line in lines[1:]:
    line_items = line.strip().split(',')
    # your operation
    new_column = float(line_items[1]) * float(line_items[2])
    line_items.append(new_column)
    print(",".join(map(str, line_items)))
You can read the CSV with the built-in csv package and then manipulate the columns as you need. Of course, you can use the pandas library, but that is like using a sledgehammer to crack a nut. Replace StringIO (used here just to make testing simple) in the example below with actual file reading and the job is done.
from io import StringIO
import csv
f_in = StringIO("""STATE,RATE,DEATHS
IA,4.2,166
NH,4.2,52
MA,4.3,309
CA,4.4,2169
CO,4.6,309
ID,4.6,106
NY,4.6,1087
VT,4.6,27
NJ,4.7,487""")
reader = csv.reader(f_in)
with open('new.csv', 'w', newline='') as f:  # newline='' avoids extra blank lines on Windows
    writer = csv.writer(f)
    headings = next(reader)
    headings.append('NEW')
    writer.writerow(headings)
    for row in reader:
        row.append(str(round(float(row[1]) * float(row[2]), 1)))
        writer.writerow(row)
Closed. This question needs details or clarity. It is not currently accepting answers. Closed 6 years ago.
I want to import a csv file into Python and then create a table to display its contents.
I further need to manipulate the data present in the table.
More table-related functions should be performed as well:
Like:
1) highlighting a specified column using Python
2) modifying a particular column, like sorting the data by date or quantity, using Python
This is an example of how to import a csv or txt file into Python.
I use NumPy to do it:
#!/usr/bin/env python
import numpy as np
file = np.loadtxt('filename', delimiter=',') # set the delimiter used in your csv file: comma, blank, ...
print(file) # print the numpy array
If you want to make a sort in your array, you can use the numpy function :
np.sort()
The documentation can be found in the NumPy reference.
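Note that np.sort sorts values along an axis independently, which scrambles rows; to sort whole rows by one column, argsort is the usual tool. A minimal sketch with a hypothetical numeric table:

```python
import numpy as np

# Hypothetical table: quantity in column 0, price in column 1.
data = np.array([[3, 9.5],
                 [1, 2.5],
                 [2, 7.0]])
# argsort on the first column gives the row order; indexing reorders whole rows.
by_quantity = data[data[:, 0].argsort()]
```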
Now, try to make something yourself, because it's important to have attempted a script before posting your question ;)