What is "_csv" in Python? - python

In attempting to read the source code for the csv.py file (as a guide to implementing my own writer class in another context) I found that much of the functionality in that file is, in turn, imported from something called _csv:
from _csv import Error, __version__, writer, reader, register_dialect, \
unregister_dialect, get_dialect, list_dialects, \
field_size_limit, \
QUOTE_MINIMAL, QUOTE_ALL, QUOTE_NONNUMERIC, QUOTE_NONE, \
__doc__
I cannot find any file with this name on my system (including searching for files with the Hidden attribute set), although I can do import _csv from the Python shell.
What is this module and is it possible to read it?

_csv is the C "backbone" of the csv module. Its source is in Modules/_csv.c. You can find the compiled version of this module from the Python command prompt with:
>>> import _csv
>>> _csv
<module '_csv' from '/usr/lib/python2.6/lib-dynload/_csv.so'>
There are no hidden files in the Python source code :)
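Not in the original answer, but you can verify at the prompt that the names exported by csv are literally the objects defined in _csv:
import csv
import _csv

# csv.py does `from _csv import writer, reader, ...`, so these are the same objects
print(csv.reader is _csv.reader)  # True
print(csv.writer is _csv.writer)  # True
print(_csv.__file__)              # location of the compiled extension (when built as a shared library)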

Not to disagree with larsmans' answer.
There is an official explanation of the module naming convention in PEP8:
When an extension module written in C or C++ has an accompanying Python module that provides a higher level (e.g. more object oriented) interface, the C/C++ module has a leading underscore (e.g. _socket)
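As a quick illustration of the convention PEP 8 describes, using the socket/_socket pair it mentions (my own example, not from the answer):
import socket    # high-level, object-oriented wrapper written in Python
import _socket   # low-level C extension it builds on

# Many low-level names are simply re-exported by the wrapper module
print(socket.AF_INET == _socket.AF_INET)  # True
print(_socket)                            # built-in, or a .so/.pyd extension module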

An answer has already been accepted, but I wanted to add a few details as I had the same question as the OP.
A .csv file's content is just a comma separated document with values:
first,last
john,doe
...
You can test this by opening a .csv file in Notepad.
If you want to create a .csv file, you can simply write the comma-separated lines yourself:
with open('test.csv', 'w') as file:
    file.write('first,last\n')
    file.write('John,Doe\n')
A simple solution if you want to write to a .csv file without using "import csv".
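For completeness (not part of the original answer): reading the file back without import csv is just as easy, as long as no value itself contains a comma or a newline, which is exactly the edge case the csv module exists to handle:
# Read the file written above back into a list of rows
with open('test.csv') as file:
    rows = [line.rstrip('\n').split(',') for line in file]

print(rows)  # [['first', 'last'], ['John', 'Doe']]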

Related

how to modify txt file properties with python

I am trying to make a Python program that creates and writes to a txt file.
The program works, but I want it to tick the "hidden" checkbox in the txt file's properties, so that the file can't be seen without using the Python program I made. I have no clue how to do that; please understand I am a beginner in Python.
I'm not 100% sure but I don't think you can do this in Python. I'd suggest finding a simple Visual Basic script and running it from your Python file.
Assuming you mean the file properties, where you can mark a file as "hidden" (as in the Windows file-properties dialog), there are a few options.
Use the operating system's command line from Python
For example, on the Windows command line attrib +h Secret_File.txt hides a file in CMD.
import subprocess
subprocess.run(["attrib", "+h", "Secret_File.txt"])
See also:
How to execute a program or call a system command?
Directly call OS functions (Windows)
import ctypes
FILE_ATTRIBUTE_HIDDEN = 0x02  # 2 is Windows' "hidden" attribute flag
path = "my_hidden_file.txt"
ctypes.windll.kernel32.SetFileAttributesW(path, FILE_ATTRIBUTE_HIDDEN)
See also:
Hide Folders/ File with Python
Rename the file (Linux)
import os
filename = "my_hidden_file.txt"
os.rename(filename, '.'+filename) # the prefix dot means hidden in Linux
See also:
How to rename a file using Python
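Putting the approaches above together, a rough cross-platform sketch (the helper name hide_file is mine, not from the answers):
import ctypes
import os
import sys

FILE_ATTRIBUTE_HIDDEN = 0x02  # constant passed to SetFileAttributesW on Windows

def hide_file(path):
    """Hide a file: set the hidden attribute on Windows, dot-prefix it elsewhere."""
    if sys.platform == "win32":
        if not ctypes.windll.kernel32.SetFileAttributesW(path, FILE_ATTRIBUTE_HIDDEN):
            raise ctypes.WinError()
        return path
    directory, name = os.path.split(path)
    hidden = os.path.join(directory, "." + name)
    os.rename(path, hidden)
    return hidden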

Decompressing bz2 files on Windows

I am trying to decompress a bz2 file with below code snippet which is provided in various places:
bz2_data = bz2.BZ2File(DATA_FILE+".bz2").read()
open(DATA_FILE, 'wb').write(bz2_data)
However, I am getting a much smaller file than I expect.
When I extract the file with the 7z GUI I get a file of 248 MB; with the above code the file I get is 879 KB.
When I read the extracted XML file, I can see that the rest of the file is missing, as expected.
I am running Anaconda on a Windows machine, and as far as I understand bz2 reaches an EOF before the file actually ends.
By the way, I already ran into this and this; both did no good.
If this is a multi-stream file, then Python's bz2 module (before 3.3) doesn't support it:
Note This class does not support input files containing multiple streams (such as those produced by the pbzip2 tool). When reading such an input file, only the first stream will be accessible. If you require support for multi-stream files, consider using the third-party bz2file module (available from PyPI). This module provides a backport of Python 3.3’s BZ2File class, which does support multi-stream files.
An alternative, drop-in replacement: bz2file should work though.
If it is a multistream file, you have to set mode to "r" or it will silently fail (e.g. output the compressed data as is).
This should do what you want:
from bz2 import BZ2File

# open the output in binary mode, since BZ2File.read() returns bytes
with open(out_file_path, "wb") as out_file, BZ2File(bz2_file_path, "r") as bz2_file:
    for data in iter(lambda: bz2_file.read(100 * 1024), b""):
        out_file.write(data)
From the documentation:
If mode is 'r', the input file may be the concatenation of multiple compressed streams.
https://docs.python.org/3/library/bz2.html#bz2.BZ2File
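On Python 3.3 and newer the same thing can be done with bz2.open, which also handles multi-stream archives; a minimal sketch (the file names are placeholders, not from the question):
import bz2
import shutil

# Stream the decompressed data to disk instead of reading ~248 MB into memory
with bz2.open("dump.xml.bz2", "rb") as src, open("dump.xml", "wb") as dst:
    shutil.copyfileobj(src, dst, length=1024 * 1024)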

Executing Visual Basic Macro (from personal macro workbook) in Python

I'm trying to execute a visual basic script that is located in my personal macro workbook on an excel file that I'm creating. Here's what I have so far:
import os
import win32com.client
df2.to_excel("Apartments.xlsx")
xl=win32com.client.Dispatch("Excel.Application")
xl.Workbooks.open(filename="C:\Users\my\full\path\Apartments.xlsx", ReadOnly=1)
xl.Application.Run("Apartments.xlsx!create_chart.create_chart_proc")
It's throwing an error when opening the Excel file on line 5, and I have a feeling line 6 won't work either because it comes from my personal macro workbook. Does anyone have ideas on how to get it to function?
PS. my module name is "create_chart" and my macro name is "create_chart_proc"
It would help if you said what error you are getting, and I know that this path
"C:\Users\my\full\path\Apartments.xlsx"
is not really what is in your code. But \ inside a string has a special meaning in Python (for example \n is a newline and \f is a form feed). Whether your \ is interpreted as you intend depends on the character that comes after it. When you use Windows paths in Python it's generally less error-prone to use a raw string:
r"C:\Users\my\full\path\Apartments.xlsx"
When you get that line to work, if the next line gives trouble, then post that as a separate question.
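To make the difference concrete, here is a small illustration of my own (using the path from the question):
# In an ordinary string literal the backslash starts an escape sequence:
# "\n" is a newline, "\f" a form feed, and in Python 3 "\U..." must be a
# Unicode escape, so "C:\Users\..." is not even a valid literal.
escaped = "C:\\Users\\my\\full\\path\\Apartments.xlsx"  # doubling works but is noisy
raw     = r"C:\Users\my\full\path\Apartments.xlsx"      # raw string: backslashes kept as-is
forward = "C:/Users/my/full/path/Apartments.xlsx"       # Windows APIs accept forward slashes too

print(escaped == raw == forward.replace("/", "\\"))     # True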
I got the first part of the code to work by simply running it as follows:
xl.workbooks.open("C:\Users\my\full\path\Apartments.xlsx")
Posted a separate question to get the macro to work correctly.

How can I convert a XLSB file to csv using python?

I have been provided with an xlsb file full of data. I want to process the data using Python. I can convert it to csv using Excel or OpenOffice, but I would like the whole process to be more automated. Any ideas?
Update: I took a look at this question and used the first answer:
import subprocess
subprocess.call("cscript XlsToCsv.vbs data.xlsb data.csv", shell=False)
The issue is that the file contains Greek letters, so the encoding is not preserved. Opening the csv with Notepad++ it looks as it should, but when I try to insert it into a database it comes out like this: ���. Opening the file as csv just to read the text, it is displayed like this:
\xc2\xc5\xcb instead of ΒΕΛ.
I realize it's an encoding issue, but is it possible to retain the original encoding when converting the xlsb file to csv?
I've encountered this same problem and using pyxlsb does it for me:
from pyxlsb import open_workbook
with open_workbook('HugeDataFile.xlsb') as wb:
    for sheetname in wb.sheets:
        with wb.get_sheet(sheetname) as sheet:
            for row in sheet.rows():
                values = [r.v for r in row]  # retrieving content
                csv_line = ','.join(str(v) for v in values)  # or do your thing
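To address the encoding part of the question, one variation of the snippet above (my own sketch, not tested against the OP's file) is to let the csv module handle the quoting and open the output explicitly as UTF-8; note that all sheets end up in one file here:
import csv
from pyxlsb import open_workbook

with open_workbook('HugeDataFile.xlsb') as wb, \
        open('HugeDataFile.csv', 'w', newline='', encoding='utf-8') as out:
    writer = csv.writer(out)
    for sheetname in wb.sheets:
        with wb.get_sheet(sheetname) as sheet:
            for row in sheet.rows():
                # csv.writer converts numbers/None and quotes values containing commas
                writer.writerow([r.v for r in row])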
Most popular Excel python packages openpyxl and xlrd have no support for xlsb format (bug tracker entries: openpyxl, xlrd).
So I'm afraid there is no native Python way =/. However, since you are using Windows, it should be easy to script the task with external tools.
I would suggest taking a look at Convert XLS to XLSB Programatically?. You mention python in the title, but the body of the question does not imply you are strongly coupled to it, so you could go the pure C# way.
If you feel really comfortable only with Python, one of the answers there suggests a command-line tool under the fancy name of Convert-XLSB. You could script it as an external tool from Python with subprocess.
I know this is not a good answer, but I don't think there is better/easier way as of now.
In my previous experience, I handled converting xlsb with the libreoffice command-line utility.
In Ruby I just execute a system command that calls libreoffice to convert the xlsb file to csv:
`libreoffice --headless --convert-to csv your_xlsb_file.xlsb --outdir /path/csv`
and to change the encoding I use iconv from the command line, again from Ruby:
`iconv -f ISO-8859-1 -t UTF-8 your_csv_file.csv > new_file_csv.csv`
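The same idea works from Python, assuming libreoffice is on the PATH; a rough sketch (file names and paths are the placeholders from the commands above):
import subprocess

# Convert the workbook with LibreOffice's headless mode (output lands in --outdir)
subprocess.run(
    ["libreoffice", "--headless", "--convert-to", "csv",
     "your_xlsb_file.xlsb", "--outdir", "/path/csv"],
    check=True,
)

# Re-encode the result from ISO-8859-1 to UTF-8, mirroring the iconv step above
with open("/path/csv/your_xlsb_file.csv", "r", encoding="iso-8859-1") as src:
    data = src.read()
with open("/path/csv/new_file_csv.csv", "w", encoding="utf-8") as dst:
    dst.write(data)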
I also looked at the problem and the following worked for me: first opening the file in Excel via Python and then saving it to a different file. It is a bit of a workaround, but I like it more than the other solutions. In the example I use file format 6, which is CSV, but you can also use other ones.
import win32com.client
excel = win32com.client.Dispatch("Excel.Application")
excel.DisplayAlerts = False
excel.Visible=False
doc = excel.Workbooks.Open("C:/users/A295998/Python/#TA1PROG3.xlsb")
doc.SaveAs(Filename="C:\\users\\A295998\\Python\\test5.csv",FileFormat=6)
doc.Close()
excel.Quit()
XLSB is a binary format and I don't think you'll be able to parse it with current Python tools and packages. If you still want to somehow automate the process with Python, you can do what the others have told you and script that Windows CLI tool: call the .exe from the command line with subprocess, passing the files you want to convert.
E.g. with a script similar to this one you could convert all the .xlsb files that you place in the "xlsb" folder to .csv format...
├── xlsb
│   ├── file1.xlsb
│   ├── file2.xlsb
│   └── file3.xlsb
└── xlsb_to_csv.py
xlsb_to_csv.py
#!/usr/bin/env python
import os
import subprocess
files = [f for f in os.listdir('./xlsb')]
for f in files:
    subprocess.call("ConvertXLS.EXE " + str(f) + " --arguments", shell=True)
Note: the Windows command is pseudocode... I use a similar approach to batch-convert stuff on headless Windows servers for testing purposes. You just have to figure out the exe location and the Windows command...
Hope it helps... good luck!
I think you can do this using pyuno. This blog entry shows how to convert xls files to csv, and as OpenOffice supports xlsb files since version 3.2, this code might just work for you. You will have to go through the hassle of setting up the pyuno environment though.
The script you reference seems to use the ActiveX interface to Excel, and saves via its Workbook.SaveAs method.
According to the MSDN documentation this method has a TextCodepage argument which may be helpful.
Sidenote: You can rewrite the VB script in python, see this question.

What is the common header format of Python files?

I came across the following header format for Python source files in a document about Python coding guidelines:
#!/usr/bin/env python
"""Foobar.py: Description of what foobar does."""
__author__ = "Barack Obama"
__copyright__ = "Copyright 2009, Planet Earth"
Is this the standard format of headers in the Python world?
What other fields/information can I put in the header?
Python gurus share your guidelines for good Python source headers :-)
It's all metadata for the Foobar module.
The first one is the docstring of the module, that is already explained in Peter's answer.
How do I organize my modules (source files)? (Archive)
The first line of each file should be #!/usr/bin/env python. This makes it possible to run the file as a script invoking the interpreter implicitly, e.g. in a CGI context.
Next should be the docstring with a description. If the description is long, the first line should be a short summary that makes sense on its own, separated from the rest by a newline.
All code, including import statements, should follow the docstring. Otherwise, the docstring will not be recognized by the interpreter, and you will not have access to it in interactive sessions (i.e. through obj.__doc__) or when generating documentation with automated tools.
Import built-in modules first, followed by third-party modules, followed by any changes to the path and your own modules. Especially, additions to the path and names of your modules are likely to change rapidly: keeping them in one place makes them easier to find.
Next should be authorship information. This information should follow this format:
__author__ = "Rob Knight, Gavin Huttley, and Peter Maxwell"
__copyright__ = "Copyright 2007, The Cogent Project"
__credits__ = ["Rob Knight", "Peter Maxwell", "Gavin Huttley",
"Matthew Wakefield"]
__license__ = "GPL"
__version__ = "1.0.1"
__maintainer__ = "Rob Knight"
__email__ = "rob#spot.colorado.edu"
__status__ = "Production"
Status should typically be one of "Prototype", "Development", or "Production". __maintainer__ should be the person who will fix bugs and make improvements if imported. __credits__ differs from __author__ in that __credits__ includes people who reported bug fixes, made suggestions, etc. but did not actually write the code.
Here you have more information, listing __author__, __authors__, __contact__, __copyright__, __license__, __deprecated__, __date__ and __version__ as recognized metadata.
I strongly favour minimal file headers, by which I mean just:
#!/usr/bin/env python # [1]
"""\
This script foos the given bars [2]
Usage: myscript.py BAR1 BAR2
"""
import os # standard library, [3]
import sys
import requests # 3rd party packages
import mypackage # local source
[1] The hashbang if, and only if, this file should be able to be directly executed, i.e. run as myscript.py or myscript or maybe even python myscript.py. (The hashbang isn't used in the last case, but providing it gives users the choice of executing it either way.) The hashbang should not be included if the file is a module, intended just to be imported by other Python files.
[2] Module docstring
[3] Imports, grouped in the standard way, ie. three groups of imports, with a single blank line between them. Within each group, imports are sorted. The final group, imports from local source, can either be absolute imports as shown, or explicit relative imports.
Everything else is a waste of time - both for the author and for subsequent maintainers. It wastes the precious visual space at the top of the file with information that is better tracked elsewhere, and is easy to get out of date and become actively misleading.
If you have legal disclaimers or licensing info, it goes into a separate file. It does not need to infect every source code file. Your copyright should be part of this. People should be able to find it in your LICENSE file, not random source code.
Metadata such as authorship and dates is already maintained by your source control. There is no need to add a less-detailed, erroneous, and out-of-date version of the same info in the file itself.
I don't believe there is any other data that everyone needs to put into all their source files. You may have some particular requirement to do so, but such things apply, by definition, only to you. They have no place in “general headers recommended for everyone”.
The answers above are really complete, but if you want a quick and dirty header to copy and paste, use this:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""Module documentation goes here
and here
and ...
"""
Why this is a good one:
The first line is for *nix users. It will choose the Python interpreter in the user's path, so it will automatically choose the user's preferred interpreter.
The second one is the file encoding. Nowadays every file should have an encoding associated with it. UTF-8 will work everywhere; only legacy projects would use another encoding.
And a very simple docstring. It can span multiple lines.
See also: https://www.python.org/dev/peps/pep-0263/
If you just write a class in each file, you don't even need the documentation (it would go inside the class doc).
Also see PEP 263 if you are using a non-ASCII character set:
Abstract
This PEP proposes to introduce a syntax to declare the encoding of
a Python source file. The encoding information is then used by the
Python parser to interpret the file using the given encoding. Most
notably this enhances the interpretation of Unicode literals in
the source code and makes it possible to write Unicode literals
using e.g. UTF-8 directly in an Unicode aware editor.
Problem
In Python 2.1, Unicode literals can only be written using the
Latin-1 based encoding "unicode-escape". This makes the
programming environment rather unfriendly to Python users who live
and work in non-Latin-1 locales such as many of the Asian
countries. Programmers can write their 8-bit strings using the
favorite encoding, but are bound to the "unicode-escape" encoding
for Unicode literals.
Proposed Solution
I propose to make the Python source code encoding both visible and
changeable on a per-source file basis by using a special comment
at the top of the file to declare the encoding.
To make Python aware of this encoding declaration a number of
concept changes are necessary with respect to the handling of
Python source code data.
Defining the Encoding
Python will default to ASCII as standard encoding if no other
encoding hints are given.
To define a source code encoding, a magic comment must
be placed into the source files either as first or second
line in the file, such as:
# coding=<encoding name>
or (using formats recognized by popular editors)
#!/usr/bin/python
# -*- coding: <encoding name> -*-
or
#!/usr/bin/python
# vim: set fileencoding=<encoding name> :
...
What I use in some projects is this as the first line, for Linux machines:
# -*- coding: utf-8 -*-
For documentation and author credit, I like a simple multiline string. Here is an example from Example Google Style Python Docstrings:
# -*- coding: utf-8 -*-
"""Example Google style docstrings.
This module demonstrates documentation as specified by the `Google Python
Style Guide`_. Docstrings may extend over multiple lines. Sections are created
with a section header and a colon followed by a block of indented text.
Example:
Examples can be given using either the ``Example`` or ``Examples``
sections. Sections support any reStructuredText formatting, including
literal blocks::
$ python example_google.py
Section breaks are created by resuming unindented text. Section breaks
are also implicitly created anytime a new section starts.
Attributes:
module_level_variable1 (int): Module level variables may be documented in
either the ``Attributes`` section of the module docstring, or in an
inline docstring immediately following the variable.
Either form is acceptable, but the two should not be mixed. Choose
one convention to document module level variables and be consistent
with it.
Todo:
* For module TODOs
* You have to also use ``sphinx.ext.todo`` extension
.. _Google Python Style Guide:
http://google.github.io/styleguide/pyguide.html
"""
It can also be nice to add:
"""
#Author: ...
#Date: ....
#Credit: ...
#Links: ...
"""
Additional Formats
Meta-information markup | devguide
"""
:mod:`parrot` -- Dead parrot access
===================================
.. module:: parrot
:platform: Unix, Windows
:synopsis: Analyze and reanimate dead parrots.
.. moduleauthor:: Eric Cleese <eric#python.invalid>
.. moduleauthor:: John Idle <john#python.invalid>
"""
#!/usr/bin/env python3                          # Line 1
# -*- coding: utf-8 -*-                         # Line 2
# ----------------------------------------------------------------------------
# Created By  : name_of_the_creator             # Line 3
# Created Date: date/month/time ..etc
# version ='1.0'
# ----------------------------------------------------------------------------
And, similarly to other answers, I also include:
__author__ = "Rob Knight, Gavin Huttley, and Peter Maxwell"
__copyright__ = "Copyright 2007, The Cogent Project"
__credits__ = ["Rob Knight", "Peter Maxwell", "Gavin Huttley",
"Matthew Wakefield"]
__license__ = "GPL"
__version__ = "1.0.1"
__maintainer__ = "Rob Knight"
__email__ = "rob#spot.colorado.edu"
__status__ = "Production"
