zhongwei/gh-k-dense-ai-claude-scientific-skills-scientific-skills

Zhongwei Li f0bd18fb4e Initial commit

2025-11-30 08:30:10 +08:00

FITS File Handling (astropy.io.fits)

The astropy.io.fits module provides comprehensive tools for reading, writing, and manipulating FITS (Flexible Image Transport System) files.

Opening FITS Files

Basic File Opening

from astropy.io import fits

# Open file (returns HDUList - list of HDUs)
hdul = fits.open('filename.fits')

# Always close when done
hdul.close()

# Better: use context manager (automatically closes)
with fits.open('filename.fits') as hdul:
    hdul.info()  # Display file structure
    data = hdul[0].data

File Opening Modes

fits.open('file.fits', mode='readonly')   # Read-only (default)
fits.open('file.fits', mode='update')     # Read and write
fits.open('file.fits', mode='append')     # Add HDUs to file

Memory Mapping

For large files, use memory mapping (default behavior):

hdul = fits.open('large_file.fits', memmap=True)
# Only loads data chunks as needed

Remote Files

Access cloud-hosted FITS files:

uri = "s3://bucket-name/image.fits"
with fits.open(uri, use_fsspec=True, fsspec_kwargs={"anon": True}) as hdul:
    # Use .section to get cutouts without downloading entire file
    cutout = hdul[1].section[100:200, 100:200]

HDU Structure

FITS files contain Header Data Units (HDUs):

Primary HDU (hdul[0]): First HDU, always present
Extension HDUs (hdul[1:]): Image or table extensions

hdul.info()  # Display all HDUs
# Output:
# No.    Name      Ver    Type      Cards   Dimensions   Format
#  0  PRIMARY       1 PrimaryHDU     220   ()
#  1  SCI           1 ImageHDU       140   (1014, 1014)   float32
#  2  ERR           1 ImageHDU        51   (1014, 1014)   float32

Accessing HDUs

# By index
primary = hdul[0]
extension1 = hdul[1]

# By name
sci = hdul['SCI']

# By name and version number
sci2 = hdul['SCI', 2]  # SCI extension with EXTVER = 2
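The name-and-version lookup above can be tried without any file on disk. The following is a minimal sketch, assuming two in-memory extensions that share the name SCI; the array contents are made up for illustration.

```python
import numpy as np
from astropy.io import fits

# Build an in-memory HDUList with two extensions that share the name 'SCI'
primary = fits.PrimaryHDU()
sci1 = fits.ImageHDU(data=np.zeros((4, 4)), name='SCI', ver=1)
sci2 = fits.ImageHDU(data=np.ones((4, 4)), name='SCI', ver=2)
hdul = fits.HDUList([primary, sci1, sci2])

# A (name, version) tuple selects on EXTNAME and EXTVER together
second = hdul['SCI', 2]
print(second.name, second.ver)

# index_of translates the same key into a position in the list
position = hdul.index_of(('SCI', 2))
print(position)
```

The version number comes from the EXTVER keyword, so it does not have to match the order in which the extensions appear in the file.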

Working with Headers

Reading Header Values

hdu = hdul[0]
header = hdu.header

# Get keyword value (case-insensitive)
observer = header['OBSERVER']
exptime = header['EXPTIME']

# Get with default if missing
filter_name = header.get('FILTER', 'Unknown')

# Access by index
value = header[7]  # 8th card's value

Modifying Headers

# Update existing keyword
header['OBSERVER'] = 'Edwin Hubble'

# Add/update with comment
header['OBSERVER'] = ('Edwin Hubble', 'Name of observer')

# Add keyword at specific position
header.insert(5, ('NEWKEY', 'value', 'comment'))

# Add HISTORY and COMMENT
header['HISTORY'] = 'File processed on 2025-01-15'
header['COMMENT'] = 'Note about the data'

# Delete keyword
del header['OLDKEY']
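A short sketch of how these assignments behave on a standalone Header object (the keyword values are illustrative): assigning a plain value keeps the existing comment, and each HISTORY assignment adds a new card instead of replacing the previous one.

```python
from astropy.io import fits

header = fits.Header()

# Value and comment set together
header['OBSERVER'] = ('Edwin Hubble', 'Name of observer')

# Keyword lookup is case-insensitive; a plain value keeps the old comment
header['observer'] = 'E. Hubble'
observer_comment = header.comments['OBSERVER']

# Commentary keywords accumulate: each assignment appends a card
header['HISTORY'] = 'Bias subtracted'
header['HISTORY'] = 'Flat fielded'
history = list(header['HISTORY'])

print(repr(header))
```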

Header Cards

Each keyword is stored as a "card" (80-character record):

# Access full card
card = header.cards[0]
print(f"{card.keyword} = {card.value} / {card.comment}")

# Iterate over all cards
for card in header.cards:
    print(f"{card.keyword}: {card.value}")
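To see the 80-character record behind a card, a card can be built directly and its raw image inspected. This is a small sketch; the keyword and value are examples only.

```python
from astropy.io import fits

# Build a card from keyword, value, and comment
card = fits.Card('EXPTIME', 300.0, 'Exposure time in seconds')

# .image is the raw fixed-width record as it would appear in the file
image = card.image
print(repr(image))
print(len(image))

# The record can be parsed back into a card
parsed = fits.Card.fromstring(image)
```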

Working with Image Data

Reading Image Data

# Get data from HDU
data = hdul[1].data  # Returns NumPy array

# Data properties
print(data.shape)      # e.g., (1024, 1024)
print(data.dtype)      # e.g., float32
print(data.min(), data.max())

# Access specific pixels
pixel_value = data[100, 200]
region = data[100:200, 300:400]

Data Operations

Data is a NumPy array, so use standard NumPy operations:

import numpy as np
from scipy import ndimage

# Statistics
mean = np.mean(data)
median = np.median(data)
std = np.std(data)

# Modify data
data[data < 0] = 0  # Clip negative values
gain, bias = 1.5, 100.0  # Placeholder calibration constants
data = data * gain + bias  # Calibration

# Mathematical operations
log_data = np.log10(data)  # Requires positive values
smoothed = ndimage.gaussian_filter(data, sigma=2)

Cutouts and Sections

Extract regions without loading entire array:

# Section notation [y_start:y_end, x_start:x_end]
cutout = hdul[1].section[500:600, 700:800]
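The sketch below checks that .section returns the same pixels as slicing the full array, using a small temporary file. The file name and array size are arbitrary choices for the example.

```python
import os
import tempfile

import numpy as np
from astropy.io import fits

# 100 rows (y) by 120 columns (x)
data = np.arange(100 * 120, dtype=np.float32).reshape(100, 120)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'image.fits')
    fits.PrimaryHDU(data=data).writeto(path)

    with fits.open(path) as hdul:
        # NAXIS1 is the x axis, which is the last NumPy axis
        naxis1 = hdul[0].header['NAXIS1']
        # Read only rows 10-19 and columns 30-49; copy before the file closes
        cutout = np.array(hdul[0].section[10:20, 30:50])

print(cutout.shape)
```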

Creating New FITS Files

Simple Image File

# Create data
data = np.random.random((100, 100))

# Create HDU
hdu = fits.PrimaryHDU(data=data)

# Add header keywords
hdu.header['OBJECT'] = 'Test Image'
hdu.header['EXPTIME'] = 300.0

# Write to file (raises OSError if the file already exists)
hdu.writeto('new_image.fits')

# Overwrite if exists
hdu.writeto('new_image.fits', overwrite=True)

Multi-Extension File

# Create primary HDU (can have no data)
primary = fits.PrimaryHDU()
primary.header['TELESCOP'] = 'HST'

# Create image extensions
sci_data = np.ones((100, 100))
sci = fits.ImageHDU(data=sci_data, name='SCI')

err_data = np.ones((100, 100)) * 0.1
err = fits.ImageHDU(data=err_data, name='ERR')

# Combine into HDUList
hdul = fits.HDUList([primary, sci, err])

# Write to file
hdul.writeto('multi_extension.fits')

Working with Table Data

Reading Tables

# Open table
with fits.open('table.fits') as hdul:
    table = hdul[1].data  # FITS_rec record array from a BinTableHDU or TableHDU

    # Access columns
    ra = table['RA']
    dec = table['DEC']
    mag = table['MAG']

    # Access rows
    first_row = table[0]
    subset = table[10:20]

    # Column info
    cols = hdul[1].columns
    print(cols.names)
    cols.info()

Creating Tables

# Define columns
col1 = fits.Column(name='ID', format='K', array=[1, 2, 3, 4])
col2 = fits.Column(name='RA', format='D', array=[10.5, 11.2, 12.3, 13.1])
col3 = fits.Column(name='DEC', format='D', array=[41.2, 42.1, 43.5, 44.2])
col4 = fits.Column(name='Name', format='20A',
                   array=['Star1', 'Star2', 'Star3', 'Star4'])

# Create table HDU
table_hdu = fits.BinTableHDU.from_columns([col1, col2, col3, col4])
table_hdu.name = 'CATALOG'

# Write to file
table_hdu.writeto('catalog.fits', overwrite=True)

Column Formats

Common FITS table column formats:

'A': Character string (e.g., '20A' for 20 characters)
'L': Logical (boolean)
'B': Unsigned byte
'I': 16-bit integer
'J': 32-bit integer
'K': 64-bit integer
'E': 32-bit floating point
'D': 64-bit floating point

Modifying Existing Files

Update Mode

with fits.open('file.fits', mode='update') as hdul:
    # Modify header
    hdul[0].header['NEWKEY'] = 'value'

    # Modify data
    hdul[1].data[100, 100] = 999

    # Changes automatically saved when context exits
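The difference between the default read-only mode and update mode can be checked with a throwaway file. In this sketch (file name and values are illustrative), edits made in read-only mode stay in memory, while edits made in update mode are written back when the file closes.

```python
import os
import tempfile

import numpy as np
from astropy.io import fits

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'file.fits')
    sci = fits.ImageHDU(data=np.zeros((8, 8), dtype=np.float32), name='SCI')
    fits.HDUList([fits.PrimaryHDU(), sci]).writeto(path)

    # Read-only (default): the edit is discarded on close
    with fits.open(path) as hdul:
        hdul[0].header['NEWKEY'] = 'ignored'
    readonly_kept = 'NEWKEY' in fits.getheader(path, ext=0)

    # Update mode: header and data edits are flushed on close
    with fits.open(path, mode='update') as hdul:
        hdul[0].header['NEWKEY'] = 'value'
        hdul['SCI'].data[3, 4] = 999

    # Reopen to confirm what reached the disk
    with fits.open(path) as hdul:
        saved_key = hdul[0].header['NEWKEY']
        saved_pixel = float(hdul['SCI'].data[3, 4])

print(readonly_kept, saved_key, saved_pixel)
```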

Append Mode

# Add new extension to existing file
new_data = np.random.random((50, 50))
new_hdu = fits.ImageHDU(data=new_data, name='NEW_EXT')

with fits.open('file.fits', mode='append') as hdul:
    hdul.append(new_hdu)
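A self-contained version of the same pattern, assuming a temporary file that starts with only an empty primary HDU, shows the new extension on disk after the context exits.

```python
import os
import tempfile

import numpy as np
from astropy.io import fits

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'file.fits')
    fits.PrimaryHDU().writeto(path)

    new_hdu = fits.ImageHDU(data=np.ones((5, 5)), name='NEW_EXT')

    # The appended HDU is written when the file is closed
    with fits.open(path, mode='append') as hdul:
        hdul.append(new_hdu)

    # Reopen and list the HDU names
    with fits.open(path) as hdul:
        names = [hdu.name for hdu in hdul]

print(names)
```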

Convenience Functions

For quick operations without managing HDU lists:

# Get data only
data = fits.getdata('file.fits', ext=1)

# Get header only
header = fits.getheader('file.fits', ext=0)

# Get both
data, header = fits.getdata('file.fits', ext=1, header=True)

# Get single keyword value
exptime = fits.getval('file.fits', 'EXPTIME', ext=0)

# Set keyword value
fits.setval('file.fits', 'NEWKEY', value='newvalue', ext=0)

# Write simple file
fits.writeto('output.fits', data, header, overwrite=True)

# Append to file
fits.append('file.fits', data, header)

# Display file info
fits.info('file.fits')
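The calls above can be chained into a short round trip. This sketch writes a small array to a temporary file, edits one keyword in place, and reads everything back; the keyword names and values are examples only.

```python
import os
import tempfile

import numpy as np
from astropy.io import fits

data = np.arange(12, dtype=np.float32).reshape(3, 4)
header = fits.Header()
header['EXPTIME'] = (300.0, 'Exposure time in seconds')

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'output.fits')

    # Write the array and header as the primary HDU
    fits.writeto(path, data, header, overwrite=True)

    # Add a keyword without opening the file explicitly
    fits.setval(path, 'OBSERVER', value='Edwin Hubble', ext=0)

    # Read single keywords, then data and header together
    exptime = fits.getval(path, 'EXPTIME', ext=0)
    observer = fits.getval(path, 'OBSERVER', ext=0)
    loaded, loaded_header = fits.getdata(path, ext=0, header=True)
    loaded = np.array(loaded)  # Copy before the temporary file is removed

print(exptime, observer, loaded.shape)
```

Each convenience call opens and closes the file, so for many operations on the same file a single fits.open block is usually cheaper.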

Comparing FITS Files

# Print differences between two files
fits.printdiff('file1.fits', 'file2.fits')

# Compare programmatically
diff = fits.FITSDiff('file1.fits', 'file2.fits')
print(diff.report())
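For scripted checks, the identical attribute of FITSDiff gives a yes/no answer without parsing the report. A minimal sketch with two temporary files that differ in one pixel:

```python
import os
import tempfile

import numpy as np
from astropy.io import fits

a = np.zeros((4, 4), dtype=np.float32)
b = a.copy()
b[1, 2] = 5.0  # Single differing pixel

with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, 'file1.fits')
    path_b = os.path.join(tmp, 'file2.fits')
    fits.writeto(path_a, a)
    fits.writeto(path_b, b)

    # Comparing a file with itself reports no differences
    same = fits.FITSDiff(path_a, path_a).identical

    # Comparing the two files finds the changed pixel
    diff = fits.FITSDiff(path_a, path_b)
    different = not diff.identical
    report = diff.report()

print(same, different)
print(report)
```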

Converting Between Formats

FITS to/from Astropy Table

from astropy.table import Table

# FITS to Table
table = Table.read('catalog.fits')

# Table to FITS
table.write('output.fits', format='fits', overwrite=True)

Best Practices

Always use context managers (with statements) for safe file handling
Avoid modifying structural keywords (SIMPLE, BITPIX, NAXIS, etc.)
Use memory mapping for large files to conserve RAM
Use .section for remote files to avoid full downloads
Check HDU structure with .info() before accessing data
Verify data types before operations to avoid unexpected behavior
Use convenience functions for simple one-off operations

Common Issues

Handling Non-Standard FITS

Some files violate FITS standards:

# Tolerate a missing END card at the end of the header
hdul = fits.open('bad_file.fits', ignore_missing_end=True)

# Fix non-standard files
hdul = fits.open('bad_file.fits')
hdul.verify('fix')  # Try to fix issues
hdul.writeto('fixed_file.fits')

Large File Performance

# Use memory mapping (default)
hdul = fits.open('huge_file.fits', memmap=True)

# For write operations with large arrays, use Dask
import dask.array as da
large_array = da.random.random((10000, 10000))
fits.writeto('output.fits', large_array)
