Python and Pandas: Notes on Some Commands (Continuously Updated)



1. What do the u, r, and b prefixes mean in Python?

We often see code like the following in Python:

print(u'hi\thi\thi')
print(b'hi\thi\thi')
print(r'hi\thi\thi')

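I have not seen anything similar in other languages, so I am recording it here:

u: a Unicode string. In Python 3 this is already the default for string literals, so the prefix is optional. Escape sequences inside the string are interpreted.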

print(u'hi\thi\thi')

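Output (the fields are separated by tab characters):
hi      hi      hi

b: a bytes literal. print() shows the repr of the bytes object, so the b prefix, the quotes, and the \t escape all appear in the output instead of a rendered tab.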

print(b'hi\thi\thi')

Output:
b'hi\thi\thi'

r: a raw string. Backslashes are not treated as escape characters, so the content is printed exactly as written.

print(r'hi\thi\thi')

Output:
hi\thi\thi
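
The difference between the three prefixes is easier to see by comparing lengths than by comparing printed output. The following is a minimal sketch, assuming Python 3; the variable names are only illustrative.

```python
# Compare the three literal prefixes on the same text.
u_text = u'hi\thi'   # str: \t is one tab character
r_text = r'hi\thi'   # str: the backslash and the letter t stay as two characters
b_text = b'hi\thi'   # bytes: \t is one tab byte

print(len(u_text))   # 5
print(len(r_text))   # 6
print(len(b_text))   # 5

# In Python 3 the u prefix changes nothing compared with a plain literal.
print(u_text == 'hi\thi')                # True

# The bytes object holds a real tab byte (value 9); only its repr shows "\t".
print(b_text[2])                         # 9
print(b_text.decode('ascii') == u_text)  # True
```

So b'...' does interpret the escape; it is only print() that displays the bytes in escaped form.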

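2. Pandas time series: the date_range method

  • Purpose

    date_range() generates a sequence of timestamps. You can set the start, the end, the number of periods, the interval between timestamps, the time zone, and so on.

    Syntax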

    import pandas as pd
    
    pd.date_range(start=None, end=None, periods=None, freq='D', tz=None, normalize=False, name=None, closed=None, **kwargs)
    

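    Parameters

    • start, end

      The start and end of the range. Each can be a str, a datetime object, or None.

    • periods

      The number of periods to generate: an integer or None. When start, end, and periods are all given, the timestamps are spaced evenly between start and end and the resulting freq is None.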

    In [54]: pd.date_range(start='1/1/2018', end='1/08/2018')
    Out[54]:
    DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
                   '2018-01-05', '2018-01-06', '2018-01-07', '2018-01-08'],
                  dtype='datetime64[ns]', freq='D')
     
    In [55]: pd.date_range(start='1/1/2018', periods=8)
    Out[55]:
    DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
                   '2018-01-05', '2018-01-06', '2018-01-07', '2018-01-08'],
                  dtype='datetime64[ns]', freq='D')
     
    In [56]: pd.date_range(end='1/1/2018', periods=8)
    Out[56]:
    DatetimeIndex(['2017-12-25', '2017-12-26', '2017-12-27', '2017-12-28',
                   '2017-12-29', '2017-12-30', '2017-12-31', '2018-01-01'],
                  dtype='datetime64[ns]', freq='D')
     
    In [57]: pd.date_range(start='2018-04-24', end='2018-04-27', periods=3)
    Out[57]:
    DatetimeIndex(['2018-04-24 00:00:00', '2018-04-25 12:00:00',
                   '2018-04-27 00:00:00'],
                  dtype='datetime64[ns]', freq=None)
     
    In [58]: pd.date_range(start='2018-04-24', end='2018-04-27', periods=4)
    Out[58]: DatetimeIndex(['2018-04-24', '2018-04-25', '2018-04-26', '2018-04-27'], dtype='datetime64[ns]', freq=None)
     
    In [59]: pd.date_range(start='2018-04-24', end='2018-04-27', periods=2)
    Out[59]: DatetimeIndex(['2018-04-24', '2018-04-27'], dtype='datetime64[ns]', freq=None)
     
    In [60]: pd.date_range(start='2018-04-24', end='2018-04-27', periods=5)
    Out[60]:
    DatetimeIndex(['2018-04-24 00:00:00', '2018-04-24 18:00:00',
                   '2018-04-25 12:00:00', '2018-04-26 06:00:00',
                   '2018-04-27 00:00:00'],
                  dtype='datetime64[ns]', freq=None)
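
The evenly spaced case can be checked by computing the step between neighbouring timestamps. This is a small sketch that reuses the arguments of In [60]; the variable names are illustrative.

```python
import pandas as pd

# Three days split into 4 equal intervals gives 5 timestamps.
idx = pd.date_range(start='2018-04-24', end='2018-04-27', periods=5)

step = idx[1] - idx[0]
print(step)       # 0 days 18:00:00

# Every gap has the same length.
gaps = [idx[i + 1] - idx[i] for i in range(len(idx) - 1)]
print(all(gap == step for gap in gaps))  # True
```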
    
    • freq

      The date offset, that is, the interval between adjacent timestamps. It can be a str or a DateOffset; the default is 'D'.

    In [66]: pd.date_range(start='1/1/2018', periods=5, freq='5D')
    Out[66]:
    DatetimeIndex(['2018-01-01', '2018-01-06', '2018-01-11', '2018-01-16',
                   '2018-01-21'],
                  dtype='datetime64[ns]', freq='5D')
     
    In [67]: pd.date_range(start='1/1/2018', periods=5, freq='M')
    Out[67]:
    DatetimeIndex(['2018-01-31', '2018-02-28', '2018-03-31', '2018-04-30',
                   '2018-05-31'],
                  dtype='datetime64[ns]', freq='M')
     
    In [68]: pd.date_range(start='1/1/2018', periods=5, freq='H')
    Out[68]:
    DatetimeIndex(['2018-01-01 00:00:00', '2018-01-01 01:00:00',
                   '2018-01-01 02:00:00', '2018-01-01 03:00:00',
                   '2018-01-01 04:00:00'],
                  dtype='datetime64[ns]', freq='H')
     
    In [69]: pd.date_range(start='1/1/2018', periods=5, freq=pd.offsets.MonthEnd(3))
    Out[69]:
    DatetimeIndex(['2018-01-31', '2018-04-30', '2018-07-31', '2018-10-31',
                   '2019-01-31'],
                  dtype='datetime64[ns]', freq='3M')
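
String aliases for freq have changed between pandas releases (pandas 2.2, for example, deprecated 'M' in favor of 'ME' and 'H' in favor of 'h'). A sketch that passes offset objects instead, as In [69] does, should not depend on the alias spelling. The variable names here are illustrative.

```python
import pandas as pd

# Month-end and hourly ranges built from offset objects rather than strings.
monthly = pd.date_range(start='2018-01-01', periods=3, freq=pd.offsets.MonthEnd())
hourly = pd.date_range(start='2018-01-01', periods=3, freq=pd.offsets.Hour())

print(monthly)
print(hourly)
```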
    
    • tz

      Sets the time zone. It can be a str or a tzinfo object.

    In [70]: pd.date_range(start='1/1/2018', periods=5, tz='Asia/Tokyo')
    Out[70]:
    DatetimeIndex(['2018-01-01 00:00:00+09:00', '2018-01-02 00:00:00+09:00',
                   '2018-01-03 00:00:00+09:00', '2018-01-04 00:00:00+09:00',
                   '2018-01-05 00:00:00+09:00'],
                  dtype='datetime64[ns, Asia/Tokyo]', freq='D')
     
    In [71]: pd.date_range(start='1/1/2018', periods=5, tz='Asia/Shanghai')
    Out[71]:
    DatetimeIndex(['2018-01-01 00:00:00+08:00', '2018-01-02 00:00:00+08:00',
                   '2018-01-03 00:00:00+08:00', '2018-01-04 00:00:00+08:00',
                   '2018-01-05 00:00:00+08:00'],
                  dtype='datetime64[ns, Asia/Shanghai]', freq='D')
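
A time-zone-aware index can be converted to another zone afterwards with tz_convert. The sketch below assumes the time zone database is available to pandas; it shows that conversion changes the wall-clock display but not the instant.

```python
import pandas as pd

tokyo = pd.date_range(start='2018-01-01', periods=3, tz='Asia/Tokyo')

# Midnight in Tokyo (UTC+9) is 15:00 on the previous day in UTC.
utc = tokyo.tz_convert('UTC')
print(utc[0])              # 2017-12-31 15:00:00+00:00

# Both indexes describe the same instants.
print(tokyo[0] == utc[0])  # True
```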
    
    • normalize

      Boolean, False by default. If True, start and end are normalized to midnight before the range is generated.

    In [83]: pd.date_range(start='1/1/2018 14:00:00', periods=5,normalize=False)
    Out[83]:
    DatetimeIndex(['2018-01-01 14:00:00', '2018-01-02 14:00:00',
                   '2018-01-03 14:00:00', '2018-01-04 14:00:00',
                   '2018-01-05 14:00:00'],
                  dtype='datetime64[ns]', freq='D')
     
    In [84]: pd.date_range(start='1/1/2018 14:00:00', periods=5,normalize=True)
    Out[84]:
    DatetimeIndex(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04',
                   '2018-01-05'],
                  dtype='datetime64[ns]', freq='D')
    
    • name

      The name of the resulting DatetimeIndex: a str or None.

    In [79]: pd.date_range(start='2017-01-01', end='2017-01-04', closed=None,freq='2D',name='xiaowoniu')
    Out[79]: DatetimeIndex(['2017-01-01', '2017-01-03'], dtype='datetime64[ns]', name=u'xiaowoniu', freq='2D')
    
    • closed

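      closed='left' keeps the start and drops the end (a left-closed, right-open range); closed='right' drops the start and keeps the end (left-open, right-closed). Only a timestamp equal to start or end is dropped, so when freq is not 'D' and an endpoint does not fall on the generated grid, nothing is removed on that side. Note that pandas 1.4 deprecated closed in favor of inclusive, and pandas 2.0 removed it.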

    In [72]: pd.date_range(start='2017-01-01', end='2017-01-04', closed=None)
    Out[72]: DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04'], dtype='datetime64[ns]', freq='D')
     
    In [73]: pd.date_range(start='2017-01-01', end='2017-01-04', closed='left')
    Out[73]: DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03'], dtype='datetime64[ns]', freq='D')
     
    In [74]: pd.date_range(start='2017-01-01', end='2017-01-04', closed='right')
    Out[74]: DatetimeIndex(['2017-01-02', '2017-01-03', '2017-01-04'], dtype='datetime64[ns]', freq='D')
     
    In [75]: pd.date_range(start='2017-01-01', end='2017-01-04', closed='right',freq='2D')
    Out[75]: DatetimeIndex(['2017-01-03'], dtype='datetime64[ns]', freq='2D')
     
    In [76]: pd.date_range(start='2017-01-01', end='2017-01-04', closed='left',freq='2D')
    Out[76]: DatetimeIndex(['2017-01-01', '2017-01-03'], dtype='datetime64[ns]', freq='2D')
     
    In [77]: pd.date_range(start='2017-01-01', end='2017-01-04', closed=None,freq='2D')
    Out[77]: DatetimeIndex(['2017-01-01', '2017-01-03'], dtype='datetime64[ns]', freq='2D')
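
The transcripts above come from a pandas version that accepts closed. On releases that spell the option inclusive, the same calls need a different keyword. The following is a sketch of a version-tolerant wrapper; bounded_range is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

def bounded_range(start, end, side, **kwargs):
    """Hypothetical helper: pass the boundary option under the name this pandas accepts."""
    try:
        # Newer pandas releases use `inclusive`.
        return pd.date_range(start=start, end=end, inclusive=side, **kwargs)
    except TypeError:
        # Older pandas releases only know `closed`.
        return pd.date_range(start=start, end=end, closed=side, **kwargs)

left = bounded_range('2017-01-01', '2017-01-04', 'left')
right_2d = bounded_range('2017-01-01', '2017-01-04', 'right', freq='2D')
left_2d = bounded_range('2017-01-01', '2017-01-04', 'left', freq='2D')

print(left)      # 2017-01-01 .. 2017-01-03
print(right_2d)  # only 2017-01-03: the start date is dropped
print(left_2d)   # 2017-01-01 and 2017-01-03: the end date was never generated
```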
    

    3. Python data cleaning: filling missing data with fillna()

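    • When a large share of the data is missing, the affected records can simply be dropped; when only a little is missing, filling the gaps is worthwhile.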

    • fillna() is the function used to fill missing data. The examples below show its common uses:

    import numpy as np
    from numpy import nan
    import pandas as pd
    data=pd.DataFrame(np.arange(3,19,1).reshape(4,4),index=list('abcd'))
    print(data)
    data.iloc[0:2,0:3]=nan
    print(data)
    
          0     1     2   3
    a   NaN   NaN   NaN   6
    b   NaN   NaN   NaN  10
    c  11.0  12.0  13.0  14
    d  15.0  16.0  17.0  18
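
Before choosing between dropping and filling, it can help to count how much is actually missing. This is a minimal sketch using isna() and dropna(); the frame is created as float from the start (an assumption made here so that assigning NaN needs no dtype change), and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(3, 19, dtype=float).reshape(4, 4), index=list('abcd'))
data.iloc[0:2, 0:3] = np.nan

# Number of missing values in each column.
missing_per_column = data.isna().sum()
print(missing_per_column)

# Dropping instead of filling: keep only rows with no missing value.
complete_rows = data.dropna()
print(complete_rows)
```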
    
    print(data.fillna(0))   ### fill missing values with 0
    
          0     1     2   3
    a   0.0   0.0   0.0   6
    b   0.0   0.0   0.0  10
    c  11.0  12.0  13.0  14
    d  15.0  16.0  17.0  18
    
    print(data.fillna(method='bfill'))   ### backward fill: each missing value takes the next valid value below it
    
          0     1     2   3
    a  11.0  12.0  13.0   6
    b  11.0  12.0  13.0  10
    c  11.0  12.0  13.0  14
    d  15.0  16.0  17.0  18
    
    data=pd.DataFrame(np.arange(3,19,1).reshape(4,4),index=list('abcd'))
    data.iloc[1:2,:]=nan
    print(data)
    
         0     1     2     3
    a   3.0   4.0   5.0   6.0
    b   NaN   NaN   NaN   NaN
    c  11.0  12.0  13.0  14.0
    d  15.0  16.0  17.0  18.0
    
    print(data.fillna(method='ffill'))   ### forward fill: each missing value takes the last valid value above it
    
          0     1     2     3
    a   3.0   4.0   5.0   6.0
    b   3.0   4.0   5.0   6.0
    c  11.0  12.0  13.0  14.0
    d  15.0  16.0  17.0  18.0
    
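    # Rebuild the first example's data (NaN in rows a and b, columns 0 to 2); the output below reflects it
    data=pd.DataFrame(np.arange(3,19,1).reshape(4,4),index=list('abcd'))
    data.iloc[0:2,0:3]=nan
    values={0:10,1:20,2:30}
    print(data.fillna(value=values))   ### use a dict to fill each column with a different value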
    
          0     1     2   3
    a  10.0  20.0  30.0   6
    b  10.0  20.0  30.0  10
    c  11.0  12.0  13.0  14
    d  15.0  16.0  17.0  18
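
The directional fills are also available as the dedicated methods ffill() and bfill(), and a Series such as data.mean() can be passed to fillna() to fill each column with its own mean. The following is a sketch under the assumption of an all-float frame; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(3, 19, dtype=float).reshape(4, 4), index=list('abcd'))
data.iloc[1, :] = np.nan   # row b is entirely missing

forward = data.ffill()               # row b takes row a's values
backward = data.bfill()              # row b takes row c's values
by_mean = data.fillna(data.mean())   # row b takes each column's mean (NaN is skipped)

print(forward)
print(backward)
print(by_mean)
```

None of these calls modifies data itself; each returns a new DataFrame.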
    

Author: Zhaotao
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit Zhaotao when reposting!