My scenario is as follows: I have a table of data (a handful of fields, fewer than a hundred rows) that I use extensively in my program. I also need this data to be persistent, so I save it as a CSV and load it on start-up. I chose not to use a database because every option (even SQLite) is overkill for my humble requirement. Also, I would like to be able to edit the values offline in a simple way, and nothing is simpler than Notepad.
Assume my data looks as follows (in the file it's comma separated without titles, this is just an illustration):
    Row | Name     | Year | Priority
    --------------------------------
    1   | Cat      | 1998 | 1
    2   | Fish     | 1998 | 2
    3   | Dog      | 1999 | 1
    4   | Aardvark | 2000 | 1
    5   | Wallaby  | 2000 | 1
    6   | Zebra    | 2001 | 3

Notes:
Things I do with the data:
I know this "cries" for SQL...
I'm trying to figure out what's the best choice for data structure. Following are several choices I see:
List of row lists:
    a = []
    a.append([1, "Cat", 1998, 1])
    a.append([2, "Fish", 1998, 2])
    a.append([3, "Dog", 1999, 1])
    ...

List of column lists (there will obviously be an API for add_row etc):
    a = []
    a.append([1, 2, 3, 4, 5, 6])
    a.append(["Cat", "Fish", "Dog", "Aardvark", "Wallaby", "Zebra"])
    a.append([1998, 1998, 1999, 2000, 2000, 2001])
    a.append([1, 2, 1, 1, 1, 3])

Dictionary of column lists (constants can be created to replace the string keys):
    a = {}
    a['ID'] = [1, 2, 3, 4, 5, 6]
    a['Name'] = ["Cat", "Fish", "Dog", "Aardvark", "Wallaby", "Zebra"]
    a['Year'] = [1998, 1998, 1999, 2000, 2000, 2001]
    a['Priority'] = [1, 2, 1, 1, 1, 3]

Dictionary with keys being tuples of (Row, Field):
    # Create constants to avoid string searching
    NAME = 1
    YEAR = 2
    PRIORITY = 3

    a = {}
    a[(1, NAME)] = "Cat"
    a[(1, YEAR)] = 1998
    a[(1, PRIORITY)] = 1
    a[(2, NAME)] = "Fish"
    a[(2, YEAR)] = 1998
    a[(2, PRIORITY)] = 2
    ...

And I'm sure there are other ways... However, each way has disadvantages when it comes to my requirements (complex ordering and counting).
What's the recommended approach?
EDIT:
To clarify, performance is not a major issue for me. Because the table is so small, I believe almost every operation will be in the range of milliseconds, which is not a concern for my application.
Solution

Having a "table" in memory that needs lookups, sorting, and arbitrary aggregation really does call out for SQL. You said you tried SQLite, but did you realize that SQLite can use an in-memory-only database?
    connection = sqlite3.connect(':memory:')

Then you can create/drop/query/update tables in memory with all the functionality of SQLite and no files left over when you're done. And as of Python 2.5, sqlite3 is in the standard library, so it's not really "overkill" IMO.
Here is a sample of how one might create and populate the database:
    import csv
    import sqlite3

    db = sqlite3.connect(':memory:')

    def init_db(cur):
        # Column types let SQLite store the CSV strings as integers where declared.
        cur.execute('''CREATE TABLE foo (
            Row INTEGER,
            Name TEXT,
            Year INTEGER,
            Priority INTEGER)''')

    def populate_db(cur, csv_fp):
        # Each CSV record becomes one inserted row.
        rdr = csv.reader(csv_fp)
        cur.executemany('''
            INSERT INTO foo (Row, Name, Year, Priority)
            VALUES (?,?,?,?)''', rdr)

    cur = db.cursor()
    init_db(cur)
    with open('my_csv_input_file.csv', newline='') as csv_fp:
        populate_db(cur, csv_fp)
    db.commit()

If you'd really prefer not to use SQL, you should probably use a list of dictionaries:
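    import csv

    lod = []  # "list of dicts"

    def populate_lod(lod, csv_fp):
        # Every value is read as a string; field names are supplied because the file has no header.
        rdr = csv.DictReader(csv_fp, ['Row', 'Name', 'Year', 'Priority'])
        lod.extend(rdr)

    def query_lod(lod, filter=None, sort_keys=None):
        # Optionally filter rows with a predicate, then sort by the given keys in order.
        if filter is not None:
            lod = (r for r in lod if filter(r))
        if sort_keys is not None:
            lod = sorted(lod, key=lambda r: [r[k] for k in sort_keys])
        else:
            lod = list(lod)
        return lod

    def lookup_lod(lod, **kw):
        # Return the first row whose fields match all keyword arguments, else None.
        for row in lod:
            for k, v in kw.items():
                if row[k] != str(v):
                    break
            else:
                return row
        return None

Testing then yields: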
    >>> from pprint import pprint
    >>> lod = []
    >>> populate_lod(lod, csv_fp)
    >>>
    >>> pprint(lookup_lod(lod, Row=1))
    {'Name': 'Cat', 'Priority': '1', 'Row': '1', 'Year': '1998'}
    >>> pprint(lookup_lod(lod, Name='Aardvark'))
    {'Name': 'Aardvark', 'Priority': '1', 'Row': '4', 'Year': '2000'}
    >>> pprint(query_lod(lod, sort_keys=('Priority', 'Year')))
    [{'Name': 'Cat', 'Priority': '1', 'Row': '1', 'Year': '1998'},
     {'Name': 'Dog', 'Priority': '1', 'Row': '3', 'Year': '1999'},
     {'Name': 'Aardvark', 'Priority': '1', 'Row': '4', 'Year': '2000'},
     {'Name': 'Wallaby', 'Priority': '1', 'Row': '5', 'Year': '2000'},
     {'Name': 'Fish', 'Priority': '2', 'Row': '2', 'Year': '1998'},
     {'Name': 'Zebra', 'Priority': '3', 'Row': '6', 'Year': '2001'}]
    >>> pprint(query_lod(lod, sort_keys=('Year', 'Priority')))
    [{'Name': 'Cat', 'Priority': '1', 'Row': '1', 'Year': '1998'},
     {'Name': 'Fish', 'Priority': '2', 'Row': '2', 'Year': '1998'},
     {'Name': 'Dog', 'Priority': '1', 'Row': '3', 'Year': '1999'},
     {'Name': 'Aardvark', 'Priority': '1', 'Row': '4', 'Year': '2000'},
     {'Name': 'Wallaby', 'Priority': '1', 'Row': '5', 'Year': '2000'},
     {'Name': 'Zebra', 'Priority': '3', 'Row': '6', 'Year': '2001'}]
    >>> print(len(query_lod(lod, lambda r: 1997 <= int(r['Year']) <= 2002)))
    6
    >>> print(len(query_lod(lod, lambda r: int(r['Year']) == 1998 and int(r['Priority']) > 2)))
    0

Personally I like the SQLite version better since it preserves your types better (without extra conversion code in Python) and easily grows to accommodate future requirements. But then again, I'm quite comfortable with SQL, so YMMV.
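For comparison, here is a rough sketch of how the same lookup, sorting, and counting operations might look against the in-memory SQLite table. It is only an illustration under a few assumptions: the `CSV_TEXT` string stands in for the real CSV file, the variable names (`aardvark`, `by_priority_year`, `count_range`, `count_1998_high`) are made up for this example, and the `Row` column is quoted defensively in case it is treated as a keyword.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for the CSV file; same rows as the illustration table.
CSV_TEXT = """\
1,Cat,1998,1
2,Fish,1998,2
3,Dog,1999,1
4,Aardvark,2000,1
5,Wallaby,2000,1
6,Zebra,2001,3
"""

db = sqlite3.connect(':memory:')
cur = db.cursor()
cur.execute('CREATE TABLE foo ("Row" INTEGER, Name TEXT, Year INTEGER, Priority INTEGER)')
cur.executemany('INSERT INTO foo ("Row", Name, Year, Priority) VALUES (?,?,?,?)',
                csv.reader(io.StringIO(CSV_TEXT)))
db.commit()

# Look up one row by name; the INTEGER columns come back as Python ints.
aardvark = cur.execute(
    'SELECT "Row", Name, Year, Priority FROM foo WHERE Name = ?',
    ('Aardvark',)).fetchone()

# Sort by Priority, then Year ("Row" is added only to break ties predictably).
by_priority_year = [name for (name,) in cur.execute(
    'SELECT Name FROM foo ORDER BY Priority, Year, "Row"')]

# Count rows whose year falls in a range.
count_range = cur.execute(
    'SELECT COUNT(*) FROM foo WHERE Year BETWEEN ? AND ?',
    (1997, 2002)).fetchone()[0]

# Count rows from 1998 with priority above 2.
count_1998_high = cur.execute(
    'SELECT COUNT(*) FROM foo WHERE Year = ? AND Priority > ?',
    (1998, 2)).fetchone()[0]

print(aardvark)
print(by_priority_year)
print(count_range, count_1998_high)
```

Each new requirement (a different sort order, another counting rule) then becomes a different query string rather than more Python code, and no `int()` conversions are needed in the filters.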