Open
Description
Initial Checks
- I have read and followed the docs and still think this is a bug
Description
I noticed this behavior when I wanted to access multiple documents in the index:
@requests(on='/find')
def find(self, docs: DocList[QuoteFile], **_) -> DocList[QuoteFile]:
    return self._cache_di[docs.id]

And when I issue POST /find with body {"data":[{"id":"300055"}]}, this code yields:
"/Users/oytuntez/motaword/jina-documents/venv/lib/py…
line 544, in _get_docs_sqlite_doc_id
    hashed_ids = tuple(self._to_hashed_id(id_) for id_ in doc_ids)
File "/Users/oytuntez/motaword/jina-documents/venv/lib/py…
line 544, in <genexpr>
    hashed_ids = tuple(self._to_hashed_id(id_) for id_ in doc_ids)
File "/Users/oytuntez/motaword/jina-documents/venv/lib/py…
line 445, in _to_hashed_id
    return int(hashlib.sha256(doc_id.encode('utf-8')).hexdigest… 16) % 10**18
AttributeError: 'int' object has no attribute 'encode'

Upon investigation, I saw that most of HnswDocumentIndex treats IDs as str. However, it is my understanding that IDs can be int, see this type definition:
class ID(str, AbstractType):
    """
    Represent an unique ID
    """

    @classmethod
    def _docarray_validate(
        cls: Type[T],
        value: Union[str, int, UUID],
        ...

I think ID values should be cast to str where necessary (as would be the case in _to_hashed_id).
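A minimal sketch of the proposed cast, written as a standalone function rather than a patch to DocArray. The name `to_hashed_id` is hypothetical, and the hashing expression is only modeled on the line shown in the traceback, so treat it as an illustration of the idea, not the library's actual implementation.

```python
import hashlib
from typing import Union
from uuid import UUID


def to_hashed_id(doc_id: Union[str, int, UUID]) -> int:
    """Hypothetical standalone variant of the hashing step from the traceback."""
    # Cast first, so that int and UUID ids follow the same path as str ids
    # instead of failing on the missing .encode() method.
    doc_id = str(doc_id)
    return int(hashlib.sha256(doc_id.encode('utf-8')).hexdigest(), 16) % 10**18


# An int id and its string form now hash to the same value.
assert to_hashed_id(300055) == to_hashed_id("300055")
```

With a cast like this, a lookup by the int 300055 and a lookup by the str "300055" would resolve to the same hashed key, which seems to be what the rest of the index expects.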
Example Code
No response
Python, DocArray & OS Version
Python 3.8.12
docarray==0.40.0
Affected Components